U.S. patent application number 11/216,667 was filed with the patent office on August 31, 2005, and published on March 1, 2007, as publication number 20070047834, for a method and apparatus for visual background subtraction with one or more preprocessing modules.
This patent application is currently assigned to International Business Machines Corporation. The invention is credited to Jonathan H. Connell.
United States Patent Application
Publication Number: 20070047834
Application Number: 11/216,667
Family ID: 37804168
Kind Code: A1
Published: March 1, 2007
Inventor: Connell; Jonathan H.
Method and apparatus for visual background subtraction with one or
more preprocessing modules
Abstract
Methods and apparatus are provided for visual background
subtraction using one or more preprocessing modules. One or more
effects are detected in a received image signal and one or more
blocks are selectively enabled to preprocess the image signal to
compensate for the detected one or more effects. Visual analysis is
then performed on the preprocessed signal using background
subtraction. A spatially-variant temporal smoothing of the image
signal is also disclosed. The spatially-variant temporal smoothing
can be achieved by the mixing of a new intensity value with a
previous intensity time-average as determined by a weighting
matrix. The mixing can be influenced by a dynamic bias term that is
a real-time estimate of a variance at the pixel, such as a degree
of change, and the weighting can be determined by a relative
stability of an observed value compared to a stability of the
time-average.
Inventors: Connell; Jonathan H. (Cortlandt Manor, NY)
Correspondence Address: RYAN, MASON & LEWIS, LLP, 1300 POST ROAD, SUITE 205, FAIRFIELD, CT 06824, US
Assignee: International Business Machines Corporation, Armonk, NY
Family ID: 37804168
Appl. No.: 11/216,667
Filed: August 31, 2005
Current U.S. Class: 382/274
Current CPC Class: G06T 5/002 20130101; H04N 1/407 20130101; G06T 7/194 20170101; G06T 2207/10024 20130101; G06T 7/254 20170101; G06T 2207/20182 20130101; H04N 1/6027 20130101; G06T 5/007 20130101; G06T 5/006 20130101; G06T 5/50 20130101
Class at Publication: 382/274
International Class: G06K 9/40 20060101 G06K009/40
Claims
1. A method for processing an image signal, comprising: receiving
an image signal that has been corrupted by one or more effects;
detecting said one or more effects in said received image signal;
selectively enabling one or more blocks to preprocess said image
signal to compensate for said detected one or more effects; and
performing visual analysis on said preprocessed signal using
background subtraction.
2. The method of claim 1, wherein said visual analysis identifies
one or more objects in said preprocessed image signal.
3. The method of claim 1, wherein said one or more blocks performs
a jitter correction on said image signal.
4. The method of claim 1, wherein said one or more blocks performs
a color correction on said image signal.
5. The method of claim 1, wherein said one or more blocks performs
a contrast enhancement on said image signal.
6. The method of claim 1, wherein said one or more blocks performs
a cable-induced visual artifact reduction on said image signal.
7. The method of claim 1, wherein said one or more blocks performs
a spatially-variant temporal smoothing on said image signal.
8. The method of claim 1, wherein said one or more blocks performs
a lens geometry normalization on said image signal.
9. A system for processing an image signal, comprising: a memory;
and at least one processor, coupled to the memory, operative to:
receive an image signal that has been corrupted by one or more
effects; detect said one or more effects in said received image
signal; selectively enable one or more blocks to preprocess said
image signal to compensate for said detected one or more effects;
and perform visual analysis on said preprocessed signal using
background subtraction.
10. The system of claim 9, wherein said visual analysis identifies
one or more objects in said preprocessed image signal.
11. The system of claim 9, wherein said one or more blocks performs
a jitter correction on said image signal.
12. The system of claim 9, wherein said one or more blocks performs
a color correction on said image signal.
13. The system of claim 9, wherein said one or more blocks performs
a contrast enhancement on said image signal.
14. The system of claim 9, wherein said one or more blocks performs
a cable-induced visual artifact reduction on said image signal.
15. The system of claim 9, wherein said one or more blocks performs
a spatially-variant temporal smoothing on said image signal.
16. The system of claim 9, wherein said one or more blocks performs
a lens geometry normalization on said image signal.
17. An article of manufacture for processing an image signal,
comprising a machine readable medium containing one or more
programs which when executed implement the steps of: receiving an
image signal that has been corrupted by one or more effects;
detecting said one or more effects in said received image signal;
selectively enabling one or more blocks to preprocess said image
signal to compensate for said detected one or more effects; and
performing visual analysis on said preprocessed signal using
background subtraction.
18. The article of manufacture of claim 17, wherein said one or
more blocks performs one or more of a jitter correction on said
image signal, a color correction on said image signal, a contrast
enhancement on said image signal, a cable-induced visual artifact
reduction on said image signal, a spatially-variant temporal
smoothing on said image signal, or a lens geometry normalization on
said image signal.
19. A method for processing an image signal, comprising: receiving
an image signal that has been corrupted by one or more effects;
selectively enabling one or more blocks to preprocess said image
signal to compensate for said one or more effects; performing
spatially-variant temporal smoothing to further preprocess said
image signal; and presenting said preprocessed image signal for
visual analysis.
20. The method of claim 19, wherein said visual analysis identifies
one or more objects in said preprocessed image signal.
21. The method of claim 19, wherein said visual analysis uses
background subtraction.
22. The method of claim 19, wherein said visual analysis is
performed by a human watching a video screen.
23. The method of claim 19, wherein said spatially-variant temporal
smoothing is achieved by the mixing of a new intensity value with a
previous intensity time-average as determined by a weighting
matrix.
24. The method of claim 23, wherein said mixing is influenced by a
dynamic bias term that is a real-time estimate of a variance at
said pixel.
25. The method of claim 23, wherein said weighting is determined by
a relative stability of an observed value compared to a stability
of the time-average.
26. The method of claim 23, wherein an amount of said mixing is
based on a degree of change observed at said pixel.
27. The method of claim 23, wherein an amount of said mixing is
reduced as a degree of motion at said pixel increases.
28. The method of claim 19, wherein said spatially-variant temporal
smoothing is achieved by associating one or more independent Kalman
filters with each pixel position.
Description
FIELD OF THE INVENTION
[0001] The present invention relates generally to image
processing techniques, and, more particularly, to techniques for
visual background subtraction.
BACKGROUND OF THE INVENTION
[0002] Background subtraction is a popular technology for finding
moving objects in images of an environment. Unfortunately, there
are numerous factors that can adversely impact the efficacy of this
class of techniques. Such disturbances include changes in camera
responses due to automatic gain and color-balance corrections,
image jitter due to vibration or wind, perceptually-masked
artifacts due to video compression or cabling inadequacies, and
varying object size due to lens distortion or imaging angle.
[0003] Some of these problems have simple solutions, but the
solutions are not optimal. While video can be transmitted and recorded in an
uncompressed state, the required bandwidth and disk-storage space
increase costs significantly. Similarly, lens distortions can be
remedied by purchasing better (albeit more expensive) optics.
Although it is possible to correct imaging geometry, this is
difficult in practice because it involves moving
cameras to optimal viewing locations. Such locations may be
inconvenient (e.g., requiring significantly longer cable runs) or
not feasible (e.g., above the ceiling level).
[0004] The solutions to other problems are not as straightforward.
When the camera shakes due to wind or other vibration, for example,
the current image acquired by the camera will not exactly line up
with a previously captured reference image. This leads to detection
of image changes (particularly near edges or in textured regions)
that are not due to independent objects. Stabilizing the images
produced by such surveillance cameras eliminates these artificial
detections.
[0005] Stabilization can be accomplished by mechanically moving the
camera in response to inertial measurements, or by altering
portions of the optical path (e.g., sliding prisms) in response to
similar error signals. However, these solutions require changing
the cameras that are already installed. Also, these solutions are
typically bulkier than an ordinary fixed camera and hence may be
difficult to install in some locations. Stabilization may also be
performed electronically (as in some camcorders) by shifting the
pixel read positions on a digital image sensor. However, these
pixel shifts are typically integer shifts that are not
accurate enough to remove all the artifacts that arise in background
subtraction. Another option is to use image warping based on
optical flow analysis. However, this analysis is mathematically
complicated, thus necessitating either a lower video frame rate or a
more expensive computation engine.
[0006] Many cameras have built-in circuitry or algorithms for
automatic gain control (AGC) and automatic white balance (AWB).
These mechanisms typically generate video images that are more
pleasing to the human eye. Unfortunately, these corrections can
impair machine analysis of the images because they introduce
frame-to-frame variations that are not due to any true variation in the
imaged environment. Background subtraction is particularly affected
by this phenomenon, which can cause large portions of the image to be
falsely declared as foreground. Some cameras allow AGC and AWB to
be disabled; however, this may not be true for all (possibly
legacy) cameras in a video surveillance system. Also, it is
sometimes desired to analyze previously recorded material where the
source camera and its parameters cannot be controlled
retroactively. While it is possible to correct exposure and color
balance using techniques such as histogram stretching or contrast
stretching, these whole-image methods can be confused if the
content of the scene changes.
[0007] Furthermore, when using the legacy analog video transmission
format RS-170, the color of a pixel is encoded as a phase-shifted
chrominance signal riding on top of the standard amplitude
modulated intensity signal. Unfortunately, when separating these
two signals to reconstruct the image representation, sharp changes
in the intensity signal can be interpreted as color shifts. This
can happen due to inadequate band limiting of the intensity signal
at the source, poor "comb" filtering at the receiver, or nonlinear
dispersion in the transmission medium (typically coax cable). This
aliasing results in strobing color rainbow patterns around sharp
edges. This can be disadvantageous for computer vision systems that
need to know the true colors of regions, or for object detection
and tracking systems based on background subtraction which may
erroneously interpret these color fluctuations as moving
objects.
[0008] The impact of these color artifacts can be diminished by
converting the image to monochrome (i.e., a black and white image)
so that there are no color shifts, only smaller intensity
variations. However, this processing removes potentially valuable
information from the image. For instance, in a surveillance system
it is useful to be able to discern the colors of different
vehicles, something not possible in a gray-scale video. Another
approach is to apply aggressive spatial smoothing to the image so
that the "proper" adjacent colors dominate in the problem areas.
However, this approach is sub-optimal in that the boundaries of
objects (and sometimes even their identities) can be obscured by
such blurring. Still another method would be to attempt to
reconstruct the original two-part analog signal and then employ a
more sophisticated chrominance-luminance separation filter.
Unfortunately, video has often been subjected to a lossy
compression method, such as MPEG (especially if it has been
digitally recorded), in which case the exact details of the
original waveform cannot be recovered with sufficient fidelity to
permit this re-processing.
[0009] A further problem is that video images often contain "noise"
that is annoying to humans and can be even more detrimental to
automated analysis systems. This noise comes primarily from three
sources: imager noise (e.g., pixel variations), channel noise
(e.g., interference in cabling), and compression noise (e.g., MPEG
"mosquitoes"). Effective removal or suppression of these types of
noise leads to more pleasing visuals and more accurate computer
vision systems. One standard method for noise removal is spatial
blurring, which replaces a pixel by a weighted sum of its
neighbors. Unfortunately, this tends to wash out sharp edges and
obscure region textures. Median-based filtering attempts to
preserve sharp edges, but still corrupts texture (which is
interpreted as noise) and leads to artificially "flat" looking
images. Another method, temporal smoothing, uses a weighted sum of
pixels from multiple frames over time. This works well for largely
stationary images, but moving objects often appear ghostly and
leave trails behind.
[0010] Yet another difficulty is that background subtraction
operates by comparing the current image with a reference image and
highlights any pixel changes. Unfortunately, while often the
desired result is the delineation of a number of physical objects,
shadow regions are typically also marked because the scene looks
different there as well. Eliminating or suppressing shadow artifacts
is desirable because it allows better tracking and classification
of a detected object (i.e., its form varies less over time and
does not depend on lighting conditions).
[0011] One way to eliminate shadows is to first perform basic
background subtraction and then to more closely examine the pixels
flagged as foreground. For example, the hue, saturation, and
intensity can be computed separately for the foreground pixel and
the corresponding background pixel. If the hue and saturation
measures are a close match, the intensities are then examined to
see if they are within a plausible range of variations. If so, the
pixel is declared a shadow artifact and removed from the computed
foreground mask. Unfortunately, this method requires the
computation of hue, which is typically expensive because it
involves trigonometric operators. Moreover, hue is unstable in
regions of low saturation or intensity (e.g., shadows). Finally,
the derived hue is very sensitive to the noise in each color
channel (the more noise, the less reliable the estimate).
[0012] A need therefore exists for improved techniques for visual
background subtraction. A further need exists for methods and
apparatus for visual background subtraction that address each of
the above-identified problems using one or more software
preprocessing modules.
SUMMARY OF THE INVENTION
[0013] Generally, methods and apparatus are provided for visual
background subtraction using one or more preprocessing modules.
According to one aspect of the invention, an image signal that has
undergone previous corruption by one or more effects is processed.
The one or more effects in the received image signal are detected
and one or more blocks are selectively enabled to preprocess the
image signal to compensate for the detected one or more effects.
Thereafter, visual analysis, such as identifying one or more
objects in the preprocessed image signal, is performed on the
preprocessed signal using background subtraction.
[0014] The one or more blocks may selectively perform one or more
of a jitter correction on the image signal, a color correction on
the image signal, a contrast enhancement on the image signal, a
cable-induced visual artifact reduction on the image signal, a
spatially-variant temporal smoothing on the image signal, and a
lens geometry normalization on the image signal.
[0015] According to another aspect of the invention, a
spatially-variant temporal smoothing is performed on the image
signal. Thereafter, the processed image is presented for visual
analysis. The spatially-variant temporal smoothing can be achieved
by the mixing of a new intensity value with a previous intensity
time-average as determined by a weighting matrix. The mixing can be
influenced by a dynamic bias term that is a real-time estimate of a
variance at the pixel. The weighting can be determined by a
relative stability of an observed value compared to a stability of
the time-average, and an amount of the mixing can be based on a degree
of change observed at the pixel. The spatially-variant temporal
smoothing can be achieved by associating one or more independent
Kalman filters with each pixel position.
[0016] A more complete understanding of the present invention, as
well as further features and advantages of the present invention,
will be obtained by reference to the following detailed description
and drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] FIG. 1 is a schematic block diagram of an image correction
system incorporating features of the present invention;
[0018] FIG. 2 is a flow chart describing an exemplary
implementation of a jitter correction (stabilization) method that
may be employed by the jitter correction preprocessor of FIG.
1;
[0019] FIG. 3 is a flow chart describing an exemplary
implementation of a color correction method that may be employed by
the color correction preprocessor of FIG. 1;
[0020] FIG. 4 is a flow chart describing an exemplary
implementation of an NTSC correction process that may be employed
by the NTSC color correction preprocessor of FIG. 1;
[0021] FIG. 5 is a flow chart describing an exemplary
implementation of a temporal smoothing process that may be employed
by the temporal smoothing preprocessor of FIG. 1;
[0022] FIG. 6 is a flow chart describing an exemplary
implementation of a lens normalization process that may be employed
by the lens normalization preprocessor of FIG. 1; and
[0023] FIG. 7 is a flow chart describing an exemplary
implementation of a shadow removal process 700 that may be employed
by the shadow removal preprocessor 700 of FIG. 1.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0024] The present invention provides methods and apparatus for
visual background subtraction with one or more preprocessing
modules. An input video stream is passed through one or more
switchable, reconfigurable image correction units before being sent
on to a background subtraction module or another visual analysis
system. Depending on the environmental conditions, one or more
modules can be selectively switched on or off for various camera
feeds. For instance, an indoor camera generally does not require
wind correction. In addition, for a single camera, various
preprocessors might only be invoked at certain times. For example,
at night, the color response of most cameras is poor in which case
they revert to essentially monochrome images. Thus, during the day,
the signal from this camera might be processed to ameliorate the
effect of chroma filtering (e.g., moving rainbow stripes at sharp
edges) yet this module could be disabled at night.
[0025] The present invention copes with each of the problems
identified above through the addition of software preprocessing
modules that are easy to install and have small incremental costs
(no new hardware is involved). This architecture allows the use of
a straightforward technique of background subtraction, in
conjunction with small, efficient preprocessing engines crafted for
known shortcomings, rather than requiring the use of more elaborate
(and often slower) general-purpose techniques like optical flow
analysis. The present invention recognizes that even if a more
sophisticated object detection technology is used to account for
residual anomalies, algorithmically correcting the degradation
processes which are understood and known to occur typically reduces
the burden on the scene modeling component and can improve the
overall system response time.
[0026] FIG. 1 is a schematic block diagram of an image correction
system 100 incorporating features of the present invention. As
discussed further below, the image correction system 100 performs
visual background subtraction at stage 195, for example, to detect
one or more objects in an image, and employs one or more
preprocessing modules 200, 300, 400, 500, 600, 700, each discussed
below in conjunction with FIGS. 2 through 7, respectively. The
processed image may be obtained, for example, from a remote camera
110, and the images generally have undergone an image compression
120. The compressed image may be received, for example, over a
transmission medium or channel 125, such as a wired or wireless
link.
[0027] As shown in FIG. 1, the settings of each of the
preprocessing modules 200, 300, 400, 500, 600, 700 may optionally
be adjusted by an associated adjustment tap 134, 144, 154, 164,
174, for example, to configure each preprocessor 200, 300, 400,
500, 600, 700 with a custom set of parameters. In addition, in one
implementation, each of the preprocessing modules 200, 300, 400,
500, 600, 700 may be selectively included or excluded from the image
processing path by an associated switch 138, 148, 158, 168, 178.
The parameters and switch settings can be different for different
camera channels, and can vary over time in some scheduled or other
requested manner.
[0028] It is noted that an image signal received by the image
correction system 100 may have undergone previous corruption by one or more
effects. The image correction system 100 can optionally initially
detect the one or more effects in the received image signal. This
might be done, for example, by having a human evaluate the image.
In another variation, each preprocessor 200, 300, 400, 500, 600,
700 can be applied to the image to see if one or more of the
preprocessors 200, 300, 400, 500, 600, 700 reduces the number of
objects detected by the final background subtraction system 195.
Since these are presumably false positives, such reductions
indicate that the associated preprocessor should be enabled. Of
course, the system could also use explicit detectors for the one or
more effects. Such detectors are inherent in many of the correction
blocks, i.e., if the relevant effect is absent, no correction is
applied, as would be apparent to a person of ordinary skill in the
art.
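By way of illustration only, and not part of the original disclosure, the switchable preprocessing path of FIG. 1 might be organized as in the following Python sketch; the stage names, the identity placeholder, and the enable/disable logic are hypothetical stand-ins for the preprocessors 200 through 700:

```python
import numpy as np

def identity(frame, params=None):
    """Placeholder stage; a real module would correct jitter, color, etc."""
    return frame

class CorrectionPipeline:
    """Chain of switchable preprocessing modules ahead of background
    subtraction, mirroring the switches 138..178 and taps 134..174."""
    def __init__(self):
        self.stages = []  # one entry per module 200..700

    def add(self, name, fn, enabled=True, params=None):
        self.stages.append({"name": name, "fn": fn, "on": enabled,
                            "params": params})

    def set_enabled(self, name, on):
        # e.g., disable the NTSC chroma correction at night
        for s in self.stages:
            if s["name"] == name:
                s["on"] = on

    def process(self, frame):
        for s in self.stages:
            if s["on"]:
                frame = s["fn"](frame, s["params"])
        return frame

pipe = CorrectionPipeline()
pipe.add("jitter", identity)
pipe.add("color", identity)
pipe.add("ntsc", identity)
pipe.set_enabled("ntsc", False)  # indoor feed: no chroma artifacts expected
clean = pipe.process(np.zeros((240, 320, 3), dtype=np.uint8))
```

In such an arrangement, the per-camera parameters of paragraph [0027] would correspond to the params entry carried with each stage.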
[0029] Stabilization Method
[0030] FIG. 2 is a flow chart describing an exemplary
implementation of a jitter correction (stabilization) method 200
that may be employed by the jitter correction preprocessor 200 of
FIG. 1. Generally, the proposed image stabilization subsystem 200
takes the foreground image at some spatial resolution and generates
a number of alternative images based on shifting the image an
integral number of pixels in the horizontal and/or vertical
direction. These alternative images are compared to the reference
image and a matching score is computed for each. Based on this set
of scores, a best guess at a floating point sub-pixel offset is
determined that aligns the current image with the reference image.
This offset may be applied to normalize the current image (by
shifting and interpolation) at the analyzed resolution, or at
either higher or lower spatial resolutions by appropriate linear
scaling of the offset parameters.
[0031] The illustrative embodiment of the stabilization subsystem
200 is strictly software, so it can be employed in legacy cameras
and does not require the installation of new or bulkier hardware.
The stabilization subsystem 200 is more efficient than optical flow
methods, especially since the image can be analyzed at a lower
resolution than standard, and thus requires less computational
resources. Finally, the stabilization subsystem 200 generates
sub-pixel estimates that provide the degree of correction required
by the background subtraction algorithm.
[0032] Each incoming video frame is shifted in various ways and
then compared to a stored reference image. In one preferred
embodiment, the image is converted to monochrome by averaging the
red, green, and blue channels (as is the reference image). Images
are analyzed at their standard resolution, but the comparisons are
only made at a selection of sample sites, typically evenly spaced
to yield several thousand sites (e.g., a sampling unit of every 4th
pixel in the horizontal and vertical direction). This allows fine
scale detail to be used in the estimation procedure, but
significantly reduces the computational demand. Also, because in
many situations there is more pan than tilt, a cross-shaped search
pattern is employed (rather than a full, and slower, search of all
offsets within a set of ranges).
[0033] As shown in FIG. 2, a foreground image at a given spatial
resolution is obtained during step 210. Thereafter, the jitter
correction method 200 generates a plurality of alternative images
during step 220, based on shifting the image an integral number of
pixels in the horizontal and/or vertical direction. The alternative
images are compared to a reference image during step 230, and a
matching score is computed for each alternative image.
[0034] During step 240, a globally best integral offset is
determined that aligns the foreground image with the reference
image. Finally, a floating point sub-pixel offset is computed
during step 250 that is used to generate a better aligned version
of the current image.
[0035] In one exemplary implementation, a series of horizontal
shifts in an exemplary range of ±4 pixels (including zero) is
performed and each resulting variation compared with the reference
image (at the same resolution and in monochrome). The comparison
metric is the average absolute difference between corresponding
selected pixel sites. The shift with the best score (least
difference) is chosen and the scores of adjacent shifts (±1
pixel) are graphed and fit with a parabola. The lowest point on
this parabola is then taken as the floating point sub-pixel
horizontal shift. After this, the image is shifted by the best
integer horizontal shift, as determined above, and then subjected
to a series of additional vertical shifts in a range of typically
±2 pixels (including the zero case, which was already scored). As
before, the mean absolute difference between each variation and the
reference image is computed, the best integer shift is selected,
and a floating point sub-pixel vertical estimate is formed by a
parabolic fit of adjacent scores. If an estimated shift is close to
the bounds of the search ranges, it is declared invalid and a shift
of (0.0, 0.0) is reported instead.
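This search-and-refine procedure can be summarized in a short sketch (an illustrative reading, not the patent's code); the foreground-mask exclusion of paragraph [0036] is omitted for brevity, and the margin, range, and sampling constants are assumptions:

```python
import numpy as np

def mad_score(cur, ref, dx, dy, step=4, margin=8):
    """Mean absolute difference between the shifted current image and the
    reference, computed only at sparsely sampled sites (monochrome arrays)."""
    h, w = ref.shape
    ys, xs = np.mgrid[margin:h - margin:step, margin:w - margin:step]
    return float(np.mean(np.abs(cur[ys + dy, xs + dx].astype(np.float64) -
                                ref[ys, xs].astype(np.float64))))

def parabola_min(s_prev, s_best, s_next):
    """Sub-pixel offset of the parabola vertex through three adjacent scores."""
    den = s_prev - 2.0 * s_best + s_next
    return 0.5 * (s_prev - s_next) / den if den != 0 else 0.0

def estimate_shift(cur, ref, x_range=4, y_range=2):
    """Cross-shaped search: horizontal integer shifts first, then vertical
    shifts at the best horizontal offset; a parabolic fit refines each axis."""
    sx = {dx: mad_score(cur, ref, dx, 0) for dx in range(-x_range, x_range + 1)}
    bx = min(sx, key=sx.get)
    sy = {dy: mad_score(cur, ref, bx, dy) for dy in range(-y_range, y_range + 1)}
    by = min(sy, key=sy.get)
    # Estimates at the search bounds are declared invalid (paragraph 0035)
    if abs(bx) == x_range or abs(by) == y_range:
        return (0.0, 0.0)
    fx = bx + parabola_min(sx[bx - 1], sx[bx], sx[bx + 1])
    fy = by + parabola_min(sy[by - 1], sy[by], sy[by + 1])
    return (fx, fy)
```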
[0036] Because the jitter correction method 200 was intended for
use in conjunction with a background subtraction system 195, it is
easy to obtain a mask designating where (in the previous frame)
foreground objects were detected relative to the stored background
image. Sampled pixels that fall under this mask are omitted from
the mean absolute difference calculation. This keeps the estimator
from attempting to track large foreground objects (presuming they
are detected) instead of the background as intended.
[0037] If a valid, sufficiently small shift is estimated
(typically less than 1/10 pixel) and it has been a long time since
the reference image was initialized (typically 100 frames), then
the background reference image is updated by simply copying the
current frame. If the current image is known to be an invalid
background model in certain regions due to the presence of
foreground objects, a separate background validity image is also
stored corresponding to the current mask. The system 200 then
ignores pixel samples that fall under either this mask or the
newly supplied mask for each input frame, as explained above.
[0038] The final stabilized image is generated by bi-linear
interpolation of the original image at the best estimated offset.
Since the input and output images are the same size, the mixing
coefficients to generate each pixel from its four nearest neighbors
are always the same. Moreover, since there are only a discrete
number of possibilities for intensity (0 to 255), it is possible to
pre-compute four tables that yield the appropriate scaled responses
for each of the four neighbors. To convert the image, an integer
pixel offset can be added to the read pointer and then the values
of four neighbors are used as indices to the pre-computed tables
and the lookup values summed to produce the desired output pixel
value. The same procedure and tables can be used for each of the
red, green, and blue channels in a color image.
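A minimal sketch of this table-driven bi-linear resampling follows (not from the patent itself); the sign convention for the offset and the border padding are assumptions made for the illustration:

```python
import numpy as np

def bilinear_tables(fx, fy):
    """Four 256-entry tables, one per neighbor, scaled by the fixed bilinear
    mixing coefficients for a constant sub-pixel offset (0 <= fx, fy < 1)."""
    v = np.arange(256, dtype=np.float32)
    return (v * (1 - fx) * (1 - fy),   # top-left neighbor
            v * fx * (1 - fy),         # top-right
            v * (1 - fx) * fy,         # bottom-left
            v * fx * fy)               # bottom-right

def shift_channel(ch, dx, dy):
    """Shift one 8-bit channel by a floating-point (dx, dy): integer part via
    the read offset, fractional part via the precomputed lookup tables."""
    ix, iy = int(np.floor(dx)), int(np.floor(dy))
    t00, t10, t01, t11 = bilinear_tables(dx - ix, dy - iy)
    pad = max(abs(ix), abs(iy)) + 1
    p = np.pad(ch, pad, mode="edge")
    h, w = ch.shape
    r, c = pad + iy, pad + ix          # shifted read origin
    out = (t00[p[r:r+h,     c:c+w]] +
           t10[p[r:r+h,     c+1:c+w+1]] +
           t01[p[r+1:r+h+1, c:c+w]] +
           t11[p[r+1:r+h+1, c+1:c+w+1]])
    return out.astype(np.uint8)

# The same tables serve all three channels of a color frame:
# out = np.dstack([shift_channel(frame[..., k], 1.37, -0.52) for k in range(3)])
```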
[0039] Color Correction Method
[0040] FIG. 3 is a flow chart describing an exemplary
implementation of a color correction method 300 that may be
employed by the color correction preprocessor 300 of FIG. 1.
Generally, the image correction subsystem 300 operates by
estimating one or more multiplicative channel gains to account for
differences between the current video frame (image) and a stored
reference frame (image). For a monochrome source, one gain value is
estimated. For a color video, typically three channel gains (red,
green, and blue) are estimated or, alternatively, one overall gain
(as for monochrome) and three differential channel gains (for RGB).
These multiplicative factors are then applied to each pixel in the
current frame to generate an image more similar in overall
appearance to the stored reference frame.
[0041] As shown in FIG. 3, a foreground image is initially obtained
at a spatial resolution during step 310. The foreground image is
then compared to a stored reference frame on a pixel-by-pixel basis
during step 320. The overall multiplicative gain(s) for each color
channel are estimated during step 330 to account for differences
between the foreground image and the stored reference frame.
Finally, the multiplicative factors are applied to each pixel in
the current frame during step 340 to generate a corrected
image.
[0042] In one embodiment, the global gain estimates are computed
using histograms of the individual gain estimates derived for each
pixel position. The peak (mode) of the smoothed histogram is chosen
as the optimal correction factor thereby making the system robust
to moderate scene changes (which give rise to secondary peaks
without moving the primary peak). In an alternative histogram
stretching method, the introduction of such a disturbance would
lead to an inappropriate stretching of the normalization transfer
function over the color region related to the scene change.
[0043] In an exemplary implementation, there is a stored reference
image B(x, y) and the current image V(x, y). Conceptually, for each
pixel position (x, y) a factor f(x, y)=B(x, y)/V(x, y) is computed.
These individual estimates are collected into a histogram H(f) over
a range of possible correction values, where H(f) may be optionally
smoothed by an operation such as averaging of adjacent bins.
Finally, the index f' of the bin in H(f) with the maximum value is
selected as the best gain correction factor. A new image V'(x,
y)=f'*V(x, y) is then generated as a result of the process.
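As a rough illustration of this histogram-mode gain estimate (a sketch, not the patent's code; the bin count and the dark/saturated cutoffs borrowed from paragraph [0045] are assumptions):

```python
import numpy as np

def estimate_gain(ref, cur, step=4, bins=150, lo=0.5, hi=2.0):
    """Mode of the per-pixel ratio histogram f(x, y) = B(x, y) / V(x, y),
    computed on a sparse sample grid (ref and cur are monochrome arrays)."""
    b = ref[::step, ::step].astype(np.float64).ravel()
    v = cur[::step, ::step].astype(np.float64).ravel()
    ok = (v > 20) & (v < 235) & (b > 20) & (b < 235)  # drop dark/saturated
    f = b[ok] / v[ok]
    hist, edges = np.histogram(f, bins=bins, range=(lo, hi))
    hist = np.convolve(hist, [1, 1, 1], mode="same")  # smooth adjacent bins
    i = int(np.argmax(hist))
    return 0.5 * (edges[i] + edges[i + 1])  # bin center, the best factor f'

# Temporal smoothing of successive estimates, per paragraph 0044:
# f2 = a * f_prime + (1 - a) * f2_prev, with a a small mixing constant.
```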
[0044] In one preferred embodiment of the color correction system
300, for reasons of speed, only some fraction of the pixels in B
and V are examined. Typically, several thousand pixel sites
(randomly or systematically distributed) are sufficient to generate
a valid gain estimate for images of arbitrarily large sizes. Also,
in the preferred embodiment the gain estimates f'(t) are smoothed
over time with something like a recursive filter,
f''(t)=a*f'(t)+(1-a)*f''(t-1), to account for the slowly varying
nature of typical AGC/AWB circuitry. It is this smoothed value,
f''(t), that is used to correct the pixel intensities.
[0045] Estimates from some of the selected pixel sites can
optionally be disregarded. Since the correction method is used in
conjunction with a background subtraction object finding system,
any pixels corresponding to known foreground objects (as determined
from the previous frame) are omitted from the histogram. Similarly,
pixels with intensities that are either very high (e.g., saturated)
or low (e.g., nearly black) are omitted because these estimates
tend to be noisier than others. If too few pixels remain (in any
one of the channels), the overall gain estimate calculations are
terminated and the gain factor most recently derived is
re-used.
[0046] For color images, three separate channel gains r''(t),
g''(t), b''(t) are generated and maintained in the exemplary color
correction method 300 described above. However, they are reported
as an overall gain y(t) and differential gains dr(t), dg(t), db(t).
The overall gain y(t) is derived by taking the average of the three
channel gains, and then clipping the value to a known valid range
of gains. The individual channel differential gains are then
computed relative to this overall gain (e.g., dr(t)=r''(t)/y(t))
and similarly clipped to a different known valid range of values.
This prevents unreasonable compensation parameters from being used
in the correction phase (i.e., Vr'(x, y, t)=y(t)*dr(t)*Vr(x, y,
t)).
[0047] For use with background subtraction, the gains are also used
to alter a reference background image which will be compared with
the newly corrected video frame. In this operation, the value of a
pixel in some channel is limited to be less than or equal to 255
(the maximum pixel value) times the gain for that channel. The
rationale for this is that, for a gain <1.0, this value is the
largest value that could be generated for that color channel in the
newly corrected frame. This prevents the system from flagging
differences at pixels that are bright in the current image but
which could not be properly down corrected (since their true value
was unknown).
[0048] NTSC Artifact Reduction Method
[0049] FIG. 4 is a flow chart describing an exemplary
implementation of an NTSC correction process 400 that may be
employed by the NTSC color correction preprocessor 400 of FIG. 1.
Generally, the NTSC correction subsystem 400 suppresses color
information around problematic edges. The NTSC correction subsystem
400 initially directly finds sharp vertical transitions in the
corrupted image, then generates a soft mask around these areas that
is used to gradually blend in a monochrome version of the image.
This method 400 allows color information to be retained for the
bulk of the image while simultaneously minimizing the effect of
rainbow artifacts. As no spatial averaging is involved, the
resulting image retains the full resolution of the original.
Moreover, the method works equally well on native or compressed
video streams.
[0050] As shown in FIG. 4, the NTSC correction process 400
initially prepares a monochrome version of the corrupted image
during step 410. Thereafter, sharp vertical intensity transitions
are identified in the corrupted image during step 420. A soft mask
is generated around these areas during step 430, and then the soft
mask is used during step 440 to gradually blend in the monochrome
version of image with the corrupted image to generate a corrected
image.
[0051] In one preferred embodiment, step 410 is performed by
averaging the red, green, and blue color channels at each pixel.
Step 420 can be done by applying horizontal and vertical Sobel
masks and analyzing the convolution responses. Various edge pixels
are then selected based on their directions to form an intermediate
mask image. In one preferred embodiment, edge pixels with
orientations of +45 to +135 degrees or +225 to +315 degrees have
their magnitudes multiplied by a factor of 10 and gated to the
intermediate image (so a true edge magnitude of 25.5 or higher will
yield a fully white, 255, mask pixel). All other pixels are set to
zero.
[0052] The intermediate mask image is then processed by a local
averaging operator (essentially a convolution with a uniform-value
rectangular block mask). This spreads the influence of the detected
edge to adjacent pixels and also "feathers" the edges of the mask.
In one preferred embodiment, the local averaging occurs over a
5×3 pixel area centered on the original pixel and multiplies
the resulting value by a factor of 3.
[0053] In the final step 440, the blurred mask (M) is used to mix
together pixels of the original image (I) with pixels from a
monochrome version (G) of the image. This monochrome image can be
the same as the one used for edge finding, but conceptually it
could be formed by some different process (e.g., an unevenly
weighted combination of red, green, and blue values):
I'_c(x, y) = [1 - M(x, y)/255] * I_c(x, y) + [M(x, y)/255] * G(x, y)
[0054] This is the final output (I') of the process 400, where c is
a color channel, such as the red component of some pixel.
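A possible rendering of steps 410 through 440 in Python, using SciPy's Sobel and uniform filters; the orientation gating here is simplified to a dominant-horizontal-gradient test, so the gradient convention and the exact constants relative to paragraphs [0051] and [0052] should be treated as assumptions:

```python
import numpy as np
from scipy import ndimage

def ntsc_correct(img):
    """Blend a monochrome version into the color image around sharp vertical
    intensity transitions, where NTSC chroma 'rainbow' artifacts appear."""
    g = img.astype(np.float32).mean(axis=2)          # step 410: monochrome
    gx = ndimage.sobel(g, axis=1)                    # horizontal gradient
    gy = ndimage.sobel(g, axis=0)                    # vertical gradient
    mag = np.hypot(gx, gy)
    # Step 420: keep edges that are roughly vertical, i.e. where the
    # horizontal gradient dominates (stand-in for the angle-window gating)
    vertical = np.abs(gx) > np.abs(gy)
    mask = np.where(vertical, np.clip(mag * 10, 0, 255), 0.0)  # step 430 gate
    # Feather: 5x3 local average, result scaled by 3 (clipped to 255)
    mask = np.clip(ndimage.uniform_filter(mask, size=(3, 5)) * 3, 0, 255)
    # Step 440: per-channel blend I' = (1 - M/255) * I + (M/255) * G
    m = (mask / 255.0)[..., None]
    return ((1 - m) * img + m * g[..., None]).astype(np.uint8)
```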
[0055] Temporal Smoothing Method
[0056] FIG. 5 is a flow chart describing an exemplary
implementation of a temporal smoothing process 500 that may be
employed by the temporal smoothing preprocessor 500 of FIG. 1.
Generally, the temporal smoothing subsystem 500 attempts to
preserve all the spatial detail in an image while suppressing
time-varying noise. The basic structure of the processing engine is
a set of independent Kalman filters, conceptually one at each pixel
position. New values of intensity at a pixel are mixed in with the
previous time-average using a weighting determined by the relative
stability of the observed value versus the stability of the
time-average. However, instead of just assuming a fixed noise
figure for each observation and a monotonically decreasing noise
figure for the average, the noise figure for the average is
directly tweaked on each cycle in the exemplary embodiment based on
the current observation.
[0057] According to one aspect of the invention, the disclosed
temporal smoothing process 500 applies temporal smoothing to all
pixels in an image, but the amount of smoothing depends on how much
change has been observed at that pixel. In areas where motion is
detected, the temporal smoothing is basically turned off, yet it is
reapplied once the region settles down. While this does not
suppress noise to the same extent as straight temporal averaging
would, it is much more responsive to moving objects. Moreover, the
degree of motion responsiveness is smoothly controlled by a
continuous variable rather than having the system make firm
decisions on motion or no-motion. This preserves object boundaries
better (especially where they are somewhat indistinct) and acts to
conceal any slight mistakes the system might make in its
classification.
[0058] As shown in FIG. 5, the temporal smoothing process 500
initially finds pixel-wise differences between the current image
and the previous smoothed image during step 510. Thereafter, during
step 520, the temporal smoothing process 500 computes a pixel-wise
stability estimate based on previous pixel variances and current
differences.
[0059] A pixel-wise weighting factor is generated during step 530
based on the stability estimates and a channel noise estimate. A
new smoothed image is generated during step 540 by mixing in the
current image using the pixel-wise weighting factors. Finally, a
new pixel variance is generated during step 550 using the weighting
factors and the pixel-wise stability estimates.
[0060] A Kalman filter can be described with two equations: one for
the measurement (M) and one for the process (P):
measurement: M = P + Vm, where Vm is the variance in the measurement;
process: P' = b*P + c, where c is the expected jumpiness and b is a time decay constant.
[0061] These equations can be used by the temporal smoothing
process 500 to generate the standard Kalman update equations. The
mixing of the new observation (M) with the previous average (P)
during step 540 is determined by the Kalman matrix (here, just the
value k). After absorbing the new measurement, the system 500
retains the new estimates of the average (P') and the variance (V')
for use on the next cycle:
d = M - P, where M is the current image;
k = V/(V + n), where n is the measurement noise constant;
P' = P + k*d, where P' is the new average;
V' = V - k*V, where V' is the new variance.
[0062] In an exemplary embodiment, instead of using just the
computed variance of the estimate to construct the Kalman mixing
factor, a dynamic bias term is also included that is a real-time,
one sample estimate of the variance at the pixel:
d = M - P, where M is the current image;
s = V + f*(d*d - V), where f is the mixing constant;
k = s/(s + n), where n is the measurement noise constant;
P' = P + k*d, where P' is the new average;
V' = s - k*s, where V' is the new variance.
[0063] Note that d*d = (M-P)^2 equals the square of the difference
between the current observation and the longer term average. It is
this new "s" term that causes the temporal averaging to be turned
off when objects move. If the pixel is much different from what is
expected, s goes up which in turn raises k, the proportion by which
the new measurement is blended with the longer term average.
[0064] In one preferred embodiment for video at 30 frames per
second and for pixel intensities in the range of 0 to 255, f equals
0.1 and n equals 64. Also, for color images, separate versions of
the estimator are run for the red, green, and blue values at each
pixel. The "clean" image is generated by reporting the averages
(P') for each estimator in place of the original observed
intensities (M).
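The update equations of paragraphs [0062] through [0064] translate almost directly into code; in this sketch (not from the patent itself) only the initialization of the variance image is an assumption:

```python
import numpy as np

def smooth_step(M, P, V, f=0.1, n=64.0):
    """One spatially-variant temporal smoothing cycle (per pixel, per channel).
    M: current frame; P: running average; V: retained variance (float arrays)."""
    d = M - P                 # difference from the long-term average
    s = V + f * (d * d - V)   # variance plus the dynamic one-sample bias
    k = s / (s + n)           # mixing factor: motion raises s, hence k
    return P + k * d, s - k * s   # new average P', new variance V'

# Usage over a stream of frames (per channel, float32):
#   P = frames[0].astype(np.float32); V = np.full_like(P, 64.0)  # V0 assumed
#   for M in frames[1:]:
#       P, V = smooth_step(M.astype(np.float32), P, V)
# The "clean" output reported on each cycle is P.
```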
[0065] Lens Normalization
[0066] FIG. 6 is a flow chart describing an exemplary
implementation of a lens normalization process 600 that may be
employed by the lens normalization preprocessor 600 of FIG. 1. As
shown in FIG. 6, the corrected image is split into a collection of
independent pixel positions during step 610. Thereafter, a radial
distortion correction equation is used to determine the fractional
pixel position in the input image closest to the source for the
corrected pixel during step 620.
[0067] Standard radial lens distortion correction can be
accomplished by applying the following equations:
x' = x + sc2*r^2 + sc4*r^4; and y' = y + sc2*r^2 + sc4*r^4,
where (x', y') is the new corrected position for a pixel, (x, y) is
the original pixel location in the distorted image, r is the
distance of the original pixel from the projection (x0, y0) of the
optical lens center on the image plane, and sc2 and sc4 are fixed
constants describing the curvature of the lens.
[0068] Interpolation is employed on values of the input image
pixels closest to the fractional pixel position to generate a value
for the corrected pixel during step 630. Finally, all independently
interpolated pixels are recombined during step 640 to generate a
corrected output image.
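A sketch of steps 610 through 640 for a single monochrome channel follows; it uses a first-order approximation of the inverse of the stated radial equations (an assumption for illustration, not the patent's prescribed inversion):

```python
import numpy as np

def undistort(img, x0, y0, sc2, sc4):
    """Radial lens correction by inverse mapping: for each output pixel, find
    the fractional source position and bilinearly interpolate (steps 610-630).
    img is a single 8-bit channel; apply per channel for color frames."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float64)
    r = np.hypot(xx - x0, yy - y0)
    shift = sc2 * r**2 + sc4 * r**4       # displacement from the equations
    sx = np.clip(xx - shift, 0, w - 2)    # approximate fractional source x
    sy = np.clip(yy - shift, 0, h - 2)    # approximate fractional source y
    ix, iy = sx.astype(int), sy.astype(int)
    fx, fy = sx - ix, sy - iy
    im = img.astype(np.float64)
    out = ((1 - fx) * (1 - fy) * im[iy, ix] +      # step 630: interpolate
           fx * (1 - fy) * im[iy, ix + 1] +
           (1 - fx) * fy * im[iy + 1, ix] +
           fx * fy * im[iy + 1, ix + 1])
    return out.astype(np.uint8)           # step 640: recombined output
```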
[0069] Shadow Removal Method
[0070] FIG. 7 is a flow chart describing an exemplary
implementation of a shadow removal process 700 that may be employed
by the shadow removal preprocessor 700 of FIG. 1. Generally, the
shadow removal subsystem 700 pre-corrects the input image for
possible shadow effects before passing the corrected image on to a
standard background subtraction algorithm. The pre-correction
involves adjusting the intensity of each pixel based on a gain
factor computed from channel intensity ratios that have been
weighted by channel noise estimates. In regions where there is
little difference between the original input and the reference, the
gain estimate will be close to one and hence there will continue to
be little difference. In regions where there are significant
differences (particularly in color), correcting the average
intensity of a pixel will not generally make its color components
match any better and hence there will still be a difference. It is
only where absolute intensity correction is appropriate (namely,
shadows and highlights) that pixels will have their determination
changed (i.e., from being a difference, to not being a
difference).
[0071] The disclosed shadow removal process 700 has a number of
advantages. The shadow removal process 700 does not require
expensive trigonometric calculations (and hence can be faster),
will work in dim and bland regions (since it remains in the RGB
color space), and will not be thrown off significantly by noisy
images or video compression artifacts (e.g., a bad blue
channel).
[0072] As shown in FIG. 7, a pixel-wise ratio between the current
image and a reference image is determined for each color channel
during step 710. The ratios are then combined during step 720 at
each pixel using estimates of the relative noise in each color
channel. The shadow removal process 700 then divides the value of
each color channel by the combined ratio estimate at each pixel
during step 730.
[0073] In one exemplary embodiment of the shadow removal process
700, for each pixel in the input image, its red, green, and blue
color values are compared to those for the corresponding pixel in
the reference image. Separate ratios are computed for each channel:
F_r = S_r/I_r, F_g = S_g/I_g, F_b = S_b/I_b. Here, F_c is the gain
correction factor estimate for channel c (either r=red, g=green, or
b=blue), S_c is the value of channel c for the pixel in the stable
reference image, and I_c is the value of the pixel in channel c
for the input image. The three separate estimates are then each
compared to a potential valid range of correction, such as
0.8× to 2.0×. If any individual estimate is outside
these bounds, the gain for the pixel is set to one (and so no
change is made). Otherwise, the individual estimates are combined
based on the noise in each channel:
F = F_r/W_r + F_g/W_g + F_b/W_b, where
W_c = N_c*(1/N_r + 1/N_g + 1/N_b), N_c being the
observed noise in channel c. Once F(x, y) has been calculated for
each pixel, a corrected image is produced by multiplying through by
the derived factors: I'(x, y) = F(x, y)*I(x, y).
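The ratio, range test, and noise-weighted combination of this paragraph might look as follows in Python; the default channel-noise values are placeholders (in practice they would come from the estimator of paragraph [0074]):

```python
import numpy as np

def remove_shadows(I, S, N=(8.0, 8.0, 12.0), lo=0.8, hi=2.0):
    """Pre-correct input image I against reference S using per-pixel gains
    F combined from channel ratios; N holds per-channel noise estimates."""
    If = I.astype(np.float64) + 1e-6          # avoid division by zero
    Fc = S.astype(np.float64) / If            # F_c = S_c / I_c, shape (H, W, 3)
    inv_sum = sum(1.0 / n for n in N)
    W = np.array([n * inv_sum for n in N])    # W_c = N_c * (1/N_r + 1/N_g + 1/N_b)
    F = (Fc / W).sum(axis=2)                  # F = sum over c of F_c / W_c
    bad = ((Fc < lo) | (Fc > hi)).any(axis=2) # any channel outside valid range
    F[bad] = 1.0                              # gain one: leave pixel unchanged
    return np.clip(F[..., None] * I, 0, 255).astype(np.uint8)
```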
[0074] In one preferred embodiment, the noise estimates are
computed by comparing each input image with a reference image.
Typically, the original image is heavily subsampled (e.g., every
4th pixel in the vertical and horizontal directions) to select only
several thousand pixels for evaluation. Also, since this module 700
is typically used in conjunction with a background subtraction
system 195, pixels that are known not to correspond to the
background (i.e., pixels that are part of detected foreground
objects) are omitted from the comparison. The absolute value of
each selected pixel difference (|I_c(x, y) - S_c(x, y)| in a
channel c) is then accumulated into a difference histogram for that
channel.
[0075] The difference histogram itself is smoothed, using a method
such as the averaging of adjacent bins, and the primary peak
(maximum occupancy bin) is found. The falling edge of this peak is
determined by locating the lowest index bin whose occupancy is less
than some factor (e.g., 10%) of the peak value. The value (n)
associated with this bin is a new estimate of the noise in the
channel. This new value can either be reported directly or, in the
preferred implementation, combined with the previous noise estimate
using a temporal smoothing filter (e.g., N'_c = (1-k)*N_c + k*n
with k equal to 0.05 for 30 frames per second video).
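A compact rendering of this noise estimator (the subsampling step and the 10% falling-edge factor follow the text; the histogram smoothing kernel and the foreground-mask handling are assumptions):

```python
import numpy as np

def channel_noise(I_c, S_c, fg_mask=None, step=4, frac=0.10):
    """Noise estimate for one channel: the falling edge of the primary peak
    of the |I_c - S_c| histogram, sampled every `step` pixels."""
    d = np.abs(I_c[::step, ::step].astype(int) -
               S_c[::step, ::step].astype(int))
    if fg_mask is not None:                     # omit detected foreground
        d = d[~fg_mask[::step, ::step]]
    hist = np.bincount(d.ravel(), minlength=256)
    hist = np.convolve(hist, [1, 1, 1], mode="same")  # smooth adjacent bins
    peak = int(np.argmax(hist))
    for i in range(peak, 256):                  # lowest bin past the peak...
        if hist[i] < frac * hist[peak]:         # ...below 10% of its occupancy
            return float(i)
    return 255.0

# Blend with the previous estimate: N_new = (1 - k) * N_prev + k * n, k = 0.05
```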
[0076] In a further variation, one or more of the preprocessing
blocks in the system 100 can perform a contrast enhancement on the
image signal. Contrast enhancement can be implemented, for example,
by determining what part of the dynamic range of pixel values is
being used. In one implementation, a histogram is created of all
the red, green, and blue pixel values and then the 5% percentile
point and the 95% percentile point of the distribution are
identified. From these numbers, an offset and scale factor are
calculated that will translate these points to fixed values such as
20 and 240, respectively. This effectively stretches the range of
values being used without altering the hue information (which is
based on color differences, not ratios).
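For illustration, this percentile-based stretch might be implemented as follows (a sketch, not the patent's code); the guard against flat images is an addition not present in the text:

```python
import numpy as np

def stretch_contrast(img, lo_pct=5, hi_pct=95, lo_out=20, hi_out=240):
    """Map the pooled 5th/95th percentile points of all RGB values to fixed
    targets; the same offset and scale apply to every channel, so color
    differences (and hence hue) are preserved up to a uniform scaling."""
    p_lo, p_hi = np.percentile(img.reshape(-1), [lo_pct, hi_pct])
    scale = (hi_out - lo_out) / max(p_hi - p_lo, 1.0)  # guard flat images
    offset = lo_out - scale * p_lo
    out = scale * img.astype(np.float64) + offset
    return np.clip(out, 0, 255).astype(np.uint8)
```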
[0077] System and Article of Manufacture Details
[0078] As is known in the art, the methods and apparatus discussed
herein may be distributed as an article of manufacture that itself
comprises a computer readable medium having computer readable code
means embodied thereon. The computer readable program code means is
operable, in conjunction with a computer system, to carry out all
or some of the steps to perform the methods or create the
apparatuses discussed herein. The computer readable medium may be a
recordable medium (e.g., floppy disks, hard drives, compact disks,
or memory cards) or may be a transmission medium (e.g., a network
comprising fiber-optics, the world-wide web, cables, or a wireless
channel using time-division multiple access, code-division multiple
access, or other radio-frequency channel). Any medium known or
developed that can store information suitable for use with a
computer system may be used. The computer-readable code means is
any mechanism for allowing a computer to read instructions and
data, such as magnetic variations on a magnetic media or height
variations on the surface of a compact disk.
[0079] The computer systems and servers described herein each
contain a memory that will configure associated processors to
implement the methods, steps, and functions disclosed herein. The
memories could be distributed or local and the processors could be
distributed or singular. The memories could be implemented as an
electrical, magnetic or optical memory, or any combination of these
or other types of storage devices. Moreover, the term "memory"
should be construed broadly enough to encompass any information
able to be read from or written to an address in the addressable
space accessed by an associated processor. With this definition,
information on a network is still within a memory because the
associated processor can retrieve the information from the
network.
[0080] It is to be understood that the embodiments and variations
shown and described herein are merely illustrative of the
principles of this invention and that various modifications may be
implemented by those skilled in the art without departing from the
scope and spirit of the invention.
* * * * *