U.S. patent application number 13/609393 was filed with the patent office on 2013-01-03 for detection and tracking of moving objects.
This patent application is currently assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Invention is credited to Arun Hampapur, Jun Li, Charles A. Otto, Sharathchandra Pankanti.
Application Number | 20130002866 13/609393 |
Document ID | / |
Family ID | 45495895 |
Filed Date | 2013-01-03 |
United States Patent
Application |
20130002866 |
Kind Code |
A1 |
Hampapur; Arun ; et
al. |
January 3, 2013 |
Detection and Tracking of Moving Objects
Abstract
Techniques for performing visual surveillance of one or more
moving objects are provided. The techniques include registering one
or more images captured by one or more cameras, wherein registering
the one or more images comprises region-based registration of the
one or more images in two or more adjacent frames, performing
motion segmentation of the one or more images to detect one or more
moving objects and one or more background regions in the one or
more images, and tracking the one or more moving objects to
facilitate visual surveillance of the one or more moving
objects.
Inventors: |
Hampapur; Arun; (Norwalk,
CT) ; Li; Jun; (Marietta, GA) ; Pankanti;
Sharathchandra; (Darien, CT) ; Otto; Charles A.;
(Lansing, MI) |
Assignee: |
INTERNATIONAL BUSINESS MACHINES
CORPORATION
Armonk
NY
|
Family ID: |
45495895 |
Appl. No.: |
13/609393 |
Filed: |
September 11, 2012 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
12972836 | Dec 20, 2010 | |
13609393 | | |
Current U.S.
Class: |
348/143 ;
348/E7.085 |
Current CPC
Class: |
G06T 7/254 20170101;
G06T 2207/30241 20130101; G06T 2207/20076 20130101; G06T 2207/30232
20130101; G06T 7/246 20170101; G06T 7/215 20170101; G06T 7/207
20170101; G03B 15/16 20130101; G06T 2207/10016 20130101; G06T 7/277
20170101; G06T 7/223 20170101; G06T 7/248 20170101 |
Class at
Publication: |
348/143 ;
348/E07.085 |
International
Class: |
H04N 7/18 20060101
H04N007/18 |
Claims
1. A method for performing visual surveillance of one or more
moving objects, wherein the method comprises: registering one or
more images captured by one or more cameras, wherein registering
the one or more images comprises region-based registration of the
one or more images in two or more adjacent frames; performing
motion segmentation of the one or more images to detect one or more
moving objects and one or more background regions in the one or
more images; and tracking the one or more moving objects to
facilitate visual surveillance of the one or more moving
objects.
2. The method of claim 1, wherein registering one or more images
comprises recursive global and local geometric registration of the
one or more images.
3. The method of claim 1, wherein registering one or more images
comprises using one or more sub-pixel image matching
techniques.
4. The method of claim 1, wherein performing motion segmentation of
the one or more images comprises forward and backward frame
differencing.
5. The method of claim 4, wherein forward and backward frame
differences comprises automatic dynamic threshold estimation based
on at least one of temporary filtering and spatial filtering.
6. The method of claim 4, wherein forward and backward frame
differences comprises removing one or more false moving pixels
based on independent motions of one or more image features.
7. The method of claim 4, wherein forward and backward frame
differences comprises performing a morphological operation and
generating one or more motion pixels.
8. The method of claim 1, wherein tracking the one or more moving
objects comprises performing hybrid target tracking, wherein hybrid
target tracking comprises using a Kanade-Lucas-Tomasi feature
tracker and meanshift, using auto kernel scale estimation and
updating, and using one or more feature trajectories.
9. The method of claim 1, wherein tracking the one or more moving
objects comprises using one or more multi-target tracking
algorithms based on feature matching and distance matrices for one
or more targets.
10. The method of claim 1, wherein tracking the one or more moving
objects comprises: generating a motion map; identifying one or more
moving objects; performing object initialization and object
checking; identifying one or more object regions in the motion map;
extracting one or more features; setting a search region in the
motion map; identifying one or more candidate regions in the motion
map; meanshift tracking; identifying one or more moving objects in
the one or more candidate regions; performing Kanade-Lucas-Tomasi
feature matching; performing an affine transform; making a final
regions determination via the Bhattacharyya coefficient; and
updating a target model and trajectory information.
11. The method of claim 1, wherein tracking the one or more moving
objects comprises reference plane-based registration and
tracking.
12. The method of claim 1, further comprising relating each camera
view with one or more other camera views.
13. The method of claim 1, further comprising forming a panoramic
view from the one or more images captured by one or more
cameras.
14. The method of claim 13, further comprising estimating motion of
each camera based on video information of one or more static
objects in the panoramic view.
15. The method of claim 13, further comprising estimating one or
more background structures in the panoramic view based on linear
structure detection and statistical analysis of the one or more
moving objects over a period of time.
16. The method of claim 1, further comprising automatic feature
extraction, wherein automatic feature extraction comprises: framing
an image; performing a Gaussian smoothing operation; using a canny
detector to extract one or more feature edges; implementing a hough
transformation for feature analysis; determining a maximum response
finding for reducing an influence of multiple peaks in a transform
space; determining if a length of a feature is greater than a
certain threshold, and if the length of the feature is greater than
the threshold, performing feature extraction and pixel removal.
17. The method of claim 16, wherein automatic feature extraction
further comprises performing frame differencing and verification
via motion history images.
18. The method of claim 1, further comprising performing outlier
removal to remove one or more incorrect moving object matches.
19. The method of claim 1, further comprising false blob filtering,
wherein false blob filtering comprises: generating a motion map;
applying a connected component process to link each blob data;
creating a motion blob table; extracting one or more features for
each blob in a previously registered frame; and applying a
Kanade-Lucas-Tomasi method to estimate motion of each blob, and, if
no motion occurs for a blob, deleting the blob from the blob
table.
20. The method of claim 1, further comprising updating a target
model on at least one of a temporal domain and a spatial
domain.
21. The method of claim 1, further comprising creating an index of
object appearances and object tracks in a panoramic view.
22. The method of claim 21, further comprising determining a
similarity metric between a query and an entry in the index.
23. The method of claim 1, further comprising providing a system,
wherein the system comprises one or more distinct software modules,
each of the one or more distinct software modules being embodied on
a tangible computer-readable recordable storage medium, and wherein
the one or more distinct software modules comprise a geometric
registration module, a motion extraction module and an object
tracking module executing on a hardware processor.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent
application Ser. No. 12/972,836, filed Dec. 20, 2010, incorporated
by reference herein.
FIELD OF THE INVENTION
[0002] Embodiments of the invention generally relate to information
technology, and, more particularly, to object detection.
BACKGROUND OF THE INVENTION
[0003] In recent years, reconnaissance, surveillance, disaster
relief, search and rescue, agricultural information gathering and
fast remote sensing mapping have gained increasing attention for
both civilian and military purposes. For example, due to its small
size and low-cost sensor platform, an unmanned aerial vehicle (UAV)
can be an attractive platform for executing such operations.
However, a UAV introduces some significant challenges when used in
surveillance systems. For instance, the background changes
significantly because the camera moves quickly and rotates
irregularly, and the motion of a UAV is usually not smooth. Further,
the frame rate is very low (for example, 1 frame per second), which
increases the difficulty of detecting and tracking moving ground
targets, and small object size poses another challenge for
object detection and tracking. Also, strong illumination changes
and stripe noise in the camera images can make it difficult to
separate true moving objects from the background.
[0004] Existing approaches also include object initialization
issues, and are additionally unable to obtain high-accuracy
registration results, to handle rotation and scale variation of a
target, and to deal with similar distribution between target and
background.
SUMMARY OF THE INVENTION
[0005] Principles and embodiments of the invention provide
techniques for detection and tracking of moving objects. An
exemplary method (which may be computer-implemented) for performing
visual surveillance of one or more moving objects, according to one
aspect of the invention, can include steps of registering one or
more images captured by one or more cameras, wherein registering
the one or more images comprises region-based registration of the
one or more images in two or more adjacent frames, performing
motion segmentation of the one or more images to detect one or more
moving objects and one or more background regions in the one or
more images, and tracking the one or more moving objects to
facilitate visual surveillance of the one or more moving
objects.
[0006] One or more embodiments of the invention or elements thereof
can be implemented in the form of a computer product including a
tangible computer readable storage medium with computer useable
program code for performing the method steps indicated.
Furthermore, one or more embodiments of the invention or elements
thereof can be implemented in the form of an apparatus including a
memory and at least one processor that is coupled to the memory and
operative to perform exemplary method steps. Yet further, in
another aspect, one or more embodiments of the invention or
elements thereof can be implemented in the form of means for
carrying out one or more of the method steps described herein; the
means can include (i) hardware module(s), (ii) software module(s),
or (iii) a combination of hardware and software modules; any of
(i)-(iii) implement the specific techniques set forth herein, and
the software modules are stored in a tangible computer-readable
storage medium (or multiple such media).
[0007] These and other objects, features and advantages of the
present invention will become apparent from the following detailed
description of illustrative embodiments thereof, which is to be
read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] FIG. 1 is a diagram illustrating sub-pixel position
estimation, according to an embodiment of the present
invention;
[0009] FIG. 2 is a diagram illustrating sub-region selection,
according to an embodiment of the present invention;
[0010] FIG. 3 is a diagram illustrating forward and backward
geometric registration, according to an embodiment of the present
invention;
[0011] FIG. 4 is a flow diagram illustrating forward and backward
frame differencing, according to an embodiment of the present
invention;
[0012] FIG. 5 is a flow diagram illustrating false blob filtering,
according to an embodiment of the present invention;
[0013] FIG. 6 is a flow diagram illustrating multi-object tracking,
according to an embodiment of the present invention;
[0014] FIG. 7 is a diagram illustrating reference plane-based
registration and tracking, according to an embodiment of the
present invention;
[0015] FIG. 8 is a flow diagram illustrating automatic urban road
extraction, according to an embodiment of the present
invention;
[0016] FIG. 9 is a block diagram illustrating architecture of an
object detection and tracking system, according to an aspect of the
invention;
[0017] FIG. 10 is a flow diagram illustrating techniques for
performing visual surveillance of one or more moving objects,
according to an embodiment of the invention; and
[0018] FIG. 11 is a system diagram of an exemplary computer system
on which at least one embodiment of the invention can be
implemented.
DETAILED DESCRIPTION OF EMBODIMENTS
[0019] Principles of the invention include detection, tracking, and
searching of moving objects in visual surveillance. In an example
setting including moving objects and one or more moving cameras,
one or more embodiments of the invention include motion
segmentation (motion blobs versus background region), multiple
object tracking (for example, consistent tracking over time)
and reference plane-based registration and tracking. As detailed
herein, one or more embodiments of the invention include using
multiple cameras (for example, registered with each other) mounted,
for example, on mobile platforms (for example, unmanned aerial
vehicle (UAV) videos) to detect, track and search for moving
objects by forming a panoramic view from the images received from
the cameras based on global/local geometric registration, motion
segmentation, moving object tracking, reference plane-based
registration and tracking and automatic urban road extraction.
[0020] The techniques described herein include recursive geometric
registration, which includes region-based image registration for
adjacent frames instead of for an entire frame, sub-pixel image
matching techniques, and region-based geometric transformation for
handling lens geometric distortion. Also, one or more embodiments
of the invention include two-way motion detection and hybrid target
tracking using colors and features. Two-way motion detection
includes forward and backward frame differencing, automatic dynamic
threshold estimation based on temporal and/or spatial filtering,
as well as false moving pixel removal based on independent motions
of features. Hybrid target tracking includes a Kanade-Lucas-Tomasi
(KLT) feature tracker and meanshift, auto kernel scale estimation
and updating, and consistent tracking over time using coherent
motion of feature trajectories.
[0021] Further, the techniques detailed herein include multi-target
tracking algorithms based on feature matching and distance matrices
for small targets, as well as, for example, a UAV surveillance
system implementation with a low frame rate (1 frame per second) for
detecting and tracking small targets (for example, without any
known shape model).
[0022] As noted herein, one or more embodiments of the invention
include local/global geometric registration of videos (for example,
UAV videos). To reduce the effect of camera motion, a
frame-to-frame video registration process is implemented. An
accurate way to register two images is to match every
pixel in each image. However, the computational cost of doing so is
prohibitive. A more efficient way is to find a relatively small set
of feature points in the image that will be easy to find again, and
to use only those points to estimate a frame-to-frame homography. By
way of example only, 500-600 feature points can be extracted for an
image of 1280×1280 pixels.
[0023] A Harris corner detector can be applied to image registration
and motion detection due to its invariance to scale, rotation and
illumination variation. In one or more embodiments of the
invention, a Harris corner detector can be used as a feature point
detector. The algorithm can be described as follows:
[0024] 1. For a pixel in an image $I$, compute its x- and
y-directional derivatives $I_x$ and $I_y$, and $I_{xy} = I_x I_y$.
[0025] 2. Apply a window function $A$, that is, $h_x = A I_x$,
$h_y = A I_y$, $h_{xy} = A I_{xy}$.
[0026] 3. Compute
$$H = h_x h_y - h_{xy}^2 - \kappa (h_x + h_y)^2$$
($\kappa$ is a constant) to measure variations in both
directions.
[0027] 4. Threshold $H$ and find local maxima to obtain a corner.
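The four steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the implementation of the embodiments: it assumes the conventional Harris formulation, in which the window is applied to the products $I_x^2$, $I_y^2$ and $I_x I_y$, and it uses a simple box window for $A$. The function names, the default $\kappa = 0.04$ and the 3×3 local-maximum test are illustrative assumptions.

```python
import numpy as np

def box_filter(img, radius=1):
    """Window function A: mean over a (2*radius+1)^2 neighbourhood (edge-padded)."""
    size = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dr in range(size):
        for dc in range(size):
            out += padded[dr:dr + img.shape[0], dc:dc + img.shape[1]]
    return out / (size * size)

def harris_response(image, kappa=0.04, radius=1):
    """Steps 1-3: derivatives, windowing, and the response H."""
    img = np.asarray(image, dtype=float)
    iy, ix = np.gradient(img)            # derivatives along rows (y) and columns (x)
    hx = box_filter(ix * ix, radius)     # assumption: window applied to squared derivative
    hy = box_filter(iy * iy, radius)
    hxy = box_filter(ix * iy, radius)
    return hx * hy - hxy ** 2 - kappa * (hx + hy) ** 2

def harris_corners(image, threshold, kappa=0.04, radius=1):
    """Step 4: threshold H and keep 3x3 local maxima (image border excluded)."""
    h = harris_response(image, kappa, radius)
    corners = []
    for r in range(1, h.shape[0] - 1):
        for c in range(1, h.shape[1] - 1):
            patch = h[r - 1:r + 2, c - 1:c + 2]
            if h[r, c] > threshold and h[r, c] == patch.max():
                corners.append((r, c))
    return corners
```

With this response, corner-like pixels produce positive values, straight edges produce negative values, and flat regions produce values near zero, so the threshold should be positive.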
[0028] To compare the windows, one or more embodiments of the
invention include using a normalized correlation coefficient, which
is an efficient statistical method. The actual feature matching is
achieved by maximizing the correlation coefficient over small
windows surrounding the points. The correlation coefficient is
given by:
$$\rho = \frac{\sum_{r=1}^{R}\sum_{c=1}^{C} [g_1(r,c) - u_1]\,[g_2(r,c) - u_2]}{\sqrt{\sum_{r=1}^{R}\sum_{c=1}^{C} [g_1(r,c) - u_1]^2 \; \sum_{r=1}^{R}\sum_{c=1}^{C} [g_2(r,c) - u_2]^2}}, \qquad -1 \le \rho \le 1 \tag{1}$$
where: $g_1(r,c)$ represents individual gray values of the template
matrix; $u_1$ represents the average gray value of the template matrix;
$g_2(r,c)$ represents individual gray values of the corresponding
part of the search matrix; $u_2$ represents the average gray value of
the corresponding part of the search matrix; and $R$, $C$ represent
the number of rows and columns of the template matrix.
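Equation (1) can be written directly in NumPy. The sketch below is illustrative only; the function name and the choice to return 0 for a flat patch (where the coefficient is undefined) are assumptions, not part of the described embodiments.

```python
import numpy as np

def correlation_coefficient(template, window):
    """Normalized correlation coefficient of equation (1) for two equal-size patches."""
    g1 = np.asarray(template, dtype=float)
    g2 = np.asarray(window, dtype=float)
    if g1.shape != g2.shape:
        raise ValueError("template and window must have the same shape")
    d1 = g1 - g1.mean()                  # g1(r, c) - u1
    d2 = g2 - g2.mean()                  # g2(r, c) - u2
    denom = np.sqrt((d1 ** 2).sum() * (d2 ** 2).sum())
    if denom == 0.0:
        return 0.0                       # assumption: a flat patch is treated as "no match"
    return float((d1 * d2).sum() / denom)
```

Because the means are subtracted and the result is normalized, the coefficient is unchanged by a gain or offset applied to either patch, which is useful when illumination varies between frames.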
[0029] Therefore, the block matching process can be achieved as
follows. For each point in a reference frame, all points in the
chosen frame are examined and the most similar point is selected.
Next, it is tested whether the achieved correlation is reasonably
high. The point with the maximum correlation coefficient is taken as
a candidate point.
[0030] Video registration requires real-time implementation. In one
or more embodiments of the invention, the block-matching algorithm
is only implemented for the features. As such, the computational
expense can be significantly reduced.
[0031] One or more embodiments of the invention also include
checking of corresponding features and outlier removal. Feature-based
block matching can sometimes cause a mismatch. To avoid
mismatches, one or more embodiments of the invention
include using forward searching to handle one-to-many
mismatches, keeping the candidate corresponding
feature with the maximum gradient value and removing the others.
Also, backward searching is employed to resolve the remaining
mismatches using the same approach.
[0032] In many instances, a pair of features with similar
attributes is accepted as a match. Nevertheless, some false matches
may occur. Therefore, in one or more embodiments of the invention,
a random sample consensus (RANSAC) outlier removal procedure is
performed to remove incorrect matches and improve the registration
precision.
[0033] The techniques detailed herein can additionally include
coarse-to-fine feature matching. Multi-resolution feature matching
can reduce the search space and false matches. At the coarsest
resolution layer, feature matching is performed and the search
scope is determined. At each subsequent resolution layer, the matching
results from the previous layer can be taken as initial results and the
matching process can be performed by using equation (1) noted
above. In one or more embodiments of the invention, a search scope
is limited to 1-3 pixel(s). Further, the same operation can be
repeated until the highest resolution layer is reached.
[0034] As additionally described herein, one or more embodiments of
the invention include accurate position determination. For video
registration and motion detection purposes, pixel-level accuracy
may not be enough. In such instances, a sub-pixel position approach
is used, in which a distance-based weighting interpolation is
applied around the correlation peak. The horizontal and vertical
locations of the peak can be estimated separately for the feature:
the one-dimensional horizontal and vertical correlation curves are
obtained, the correlation values in the x and y directions are
interpolated separately, and the accurate location of the peak is
computed. By way of example, FIG. 1 is a diagram illustrating
sub-pixel position estimation, according to an embodiment of the
present invention.
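As a rough illustration of refining a peak separately along each axis, the sketch below fits a parabola through the three correlation samples around the integer peak. Note that this three-point parabolic fit is a swapped-in, commonly used technique and is not the distance-based weighting interpolation referred to above, whose exact form is not given here. The function names are hypothetical.

```python
import numpy as np

def subpixel_peak_1d(values, index):
    """Refine an integer peak index of a 1D correlation curve with a 3-point parabola."""
    if index <= 0 or index >= len(values) - 1:
        return float(index)              # no neighbours on both sides: keep integer peak
    left, centre, right = values[index - 1], values[index], values[index + 1]
    denom = left - 2.0 * centre + right
    if denom == 0.0:
        return float(index)              # flat curve: nothing to refine
    return float(index + 0.5 * (left - right) / denom)

def subpixel_peak_2d(corr):
    """Locate the correlation peak, then refine row and column independently."""
    corr = np.asarray(corr, dtype=float)
    r, c = np.unravel_index(np.argmax(corr), corr.shape)
    row = subpixel_peak_1d(corr[:, c], r)    # vertical correlation curve
    col = subpixel_peak_1d(corr[r, :], c)    # horizontal correlation curve
    return row, col
```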
[0035] The techniques described herein also include local geometric
registration. By way of example, a sub-region geometric
registration can be selected, and the entire frame can be divided
into 2×2 sub-regions. FIG. 2 illustrates two selection
models.
[0036] FIG. 2 is a diagram illustrating sub-region selection,
according to an embodiment of the present invention. By way of
illustration, FIG. 2 depicts sub-region selection model 202 and
sub-region selection model 204.
[0037] One or more embodiments of the invention also include an
affine-based local transformation, such as, for example, the
following:
$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a_0 + a_1 u + a_2 v \\ b_0 + b_1 u + b_2 v \end{bmatrix}$$
where $(x, y)$ is the new transformed coordinate of $(u, v)$, and
$(a_j, b_k)$ ($j, k = 0, 1, 2$) is the set of transformation
parameters. Further, to determine the local transformation
parameters for each sub-region, one or more embodiments of the
invention include using a least squares technique.
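A least squares estimate of the six affine parameters for one sub-region can be sketched as below, assuming matched point pairs $(u, v) \rightarrow (x, y)$ are already available. The function names are illustrative, and `numpy.linalg.lstsq` is used as one possible solver.

```python
import numpy as np

def fit_affine(uv, xy):
    """Least squares fit of x = a0 + a1*u + a2*v and y = b0 + b1*u + b2*v."""
    uv = np.asarray(uv, dtype=float)
    xy = np.asarray(xy, dtype=float)
    design = np.column_stack([np.ones(len(uv)), uv[:, 0], uv[:, 1]])
    a, *_ = np.linalg.lstsq(design, xy[:, 0], rcond=None)
    b, *_ = np.linalg.lstsq(design, xy[:, 1], rcond=None)
    return a, b

def apply_affine(a, b, uv):
    """Map (u, v) points through the affine transform defined by a and b."""
    uv = np.asarray(uv, dtype=float)
    x = a[0] + a[1] * uv[:, 0] + a[2] * uv[:, 1]
    y = b[0] + b[1] * uv[:, 0] + b[2] * uv[:, 1]
    return np.column_stack([x, y])
```

At least three non-collinear matches are needed for a unique solution; with more matches the fit averages out small localization errors.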
[0038] One or more embodiments of the invention also include
forward/backward frame-to-frame registration. For example, with
instances of rapid camera motion, strong illumination variation and
heavy stripe noise, to avoid residual error propagation,
forward/backward frame-to-frame registration is carried out for
multi-frame differencing. FIG. 3 illustrates an approach.
[0039] FIG. 3 is a diagram illustrating forward and backward
geometric registration, according to an embodiment of the present
invention. By way of illustration, FIG. 3 depicts frame 302
($F_{i-1}$), frame 304 ($F_i$) and frame 306 ($F_{i+1}$). To
estimate object motion at frame 304 ($F_i$), which is taken as a
reference frame, previous frame 302 ($F_{i-1}$) and next frame 306
($F_{i+1}$) are geometrically registered to the reference frame.
Motion estimation for each frame is carried out in such a
fashion.
[0040] Forward/backward frame differencing can also be implemented
for motion detection. A diagram of the approach used in one or more
embodiments of the invention is illustrated in FIG. 4. FIG. 4 is a
flow diagram illustrating forward and backward frame differencing,
according to an embodiment of the present invention. After
forward/backward frame-to-frame images (for example, frame 402,
frame 404 and frame 406) are geometrically registered and aligned
in steps 408 and 410, difference images are calculated. Instead of
using simple subtraction between the aligned frames, one or more
embodiments of the invention use forward/backward frame
differencing in steps 412 and 414 to reduce motion noise and
compensate for illumination variation such as that caused by
automatic gain control.
[0041] Additionally, step 416 includes performing image arithmetic
via $I_{\mathrm{new}} = \Delta I_{i-1,i} \text{ AND } \Delta I_{i,i+1}$. Step 418
includes median filtering, which can reduce random motion noise. To
extract the moving pixels of moving objects, automatic dynamic
threshold estimation based on spatial filtering is carried out in
step 420. Further, step 422 includes performing a morphological
operation to remove small isolated spots and fill holes in the
foreground image, and step 424 includes generating motion pixels
(for example, a motion map).
[0042] To further reduce the effects of random noise and illumination
variation, a logical AND operation is applied to the forward and
backward difference images to obtain a final difference image:
$$\begin{cases} D_{i-1,i}(x,y) = F_{i-1}(x,y) - F_i(x,y) \\ D_{i,i+1}(x,y) = F_i(x,y) - F_{i+1}(x,y) \\ D_i(x,y) = D_{i-1,i}(x,y) \wedge D_{i,i+1}(x,y) \end{cases} \qquad i = 1, 2, \ldots, N$$
[0043] A threshold for each pixel is calculated automatically from
the statistical characteristics and spatial high-frequency content
of the difference image. Further, a morphology step
can be applied to remove small isolated spots and fill holes in the
foreground image.
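The forward/backward differencing and logical AND described above can be sketched as follows for three frames that are assumed to be already registered to the middle frame. This is a simplified illustration: it takes absolute differences and uses a single global threshold (mean plus `k` standard deviations) per difference image, both of which are assumptions standing in for the per-pixel dynamic threshold of the embodiments; median filtering and morphology are omitted.

```python
import numpy as np

def two_way_motion_mask(prev_frame, curr_frame, next_frame, k=3.0):
    """Motion mask for curr_frame from backward and forward differences."""
    prev_f = np.asarray(prev_frame, dtype=float)
    curr_f = np.asarray(curr_frame, dtype=float)
    next_f = np.asarray(next_frame, dtype=float)

    d_back = np.abs(prev_f - curr_f)     # |F_{i-1} - F_i|
    d_fwd = np.abs(curr_f - next_f)      # |F_i - F_{i+1}|

    def above_threshold(diff):
        # Assumed global threshold; the text describes a per-pixel estimate.
        return diff > diff.mean() + k * diff.std()

    # Logical AND keeps only pixels that changed in both directions,
    # that is, pixels occupied by the object in the reference frame.
    return above_threshold(d_back) & above_threshold(d_fwd)
```

The AND step suppresses the "ghost" left at the object's previous and next positions, since those pixels differ from the reference frame in only one of the two difference images.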
[0044] As described herein, one or more embodiments of the
invention also include motion verification. FIG. 5 is a flow
diagram illustrating false blob filtering, according to an
embodiment of the present invention. Step 502 includes generating a
motion map. Step 504 includes applying a connected component
process to link each blob data. Step 506 includes creating a motion
blob table. Step 508 includes performing an optical flow
estimation. Step 510 includes making a displacement determination.
If there is displacement, the process proceeds to step 512, which
includes performing post-processing such as, for example, data
association, object tracking, trajectory maintenance and track data
management. If there is no displacement, the process proceeds to
step 514, which includes filtering false blobs.
[0045] Accordingly, after a blob table is created, each blob is
verified in order to remove false motion blobs from the table.
One or more embodiments of the invention apply a KLT
process to estimate the motion of each blob after forward/backward
frame-to-frame registration is done. A false blob is deleted
from the blob table. The process steps can include, for example,
applying a connected component process to link each blob data,
creating a blob table, extracting features for each blob in a
previous registered frame, applying the KLT method to estimate the
motion of each blob, and, if no motion occurs, deleting the blob
from the blob table. Also, the above-noted steps can be repeated
for all blobs.
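The final filtering step can be illustrated with a small sketch. It assumes that feature points for each blob have already been tracked between the registered previous and current frames (for example, by a KLT tracker, which is not implemented here), and that the blob table is a dictionary from a blob identifier to a pair of point lists. The table layout, the use of the median displacement and the `min_displacement` value are all hypothetical choices for illustration.

```python
import numpy as np

def filter_false_blobs(blob_table, min_displacement=1.0):
    """Keep only blobs whose tracked features moved after registration.

    blob_table maps blob_id -> (points_prev, points_curr), where both entries
    are equal-length sequences of (x, y) feature positions (assumed layout).
    """
    kept = {}
    for blob_id, (pts_prev, pts_curr) in blob_table.items():
        motion = np.asarray(pts_curr, dtype=float) - np.asarray(pts_prev, dtype=float)
        # The median is used so that one badly tracked feature does not dominate.
        magnitude = np.linalg.norm(np.median(motion, axis=0))
        if magnitude >= min_displacement:
            kept[blob_id] = (pts_prev, pts_curr)
    return kept
```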
[0046] As also detailed herein, one or more embodiments of the
invention include multi-object tracking. FIG. 6 is a flow diagram
illustrating multi-object tracking, according to an embodiment of
the present invention. Step 602 includes generating a motion map.
Step 604 includes identifying moving blobs. Step 606 includes
object initialization and step 608 includes object checking. Step
610 includes identifying object regions. Step 612 includes
identifying candidate regions. Also, step 614 includes meanshift
tracking and step 616 includes identifying new locations.
[0047] Additionally, after identifying object regions in step 610,
features can be extracted in step 618. Once a search region is set
in step 620, moving blobs can be found as potential object
candidates in step 622. KLT matching is performed in step 624 and
outlier removal based on an affine transform with RANSAC is
performed in step 626. A new region candidate is identified in step
628. Meanwhile, Meanshift is applied in step 614 to compute the
inter-frame translation. This yields a candidate region location in
step 616. From steps 628 and 616, the process can proceed to step
630, which determines the final region location based on the
Bhattacharyya coefficient. Also, step 632 includes target model
updating for solving drift issues, and step 634 includes trajectory
updating. Also, to track moving objects, a hybrid tracking model
based on the combination of KLT and Meanshift method is applied
from step 618 to 630.
[0048] As noted, the techniques described herein include object
initialization. The motion detection results from forward/backward
frame differencing can contain some correct real moving objects and
some false objects, and miss some true objects. By way of example,
for a UAV video with a low frame rate (for example, 1 frame/second),
a moving object does not have any overlapping regions between two
consecutive frames, so traditional methods for object
initialization will not work. To efficiently isolate promising
moving objects among all detection results for current frame, one
or more embodiments of the invention include combining a distance
matrix with a similarity measure to initialize moving objects. The
processing steps can include, for example, the following.
[0049] A search radius, a matching score threshold and a minimum
length of tracked history are set. The distance matrix between the
objects (including object candidates) and all the blobs in the table is
computed. If the length of an object trajectory is less than the
preset value, a kernel-based algorithm is applied to find the match
between the object candidate and the blobs in terms of the preset
matching score. Also, if the object candidate appears in several
consecutive frames, the candidate is initialized and stored
in the object table. Otherwise, the object candidate is
considered a false object.
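The combination of a distance matrix with a similarity measure can be sketched as a gating step. The example assumes that object and blob centres are given as 2D points and that a matrix of similarity scores (for example, kernel-based histogram similarities) has been computed elsewhere; the function names and the rule of picking the highest-scoring blob inside the search radius are illustrative assumptions.

```python
import numpy as np

def distance_matrix(object_centres, blob_centres):
    """Euclidean distances between every object (rows) and every blob (columns)."""
    o = np.asarray(object_centres, dtype=float)[:, None, :]
    b = np.asarray(blob_centres, dtype=float)[None, :, :]
    return np.linalg.norm(o - b, axis=2)

def match_candidates(object_centres, blob_centres, scores, search_radius, min_score):
    """Return {object_index: blob_index} for blobs inside the radius with a good score."""
    dist = distance_matrix(object_centres, blob_centres)
    scores = np.asarray(scores, dtype=float)
    matches = {}
    for i in range(dist.shape[0]):
        allowed = (dist[i] <= search_radius) & (scores[i] >= min_score)
        if allowed.any():
            gated = np.where(allowed, scores[i], -np.inf)
            matches[i] = int(np.argmax(gated))
    return matches
```

An object candidate that obtains a match in several consecutive frames could then be promoted to the object table, and one that does not could be discarded as a false object.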
[0050] From the previous frame, one or more embodiments of the
invention include projecting the previous blob set into a current
frame after geometrical registration. The motion of each object
according to its previous position can be estimated by a KLT
tracking process. In a KLT tracking process, a motion model is
approximately represented by an affine transformation, such that
$I_{\mathrm{curr}}(Ax + T) = I_{\mathrm{prev}}(x)$, where $A$ is a two-dimensional (2D)
transformation matrix and $T$ is the translation vector.
[0051] In one or more embodiments of the invention, affine
transformation parameters can be computed from as few as four
feature points, using a least squares technique.
[0052] Accuracy estimation can be performed, for example, when a
number of mismatched pairs occur. One measure of tracking accuracy
is the root mean square error (RMSE) between the matched points
before and after the affine transformation. This measure is
used as a criterion to eliminate the matches that are considered
imprecise.
[0053] Additionally, to eliminate the outliers, one or more
embodiments of the invention include performing the RANSAC
algorithm to remove mismatches iteratively
until the RMSE value is lower than the desired
threshold.
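A generic RANSAC loop for an affine model, with an RMSE check on the surviving matches, might look like the sketch below. It is a textbook-style illustration under stated assumptions (three-point minimal samples, a fixed iteration count, a fixed inlier tolerance and a seeded random generator), not the specific procedure of the embodiments; all names and default values are hypothetical.

```python
import numpy as np

def fit_affine_2d(src, dst):
    """Least squares affine fit; returns params with dst ~ [src, 1] @ params."""
    design = np.column_stack([src, np.ones(len(src))])
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return params                        # shape (3, 2)

def rmse(params, src, dst):
    """Root mean square error between transformed src points and dst points."""
    pred = np.column_stack([src, np.ones(len(src))]) @ params
    return float(np.sqrt(np.mean(np.sum((pred - dst) ** 2, axis=1))))

def ransac_affine(src, dst, inlier_tol=1.0, iterations=200, seed=0):
    """Return (params, inlier_mask) for the affine model with the most inliers."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(src), size=3, replace=False)   # minimal sample
        params = fit_affine_2d(src[sample], dst[sample])
        pred = np.column_stack([src, np.ones(len(src))]) @ params
        inliers = np.linalg.norm(pred - dst, axis=1) < inlier_tol
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on all inliers of the best hypothesis.
    return fit_affine_2d(src[best], dst[best]), best
```

After the call, `rmse(params, src[mask], dst[mask])` can be compared against the desired threshold to decide whether the remaining matches are precise enough.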
[0054] The techniques detailed herein additionally include
meanshift tracking and object representation. By way of example,
for a UAV tracking system, traditional intensity-based target
representation is no longer suitable for multi-object tracking due
to large scale variation and perspective geometric distortion. To
efficiently characterize the object, histogram-based feature space
can be chosen. In one or more embodiments of the invention, a
metric based on the Bhattacharyya coefficient is used to define a
similarity measure between a reference object and a candidate for
multi-object tracking. Given an object region histogram q in the
reference frame, the Bhattacharyya coefficient based objective
function is given by:
$$\rho(p, q) = \sum_{u=1}^{M} \sqrt{p_u(x)\, q_u(x_0)}$$
where $M$ is the histogram dimension, and $x_0$ is the 2D
center.
[0055] The candidate region histogram $p_u(x)$ at 2D center $x$ in
the current frame is defined as:
$$p_u(x) = \frac{\sum_i k\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right) \delta(b(x_i), u)}{\sum_i k\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}$$
[0056] Here, $u = 1, 2, \ldots, M$. $k(x)$ denotes a non-negative,
non-increasing and piecewise-differentiable kernel profile which
weights the pixel location, $h$ is the 2D bandwidth vector of $k(x)$,
$\delta$ is the Kronecker delta function and each pixel value is
denoted by $b(x_i)$.
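The kernel-weighted histogram can be sketched as follows. The example assumes an Epanechnikov profile, $k(x) = 1 - x$ for $x \le 1$ and $0$ otherwise, which satisfies the stated conditions but is only one possible choice; it also assumes that each pixel has already been quantized to a 0-based bin index in place of $u = 1, \ldots, M$. The function names are illustrative.

```python
import numpy as np

def epanechnikov_profile(x):
    """Assumed kernel profile: non-negative, non-increasing, piecewise differentiable."""
    return np.where(x <= 1.0, 1.0 - x, 0.0)

def kernel_histogram(bins, positions, centre, bandwidth, n_bins):
    """Kernel-weighted, normalized histogram p_u(x) around a 2D centre.

    bins      : 0-based bin index b(x_i) of each pixel, in [0, n_bins)
    positions : (N, 2) pixel coordinates x_i
    bandwidth : 2D bandwidth vector h
    """
    bins = np.asarray(bins, dtype=int)
    positions = np.asarray(positions, dtype=float)
    scaled = (np.asarray(centre, dtype=float) - positions) / np.asarray(bandwidth, dtype=float)
    weights = epanechnikov_profile(np.sum(scaled ** 2, axis=1))
    hist = np.bincount(bins, weights=weights, minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Pixels near the centre contribute more than pixels near the boundary, which makes the histogram less sensitive to background pixels at the edge of the region.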
[0057] Additionally, in one or more embodiments of the invention,
in determining a similarity measure between distributions, the
Bhattacharyya distance can include
$B(I_x, I_y) = \sqrt{1 - \rho(p_x, p_y)}$, where
$\rho(p_x, p_y) = \int \sqrt{\hat{p}_x(u)\, \hat{p}_y(u)}\, du$, and where
$p_x$ and $p_y$ represent the target and the candidate
distributions, respectively.
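For discrete, normalized histograms, the coefficient and the distance above reduce to a few lines. The sketch below replaces the integral with a sum over bins, which is an assumption appropriate for histograms; the function names are illustrative.

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    """rho(p, q) = sum_u sqrt(p_u * q_u) for two normalized histograms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(np.sqrt(p * q)))

def bhattacharyya_distance(p, q):
    """B = sqrt(1 - rho); clamped at zero to absorb floating-point round-off."""
    rho = bhattacharyya_coefficient(p, q)
    return float(np.sqrt(max(0.0, 1.0 - rho)))
```

Identical histograms give a coefficient of 1 and a distance of 0, while histograms with no overlapping bins give a coefficient of 0 and a distance of 1.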
[0058] The techniques described herein can additionally include
object positioning. To search for the location corresponding to the
object from one frame to the next, one or more embodiments of the
invention include applying a meanshift tracking algorithm that is
based on a gradient ascent optimization rather than an exhaustive
search. Strengths of the meanshift method include computational
efficiency and suitability for real-time application. However, a
target can be lost, for example, due to an intrinsic limitation of
exploring only local maxima, especially when the tracked object
moves quickly. The candidate region histogram $p_u(x)$ can be
obtained from the above equation.
[0059] The new location of the tracked object can be estimated
as:
$$\hat{y}_1 = \frac{\sum_{i=1}^{n} X_i\, \omega_i\, g\left(\left\| \frac{\hat{y}_0 - X_i}{h} \right\|^2\right)}{\sum_{i=1}^{n} \omega_i\, g\left(\left\| \frac{\hat{y}_0 - X_i}{h} \right\|^2\right)}$$
where:
$$\omega_i = \sum_{u=1}^{M} \delta[b(X_i) - u] \sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}}$$
[0060] Here, g(x)=-k'(x), that is, the negative of the derivative of
the kernel profile k(x).
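A single location update of this kind can be sketched as follows. The sketch assumes an Epanechnikov-style profile, for which g is constant inside the kernel support and zero outside, and assumes pixels already quantized to bin indices; `meanshift_step` and its parameters are hypothetical names.

```python
import math


def meanshift_step(pixels, y0, bandwidth, q_hat, p_hat):
    """One meanshift update from location y0 to a new location y1.

    pixels    -- list of ((px, py), bin_index) inside the candidate window
    y0        -- current 2D location estimate
    bandwidth -- 2D bandwidth vector h = (hx, hy)
    q_hat     -- target model histogram
    p_hat     -- candidate histogram evaluated at y0
    """
    num_x = num_y = den = 0.0
    for (px, py), u in pixels:
        r2 = ((y0[0] - px) / bandwidth[0]) ** 2 + ((y0[1] - py) / bandwidth[1]) ** 2
        g = 1.0 if r2 <= 1.0 else 0.0  # g = -k' for the assumed profile k(r) = 1 - r
        if p_hat[u] <= 0.0:
            continue  # bin absent from the candidate histogram; no weight defined
        w = math.sqrt(q_hat[u] / p_hat[u])  # the delta selects the bin u = b(X_i)
        num_x += px * w * g
        num_y += py * w * g
        den += w * g
    if den == 0.0:
        return y0  # nothing inside the kernel support; keep the old estimate
    return (num_x / den, num_y / den)


# Illustrative data: the target model favors bin 1, so the estimate moves
# toward the pixel that falls in bin 1.
y1 = meanshift_step([((0.0, 0.0), 0), ((2.0, 0.0), 1)], (1.0, 0.0), (2.0, 2.0),
                    q_hat=[0.2, 0.8], p_hat=[0.5, 0.5])
print(y1)
```

In practice the step would be repeated, recomputing the candidate histogram at each new location, until the shift becomes small.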
[0061] One or more embodiments of the invention can also include
target model updating on a temporal domain. In some circumstances,
a meanshift approach without target model updating can suffer from
abrupt changes in the target model. On the other hand, updating the
model at every frame can reduce the reliability of the tracking
results due to cluttered environments, occlusion, random noise,
etc. One way to change the target model is to periodically update
the target distributions.
[0062] To obtain a precise tracking result, the target model can be
updated dynamically. Accordingly, one or more embodiments of the
invention include model updating that uses both recent tracking
results and the older target model to determine the current target
model for object tracking. The updating procedure is formulated as:
$$q_u^{new} = (1 - \alpha)\, q_u^{old} + \alpha\, p_u^{s}$$
[0063] Here, the superscripts new and old denote the newly
obtained target model and the old model, respectively. The
superscript s denotes the recent tracking result. $\alpha$ weights
the contribution of the recent tracking result (normally
$\alpha$ < 0.1). q and p represent the target model and the
candidate model, respectively.
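The update rule amounts to a per-bin exponential blend of the old model and the most recent candidate histogram. A minimal sketch, with a hypothetical function name and an illustrative default weight below 0.1:

```python
def update_target_model(q_old, p_recent, alpha=0.05):
    """Blend the old target model with the recent tracking result, bin by bin."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * q + alpha * p for q, p in zip(q_old, p_recent)]


# Illustrative histograms only.
q_new = update_target_model([1.0, 0.0], [0.0, 1.0], alpha=0.1)
print(q_new)
```

Because both inputs are normalized and the weights sum to one, the updated model remains normalized. A small weight keeps the model stable against clutter and occlusion while still letting it drift with gradual appearance change.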
[0064] Further, one or more embodiments of the invention include
target model updating on a spatial domain. Normally, meanshift-based
tracking does not provide a precise boundary position for the
tracked object because it does not use spatial data. By contrast,
detection results derived from the KLT tracker and motion detection
results can provide more accurate information, such as the precise
position and object size, than the meanshift tracker.
[0065] Each individual algorithm may be unable to do a perfect job
on multi-object tracking. Thus, fusion among their data can be used
in a multi-object tracking procedure. According to the strengths of
each method, one or more embodiments of the invention use the
following merging method:
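$$\text{Output} = \begin{cases} \text{result by motion detector}, & \text{if Overlapping} \geq T \\ \text{KLT result}, & \text{if an outlier for MS occurs} \\ \text{result by meanshift}, & \text{otherwise} \end{cases}$$
where Overlapping represents the degree of the overlapping region, T
is a threshold, and MS denotes meanshift.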
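The merging rule can be sketched as a simple priority selection. The text does not state which regions are compared to obtain the overlap, so the intersection-over-union helper below is an assumption, as are the box format (x0, y0, x1, y1) and all names.

```python
def overlap_ratio(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (assumed overlap measure)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def select_output(motion_box, klt_box, meanshift_box, overlap, threshold, meanshift_outlier):
    """Apply the three-way merging rule in priority order."""
    if overlap >= threshold:
        return motion_box      # motion detector result
    if meanshift_outlier:
        return klt_box         # KLT result when meanshift produced an outlier
    return meanshift_box       # meanshift result otherwise


# Illustrative boxes only.
motion_box, klt_box, ms_box = (0, 0, 2, 2), (0, 0, 3, 3), (1, 0, 3, 2)
overlap = overlap_ratio(motion_box, ms_box)
chosen = select_output(motion_box, klt_box, ms_box, overlap, threshold=0.3,
                       meanshift_outlier=False)
print(overlap, chosen)
```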
[0066] FIG. 7 is a diagram illustrating reference plane-based
registration and tracking, according to an embodiment of the
present invention. By way of illustration, FIG. 7 depicts a
geo-reference plane 702. The first frame 704 is registered to
geo-reference plane 702, and the second frame 706 is registered to
geo-reference plane 702 using the first registered frame and the
corresponding inter-frame transformation parameters $TC_i$
(equation 712 in FIG. 7). In the same fashion, frames 708 and 710
are each registered to geo-reference plane 702. Moreover, each
object is projected onto geo-reference plane 702 using navigation
data.
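The chained registration described for FIG. 7 can be sketched by composing transforms. The sketch assumes that each transform is a 3x3 homogeneous matrix and that each inter-frame transform maps coordinates of a frame into the preceding frame; equation 712 itself is not reproduced in the text, so this is an illustration of the idea rather than that equation.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply_transform(h, point):
    """Map a 2D point through a 3x3 homogeneous transform."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def register_to_reference(first_to_reference, inter_frame_transforms):
    """Return one frame-to-reference transform per frame.

    inter_frame_transforms[i] is assumed to map frame i+1 into frame i.
    """
    transforms = [first_to_reference]
    for t in inter_frame_transforms:
        transforms.append(matmul3(transforms[-1], t))
    return transforms


def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


# Illustrative transforms only: pure translations.
frame_to_ref = register_to_reference(translation(10.0, 20.0),
                                     [translation(1.0, 0.0), translation(0.0, 2.0)])
print(apply_transform(frame_to_ref[2], (0.0, 0.0)))
```

Because each new frame reuses the previous frame's registration, only one inter-frame estimate is needed per frame, at the cost of accumulating error along the chain.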
[0067] FIG. 8 is a flow diagram illustrating automatic urban road
extraction, according to an embodiment of the present invention.
Step 802 includes framing an image. Step 804 includes performing a
Gaussian smoothing operation. Step 806 includes using a Canny
detector, and step 808 includes implementing a Hough
transformation. Step 810 includes finding a maximum response. Step
812 includes determining if the length of the stripe is greater
than a pre-defined threshold. If the length of the stripe is not
greater than the threshold, the process stops at step 814. If the
length of the stripe is greater than the threshold, the process
continues to step 816, which includes performing a straight line
extraction. Further, step 818 includes performing stripe pixels
removal (which can, for example, lead to a return to step 808).
[0068] As also depicted in FIG. 8, step 820 includes performing
frame differencing, and step 822 includes verification via motion
history images (MHI) (which can, for example, lead to a return to
step 816). Additionally, one or more embodiments of the invention
can include extraction of road stripes via an iterative Hough
transform.
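The loop formed by steps 808 through 818 can be sketched in pure Python, starting from a set of edge pixels (the smoothing and edge detection steps are not shown). The vote count of the strongest accumulator cell is used here as a stand-in for stripe length, which is an assumption; all names and parameters are hypothetical.

```python
import math


def iterative_hough(edge_points, min_votes, num_angles=180, max_lines=10, tolerance=0.5):
    """Repeatedly vote, take the maximum response, extract the line, remove its pixels."""
    remaining = set(edge_points)
    lines = []
    angles = [math.pi * i / num_angles for i in range(num_angles)]
    while remaining and len(lines) < max_lines:
        # Hough voting: each point votes for every (angle, rho) cell it lies on.
        votes = {}
        for x, y in remaining:
            for a, theta in enumerate(angles):
                rho = round(x * math.cos(theta) + y * math.sin(theta))
                votes[(a, rho)] = votes.get((a, rho), 0) + 1
        # Maximum response in the transform space.
        (a, rho), count = max(votes.items(), key=lambda item: item[1])
        if count < min_votes:
            break  # strongest remaining stripe is too short; stop
        theta = angles[a]
        # Straight line extraction: collect the pixels close to the winning line.
        on_line = {(x, y) for x, y in remaining
                   if abs(x * math.cos(theta) + y * math.sin(theta) - rho) <= tolerance}
        lines.append((theta, rho, len(on_line)))
        remaining -= on_line  # stripe pixel removal before the next pass
    return lines, remaining


# Illustrative edge pixels: a long horizontal stripe, a short vertical one, one stray pixel.
points = [(x, 5) for x in range(20)] + [(3, y) for y in range(10, 18)] + [(50, 40)]
lines, remaining = iterative_hough(points, min_votes=6)
print([n for _, _, n in lines], remaining)
```

Removing the pixels of each extracted stripe before voting again keeps a dominant line from masking weaker ones in later passes.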
[0069] As detailed herein, one or more embodiments of the invention
include recursive geometric registration with sub-pixel matching
accuracy that can handle various geometrical residual errors from
an uncalibrated camera. Additionally, the techniques detailed herein
include motion detection based on forward/backward frame
differencing that can efficiently separate moving objects from
background. Further, a hybrid object tracker can be implemented
that uses colors, features and intensity statistical
characteristics over time to detect and track multiple small
objects.
[0070] FIG. 9 is a block diagram illustrating the architecture of an
object detection and tracking system, according to an aspect of the
invention. An example software architecture construction for a
detection and tracking system (for example, a UAV system) can be
built on multiple services to provide a track database for object
search and intelligent analysis. As illustrated in FIG. 9, the
software architecture can include multiple sensor modules 904,
video streaming service modules 906, tracking suite service modules
908, a track database (DB) server module 910, a user interface
module 902 and a visualization console 912. A video streaming
module 906 serves to capture and make available imagery from
multiple sensors. The acquired images are used by a tracking suite
module 908 as the basis for multi-object detection and tracking.
Tracking suite module 908 includes a geometric registration
sub-module 914, a motion extraction sub-module 916, an object
tracking sub-module 918, a tracking data sub-module 920 and a
geo-coordinate mapping sub-module 922.
[0071] By processing the real-time imagery from multiple sensors,
sophisticated transformation of data to track information is
achieved. Track DB server 910 handles track metadata management.
Visualization console 912 creates graphical overlays, indexes them
to the imagery on the display, and presents them to a user. These
overlays can be any type of graphical information that supports the
higher level components, such as, for example, class types, moving
directions, trajectories and object sizes. User interface 902
provides data access and operation by the user.
[0072] FIG. 10 is a flow diagram illustrating techniques for
performing visual surveillance of one or more moving objects,
according to an embodiment of the present invention. Step 1002
includes registering one or more images captured by one or more
cameras, wherein registering the one or more images comprises
region-based registration of the one or more images in two or more
adjacent frames. This step can be carried out, for example, using a
geometric registration sub-module 914 in tracking suite service
module 908. Registering images can include recursive global and
local geometric registration of the one or more images (for
example, region-based geometric transformation for handling lens
geometric distortion). Registering images can also include using
sub-pixel image matching techniques.
[0073] Step 1004 includes performing motion segmentation of the one
or more images to detect one or more moving objects and one or more
background regions in the one or more images. This step can be
carried out, for example, using a motion extraction sub-module 916
in tracking suite service module 908. Performing motion
segmentation of the images can include forward and backward frame
differencing. Forward and backward frame differencing can include,
for example, automatic dynamic threshold estimation based on
temporal filtering and/or spatial filtering, removing false moving
pixels based on independent motions of image features, and
performing a morphological operation and generating motion
pixels.
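Forward and backward frame differencing can be sketched as follows for three consecutive registered frames. The threshold rule used here (mean plus a multiple of the standard deviation of each difference image) is an assumed form of spatially based dynamic thresholding, and the morphological operation is omitted; all names are hypothetical.

```python
import math


def abs_diff(a, b):
    """Per-pixel absolute difference of two frames given as nested lists."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def dynamic_threshold(diff, k=2.0):
    """Assumed threshold rule: mean + k * standard deviation of the difference image."""
    values = [v for row in diff for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean + k * std


def motion_mask(prev_frame, cur_frame, next_frame, k=2.0):
    """Mark pixels that differ from both the previous and the next frame."""
    backward = abs_diff(cur_frame, prev_frame)
    forward = abs_diff(next_frame, cur_frame)
    t_back = dynamic_threshold(backward, k)
    t_fwd = dynamic_threshold(forward, k)
    return [[1 if b > t_back and f > t_fwd else 0 for b, f in zip(rb, rf)]
            for rb, rf in zip(backward, forward)]


def make_frame(row, col, size=5, value=100):
    """Illustrative frame: dark background with one bright pixel."""
    frame = [[0] * size for _ in range(size)]
    frame[row][col] = value
    return frame


# A bright pixel moving along the diagonal across three frames.
mask = motion_mask(make_frame(1, 1), make_frame(2, 2), make_frame(3, 3))
print(mask)
```

Requiring a change in both directions suppresses the trailing location that a single frame difference would also flag, leaving only the object's position in the current frame.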
[0074] Step 1006 includes tracking the one or more moving objects
to facilitate visual surveillance of the one or more moving
objects. This step can be carried out, for example, using an object
tracking sub-module 918 in tracking suite service module 908.
Tracking the moving objects can include performing hybrid target
tracking, wherein hybrid target tracking includes using a
Kanade-Lucas-Tomasi feature tracker and meanshift, using automatic
kernel scale estimation and updating, and using feature
trajectories. One or more embodiments of the invention can also
include using colors for tracking. Tracking moving objects can
additionally include using multi-target tracking algorithms based
on feature matching and distance matrices for one or more (small)
targets.
[0075] Also, tracking moving objects can include generating a
motion map, identifying one or more moving objects (blobs),
performing object initialization and object checking, identifying
object regions in the motion map, extracting features, setting a
search region in the motion map, identifying candidate regions in
the motion map, meanshift tracking, identifying moving objects in
the candidate regions, performing Kanade-Lucas-Tomasi feature
matching, performing an affine transform (with RANSAC), making a
final regions determination via the Bhattacharyya coefficient, and
updating a target model and trajectory information. Tracking moving
objects can additionally include reference plane-based registration
and tracking.
[0076] The techniques depicted in FIG. 10 can also include relating
each camera view with one or more other camera views, and forming a
panoramic view from the images captured by one or more cameras. One
or more embodiments of the invention additionally include
estimating motion of each camera based on video information of
static objects in the panoramic view, as well as estimating one or
more background (for example, road) structures in the panoramic
view based on linear structure detection and statistical analysis
of the moving objects over a period of time.
[0077] Further, the techniques depicted in FIG. 10 include
automatic feature (for example, a road) extraction, wherein
automatic feature extraction includes framing an image, performing
a Gaussian smoothing operation, using a Canny detector to extract
one or more feature (for example, road) edges, implementing a Hough
transformation for feature (for example, road stripe) analysis,
determining a maximum response finding for reducing an influence of
multiple peaks in a transform space, determining if a length of a
feature (for example, a road stripe) is greater than a certain
threshold, and if the length of the feature is greater than the
threshold, performing feature extraction and pixel removal.
Automatic feature extraction can additionally include performing
frame differencing and verification via motion history images.
[0078] One or more embodiments of the invention also include
performing outlier removal to remove incorrect moving object
matches (and improve the registration precision). The techniques
depicted in FIG. 10 can additionally include false blob filtering.
False blob filtering includes generating a motion map, applying a
connected component process to link the data of each blob, creating a
motion blob table, extracting features for each blob in a
previously registered frame, and applying a Kanade-Lucas-Tomasi
method to estimate motion of each blob, and, if no motion occurs
for a blob, deleting the blob from the blob table.
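The false blob filtering steps can be sketched as connected component labeling followed by removal of blobs with no estimated motion. The Kanade-Lucas-Tomasi motion estimate is represented by a caller-supplied function, `estimate_motion`, which is a hypothetical placeholder, as are the 4-connectivity and the motion threshold.

```python
from collections import deque


def label_blobs(motion_map):
    """4-connected component labeling; returns a blob table {blob_id: [(row, col), ...]}."""
    rows, cols = len(motion_map), len(motion_map[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = {}
    for r in range(rows):
        for c in range(cols):
            if not motion_map[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and motion_map[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs[len(blobs)] = pixels
    return blobs


def filter_false_blobs(blob_table, estimate_motion, min_motion=0.5):
    """Keep only blobs whose estimated motion reaches min_motion."""
    return {blob_id: pixels for blob_id, pixels in blob_table.items()
            if estimate_motion(pixels) >= min_motion}


# Illustrative motion map with two blobs.
motion_map = [[1, 1, 0, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 1, 0]]
blob_table = label_blobs(motion_map)
# Placeholder motion estimate: the blob touching (0, 0) is treated as static.
kept = filter_false_blobs(blob_table, lambda px: 0.0 if (0, 0) in px else 2.0)
print(sorted(kept))
```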
[0079] Additionally, one or more embodiments of the invention can
include updating a target model on a temporal domain and/or a
spatial domain, as well as creating an index (for example, a
searchable index) of object appearances and object tracks in a
panoramic view. Also, the object appearance and tracks template
index can be stored in a template data store with a pointer to the
corresponding video segments for easy retrieval. Further, one or
more embodiments of the invention can include determining a
similarity metric between a query and an entry in the index, which
can facilitate searching for the object appearance and tracks in a
template data store/index based on the similarity metric, and
outputting/listing the search results for a human operator based on
similarity of the query.
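Searching such an index can be sketched as ranking entries by a similarity score. The text does not fix the similarity metric, so the Bhattacharyya coefficient between appearance histograms is assumed here; the entry fields (`track_id`, `histogram`, `video_segment`) are hypothetical.

```python
import math


def similarity(query_hist, entry_hist):
    """Assumed metric: Bhattacharyya coefficient between two normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(query_hist, entry_hist))


def search_index(index, query_hist, top_k=3):
    """Return up to top_k (score, entry) pairs, most similar first."""
    scored = [(similarity(query_hist, entry["histogram"]), entry) for entry in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Illustrative index entries; each points to its video segment for retrieval.
index = [
    {"track_id": "t1", "histogram": [0.9, 0.1, 0.0], "video_segment": "seg-001"},
    {"track_id": "t2", "histogram": [0.1, 0.8, 0.1], "video_segment": "seg-002"},
    {"track_id": "t3", "histogram": [0.0, 0.2, 0.8], "video_segment": "seg-003"},
]
results = search_index(index, [0.1, 0.8, 0.1], top_k=2)
for score, entry in results:
    print(entry["track_id"], entry["video_segment"], round(score, 3))
```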
[0080] The techniques depicted in FIG. 10 can also, as described
herein, include providing a system, wherein the system includes
distinct software modules, each of the distinct software modules
being embodied on a tangible computer-readable recordable storage
medium. All the modules (or any subset thereof) can be on the same
medium, or each can be on a different medium, for example. The
modules can include any or all of the components shown in the
figures. In one or more embodiments, the modules include sensor
modules, video streaming service modules, tracking suite service
modules (including the sub-modules detailed herein), a track
database (DB) server module, a user interface module and a
visualization console module that can run, for example, on one or
more hardware processors. The method steps can then be carried out
using the distinct software modules of the system, as described
above, executing on the one or more hardware processors. Further, a
computer program product can include a tangible computer-readable
recordable storage medium with code adapted to be executed to carry
out one or more method steps described herein, including the
provision of the system with the distinct software modules.
[0081] Additionally, the techniques depicted in FIG. 10 can be
implemented via a computer program product that can include
computer useable program code that is stored in a computer readable
storage medium in a data processing system, and wherein the
computer useable program code was downloaded over a network from a
remote data processing system. Also, in one or more embodiments of
the invention, the computer program product can include computer
useable program code that is stored in a computer readable storage
medium in a server data processing system, and wherein the computer
useable program code is downloaded over a network to a remote data
processing system for use in a computer readable storage medium
with the remote system.
[0082] As will be appreciated by one skilled in the art, aspects of
the present invention may be embodied as a system, method or
computer program product. Accordingly, aspects of the present
invention may take the form of an entirely hardware embodiment, an
entirely software embodiment (including firmware, resident
software, micro-code, etc.) or an embodiment combining software and
hardware aspects that may all generally be referred to herein as a
"circuit," "module" or "system." Furthermore, aspects of the
present invention may take the form of a computer program product
embodied in one or more computer readable medium(s) having computer
readable program code embodied thereon.
[0083] One or more embodiments of the invention, or elements
thereof, can be implemented in the form of an apparatus including a
memory and at least one processor that is coupled to the memory and
operative to perform exemplary method steps.
[0084] One or more embodiments can make use of software running on
a general purpose computer or workstation. With reference to FIG.
11, such an implementation might employ, for example, a processor
1102, a memory 1104, and an input/output interface formed, for
example, by a display 1106 and a keyboard 1108. The term
"processor" as used herein is intended to include any processing
device, such as, for example, one that includes a CPU (central
processing unit) and/or other forms of processing circuitry.
Further, the term "processor" may refer to more than one individual
processor. The term "memory" is intended to include memory
associated with a processor or CPU, such as, for example, RAM
(random access memory), ROM (read only memory), a fixed memory
device (for example, hard drive), a removable memory device (for
example, diskette), a flash memory and the like. In addition, the
phrase "input/output interface" as used herein, is intended to
include, for example, one or more mechanisms for inputting data to
the processing unit (for example, mouse), and one or more
mechanisms for providing results associated with the processing
unit (for example, printer). The processor 1102, memory 1104, and
input/output interface such as display 1106 and keyboard 1108 can
be interconnected, for example, via bus 1110 as part of a data
processing unit 1112. Suitable interconnections, for example via
bus 1110, can also be provided to a network interface 1114, such as
a network card, which can be provided to interface with a computer
network, and to a media interface 1116, such as a diskette or
CD-ROM drive, which can be provided to interface with media
1118.
[0085] Accordingly, computer software including instructions or
code for performing the methodologies of the invention, as
described herein, may be stored in one or more of the associated
memory devices (for example, ROM, fixed or removable memory) and,
when ready to be utilized, loaded in part or in whole (for example,
into RAM) and implemented by a CPU. Such software could include,
but is not limited to, firmware, resident software, microcode, and
the like.
[0086] A data processing system suitable for storing and/or
executing program code will include at least one processor 1102
coupled directly or indirectly to memory elements 1104 through a
system bus 1110. The memory elements can include local memory
employed during actual implementation of the program code, bulk
storage, and cache memories which provide temporary storage of at
least some program code in order to reduce the number of times code
must be retrieved from bulk storage during implementation.
[0087] Input/output or I/O devices (including but not limited to
keyboards 1108, displays 1106, pointing devices, and the like) can
be coupled to the system either directly (such as via bus 1110) or
through intervening I/O controllers (omitted for clarity).
[0088] Network adapters such as network interface 1114 may also be
coupled to the system to enable the data processing system to
become coupled to other data processing systems or remote printers
or storage devices through intervening private or public networks.
Modems, cable modems and Ethernet cards are just a few of the
currently available types of network adapters.
[0089] As used herein, including the claims, a "server" includes a
physical data processing system (for example, system 1112 as shown
in FIG. 11) running a server program. It will be understood that
such a physical server may or may not include a display and
keyboard.
[0090] As noted, aspects of the present invention may take the form
of a computer program product embodied in one or more computer
readable medium(s) having computer readable program code embodied
thereon. Any combination of one or more computer readable medium(s)
may be utilized. The computer readable medium may be a computer
readable signal medium or a computer readable storage medium. A
computer readable storage medium may be, for example, but not
limited to, an electronic, magnetic, optical, electromagnetic,
infrared, or semiconductor system, apparatus, or device, or any
suitable combination of the foregoing. Media block 1118 is a
non-limiting example. More specific examples (a non-exhaustive
list) of the computer readable storage medium would include the
following: an electrical connection having one or more wires, a
portable computer diskette, a hard disk, a random access memory
(RAM), a read-only memory (ROM), an erasable programmable read-only
memory (EPROM or Flash memory), an optical fiber, a portable
compact disc read-only memory (CD-ROM), an optical storage device,
a magnetic storage device, or any suitable combination of the
foregoing. In the context of this document, a computer readable
storage medium may be any tangible medium that can contain, or
store a program for use by or in connection with an instruction
execution system, apparatus, or device.
[0091] A computer readable signal medium may include a propagated
data signal with computer readable program code embodied therein,
for example, in baseband or as part of a carrier wave. Such a
propagated signal may take any of a variety of forms, including,
but not limited to, electro-magnetic, optical, or any suitable
combination thereof. A computer readable signal medium may be any
computer readable medium that is not a computer readable storage
medium and that can communicate, propagate, or transport a program
for use by or in connection with an instruction execution system,
apparatus, or device.
[0092] Program code embodied on a computer readable medium may be
transmitted using any appropriate medium, including but not limited
to wireless, wireline, optical fiber cable, radio frequency (RF),
etc., or any suitable combination of the foregoing.
[0093] Computer program code for carrying out operations for
aspects of the present invention may be written in any combination
of one or more programming languages, including an object oriented
programming language such as Java, Smalltalk, C++ or the like and
conventional procedural programming languages, such as the "C"
programming language or similar programming languages. The program
code may execute entirely on the user's computer, partly on the
user's computer, as a stand-alone software package, partly on the
user's computer and partly on a remote computer or entirely on the
remote computer or server. In the latter scenario, the remote
computer may be connected to the user's computer through any type
of network, including a local area network (LAN) or a wide area
network (WAN), or the connection may be made to an external
computer (for example, through the Internet using an Internet
Service Provider).
[0094] Aspects of the present invention are described herein with
reference to flowchart illustrations and/or block diagrams of
methods, apparatus (systems) and computer program products
according to embodiments of the invention. It will be understood
that each block of the flowchart illustrations and/or block
diagrams, and combinations of blocks in the flowchart illustrations
and/or block diagrams, can be implemented by computer program
instructions. These computer program instructions may be provided
to a processor of a general purpose computer, special purpose
computer, or other programmable data processing apparatus to
produce a machine, such that the instructions, which execute via
the processor of the computer or other programmable data processing
apparatus, create means for implementing the functions/acts
specified in the flowchart and/or block diagram block or
blocks.
[0095] These computer program instructions may also be stored in a
computer readable medium that can direct a computer, other
programmable data processing apparatus, or other devices to
function in a particular manner, such that the instructions stored
in the computer readable medium produce an article of manufacture
including instructions which implement the function/act specified
in the flowchart and/or block diagram block or blocks.
[0096] The computer program instructions may also be loaded onto a
computer, other programmable data processing apparatus, or other
devices to cause a series of operational steps to be performed on
the computer, other programmable apparatus or other devices to
produce a computer implemented process such that the instructions
which execute on the computer or other programmable apparatus
provide processes for implementing the functions/acts specified in
the flowchart and/or block diagram block or blocks.
[0097] The flowchart and block diagrams in the figures illustrate
the architecture, functionality, and operation of possible
implementations of systems, methods and computer program products
according to various embodiments of the present invention. In this
regard, each block in the flowchart or block diagrams may represent
a module, component, segment, or portion of code, which comprises
one or more executable instructions for implementing the specified
logical function(s). It should also be noted that, in some
alternative implementations, the functions noted in the block may
occur out of the order noted in the figures. For example, two
blocks shown in succession may, in fact, be executed substantially
concurrently, or the blocks may sometimes be executed in the
reverse order, depending upon the functionality involved. It will
also be noted that each block of the block diagrams and/or
flowchart illustration, and combinations of blocks in the block
diagrams and/or flowchart illustration, can be implemented by
special purpose hardware-based systems that perform the specified
functions or acts, or combinations of special purpose hardware and
computer instructions.
[0098] It should be noted that any of the methods described herein
can include an additional step of providing a system comprising
distinct software modules embodied on a computer readable storage
medium; the modules can include, for example, any or all of the
components shown in FIG. 9. The method steps can then be carried
out using the distinct software modules and/or sub-modules of the
system, as described above, executing on one or more hardware
processors 1102. Further, a computer program product can include a
computer-readable storage medium with code adapted to be
implemented to carry out one or more method steps described herein,
including the provision of the system with the distinct software
modules.
[0099] In any case, it should be understood that the components
illustrated herein may be implemented in various forms of hardware,
software, or combinations thereof; for example, application
specific integrated circuit(s) (ASICs), functional circuitry, one
or more appropriately programmed general purpose digital computers
with associated memory, and the like. Given the teachings of the
invention provided herein, one of ordinary skill in the related art
will be able to contemplate other implementations of the components
of the invention.
[0100] The terminology used herein is for the purpose of describing
particular embodiments only and is not intended to be limiting of
the invention. As used herein, the singular forms "a," "an" and
"the" are intended to include the plural forms as well, unless the
context clearly indicates otherwise. It will be further understood
that the terms "comprises" and/or "comprising," when used in this
specification, specify the presence of stated features, integers,
steps, operations, elements, and/or components, but do not preclude
the presence or addition of one or more other features, integers,
steps, operations, elements, components, and/or groups thereof.
[0101] The corresponding structures, materials, acts, and
equivalents of all means or step plus function elements in the
claims below are intended to include any structure, material, or
act for performing the function in combination with other claimed
elements as specifically claimed. The description of the present
invention has been presented for purposes of illustration and
description, but is not intended to be exhaustive or limited to the
invention in the form disclosed. Many modifications and variations
will be apparent to those of ordinary skill in the art without
departing from the scope and spirit of the invention. The
embodiment was chosen and described in order to best explain the
principles of the invention and the practical application, and to
enable others of ordinary skill in the art to understand the
invention for various embodiments with various modifications as are
suited to the particular use contemplated.
[0102] At least one embodiment of the invention may provide one or
more beneficial effects, such as, for example, automatic dynamic
threshold determination based on the temporal and/or spatial
domain.
[0103] It will be appreciated and should be understood that the
exemplary embodiments of the invention described above can be
implemented in a number of different fashions. Given the teachings
of the invention provided herein, one of ordinary skill in the
related art will be able to contemplate other implementations of
the invention. Indeed, although illustrative embodiments of the
present invention have been described herein with reference to the
accompanying drawings, it is to be understood that the invention is
not limited to those precise embodiments, and that various other
changes and modifications may be made by one skilled in the
art.
* * * * *