U.S. patent application number 11/052598 was filed with the patent office on 2005-02-07 and published on 2006-08-10 for a method of extracting and searching integral histograms of data samples.
Invention is credited to Fatih M. Porikli.
Publication Number: 20060177131
Application Number: 11/052598
Family ID: 36209734
Publication Date: 2006-08-10
United States Patent Application 20060177131
Kind Code: A1
Porikli; Fatih M.
August 10, 2006
Method of extracting and searching integral histograms of data samples
Abstract
A computer implemented method extracts an integral histogram
from sampled data, such as time series data, images, and volumetric
data. First, a set of samples is acquired from a real-world signal.
The set of samples is scanned in a predetermined order. For each
current sample, an integral histogram integrating a histogram of
the current sample and integral histograms of previously scanned
samples is constructed.
Inventors: Porikli; Fatih M. (Watertown, MA)
Correspondence Address: Patent Department; Mitsubishi Electric Research Laboratories, Inc., 201 Broadway, Cambridge, MA 02139, US
Filed: February 7, 2005
Current U.S. Class: 382/168
Current CPC Class: G06T 7/20 20130101; G06K 9/4642 20130101; G06T 2207/10016 20130101; G06K 9/6212 20130101; G06F 17/18 20130101; G06T 2207/30201 20130101
Class at Publication: 382/168
International Class: G06K 9/00 20060101 (G06K009/00)
Claims
1. A computer implemented method for extracting an integral
histogram from sampled data, comprising: acquiring a set of samples
from a real-world signal; scanning the set of samples in a
predetermined order; and constructing, for each current sample, an
integral histogram, the integral histogram integrating a histogram
of the current sample and the integral histograms of previously
scanned samples.
2. The method of claim 1, in which the scanning is in a left to
right and then a top to bottom order.
3. The method of claim 1, in which the set of samples is a
d-dimensional array, and in which a range of values for each
dimension is $N_d$ with associated k-dimensional tensors.
4. The method of claim 1, in which the integral histogram includes
a plurality of bins, and a size of each bin is an integer
number.
5. The method of claim 4, in which the size is a power of two.
6. The method of claim 1, in which the set of samples is a
one-dimensional time series.
7. The method of claim 1, in which the set of samples is a
two-dimensional gray-level image.
8. The method of claim 1, in which the set of samples is a color
image.
9. The method of claim 1, in which the set of samples is volumetric
data.
10. The method of claim 1, in which the set of samples is a video,
and further comprising: constructing a similarity map from the
integral histogram.
11. The method of claim 10, in which the similarity map is used to
detect an object in the image.
12. The method of claim 7, in which the similarity map is used to
detect textures in the image.
13. The method of claim 1, further comprising: combining the
integral histogram spatially.
14. The method of claim 1, further comprising: combining the
integral histogram hierarchically.
15. The method of claim 1, further comprising: combining the
integral histogram according to a model.
16. The method of claim 1, in which the set of samples is an image,
and further comprising: specifying target regions in the image
according to corner points in a Cartesian space; determining the
integral histogram for the target regions; and normalizing the
integral histogram with respect to a size of the target regions in
the image to obtain a normalized histogram.
17. The method of claim 16, further comprising: adding bin values
of the integral histogram of a lower-right corner point that
correspond to a largest target region in the image to bin values of
the integral histogram of an upper-left corner point in the image
that correspond to a smallest target region, and subtracting the
bin values of the upper-right and lower-left corner point integral
histograms.
18. The method of claim 17, further comprising: determining a
distance between the normalized histogram and the integral
histograms of the target regions.
19. The method of claim 1, in which the set of samples is an image,
and further comprising: constructing higher-level features by
combining the integral histograms of intensity, color, texture,
gradient, motion, orientation, template matching, and image filter
responses of the image.
Description
FIELD OF THE INVENTION
[0001] The invention relates generally to representing and
searching data samples of real-world signals, and more particularly
to representing and searching with histograms extracted from the
data samples to detect objects.
BACKGROUND OF THE INVENTION
[0002] As shown in FIG. 1, a histogram 100 is an array of `bins`
101. Each bin corresponds to a range 102 of values of a sampled
data set. The bin `counts` the frequency 103 of occurrences of
sample values in a particular range. In other words, the histogram
represents a frequency distribution of the samples in the data
set.
[0003] For example, a histogram of a sampled color image `counts`, in each bin, the number of pixels whose color values fall within the range of that bin. Thus, the histogram is a mapping from the sampled data set to the set of non-negative real numbers $\mathbb{R}^+$.
[0004] From a probabilistic point of view, a normalization of the
histogram results in a discrete function that resembles a
probability density function of the data set. Histograms can be
used to determine statistical properties of the data set, such as
distribution, spread, and outliers.
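The following minimal Python sketch shows the histogram described above. It assumes non-negative scalar samples and B equal-width bins covering [0, max_value). The function names `histogram` and `normalize` are illustrative and do not come from the source. Normalizing the counts so that they sum to one gives the discrete distribution mentioned in [0004].

```python
def histogram(samples, num_bins, max_value):
    """Count non-negative samples into num_bins equal-width bins over [0, max_value)."""
    counts = [0] * num_bins
    for v in samples:
        # Bin index of v; a value equal to max_value is clamped into the last bin.
        b = min(int(v * num_bins // max_value), num_bins - 1)
        counts[b] += 1
    return counts


def normalize(counts):
    """Scale bin counts so that they sum to one."""
    total = sum(counts)
    return [c / total for c in counts]


# Eight 8-bit gray values in four bins of width 64.
h = histogram([0, 10, 70, 130, 200, 255, 255, 64], 4, 256)
print(h)             # [2, 2, 1, 3]
print(normalize(h))  # [0.25, 0.25, 0.125, 0.375]
```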
[0005] Histograms are used in many computer vision applications,
such as object based indexing and retrieval, C. Carson, M. Thomas,
S. Belongie, J. M. Hellerstein, and J. Malik, "Blobworld: A system
for region-based image indexing and retrieval", Proceedings of
ICVS, 1999 and J. Huang, S. Kumar, M. Mitra, W. J. Zhu, and R.
Zabih, "Image indexing using color correlograms", Proceedings of
CVPR, 1997; image segmentation, D. A. Forsyth and J. Ponce,
"Computer Vision: A Modern Approach", Prentice Hall, 2002 and S.
Ruiz-Correa, L. G. Shapiro, and M. Meila, "A new paradigm for
recognizing 3-D object shapes from range data", Proceedings of
CVPR, 2003; object detection, C. Papageorgiou, M. Oren, and T.
Poggio, "A general framework for object detection," Proceedings of
ICCV, 1998; and object tracking, D. Comaniciu, V. Ramesh, and P.
Meer, "Real-time tracking of nonrigid objects using mean shift,"
Proceedings of CVPR, 2000.
[0006] A face detector is described by P. Viola and M. Jones, "Robust real-time face detection", Proceedings of ICCV, page II: 747, 2001. As described by Viola et al., the sum of the intensity values within rectangular windows scanned over an image can be determined without repeating the summation for each possible window. A constant number of operations then suffices for each rectangular sum, however many distinct rectangles are evaluated. This relies on a cumulative or integral intensity image, where each pixel holds the sum of all values to the left of and above the pixel, including the value of the pixel itself. The integral intensity image can be determined for the entire image with a constant number of arithmetic operations per pixel. The scan starts at the top-left corner pixel of the image and proceeds first to the right and then down. The value of the integral image at the current pixel is the intensity of the current pixel plus the integral image values directly above and directly to the left of it, minus the integral image value at the upper-left diagonal neighbor, which would otherwise be counted twice. The sum of an image function in any rectangle can then be determined from the four integral image values at the corners of the rectangle, with appropriate modifications at the border. Thus, the integral image is constructed with a number of operations linear in the number of pixels, after which the sum over any rectangle takes constant time.
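The integral image recursion and the four-corner rectangle sum described above can be sketched in Python as follows. For convenience, and as an assumption not stated in the source, the array is padded with a leading row and column of zeros. With that padding, `ii[r][c]` holds the sum over all pixels strictly above row `r` and strictly left of column `c`, which removes the special cases at the border.

```python
def integral_image(img):
    """ii[r][c] = sum of img[y][x] for all y < r and x < c (zero-padded)."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(1, rows + 1):          # top to bottom
        for c in range(1, cols + 1):      # left to right
            ii[r][c] = (img[r - 1][c - 1]
                        + ii[r - 1][c]        # above
                        + ii[r][c - 1]        # left
                        - ii[r - 1][c - 1])   # upper left, otherwise counted twice
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of img[y][x] for top <= y < bottom and left <= x < right."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 3, 3))  # 28, i.e. 5 + 6 + 8 + 9
```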
[0007] Unfortunately, it is time consuming to extract and search
conventional histograms. Only an exhaustive search can provide a
global optimum. Sub-optimal searches, such as gradient descent, and application-specific constraints can accelerate the search, but they may miss the global optimum. Computer vision applications that rely on optimal solutions, such as object detection and tracking, therefore demand a theoretical breakthrough in histogram extraction.
[0008] Conventionally, an exhaustive search is required to measure
all distances between a particular histogram and histograms of all
possible target regions. This process requires generation of
histograms for the regions centered at every possible point, e.g.,
pixels. When the search is performed at different scales, i.e., for different target region sizes, the process is repeated once for each scale.
[0009] FIG. 2 shows the pseudocode 200 of a conventional histogram
search.
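FIG. 2 is not reproduced here. The following Python sketch roughly illustrates the conventional exhaustive search described in [0008]. For a single window size, it rebuilds a histogram from scratch at every window position and keeps the position closest to a target histogram. The L1 distance, the top-left window anchoring, integer pixel values, and the function names are all assumptions made for illustration. The cost grows with the number of window positions times the window area, and the whole loop would be repeated for every scale.

```python
def window_histogram(img, top, left, h, w, num_bins, max_value):
    """Histogram of the h x w window at (top, left), built from scratch."""
    counts = [0] * num_bins
    for y in range(top, top + h):
        for x in range(left, left + w):
            counts[min(img[y][x] * num_bins // max_value, num_bins - 1)] += 1
    return counts


def exhaustive_search(img, target_hist, h, w, num_bins, max_value):
    """Return (distance, top, left) of the window whose histogram has the
    smallest L1 distance to target_hist, checking every window position."""
    best = None
    for top in range(len(img) - h + 1):
        for left in range(len(img[0]) - w + 1):
            cand = window_histogram(img, top, left, h, w, num_bins, max_value)
            dist = sum(abs(a - b) for a, b in zip(cand, target_hist))
            if best is None or dist < best[0]:
                best = (dist, top, left)
    return best


img = [[0, 0, 0, 0],
       [0, 200, 200, 0],
       [0, 200, 200, 0],
       [0, 0, 0, 0]]
print(exhaustive_search(img, [0, 0, 0, 4], 2, 2, 4, 256))  # (0, 1, 1)
```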
[0010] Up to now, this conventional approach is the only known
solution that guarantees finding a global optimum for a
histogram-based search.
[0011] It is desired to improve the speed of histogram extraction
and searching histograms by several orders of magnitude.
SUMMARY OF THE INVENTION
[0012] The invention provides a method for extracting integral histograms from possible target regions in a Cartesian sampled data space, and for searching the extracted histograms.
[0013] The invention exploits the spatial arrangement of data
points, and recursively propagates an aggregated histogram by
starting from an origin and scanning through the remaining points
along a predetermined scan-line. The histogram of a rectangular region is determined by intersecting, i.e., adding and subtracting, the integral histograms at its four corner points. At each propagation step, the integral histogram of the current point is obtained from the integral histograms at the previously processed neighboring points, and only the single bin that contains the value of the current point is incremented. After the integral histogram is propagated, the histogram of any target region can be constructed using just a small number of simple arithmetic operations.
[0014] The method according to the invention has three distinct
advantages. The method is extremely fast when compared to
conventional approaches. The method can employ an exhaustive search
process in real-time, which has been impractical up to now for most
complex vision applications. The method can be extended to higher
data dimensions, uniform and non-uniform bin formations, and
multiple target scales without sacrificing its advantages. The
method also enables a description of higher-level histogram
features that enable integration of spatial information within the
histogram.
[0015] Numerical analysis with different numbers of bins, data dimensions, and data structures shows that the integral histogram method according to the invention drastically decreases the number of required operations.
[0016] The method can be used to detect objects in a video in
real-time.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] FIG. 1 is a prior art histogram;
[0018] FIG. 2 is a pseudocode of a prior art histogram extraction
method;
[0019] FIG. 3 is a pseudocode for a histogram extraction method
according to the invention;
[0020] FIG. 4 is a block diagram of a scan line through pixels in
an image;
[0021] FIG. 5 is a block diagram of a recursive integral histogram
construction;
[0022] FIG. 6 is a block diagram of a recursive integral histogram
construction;
[0023] FIG. 7 is a diagram of mapping a target traffic sign in an input image to a similarity map;
[0024] FIG. 8 is a diagram of mapping textures in an input image to
similarity maps;
[0025] FIG. 9 compares object tracking in videos using conventional
mean-shift object tracking and tracking with the integral histogram
method according to the invention;
[0026] FIG. 10 is a block diagram of spatial combinations of
integral histograms according to the invention;
[0027] FIG. 11 is a block diagram of hierarchical combinations of
integral histograms according to the invention; and
[0028] FIG. 12 is a block diagram of model-based combinations of
integral histograms according to the invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0029] Integral Histogram
[0030] An integral histogram according to the invention is
extracted from sampled data by a recursive propagation method. The
method works in Cartesian spaces. The method can be extended into
any dimensional data space and any tensor representations.
[0031] A function $f$, $x \mapsto f(x)$, is defined in a d-dimensional real valued Cartesian space $\mathbb{R}^d$, where $x = [x_1, \ldots, x_d]$ are sample points in the space. The function $f$ maps to a k-dimensional tensor, i.e., $f(x) = [g_1, \ldots, g_k]$. The d-dimensional data space is bounded within a range $N_1, \ldots, N_d$, i.e., $0 \le x_i \le N_i$.
[0032] An integral histogram $H(x, b)$ is defined along a scanline of points $x_0, x_1, \ldots$ such that

$$H(x, b) = \bigcup_{p=0}^{x} Q(f(p)), \qquad (1)$$

where $Q(\cdot)$ gives the corresponding bin of a current point, and $\cup$ is the union operator, defined as follows.
[0033] The value of the bin $b$ of the histogram $H(x, b)$ is equal to the sum of the bin values of the points scanned so far, i.e., the number of points $p$ scanned up to and including $x$ for which $Q(f(p)) = b$. In other words, $H(x, b)$ is the histogram of the Cartesian region `between` the origin and the current point, i.e., of all points $p$ with $0 \le p_1 \le x_1$, $0 \le p_2 \le x_2$, and so on.
[0034] Note that $H(N, b)$ is equal to the histogram of all data points in the space, because $N = [N_1, \ldots, N_d]$ is the boundary of the space.
[0035] Therefore, the integral histogram can be obtained recursively as

$$H(x) = H(x-1) \cup Q(f(x)), \qquad (2)$$

using the initial condition $H(0) = 0$, i.e., all of the bins of the histogram are empty initially.
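For 1-D data, the recursion in Eq. (2) can be sketched in Python as below. The sketch assumes non-negative samples and equal-width bins. As an indexing convenience, and not the source's convention, `H[x]` here is the histogram of the first `x` samples. Under that convention, `H[0]` is the all-zero initial condition, and each step copies the previous histogram and increments a single bin. The histogram of any contiguous range then follows from two integral histograms.

```python
def integral_histogram_1d(samples, num_bins, max_value):
    """H[x] = histogram of samples[0:x]; H[0] is the empty histogram."""
    H = [[0] * num_bins]
    for v in samples:
        nxt = list(H[-1])                                      # previous integral histogram
        nxt[min(int(v * num_bins // max_value), num_bins - 1)] += 1  # bin of current sample
        H.append(nxt)
    return H


def range_histogram(H, start, stop):
    """Histogram of samples[start:stop] as a bin-wise difference."""
    return [a - b for a, b in zip(H[stop], H[start])]


H = integral_histogram_1d([10, 100, 150, 250, 20], 4, 256)
print(H[5])                      # [2, 1, 1, 1]
print(range_histogram(H, 1, 4))  # [0, 1, 1, 1]
```

Storing one full histogram per point costs memory proportional to the number of points times the number of bins, which is the price paid for constant-time range queries.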
[0036] Then, the histogram of a target region $T = [p^-, p^+]$, where $p^- < p^+$, is determined from the propagated integral histogram values at the bounding points of the region as

$$h(T, b) = H(p^+, b) - \sum_{i \ne j}^{d} H([p_i^-, p_j^+], b) + (d-1)\, H(p^-, b), \qquad (3)$$

which becomes $h(T, b) = H(p_1^+, p_2^+, b) - H(p_1^+, p_2^-, b) - H(p_1^-, p_2^+, b) + H(p_1^-, p_2^-, b)$ for a 2-D data set. Note that the region is bounded by $p_1^- \le x_1 \le p_1^+, \ldots, p_d^- \le x_d \le p_d^+$.
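The 2-D case of Eq. (3) can be sketched in Python as follows, assuming integer pixel values and equal-width bins. As an indexing assumption, the integral histogram is padded with a leading zero row and column, so `H[r][c]` counts the pixels in rows `0..r-1` and columns `0..c-1`. With this convention, the four-corner combination gives the histogram of the half-open region with rows `top..bottom-1` and columns `left..right-1`. This particular builder keeps a running histogram along each row and adds the integral histogram of the row above.

```python
def integral_histogram_2d(img, num_bins, max_value):
    """Zero-padded integral histogram: H[r][c][b] counts pixels in rows < r,
    columns < c whose value falls in bin b."""
    rows, cols = len(img), len(img[0])
    H = [[[0] * num_bins for _ in range(cols + 1)] for _ in range(rows + 1)]
    for r in range(1, rows + 1):
        row_hist = [0] * num_bins            # histogram of img[r-1][0:c]
        for c in range(1, cols + 1):
            v = img[r - 1][c - 1]
            row_hist[min(v * num_bins // max_value, num_bins - 1)] += 1
            # Integral histogram of the row above plus the running row histogram.
            H[r][c] = [a + b for a, b in zip(H[r - 1][c], row_hist)]
    return H


def region_histogram(H, top, left, bottom, right):
    """2-D form of Eq. (3) for rows top..bottom-1 and columns left..right-1."""
    return [H[bottom][right][b] - H[top][right][b]
            - H[bottom][left][b] + H[top][left][b]
            for b in range(len(H[0][0]))]


img = [[0, 64, 128],
       [192, 0, 64],
       [128, 192, 255]]
H = integral_histogram_2d(img, 4, 256)
print(region_histogram(H, 1, 1, 3, 3))  # [1, 1, 0, 2]
print(region_histogram(H, 0, 0, 3, 3))  # [2, 2, 2, 3]
```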
[0037] As opposed to conventional histogram construction, the
integral histogram method according to the invention does not
repeat the histogram extraction for each possible region.
[0038] FIG. 3 shows the pseudocode 300 of a method for extracting
an integral histogram from sampled data according to the
invention.
[0039] For each possible point, and for each target point, the method gets the current value, finds its bin, and increments the bin value. Then, for each possible scale, each possible point, and each bin, the method computes the intersection with the previous bins, normalizes the result, and computes the distances between histograms.
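The following compact Python sketch follows the search loop in [0039]. It assumes integer gray values, equal-width bins, windows anchored at their top-left corner, normalization by window area, and the L1 distance, since the source does not fix a particular distance. The integral histogram is built once. Every window at every scale is then evaluated from four corner histograms, so the cost per window no longer depends on the window area.

```python
def build_integral_histogram(img, num_bins, max_value):
    """H[r][c][b] = number of pixels in rows < r and columns < c that fall in bin b."""
    rows, cols = len(img), len(img[0])
    H = [[[0] * num_bins for _ in range(cols + 1)] for _ in range(rows + 1)]
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            cur = [H[r][c - 1][b] + H[r - 1][c][b] - H[r - 1][c - 1][b]
                   for b in range(num_bins)]
            cur[min(img[r - 1][c - 1] * num_bins // max_value, num_bins - 1)] += 1
            H[r][c] = cur
    return H


def search(img, target_hist, sizes, num_bins, max_value):
    """Return (distance, top, left, h, w) of the best window over all sizes.

    target_hist is a normalized histogram; sizes is a list of (h, w) windows.
    """
    H = build_integral_histogram(img, num_bins, max_value)
    best = None
    for h, w in sizes:                                   # each scale
        area = h * w
        for top in range(len(img) - h + 1):              # each position
            for left in range(len(img[0]) - w + 1):
                bottom, right = top + h, left + w
                dist = 0.0
                for b in range(num_bins):                # each bin
                    count = (H[bottom][right][b] - H[top][right][b]
                             - H[bottom][left][b] + H[top][left][b])
                    dist += abs(count / area - target_hist[b])  # normalize, compare
                if best is None or dist < best[0]:
                    best = (dist, top, left, h, w)
    return best


img = [[0] * 5 for _ in range(5)]
for y in (1, 2):
    for x in (2, 3):
        img[y][x] = 200
print(search(img, [0.0, 0.0, 0.0, 1.0], [(2, 2), (3, 3)], 4, 256))
# (0.0, 1, 2, 2, 2)
```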
[0040] For 2-D data, e.g., an $N_1 \times N_2$ gray level image, the parameters are $d = 2$, $k = 1$, and the data space is limited to the vertical and horizontal image sizes $N_1$, $N_2$.
[0041] As shown in FIG. 4, the scanline 401 for the image 402 can
be assigned to pixels 403 in a left to right, and top to bottom
order.
[0042] As shown in FIGS. 5 and 6, the recursion can be expressed as

$$H(x_1, x_2, b) = H(x_1 - 1, x_2, b) + H(x_1, x_2 - 1, b) - H(x_1 - 1, x_2 - 1, b) + Q(f(x_1, x_2)) \qquad (4)$$

for all $b = 1, \ldots, B$.
[0043] This propagation assigns the histogram bins of the current point by intersecting (adding and subtracting) the bins of the three previously computed histograms 501-503, to the left, top, and upper left, respectively, and then incrementing the value of the bin that contains the current data point (pixel) I(x, y) 504.
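Eq. (4) can be checked directly with a small Python sketch. The sketch assumes integer pixel values and equal-width bins. As an indexing assumption, a zero-padded first row and column stand in for the empty histograms outside the image. The brute-force counter is included only to verify that the recursion reproduces the definition of the integral histogram at every point.

```python
def integral_histogram_eq4(img, num_bins, max_value):
    """Scan left to right, top to bottom, applying Eq. (4) at each pixel."""
    rows, cols = len(img), len(img[0])
    H = [[[0] * num_bins for _ in range(cols + 1)] for _ in range(rows + 1)]
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            left, top, top_left = H[r][c - 1], H[r - 1][c], H[r - 1][c - 1]
            # Combine the three previously computed neighbors ...
            cur = [left[b] + top[b] - top_left[b] for b in range(num_bins)]
            # ... then increment only the bin of the current pixel, Q(f(x1, x2)).
            cur[min(img[r - 1][c - 1] * num_bins // max_value, num_bins - 1)] += 1
            H[r][c] = cur
    return H


def brute_force(img, r, c, num_bins, max_value):
    """Histogram of all pixels in rows < r and columns < c, counted directly."""
    counts = [0] * num_bins
    for y in range(r):
        for x in range(c):
            counts[min(img[y][x] * num_bins // max_value, num_bins - 1)] += 1
    return counts


img = [[0, 64, 128], [192, 0, 64], [128, 192, 255]]
H = integral_histogram_eq4(img, 4, 256)
print(all(H[r][c] == brute_force(img, r, c, 4, 256)
          for r in range(4) for c in range(4)))  # True
```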
[0044] The following sections analyze the computational cost of extracting the integral histogram compared with conventional histograms. The analysis is included because it shows how substantially the integral histogram method outperforms conventional histogram extraction methods.
[0045] Integer Data
[0046] In this case, the sampled input data is a d-dimensional array, wherein the range of values for each dimension is $N_d$ with associated k-dimensional tensors. The histograms are k-dimensional with $B$ identical bins for each dimension. The bin size is an integer number. Furthermore, a target window for the histogram corresponds to a size of a target object, $M_1 \times \cdots \times M_d$.
[0047] A conventional histogram matching algorithm requires $7d - 3 + k$ operations to determine the current values in the d-dimensional input tensor, $75k$ operations to determine the corresponding bin indices, and one operation to increase the bin value. Bin indices can also be determined by a floating-point multiplication followed by a float-to-integer conversion, but the cost of this option ($109k$) is higher than that of the division itself ($75k$). After all the $M_1 \times \cdots \times M_d$ points in the target window have been processed, the histogram bins are normalized by the number of points, which requires $B^k$ floating point multiplications, i.e., $4B^k$ operations in terms of the relative cost. These operations are repeated for each of the $N_1 \times \cdots \times N_d$ histogram matches, for a total of

$$\Big[(7d + 76k - 2) \prod_{j=1}^{d} M_j + 4B^k\Big] \prod_{j=1}^{d} N_j \qquad (5)$$

operations.
[0048] Note that, for different window size combinations $M_s = 1, \ldots, S_s$, where $S_s$ represents the maximum size of the range for dimension $s$, the above process is repeated, so that the total number of operations for the conventional method is

$$\Big[(7d + 76k - 2) \prod_{j=1}^{d} M_j + 4B^k\Big] \prod_{i=1}^{d} N_i \prod_{s=1}^{d} S_s. \qquad (6)$$
[0049] The number of operations required for propagation with the integral histogram is $3(7k - 3) + 2k = 23k - 9$. This is in addition to the cost of getting the current tensor values ($7d - 3 + k$), finding the indices of the corresponding bins ($75k$), and accumulating the obtained bin value, which is repeated for all points in the data space.
[0050] Then, the number of operations required to extract the histograms is

$$(7d + 99k - 11) \prod_{i=1}^{d} N_i.$$
[0051] The histogram intersection uses $4(7k - 3) + 3k = 31k - 12$ operations. Normalizing the result uses $B^k$ floating point divisions, i.e., $4B^k$ operations, for each histogram. Then, the cost of all $N_1 \times \cdots \times N_d$ histograms and all possible search window dimension matches is only

$$\Big[7d + 99k - 11 + (31k - 12 + 4B^k) \prod_{s=1}^{d} S_s\Big] \prod_{i=1}^{d} N_i. \qquad (7)$$
[0052] The ratio of the computational load of the conventional approach to that of the integral histogram method is

$$r = \frac{\Big[(7d + 76k - 2) \prod_{j=1}^{d} M_j + 4B^k\Big] \prod_{s=1}^{d} S_s}{7d + 99k - 11 + (31k - 12 + 4B^k) \prod_{s=1}^{d} S_s}. \qquad (8)$$
[0053] Floating Point Data
[0054] Floating point data increases the number of operations for the division at each point from $75k$ to $100k$. The bin value increment cost becomes four instead of one. The total cost for the conventional approach becomes

$$\Big[(7d + 101k + 1) \prod_{j=1}^{d} M_j + 4B^k\Big] \prod_{i=1}^{d} N_i \prod_{s=1}^{d} S_s. \qquad (9)$$
[0055] For the integral histogram method, the cost of finding bin indices increases to $100k$. In the propagation stage, the cost of additions increases from $2k$ to $8k$. Including the intersection computation, the total cost becomes

$$\Big[7d + 130k - 11 + (40k - 12 + 4B^k) \prod_{s=1}^{d} S_s\Big] \prod_{i=1}^{d} N_i. \qquad (10)$$
[0056] Power-of-Two Bin Sizes
[0057] Note that optimization is possible by using a bin size that is a power of two. With a bitwise shift operator, the division can be performed at a fraction of the cost. For instance, instead of dividing by 64, the number can be shifted six bits to the right. The computation of the bin indices then drops from $75k$ to $2k$ operations on average, depending on the number of bit shifts. The total number of operations for integer data using the conventional approach becomes

$$\Big[(7d + 3k - 2) \prod_{j=1}^{d} M_j + 4B^k\Big] \prod_{i=1}^{d} N_i \prod_{s=1}^{d} S_s. \qquad (11)$$
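The following small Python check illustrates the power-of-two optimization, assuming non-negative integer sample values. When the bin size is $2^s$, the bin index `v // 2**s` equals the right shift `v >> s`. The function names are illustrative. The cost advantage described above concerns machine-level arithmetic in compiled code, so this sketch only demonstrates that the two computations agree.

```python
def bin_index_div(v, bin_size):
    """Bin index by integer division (any bin size)."""
    return v // bin_size


def bin_index_shift(v, shift):
    """Bin index by right shift; valid when bin_size == 1 << shift."""
    return v >> shift


# 8-bit gray values in four bins of size 64 = 2**6.
print(bin_index_shift(200, 6))  # 3
print(all(bin_index_div(v, 64) == bin_index_shift(v, 6) for v in range(256)))  # True
```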
[0058] For the integral histogram with bin sizes that are powers of two, the total cost drops to

$$\Big[31k + 7d + 1 + (43k + 1 + 100B^k) \prod_{s=1}^{d} S_s\Big] \prod_{i=1}^{d} N_i. \qquad (12)$$
[0059] Matching Without Normalization
[0060] For some applications, the target object is searched in its original size without scaling, or with scaling factors that correspond to down-sampling by powers of two, i.e., half size, quarter size, etc. In such cases, further computational reduction is possible. No histogram normalization is needed for same-size matches, and for the smaller sizes the normalization can be done with a bitwise right shift instead of a division. For a scaling factor of $2^{-s}$, where $s = 0$ stands for no scaling and $s \ge 1$ for downsizing, the necessary computations of the conventional approach with integer data become

$$\Big[(7d + 35k + 4) \prod_{j=1}^{d} M_j + 5\,(1 - \delta(s))\, B^k\Big] \prod_{i=1}^{d} N_i, \qquad (13)$$

where $\delta(s)$ equals 1 for $s = 0$ and 0 otherwise.
[0061] The cost for the integral histogram becomes

$$\Big[7d + 26k - 11 + (31k - 12 + 4B^k) \prod_{s=1}^{d} S_s\Big] \prod_{i=1}^{d} N_i. \qquad (14)$$
[0062] Note that, in addition to the above costs, the conventional approach has another important disadvantage: after each computation, the histogram array values must be reinitialized, which creates additional overhead.
[0063] Applications
[0064] Time Series Data
[0065] For 1-D sampled data, such as a time series of an audio signal with a length $M$, a histogram having a total bin number $B$, and a target size range up to $S$ data points, the parameters of the above analysis become $d = 1$ and $k = 1$. The ratio becomes

$$r_1 = \frac{(81M + 4B)\, S}{95 + (19 + 4B)\, S}. \qquad (15)$$
[0066] Surprisingly, the integral histogram speeds up the processing of time series sampled data by up to $3.5 \times 10^4$ times over the conventional method. For instance, a common task of searching time series data that contains $10^4$ points with a 32-bin histogram is 3,347 times faster than with the conventional method.
[0067] Gray Level Images
[0068] For an $M_1 \times M_2$ gray level image and a search window size range $S_1, S_2$, the parameters of the above analysis become $d = 2$ and $k = 1$, and the ratio is

$$r_2 = \frac{(88 M_1 M_2 + 4B)\, S_1 S_2}{102 + (50 + 4B)\, S_1 S_2}. \qquad (16)$$
[0069] Two-dimensional data is very common in vision applications that use gray-level surveillance videos and monochrome aerial imagery. For example, consider finding a $64 \times 64$ target pattern at three different hierarchical resolutions, e.g., $64 \times 64$, $32 \times 32$, and $16 \times 16$, using a 16-bin histogram. The method according to the invention finds the target pattern 2,435 times faster. With the other optimizations described above, the entire process can be sped up by a factor of $6 \times 10^4$ compared to the conventional method.
[0070] Color Images
[0071] For a color image with a 3-D histogram, where each point has three color values in a tensor form, the parameters become $d = 2$ and $k = 3$. If the search uses a template window size of $S_1, S_2$ in the image dimensions, the ratio is

$$r_3 = \frac{(240 M_1 M_2 + 4B^3)\, S_1 S_2}{300 + (81 + 4B^3)\, S_1 S_2}. \qquad (17)$$
[0072] Even for a regular model matching task that searches for a $100 \times 100$ object model at twenty scales, using a histogram for each color channel coded in four bits, i.e., sixteen bins, the process is accelerated 146 times. The savings can go up to $7 \times 10^5$ depending on the number of bins and the target size.
[0073] Volumetric Data
[0074] For volumetric data, the parameters are $d = 3$ and $k = 1$. Searching in higher dimensional spaces is essential in feature selection and classification problems. The corresponding ratio is

$$r_4 = \frac{(95 M_1 M_2 M_3 + 4B)\, S_1 S_2 S_3}{109 + (81 + 4B)\, S_1 S_2 S_3}. \qquad (18)$$
[0075] The integral histogram method becomes much more advantageous in higher dimensions. The savings can reach up to $15 \times 10^7$. For a $10^3 \times 10^3 \times 10^3$ target volume searched in its original size ($S = 1$) using a 100-bin histogram, the invention can achieve a $1.6 \times 10^8$ times improvement.
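The speedup figures quoted for Eqs. (15)-(18) can be recomputed with a few lines of Python that transcribe those equations as given. The mapping of each example onto the parameters is an assumption:

- $M = 10^4$ and $S = 1$ for the time series.
- $M_1 M_2 = 64 \times 64$ and $S_1 S_2 = 3$ for the three gray-level resolutions.
- $M_1 M_2 = 100 \times 100$ and $S_1 S_2 = 20$ for the twenty color scales.
- $M_1 M_2 M_3 = 10^9$ with $S = 1$ for the volume.

Under these assumptions, the truncated results match the figures stated in [0066], [0069], [0072], and [0075].

```python
def r1(M, B, S):
    """Eq. (15): time series, d = 1, k = 1."""
    return (81 * M + 4 * B) * S / (95 + (19 + 4 * B) * S)


def r2(M1M2, B, S1S2):
    """Eq. (16): gray-level image, d = 2, k = 1."""
    return (88 * M1M2 + 4 * B) * S1S2 / (102 + (50 + 4 * B) * S1S2)


def r3(M1M2, B, S1S2):
    """Eq. (17): color image, d = 2, k = 3 (B bins per channel)."""
    return (240 * M1M2 + 4 * B**3) * S1S2 / (300 + (81 + 4 * B**3) * S1S2)


def r4(M1M2M3, B, S1S2S3):
    """Eq. (18): volumetric data, d = 3, k = 1."""
    return (95 * M1M2M3 + 4 * B) * S1S2S3 / (109 + (81 + 4 * B) * S1S2S3)


print(int(r1(10**4, 32, 1)))       # 3347
print(int(r2(64 * 64, 16, 3)))     # 2435
print(int(r3(100 * 100, 16, 20)))  # 146
print(f"{r4(10**9, 100, 1):.2e}")  # 1.61e+08
```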
[0076] Object Detection
[0077] As shown in FIG. 7, an object detection application takes as input an image 701. The target is a traffic sign 702. The search for the target object uses a 15-bin color histogram for each channel. The integral histogram is used to construct a similarity map 703, which is similar to a conventional similarity map. However, the integral histogram method runs in 63 msecs, while the conventional method requires two minutes on a conventional 3.2 GHz processor, an almost 2000 times improvement.
[0078] Texture Detection
[0079] As shown in FIG. 8, the integral histogram method can also
be used for a texture detection application. This application takes
as input an image of textures 801. The task is to detect textures
802 and 803. The detected textures are shown in the corresponding
similarity maps 804 and 805. The integral histogram is a 24-bin histogram of gradient orientations. The integral histogram method takes 88 msecs, while the conventional method requires more than five minutes of processing time, a speedup by a factor of 3400. Note
that even such a simple histogram provides sufficient information
for texture segmentation. It is also possible to combine histograms
to define higher level features such as Haar wavelets.
[0080] We determine pixel-wise texture features and construct tensors for image data. Each tensor is a vector that includes corresponding texture components, such as gradient magnitude, orientation, color, and edge. It also includes other image filter responses, such as Gabor filter responses and discrete Fourier and discrete cosine transform coefficients.
[0081] We determine a histogram of texture using the tensors. For instance, if we have $k$ different texture components, then our tensor is a $[1 \times k]$ tensor. Each element in the tensor indicates the value of the corresponding texture feature for the current pixel. We also specify the number of quantization levels for each element in the tensor, such as $K_1, K_2, \ldots, K_k$ for the 1st, 2nd, ..., $k$-th features, where $k$ can be a large positive integer. The integral histogram then has $K_1 \times K_2 \times \cdots \times K_k$ bins. Constructing such a higher dimensional histogram for each target region or data range with the conventional method requires time exponential in the number of features, which is prohibitive for most texture detection applications. The integral histogram method, however, reduces the computational load more and more as the dimensionality of the data increases.
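One way to index the $K_1 \times \cdots \times K_k$ bins is sketched below in Python. The sketch assumes each texture feature is non-negative and bounded by a known maximum. Each feature is quantized to one of its $K_i$ levels, and the levels are combined into a single mixed-radix bin index. With this index, tensor-valued data can go through the same integral histogram propagation as scalar data, using $B = \prod_i K_i$ bins. The function name `joint_bin` and the example feature ranges are hypothetical.

```python
def joint_bin(features, maxima, levels):
    """Map a feature tensor [g_1, ..., g_k] to one bin of a K_1 x ... x K_k histogram.

    features: feature values of one pixel (non-negative, g_i <= maxima[i])
    levels:   number of quantization levels K_i per feature
    """
    index = 0
    for g, g_max, K in zip(features, maxima, levels):
        q = min(int(g * K / g_max), K - 1)  # quantize feature to 0 .. K-1
        index = index * K + q               # row-major (mixed-radix) combination
    return index


# Hypothetical features: one in [0, 1] with 2 levels, one in [0, 4] with 4 levels,
# giving 2 x 4 = 8 joint bins.
print(joint_bin([0.5, 3.0], [1.0, 4.0], [2, 4]))  # 7
print(joint_bin([0.0, 0.0], [1.0, 4.0], [2, 4]))  # 0
```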
[0082] Higher Level Features
[0083] As shown in FIGS. 10-12, higher level features can easily be
determined using spatial, hierarchical, and model-based
combinations of the integral histogram according to the
invention.
[0084] FIG. 10 shows a spatial combination of histograms $h_1$, $h_2$, $h_3$, and $h_4$ around a center point p 1010. The combined histogram h(p) 1020 is given by $h_1 - h_2 + h_3 - h_4$. The higher level features are constructed by summation or subtraction of corresponding histograms of regions in an image. FIG. 11 shows a hierarchical combination of histograms $h_1$, $h_2$, and $h_3$ centered at a point p 1105. Instead of constructing a histogram on a single scale, the higher level features are constructed from multiple histograms within different concentric regions. The histograms are combined to form an aggregated histogram h(p) 1110 $= h_1 \cup h_2 \cup h_3$, which captures multi-scale properties of the underlying data distribution.
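The following Python sketch computes the spatial and hierarchical combinations from a zero-padded 2-D integral histogram, assuming integer pixel values and equal-width bins. FIG. 10 is not reproduced here, so the quadrant layout of $h_1, \ldots, h_4$ is a hypothetical choice. The layout used (upper left, upper right, lower right, lower left, meeting at the corner point p) yields a checkerboard, Haar-like feature. The union of histograms in the hierarchical combination is taken as a bin-wise sum, consistent with how Eq. (2) adds one count per point.

```python
def integral_histogram(img, num_bins, max_value):
    """Zero-padded 2-D integral histogram: H[r][c] covers rows < r, columns < c."""
    rows, cols = len(img), len(img[0])
    H = [[[0] * num_bins for _ in range(cols + 1)] for _ in range(rows + 1)]
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            cur = [H[r][c - 1][b] + H[r - 1][c][b] - H[r - 1][c - 1][b]
                   for b in range(num_bins)]
            cur[min(img[r - 1][c - 1] * num_bins // max_value, num_bins - 1)] += 1
            H[r][c] = cur
    return H


def region(H, top, left, bottom, right):
    """Histogram of rows top..bottom-1 and columns left..right-1."""
    return [H[bottom][right][b] - H[top][right][b] - H[bottom][left][b] + H[top][left][b]
            for b in range(len(H[0][0]))]


def spatial_feature(H, r, c, s):
    """h1 - h2 + h3 - h4 for four s x s quadrants meeting at grid point (r, c)."""
    h1 = region(H, r - s, c - s, r, c)  # upper left
    h2 = region(H, r - s, c, r, c + s)  # upper right
    h3 = region(H, r, c, r + s, c + s)  # lower right
    h4 = region(H, r, c - s, r + s, c)  # lower left
    return [a - b + d - e for a, b, d, e in zip(h1, h2, h3, h4)]


def hierarchical_feature(H, r, c, radii):
    """Bin-wise sum of histograms of concentric squares centered on pixel (r, c)."""
    total = [0] * len(H[0][0])
    for rad in radii:
        h = region(H, r - rad, c - rad, r + rad + 1, c + rad + 1)
        total = [t + x for t, x in zip(total, h)]
    return total


checker = [[0, 0, 200, 200], [0, 0, 200, 200],
           [200, 200, 0, 0], [200, 200, 0, 0]]
print(spatial_feature(integral_histogram(checker, 4, 256), 2, 2, 2))  # [8, 0, 0, -8]

spot = [[0] * 5 for _ in range(5)]
spot[2][2] = 200
print(hierarchical_feature(integral_histogram(spot, 4, 256), 2, 2, [0, 1, 2]))  # [32, 0, 0, 3]
```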
[0085] FIG. 12 shows a model-based combination of histograms for a face that includes hair color 1201, eye color 1202, face texture 1203, skin color 1204, and lip color 1205. The combined histogram h(p) 1210 is $\bigcup_i h_i$.
[0086] These combinations enable integration of spatial information as well as the distribution of the data values.
[0087] Tracking Examples
[0088] FIG. 9 compares the integral histogram method with a
conventional histogram method for tracking objects in an input
video, e.g., a pedestrian in a street scene. The input is a
sequence of frames 901. The sequence 902 shows the result of
conventional mean-shift tracking. Note that the conventional method
fails to track the pedestrian, and instead tracks a stationary
shadow. The sequence 903 shows the tracked object, as correctly
tracked in the sequence 904 using the integral histogram
method.
[0089] After initialization of an object, the color histogram similarity scores between the original histogram and the histograms of the object windows centered on every pixel are determined. Such a similarity determination is very slow with the conventional method. The integral histogram method is compared with a gradient descent based method known as mean-shift, see Comaniciu et al., above.
[0090] The mean-shift method evaluates the histogram similarity, in most cases using a Bhattacharyya distance, only within its original kernel, that is, the window of the object. The mean-shift iterations use 16-bin histograms for each color channel. Because mean-shift evaluates similarity only within a limited search region, it is bound to fail when the object moves so far between frames that the object windows in consecutive frames do not overlap, as shown in the sequence 902.
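For reference, the following Python sketch computes the Bhattacharyya distance between two normalized histograms, in the form commonly used for mean-shift tracking (Comaniciu et al.). It uses the coefficient $\rho = \sum_b \sqrt{p_b q_b}$ and the distance $\sqrt{1 - \rho}$. With an integral histogram, the same function could be evaluated for the window around every pixel, since each window histogram costs only a four-corner combination. The function name is illustrative.

```python
import math


def bhattacharyya_distance(p, q):
    """Distance sqrt(1 - rho) between two histograms that each sum to one."""
    rho = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - rho))              # clamp tiny negative round-off


print(bhattacharyya_distance([0.25] * 4, [0.25] * 4))  # 0.0
print(bhattacharyya_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```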
[0091] The integral histogram enables one to determine similarities over the entire image plane in a small, nearly constant amount of time, e.g., 55 msecs. Thus, with the integral histogram method it is now possible to track objects accurately at high frame rates.
EFFECT OF THE INVENTION
[0092] The invention provides a computationally efficient method
for extracting and searching histograms of all possible regions in
a Cartesian space. The integral histogram provides an optimum and
complete solution for histogram-based applications.
[0093] The integral histogram method can speed up the search process by a factor of thousands or more in comparison to conventional methods.
[0094] The method can be extended to any dimensional data space and
any tensor representations.
[0095] In addition, the method enables the construction of advanced
histogram features for further feature selection and classification
purposes.
[0096] Many computer vision applications, such as video object detection and tracking, where the real-time requirement has been a bottleneck up to now, can benefit from the integral histogram method.
[0097] Although the invention has been described by way of examples
of preferred embodiments, it is to be understood that various other
adaptations and modifications may be made within the spirit and
scope of the invention. Therefore, it is the object of the appended
claims to cover all such variations and modifications as come
within the true spirit and scope of the invention.