U.S. patent application number 13/309551 was filed with the patent office on 2012-10-04 for image segmentation for distributed target tracking and scene analysis.
This patent application is currently assigned to THE TRUSTEES OF THE UNIVERSITY OF PENNSYLVANIA. Invention is credited to Camillo Jose Taylor.
Application Number | 20120250984 13/309551 |
Document ID | / |
Family ID | 46926741 |
Filed Date | 2012-10-04 |
United States Patent
Application |
20120250984 |
Kind Code |
A1 |
Taylor; Camillo Jose |
October 4, 2012 |
IMAGE SEGMENTATION FOR DISTRIBUTED TARGET TRACKING AND SCENE
ANALYSIS
Abstract
Pixels in a feature space can be divided using a hashing method.
Random planes are selected within a feature space and a pixel's
relationship to each plane determines a bit of a hash code.
Clusters of pixels can be identified by local maxima in the hash
cells in the feature space. Nearby pixels in the feature space can
be further assigned to these local maxima based on hamming
distance. An image can be segmented by observing adjacent pixels
sharing a common hash code.
Inventors: |
Taylor; Camillo Jose;
(Philadelphia, PA) |
Assignee: |
THE TRUSTEES OF THE UNIVERSITY OF
PENNSYLVANIA
Philadelphia
PA
|
Family ID: |
46926741 |
Appl. No.: |
13/309551 |
Filed: |
December 1, 2011 |
Related U.S. Patent Documents
|
|
|
|
|
|
Application
Number |
Filing Date |
Patent Number |
|
|
61418799 |
Dec 1, 2010 |
|
|
|
61418789 |
Dec 1, 2010 |
|
|
|
61418805 |
Dec 1, 2010 |
|
|
|
Current U.S.
Class: |
382/162 ;
382/173 |
Current CPC
Class: |
G06K 9/00771 20130101;
G06T 7/11 20170101; G06T 2207/30241 20130101; G06T 7/292 20170101;
H04N 7/181 20130101 |
Class at
Publication: |
382/162 ;
382/173 |
International
Class: |
G06K 9/00 20060101
G06K009/00; G06K 9/34 20060101 G06K009/34 |
Claims
1. A method for identifying salient segments in an image comprising
the steps of: choosing a plurality of planes that divide a feature
space into a plurality of cells; generating a first set of hash
codes for a first set of pixels in an image based on a location in
the feature space of a feature vector associated with each pixel in
the first set, whereby the location of each feature vector relative
to each of the plurality of planes contributes a binary value to
each hash code; determining a second set of hash codes selected
from the first set of hash codes, wherein the second set of hash
codes indicates local maxima of clusters of pixels; assigning each
of the first set of pixels to one of the second set of hash codes
based on the hamming distance between a first hash code from the
first set of hash codes assigned to each pixel and each of the
second set of hash codes; and identifying segments in an image
based on groups of adjacent pixels sharing common hash codes.
2. The method of claim 1, wherein the step of choosing a plurality
of planes further comprises selecting a predetermined number of
random planes in the feature space.
3. The method of claim 1, wherein the feature space is a
three-dimensional color space.
4. The method of claim 1, wherein the step of choosing a plurality
of planes further comprises training the plurality of planes based
on human interpretations of images.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority to provisional
patent applications 61/418,789, 61/418,805, and 61/418,799 which
are incorporated by reference in their entirety.
[0002] The present application relates to co-pending patent
applications entitled "Scene Analysis Using Image and Range Data"
and "Distributed Target Tracking Using Self Localizing Smart Camera
Networks Technology Field" both of which are incorporated by
reference in their entirety and filed on the same day as the
present application entitled "Image Segmentation for Distributed
Target Tracking and Scene Analysis."
TECHNOLOGY FIELD
[0003] The present invention relates generally to machine vision
systems and methods and specifically to image segmentation to
determine salient features in an image. The present invention
relates generally to machine vision systems and methods and
specifically to object tracking and scene analysis for distributed
or mobile applications.
BACKGROUND
[0004] Segmentation is a method of breaking an image into coherent
regions and is a common problem in Computer Vision. Many known
methods of segmentation are computationally intensive, making them
unsuitable for fast, low power, or low cost applications, such as
for use with distributed or mobile devices. However, there is a
need for an algorithm that is more amenable to real-time
implementation.
[0005] To date, most of the approaches that have been developed to
tackle the segmentation problem can be broadly divided into two
groups. The first group consists of algorithms that view the image
as a graph and use various metrics to measure the difference in
appearance between neighboring pixels or regions. Once the problem
has been formulated in this way, the algorithms center on the
problem of dividing this graph into pieces so as to maximize
coherence. The Normalized Cut algorithm developed by Shi and Malik
proceeds by recasting the graph segmentation problem in terms of a
spectral analysis. (See Jianbo Shi and Jitendra Malik, "Normalized
cuts and image segmentation," IEEE Transactions on Pattern Analysis
and Machine Intelligence, 22:888-905, 1997.) This approach involves
computing the distance between the pixels in the image and then
solving a series of large but sparse eigenvector problems.
[0006] Felzenszswalb and Huttenlocher proposed an efficient
approach to grouping pixels in an image by making use of a spanning
tree and showed that locally greedy grouping decisions can yield
plausible results. (Pedro F. Felzenszwalb and Daniel P.
Huttenlocher, "Efficient graph-based image segmentation," Int. J.
Comput. Vision, 59(2): 167-181, 2004. ISSN 0920-5691.) This
approach also revolves around the computation of multiple pairwise
distance values. There remains a need for avoiding the
computational costs associated with distance computation.
[0007] Another broad class of segmentation schemes are termed
feature based methods because these proceed by associating a
feature vector with each pixel in the image. (See Dorin Comaniciu,
Peter Meer, and Senior Member, "Mean shift: A robust approach
toward feature space analysis," IEEE Transactions on Pattern
Analysis and Machine Intelligence, 24:603-619, 2002; W Y Ma and B.
S. Manjunath, "Texture features and learning similarity," Computer
Vision and Pattern Recognition, IEEE Computer Society Conference
on, 0:425, 1996, ISSN 1063-6919. doi:
http://doi.ieeecomputersociety.org/10.1109/CVPR.1996.517107; and
Eduard Vazquez, Joost Weijer and Ramon Baldrich, "Image
segmentation in the presence of shadows and highlights," In ECCV
'08: "Proceedings of the 10th European Conference on Computer
Vision," pages 1-14, Berlin, Heidelberg, 2008, Springer-Verlag.
ISBN 978-3-540-88692-1. doi:
http://dx.doi.org/10.1007/978-3-540-88693-8.sub.--1.)
[0008] Another clustering method is the k-means algorithm, which
seeks to divide the population into k-clusters using an Expectation
Maximization approach. (See Morten Rufus Blas, Motilal Agrawal,
Aravind Sundaresan, and Kurt Konolige, "Fast color/texture
segmentation for outdoor robots." In IROS, pages 4078-1085, 2008.
K-means clustering is a method of cluster analysis which aims to
partition n observations into k clusters in which each observation
belongs to the cluster with the nearest mean. One issue that one
needs to be addressed in applying this algorithm to segmentation
problems is the question of choosing an appropriate value for k,
which is typically not known beforehand. A second issue is the fact
that the k-means scheme involves repeated rounds of distance
computations. This means that the computational complexity grows
with the number of pixels, the dimension of the feature space and
the number of clusters. K-means clustering is typically considered
an NP-hard problem. While some approaches have been proposed to
mitigate this problem including the method developed by Elkan,
which seeks to accelerate the process by invoking the triangle
inequality, and Locality Sensitive Hashing schemes that search for
near neighbors in the feature space, these have been unable to
fully mitigate the distance computations required. (See Charles
Elkan, "Using the triangle inequality to accelerate k-means," In
International Conference on Machine Learning, 2003; Piotr Indyk and
Rajeev Motwani. "Approximate nearest neighbors: towards removing
the curse of dimensionality," In STOC '98: Proceedings of the
thirtieth annual ACM symposium on Theory of computing, pages
604-613, New York, N.Y., USA, 1998. ACM. ISBN 0-89791-962-9. doi:
http://doi.acm.org/10.1145/276698.276876.)
[0009] There have also been attempts to apply Mean Shift
segmentation algorithm to subdivide color images into regions.
(Dorin Comaniciu, Peter Meer and Senior Member, "Mean shift: A
robust approach toward feature space analysis," IEEE Transactions
on Pattern Analysis and Machine Intelligence, 24:603-619, 2002.)
This feature based approach proceeds by searching for modes of the
distribution in the feature space using a Parzen Window based
approach (sometimes referred to as the Parzen-Rosenblatt window,
which provides a well known non-parametric way of estimating the
probability density function of a random variable). The method
involves tracing the paths of various feature vectors as they
evolve under the mean shift rule. This non-parameteric estimation
scheme can be very time consuming which makes it less useful in
situations where real time response is desired. The Parzen Window
density estimation scheme employed in this approach also limits the
dimension of the feature spaces to which it can be applied
effectively. In contrast the method proposed in this application
can be applied to arbitrary feature spaces and has been implemented
in real time on modest hardware.
[0010] When labeled image data is available, algorithms that learn
how to classify pixels and segment images have also been proposed.
(See J. Shotton, M. Johnson, and R. Cipolla, "Semantic text on
forests for image categorization and segmentation," In Computer
Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference
on, pages 1-8, 2008; Michael Maire, Pablo Arbelaez, Charles
Fowlkes, and Jitendra Malik, "Using contours to detect and localize
junctions in natural images," In Computer Vision and Pattern
Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1-8, June
2008.) These approaches rely on leveraging the training data to
associate semantic labels with pixels and segments.
[0011] With the emergence of low-cost, ubiquitous cameras, such as
cameras, there exists a need for developing image segmentation
methods that can be reliably used with low-cost computation, which
can handle large volumes of image data. Similarly, it is desirable
for a segmentation method to be operated without the need for
structured training data, which may not be readily available for
use with low-cost or large volume cameras and processors.
[0012] When labeled image data is available, algorithms that learn
how to classify pixels and segment images have also been proposed.
(See J. Shotton, M. Johnson, and R. Cipolla, "Semantic text on
forests for image categorization and segmentation," In Computer
Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference
on, pages 1-8, 2008; Michael Maire, Pablo Arbelaez, Charles
Fowlkes, and Jitendra Malik, "Using contours to detect and localize
junctions in natural images," In Computer Vision and Pattern
Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1-8, June
2008.) These approaches rely on leveraging the training data to
associate semantic labels with pixels and segments.
SUMMARY OF THE INVENTION
[0013] Embodiments of the present invention address and overcome
one or more of the above shortcomings and drawbacks, by providing
devices, systems, and methods for segmenting images. This
technology is particularly well-suited for, but by no means limited
to, real-time segmentation of images for identifying salient
features of an image.
[0014] Embodiments of the present invention are directed to a
method for identifying salient segments in an image comprising the
steps of choosing a plurality of planes that divide a feature space
into a plurality of cells, generating a first set of hash codes for
a first set of pixels in an image based on a location in the
feature space of a feature vector associated with each pixel in the
first set whereby the location of each feature vector relative to
each of the plurality of planes contributes a binary value to each
hash code, determining a second set of hash codes selected from the
first set of hash codes, wherein the second set of hash codes
indicates local maxima of clusters of pixels, assigning each of the
first set of pixels to one of the second set of hash codes based on
the hamming distance between a first hash code from the first set
of hash codes assigned to each pixel and each of the second set of
hash codes, and identifying segments in an image based on groups of
adjacent pixels sharing common hash codes. The step of choosing a
plurality of planes may further comprise selecting a predetermined
number of random planes in the feature space. The feature space may
be a three-dimensional color space. The step of choosing a
plurality of planes may further comprise training the plurality of
planes based on human interpretations of images.
[0015] Additional features and advantages of the invention will be
made apparent from the following detailed description of
illustrative embodiments that proceeds with reference to the
accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The patent or application file contains at least one drawing
executed in color. Copies of this patent or patent application
publication with color drawing(s) will be provided by the Office
upon request and payment of the necessary fee.
[0017] The foregoing and other aspects of the present invention are
best understood from the following detailed description when read
in connection with the accompanying drawings. For the purpose of
illustrating the invention, there is shown in the drawings
embodiments that are presently preferred, it being understood,
however, that the invention is not limited to the specific
instrumentalities disclosed. Included in the drawings are the
following Figures:
[0018] FIG. 1A is a two dimensional view of a simplified 2D version
of the randomized hashing scheme in a feature space fractured into
regions by a set of randomly chosen splitting planes Each region is
associated with a hash code indicating where it falls with respect
to the splitting planes;
[0019] FIG. 1B is a four-dimensional view of the relationship
between hash codes and associated with the vertices of a hypercube
in an exemplary embodiment;
[0020] FIG. 2 is a flow chart showing operation of an exemplary
algorithm for use with segmenting images in accordance with some
embodiments of the random hashing scheme;
[0021] FIG. 3 shows two histograms of the distributions of the GCE
and Rand Index metrics over a sample data set as a result of the
application of certain embodiments of the random hash scheme.
[0022] FIG. 3 includes sample images along with the human
segmentation and the machine segmentation.
[0023] FIG. 4 shows a plurality sample images, along with the human
segmentation and the machine segmentation, including both prior art
and an embodiment of the random hashing scheme.
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
[0024] The segmentation scheme described in this application
employs a feature based approach. Each pixel in the image is
described by a feature vector which encodes a set of properties
used to describe that pixel. Embodiments of the present invention
can employ a simple color descriptor vector, which is an example of
a feature vector, but some embodiments also use more sophisticated
feature vectors such as a histogram of color values or a vector of
texture coefficients. Some embodiments can employ an approach to
segmenting natural images which leverages the idea of randomized
hashing. The procedure aims to replace the problem of finding
clusters in the feature space with the problem of finding local
maxima in a graph whose topology approximates the geometry of the
underlying feature space. In so doing the method can bypass the
computational effort associated with computing distances between
feature vectors which can comprise a significant fraction of the
effort in other techniques such as k-means clustering and mean
shift segmentation.
[0025] The method can be controlled by a few parameters namely, the
number of random splitting planes, n, The Hamming Distance
threshold, k, and the window size that is used to average the color
vectors, w. By adjusting these parameters the algorithm can be made
to produce over segmentations or under segmentations of the input
imagery. Importantly the number of segments that are produced is
implicitly controlled by these parameters rather than explicitly
provided as an input to the algorithm.
[0026] These techniques can be employed in a computing environment
available to a person of ordinary skill in the art, including
performing the prescribed calculations on a PC, embedded processor,
mobile device, cloud computing environment, client-server
environment, DSP, or dedicated hardware circuit capable of
performing the methods disclosed herein.
[0027] Given a set of feature vectors (e.g. pixels having
predetermined properties, where each selected property is a
dimension in a feature space), the goal of the segmentation
procedure is to divide them into a set of clusters which capture
the most salient groupings in the distribution. An example of a
feature space for use with some embodiments includes a 3D space
with orthogonal axes for each color: red, green, and blue (RGB). In
some embodiments, the feature space uses axes for Y, Pb, and Pr. In
some embodiments, more than three dimensions can be used. For
example, other dimensions can include motion vectors, depth, or
other information that can be obtained related to each pixel, so
that each pixel (or group of pixels) can be mapped to the feature
space.
[0028] Entries in this feature vector characterize salient
properties of the region surrounding that pixel such as color (e.g.
RGB, YPbPr, or multispectral properties, such as IR, UV, or
information from a FLIR camera), texture, frequency content, and/or
motion properties. Once all of the pixels have been mapped to the
feature space, the segmentation process is treated as a clustering
problem where the goal can be to identify salient clusters in the
population of feature vectors.
[0029] FIG. 1A depicts a simplified 2D version of the randomized
hashing scheme in a feature space fractured into regions by a set
of randomly chosen splitting planes Each region is associated with
a hash code indicating where it falls with respect to the splitting
planes. The set of all hash codes can be associated with the
vertices of a hypercube as shown in Figure (b) here the shading of
the nodes indicates how many feature vectors are hashed to that
code. The segmentation scheme proceeds by identifying local maxima
in this hash code space.
[0030] This scheme employs a series of randomly chosen splitting
planes. FIGS. 1A and 1B show a simplified view of this procedure in
two dimensions. Here the random splitting planes 0, 1, 2, and 3 are
used to hash the feature vectors into a set of disjoint cells based
on their location.
[0031] On can hash a set of vectors into a set of discrete bins in
order to accelerate the search for nearest neighbors. One can
further leverage the fact that this randomized hashing procedure
tends to preserves locality so points that are near to each other
in the feature space are hashed to the same bin with high
probability. The proposed segmentation scheme leverages these
phenomena to cluster the feature vectors into groups. In FIG. 1A,
the n=4 spitting planes fracture the feature space into a set of
2.sup.n disjoint convex cells each of which corresponds to an n-bit
hash code. It should be understood that each splitting plane can
also be considered a normal vector and a decision point (such as a
mean) in the feature space. By projecting each pixel on the that
normal vector, and determining its position relative to the
decision point, a bit associated with each normal vector can be
assigned.
[0032] More specifically, each sample vector (e.g. pixel or object)
in the feature space v.sub.j is assigned an n-bit hash code where
the ith bit in the code, b.sub.ij is derived from the ith splitting
plane as follows b.sub.ij=(v.sub.ju.sub.i)>S.sub.i where u.sub.i
denotes the normal associated with the ith splitting plane and si
denotes the corresponding splitting value. Neighboring cells in the
feature space differ by a single bit so the Hamming distance
between the codes provides some indication of the distance between
vectors in the feature space. More generally we can construct a
correspondence between the set of all possible hash codes and the
vertices of an n-dimensional hypercube. The topology of the
hypercube 100 reflects the structure of the feature space since
neighboring cells in feature space will correspond to neighboring
vertices in the hypercube. In this example, the shaded nodes, 1001,
0000, and 0111 are those bins that have local maxima clusters in
FIG. 1A.
[0033] For each of the hash codes the clustering procedure can
record how many feature vectors are mapped to that code. In some
embodiments, clusters in feature space will induce population
maxima in the code space. That is, if we consider the hypercube as
a graph we would expect to observe that some of the hash codes have
a greater population than their neighbors. This allows us to
replace the original problem of clustering vectors in the feature
space in favor of the simpler problem of looking for population
maxima in the code space graph.
[0034] For every populated code in the hypercube the algorithm
interrogates all of the codes that differ from the current code by
k bits or fewer. This parameter, k, is referred to as the Hamming
Distance Threshold. If the code under consideration has a
population greater than all of its neighbors it is declared a local
maxima and a cluster center. In this way the number of clusters
recovered by the procedure is determined automatically based on the
data as opposed to being imposed a priori as in k-means. Note that
this scheme can be used to distinguish up to 2.sup.(n-k) local
maxima.
[0035] The normals associated with the splitting planes, u.sub.i,
can be chosen randomly or based on a priori knowledge. The
splitting values, s.sub.i, can be chosen by considering the
distribution of the projected values, (v.sub.iu.sub.i). In some
embodiments, the mean of the distribution, which corresponds to
casting all of the splitting planes through the centroid of the
distribution, is used. In some embodiments, the median value, and
the value midway between the maximum and minimum projected values,
is used as the splitting value (i.e. the decision point for
assigning bit values to pixels projected on the normal vector.
These schemes tend to produce similar results in practice.
[0036] After the local maxima have been identified, each of the
feature vectors is labeled with the hash code of the closest local
maxima based on the Hamming Distance. In the case where a feature
vector is equidistant from two or more local maxima based on
Hamming Distance the Euclidean distance between the feature vector
and the mean cluster vector is used to break the tie and decide the
label. Once each of the pixels has been labeled with the index of
its local maxima, a connected components procedure is run to divide
the image into coherent connected regions.
[0037] The entire scheme is outlined below in pseudo-code. This
algorithm is elaborated in FIG. 2.
[0038] Algorithm 1 Segmentation via Randomized Hashing
[0039] 1: Hash each feature vector to an n-bit code using the n
randomly chosen splitting planes
[0040] 2: Maintain a count of the number of feature vectors mapped
to each hash code
[0041] 3: Identify local maxima in the code space--these are the
cluster centers
[0042] 4: Assign each feature vector to the closest local
maxima
[0043] 5: Run connected components on the labeled pixels to
identify coherent connected components.
[0044] FIG. 2 shows method 300 for segmenting an image. At step 302
a feature space is divided by a predetermined number, n, of
splitting planes. This step can include randomly assigning and
feature planes and using a calibration image or a first image to
the splitting plane for a given orientation. For example, the
position of each plane can be chosen such that feature vectors
(e.g. those feature vectors associated with each pixel in a test
image) within the feature space are evenly divided on either side
of the splitting plane. As discussed above, each splitting plane
can be created by choosing a random normal vector and assigning a
decision point along that vector such that the decision point is at
the mean or median of the distribution of feature vectors (e.g.
pixels projected on that normal vector). This can be done using a
calibration image, the image under test, or the previous image
under test. In some embodiments, step 302 takes into account
predetermined splitting planes that have been created via prior
training or have been manually assigned. At step 302, the splitting
planes can include a combination of random splitting planes and
preassigned splitting planes.
[0045] At step 304, each feature vector image (or a subset of the
feature vectors in image) is hashed using the splitting planes.
This process can be computationally simple as each bit in the hash
simply determines which side of the splitting plane the feature
vector resides. Because only a bit is used for this hash,
additional computational overhead needed for deriving the Euclidean
distance from the splitting claim is not necessary. Step 304 can be
an iterative loop whereby each feature vector is taken and compared
in succession to each of the n splitting planes. It will be
appreciated that massive parallelism and may be possible using the
right processor, such as a DSP or graphics processor, to perform
this step.
[0046] Once each feature vector is hashed into the cells created by
the splitting planes, the algorithm proceeds to step 306. At step
306, the number of feature vectors resident in each cell in the
feature space is counted. At step 308, the population counts for
each cell in the feature space are compared to choose a number of
local maxima. In some embodiments the number of local maxima, M, is
predetermined prior to image processing. In other embodiments the
number of local maxima, M, is derived dynamically from the image
based on the results of the residency counts of each cell in the
feature space. As discussed, the maximum number of local maxima, M,
can be determined based on the number of splitting planes used and
the Hamming distance requirements for clusters used to image
segmentation. Once the local maxima are identified, these can be
used as the center of each cluster used for image segmentation.
[0047] At step 310 each feature vector under test can be assigned
to each of the cluster centers determined in step 308. In some
embodiments, this assignment has low computational is and where. In
overhead because the hash of each feature vector is compared to
each of the cluster centers and the cluster having the nearest
Hamming distance to the hash of the vector is selected as the
cluster to which the feature vector will be assigned. In some
embodiments, ties between competing clusters (i.e. clusters that
are the same Hamming distance away from the hashed feature vector)
can be resolved by estimating the Euclidean distance between the
center of each cluster and the current feature vector. It will be
appreciated that other techniques for resolving conflicts between
equidistant clusters can be used, including assigning each feature
vector to the equidistant cluster that has the least number of
feature vectors currently assigned, or the most number.
[0048] Once each of the feature vectors is assigned to a cluster,
image segments can be identified within the image at step 312. For
example, adjacent (i.e. connected) pixels in the image plane that
have had their feature vectors assigned to the same cluster can be
considered part of the same image segment. Any known technique can
be used, such as minimum threshold sizes for segments of adjacent
pixels. In this way, image segments can be rapidly deduced from the
image by scanning pixels in the image plane and determining whether
they share the same cluster in the feature space.
[0049] In some embodiments, algorithm 300 repeats after step 312,
with the next captured image. In some embodiments, step 314 may
optionally adjust the splitting planes based on the results in step
312. For example, in a training scheme the results of step 312 can
be compared via a consistency score with a desired segmentation
result. Based on this result, the splitting planes can be adjusted
to improve this consistency score. In some embodiments, incremental
improvement of splitting planes is done in real time based on other
criteria, such as reducing the number of identified segments in
step 312 or increasing the number of identified segments in step
312 to reach a desired range of segments. Once the splitting planes
are adjusted the algorithm can return to step 302.
[0050] The proposed scheme can be similar in spirit to the Mean
Shift segmentation algorithm which also seeks to identify modes in
the distribution of feature vectors. Where the mean shift scheme
uses a Parzen Window based scheme to estimate density in feature
space, the proposed scheme uses hashing, which can be randomized or
tailored to apriori information, to identify salient groupings of
feature vectors.
[0051] Like Locality Sensitive Hashing, the segmentation scheme can
make implicit use of the Johnsson-Lindenstrauss theorem which
justifies the use of random projection by bounding the distortion
of the relative distances between the feature vectors induced by
the projection process. Additionally and alternatively, the hashing
scheme may use planes that are predetermined, selected from a
bounded group of planes, estimated or calculated based on some
selected criteria. For example, if a particular color is known to
be important, the hashing scheme can take this into account, by
using at least some predetermined feature planes to create the hash
or introduce a bias into the selection process, which might
otherwise be random. The hashing scheme may also include an
adaptive selection algorithm to learn how to select more
appropriate feature planes in future video frames. This learning
process can be by any scheme known to a person of ordinary skill in
the art, including using human feedback to help train parameters,
genetic algorithms, decision trees, pruning, beam searching, or the
like.
[0052] From a computational perspective the principal effort
revolves around computing the hash codes which involves O(nmN)
operations where n denotes the number of projection directions, m
denotes the dimension of the feature space and N denotes the total
number of pixels or feature vectors. Note that the scheme can avoid
the explicit distance computations between the feature vectors that
one uses in most agglomerative and k-means segmentation schemes in
favor of randomized hashing.
[0053] In searching for the local maxima in the code space one can
simply store the hash code populations in an array with 2n entries.
For each populated hash code the procedure involves interrogating
on the order of (n/k) neighboring codes. For example to run the
local maxima detection algorithm on n=12 dimensions with a Hamming
distance threshold, k=2, one can construct a table with 212=4096
entries and each hash code would have (12/1)+(12/2)=12+66=78
neighbors. Typically many of the hash bins are empty which further
simplifies processing. For larger value of n one could employ a
binary tree data structure to store and query the contents of the
hash table efficiently.
[0054] In some embodiments, the proposed segmentation scheme can be
carried out using training. For example, the Berkeley Segmentation
Database which contains 1633 manual segmentations of 300 color
images that can be used for this purpose. By comparing the results
of random hash segmentation one or more of the images to one or
more of the manual segmentation results, the selected splitting
planes can be improved. For example, training can be used to assist
in selecting the appropriate number of splitting planes or in
selecting specific splitting planes to include. The manual
segmentations provided by the users can be compared with the
segmentations produced by the algorithm using two different
measures, the Global Consistency Error and the Rand Index. The
Global Consistency Error (GCE) developed by Martin, Fowlkes, Tal
and Malik can capture the difference between two segmentations in a
single number between 0 and 1 where lower numbers indicate lower
error. (See David Martin, Charless Fowlkes, Doron Tal, and Jitendra
Malik, "A database of human segmented natural images and its
application to evaluating segmentation algorithms and measuring
ecological statistics," In In Proc. 8th Int'l Conf. Computer
Vision, pages 416-423, 2001.) The measure can be designed such that
if one segmentation is a refinement of the other the score will be
zero. This is a useful feature since it accounts for the fact that
human subjects often choose to segment scenes to various levels,
however, it also implies that machine segmentations that are
strongly over or under segmented can also yield very low GCE scores
which can be misleading.
[0055] To provide a different but related perspective on the
algorithm one can record and report the Rand Index for each
segmentation. This measure is commonly used in statistics to
measure the quality of clustering algorithms. (See Ranjith
Unnikrishnan, Caroline Pantofaru and Martial Hebert, "Toward
objective evaluation of image segmentation algorithms," IEEE
Transactions on Pattern Analysis and Machine Intelligence,
29(6):929-944, 2007. ISSN 0162-8828. doi:
http://doi.ieeecomputersociety.org/10.1109/TPAMI.2007.1046.) In
order to compute the Rand Index one can consider every pair of
pixels in the image and determine whether they are labeled
consistently in the human and machine segmentations. That is, if
the two pixels have the same label in the human segmentation they
should have the same label in the machine segmentation and vice
versa. The Rand Index represents the fraction of the pixel pairs
that are labeled consistently in the two segmentations, values that
are closer to 1 indicate better segmentations. Unlike the GCE the
Rand Index will suffer if the machine segmentation is over or under
segmented with respect to the human segmentation.
[0056] A series of segmentation experiments can be carried out
using feature spaces based on color information. In some
embodiments, the color values can be averaged over a square window
of width w centered around each pixel. Effectively, this averaging
is a preprocessing step that can aid in smoothing noise out of the
image. Increasing the size of this window can increase the level of
smoothing and leads to a coarser segmentation. It will be
appreciated that other filtering schemes can be used to smooth or
preprocess the image before performing image segmentation.
[0057] A first set of experiments that was carried out was designed
to determine how the performance of the segmentation scheme varied
as we varied the color space. Experiments were carried out using
the standard RGB values, the HSV color space, the LAB color space
and a color vector that concatenated the RGV and HSV values into a
six dimensional color vector. These experiments were carried out
using a randomly chosen subset of 150 of the segmentations in the
database. The average GCE and Rand index values are reported. In
all of these experiments the value of n was fixed at 12 the value
of k was fixed at 1 and the value of w was fixed at 3. Table 1
summarizes the results of these experiments and indicates that the
RGBHSV color space offers the best performance with respect to the
two metrics.
TABLE-US-00001 TABLE 1 Results of running the segmentation
procedure using various color spaces Color GCE Rand RGB 0.2805
0.7327 HSV 0.2421 0.7527 LAB 0.2578 0.7351 RGBHSV 0.2014 0.7614
[0058] A second set of experiments explored how the performance of
the scheme varied as we varied the number of splitting planes, n,
and the Hamming Distance Threshold used to find local minima, k.
The experiments were carried out using the HSV color space on the
same subset of 150 segmentations from the database. The mean GCE
and Rand Index values were recorded for every combination of
parameters and the results are summarized in Table 2. In practice
increasing values of the n parameter provide more ways to
distinguish between feature vectors can lead to over segmentation
while increasing the k parameter decreases the number of local
maxima detected in the code space and leads to under
segmentation.
TABLE-US-00002 TABLE 2 Results of running the segmentation
procedure using various values for the n and k parameters n k GCE
Rand Index 8 1 0.1952 0.7632 8 2 0.2511 0.7443 8 3 0.2450 0.7104 12
1 0.1652 0.7535 12 2 0.2250 0.7640 12 3 0.2482 0.7417 16 1 0.1005
0.7438 16 2 0.1670 0.7583 16 3 0.2236 0.7527
[0059] A third set of experiments was carried out to investigate
how the performance of the scheme varied as the window size
parameter, w, was varied. The experiments were carried out using
the HSV color space with the n and k parameters fixed at 12 and 1
respectively. Table 3 contains the results of these trials.
Increasing the value of w leads to increases the level of smoothing
which typically leads to under segmentation.
TABLE-US-00003 TABLE 3 Results of running the segmentation
procedure using various values for the size of the smoothing
window, w in pixels w GCE Rand Index 3 0.1219 0.7479 5 0.1350
0.7494 7 0.1477 0.7513 11 0.1660 0.7520 21 0.1992 0.7537
[0060] A fourth experiment was run to compare the results of the
automated segmentation procedure to each of the 1633 human
segmentations in the database. For this experiment the HSV color
space was employed, the number of splitting planes, n was 12, the
Hamming Distance threshold, k was 2 and the window size, w was 3.
These parameter values were chosen to produce a visually pleasing
over segmentation of the images rather than to optimize the GCE or
Rand Index values. Over the entire database the mean GCE value was
0.2235 and the median GCE value was 0.2157 the mean Rand Index
value was 0.7370 and the median Rand Index was 0.7833.
[0061] FIG. 3 shows histograms of the distributions of the GCE and
Rand Index metrics over the entire data set. The first graph on the
left of FIG. 2 indicates the distribution of the GCE values over
all of the segmentations in the database while the graph on the
right denotes the distribution of the Rand Index values.
[0062] FIG. 4 shows a few of the images from the data set along
with the human segmentation and the machine segmentation. This
figure compares the output of the automated segmentation procedure
to human labeled segmentations. The first and fourth rows contain
the input imagery, the second and fifth rows contain human
segmentations while the third and sixth rows contain machine
segmentations.
[0063] FIG. 5 provides a direct comparison of segmentations
produced by the methods of the present invention with those
produced by the Mean Shift procedure for a few randomly chosen
images in the data set. This figure compares the output of the
proposed segmentation scheme with the results obtained using the
Edison segmentation tool. The first row corresponds to the input
image, the second to a human segmentation, the third to the mean
shift result and the fourth to the randomized hash result. The
parameters used for the Edison tool were
(h.sub.s,h.sub.r,M)=(7,6.5,20) and the parameters used for the
randomized method were (n,k,w)=(12,2,3).
[0064] One advantage of the proposed segmentation scheme is that
the computational effort required scales linearly in the number of
pixels and the operations required are simple and regular. In order
to demonstrate this a real time version of the scheme was
implemented on a Macbook Pro laptop computer. This implementation
was used to segment 640 by 480 video frames at a rate of 10 frames
per second using a single core of an Intel Core 2 Duo processor
running at 2.33 GHz. This rate includes the time taken for all
phases of the algorithm, image acquisition, randomized hashing,
local maxima detection and connected components processing. Since
almost all of the steps in the procedure are embarrassingly
parallel, the algorithm is a well suited to implementation on
modern multi-core processors and GPUs and should be amenable to
further acceleration.
Applications
[0065] The proposed algorithm can be highly parallelizable and can
be implemented in real time on modest hardware. This is an
advantage since it means that the method could be used as a cheap
preprocessing step in a variety of image interpretation
applications much as edge detection is used today. In some
embodiments, the segmentation algorithm is used instead of edge
detection in image processing in a real time environment. This call
allow objects to be identified when coupled with pattern matching
or relative motion to a background.
[0066] The method can be used on a mobile robot to produce a fast,
rough segmentation of the scene into sky ground, road and tree
regions. This mobile robot can use image segmentation as described
herein as well as a ranging mechanism to model its surrounding
environment as described in concurrently filed application titled
"Scene Analysis Using Image And Range Data," by C. J. Taylor, which
is incorporated herein by reference.
[0067] Similarly, the segmentation scheme could be used as part of
the loop in real time tracking applications where it would allow
the system to automatically delineate targets. The lower processing
requirements of hashing could make object detection faster and
cheaper for real-time tracking. For example, if used in a
distributed camera network, the segmentation scheme could be
performed by CPUs on the cameras to detect an object and track it.
A suitable camera network with tracking ability is described in
concurrently filed application titled "Distributed Target Tracking
Using Self Localizing Smart Camera Networks," by C. J. Taylor,
which is incorporated herein by reference.
[0068] In any of these applications, the real time segmentation
scheme being used as a pre-processing step which would suggest
possible groupings in the image to higher level interpretation
algorithms. In this way a person of ordinary skill in the art could
use this segmentation scheme as a preprocessing stage to any
preferred image processing technique suitable for the application.
The system could, for instance, focus its attention on regions
based on their, size, shape, texture or position in the image.
[0069] While the segmentation algorithm has been discussed and
implemented in the context of color descriptors, it could equally
easily be applied to feature spaces with higher dimension and can
involve any combination of features, including both color and
texture values and/or motion vectors.
[0070] It should be readily apparent that the image processing
techniques taught herein are suitable for execution in a computing
environment that includes at least one processor. Suitable
processing environments can include Intel, PowerPC, ARM, or other
CPU-based systems having memory and a processor, but can also
include any suitable embedded systems, DSP, GPU, APU, or other
multi-core processing environment including related hardware and
memory. Similarly, the algorithms taught herein can be implemented
by dedicated logic. Similarly, execution of these algorithm and
techniques is not limited to a single processor environment, and
can, in some contemplated embodiments, be performed in a client
server environment, a cloud computing environment, multicore
environment, multithreaded environment, mobile device or devices,
etc.
[0071] Although the invention has been described with reference to
exemplary embodiments, it is not limited thereto. Those skilled in
the art will appreciate that numerous changes and modifications may
be made to the preferred embodiments of the invention and that such
changes and modifications may be made without departing from the
true spirit of the invention. It is therefore intended that the
appended claims be construed to cover all such equivalent
variations as fall within the true spirit and scope of the
invention.
* * * * *
References