U.S. patent application number 11/070,613 was filed with the patent office on 2005-03-02 and published on 2005-10-20 as publication number 20050232491 for a method and apparatus for differentiating pedestrians, vehicles, and other objects.
The invention is credited to Theodore Armand Camus and Peng Chang.
United States Patent Application 20050232491
Kind Code: A1
Chang, Peng; et al.
October 20, 2005
Method and apparatus for differentiating pedestrians, vehicles, and
other objects
Abstract
A method and apparatus for classifying an object in an image is
disclosed. Edges of an object are detected within a region of
interest. Edge analysis is performed on a plurality of sub-regions
within the region of interest to generate an edge score. The object
is classified based on the edge score.
Inventors: Chang, Peng (West Windsor, NJ); Camus, Theodore Armand (Marlton, NJ)
Correspondence Address: MOSER, PATTERSON & SHERIDAN, LLP / SARNOFF CORPORATION, 595 SHREWSBURY AVENUE, SUITE 100, SHREWSBURY, NJ 07702, US
Family ID: 34922707
Appl. No.: 11/070613
Filed: March 2, 2005
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
60/549,203 | Mar 2, 2004 |
60/605,339 | Aug 27, 2004 |
Current U.S. Class: 382/199; 382/103
Current CPC Class: G06T 2207/10016 20130101; G06K 9/4642 20130101; G06T 7/11 20170101; G06K 9/00362 20130101; G06T 2207/30261 20130101
Class at Publication: 382/199; 382/103
International Class: G06K 009/00; G06K 009/48
Claims
1. A method of classifying an object in an image, comprising:
detecting edges of said object within a region of interest;
performing edge analysis on a plurality of sub-regions within said
region of interest to generate an edge score; and classifying said
object based on said edge score.
2. The method of claim 1, wherein detecting said edges comprises
performing Canny edge detection.
3. The method of claim 1, wherein said plurality of sub-regions
comprise a top region, a bottom region, a left region, and a right
region.
4. The method of claim 1, wherein said plurality of sub-regions
comprise a head region, a left upper body region, a right upper
body region, a left lower body region, and a right lower body
region.
5. The method of claim 1, wherein said edge analysis determines an edge energy for each sub-region.
6. The method of claim 5, wherein said edge score comprises a sum
of the edge energy for each sub-region.
7. The method of claim 1, wherein said object is classified in
accordance with a threshold for said edge score.
8. An apparatus for classifying an object in an image, comprising:
means for detecting edges of said object within a region of
interest; means for performing edge analysis on a plurality of
sub-regions within said region of interest to generate an edge
score; and means for classifying said object based on said edge
score.
9. The apparatus of claim 8, wherein detecting said edges comprises
performing Canny edge detection.
10. The apparatus of claim 8, wherein said plurality of sub-regions
comprise a top region, a bottom region, a left region, and a right
region.
11. The apparatus of claim 8, wherein said plurality of sub-regions
comprise a head region, a left upper body region, a right upper
body region, a left lower body region, and a right lower body
region.
12. The apparatus of claim 8, wherein said edge analysis determines an edge energy for each sub-region.
13. The apparatus of claim 12, wherein said edge score comprises a
sum of the edge energy for each sub-region.
14. The apparatus of claim 8, wherein said object is classified in
accordance with a threshold for said edge score.
15. A computer-readable medium having stored thereon a plurality of
instructions, the plurality of instructions including instructions
which, when executed by a processor, cause the processor to perform
the steps of a method of classifying an object in an image,
comprising: detecting edges of said object within a region of
interest; performing edge analysis on a plurality of sub-regions
within said region of interest to generate an edge score; and
classifying said object based on said edge score.
16. The computer-readable medium of claim 15, wherein detecting
said edges comprises performing Canny edge detection.
17. The computer-readable medium of claim 15, wherein said
plurality of sub-regions comprise a top region, a bottom region, a
left region, and a right region.
18. The computer-readable medium of claim 15, wherein said
plurality of sub-regions comprise a head region, a left upper body
region, a right upper body region, a left lower body region, and a
right lower body region.
19. The computer-readable medium of claim 15, wherein said edge analysis determines an edge energy for each sub-region.
20. The computer-readable medium of claim 15, wherein said object
is classified in accordance with a threshold for said edge score.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. provisional patent application Nos. 60/549,203, filed Mar. 2, 2004, and 60/605,339, filed Aug. 27, 2004, which are herein incorporated by reference.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention relates to artificial or computer
vision systems, e.g., vehicular vision systems. In particular, this
invention relates to a method and apparatus for detecting
automobiles and pedestrians in a manner that facilitates collision
avoidance.
[0004] 2. Description of the Related Art
[0005] Collision avoidance systems utilize a sensor system for
detecting objects in front of an automobile or other form of
vehicle or platform. In general, a platform can be any of a wide
range of bases, including a boat, a plane, an elevator, or even a
stationary dock or floor. The sensor system may include radar, an
infrared sensor, or another detector. In any event, the sensor
system generates a rudimentary image of the scene in front of the
vehicle. By processing that imagery, objects can be detected.
Collision avoidance systems generally identify when an object is in
front of a vehicle, but usually do not classify the object or
provide any information regarding the movement of the object.
[0006] Therefore, there is a need in the art for a method and
apparatus that provides for differentiating detected objects.
SUMMARY OF THE INVENTION
[0007] The present invention describes a method and apparatus for
classifying an object in an image. In one embodiment, edges of an
object are detected within a region of interest. Edge analysis is
performed on a plurality of sub-regions within the region of
interest to generate an edge score. The object is classified based
on the edge score.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] So that the manner in which the above recited features of
the present invention are attained and can be understood in detail,
a more particular description of the invention, briefly summarized
above, may be had by reference to the embodiments thereof which are
illustrated in the appended drawings.
[0009] It is to be noted, however, that the appended drawings
illustrate only typical embodiments of this invention and are
therefore not to be considered limiting of its scope, for the
invention may admit to other equally effective embodiments.
[0010] FIG. 1 depicts one embodiment of a schematic view of a
vehicle utilizing the present invention;
[0011] FIG. 2 depicts a block diagram of a vehicular vision system
in accordance with one embodiment of the present invention;
[0012] FIG. 3 depicts a block diagram of functional modules of the
vision system of FIG. 2 in accordance with one embodiment of the
present invention;
[0013] FIG. 4 illustrates a flow diagram in accordance with a
method of the present invention;
[0014] FIG. 5 illustrates a car located within a region of interest
in accordance with one embodiment of the present invention;
[0015] FIG. 6 illustrates an isometric original edge map in
accordance with one embodiment of the present invention;
[0016] FIG. 7 illustrates a vertical edge map in accordance with
one embodiment of the present invention;
[0017] FIG. 8 illustrates a horizontal edge map in accordance with
one embodiment of the present invention;
[0018] FIG. 9 illustrates a pedestrian located within a region of
interest in accordance with one embodiment of the present
invention;
[0019] FIG. 10 illustrates an isometric original edge map in
accordance with one embodiment of the present invention;
[0020] FIG. 11 illustrates a vertical edge map in accordance with
one embodiment of the present invention;
[0021] FIG. 12 illustrates a car model in accordance with one
embodiment of the present invention;
[0022] FIG. 13 illustrates a pedestrian model in accordance with
one embodiment of the present invention;
[0023] FIG. 14 illustrates detected edges of a car in accordance
with one embodiment of the present invention; and
[0024] FIG. 15 illustrates detected edges of a pedestrian in
accordance with one embodiment of the present invention.
DETAILED DESCRIPTION
[0025] The present invention discloses, in one embodiment, a method and apparatus for classifying an object in a region of interest based
on one or more features of the object. Detection and classification
of pedestrians, vehicles, and other objects are important, e.g.,
for automotive safety devices, since these devices may deploy in a
particular fashion only if a target of the particular type (i.e.,
pedestrian or car) is about to be impacted. In particular, measures
employed to mitigate the injury to a pedestrian may be very
different from those employed to mitigate damage and injury from a
vehicle-to-vehicle collision.
[0026] FIG. 1 depicts a schematic diagram of a vehicle 100 having a
target differentiation system 102 that differentiates a pedestrian
(or pedestrians) 110 within a scene 104 that is proximate the
vehicle 100. It should be understood that target differentiation
system 102 is operable to detect pedestrians, automobiles, or other
objects. While in the illustrated embodiment scene 104 is in front
of vehicle 100, other object detection systems may image scenes
that are behind or to the side of vehicle 100. Furthermore, target
differentiation system 102 need not be related to a vehicle, but
can be used with any type of platform, such as a boat, a plane, an
elevator, or even stationary streets, docks, or floors. Target
differentiation system 102 comprises a sensor array 106 that is
coupled to an image processor 108. The sensors within the sensor
array 106 have a field of view that includes one or more
targets.
[0027] The field of view in a practical object detection system 102 may be ±12 meters horizontally in front of the vehicle 100 (e.g., approximately 3 traffic lanes), with a ±3 meter vertical area, and have a view depth of approximately 5-40 meters. (Other
fields of view and ranges are possible, depending on camera optics
and the particular application.) Therefore, it should be understood
that the present invention can be used in a pedestrian detection
system or as part of a collision avoidance system.
[0028] FIG. 2 depicts a block diagram of hardware used to implement
the target differentiation system 102. The sensor array 106
comprises, for example, a pair of cameras 200 and 202. In some
applications an optional secondary sensor 204 can be included. The
secondary sensor 204 may be radar, a light detection and ranging
(LIDAR) sensor, an infrared range finder, a sound navigation and
ranging (SONAR) sensor, and the like. The cameras 200 and 202
generally operate in the visible wavelengths, but may be augmented
with infrared sensors, or the cameras may themselves operate in the
infrared range. The cameras have a known, fixed relation to one
another such that they can produce a stereo image of the scene 104.
Therefore, the cameras 200 and 202 will sometimes be referred to
herein as stereo cameras.
[0029] Still referring to FIG. 2, the image processor 108 comprises
an image preprocessor 206, a central processing unit (CPU) 210,
support circuits 208, and memory 212. The image preprocessor 206
generally comprises circuitry for capturing, digitizing and
processing the imagery from the sensor array 106. The image
preprocessor may be a single chip video processor such as the Acadia I™ processor manufactured by Pyramid Vision Technologies of Princeton, N.J.
[0030] The processed images from the image preprocessor 206 are
coupled to the CPU 210. The CPU 210 may comprise any one of a
number of presently available high speed microcontrollers or
microprocessors. CPU 210 is supported by support circuits 208 that
are generally well known in the art. These circuits include cache,
power supplies, clock circuits, input-output circuitry, and the
like. Memory 212 is also coupled to CPU 210. Memory 212 stores
certain software routines that are retrieved from a storage medium,
e.g., an optical disk, and the like, and that are executed by CPU
210 to facilitate operation of the present invention. Memory 212
also stores certain databases 214 of information that are used by
the present invention, and image processing software 216 that is
used to process the imagery from the sensor array 106. Although the
present invention is described in the context of a series of method
steps, the method may be performed in hardware, software, or some
combination of hardware and software (e.g., an ASIC). Additionally,
the methods as disclosed can be stored on a computer readable
medium.
[0031] FIG. 3 is a functional block diagram of modules that are
used to implement the present invention. The stereo cameras 200 and
202 provide stereo imagery to a stereo image preprocessor 300. The
stereo image preprocessor is coupled to a depth map generator 302
which is coupled to a target processor 304. Depth map generator 302
may be utilized to define a region of interest (ROI), i.e., an area
of the image that potentially contains a target 110. In some
applications the depth map generator 302 is not used. In
applications where depth map generator 302 is not used, ROIs would
be determined using image-based methods. The following will
describe the functional block diagrams under the assumption that a
depth map generator 302 is used. The target processor 304 receives
information from a target template database 306 and from the
optional secondary sensor 204. The stereo image preprocessor 300
calibrates the stereo cameras, captures and digitizes imagery,
warps the images into alignment, performs pyramid wavelet
decomposition, and performs stereo matching, which is generally
well known in the art, to create disparity images at different
resolutions.
[0032] For both hardware and practical reasons, creating disparity
images having different resolutions is beneficial when detecting
objects. Calibration provides for a reference point and direction
from which all distances and angles are determined. Each of the
disparity images contains the point-wise motion from the left image
to the right image and each corresponds to a different image
resolution. The greater the computed disparity of an imaged object,
the closer the object is to the sensor array.
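By way of illustration, the following is a minimal sketch of the multi-resolution disparity step, assuming rectified grayscale stereo images; OpenCV's block matcher stands in for the stereo matching described above, and the function name and parameter values are illustrative assumptions rather than the patent's implementation. It also makes the closing observation concrete: under the standard stereo relation Z = fB/d (focal length f, baseline B), depth falls as disparity grows.

```python
# A minimal sketch of multi-resolution disparity computation, assuming
# rectified 8-bit grayscale stereo images. cv2.StereoBM is a stand-in
# for the stereo matcher described above; parameter values are
# illustrative, not the patent's.
import cv2
import numpy as np

def disparity_pyramid(left, right, levels=3):
    """Return one disparity image per pyramid level (fine to coarse)."""
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparities = []
    for _ in range(levels):
        # StereoBM returns fixed-point disparities scaled by 16.
        disp = matcher.compute(left, right).astype(np.float32) / 16.0
        disparities.append(disp)
        # Downsample both views to the next, coarser resolution.
        left, right = cv2.pyrDown(left), cv2.pyrDown(right)
    return disparities
```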
[0033] The depth map generator 302 processes the multi-resolution
disparity images into a two-dimensional depth image. The depth
image (also referred to as a depth map) contains image points or
pixels in a two dimensional array, where each point represents a
specific distance from the sensor array to a point within the scene.
The depth image is then processed by the target processor 304
wherein templates (models) of typical objects encountered by the
vision system are compared to the information within the depth
image. As described below, the template database 306 comprises
templates of objects (e.g., automobiles, pedestrians) located at
various positions and depths with respect to the sensor array.
[0034] An exhaustive search of the template database may be
performed to identify a template that most closely matches the
present depth image. The secondary sensor 204 may provide
additional information regarding the position of the object
relative to the vehicle, velocity of the object, size or angular
width of the object, etc., such that the target template search
process can be limited to templates of objects at about the known
position relative to the vehicle. If the secondary sensor is a
radar sensor, the sensor can, for example, provide an estimate of
both object position and distance. The target processor 304
produces a target list that is then used to identify target size
and classification estimates that enable target tracking and the
identification of each target's position, classification and
velocity within the scene. That information may then be used to
avoid collisions with each target or perform pre-crash alterations
to the vehicle to mitigate or eliminate damage (e.g., lower or
raise the vehicle, deploy air bags, and the like).
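The exhaustive template search can be pictured with a short, hedged sketch. The database layout below (a label, an image position, and a depth patch per template) is a hypothetical structure chosen for illustration, and mean absolute depth difference is just one plausible match score; the patent specifies neither.

```python
# A hedged sketch of the exhaustive search of the template database
# against the depth image. The (label, x, y, patch) layout and the
# mean-absolute-difference score are illustrative assumptions.
import numpy as np

def match_templates(depth_map, template_db):
    """template_db: iterable of (label, x, y, depth_patch) tuples."""
    best = None
    for label, x, y, patch in template_db:
        h, w = patch.shape
        window = depth_map[y:y + h, x:x + w]
        if window.shape != patch.shape:
            continue  # template extends past the image boundary
        score = np.mean(np.abs(window - patch))
        if best is None or score < best[0]:
            best = (score, label, x, y)
    return best  # (score, label, x, y) of the closest-matching template
```

A secondary sensor, as noted above, would simply shrink template_db to entries near the sensed position before the loop runs.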
[0035] FIG. 4 depicts a flow diagram of a method 400 for verifying
an object in an image. The method 400 begins at step 405 and
proceeds to step 410. In step 410, edges are detected for an object
within a region of interest (ROI). The present invention describes the use of a depth-based method to find an ROI where a target 110 may be located; however, ROIs may also be determined using image-based methods.
[0036] In one embodiment, edge detection may be performed for a
car. FIG. 5 illustrates a car located within a region of interest
defined by box 505. Canny edge detection is performed on the
original image. An isometric original edge map produced by the
Canny edge detector is shown in FIG. 6. The edge detector then
determines left and right boundaries of the car body as shown in
the vertical edge map of FIG. 7. The edge detector also determines
the top and bottom boundaries of the car body as shown in the
horizontal edge map of FIG. 8.
[0037] In one embodiment, edge detection may be performed for a
pedestrian. FIG. 9 illustrates a pedestrian located within a region
of interest defined by box 905. Canny edge detection is performed
on the original image. An isometric original edge map produced by
the Canny edge detector is shown in FIG. 10. For pedestrian cases, the edge map is searched for the parts of a human model, such as the head and the upper and lower torso. The upper and lower torso correspond to the left and right upper body boundaries and the left and right lower body boundaries, respectively, as shown in the vertical edge map of FIG. 11.
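A minimal sketch of producing the three edge maps used in both the car and pedestrian cases follows. The patent does not state how the horizontal and vertical maps are derived from the Canny output, so splitting edge pixels by local gradient orientation (via Sobel filters) is an assumption, and the Canny thresholds are illustrative.

```python
# A minimal sketch of the three edge maps: isometric original,
# horizontal, and vertical. Splitting Canny edge pixels by gradient
# orientation is an assumption; thresholds are illustrative.
import cv2
import numpy as np

def edge_maps(gray_roi):
    edges = cv2.Canny(gray_roi, 50, 150)   # isometric original edge map
    gx = cv2.Sobel(gray_roi, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray_roi, cv2.CV_32F, 0, 1)
    # A mostly vertical gradient marks a horizontal edge, and vice versa.
    horizontal = (edges > 0) & (np.abs(gy) >= np.abs(gx))
    vertical = (edges > 0) & (np.abs(gx) > np.abs(gy))
    # Binary maps: 1 marks a detected edge, 0 marks no edge (see below).
    return [m.astype(np.uint8) for m in (edges > 0, horizontal, vertical)]
```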
[0038] In step 415, edge analysis is performed on a plurality of
sub-regions within the region of interest to generate an edge
score. A model based approach is utilized to detect objects. FIGS.
12 and 13 illustrate the car model 1200 and the pedestrian model
1300 used for classification, respectively. It should be apparent
to one having skill in the art that similar models may be designed
for other objects. Three types of edge maps are computed from the original images: isometric original, horizontal, and vertical. In
one embodiment, the edge maps are represented as binary images;
i.e., each pixel in the edge image set to "1" represents a detected
edge, and each pixel set to "0" represents no edge found at that
location in the original image.
[0039] Referring to FIGS. 12 and 13, the edge strength is computed
in each of the edge boxes, e.g., solid rectangles 1205, 1210, 1215,
1220, 1305, 1310, 1315, 1320, 1325. Each solid rectangle 1205,
1210, 1215, 1220, 1305, 1310, 1315, 1320, 1325 is shifted around
its local neighborhood to find the maximum output. The dashed
rectangles 1225, 1230, 1235, 1240, 1330, 1335, 1340, 1345, 1350 are
the search regions for the edge sum boxes. Boxes 1205, 1210 are
computed in the horizontal edge maps. Boxes 1215, 1220, 1310, 1315,
1320, 1325 are computed in the vertical edge maps. Box 1305 is
computed in the original isometric edge map. Each box sum is the
sum of the edge map regions, normalized by the area of the box in
question.
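One box-sum measurement might look like the sketch below: the solid edge box is slid over its dashed search region and the largest area-normalized sum of binary edge pixels is kept. The function name and the search-region encoding are illustrative assumptions.

```python
# A hedged sketch of one BoxSum measurement: slide the solid edge box
# over its dashed search region (FIGS. 12 and 13) and keep the maximum
# sum of binary edge pixels, normalized by the box area.
import numpy as np

def max_box_sum(edge_map, box_w, box_h, search):
    """search: (x0, y0, x1, y1) bounds of the dashed search region."""
    x0, y0, x1, y1 = search
    area = float(box_w * box_h)
    best = 0.0
    for y in range(y0, y1 - box_h + 1):
        for x in range(x0, x1 - box_w + 1):
            s = edge_map[y:y + box_h, x:x + box_w].sum() / area
            best = max(best, s)
    return best
```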
[0040] An edge score may be determined from the edge analysis. The edge score is a summation of the individual edge scores of each edge box and is determined in accordance with the following formula: $\mathrm{EdgeScore}_i = \sum_{k} \mathrm{BoxSum}_k$
[0041] where $i$ represents the type of model used, $k$ ranges over the edge boxes of that model, and $\mathrm{BoxSum}_k$ represents the edge strength of edge box $k$.
[0042] FIG. 14 illustrates the detected left 1405, right 1410, top 1415, and bottom 1420 boundaries of a car. The left 1405, right 1410, top 1415, and bottom 1420 of the car body are defined as the locations with the highest edge point density. As stated above, the car detector returns a score, which is the sum of the edge density in the four regions.
[0043] FIG. 15 illustrates the detected head 1505, upper body
boundaries 1510, 1515 and lower body boundaries 1520, 1525 of a
pedestrian. The upper body boundaries correspond to the arms, and the lower body boundaries correspond to the legs, of the pedestrian.
The head, upper body boundaries, and lower body boundaries of the
pedestrian are defined as the locations with the highest edge point
density. As stated above, the pedestrian detector returns a score,
which is the sum of the edge density in the five regions.
[0044] In step 425, the object is classified based on the edge
score. In one embodiment, the object is classified in accordance
with a threshold for the edge score. The thresholds for each target
type, e.g., vehicle or pedestrian, are typically determined
empirically or by a learning process. In one embodiment, a
threshold of 1.6 is used for the pedestrian class and 1.8 is used
for the vehicle class.
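Putting the pieces together, the following is a minimal sketch of this classification step using the example thresholds quoted above; box_sums would hold the maximized BoxSum values for a model's edge boxes (four for the car model of FIG. 12, five for the pedestrian model of FIG. 13).

```python
# A minimal sketch of threshold-based classification with the example
# thresholds quoted above; a deployed system would tune these
# empirically or by a learning process.
THRESHOLDS = {"pedestrian": 1.6, "vehicle": 1.8}

def classify(box_sums, model):
    """box_sums: maximized BoxSum values for the model's edge boxes."""
    edge_score = sum(box_sums)  # EdgeScore_i = sum over k of BoxSum_k
    return edge_score >= THRESHOLDS[model]
```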
[0045] While the foregoing is directed to embodiments of the
present invention, other and further embodiments of the invention
may be devised without departing from the basic scope thereof, and
the scope thereof is determined by the claims that follow.
* * * * *