U.S. patent application number 14/852806 was filed with the patent office on 2015-09-14 and published on 2017-03-16 as publication number 20170075356, for classifying objects detected by 3D sensors for autonomous vehicle operation. The applicant listed for this patent is Toyota Motor Engineering & Manufacturing North America, Inc. The invention is credited to Michael J. Delp.
Application Number | 14/852806 |
Publication Number | 20170075356 |
Document ID | / |
Family ID | 58017693 |
Filed Date | 2015-09-14 |
Publication Date | 2017-03-16 |
United States Patent Application | 20170075356 |
Kind Code | A1 |
Inventor | Delp; Michael J. |
Publication Date | March 16, 2017 |
CLASSIFYING OBJECTS DETECTED BY 3D SENSORS FOR AUTONOMOUS VEHICLE OPERATION
Abstract
A method of autonomous driving includes generating, with a 3D
sensor, 3D points representing objects in the environment
surrounding a vehicle. The method further includes, with a
computing device, identifying, from the 3D points, a temporal
series of clusters of 3D points representing the same object in the
environment surrounding the vehicle as a track, identifying
cluster-based classifiers for the object based on identified local
features for the clusters in the track, identifying track-based
classifiers for the object based on identified global features for
the track, combining the cluster-based classifiers and the
track-based classifiers to classify the object, with the
cluster-based classifiers being weighted based on an amount of
information on the clusters from which they are identified, and
with the weight increasing with increasing amounts of information,
and driving the vehicle along a route based on the object's
classification.
Inventors: | Delp; Michael J. (Ann Arbor, MI) |
Applicant:
Name | City | State | Country | Type |
Toyota Motor Engineering & Manufacturing North America, Inc. | Erlanger | KY | US | |
Family ID: | 58017693 |
Appl. No.: | 14/852806 |
Filed: | September 14, 2015 |
Current U.S. Class: | 1/1 |
Current CPC Class: | G05D 1/0242 20130101; G06K 9/00805 20130101; G05D 2201/0213 20130101; G05D 1/0251 20130101; G06K 9/00201 20130101; G05D 1/024 20130101 |
International Class: | G05D 1/02 20060101 G05D001/02; G06K 9/00 20060101 G06K009/00 |
Claims
1. A method of autonomous driving, comprising: generating, with a
3D sensor, 3D points representing objects in an environment
surrounding a vehicle; identifying, with a computing device, from
the 3D points, a temporal series of clusters of 3D points
representing the same object in the environment surrounding the
vehicle as a track; identifying, with the computing device,
cluster-based classifiers for the object based on identified local
features for the clusters in the track; identifying, with the
computing device, track-based classifiers for the object based on
identified global features for the track; combining, by using the
computing device, the cluster-based classifiers and the track-based
classifiers to classify the object, with the cluster-based
classifiers being weighted based on an amount of information on the
clusters from which they are identified, and with the weight
increasing with increasing amounts of information; and driving the
vehicle, using the computing device, along a route based on the
object's classification.
2. The method of autonomous driving of claim 1, wherein the amount
of information on the clusters is an amount of 3D points in the
clusters from which the cluster-based classifiers are
identified.
3. The method of autonomous driving of claim 1, wherein when the
amount of information on the cluster from one of the cluster-based
classifiers is below a threshold, the cluster-based classifier is
weighted to zero.
4. The method of autonomous driving of claim 1, wherein each
cluster-based classifier includes a prediction of which of a
plurality of object classes the object belongs to.
5. The method of autonomous driving of claim 1, wherein each
cluster-based classifier includes at least one of a one-vs-all
log-odds that the object belongs to one of a plurality of object
classes or a probability that the object belongs to one of the
plurality of object classes.
6. The method of autonomous driving of claim 1, wherein each
track-based classifier includes a prediction of which of a
plurality of object classes the object belongs to.
7. The method of autonomous driving of claim 1, wherein each
track-based classifier includes at least one of a one-vs-all
log-odds that the object belongs to one of a plurality of object
classes or a probability that the object belongs to one of the
plurality of object classes.
8. The method of autonomous driving of claim 1, wherein the
combination of the track-based classifiers and the weighted
cluster-based classifiers includes a probability of which of a
plurality of object classes the object belongs to.
9. The method of autonomous driving of claim 1, wherein the
combination of the track-based classifiers and the weighted
cluster-based classifiers includes at least one of a one-vs-all
log-odds that the object belongs to one of a plurality of object
classes or a probability that the object belongs to one of the
plurality of object classes.
10. The method of autonomous driving of claim 1, further
comprising: identifying, with the computing device, the local
features based on the clusters in the track.
11. The method of autonomous driving of claim 10, wherein the local
features are identified based on an appearance of the clusters in
the track.
12. The method of autonomous driving of claim 1, further
comprising: identifying, with the computing device, the global
features based on the track.
13. The method of autonomous driving of claim 12, wherein the
global features are identified based on the motion of the clusters
in the track.
14. The method of autonomous driving of claim 1, further
comprising: for each cluster in the track, with the computing
device: identify a bounding box of the cluster; and identify the
height, width and length of the identified bounding box as a local
feature for the cluster.
15. The method of autonomous driving of claim 1, further
comprising: for each cluster in the track, with the computing
device: identify a bounding box of the cluster; and identify the
volume of the identified bounding box as a local feature for the
cluster.
16. The method of autonomous driving of claim 1, further
comprising: for each cluster in the track, with the computing
device: identify a bounding box of the cluster; identify a centroid
of the identified bounding box; and identify a distance to the
identified centroid of the identified bounding box as a local
feature for the cluster.
17. The method of autonomous driving of claim 1, further
comprising: for the track, with the computing device: identify a
velocity of the clusters in the track; and identify the identified
velocity as a global feature for the track.
18. The method of autonomous driving of claim 1, further
comprising: for the track, with the computing device: identify an
acceleration of the clusters in the track; and identify the
identified acceleration as a global feature for the track.
Description
TECHNICAL FIELD
[0001] The embodiments disclosed herein generally relate to
autonomous operation systems for vehicles and, more particularly,
to the classification of objects detected by 3D sensors in
autonomous operation systems.
BACKGROUND
[0002] Some vehicles include an autonomous operation system with an
operational mode in which the vehicle is driven along a travel
route with minimal or no input from a human driver. In these
vehicles, the autonomous operation system is configured to detect
information about the environment surrounding the vehicle,
including the presence of objects, and process the detected
information in order to plan how to drive the vehicle along a
travel route while avoiding the objects.
[0003] In real world traffic situations, as a part of this
detection and planning, it is desirable for the autonomous
operation system to classify the objects in the environment
surrounding the vehicle in order to account not only for other
vehicles, but also for pedestrians and bicycles, among other
objects. Improving the autonomous operation system's classification
of the objects in the environment surrounding the vehicle is the
subject of ongoing research.
SUMMARY
[0004] Disclosed herein are systems and methods for autonomous
driving. In one aspect, a method of autonomous driving includes
generating, with a 3D sensor, 3D points representing objects in the
environment surrounding a vehicle. The method further includes,
with a computing device, identifying, from the 3D points, a
temporal series of clusters of 3D points representing the same
object in the environment surrounding the vehicle as a track,
identifying cluster-based classifiers for the object based on
identified local features for the clusters in the track,
identifying track-based classifiers for the object based on
identified global features for the track, combining the
cluster-based classifiers and the track-based classifiers to
classify the object, with the cluster-based classifiers being
weighted based on an amount of information on the clusters from
which they are identified, and with the weight increasing with
increasing amounts of information, and driving the vehicle along a
route based on the object's classification.
[0005] These and other aspects will be described in additional
detail below.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] The various features, advantages and other uses of the
present embodiments will become more apparent by referring to the
following detailed description and drawings, in which:
[0007] FIG. 1 is a schematic representation of a vehicle including
an autonomous operation system whose operation is supported by a 3D
sensor;
[0008] FIG. 2 is a schematic representation of the system
architecture of a detection module for the autonomous operation
system;
[0009] FIG. 3 is a flowchart showing the operations of a process
for a classifier thread in the detection module;
[0010] FIGS. 4A and 4B show example estimations of the principal
direction of a cluster of 3D points representing a vehicle;
[0011] FIG. 5 shows example spin images from clusters of 3D points
representing different objects;
[0012] FIGS. 6A-C show example virtual orthographic images of a
cluster of 3D points representing a vehicle;
[0013] FIG. 7 is an example graphical model encoding the
probabilistic independencies between local features for a track's
clusters of 3D points and global features for the track;
[0014] FIG. 8 shows aspects of an example weighting factor for a
cluster-based classifier; and
[0015] FIG. 9 shows aspects of a classifier confidence over time
for a track.
DETAILED DESCRIPTION
[0016] This disclosure teaches a vehicle that includes an
autonomous operation system that, in operation, classifies objects
in the environment surrounding the vehicle represented by temporal
series of clusters of 3D points, and drives the vehicle along a
route based on the objects' classifications. The autonomous
operation system classifies the objects based on both a track-based
classifier and cluster-based classifiers, but weighs the
cluster-based classifiers based on the amount of information on the
clusters from which they are identified.
[0017] FIG. 1 shows a vehicle 10 including an autonomous operation
system 20 whose operation is supported by a LIDAR sensor 22 and one
or more optional auxiliary sensors 24. The LIDAR sensor 22 and the
auxiliary sensors 24 are mounted on the vehicle 10 and positioned
to have fields of view in the environment surrounding the vehicle
10. Although the vehicle 10 is provided as a non-limiting example
of a mobile platform, it will be understood that the autonomous
operation system 20 could be implemented in other mobile platforms.
Additionally, although the LIDAR sensor 22 is provided as a
non-limiting example of a 3D sensor, it will be understood that
this description is applicable in principle to other 3D
sensors.
[0018] The LIDAR sensor 22 is configured to scan the environment
surrounding the vehicle 10 and generate signals, including but not
limited to 3D points, representing the objects in the environment
surrounding the vehicle 10.
[0019] Generally, the LIDAR sensor 22 can include a transmitter and
a receiver. The transmitter can be a component or group of
components operable to transmit laser signals (e.g., laser light
energy). As an example, the transmitter may be a laser, laser
rangefinder, LIDAR, and/or laser scanner. The laser signals may
have any suitable characteristics. In one or more arrangements, the
laser signals may be from any suitable portion of the
electromagnetic spectrum, such as from the ultraviolet, visible, or
near infrared portions of the electromagnetic spectrum. The laser
signals may be eye safe.
[0020] The laser signals may be transmitted into the environment
surrounding the vehicle 10, where they impinge upon objects therein
that are located in the path of the laser signals. The laser
signals may be transmitted in series of 360 degree spins around a
vertical Z axis of the vehicle 10, for example. Generally, when the
laser signals impinge upon an object, a portion of the laser
signals is returned (e.g., by reflection) to the LIDAR sensor 22.
The returned portion of the laser signals can be captured at the
LIDAR sensor 22 by its receiver, which may be, or include, one or
more photodetectors, solid state photodetectors, photodiodes or
photomultipliers, or any combination of these.
[0021] Responsive to capturing the returned laser signals, the
LIDAR sensor 22 may be configured to output signals representing
objects, or the lack thereof, in the environment surrounding the
vehicle 10. The LIDAR sensor 22 may include a global positioning
system (GPS) or other positioning system for identifying its
position, and an inertial measurement unit (IMU) for identifying
its pose. According to this configuration, the signals may include
3D points representing the location in space of the points from
which the returned laser signals are received, and therefore, the
location in space of points of objects on which the laser signals
impinged. The LIDAR sensor 22 may determine the location in space
of points of objects based on the distance from the LIDAR sensor 22
to the points, as well as the position and pose of the LIDAR sensor
22 associated with the returned laser signals. The distance to the
points may be determined from the returned laser signals using the
time of flight (TOF) method, for instance. The signals may also
represent the locations in space from which no returned laser
signals are received, and therefore, the lack of points of objects
in those locations in space on which the laser signals would
otherwise have impinged.
[0022] The signals may further represent other aspects of the
returned laser signals, which, in turn, may represent other
properties of points of objects on which the incident laser signals
impinged. These aspects of the returned laser signals can include
their intensity or reflectivity, for instance, or any combination
of these.
[0023] The auxiliary sensors 24 may also be configured to scan the
environment surrounding the vehicle 10 and generate signals
representing objects, or the lack thereof, in the environment
surrounding the vehicle 10.
[0024] The auxiliary sensors 24 may have fields of view
individually, or collectively, common to the field of view of the
LIDAR sensor 22 in the environment surrounding the vehicle 10.
Generally, the auxiliary sensors 24 can be, or include, one or more
image sensors configured for capturing light or other
electromagnetic energy from the environment surrounding the vehicle
10. These image sensors may be, or include, one or more
photodetectors, solid state photodetectors, photodiodes or
photomultipliers, or any combination of these. Optionally, the
environment can be illuminated by the transmitter of the LIDAR
sensor 22. Responsive to capturing light or other electromagnetic
energy, the auxiliary sensors 24 may be configured to output
signals representing objects, or the lack thereof, in the
environment surrounding the vehicle 10.
[0025] The vehicle 10 includes a computing device 30 to which the
LIDAR sensor 22 and the auxiliary sensors 24 are communicatively
connected through one or more communication links 32. Although the
computing device 30 and either or both of the LIDAR sensor 22 and
the auxiliary sensors 24 may be dedicated to the autonomous
operation system 20, it is contemplated that some or all of these
could also support the operation of other systems of the vehicle
10.
[0026] The computing device 30 may include a processor 40
communicatively coupled with a memory 42. The processor 40 may
include any device capable of executing machine-readable
instructions, which may be stored on a non-transitory
computer-readable medium, for example the memory 42. The processor
40 may include a controller, an integrated circuit, a microchip, a
computer, and/or any other computing device. The memory 42 may
include any type of computer readable medium suitable for storing
data and algorithms. For example, the memory 42 may include RAM,
ROM, a flash memory, a hard drive, and/or any device capable of
storing machine-readable instructions.
[0027] The computing device 30 may also include an input/output
interface 44 for facilitating communication between the processor
40 and the LIDAR sensor 22 and the auxiliary sensors 24. Although
the computing device 30 is schematically illustrated as including a
single processor 40 and a single memory 42, in practice the
computing device 30 may include a plurality of components, each
having one or more memories 42 and/or processors 40 that may be
communicatively coupled with one or more of the other components.
The computing device 30 may be a separate standalone unit or may be
configured as a part of a central control system for the vehicle
10.
[0028] The various algorithms and data for the autonomous operation
system 20 and the other systems of the vehicle 10 may reside in
whole or in part in the memory 42 of the computing device 30. In
operation of the autonomous operation system 20, the signals output
by the LIDAR sensor 22 and the auxiliary sensors 24 are stored in
the memory 42. As described in additional detail below, the
algorithms and data for the autonomous operation system 20 include
a detection module 50 and a planning module 52.
[0029] Although the various algorithms and data for the autonomous
operation system 20 are described with reference to the computing
device 30 onboard the vehicle 10 for simplicity, it will be
understood that these may reside in whole or in part in a memory of
a computing device separate from the vehicle 10. In these cases,
the vehicle 10 may also include an integrated mobile communication
system 60 with variously configured communication hardware for
wirelessly transmitting data between the computing device 30 and a
mobile network, such as a cellular network. The mobile
communication system 60 and the mobile network together enable the
computing device 30 to wirelessly communicate with other devices
connected to the mobile network, such as a remote server that may
similarly be, or include, a computing device including one or more
processors and one or more memories, or another vehicle that may
similarly include an object detection system with a computing
device including one or more processors and one or more
memories.
[0030] The mobile communication system 60 of the vehicle 10 may
include an integrated mobile network transceiver 62 configured to
transmit and receive data over the mobile network. The mobile
network transceiver 62 may be communicatively connected to the
computing device 30 through a mobile network transceiver
communication link 64, with the input/output interface 44
facilitating communication between the processor 40 and the memory
42 and the mobile network transceiver 62. The mobile network
transceiver 62 includes a transmitter for wirelessly transferring
data from the computing device 30 to the mobile network and a
receiver for wirelessly transferring data from the mobile network
to the computing device 30.
[0031] The overall operations of performing object detection in the
detection module 50 of the autonomous operation system 20 of the
vehicle 10 are introduced with reference to FIG. 2.
[0032] As shown, the detection module 50 of the autonomous
operation system 20 has a perception and tracking thread and a
classifier thread. In the perception and tracking thread, as the
LIDAR sensor 22 scans the environment surrounding the vehicle 10,
3D points and other signals output by the LIDAR sensor 22 and
representing the objects in the environment surrounding the vehicle
10 are received. A digital map containing a 3D road network with
positions for each lane and associated traffic rules (e.g., speed
limits, the priority of each road at intersections and roundabouts,
and stop line positions) may also be received.
[0033] In the perception and tracking thread, the 3D points
representing the objects in the environment surrounding the vehicle
10 are received over multiple timesteps each corresponding, for
instance, to a 360 degree spin around the vertical Z axis of the
vehicle 10. For each of the timesteps, the 3D points may be
evaluated to discriminate between 3D points representing obstacles
and 3D points representing other objects, such as the ground, and
collected into clusters of 3D points representing respective
objects in the environment surrounding the vehicle 10. The
clustering may implement Markov random field-based clustering, for
example.
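The per-timestep clustering can be illustrated with a short sketch. The text specifies Markov random field-based clustering; the example below substitutes a simple height threshold for ground removal and scikit-learn's DBSCAN as a stand-in for the clustering itself, and all parameter values are illustrative assumptions.

```python
# Minimal sketch of one per-timestep clustering pass. The text describes
# Markov random field-based clustering; DBSCAN is used here only as a
# readily available stand-in, and the ground_z threshold is illustrative.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_obstacle_points(points_xyz, ground_z=0.2, eps=0.5, min_points=10):
    """Split one LIDAR spin into per-object clusters of 3D points.

    points_xyz: (N, 3) array of 3D points from one 360-degree spin.
    Returns a list of (M_i, 3) arrays, one per detected object.
    """
    # Discriminate obstacle points from ground points (crude height test).
    obstacles = points_xyz[points_xyz[:, 2] > ground_z]
    if len(obstacles) == 0:
        return []

    # Group obstacle points into clusters representing individual objects.
    labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(obstacles)
    return [obstacles[labels == k] for k in range(labels.max() + 1)]
```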
[0034] A given cluster of 3D points at one timestep representing an
object in the environment surrounding the vehicle 10 is associated
to clusters of 3D points at previous timesteps representing the
same object in the environment surrounding the vehicle 10. Over
multiple timesteps, a so-called track is generated, which is a
temporal series of clusters of 3D points representing the same
object in the environment surrounding the vehicle 10. The
generation of a track may be implemented by, or from, both particle
and Kalman filtering, for example. The tracks, once generated, are
updated with new clusters of 3D points representing the object in
the environment surrounding the vehicle 10 in subsequent
iterations.
[0035] The perception and tracking thread and the classifier thread
run in parallel and communicate via concurrent result and request
queues. The result queue is used to pass classification results
from the classifier thread to the perception and tracking thread,
while the request queue is filled by the perception and tracking
thread with tracks for which classification results are needed.
[0036] With this configuration, the detection module 50 may
implement an anytime system in which the results of the classifier
thread may be prioritized and integrated into the perception and
tracking thread to ensure output for real-time decision-making to
the planning module 52 despite the analytical costs of the
classifier thread.
[0037] To implement the anytime system, in the perception and
tracking thread, each track may be given a score for prioritization
prior to being inserted into the request queue. Tracks having
clusters of 3D points in front of and in closest proximity to the
vehicle 10 may be given the highest priority, for instance. In
these cases, the score given to each track may simply be the
distance from the vehicle 10 to a track's clusters of 3D points
plus a penalty (e.g., 50 meters) for being behind the vehicle 10,
with the tracks with lower scores being prioritized over those with
higher scores. Alternatively, or additionally, tracks having
clusters of 3D points in certain areas of interest in the
environment surrounding the vehicle 10, or tracks whose clusters of
3D points represent an object with an uncertain classification, for
instance, could be prioritized over other tracks.
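As a concrete illustration of this prioritization, the sketch below scores each track as the distance to its latest cluster plus the example 50-meter penalty for being behind the vehicle, and pushes the tracks onto a priority queue so that lower scores are pulled first. The track fields and the heading-based test for "behind" are assumptions, not details taken from the text.

```python
# Illustrative scoring of tracks for the request queue, following the
# distance-plus-penalty rule described above; field names are assumptions.
import heapq
import numpy as np

BEHIND_PENALTY_M = 50.0  # example penalty for clusters behind the vehicle

def track_priority_score(track, vehicle_xy, vehicle_heading_xy):
    """Lower scores are classified first."""
    centroid_xy = np.mean(track.latest_cluster[:, :2], axis=0)  # assumed field
    offset = centroid_xy - vehicle_xy
    score = float(np.linalg.norm(offset))
    # A cluster counts as "behind" when it lies opposite the heading direction.
    if np.dot(offset, vehicle_heading_xy) < 0.0:
        score += BEHIND_PENALTY_M
    return score

def fill_request_queue(tracks, vehicle_xy, vehicle_heading_xy):
    request_queue = []
    for i, track in enumerate(tracks):
        score = track_priority_score(track, vehicle_xy, vehicle_heading_xy)
        heapq.heappush(request_queue, (score, i, track))  # i breaks ties
    return request_queue
```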
[0038] The classifier thread pulls the highest priority track from
the request queue, identifies a number of features of the track's
clusters of 3D points, identifies classifiers for the track's
clusters of 3D points and classifies the object represented by the
track's clusters of 3D points based on its most likely object
class, as explained in additional detail below, and puts the
classifiers and an object class label reflecting the object's
resulting classification into the result queue.
[0039] After the perception and tracking thread generates a track,
it updates its clusters of 3D points with results of the classifier
thread from the result queue. Then, all unprocessed tracks are
cleared from the request queue, and the next track's clusters of 3D
points are pushed onto it. Then, the perception and tracking thread
sends each of the tracks to the planning module 52. Each of the
tracks has an associated history, so past results of the classifier
thread may be used for tracks not addressed in a given
iteration.
[0040] In general, for a track τ with features x_τ, a classifier may be expressed as the one-vs-all class log-odds

L(x_\tau; c) = \log \frac{P(Y = c \mid x_\tau)}{P(Y \neq c \mid x_\tau)}    (Eq. 1)

where Y is the object class label and c is a discrete object class. As explained in additional detail below, these log-odds can be converted to a probability, with the vector containing probabilities for each object class being

P_\tau = [P(c_p \mid x_\tau), P(c_b \mid x_\tau), P(c_v \mid x_\tau), P(c_{bg} \mid x_\tau)]    (Eq. 2)

using the shorthand

P(c \mid x_\tau) = P(Y = c \mid x_\tau)    (Eq. 2.1)

where the given object classes are the pedestrian (c_p), bicycle (c_b), vehicle (c_v) and background (c_bg) object classes, respectively. This vector, along with the most likely object class for the object represented by the track's clusters of 3D points, as well as typical tracking information such as the position, velocity, and size of the object, is passed to the planning module 52.
[0041] In the autonomous operation of the vehicle 10 by its
autonomous operation system 20, the classification results from the
classifier thread advantageously allow the planning module 52 to
address the range of real world traffic situations otherwise faced
by human drivers, such as interactions between pedestrians,
bicycles and other vehicles. Addressing interactions between the
vehicle 10 and pedestrians and bicycles is particularly important
given the desire to ensure safety for occupants of the vehicle 10,
pedestrians and operators of bicycles, the potential for large
speed differentials and sudden relative lateral motions between the
vehicle 10 and pedestrians and bicycles, and the relative
vulnerability of pedestrians and operators of bicycles. Among other
things, proper classification of these and other objects in the
environment surrounding the vehicle 10 may, for example, provide
information to the planning module 52 used in the determination of
how much leeway to give the objects while they are being passed,
or, in the determination of whether to pass those objects in the
first place.
[0042] The operations of a process 100 for the classifier thread in
the detection module 50 of the autonomous operation system 20 of
the vehicle 10 are shown in FIG. 3.
[0043] As described below, the process 100 culminates in the
combination of cluster-based classifiers identified based on local
features for a track's clusters of 3D points, or cluster features,
and a track-based classifier based on the global features for the
track itself, or holistic features.
[0044] In general, the cluster features are based on the track's
clusters of 3D points, which change from one timestep t to the next
with changing distances, viewpoints and orientation of the LIDAR
sensor 22, among other things. The cluster features may, for
example, correspond in whole or in part to the appearance of the
track's clusters of 3D points. For the track, there is a local, or
cluster, feature set z_{1:T} for timesteps 1 through T.
[0045] In general, the holistic features are higher level summary
statistics of the object represented by the track's clusters of 3D
points. The holistic features may, for example, correspond in whole
or in part to the motion of the object represented by the track's
clusters of 3D points. For the track, there is a single global, or holistic, feature set ω. With both the cluster feature set z_{1:T} and the single holistic feature set ω, the feature set for the track at timestep T is x_T = (z_{1:T}, ω).
[0046] In operation 102, the local features for a track's clusters
of 3D points, or cluster features, are identified, and in operation
104, the global features for the track itself, or holistic
features, are identified. In the process 100, for the track, each
of the resulting feature sets corresponds to a classifier, so there
will be T cluster-based classifiers and one track-based classifier
incorporated into the object's ultimate classification.
[0047] The local features for a track's clusters of 3D points, or
cluster features, may be identified, for instance, from spin images
and histogram of oriented gradients (HOG) features derived from
virtual orthographic images of the track's clusters of 3D points.
In general, this identification requires the track's clusters of 3D
points to be oriented consistently, which can be accomplished by
estimating the principal direction of each of the track's clusters
of 3D points.
[0048] With the vehicle 10 driving on relatively flat ground as the LIDAR sensor 22 scans the environment surrounding the vehicle 10, the Z axis can be assumed to point up, and the principal direction can be searched for in the XY plane. To estimate the principal direction of a given cluster of 3D points, the 3D points may be projected onto the XY plane, and a random sample consensus (RANSAC) search may be run on all of the 3D points to find the direction to which the most 3D points align (e.g., within 10 cm). A threshold of 50% of the 3D points may be used, for example. An example estimation of the principal direction of a cluster of 3D points representing a vehicle when viewed from the side is shown in FIG. 4A, where the principal direction PD is the direction to which the most 3D points align.
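The following sketch illustrates one way the described RANSAC search for the principal direction might look. The 10 cm alignment tolerance and the 50% inlier threshold come from the text; the two-point sampling strategy and the iteration count are assumptions.

```python
# Sketch of the RANSAC-style search for the principal direction in the XY
# plane. The 10 cm tolerance and 50% inlier threshold follow the text; the
# sampling strategy and iteration count are illustrative assumptions.
import numpy as np

def estimate_principal_direction(points_xyz, tol=0.10, min_inlier_frac=0.5,
                                 iterations=200, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    xy = points_xyz[:, :2]                       # project onto the XY plane
    best_dir, best_inliers = None, -1
    for _ in range(iterations):
        a, b = xy[rng.choice(len(xy), size=2, replace=False)]
        d = b - a
        norm = np.linalg.norm(d)
        if norm < 1e-6:
            continue
        d = d / norm
        # Count points whose perpendicular distance to the line through a
        # along direction d is within the tolerance.
        offsets = xy - a
        perp = np.abs(offsets[:, 0] * d[1] - offsets[:, 1] * d[0])
        inliers = int(np.sum(perp < tol))
        if inliers > best_inliers:
            best_dir, best_inliers = d, inliers
    if best_inliers < min_inlier_frac * len(xy):
        return None                              # no dominant direction found
    return best_dir                              # unit vector in the XY plane
```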
[0049] It has been found that this estimation of the principal direction of a cluster of 3D points generally works well for clusters of 3D points representing vehicles and bicycles when viewed from the side, as well as for clusters of 3D points representing pedestrians. As a comparison of FIG. 4A (a vehicle viewed from the side) with FIG. 4B (a vehicle viewed from the back) shows, the estimation sometimes fails for clusters of 3D points representing a vehicle when viewed from the back, since the principal direction PD to which the most 3D points align is perpendicular to the actual orientation of the vehicle. However, if the same estimation of the principal direction is used both in learning and in classification, the learner will be able to consider local features for a track's clusters of 3D points from both views in classification.
[0050] Example spin images for a tree, bicycle, sedan and station wagon are shown in FIG. 5. To generate these and other spin images, a virtual image plane may be spun around the Z axis at the closest 3D points to the center top, front center and side of a given one of the track's clusters of 3D points representing an object, accumulating all of the 3D points hit along the way into bins.
[0051] Example virtual orthographic images of a cluster of 3D
points representing a vehicle when viewed from the front, side and
top of the cluster of 3D points are shown in FIGS. 6A-C,
respectively. Each virtual orthographic image is an orthographic
projection of the cluster of 3D points oriented along the principal
direction of the cluster of 3D points and centered on its bounding
box.
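As a rough illustration of how a virtual orthographic image and its HOG features might be produced, the sketch below rotates a cluster so its principal direction lies along the X axis, rasterizes a side view centered on the cluster's bounding box, and extracts HOG features with scikit-image. The image size, resolution, and HOG parameters are assumptions; the text does not specify them.

```python
# Sketch of building one virtual orthographic image (side view) of a cluster
# oriented along its principal direction, then extracting HOG features.
# Resolution, image size, and HOG parameters are illustrative assumptions.
import numpy as np
from skimage.feature import hog

def side_view_image(points_xyz, principal_dir_xy, res=0.05, size=(64, 64)):
    """Orthographic side-view raster of a cluster, centered on its bounding box."""
    cx, cy = principal_dir_xy
    rot = np.array([[cx, cy], [-cy, cx]])        # rotate principal direction to +X
    xy = points_xyz[:, :2] @ rot.T
    u = xy[:, 0]                                 # along the principal direction
    v = points_xyz[:, 2]                         # height
    u = u - (u.min() + u.max()) / 2.0            # center on the bounding box
    v = v - (v.min() + v.max()) / 2.0
    img = np.zeros(size, dtype=np.float32)
    cols = np.clip((u / res + size[1] // 2).astype(int), 0, size[1] - 1)
    rows = np.clip((size[0] // 2 - v / res).astype(int), 0, size[0] - 1)
    np.add.at(img, (rows, cols), 1.0)            # accumulate point counts
    return img / max(img.max(), 1.0)

def side_view_hog(points_xyz, principal_dir_xy):
    img = side_view_image(points_xyz, principal_dir_xy)
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)
```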
[0052] Other local features, or cluster features, may also be identified for a track's clusters of 3D points. For instance, the estimated principal direction of a cluster of 3D points may be used to orient a bounding box of the cluster of 3D points, and the local features may include, for example, the height, width and length of the bounding box of the cluster of 3D points, as well as the volume of the bounding box of the cluster of 3D points. Additionally, or alternatively, the centroid of the bounding box of the cluster of 3D points may be identified, and the local features may include a distance to the centroid of the bounding box from the LIDAR sensor 22, for instance, or otherwise from the vehicle 10.
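A minimal sketch of these bounding-box features, assuming the principal direction has already been estimated and that the sensor sits at a known position, might look as follows; the function and field names are illustrative.

```python
# Sketch of the cluster-level bounding-box features described above: the
# oriented box's height/width/length, its volume, and the distance from the
# sensor to the box centroid. sensor_xyz and the frame handling are assumptions.
import numpy as np

def bounding_box_features(points_xyz, principal_dir_xy, sensor_xyz=(0.0, 0.0, 0.0)):
    cx, cy = principal_dir_xy
    rot = np.array([[cx, cy, 0.0],
                    [-cy, cx, 0.0],
                    [0.0, 0.0, 1.0]])            # align principal direction with +X
    aligned = points_xyz @ rot.T
    mins, maxs = aligned.min(axis=0), aligned.max(axis=0)
    length, width, height = maxs - mins          # box extents along X, Y, Z
    volume = length * width * height
    centroid_aligned = (mins + maxs) / 2.0
    centroid = centroid_aligned @ rot            # back to the sensor frame
    dist_to_centroid = float(np.linalg.norm(centroid - np.asarray(sensor_xyz)))
    return {"height": float(height), "width": float(width), "length": float(length),
            "volume": float(volume), "distance_to_centroid": dist_to_centroid}
```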
[0053] The global features for a track, or holistic features, may be, or include, a velocity of the track's clusters of 3D points that represents a velocity of the object represented by the track's clusters of 3D points. Accordingly, the global features for a track may include a maximum velocity of the track's clusters of 3D points, for instance. Alternatively, or additionally, the global features for a track may be, or include, an acceleration of the track's clusters of 3D points that represents an acceleration of the object represented by the track's clusters of 3D points. Accordingly, the global features for a track may include a maximum acceleration of the track's clusters of 3D points, for instance. These and other global features for a track may be identified, for example, using a Kalman filter over the centroids of the track's clusters of 3D points.
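The sketch below derives a maximum speed and acceleration from the per-timestep cluster centroids. The text identifies these global features using a Kalman filter; plain finite differences are used here only to keep the illustration short, and the fixed timestep is an assumption.

```python
# Sketch of track-level (holistic) motion features from per-timestep cluster
# centroids, using finite differences as a shortcut for the Kalman filter
# named in the text. dt is an assumed fixed timestep.
import numpy as np

def track_motion_features(centroids_xyz, dt=0.1):
    """centroids_xyz: (T, 3) centroid per timestep; returns max speed/accel."""
    centroids = np.asarray(centroids_xyz, dtype=float)
    if len(centroids) < 3:
        return {"max_speed": 0.0, "max_accel": 0.0}
    velocities = np.diff(centroids, axis=0) / dt          # (T-1, 3)
    accelerations = np.diff(velocities, axis=0) / dt      # (T-2, 3)
    speeds = np.linalg.norm(velocities, axis=1)
    accels = np.linalg.norm(accelerations, axis=1)
    return {"max_speed": float(speeds.max()), "max_accel": float(accels.max())}
```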
[0054] In operation 106, it is learned which local features, or cluster features, and which global features, or holistic features, are predictive of objects belonging to the pedestrian (c_p), bicycle (c_b), vehicle (c_v) and background (c_bg) object classes. FIG. 7 is an example graphical model encoding the probabilistic independencies between the local features for a track's clusters of 3D points, or cluster features, and the global features for the track, or holistic features. The learning in operation 106 may implement a decision-tree-based Gentle AdaBoost.
[0055] In operation 108, cluster-based classifiers are identified
based on the local features for a track's clusters of 3D points, or
cluster features, and in operation 110, a track-based classifier is
identified based on the global features for the track itself, or
holistic features.
[0056] The cluster-based classifiers and the track-based classifier may generally be predictions of which of the pedestrian (c_p), bicycle (c_b), vehicle (c_v) and background (c_bg) object classes the object represented by the track's clusters of 3D points belongs to.
[0057] In one example, each cluster-based classifier and the track-based classifier may be expressed as a one-vs-all log-odds that the object belongs to one of the pedestrian (c_p), bicycle (c_b), vehicle (c_v) and background (c_bg) object classes. According to this example, in general, a strong classifier H may be given by the sum of K weak classifiers h

H(x, c) = \sum_{k=1}^{K} h_k(x, c)    (Eq. 3)

where the weak classifiers h are regression trees of limited depth, using the local features and the global features for splits. Each weak classifier outputs real values, and the sum of the weak classifiers h may be used directly, as opposed to the discrete output sign(H(x, c)). In the limit, the sum of the weak classifiers h converges to the log-odds L(x, c). Using Gentle AdaBoost, as implemented in OpenCV, these log-odds may be identified for both the cluster-based classifiers and the track-based classifier for the pedestrian (c_p), bicycle (c_b) and vehicle (c_v) object classes.
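The sketch below shows a minimal Gentle AdaBoost-style training loop for a single one-vs-all class, with shallow regression trees as the weak classifiers of Eq. 3. It is not the OpenCV implementation referenced above; it only illustrates how the strong classifier arises as a sum of real-valued weak outputs that can be used directly as log-odds.

```python
# Minimal GentleBoost-style training sketch for one one-vs-all class, with
# shallow regression trees as the weak classifiers of Eq. 3. This is not the
# OpenCV implementation named in the text; round count and depth are assumptions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_gentleboost(X, y, n_rounds=100, max_depth=3):
    """X: (N, D) features; y: (N,) labels in {-1, +1} for one class c."""
    weights = np.full(len(y), 1.0 / len(y))
    weak_learners = []
    for _ in range(n_rounds):
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, y, sample_weight=weights)      # weighted least-squares fit
        h = tree.predict(X)
        weights *= np.exp(-y * h)                  # GentleBoost weight update
        weights /= weights.sum()
        weak_learners.append(tree)
    return weak_learners

def strong_classifier(weak_learners, X):
    """H(x, c): real-valued sum of weak outputs, used directly as log-odds."""
    return np.sum([tree.predict(X) for tree in weak_learners], axis=0)
```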
[0058] In operation 112, the cluster-based classifiers and the track-based classifier are combined to classify the object represented by the track's clusters of 3D points based on its most likely object class apparent from the combined cluster-based classifiers and track-based classifier.
[0059] To identify the log-odds for a track of length T, for a
specific object class c, there are T+1 classifier results from the
T cluster-based classifiers and the single track-based classifier.
As described below, these may be combined, for instance, using a
variation of the normalized Discrete Bayes Filter (DBF) that weighs
the cluster-based classifiers based on the amount of information on
the cluster of 3D points from which they are identified.
[0060] To simplify notation, the combination is described for one object class c, and the object class notations are omitted below. Accordingly, the log-odds ratio is L(x). As used below, L_0 is the log prior odds,

L_0 = \log \frac{P(Y = c)}{P(Y \neq c)},

L_0^C is an empirical estimate of the log prior odds for the cluster-based classifier, L_0^H is an empirical estimate of the log prior odds for the track-based classifier, H^C is the cluster-based classifier that returns the log-odds identified from the local features for a track's clusters of 3D points, or cluster features, and H^H is the track-based classifier that returns the log-odds identified from the global features for a track, or holistic features.
[0061] Although the example graphical model in FIG. 7 assumes conditional independence between the local features for a track's clusters of 3D points, or cluster features, and the global features for a track, or holistic features, a more sophisticated model is described below. To begin with, however, assuming this conditional independence, the log-odds L(ω, z_{1:T}) given all of the local features for a track's clusters of 3D points and the global features for the track over all timesteps (from 1 to T) can be identified using Bayes' rule:

L(\omega, z_{1:T}) = \log \frac{P(Y = c \mid \omega, z_{1:T})}{P(Y \neq c \mid \omega, z_{1:T})}
                   = L_0 + \log \frac{P(\omega, z_{1:T} \mid Y = c)}{P(\omega, z_{1:T} \mid Y \neq c)}
                   = L_0 + \log \frac{P(\omega \mid Y = c)}{P(\omega \mid Y \neq c)} + \sum_{t=1}^{T} \log \frac{P(z_t \mid Y = c)}{P(z_t \mid Y \neq c)}
                   = L(\omega) + \sum_{t=1}^{T} \left( L(z_t) - L_0 \right)
                   \approx H^H(\omega) + \sum_{t=1}^{T} \left( H^C(z_t) - L_0^C \right)    (Eq. 4)

This has the effect of placing unequal weight on the contribution of the track-based classifier, depending on the length of the track. Adding a normalization term gives

H^H(\omega) + \frac{1}{T} \sum_{t=1}^{T} \left( H^C(z_t) - L_0^C \right)    (Eq. 5)
[0062] This still has the effect of placing equal weight on every
cluster-based classifier. Although this would be correct if the
cluster-based classifiers accurately predicted which of the object
classes that the object represented by the track's clusters of 3D
points belongs to in all cases, it has been found that the
predictive accuracy of the cluster-based classifiers significantly
increases with increasing amounts of information on the cluster of
3D points from which they are identified. In most instances, the amount of information on the cluster of 3D points from which a given cluster-based classifier is identified is, or is associated with, the amount of 3D points in the cluster. In this or other
instances, increasing amounts of information on these clusters of
3D points may be the product of closer proximity between the
vehicle 10 and the object represented by the cluster of 3D
points.
[0063] Accordingly, a weighting factor α_t may be applied to the cluster-based classifiers to down-weight the cluster-based classifiers with decreasing amounts of information on the cluster of 3D points from which they are identified, or up-weight the cluster-based classifiers with increasing amounts of information on the cluster of 3D points from which they are identified, as the case may be:

H^H(\omega) + \frac{1}{T} \sum_{t=1}^{T} \alpha_t \left( H^C(z_t) - L_0^C \right)    (Eq. 6)

As shown with additional reference to FIG. 8, the weighting factor α_t for a given cluster-based classifier may increase with increasing amounts of information on the cluster of 3D points from which it is identified according to

\alpha_t = 1 - \frac{n_\alpha}{n_\alpha + n_t}    (Eq. 7)

where n_t is the number of 3D points in the cluster at time t and n_α is a parameter controlling how quickly α_t grows with the number of 3D points. In FIG. 8, n_α = 250, and it can be seen that 0 ≤ α_t ≤ 1 and α_t = 0.5 when n_t = n_α.
[0064] Additionally, or alternatively, thresholds can be defined
and enforced for the amounts of information on the cluster of 3D
points from which a given cluster-based classifier is identified.
These thresholds may be defined and enforced, for instance, on the
amount of 3D points in the cluster (e.g., 25 3D points), the
proximity between the vehicle 10 and the object represented by the
cluster of 3D points (e.g., 30 meters), or both, and if the
thresholds are not satisfied, the cluster-based classifier can be
weighted to zero, for example, by setting the log-odds associated with the cluster-based classifier to zero.
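Putting Eqs. 6 and 7 together with the threshold rule just described, a combination for one object class might be sketched as follows. The 25-point and 30-meter thresholds and n_α = 250 come from the text; treating the prior term L_0^C as a plain parameter with a default of zero is an assumption for illustration only.

```python
# Sketch of the combined log-odds of Eqs. 6-7 for one object class: the
# track-based log-odds plus the normalized, weighted sum of cluster-based
# log-odds, with clusters below the information thresholds weighted to zero.
def alpha_weight(n_points, n_alpha=250):
    """Eq. 7: grows from 0 toward 1 with the number of 3D points."""
    return 1.0 - n_alpha / (n_alpha + n_points)

def combined_log_odds(track_log_odds, cluster_log_odds, cluster_point_counts,
                      cluster_distances_m, prior_log_odds_cluster=0.0,
                      min_points=25, max_distance_m=30.0, n_alpha=250):
    """track_log_odds: H^H(omega); cluster_log_odds: [H^C(z_t)] for t = 1..T."""
    T = len(cluster_log_odds)
    if T == 0:
        return track_log_odds
    total = 0.0
    for h_c, n_t, dist in zip(cluster_log_odds, cluster_point_counts,
                              cluster_distances_m):
        # Threshold rule: too few points or too far away -> zero weight.
        if n_t < min_points or dist > max_distance_m:
            continue
        total += alpha_weight(n_t, n_alpha) * (h_c - prior_log_odds_cluster)
    return track_log_odds + total / T
```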
[0065] Returning again to the discussion of the pedestrian (c_p), bicycle (c_b), vehicle (c_v) and background (c_bg) object classes, for each track and timestep, with features ω, z_{1:T}, the above classification framework may be applied to the pedestrian (c_p), bicycle (c_b) and vehicle (c_v) object classes, giving the one-vs-all log-odds

L(\omega, z_{1:T}; c_p), \quad L(\omega, z_{1:T}; c_b), \quad L(\omega, z_{1:T}; c_v)    (Eq. 8)

predicting which of the pedestrian (c_p), bicycle (c_b) and vehicle (c_v) object classes the object represented by the track's clusters of 3D points belongs to. Each log-odds may be converted to a probability by solving

P(Y = c \mid \omega, z_{1:T}) = \frac{e^{L(\omega, z_{1:T}; c)}}{1 + e^{L(\omega, z_{1:T}; c)}}    (Eq. 9)

for the probability that the object represented by the track's clusters of 3D points belongs to each of the pedestrian (c_p), bicycle (c_b) and vehicle (c_v) object classes. For the background (c_bg) object class:

P(Y = c_{bg} \mid \omega, z_{1:T}) = 1 - \sum_{c \in \{c_p, c_b, c_v\}} P(Y = c \mid \omega, z_{1:T})    (Eq. 10)
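A short sketch of this final conversion, applying the logistic form of Eq. 9 to each one-vs-all log-odds and Eq. 10 for the background probability, is given below; the clipping of the background probability to [0, 1] is an added safeguard rather than something stated in the text.

```python
# Sketch of converting the three one-vs-all log-odds into the probability
# vector of Eq. 2, using Eq. 9 for each class and Eq. 10 for background.
import math

def class_probabilities(log_odds_pedestrian, log_odds_bicycle, log_odds_vehicle):
    def sigmoid(log_odds):
        return 1.0 / (1.0 + math.exp(-log_odds))   # e^L / (1 + e^L)

    p_p = sigmoid(log_odds_pedestrian)
    p_b = sigmoid(log_odds_bicycle)
    p_v = sigmoid(log_odds_vehicle)
    # Clipping to [0, 1] is an added safeguard, not part of the text.
    p_bg = max(0.0, min(1.0, 1.0 - (p_p + p_b + p_v)))
    return {"pedestrian": p_p, "bicycle": p_b, "vehicle": p_v, "background": p_bg}

def most_likely_class(probs):
    return max(probs, key=probs.get)
```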
[0066] With these results from the classifier, the object
represented by the track's clusters of 3D points may be classified
as belonging to its most likely object class. In operation 114, the
planning module 52 may plan how to drive the vehicle 10 along a
travel route based on the object's classification while avoiding
the object, and the autonomous operation system 20 of the vehicle
10 can drive the vehicle 10 along the route according to the
plan.
[0067] FIG. 9 shows the classifier confidence over time for an example track in a case where the object represented by the track's clusters of 3D points is a bicycle. The solid lines show the confidence for the combined results from the classifier, while the dashed lines show the confidence for the cluster-based classifiers and the dash-dot lines show the confidence for the track-based classifier, for each of the pedestrian (c_p), bicycle (c_b), vehicle (c_v) and background (c_bg) object classes. For the first 120 timesteps, only the track-based classifier contributes to the classification of the object because there are too few 3D points in the clusters (i.e., fewer than 25). The object is initially classified as a bicycle for the first 40 timesteps, but is then misclassified as a car at a distance of 82 meters for the next 80 timesteps, until there are enough 3D points to use the cluster-based classifiers at a distance of 40 meters, at which point the bicycle (c_b) object class quickly wins out and remains represented in the combined results of the classifier despite several cluster misclassifications later.
[0068] While recited characteristics and conditions of the
invention have been described in connection with certain
embodiments, it is to be understood that the invention is not to be
limited to the disclosed embodiments but, on the contrary, is
intended to cover various modifications and equivalent arrangements
included within the spirit and scope of the appended claims, which
scope is to be accorded the broadest interpretation so as to
encompass all such modifications and equivalent structures as is
permitted under the law.
* * * * *