U.S. patent application number 15/607994 was published by the patent office on 2018-10-11 for automatic tuning of autonomous vehicle cost functions based on human driving data.
The applicant listed for this patent is Uber Technologies, Inc. The invention is credited to Dave Bradley, Charles Robert Hogg, III, Moslem Kazemi, Andy Lee, Chenggang Liu, and Jacob Panikulam.
Application Number | 20180292830 (15/607994)
Document ID | /
Family ID | 63711001
Publication Date | 2018-10-11

United States Patent Application 20180292830
Kind Code: A1
Kazemi; Moslem; et al.
October 11, 2018
Automatic Tuning of Autonomous Vehicle Cost Functions Based on
Human Driving Data
Abstract
The present disclosure provides systems and methods that enable
an autonomous vehicle motion planning system to learn to generate
motion plans that mimic human driving behavior. In particular, the
present disclosure provides a framework that enables automatic
tuning of cost function gains included in one or more cost
functions employed by the autonomous vehicle motion planning
system.
Inventors: Kazemi; Moslem; (Pittsburgh, PA); Panikulam; Jacob; (Pittsburgh, PA); Liu; Chenggang; (Pittsburgh, PA); Lee; Andy; (Pittsburgh, PA); Bradley; Dave; (Pittsburgh, PA); Hogg, III; Charles Robert; (Bellevue, PA)

Applicant:
Name: Uber Technologies, Inc.
City: San Francisco
State: CA
Country: US

Family ID: 63711001
Appl. No.: 15/607994
Filed: May 30, 2017
Related U.S. Patent Documents

Application Number: 62482280
Filing Date: Apr 6, 2017
Current U.S. Class: 1/1
Current CPC Class: G01C 21/3484 (2013.01); G05D 1/0221 (2013.01); G01C 21/3453 (2013.01); G05D 1/0217 (2013.01); G05D 2201/0213 (2013.01)
International Class: G05D 1/02 (2006.01); G05D 1/00 (2006.01)
Claims
1. A computer-implemented method to automatically tune cost
function gains of an autonomous vehicle motion planning system, the
method comprising: obtaining, by one or more computing devices,
data descriptive of a humanly-executed motion plan that was
executed by a human driver during a previous humanly-controlled
vehicle driving session; generating, by the autonomous vehicle
motion planning system, an autonomous motion plan based at least in
part on a data log that includes data collected during the previous
humanly-controlled vehicle driving session, wherein generating, by
the autonomous vehicle motion planning system, the autonomous
motion plan comprises evaluating, by the autonomous vehicle motion
planning system, one or more cost functions, the one or more cost
functions including a plurality of gain values; evaluating, by the
one or more computing devices, an objective function that provides
an objective value based at least in part on a difference between a
first total cost associated with the humanly-executed motion plan
and a second total cost associated with the autonomous motion plan,
wherein evaluating, by the one or more computing devices, the
objective function comprises: inputting, by the one or more
computing devices, the humanly-executed motion plan into the one or
more cost functions of the autonomous vehicle motion planning
system to determine the first total cost associated with the
humanly-executed motion plan; and inputting, by the one or more
computing devices, the autonomous motion plan into the one or more
cost functions of the autonomous vehicle motion planning system to
determine the second total cost associated with the autonomous
motion plan; and determining, by the one or more computing devices,
at least one adjustment to at least one of the plurality of gain
values of the one or more cost functions that reduces the objective
value provided by the objective function.
2. The computer-implemented method of claim 1, wherein determining,
by the one or more computing devices, the at least one adjustment
to the at least one of the plurality of gain values comprises
iteratively optimizing, by the one or more computing devices, the
objective function.
3. The computer-implemented method of claim 2, wherein iteratively
optimizing, by the one or more computing devices, the objective
function comprises performing, by the one or more computing
devices, a subgradient technique to iteratively optimize the
objective function.
4. The computer-implemented method of claim 1, wherein evaluating,
by the one or more computing devices, the objective function
comprises evaluating, by the one or more computing devices, the
objective function that encodes a constraint that the first total
cost is less than the second total cost.
5. The computer-implemented method of claim 4, wherein evaluating,
by the one or more computing devices, the objective function
comprises applying, by the one or more computing devices, a slack
variable violation when the constraint is violated.
6. The computer-implemented method of claim 1, wherein evaluating,
by the one or more computing devices, the objective function
comprises evaluating, by the one or more computing devices, the
objective function that encodes a constraint that the difference
between the first total cost and the second total cost is greater
than or equal to a margin.
7. The computer-implemented method of claim 1, wherein evaluating,
by the one or more computing devices, the objective function
comprises evaluating, by the one or more computing devices, the
objective function that provides the objective value based at least
in part on a loss function that provides a dis-similarity value
that is descriptive of a dis-similarity between the
humanly-executed motion plan and the autonomous motion plan.
8. The computer-implemented method of claim 7, wherein evaluating,
by the one or more computing devices, the objective function
comprises evaluating, by the one or more computing devices, the
objective function that encodes a constraint that the difference
between the first total cost and the second total cost is greater
than or equal to the dis-similarity value provided by the loss
function.
9. The computer-implemented method of claim 8, wherein evaluating,
by the one or more computing devices, the objective function
comprises applying, by the one or more computing devices, a slack
variable violation when the constraint is violated.
10. The computer-implemented method of claim 1, wherein: obtaining,
by the one or more computing devices, the data descriptive of the
humanly-executed motion plan comprises obtaining, by the one or
more computing devices, the data descriptive of the
humanly-executed motion plan that was executed by the human driver
during the previous humanly-controlled vehicle driving session that
was performed in a target geographic area; generating, by the
autonomous vehicle motion planning system, the autonomous motion
plan comprises evaluating, by the autonomous vehicle motion
planning system, the one or more cost functions that include the
plurality of gain values, the plurality of gain values having been
previously tuned based on data collected from a second geographic
area that is different than the target geographic area; and
determining, by the one or more computing devices, the at least one
adjustment comprises determining, by the one or more computing
devices, the at least one adjustment to the at least one of the
plurality of gain values such that the adjusted plurality of gains
reflect driving behavior in the target geographic area.
11. The computer-implemented method of claim 1, wherein the at
least one of the plurality of gain values comprises at least one
of: a coefficient value for at least one of the one or more cost
functions; and a threshold value for at least one of the one or
more cost functions.
12. The computer-implemented method of claim 1, wherein obtaining,
by one or more computing devices, the data descriptive of the
humanly-executed motion plan comprises: obtaining, by the one or
more computing devices, the data log that includes data collected
during the previous humanly-controlled vehicle driving session,
wherein the data log includes state data for the humanly-controlled
vehicle; and fitting, by the one or more computing devices, a
trajectory to the state data for the humanly-controlled vehicle to
obtain the humanly-executed motion plan.
13. A computer system, comprising: one or more processors; and one
or more tangible, non-transitory, computer readable media that
collectively store instructions that, when executed by the one or
more processors, cause the computer system to perform operations,
the operations comprising: obtaining data descriptive of a
humanly-executed motion plan that was executed by a human driver
during a previous humanly-controlled vehicle driving session;
generating an autonomous motion plan based at least in part on a
data log that includes data collected during the previous
humanly-controlled vehicle driving session, wherein generating the
autonomous motion plan comprises evaluating one or more cost
functions to generate the autonomous motion plan, the one or more
cost functions including a plurality of gain values; evaluating an
objective function that provides an objective value based at least
in part on a difference between a first total cost associated with
the humanly-executed motion plan and a second total cost associated
with the autonomous motion plan, wherein evaluating the objective
function comprises: inputting the humanly-executed motion plan into
the one or more cost functions to determine the first total cost
associated with the humanly-executed motion plan; and inputting the
autonomous motion plan into the one or more cost functions to
determine the second total cost associated with the autonomous
motion plan; and determining at least one adjustment to at least
one of the plurality of gain values of the one or more cost
functions that reduces the objective value provided by the
objective function.
14. The computer system of claim 13, wherein determining the at
least one adjustment to the at least one of the plurality of gain
values comprises performing a subgradient method to iteratively
optimize the objective function.
15. The computer system of claim 13, wherein evaluating the
objective function comprises evaluating the objective function that
encodes a constraint that the first total cost is less than the
second total cost.
16. The computer system of claim 15, wherein evaluating the
objective function comprises applying a slack variable violation
when the constraint is violated.
17. The computer system of claim 13, wherein evaluating the
objective function comprises evaluating the objective function that
encodes a constraint that the difference between the first total
cost and the second total cost is greater than or equal to a
dis-similarity value that is descriptive of a dis-similarity
between the humanly-executed motion plan and the autonomous motion
plan.
18. A computer system, comprising: one or more processors; one or
more tangible, non-transitory, computer-readable media that
collectively store a data log that includes data collected during a
previous humanly-controlled vehicle driving session; an autonomous
vehicle motion planning system implemented by the one or more
processors, the motion planning system comprising an optimization
planner configured to optimize one or more cost functions that
include a plurality of gains to generate an autonomous motion plan
for an autonomous vehicle; and an automatic tuning system
implemented by the one or more processors, the automatic tuning
system configured to: receive an autonomous motion plan generated
by the autonomous vehicle motion planning system based at least in
part on the data collected during the previous humanly-controlled
vehicle driving session, the optimization planner having optimized
the one or more cost functions to generate the autonomous motion
plan; obtain a humanly-executed motion plan that was executed
during the previous humanly-controlled vehicle driving session; and
optimize an objective function to determine an adjustment to at
least one of the plurality of gains, wherein the objective function
provides an objective value based at least in part on a difference
between a first total cost obtained by input of the
humanly-executed motion plan into the one or more cost functions of
the autonomous vehicle motion planning system and a second total
cost obtained by input of the autonomous motion plan into the one
or more cost functions of the autonomous vehicle motion planning
system.
19. The computer system of claim 18, wherein: the objective
function encodes a constraint that the first total cost is less
than the second total cost; and violation of the constraint results
in application of a slack penalty.
20. The computer system of claim 18, wherein: the objective
function encodes a constraint that the difference between the first
total cost and the second total cost is greater than or equal to a
margin; and violation of the constraint results in application of a
slack penalty.
Description
FIELD
[0001] The present disclosure relates generally to autonomous
vehicles. More particularly, the present disclosure relates to
automatic tuning of a plurality of gains of one or more cost
functions used by a motion planning system of an autonomous
vehicle.
BACKGROUND
[0002] An autonomous vehicle is a vehicle that is capable of
sensing its environment and navigating with little or no human
input. In particular, an autonomous vehicle can observe its
surrounding environment using a variety of sensors and can attempt
to comprehend the environment by performing various processing
techniques on data collected by the sensors. Given knowledge of its
surrounding environment, the autonomous vehicle can identify an
appropriate motion path through such surrounding environment.
SUMMARY
[0003] Aspects and advantages of embodiments of the present
disclosure will be set forth in part in the following description,
or can be learned from the description, or can be learned through
practice of the embodiments.
[0004] One example aspect of the present disclosure is directed to
a computer-implemented method to automatically tune cost function
gains of an autonomous vehicle motion planning system. The method
includes obtaining, by one or more computing devices, data
descriptive of a humanly-executed motion plan that was executed by
a human driver during a previous humanly-controlled vehicle driving
session. The method includes generating, by the autonomous vehicle
motion planning system, an autonomous motion plan based at least in
part on a data log that includes data collected during the previous
humanly-controlled vehicle driving session. Generating, by the
autonomous vehicle motion planning system, the autonomous motion
plan includes evaluating, by the autonomous vehicle motion planning
system, one or more cost functions. The one or more cost functions
include a plurality of gain values. The method includes evaluating,
by the one or more computing devices, an objective function that
provides an objective value based at least in part on a difference
between a first total cost associated with the humanly-executed
motion plan and a second total cost associated with the autonomous
motion plan. Evaluating the objective function includes inputting,
by the one or more computing devices, the humanly-executed motion
plan into the one or more cost functions of the autonomous vehicle
motion planning system to determine the first total cost associated
with the humanly-executed motion plan. Evaluating the objective
function includes inputting, by the one or more computing devices,
the autonomous motion plan into the one or more cost functions of
the autonomous vehicle motion planning system to determine the
second total cost associated with the autonomous motion plan. The
method includes determining, by the one or more computing devices,
at least one adjustment to at least one of the plurality of gain
values of the one or more cost functions that reduces the objective
value provided by the objective function.
[0005] Another example aspect of the present disclosure is directed
to a computer system. The computer system includes one or more
processors and one or more tangible, non-transitory, computer
readable media that collectively store instructions that, when
executed by the one or more processors, cause the computer system
to perform operations. The operations include obtaining data
descriptive of a humanly-executed motion plan that was executed by
a human driver during a previous humanly-controlled vehicle driving
session. The operations include generating an autonomous motion
plan based at least in part on a data log that includes data
collected during the previous humanly-controlled vehicle driving
session. Generating the autonomous motion plan includes evaluating
one or more cost functions to generate the autonomous motion plan.
The one or more cost functions include a plurality of gain values.
The operations include evaluating an objective function that
provides an objective value based at least in part on a difference
between a first total cost associated with the humanly-executed
motion plan and a second total cost associated with the autonomous
motion plan. Evaluating the objective function includes inputting
the humanly-executed motion plan into the one or more cost
functions to determine the first total cost associated with the
humanly-executed motion plan. Evaluating the objective function
includes inputting the autonomous motion plan into the one or more
cost functions to determine the second total cost associated with
the autonomous motion plan. The operations include determining at
least one adjustment to at least one of the plurality of gain
values of the one or more cost functions that reduces the objective
value provided by the objective function.
[0006] Another example aspect of the present disclosure is directed
to a computer system. The computer system includes one or more
processors and one or more tangible, non-transitory,
computer-readable media that collectively store a data log that
includes data collected during a previous humanly-controlled
vehicle driving session. The computer system includes an autonomous
vehicle motion planning system implemented by the one or more
processors. The motion planning system includes an optimization
planner that is configured to optimize one or more cost functions
that include a plurality of gains to generate an autonomous motion
plan for an autonomous vehicle. The computer system includes an
automatic tuning system implemented by the one or more processors.
The automatic tuning system is configured to receive an autonomous
motion plan generated by the autonomous vehicle motion planning
system based at least in part on the data collected during the
previous humanly-controlled vehicle driving session. The
optimization planner optimized the one or more cost functions to
generate the autonomous motion plan. The automatic tuning system is
configured to obtain a humanly-executed motion plan that was
executed during the previous humanly-controlled vehicle driving
session. The automatic tuning system is configured to optimize an
objective function to determine an adjustment to at least one of
the plurality of gains. The objective function provides an
objective value based at least in part on a difference between a
first total cost obtained by input of the humanly-executed motion
plan into the one or more cost functions of the autonomous vehicle
motion planning system and a second total cost obtained by input of
the autonomous motion plan into the one or more cost functions of
the autonomous vehicle motion planning system.
[0007] Other aspects of the present disclosure are directed to
various systems, apparatuses, non-transitory computer-readable
media, user interfaces, and electronic devices.
[0008] These and other features, aspects, and advantages of various
embodiments of the present disclosure will become better understood
with reference to the following description and appended claims.
The accompanying drawings, which are incorporated in and constitute
a part of this specification, illustrate example embodiments of the
present disclosure and, together with the description, serve to
explain the related principles.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] Detailed discussion of embodiments directed to one of
ordinary skill in the art is set forth in the specification, which
makes reference to the appended figures, in which:
[0010] FIG. 1 depicts a block diagram of an example autonomous
vehicle according to example embodiments of the present
disclosure.
[0011] FIG. 2 depicts a block diagram of an example motion planning
system according to example embodiments of the present
disclosure.
[0012] FIG. 3 depicts a block diagram of an example optimization
planner according to example embodiments of the present
disclosure.
[0013] FIG. 4 depicts a block diagram of an example automatic
tuning computing system according to example embodiments of the
present disclosure.
[0014] FIG. 5 depicts a block diagram of an example automatic
tuning computing system according to example embodiments of the
present disclosure.
[0015] FIG. 6 depicts a block diagram of an example processing
pipeline to derive humanly-executed motion plans according to
example embodiments of the present disclosure.
[0016] FIG. 7 depicts a flowchart diagram of an example method to
automatically tune cost function gains according to example
embodiments of the present disclosure.
[0017] FIG. 8 depicts a flowchart diagram of an example method to
train an autonomous vehicle motion planning system to approximate
human driving behavior associated with a target geographic area
according to example embodiments of the present disclosure.
[0018] FIG. 9 depicts a flowchart diagram of an example method to
train an autonomous vehicle motion planning system to approximate
human driving behavior associated with a target driving style
profile according to example embodiments of the present
disclosure.
[0019] FIG. 10 depicts a flowchart diagram of an example method to
train an autonomous vehicle motion planning system to approximate
human driving behavior associated with a target vehicle type
according to example embodiments of the present disclosure.
DETAILED DESCRIPTION
[0020] Generally, the present disclosure is directed to systems and
methods that enable an autonomous vehicle motion planning system to
learn to generate motion plans that mimic human driving behavior.
In particular, the present disclosure provides a framework that
enables automatic tuning of cost function gains included in one or
more cost functions employed by the autonomous vehicle motion
planning system. Gains of the one or more cost functions can
include coefficients, thresholds, or other configurable parameters
of the one or more cost functions that, for example, serve to
effectuate a balance between competing concerns (e.g., in the form
of cost features) when the motion planning system generates an
autonomous motion plan for the autonomous vehicle. In particular,
the autonomous vehicle motion planning system can include an
optimization planner that iteratively optimizes over a vehicle
state space to obtain a trajectory which minimizes the total cost
(e.g., combination of one or more cost functions).
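The gain-weighted combination of cost features described in this paragraph can be sketched as follows. The feature names, gain values, and dictionary representation are illustrative assumptions, not taken from the disclosure.

```python
# Minimal sketch of a total cost formed as a gain-weighted combination of
# cost features. Feature names and numeric values here are hypothetical.

def total_cost(features, gains):
    """Combine per-feature costs into one total cost using the gains."""
    return sum(gains[name] * value for name, value in features.items())

# Hypothetical cost features evaluated for one candidate motion plan.
features = {"lateral_jerk": 0.8, "proximity_to_objects": 2.5, "speed_deviation": 1.2}

# The gains balance competing concerns; a larger gain weights its feature more.
gains = {"lateral_jerk": 1.0, "proximity_to_objects": 4.0, "speed_deviation": 0.5}

print(total_cost(features, gains))  # 11.4 = 0.8 + 10.0 + 0.6
```

Under this view, tuning the planner amounts to choosing the values in `gains`, which is exactly what the automatic tuning framework adjusts.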
[0021] More particularly, an automatic tuning system of the present
disclosure can automatically tune the cost function gains by
minimizing or otherwise optimizing an objective function that
provides an objective value based at least in part on a difference
in respective total costs between a humanly-executed motion plan
and an autonomous motion plan generated by the autonomous vehicle
motion planning system. In particular, the automatic tuning system
can respectively input the humanly-executed motion plan and the
autonomous motion plan into the one or more cost functions used by
the optimization planner of the autonomous vehicle motion planning
system to obtain their respective total costs. The automatic tuning
system can iteratively adjust the gains of the one or more cost
functions to minimize or otherwise optimize the objective function.
In addition, in some implementations, the objective function can
encode a constraint that the difference in respective total costs
between the humanly-executed motion plan and the autonomous motion
plan is greater than or equal to a margin. For example, the margin
can be positively correlated to a degree of dis-similarity between
the humanly-executed motion plan and the autonomous motion
plan.
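The tuning loop described above — adjust the gains so that the humanly-executed plan costs less than the autonomous plan by at least a margin, with a slack penalty on violations — can be sketched as a subgradient method (the technique named in claim 3). The constraint direction, hinge-style slack, flat feature vectors, and non-negativity projection below are one reading of the disclosure, not verbatim; a full implementation would also regenerate the autonomous plan after each gain update.

```python
# Sketch of automatic gain tuning via subgradient descent on a hinge
# objective: slack = max(0, margin - (C_auto - C_human)). Feature values,
# the margin, and the non-negative projection are assumptions.

def total_cost(feats, gains):
    return sum(g * f for g, f in zip(gains, feats))

def tune_gains(gains, feats_human, feats_auto, margin=1.0, lr=0.1, steps=100):
    """Adjust gains until the autonomous plan costs at least `margin`
    more than the humanly-executed plan (or `steps` runs out)."""
    for _ in range(steps):
        slack = margin - (total_cost(feats_auto, gains)
                          - total_cost(feats_human, gains))
        if slack <= 0:
            break  # constraint satisfied: human plan is cheaper by the margin
        # Subgradient of the slack w.r.t. gain i is feats_human[i] - feats_auto[i];
        # step downhill and project back onto non-negative gains.
        gains = [max(0.0, g - lr * (fh - fa))
                 for g, fh, fa in zip(gains, feats_human, feats_auto)]
    return gains

# Toy example: the human plan exhibits more of feature 0, the autonomous
# plan more of feature 1; tuning shifts weight from feature 0 to feature 1.
tuned = tune_gains([1.0, 1.0], feats_human=[2.0, 1.0], feats_auto=[1.0, 2.0])
```

In this toy run the gains move from [1.0, 1.0] toward roughly [0.5, 1.5], at which point the autonomous plan's total cost exceeds the human plan's by the margin and the loop stops.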
[0022] Thus, the systems and methods of the present disclosure
leverage the existing cost function structure used by the
optimization planner of the autonomous vehicle motion planning
system, which may, in some implementations, be or include a linear
quadratic regulator. In particular, rather than attempting to teach
the motion planning system to directly replicate the
humanly-executed trajectory within the vehicle state space, the
systems and methods of the present disclosure enable the autonomous
vehicle motion planning system to learn to generate motion plans
that mimic human driving behavior by optimizing or otherwise
adjusting the gains of the one or more cost functions that are
already used by the optimization planner of the autonomous vehicle
motion planning system.
[0023] After such automatic tuning, the autonomous vehicle motion
planning system will produce motion plans for the autonomous
vehicle that more closely resemble human driving behavior. In
particular, the systems and methods of the present disclosure can
adjust the cost function gains to approximate a human judgment of
the appropriate balance of competing cost features that is
implicitly exhibited by the humanly-executed motion plan.
Therefore, the autonomous driving performed by the tuned autonomous
vehicle will feel more natural and comfortable to a human passenger
and/or drivers of adjacent vehicles. Likewise, the time-consuming
requirement to manually tune the cost function gains can be
eliminated, while producing superior results. In addition,
automatic tuning enables the exploration and identification of new
cost features. Finally, in example applications, the systems and
methods of the present disclosure can train a motion planning
system of an autonomous vehicle to generate motion plans that
approximate the driving behavior exhibited by the human residents
of a particular target geographic area (e.g., Pittsburgh, Pa.
versus Phoenix, Ariz.); different human driving behavior profiles
(e.g., sporty versus cautious); and/or different driving behaviors
exhibited by human operators of different vehicle types (e.g.,
sedan versus sports utility vehicle versus large truck).
[0024] More particularly, in some implementations, an autonomous
vehicle can be a ground-based autonomous vehicle (e.g., car, truck,
bus, etc.), an air-based autonomous vehicle (e.g., airplane, drone,
helicopter, or other aircraft), or other types of vehicles (e.g.,
watercraft). The autonomous vehicle can include a computing system
that assists in controlling the autonomous vehicle. In some
implementations, the autonomous vehicle computing system can
include a perception system, a prediction system, and a motion
planning system that cooperate to perceive the surrounding
environment of the autonomous vehicle and determine a motion plan
for controlling the motion of the autonomous vehicle
accordingly.
[0025] In particular, in some implementations, the perception
system can receive sensor data from one or more sensors that are
coupled to or otherwise included within the autonomous vehicle. As
examples, the one or more sensors can include a Light Detection and
Ranging (LIDAR) system, a Radio Detection and Ranging (RADAR)
system, one or more cameras (e.g., visible spectrum cameras,
infrared cameras, etc.), and/or other sensors. The sensor data can
include information that describes the location of objects within
the surrounding environment of the autonomous vehicle.
[0026] In addition to the sensor data, the perception system can
retrieve or otherwise obtain map data that provides detailed
information about the surrounding environment of the autonomous
vehicle. The map data can provide information regarding: the
identity and location of different roadways, road segments,
buildings, or other items; the location and directions of traffic
lanes (e.g., the location and direction of a parking lane, a
turning lane, a bicycle lane, or other lanes within a particular
roadway); traffic control data (e.g., the location and instructions
of signage, traffic lights, or other traffic control devices);
and/or any other map data that provides information that assists
the computing system in comprehending and perceiving its
surrounding environment and its relationship thereto.
[0027] The perception system can identify one or more objects that
are proximate to the autonomous vehicle based on sensor data
received from the one or more sensors and/or the map data. In
particular, in some implementations, the perception system can
provide, for each object, state data that describes a current state
of such object. As examples, the state data for each object can
describe an estimate of the object's: current location (also
referred to as position); current speed (also referred to as
velocity); current acceleration; current heading; current
orientation; size/footprint (e.g., as represented by a bounding
polygon); class (e.g., vehicle vs. pedestrian vs. bicycle); and/or
other state information.
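For illustration, the per-object state data enumerated above can be represented as a small record type; the field names, units, and types below are assumptions, not the disclosure's actual schema.

```python
# Illustrative container for the per-object state data described above.
# Field names, units, and types are hypothetical.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ObjectState:
    position: Tuple[float, float]                # current location (x, y), meters
    speed: float                                 # current speed, m/s
    acceleration: float                          # m/s^2
    heading: float                               # radians
    orientation: float                           # radians
    footprint: Tuple[Tuple[float, float], ...]   # bounding-polygon vertices
    object_class: str                            # "vehicle", "pedestrian", "bicycle", ...

pedestrian = ObjectState(position=(12.0, -3.5), speed=1.4, acceleration=0.0,
                         heading=1.57, orientation=1.57,
                         footprint=((11.5, -4.0), (12.5, -4.0), (12.0, -3.0)),
                         object_class="pedestrian")
```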
[0028] According to an aspect of the present disclosure, the
prediction system can receive the state data and can predict one or
more future locations for the object(s) identified by the
perception system. For example, various prediction techniques can
be used to predict the one or more future locations for the
object(s) identified by the perception system. The prediction
system can provide the predicted future locations of the objects to
the motion planning system.
[0029] The motion planning system can determine a motion plan for
the autonomous vehicle based at least in part on the state data
provided by the perception system and/or the predicted one or more
future locations for the objects. Stated differently, given
information about the current locations of proximate objects and/or
predictions about the future locations of proximate objects, the
motion planning system can determine a motion plan for the
autonomous vehicle that best navigates the vehicle relative to the
objects at their current and/or future locations.
[0030] As an example, in some implementations, the motion planning
system operates to generate a new autonomous motion plan for the
autonomous vehicle multiple times per second. Each new autonomous
motion plan can describe motion of the autonomous vehicle over the
next several seconds (e.g., 5 seconds). Thus, in some example
implementations, the motion planning system continuously operates
to revise or otherwise generate a short-term motion plan based on
the currently available data.
[0031] In some implementations, the motion planning system can
include an optimization planner that, for each instance of
generating a new motion plan, searches (e.g., iteratively searches)
over a motion planning space (e.g., a vehicle state space) to
identify a motion plan that optimizes (e.g., locally optimizes) a
total cost associated with the motion plan, as provided by one or
more cost functions. For example, the motion plan can include a
series of vehicle states and/or a series of controls to achieve the
series of vehicle states. A vehicle state can include the
autonomous vehicle's current location (also referred to as
position); current speed (also referred to as velocity); current
acceleration; current heading; current orientation; and/or other
state information. As an example, in some implementations, the
optimization planner can be or include an iterative linear
quadratic regulator or similar iterative solver.
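To illustrate the iterative search described above, the following sketch locally optimizes a candidate plan (encoded, purely for illustration, as a flat list of state values) by naive numeric gradient descent on a supplied total-cost function. All names and the simple quadratic example cost are assumptions; an actual planner would typically use an iterative linear quadratic regulator rather than this loop.

```python
def optimize_plan(total_cost, plan, step=0.01, iters=200, eps=1e-6):
    """Iteratively search over the planning space for a plan that
    locally optimizes the total cost, here by naive numeric
    gradient descent on a flat list of state values."""
    plan = list(plan)
    for _ in range(iters):
        grad = []
        for i in range(len(plan)):
            bumped = list(plan)
            bumped[i] += eps
            grad.append((total_cost(bumped) - total_cost(plan)) / eps)
        plan = [x - step * g for x, g in zip(plan, grad)]
    return plan

# Illustrative total cost: squared deviation from a reference trajectory.
reference = [0.0, 1.0, 2.0, 3.0]
cost = lambda p: sum((x - r) ** 2 for x, r in zip(p, reference))
tuned_plan = optimize_plan(cost, [0.5, 0.5, 0.5, 0.5])
```

After the loop, the candidate plan has converged near the local optimum of the example cost.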
[0032] Once the optimization planner has identified the optimal
motion plan (or some other iterative break occurs), the optimal
candidate motion plan can be selected and executed by the
autonomous vehicle. For example, the motion planning system can
provide the selected motion plan to a vehicle controller that
controls one or more vehicle controls (e.g., actuators that control
gas flow, steering, braking, etc.) to execute the selected motion
plan until the next motion plan is generated.
[0033] According to an aspect of the present disclosure, the motion
planning system can employ or otherwise include one or more cost
functions that, when evaluated, provide a total cost for a
particular candidate motion plan. The optimization planner can
search over a motion planning space (e.g., a vehicle state space)
to identify a motion plan that optimizes (e.g., locally optimizes)
the total cost provided by the one or more cost functions.
[0034] In some implementations, different cost function(s) can be
used depending upon a particular scenario that is selected by the
motion planning system. For example, the motion planning system can
include a plurality of scenario controllers that detect certain
scenarios (e.g., a changing lanes scenario versus a queueing
scenario) and guide the behavior of the autonomous vehicle
according to the selected scenario. Different sets of one or more
cost functions can correspond to the different possible scenarios
and the cost function(s) corresponding to the selected scenario can
be loaded and used by the motion planning system at each instance
of motion planning.
[0035] In addition, according to another aspect of the present
disclosure, the one or more cost functions used by the motion
planning system can include a plurality of gains. Gains of the one
or more cost functions can include coefficients, thresholds, or
other configurable parameters of the one or more cost functions.
For example, the cost function gains can serve to effectuate a
balance between competing concerns (e.g., in the form of cost
features) when the motion planning system generates an autonomous
motion plan for the autonomous vehicle.
[0036] To provide an example for the purpose of illustration: an
example cost function can provide, among other costs, a first cost
that is negatively correlated to a magnitude of a first distance
from the autonomous vehicle to a lane boundary. Thus, if a
candidate motion plan approaches a lane boundary, the first cost
increases, thereby discouraging (e.g., through increased cost
penalization) the autonomous vehicle from selecting motion plans
that come close to or cross over lane boundaries. The magnitude of
the first distance from the autonomous vehicle to the lane boundary
can be referred to as a "feature." The example cost function
provides the first cost based on such feature. In particular, the
example cost function includes a number of configurable parameters,
including, for example, a threshold gain value that describes a
certain magnitude of the first distance at which the first cost
becomes greater than zero, a coefficient gain value that influences
a rate at which the first cost increases as the magnitude of the
first distance decreases, and/or other configurable parameters. As
another example, the example cost function might provide, among
other costs, a second cost that is negatively correlated to a
magnitude of a second distance from the autonomous vehicle to a
pedestrian. Thus, the motion planning system is discouraged from
selecting motion plans that approach pedestrians. Again, the
magnitude of the second distance can be referred to as a feature
and the cost function can include a number of gains that control
the influence of such feature on the total cost. In particular, the
respective gains of the second cost and the first cost will
effectuate a certain balance between the second cost and the first
cost (e.g., it is more important to avoid approaching a pedestrian
than it is to avoid crossing a lane boundary).
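The threshold gain and coefficient gains of this illustrative cost function can be sketched as follows; the particular gain values and functional forms are placeholder assumptions rather than tuned values from the disclosure.

```python
def lane_boundary_cost(distance, threshold=1.0, coefficient=2.0):
    """First cost: zero until the distance to the lane boundary
    drops below the threshold gain, then growing as the distance
    shrinks, at a rate set by the coefficient gain."""
    if distance >= threshold:
        return 0.0
    return coefficient * (threshold - distance)

def pedestrian_cost(distance, coefficient=10.0):
    """Second cost: negatively correlated with the distance to a
    pedestrian; its larger coefficient gain makes approaching a
    pedestrian costlier than approaching a lane boundary."""
    return coefficient / (distance + 1e-6)
```

The relative magnitudes of the two coefficient gains effectuate the balance described above: at the same distance, the pedestrian cost dominates the lane boundary cost.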
[0037] The example cost function described above is provided only
as an example cost function to illustrate the principles of
features, gains, and costs. Many other and different cost functions
with different features and costs can be employed in addition or
alternatively to the example cost function described above. In some
optimization-based implementations, the cost function(s) should be
C1 continuous in state variables at each time step. In addition,
while only a first cost and a second cost are described above with
respect to the example cost function, the cost functions of the
present disclosure can include any number (e.g., hundreds) of
different features, gains, and costs. As examples, additional costs
can be assessed based on dynamics, speed limits, crosstrack (e.g.,
deviation from a center line of a lane), end of path, stop sign,
traffic light, adaptive cruise control, static obstacles, etc. In
some implementations, the cost function(s) are quadratic, linear,
or a combination thereof. Furthermore, in some implementations, the
cost function(s) can include a portion that provides a reward
rather than a cost. For example, the reward can be of opposite sign
to cost(s) provided by other portion(s) of the cost function.
Example rewards can be provided for distance traveled, velocity, or
other forms of progressing toward completion of a route.
[0038] In some instances which contrast with the automatic tuning
of the present disclosure, the gains of the cost function(s) can be
manually tuned. Adding and tuning gains of a new cost function
and/or tuning gains of existing cost function(s) is a tedious and
labor/time intensive manual process. Manual tuning can require:
designing the cost function; using intuition to come up with some
"good" initial guess for the gains of the cost function; running
use of the cost function through a simulation; performing a
development test; modifying the gains based on the initial results;
running use of the cost function through an additional simulation,
performing an additional development test; and/or other actions. In
particular, this sequence of testing and modifying actions can be
repeated indefinitely until the desired behavior emerges. This is a
difficult, impractical, and un-scalable process. In particular, as
the number of cost functions and/or associated cost features
increase, this process becomes extremely complex and
interdependent.
[0039] In view of the above, the present disclosure provides a
framework that enables automatic tuning of cost function gains
included in one or more cost functions employed by the autonomous
vehicle motion planning system. In particular, the systems and
methods of the present disclosure can enable imitation learning
based on one or more humanly-executed motion plans that were
executed by a human driver during one or more humanly-controlled
driving sessions.
[0040] Thus, in some implementations, high quality
humanly-controlled driving sessions can be identified and selected
for use as a "gold-standard" for imitation training of the
autonomous vehicle motion planning system. For example, driving
sessions can be considered high quality if they illustrate or
otherwise exhibit good or otherwise appropriate human driving
behavior. Particular humanly-controlled driving sessions can be
identified as high quality and selected for use according to any
number of metrics including, for example, ride quality scoring
metrics. Example ride quality scoring metrics include automated
scoring metrics that automatically identify certain driving events
(e.g., undesirable events such as jerking events or heavy braking
events) and provide a corresponding score and/or manual scoring
metrics such as human passenger feedback or scoring based on human
passenger feedback. Particular humanly-controlled driving sessions
can also be identified as high quality and selected for use
according to driver reputation or other factors.
[0041] According to an aspect of the present disclosure, one or
more session logs can be respectively associated with the one or
more humanly-controlled driving sessions that were selected for use
in performing automatic tuning. Each session log can include any
data that was acquired by the vehicle or its associated sensors
during the corresponding driving session. In particular, the
session log can include the various types of sensor data described
above with reference to the perception system. Thus, even though
the vehicle was being manually controlled, the sensors and/or any
other vehicle systems can still operate as if the vehicle was
operating autonomously and the corresponding data can be recorded
and stored in the session log. The session log can also include
various other types of data alternatively or in addition to sensor
data. For example, the session log can include vehicle control data
(e.g., the position or control parameters of actuators that control
gas flow, steering, braking, etc.) and/or vehicle state data (e.g.,
vehicle location, speed, acceleration, heading, orientation, etc.)
for any number of timestamps or sampling points.
[0042] In some implementations, the session log for each of the one
or more humanly-controlled driving sessions can directly include
the humanly-executed motion plans that were executed by the human
driver during such driving session. For example, the session log
can directly include vehicle state data, vehicle control data,
and/or vehicle trajectory data that can be sampled (e.g., in a
window fashion) to form humanly-executed motion plans.
[0043] In other implementations, the humanly-executed motion plans
can be derived from the session logs. For example, the session logs
may not directly include motion plans but may include information
sufficient to derive motion plans. In particular, in some
implementations, the automatic tuning systems of the present
disclosure can include a trajectory fitter. The trajectory fitter
can operate to fit full trajectory profiles to autonomous vehicle
partial states. For example, the trajectory fitter can identify the
most reliable fields from the logged vehicle states to generate
full trajectory profiles (e.g., including higher derivatives) which
match the vehicle partial states as closely as possible. As such,
the humanly-executed motion plans can be derived from the session
logs.
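The trajectory fitter can be illustrated with the following sketch, which recovers higher derivatives from logged position samples by finite differences; an actual fitter would fit smooth full trajectory profiles that best match the logged partial states, and the function name and sampling scheme here are assumptions.

```python
def fit_trajectory(times, positions):
    """Recover a full trajectory profile (time, position, velocity,
    acceleration) from logged position samples using finite
    differences; endpoints fall back to one-sided differences."""
    n = len(times)
    def diff(vals, i):
        j, k = max(i - 1, 0), min(i + 1, n - 1)
        return (vals[k] - vals[j]) / (times[k] - times[j])
    vel = [diff(positions, i) for i in range(n)]
    acc = [diff(vel, i) for i in range(n)]
    return list(zip(times, positions, vel, acc))

# A constant-velocity log: 2 m/s, zero acceleration.
profile = fit_trajectory([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0])
```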
[0044] Regardless, the automatic tuning system can obtain one or
more humanly-executed motion plans that can be used as a
"gold-standard" for imitation training of the autonomous vehicle
motion planning system. To perform such imitation training, the
automatic tuning system can employ the autonomous vehicle motion
planning system to generate autonomous motion plans based on the
humanly-controlled driving session logs.
[0045] In particular, according to another aspect of the present
disclosure, the data from the humanly-controlled driving session
logs can be provided as input to an autonomous vehicle computing
system, which can include various systems such as, for example, a
perception system, a prediction system, and/or a motion planning
system as described above. The systems of the autonomous vehicle
computing system can process the data from the humanly-controlled
driving session logs as if it were being collected by an autonomous
vehicle during autonomous operation and, in response to the data
from the humanly-controlled driving session logs, output one or
more autonomous motion plans. Stated differently, the autonomous
vehicle computing system can generate autonomous motion plans as if
it were attempting to autonomously operate through the environment
described by the data from the humanly-controlled driving session
logs. As described above, generating these autonomous motion plans
can include implementing an optimization planner to optimize over
one or more cost functions that include a plurality of gains. Thus,
the autonomous motion plans provide an insight into how the
autonomous vehicle would react or otherwise operate in the same
situations or scenarios that were encountered by the human driver
during the previous humanly-controlled driving sessions.
[0046] According to another aspect of the present disclosure, the
systems and methods of the present disclosure can automatically
tune the cost function gains by minimizing or otherwise optimizing
an objective function. In particular, the objective function can
provide an objective value based at least in part on a difference
between a first total cost associated with the humanly-executed
motion plan and a second total cost associated with the autonomous
motion plan. As such, evaluating the objective function can include
inputting the humanly-executed motion plan into the one or more
cost functions of the autonomous vehicle motion planning system to
determine the first total cost associated with the humanly-executed
motion plan and inputting the autonomous motion plan into the one
or more cost functions of the autonomous vehicle motion planning
system to determine the second total cost associated with the
autonomous motion plan. More particularly, in some implementations,
a training dataset can include a plurality of pairs of motion
plans, where each pair includes a humanly-executed motion plan and
a corresponding autonomous motion plan. The objective function can
be optimized over all of the plurality of pairs of motion plans
included in the training dataset.
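Assuming, for illustration, a total cost that is linear in its features (with feature vectors standing in for full motion plans), evaluating the objective over training pairs can be sketched as:

```python
def total_cost(gains, features):
    """Linear total cost: a dot product of the cost function gains
    with cost features extracted from a motion plan."""
    return sum(g * f for g, f in zip(gains, features))

def objective(gains, pairs):
    """Sum, over training pairs, of the difference between the first
    total cost (humanly-executed plan) and the second total cost
    (corresponding autonomous plan)."""
    return sum(total_cost(gains, human) - total_cost(gains, auto)
               for human, auto in pairs)

# One training pair: (human plan features, autonomous plan features).
pairs = [([1.0, 0.5], [1.2, 0.1])]
```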
[0047] In some implementations, the objective function can be
crafted according to an approach known as Maximum Margin Planning.
In particular, the objective function can be crafted to enable an
optimization approach that allows imitation learning in which
humanly-executed motion plan examples are used to inform the cost
function gains. In some implementations, the objective function and
associated optimization approach can operate according to a number
of assumptions. For example, in some implementations, it can be
assumed that the one or more cost functions of the autonomous
vehicle motion planning system are linear (e.g., linear in its
features).
[0048] According to another aspect of the present disclosure, in
some implementations, the objective function can encode or
otherwise include one or more constraints. For example, in some
implementations, the objective function can encode a first
constraint that the first total cost associated with the
humanly-executed motion plan is less than the second total cost
associated with the autonomous motion plan. In effect, this first
constraint reflects an assumption that the humanly-executed motion
plan is optimal. Therefore, any autonomous motion plan generated by
the autonomous vehicle motion planning system will necessarily have
a higher total cost.
[0049] In some implementations, in addition or alternatively to the
first constraint described above, the objective function can encode
a second constraint that the difference between the first total
cost and the second total cost is greater than or equal to a
margin. In some implementations, the margin can be based on or
equal to a dis-similarity value provided by a loss function. The
dis-similarity value can be descriptive of a dis-similarity between
the humanly-executed motion plan and the autonomous motion plan.
For example, a larger dis-similarity value can indicate that the
plans are more dis-similar (i.e., less similar) while a smaller
dis-similarity value can indicate that the plans are less
dis-similar (i.e., more similar). In some implementations, the loss
function can compare the humanly-executed motion plan to the
autonomous motion plan and output a real positive number as the
dis-similarity value.
[0050] In effect, this second constraint that the difference
between the first total cost and the second total cost be greater
than or equal to the margin reflects the assumption that, if the
plans are dis-similar, then the humanly-executed motion plan is
expected to have a significantly lower cost than the corresponding
autonomous motion plan. Stated differently, the humanly-executed
motion plan is expected to be significantly better in terms of cost
if the plans are significantly different. By contrast, if the
plans are quite similar, then their respective costs are expected
to be relatively close. Thus, a distinction can be made between
similar plans and dis-similar plans.
[0051] However, in some instances, it may not be possible to
satisfy one or more of the constraints encoded in the objective
function. For example, if the margin (e.g., as provided by the loss
function) is made relatively strong, it may not be possible to meet
the constraints for every pair of plans included in the training
dataset. To account for this issue, a slack variable can be
included to account for the occasional violation. In particular,
when one or more of the constraints are violated, a slack variable
penalty can be applied, while no penalty is applied when all
constraints are met.
[0052] As noted above, the objective function can be minimized or
otherwise optimized to automatically tune the cost function gains.
That is, the gains can be iteratively adjusted to optimize the
objective function and the ultimate gain values that optimize the
objective function can themselves be viewed as optimal or otherwise
"tuned". In some implementations, the objective function can be
convex, but non-differentiable. In some implementations, a
subgradient technique can be used to optimize the objective
function. In some implementations, the objective function can
enable guaranteed convergence to an optimal value for a small
enough step size. In some implementations, optimization of the
objective function can be similar to stochastic gradient descent
with the added concept of margins.
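A compact sketch of the margin, slack penalty, and subgradient update described in the preceding paragraphs, under the linear-cost assumption, is given below. In full Maximum Margin Planning the autonomous plan would be regenerated under the current gains at each iteration; here the training pairs are held fixed and all names, step sizes, and feature vectors are illustrative assumptions.

```python
def hinge_objective(gains, pairs, margin):
    """A pair contributes a slack penalty only when the autonomous
    plan's total cost fails to exceed the human plan's total cost
    by at least the margin."""
    def cost(feats):
        return sum(g * f for g, f in zip(gains, feats))
    return sum(max(0.0, cost(h) - cost(a) + margin) for h, a in pairs)

def tune_gains(gains, pairs, margin, step=0.1, iters=100):
    """Subgradient descent on the hinge objective: when a pair's
    constraint is violated, step the gains along the feature
    difference; no update is made when all constraints are met."""
    gains = list(gains)
    for _ in range(iters):
        for h, a in pairs:
            cost_h = sum(g * f for g, f in zip(gains, h))
            cost_a = sum(g * f for g, f in zip(gains, a))
            if cost_h - cost_a + margin > 0:  # constraint violated
                gains = [g - step * (fh - fa)
                         for g, fh, fa in zip(gains, h, a)]
    return gains

# The human plan exhibits feature 0 only; the autonomous plan, feature 1.
pairs = [([1.0, 0.0], [0.0, 1.0])]
tuned = tune_gains([1.0, 1.0], pairs, margin=0.5)
```

After tuning, the gain on the feature exhibited only by the autonomous plan has grown, so the autonomous plan's total cost exceeds the human plan's total cost by at least the margin.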
[0053] In some implementations, the automatic tuning system can
identify and reject or otherwise discard outlying pairs of motion
plans. In particular, in one example, if the dis-similarity value
(or some other measure of similarity) for a given pair of
humanly-executed plan and corresponding autonomous motion plan
exceeds a certain value, such pair of plans can be identified as an
outlier and removed from the training dataset. As another example,
if the difference between the total costs respectively associated
with a given pair of humanly-executed plan and corresponding
autonomous motion plan exceeds a certain value, then such pair of
plans can be identified as an outlier and removed from the training
dataset. One reason for such outlier identification is that, as
described above, different cost function(s) can be used depending
upon a particular scenario that is selected by the motion planning
system (e.g., a changing lanes scenario versus a queueing
scenario). Thus, if the autonomous vehicle motion planning system
selected a different scenario than was performed by the human
driver, then the automatic tuning system will be unable to match
such pair of plans. As yet another example of outlier
identification, if the optimization planner fails to converge, the
corresponding data and humanly-executed plan can be removed from
the dataset.
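The outlier rejection described above can be sketched as a simple filter over the training pairs; the dis-similarity function and threshold used here are placeholder assumptions standing in for the loss function.

```python
def filter_outliers(pairs, dissimilarity, max_dissimilarity):
    """Keep only training pairs whose humanly-executed and autonomous
    plans are sufficiently similar; highly dis-similar pairs (e.g.,
    from a scenario mismatch) are treated as outliers and removed."""
    return [(h, a) for h, a in pairs
            if dissimilarity(h, a) <= max_dissimilarity]

# Sum of absolute feature differences as a stand-in dis-similarity.
dissim = lambda h, a: sum(abs(x - y) for x, y in zip(h, a))
kept = filter_outliers(
    [([1.0, 1.0], [1.1, 1.0]),   # similar pair: kept
     ([1.0, 1.0], [5.0, 9.0])],  # scenario mismatch: removed
    dissim, max_dissimilarity=1.0)
```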
[0054] Thus, the present disclosure provides a framework that
enables automatic tuning of cost function gains included in one or
more cost functions employed by an autonomous vehicle motion
planning system. One technical effect and benefit of the present
disclosure is improved control of and performance by autonomous
vehicles. In particular, since the systems and methods of the
present disclosure can adjust the cost function gains to
approximate a human judgment of the appropriate balance of
competing cost features, the autonomous driving performed by the
tuned autonomous vehicle will feel more natural and comfortable to
a human passenger and, further, will more closely meet the
expectations of the human drivers of adjacent vehicles.
[0055] As another technical effect and benefit, the time-consuming
requirement to manually tune the cost function gains can be
eliminated, while producing superior tuning results. As another
technical effect and benefit, automatic tuning enables the
exploration and identification of new cost features. For example,
newly created features can easily be introduced and tuned, without
disrupting the highly interdependent cost balance of all other
features. Likewise, if an automatically tuned autonomous vehicle
motion planning system is unable to approximate human driving
performance,
it can be assumed that certain features that are important to human
drivers are simply not reflected in the existing cost function.
Therefore, the present disclosure provides automatic detection of
such instances which can lead to improved identification and
formulation of cost features.
[0056] Another example technical effect and benefit provided in at
least some implementations of the present disclosure leverages the
unique and novel concept of applying optimization principles to the
cost functions of a linear quadratic regulator-based motion
planner. In particular, the gains of the existing cost function
structure used by the linear quadratic regulator can be optimized
based on human driving data. Thus, rather than learning to mimic
trajectories, the linear quadratic regulator-based motion planner
can learn a cost structure that guides or causes selection of
optimal trajectories.
[0057] Furthermore, in one example application, the systems and
methods of the present disclosure can train a motion planning
system of an autonomous vehicle to generate motion plans that
approximate the driving behavior exhibited by the human residents
of a particular target geographic area. For example, an existing
autonomous vehicle motion planning system may have been tuned
(e.g., automatically and/or manually) based on driving data or
other testing data associated with a first geographic area. Thus,
based on such tuning, the autonomous vehicle may be capable of
approximating good human driving performance in such first
geographic area.
[0058] However, the residents of different geographic areas have
different driving styles. In addition, different geographic areas
present different driving scenarios and challenges. Thus, an
autonomous vehicle specifically tuned for performance in a first
geographic area may exhibit decreased performance quality when
autonomously driving in a second geographic area that is different
than the first geographic area.
[0059] Thus, in one example application of the present disclosure,
the gains of the autonomous vehicle motion planning system can be
automatically tuned based on humanly-controlled driving session
logs (and corresponding humanly-executed motion plans) that were
collected during humanly-controlled driving sessions that were
performed in a target geographic area (e.g., the second geographic
area).
[0060] To provide an example for the purpose of illustration, an
autonomous vehicle motion planning system tuned based on data and
testing in Pittsburgh, Pa., USA may approximate human driving
behavior that is appropriate in Pittsburgh. However, in some
instances, such vehicle may not approximate the human driving
behavior that is commonplace and appropriate in Manila,
Philippines. For example, human drivers in Manila may be less
averse to changing lanes, drive closer together,
accelerate/decelerate faster, etc. Thus, to automatically tune the
autonomous vehicle for autonomous driving in Manila, a human driver
can operate a vehicle in Manila to generate a humanly-controlled
session log that is indicative of appropriate human driving
behavior in Manila (that is, driving behavior that is "good"
driving from the perspective of a Manila resident or driver). The
cost function gains of the autonomous vehicle can be automatically
tuned based on such Manila session logs. After tuning, the
autonomous vehicle motion planning system can generate autonomous
motion paths that approximate appropriate human driving behavior in
Manila. In other implementations, it is not required that the human
driver actually be physically located in Manila, but instead that
the driver simply operate the vehicle in the style of the residents
of Manila to generate the Manila session logs.
[0061] According to another aspect, a plurality of sets of tuned
gains that respectively correspond to a plurality of different
locations can be stored in memory. A particular set of gains can be
selected based on the location of the autonomous vehicle and the
selected set of gains can be loaded into the autonomous vehicle
motion planning system for use, thereby enabling an autonomous
vehicle to change driving behavior based on its current
location.
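Storing and selecting per-location gain sets can be sketched as a simple lookup; the region keys, gain names, and values below are hypothetical placeholders, not tuned values.

```python
# Hypothetical per-region tuned gain sets.
TUNED_GAINS = {
    "pittsburgh": {"lane_boundary": 2.0, "headway": 1.5},
    "manila": {"lane_boundary": 0.8, "headway": 0.6},
}

def gains_for_location(location, default="pittsburgh"):
    """Select the stored gain set matching the vehicle's current
    location, falling back to a default region when none exists."""
    return TUNED_GAINS.get(location, TUNED_GAINS[default])
```

Loading the selected set into the motion planning system lets the vehicle change driving behavior based on its current location.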
[0062] In another example application of the present disclosure,
the systems and methods of the present disclosure can train a
motion planning system of an autonomous vehicle to generate motion
plans that approximate one of a plurality of different human
driving behavior profiles. For example, human drivers can be
requested to operate vehicles according to different human driving
behavior profiles (e.g., sporty versus cautious). A corpus of
humanly-controlled session logs can be collected for each driving
behavior profile. Thereafter, the cost function gains of an
autonomous vehicle motion planning system can be automatically
tuned to approximate one of the driving behavior profiles. For
example, the cost function gains of an autonomous vehicle motion
planning system can be automatically tuned based on session logs
that correspond to sporty human driving behavior. Thereafter, the
tuned autonomous vehicle motion planning system can generate
autonomous motion plans that fit the sporty driving behavior
profile.
[0063] In one example implementation of the above, a plurality of
different sets of gains that respectively correspond to the
different human driving behavior profiles can be respectively
automatically tuned and then stored in memory. A passenger of the
autonomous vehicle can select (e.g., through an interface of the
autonomous vehicle) which of the human driving behavior profiles
they would like the autonomous vehicle to approximate. In response,
the autonomous vehicle can load the particular gains associated
with the selected behavior profile and can generate autonomous
motion plans using such gains. Therefore, a human passenger can be
given the ability to select the style of driving that she
prefers.
[0064] In another example application of the present disclosure,
the systems and methods of the present disclosure can train a
motion planning system of an autonomous vehicle to generate motion
plans that approximate driving behaviors exhibited by human
operators of different vehicle types (e.g., sedan versus sports
utility vehicle versus delivery truck). For example, human drivers
can be requested to operate different vehicle types or models. A
corpus of humanly-controlled session logs can be collected for each
vehicle type or model. Thereafter, the cost function gains of an
autonomous vehicle motion planning system can be automatically
tuned to approximate human driving of one of the vehicle types or
models. For example, the cost function gains of an autonomous
vehicle motion planning system can be automatically tuned based on
session logs that correspond to human operation of a delivery
truck.
[0065] To provide an example for the purpose of illustration, an
autonomous vehicle motion planning system tuned based on data and
testing performed by a sedan may approximate human driving behavior
that is appropriate for driving a sedan. However, in some
instances, such motion planning system may not provide autonomous
motion plans that are appropriate for a large truck. For example,
human drivers of large trucks might take wider turns, leave more
space between themselves and the nearest vehicle, apply braking
earlier, etc.
Thus, to automatically tune the autonomous vehicle motion planning
system for use in a large truck, a human driver can operate a large
truck to generate a humanly-controlled session log that is
indicative of appropriate human driving behavior in a large truck.
The cost function gains of the autonomous vehicle can be
automatically tuned based on such large truck human driving session
logs. After tuning, the autonomous vehicle motion planning system
can generate autonomous motion paths that approximate appropriate
human driving behavior for large trucks, rather than sedans.
[0066] Thus, the present disclosure provides techniques that enable
a computing system to automatically tune cost function gains,
which was heretofore unobtainable using existing computers or
control systems. Therefore, the present disclosure improves the
operation of an autonomous vehicle computing system and the
autonomous vehicle it controls. Stated differently, the present
disclosure provides a particular solution to the problem of tuning
cost function gains and provides a particular way to achieve the
desired outcome.
[0067] With reference now to the Figures, example embodiments of
the present disclosure will be discussed in further detail.
Example Devices and Systems
[0068] FIG. 1 depicts a block diagram of an example autonomous
vehicle 10 according to example embodiments of the present
disclosure. The autonomous vehicle 10 is capable of sensing its
environment and navigating without human input. The autonomous
vehicle 10 can be a ground-based autonomous vehicle (e.g., car,
truck, bus, etc.), an air-based autonomous vehicle (e.g., airplane,
drone, helicopter, or other aircraft), or other types of vehicles
(e.g., watercraft).
[0069] The autonomous vehicle 10 includes one or more sensors 101,
a vehicle computing system 102, and one or more vehicle controls
107. The vehicle computing system 102 can assist in controlling the
autonomous vehicle 10. In particular, the vehicle computing system
102 can receive sensor data from the one or more sensors 101,
attempt to comprehend the surrounding environment by performing
various processing techniques on data collected by the sensors 101,
and generate an appropriate motion path through such surrounding
environment. The vehicle computing system 102 can control the one
or more vehicle controls 107 to operate the autonomous vehicle 10
according to the motion path.
[0070] The vehicle computing system 102 includes one or more
processors 112 and a memory 114. The one or more processors 112 can
be any suitable processing device (e.g., a processor core, a
microprocessor, an ASIC, an FPGA, a controller, a microcontroller,
etc.) and can be one processor or a plurality of processors that
are operatively connected. The memory 114 can include one or more
non-transitory computer-readable storage mediums, such as RAM, ROM,
EEPROM, EPROM, flash memory devices, magnetic disks, etc., and
combinations thereof. The memory 114 can store data 116 and
instructions 118 which are executed by the processor 112 to cause
the vehicle computing system 102 to perform operations.
[0071] As illustrated in FIG. 1, the vehicle computing system 102
can include a perception system 103, a prediction system 104, and a
motion planning system 105 that cooperate to perceive the
surrounding environment of the autonomous vehicle 10 and determine
a motion plan for controlling the motion of the autonomous vehicle
10 accordingly.
[0072] In particular, in some implementations, the perception
system 103 can receive sensor data from the one or more sensors 101
that are coupled to or otherwise included within the autonomous
vehicle 10. As examples, the one or more sensors 101 can include a
Light Detection and Ranging (LIDAR) system, a Radio Detection and
Ranging (RADAR) system, one or more cameras (e.g., visible spectrum
cameras, infrared cameras, etc.), and/or other sensors. The sensor
data can include information that describes the location of objects
within the surrounding environment of the autonomous vehicle
10.
[0073] As one example, for a LIDAR system, the sensor data can
include the location (e.g., in three-dimensional space relative to
the LIDAR system) of a number of points that correspond to objects
that have reflected a ranging laser. For example, a LIDAR system
can measure distances by measuring the Time of Flight (TOF) that it
takes a short laser pulse to travel from the sensor to an object
and back, calculating the distance from the known speed of
light.
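The time-of-flight range calculation described above can be sketched as follows (an illustrative sketch, not part of the application; the function name is assumed). Distance is half the round-trip time multiplied by the speed of light:

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum, meters/second

def tof_distance_m(round_trip_seconds: float) -> float:
    """Distance to a reflecting object, given a LIDAR pulse's round-trip
    time: the pulse travels to the object and back, so divide by two."""
    return SPEED_OF_LIGHT_M_S * round_trip_seconds / 2.0

# A pulse returning after about 667 nanoseconds corresponds to roughly 100 m.
```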
[0074] As another example, for a RADAR system, the sensor data can
include the location (e.g., in three-dimensional space relative to
the RADAR system) of a number of points that correspond to objects
that have reflected a ranging radio wave. For example, radio waves
(e.g., pulsed or continuous) transmitted by the RADAR system can
reflect off an object and return to a receiver of the RADAR system,
giving information about the object's location and speed. Thus, a
RADAR system can provide useful information about the current speed
of an object.
[0075] As yet another example, for one or more cameras, various
processing techniques (e.g., range imaging techniques such as, for
example, structure from motion, structured light, stereo
triangulation, and/or other techniques) can be performed to
identify the location (e.g., in three-dimensional space relative to
the one or more cameras) of a number of points that correspond to
objects that are depicted in imagery captured by the one or more
cameras. Other sensor systems can identify the location of points
that correspond to objects as well.
[0076] As another example, the one or more sensors 101 can include
a positioning system. The positioning system can determine a
current position of the vehicle 10. The positioning system can be
any device or circuitry for analyzing the position of the vehicle
10. For example, the positioning system can determine position by
using one or more of: inertial sensors; a satellite positioning
system; an IP address; triangulation and/or proximity to network
access points or other network components (e.g., cellular towers,
WiFi access points, etc.); and/or other suitable techniques. The
position of the vehicle 10 can be used by
various systems of the vehicle computing system 102.
[0077] Thus, the one or more sensors 101 can be used to collect
sensor data that includes information that describes the location
(e.g., in three-dimensional space relative to the autonomous
vehicle 10) of points that correspond to objects within the
surrounding environment of the autonomous vehicle 10.
[0078] In addition to the sensor data, the perception system 103
can retrieve or otherwise obtain map data 126 that provides
detailed information about the surrounding environment of the
autonomous vehicle 10. The map data 126 can provide information
regarding: the identity and location of different travelways (e.g.,
roadways), road segments, buildings, or other items or objects
(e.g., lampposts, crosswalks, curbing, etc.); the location and
directions of traffic lanes (e.g., the location and direction of a
parking lane, a turning lane, a bicycle lane, or other lanes within
a particular roadway or other travelway); traffic control data
(e.g., the location and instructions of signage, traffic lights, or
other traffic control devices); and/or any other map data that
provides information that assists the computing system 102 in
comprehending and perceiving its surrounding environment and its
relationship thereto.
[0079] The perception system 103 can identify one or more objects
that are proximate to the autonomous vehicle 10 based on sensor
data received from the one or more sensors 101 and/or the map data
126. In particular, in some implementations, the perception system
103 can determine, for each object, state data that describes a
current state of such object. As examples, the state data for each
object can describe an estimate of the object's: current location
(also referred to as position); current speed (also referred to as
velocity); current acceleration; current heading; current
orientation; size/footprint (e.g., as represented by a bounding
shape such as a bounding polygon or polyhedron); class (e.g.,
vehicle versus pedestrian versus bicycle versus other); yaw rate;
and/or other state information. According to one example notation,
the state of the vehicle x can be within a state space S. That is,
x ∈ S.
[0080] In some implementations, the perception system 103 can
determine state data for each object over a number of iterations.
In particular, the perception system 103 can update the state data
for each object at each iteration. Thus, the perception system 103
can detect and track objects (e.g., vehicles) that are proximate to
the autonomous vehicle 10 over time.
[0081] The prediction system 104 can receive the state data from
the perception system 103 and predict one or more future locations
for each object based on such state data. For example, the
prediction system 104 can predict where each object will be located
within the next 5 seconds, 10 seconds, 20 seconds, etc. As one
example, an object can be predicted to adhere to its current
trajectory according to its current speed. As another example,
other, more sophisticated prediction techniques or modeling can be
used.
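The first example above, in which an object is predicted to adhere to its current trajectory at its current speed, can be sketched as a constant-velocity extrapolation (the state layout and names here are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class ObjectState:
    x: float   # position, meters
    y: float
    vx: float  # velocity, meters/second
    vy: float

def predict_future_location(state: ObjectState, dt: float) -> tuple[float, float]:
    """Predict where an object will be dt seconds from now, assuming it
    continues along its current trajectory at its current speed."""
    return (state.x + state.vx * dt, state.y + state.vy * dt)

# An object at (0, 0) moving at 10 m/s along x is predicted at (50, 0)
# five seconds from now.
```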
[0082] The motion planning system 105 can determine a motion plan
for the autonomous vehicle 10 based at least in part on the
predicted one or more future locations for the object and/or the
state data for the object provided by the perception system 103.
Stated differently, given information about the current locations
of objects and/or predicted future locations of proximate objects,
the motion planning system 105 can determine a motion plan for the
autonomous vehicle 10 that best navigates the autonomous vehicle 10
relative to the objects at such locations.
[0083] In particular, according to an aspect of the present
disclosure, the motion planning system 105 can evaluate one or more
cost functions for each of one or more candidate motion plans for
the autonomous vehicle 10. For example, the cost function(s) can
describe a cost (e.g., over time) of adhering to a particular
candidate motion plan and/or describe a reward for adhering to the
particular candidate motion plan. For example, the reward can be of
opposite sign to the cost.
[0084] More particularly, to evaluate the one or more cost
functions, the motion planning system 105 can determine a plurality
of features that are within a feature space. For example, the
status of each feature can be derived from the state of the vehicle
and/or the respective states of other objects or aspects of the
surrounding environment. According to one example notation, the
plurality of features are within a feature space as follows:
F_x ∈ F.
[0085] The motion planning system 105 can determine the plurality
of features for each vehicle state included in the candidate motion
plan. In particular, according to one example notation, a candidate
motion plan P can be expressed as a series of vehicle states as
follows: P = {x_0, . . . , x_n}.
[0086] The motion planning system 105 can evaluate one or more cost
functions based on the determined features. For example, in some
implementations, the one or more cost functions can include a
respective linear cost for each feature at each state. According to
one example notation, the linear cost for the features at each
state can be expressed as follows: C(F_x) = w^T F_x, where w is a
set of cost function gains. Although the gains w are used as
coefficients in the example linear cost function, gains
of the one or more cost functions can also include thresholds or
other configurable parameters of the one or more cost functions
that, for example, serve to effectuate a balance between competing
concerns (e.g., in the form of cost features F.sub.x) when the
motion planning system generates an autonomous motion plan for the
autonomous vehicle.
[0087] Thus, according to one example notation, and in some
implementations, the total cost of a candidate motion plan can be
expressed as follows:

C(P) = Σ_{x ∈ P} C(F_x) = Σ_{x ∈ P} w^T F_x
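The total-cost expression above can be sketched directly in code (a minimal illustration, not the application's implementation; the feature vectors and gains are placeholders): the plan's cost is the sum of the linear per-state costs w^T F_x.

```python
import numpy as np

def total_cost(plan_features: list[np.ndarray], gains: np.ndarray) -> float:
    """Total cost of a candidate plan: sum of the linear per-state cost
    gains . F_x over every vehicle state's feature vector F_x."""
    return float(sum(gains @ f for f in plan_features))
```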
[0088] The motion planning system 105 can iteratively optimize the
one or more cost functions to minimize a total cost associated with
the candidate motion plan. For example, the motion planning system
105 can include an optimization planner that iteratively optimizes
the one or more cost functions.
[0089] Following optimization, the motion planning system 105 can
provide the optimal motion plan to a vehicle controller 106 that
controls one or more vehicle controls 107 (e.g., actuators or other
devices that control gas flow, steering, braking, etc.) to execute
the optimal motion plan.
[0090] Each of the perception system 103, the prediction system
104, the motion planning system 105, and the vehicle controller 106
can include computer logic utilized to provide desired
functionality. In some implementations, each of the perception
system 103, the prediction system 104, the motion planning system
105, and the vehicle controller 106 can be implemented in hardware,
firmware, and/or software controlling a general purpose processor.
For example, in some implementations, each of the perception system
103, the prediction system 104, the motion planning system 105, and
the vehicle controller 106 includes program files stored on a
storage device, loaded into a memory and executed by one or more
processors. In other implementations, each of the perception system
103, the prediction system 104, the motion planning system 105, and
the vehicle controller 106 includes one or more sets of
computer-executable instructions that are stored in a tangible
computer-readable storage medium such as RAM, a hard disk, or
optical or magnetic media.
[0091] FIG. 2 depicts a block diagram of an example motion planning
system 200 according to example embodiments of the present
disclosure. The example motion planning system 200 includes a world
state generator 204, one or more scenario controllers 206, and an
optimization planner 208.
[0092] The world state generator 204 can receive information from
the prediction system 104, the map data 126, and/or other
information such as vehicle pose, a current route, or other
information. The world state generator 204 can synthesize all
received information to produce a world state that describes the
state of all objects in and other aspects of the surrounding
environment of the autonomous vehicle at each time step.
[0093] The scenario controller(s) 206 can detect certain scenarios
(e.g., a changing lanes scenario versus a queueing scenario) and
guide the behavior of the autonomous vehicle according to the
selected scenario. Thus, the scenario controller(s) can make
discrete-type decisions (e.g., should the autonomous vehicle turn
left, turn right, change lanes, etc.) and can control motion of the
vehicle based on such decisions. In some implementations, each of
the scenario controller(s) 206 can be a classifier (e.g., a
machine-learned classifier) designed to classify the current state
of the world as either included or excluded from one or more
corresponding scenarios. In some implementations, the scenario
controller(s) 206 can operate at each time step.
[0094] As examples, the scenario controllers 206 can include one or
more of: a pass, ignore, queue controller that decides, for each
object in the world, whether the autonomous vehicle should pass,
ignore, or queue such object; a yield controller that decides, for
each adjacent vehicle in the world, whether the autonomous vehicle
should yield to such vehicle; a lane change controller that
identifies whether and when to change lanes; and/or a speed
regressor that determines an appropriate driving speed for each
time step. These scenario controllers 206 are provided as examples
only. Alternative and/or additional scenario controllers 206 can be
used. In some implementations of the present disclosure, the motion
planning system 200 does not include or implement the scenario
controllers 206.
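As a concrete (purely illustrative) stand-in for a pass, ignore, queue controller, the rule-based sketch below makes a discrete decision per object from assumed thresholds; the application contemplates machine-learned classifiers, which this does not represent:

```python
def pass_ignore_queue(lead_gap_m: float, lead_speed_mps: float,
                      ego_speed_mps: float) -> str:
    """Decide whether the autonomous vehicle should pass, ignore, or
    queue behind an object ahead (thresholds are illustrative only)."""
    if lead_gap_m > 100.0:
        return "ignore"  # too far away to influence the current plan
    if lead_speed_mps < ego_speed_mps - 5.0:
        return "pass"    # much slower object: plan to pass it
    return "queue"       # otherwise follow at a safe distance
```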
[0095] According to another aspect of the present disclosure, the
motion planning system 200 can include an optimization planner 208
that searches (e.g., iteratively searches) over a motion planning
space (e.g., an available control space) to identify a motion plan
that optimizes (e.g., locally optimizes) a total cost associated
with the motion plan. For example, the optimization planner can
iteratively evaluate and modify a candidate motion plan until the
total cost is optimized.
[0096] FIG. 3 depicts a block diagram of an example optimization
planner 300 according to example embodiments of the present
disclosure. As described above, the optimization planner 300 can
iteratively search over a motion planning space (e.g., an available
control space) to identify a motion plan that optimizes (e.g.,
locally optimizes) a total cost associated with the motion plan. In
particular, the example optimization planner 300 can implement an
optimizer 308 to optimize the total cost. The optimizer 308 can be
or include a solver (e.g., an iterative solver) or other
optimization tool that is able to optimize the total cost. In some
implementations, the optimizer 308 is an iterative linear quadratic
regulator.
[0097] According to an aspect of the present disclosure, the total
cost can be based at least in part on one or more cost functions
304. In one example implementation, the total cost equals the sum
of all costs minus the sum of all rewards, and the optimization
planner attempts to minimize the total cost.
[0098] In some implementations, different cost function(s) 304 can
be used depending upon a particular scenario that is provided to
the optimization planner 300. For example, as described above, a
motion planning system can include a plurality of scenario
controllers that detect certain scenarios (e.g., a changing lanes
scenario versus a queueing scenario) and guide the behavior of the
autonomous vehicle according to the selected scenario. Different
sets of one or more cost functions 304 can correspond to the
different possible scenarios and a penalty/reward generator can
load the cost function(s) 304 corresponding to the selected
scenario at each instance of motion planning. In other
implementations, the same cost function(s) 304 can be used at each
instance of motion planning (e.g., no particular scenarios are
used). In some implementations, the optimization planner 300 does
not include the penalty/reward generator 302.
[0099] To provide an example cost function 304 for the purpose of
illustration: a first example cost function can provide a first
cost that is negatively correlated to a magnitude of a first
distance from the autonomous vehicle to a lane boundary. Thus, if a
candidate motion plan approaches a lane boundary, the first cost
increases, thereby discouraging (e.g., through increased cost
penalization) the autonomous vehicle from selecting motion plans
that come close to or cross over lane boundaries. This first
example cost function is provided only as an example cost function
to illustrate the principle of cost. The first cost function is not
required to implement the present disclosure. Many other and
different cost functions 304 can be employed in addition or
alternatively to the first cost function described above.
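One way to sketch the first example cost function, in which cost is negatively correlated with distance to a lane boundary, is an inverse-distance penalty (the inverse form, scale, and epsilon are assumptions for illustration, not the application's actual function):

```python
def lane_boundary_cost(distance_to_boundary_m: float, scale: float = 1.0) -> float:
    """Cost that grows as the vehicle approaches a lane boundary and
    shrinks as it moves away, discouraging boundary-hugging plans."""
    eps = 0.1  # keeps the cost finite at the boundary itself
    return scale / (distance_to_boundary_m + eps)
```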
[0100] Furthermore, in some implementations, the cost function(s)
can include a portion that provides a reward rather than a cost.
For example, the reward can be of opposite sign to cost(s) provided
by other portion(s) of the cost function. Example rewards can be
provided for distance traveled, velocity, or other forms of
progressing toward completion of a route.
[0101] Referring again to FIG. 2, once the optimization planner 208
has identified the optimal candidate motion plan (or some other
iterative break occurs), the optimal candidate motion plan can be
selected and executed by the autonomous vehicle. For example, the
motion planning system 200 can provide the selected motion plan to
a vehicle controller 106 that controls one or more vehicle controls
(e.g., actuators that control gas flow, steering, braking, etc.) to
execute the selected motion plan.
[0102] Each of the world state generator 204, scenario
controller(s) 206, the optimization planner 208, and penalty/reward
generator 302 can include computer logic utilized to provide
desired functionality. In some implementations, each of world state
generator 204, scenario controller(s) 206, the optimization planner
208, and penalty/reward generator 302 can be implemented in
hardware, firmware, and/or software controlling a general purpose
processor. For example, in some implementations, each of world
state generator 204, scenario controller(s) 206, the optimization
planner 208, and penalty/reward generator 302 includes program
files stored on a storage device, loaded into a memory and executed
by one or more processors. In other implementations, each of world
state generator 204, scenario controller(s) 206, the optimization
planner 208, and penalty/reward generator 302 includes one or more
sets of computer-executable instructions that are stored in a
tangible computer-readable storage medium such as RAM, a hard disk,
or optical or magnetic media.
[0103] FIG. 4 depicts a block diagram of an example automatic
tuning computing system 402 according to example embodiments of the
present disclosure. The automatic tuning computing system 402 can
automatically tune the cost function gains of one or more cost
functions 304. The automatic tuning computing system 402 can
include or otherwise be implemented by one or more discrete
computing devices. For example, some aspects of the computing
system 402 can be implemented by a first device while other aspects
of the system 402 are implemented by a second device.
[0104] The automatic tuning computing system 402 includes one or
more processors 412 and a memory 414. The one or more processors
412 can be any suitable processing device (e.g., a processor core,
a microprocessor, an ASIC, a FPGA, a controller, a microcontroller,
etc.) and can be one processor or a plurality of processors that
are operatively connected. The memory 414 can include one or more
non-transitory computer-readable storage media, such as RAM, ROM,
EEPROM, EPROM, one or more memory devices, flash memory devices,
etc., and combinations thereof.
[0105] The memory 414 can store information that can be accessed by
the one or more processors 412. For instance, the memory 414 (e.g.,
one or more non-transitory computer-readable storage mediums,
memory devices) can store data 416 that can be obtained, received,
accessed, written, manipulated, created, and/or stored. In some
implementations, the computing system 402 can obtain data from one
or more memory device(s) that are remote from the system 402.
[0106] The memory 414 can also store computer-readable instructions
418 that can be executed by the one or more processors 412. The
instructions 418 can be software written in any suitable
programming language or can be implemented in hardware.
Additionally, or alternatively, the instructions 418 can be
executed in logically and/or virtually separate threads on
processor(s) 412.
[0107] For example, the memory 414 can store instructions 418 that
when executed by the one or more processors 412 cause the one or
more processors 412 to perform any of the operations and/or
functions described herein.
[0108] The automatic tuning computing system 402 can include or
otherwise be in communication with a vehicle motion planning
system, such as, for example, the example motion planning system
200 described with reference to FIG. 2. The autonomous vehicle
motion planning system can include an optimization planner, such
as, for example, the optimization planner 300 described with
reference to FIG. 3. The optimization planner 300 can include one
or more cost functions 304 and an optimizer 308.
[0109] The automatic tuning computing system 402 can include an
automatic tuner 420. The computing system 402 can implement the
automatic tuner 420 to automatically tune one or more gains of the
one or more cost functions 304 of the vehicle motion planning
system 200. In particular, the computing system 402 can implement
the automatic tuner 420 to automatically tune the cost function
gains by minimizing or otherwise optimizing an objective function
422 that provides an objective value based at least in part on a
difference in respective total costs between a humanly-executed
motion plan and an autonomous motion plan generated by the
autonomous vehicle motion planning system 200. For example, the
automatic tuner 420 can include and implement a solver 424 to
minimize or otherwise reduce the objective function 422. For
example, the solver 424 can be an iterative solver.
[0110] Thus, the automatic tuner 420 can enable imitation learning
based on one or more humanly-executed motion plans that were
executed by a human driver during one or more humanly-controlled
driving sessions. In some implementations, high quality
humanly-controlled driving sessions can be identified and selected
for use as a "gold-standard" for imitation training of the
autonomous vehicle motion planning system. For example, driving
sessions can be considered high quality if they illustrate or
otherwise exhibit good or otherwise appropriate human driving
behavior.
[0111] Particular humanly-controlled driving sessions can be
identified as high quality and selected for use according to any
number of metrics including, for example, ride quality scoring
metrics. Example ride quality scoring metrics include automated
scoring metrics that automatically identify certain driving events
(e.g., undesirable events such as jerking events or heavy braking
events) and provide a corresponding score and/or manual scoring
metrics such as human passenger feedback or scoring based on human
passenger feedback. Particular humanly-controlled driving sessions
can be also identified as high quality and selected for use
according to driver reputation or other factors.
[0112] According to an aspect of the present disclosure, one or
more session logs 428 can be respectively associated with the one
or more humanly-controlled driving sessions that were selected for
use in performing automatic tuning. Each session log 428 can
include any data that was acquired by the vehicle or its associated
sensors during the corresponding driving session. In particular,
the session log 428 can include the various types of sensor data
described above with reference to the perception system. Thus, even
though the vehicle was being manually controlled, the sensors
and/or any other vehicle systems can still operate as if the
vehicle was operating autonomously and the corresponding data can
be recorded and stored in the session log 428.
[0113] The session log 428 can also include various other types of
data alternatively or in addition to sensor data. For example, the
session log 428 can include vehicle control data (e.g., the
position or control parameters of actuators that control gas flow,
steering, braking, etc.) and/or vehicle state data (e.g., vehicle
location, speed, acceleration, heading, orientation, etc.) for any
number of timestamps or sampling points.
[0114] In some implementations, the session log 428 for each of the
one or more humanly-controlled driving sessions can directly
include the humanly-executed motion plans that were executed by the
human driver during such driving session. For example, the session
log 428 can directly include vehicle state data, vehicle control
data, and/or vehicle trajectory data that can be sampled (e.g., in
a window fashion) to form humanly-executed motion plans.
[0115] In other implementations, the humanly-executed motion plans
can be derived from the session logs 428. For example, the session
logs 428 may not directly include humanly-executed motion plans but
may include information sufficient to derive motion plans. As such,
in some implementations, the automatic tuning computing system 402
can include a trajectory fitter 426 that derives humanly-executed
motion plans from the humanly-controlled session logs 428.
[0116] In particular, as an example, FIG. 6 depicts a block diagram
of an example processing pipeline to derive humanly-executed motion
plans according to example embodiments of the present disclosure.
In particular, humanly-controlled session logs 428 can be provided
to the trajectory fitter 426. The trajectory fitter 426 can operate
to fit full trajectory profiles to autonomous vehicle partial
states. For example, the trajectory fitter 426 can identify the
most reliable fields from the logged vehicle states to generate
full trajectory profiles (e.g., including higher derivatives) which
match the vehicle partial states as closely as possible. Therefore,
the trajectory fitter 426 can derive the humanly-executed motion
plans 508 from the session logs 428. However, as described above,
in some implementations, the trajectory fitter 426 is not
required.
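To illustrate the idea of recovering a fuller trajectory profile from logged partial states, the sketch below estimates higher derivatives (velocity and acceleration) from timestamped positions by finite differences; this is an assumed stand-in, not the trajectory fitter 426 itself:

```python
import numpy as np

def derive_profile(times: np.ndarray, positions: np.ndarray):
    """Estimate (velocity, acceleration) along one axis from timestamped
    positions using finite differences (numpy.gradient)."""
    velocity = np.gradient(positions, times)
    acceleration = np.gradient(velocity, times)
    return velocity, acceleration
```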
[0117] Referring again to FIG. 4, the automatic tuning computing
system 402 can obtain one or more humanly-executed motion plans
that can be used as a "gold-standard" for imitation training of the
autonomous vehicle motion planning system. To perform such
imitation training, the automatic tuning computing system 402 can
employ the autonomous vehicle motion planning system 200 to
generate autonomous motion plans based on the humanly-controlled
driving session logs 428. The automatic tuning computing system 402
can automatically tune the cost function gains by minimizing or
otherwise optimizing the objective function 422 that provides an
objective value based at least in part on a difference in
respective total costs between a humanly-executed motion plan and
an autonomous motion plan generated by the autonomous vehicle
motion planning system. In particular, the automatic tuning
computing system 402 can respectively input the humanly-executed
motion plan and the autonomous motion plan into the one or more
cost functions 304 used by the optimization planner 300 of the
autonomous vehicle motion planning system 200 to obtain their
respective total costs. The automatic tuning computing system 402
can iteratively adjust the gains of the one or more cost functions
304 to minimize or otherwise optimize the objective function
422.
[0118] More particularly, as one example, FIG. 5 depicts a workflow
diagram of an example automatic tuning computing system according
to example embodiments of the present disclosure. In particular,
according to another aspect of the present disclosure, the data
from the humanly-controlled driving session logs 428 can be
provided as input to an autonomous vehicle computing system, which
can include various systems such as, for example, a perception
system, a prediction system, and/or a motion planning system 200 as
described above. The systems of the autonomous vehicle computing
system can process the data from the humanly-controlled driving
session logs 428 as if it was being collected by an autonomous
vehicle during autonomous operation and, in response to the data
from the humanly-controlled driving session logs 428, output one or
more autonomous motion plans 506. Stated differently, the
autonomous vehicle computing system (e.g., the motion planning
system 200) can generate autonomous motion plans 506 as if it were
attempting to autonomously operate through the environment
described by the data from the humanly-controlled driving session
logs 428. As described above, generating these autonomous motion
plans 506 can include implementing the optimization planner 300 to
optimize over the one or more cost functions 304 that include a
plurality of gains 504. Thus, the autonomous motion plans 506
provide an insight into how the autonomous vehicle would react or
otherwise operate in the same situations or scenarios that were
encountered by the human driver during the previous
humanly-controlled driving sessions.
[0119] The automatic tuning computing system can also obtain one or
more corresponding humanly-executed motion plans 508. For example,
the one or more corresponding humanly-executed motion plans 508 can
be obtained directly from the humanly-controlled session logs 428
or can be derived from the humanly-controlled session logs 428.
[0120] According to another aspect of the present disclosure, the
systems and methods of the present disclosure can automatically
tune the cost function gains 504 by minimizing or otherwise
optimizing the objective function 422. In particular, the objective
function 422 can provide an objective value based at least in part
on a difference between a first total cost associated with the
humanly-executed motion plan 508 and a second total cost associated
with the autonomous motion plan 506. As such, evaluating the
objective function 422 can include inputting the humanly-executed
motion plan 508 into the one or more cost functions 304 of the
autonomous vehicle motion planning system 200 to determine the
first total cost associated with the humanly-executed motion plan
508 and inputting the autonomous motion plan 506 into the one or
more cost functions 304 of the autonomous vehicle motion planning
system 200 to determine the second total cost associated with the
autonomous motion plan 506. More particularly, in some
implementations, a training dataset can include a plurality of
pairs of motion plans, where each pair includes a humanly-executed
motion plan 508 and a corresponding autonomous motion plan 506. The
objective function 422 can be optimized over all of the plurality
of pairs of motion plans included in the training dataset.
[0121] In some implementations, the objective function 422 can be
crafted according to an approach known as Maximum Margin Planning.
In particular, the objective function 422 can be crafted to enable
an optimization approach that allows imitation learning in which
humanly-executed motion plan examples are used to inform the cost
function gains 504. In some implementations, the objective function
422 and associated optimization approach can operate according to a
number of assumptions. For example, in some implementations, it can
be assumed that the one or more cost functions 304 of the
autonomous vehicle motion planning system are linear (e.g., linear
in their features).
[0122] According to another aspect of the present disclosure, in
some implementations, the objective function 422 can encode or
otherwise include one or more constraints. For example, in some
implementations, the objective function can encode a first
constraint that the first total cost associated with the
humanly-executed motion plan 508 is less than the second total cost
associated with the autonomous motion plan 506. In effect, this
first constraint reflects an assumption that the humanly-executed
motion plan 508 is optimal. Therefore, any autonomous motion plan
506 generated by the autonomous vehicle motion planning system 200
will necessarily have a higher total cost. According to one example
notation, in some implementations, this first constraint can be
expressed according to the following equation, where P̂ refers to
the autonomous motion plan 506 and P_e refers to the
humanly-executed motion plan 508:

Σ_{x ∈ P̂} w^T F_x - Σ_{x ∈ P_e} w^T F_x ≥ 0
[0123] In some implementations, in addition or alternatively to the
first constraint described above, the objective function 422 can
encode a second constraint that the difference between the first
total cost and the second total cost is greater than or equal to a
margin.
[0124] In some implementations, the margin can be based on or equal
to a dis-similarity value provided by a loss function L(P_e, P̂).
The dis-similarity value can be descriptive
of a dis-similarity between the humanly-executed motion plan 508
and the autonomous motion plan 506. For example, a larger
dis-similarity value can indicate that the plans are more
dis-similar (i.e., less similar) while a smaller dis-similarity
value can indicate that the plans are less dis-similar (i.e., more
similar). In some implementations, the loss function can compare
the humanly-executed motion plan 508 to the autonomous motion plan
506 and output a real positive number as the dis-similarity
value.
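By way of illustration only, one simple choice of loss function that behaves as described, outputting a real non-negative dis-similarity value, is a mean pointwise distance between corresponding states. This is an illustrative choice, not the specific loss function of the disclosed system:

```python
import numpy as np

def trajectory_loss(human_plan, autonomous_plan):
    """Dis-similarity L(P_e, P-hat): a real, non-negative number that
    grows as the two plans diverge. Here: the mean Euclidean distance
    between corresponding (x, y) states, assuming both plans are
    sampled at the same time steps (an assumption for illustration).
    """
    human = np.asarray(human_plan, dtype=float)
    auto = np.asarray(autonomous_plan, dtype=float)
    # One distance per time step, averaged over the plan.
    return float(np.mean(np.linalg.norm(human - auto, axis=1)))
```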
[0125] In effect, this second constraint that the difference
between the first total cost and the second total cost be greater
than or equal to the margin reflects the assumption that, if the
plans are dis-similar, then the humanly-executed motion plan 508 is
expected to have a significantly lower cost than the corresponding
autonomous motion plan 506. Stated differently, the
humanly-executed motion plan 508 is expected to be significantly
better in terms of cost if the plans are significantly different.
By contrast, if the plans are quite similar, then their respective
costs are expected to be relatively close. Thus, a distinction can
be made between similar plans and dis-similar plans.
[0126] According to one example notation, in some implementations,
this second constraint can be expressed according to the following
equation.
$$\sum_{x \in \hat{P}} w^T F_x - \sum_{x \in P_e} w^T F_x \ge L(P_e, \hat{P})$$
[0127] However, in some instances, it may not be possible to
satisfy one or more of the constraints encoded in the objective
function 422. For example, if the margin (e.g., as provided by the
loss function) is made relatively strong, it may not be possible to
meet the constraints for every pair of plans included in the
training dataset.
[0128] As one example, according to one example notation, a
violation occurs when the following equation is satisfied.
$$\sum_{x \in P_e} w^T F_x - \left( \sum_{x \in \hat{P}} w^T F_x - L(P_e, \hat{P}) \right) \ge 0$$
[0129] To address this issue, a slack variable can be introduced
to account for the occasional violation. In particular, when one or
more of the constraints are violated, a slack variable penalty can
be applied; no penalty is applied if all constraints are
met.
[0130] As one example, according to one example notation, the slack
variable can be expressed as follows:
$$\xi = \begin{cases} \text{violation} & \text{if violation} > 0 \\ 0 & \text{otherwise} \end{cases}$$
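By way of illustration only, the slack variable computation above can be sketched as follows, taking the per-plan total costs and the margin as scalar inputs (names are illustrative):

```python
def slack(human_cost, autonomous_cost, margin):
    """xi = violation if violation > 0, else 0. A violation occurs when
    the autonomous plan's total cost fails to exceed the humanly-executed
    plan's total cost by at least the margin L(P_e, P-hat)."""
    violation = human_cost - (autonomous_cost - margin)
    return max(violation, 0.0)
```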
[0131] Taking the above constraints into account, one example
objective function 422 can be derived as follows:
$$\text{Objective:}\quad \operatorname*{argmin}_{w} \left( \lambda \|w\|^2 + \left( \sum_{x \in P_e} w^T F_x - \sum_{x \in \hat{P}} w^T F_x \right) + L(P_e, \hat{P}) \right)$$
[0132] As noted above, the objective function 422 can be minimized
or otherwise optimized to automatically tune the cost function
gains 504. That is, the gains 504 can be iteratively adjusted
(e.g., in the form of iterative gain updates 510) to optimize the
objective function 422. The ultimate values of the gains 504 that
optimize the objective function 422 can themselves be viewed as
optimal or otherwise "tuned".
[0133] In some implementations, the objective function 422 can be
convex, but non-differentiable. In some implementations, a
subgradient technique can be used to optimize the objective
function. In some implementations, the objective function 422 can
enable guaranteed convergence to an optimal value for a small
enough step size. In some implementations, optimization of the
objective function 422 can be similar to stochastic gradient
descent with the added concept of margins.
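By way of illustration only, a subgradient update of the kind described, iteratively adjusting the gains to reduce a hinge-style version of the objective (regularizer plus margin violation), can be sketched as follows. The hyperparameter values and the representation of each training pair as pre-summed feature vectors are assumptions for illustration:

```python
import numpy as np

def tune_gains(pairs, w0, lam=0.01, step=0.1, iters=100):
    """Subgradient descent on the objective
    lambda*||w||^2 + max(0, cost(P_e) - cost(P-hat) + L(P_e, P-hat)).

    pairs: list of (f_e, f_hat, loss_value), where f_e and f_hat are
    the per-plan feature sums (sum of F_x over the plan's states) for
    the humanly-executed and autonomous plans, respectively.
    """
    w = np.array(w0, dtype=float)
    for _ in range(iters):
        for f_e, f_hat, loss_val in pairs:
            g = 2.0 * lam * w  # subgradient of the regularizer
            # Margin violated: human plan not cheaper by at least loss_val.
            if f_e @ w - f_hat @ w + loss_val > 0.0:
                g = g + (f_e - f_hat)
            w = w - step * g  # iterative gain update
    return w
```

In keeping with the guaranteed-convergence property noted above, a small enough fixed or decaying step size would typically be used in practice.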
[0134] Referring again to FIG. 4, in some implementations, the
automatic tuning computing system 402 can identify and reject or
otherwise discard outlying pairs of motion plans. For example, the
automatic tuner 420 can include an outlier remover 425 that
identifies and rejects or otherwise discards outlying pairs of
motion plans.
[0135] In particular, in one example, if the dis-similarity value
(or some other measure of similarity) for a given pair of a
humanly-executed plan and its corresponding autonomous motion plan
exceeds a threshold value, the outlier remover 425 can identify such
pair of plans as an outlier and remove them from the training
dataset. As another example, if the difference between the total
costs respectively associated with a given pair of a humanly-executed
plan and its corresponding autonomous motion plan exceeds a threshold
value, then the outlier remover 425 can identify such pair of plans
as an outlier and remove them from the training dataset. One reason
for use of the outlier remover 425 is that, as described above,
different cost function(s) 304 can be used depending upon a
particular scenario that is selected by the motion planning system
200 (e.g., a changing lanes scenario versus a queueing scenario).
Thus, if the autonomous vehicle motion planning system 200 selected
a different scenario than was performed by the human driver, then
the automatic tuning system 402 will be unable to match such pair
of plans. As yet another example of outlier identification, if the
optimization planner fails to converge, the outlier remover 425 can
remove the corresponding data and humanly-executed plan from the
dataset.
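By way of illustration only, the outlier rejection behavior of an outlier remover such as outlier remover 425 can be sketched as follows. The thresholds, pair representation, and loss function are illustrative assumptions:

```python
def remove_outliers(training_pairs, loss_fn, max_loss=5.0, max_cost_gap=100.0):
    """Filter out plan pairs that are too dissimilar or whose total
    costs diverge too much -- e.g., because the motion planner selected
    a different scenario (changing lanes vs. queueing) than the human
    driver performed. Thresholds are illustrative only.

    Each pair: (human_plan, autonomous_plan, human_cost, autonomous_cost).
    """
    kept = []
    for human, auto, c_h, c_a in training_pairs:
        if loss_fn(human, auto) > max_loss:
            continue  # plans too dissimilar: likely different scenarios
        if abs(c_h - c_a) > max_cost_gap:
            continue  # implausibly large cost gap
        kept.append((human, auto, c_h, c_a))
    return kept
```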
Example Methods
[0136] FIG. 7 depicts a flowchart diagram of an example method 700
to automatically tune cost function gains according to example
embodiments of the present disclosure.
[0137] At 702, a computing system obtains data descriptive of a
humanly-executed motion plan that was executed during a previous
humanly-controlled vehicle driving session. For example, the data
descriptive of the humanly-executed motion plan can be obtained or
derived from a data log that includes data collected during the
previous humanly-controlled vehicle driving session. For example,
the data log can include state data for the humanly-controlled
vehicle.
[0138] In some implementations, obtaining the data descriptive of
the humanly-executed motion plan at 702 can include obtaining the
data log that includes the data collected during the previous
humanly-controlled vehicle driving session and fitting a trajectory
to the state data for the humanly-controlled vehicle to obtain the
humanly-executed motion plan.
[0139] At 704, an autonomous vehicle motion planning system
generates an autonomous motion plan based at least in part on the
data log that includes the data collected during the previous
humanly-controlled vehicle driving session. For example, generating
the autonomous motion plan can include evaluating one or more cost
functions that include a plurality of gains. In particular, the
autonomous vehicle motion planning system can optimize over the one
or more cost functions to generate the autonomous motion plan.
[0140] At 706, the computing system evaluates an objective function
that provides an objective value based at least in part on a
difference between a first total cost associated with the
humanly-executed motion plan and a second total cost associated
with the autonomous motion plan. In particular, evaluating the
objective function at 706 can include inputting the
humanly-executed motion plan into the one or more cost functions of
the autonomous vehicle motion planning system to determine the
first total cost associated with the humanly-executed motion plan;
and inputting the autonomous motion plan into the one or more cost
functions of the autonomous vehicle motion planning system to
determine the second total cost associated with the autonomous
motion plan.
[0141] In some implementations, the objective function can encode a
first constraint that the first total cost associated with the
humanly-executed motion plan is less than the second total cost
associated with the autonomous motion plan. In some
implementations, evaluating the objective function at 706 can
include applying a slack variable penalty when the first
constraint is violated.
[0142] In some implementations, the objective function can encode a
second constraint that the difference between the first total cost
and the second total cost is greater than or equal to a margin. In
some implementations, the margin is based at least in part on or
equal to a dis-similarity value that is descriptive of a
dis-similarity between the humanly-executed motion plan and the
autonomous motion plan. For example, the dis-similarity value can
be provided by a loss function. In some implementations, evaluating
the objective function at 706 can include applying a slack variable
penalty when the second constraint is violated.
[0143] At 708, the computing system determines at least one
adjustment to at least one of the plurality of gain values of the
one or more cost functions of the autonomous vehicle motion
planning system that reduces the objective value provided by the
objective function.
[0144] In some implementations, determining the at least one
adjustment to the at least one of the plurality of gain values at
708 can include iteratively optimizing the objective function. As
an example, iteratively optimizing the objective function can
include performing a subgradient technique to iteratively optimize
the objective function.
[0145] FIG. 8 depicts a flowchart diagram of an example method 800
to train an autonomous vehicle motion planning system to
approximate human driving behavior associated with a target
geographic area according to example embodiments of the present
disclosure.
[0146] At 802, a computing system collects humanly-controlled
driving session logs that are descriptive of appropriate driving
behavior in a target geographic area. At 804, the computing system
uses the collected session logs to automatically tune gains of one
or more cost functions used by an autonomous vehicle motion
planning system.
[0147] More particularly, as an example, an existing autonomous
vehicle motion planning system may have been tuned (e.g.,
automatically and/or manually) based on driving data or other
testing data associated with a first geographic area. Thus, based
on such tuning, the autonomous vehicle may be capable of
approximating good human driving performance in such first
geographic area.
[0148] However, the residents of different geographic areas have
different driving styles. In addition, different geographic areas
present different driving scenarios and challenges. Thus, an
autonomous vehicle specifically tuned for performance in a first
geographic area may exhibit decreased performance quality when
autonomously driving in a second geographic area that is different
than the first geographic area.
[0149] Thus, through performance of method 800, the gains of the
autonomous vehicle motion planning system can be automatically
tuned based on humanly-controlled driving session logs (and
corresponding humanly-executed motion plans) that were collected
during humanly-controlled driving sessions that were performed in a
target geographic area (e.g., the second geographic area).
[0150] To provide an example for the purpose of illustration, an
autonomous vehicle motion planning system tuned based on data and
testing in Pittsburgh, Pa., USA may approximate human driving
behavior that is appropriate in Pittsburgh. However, in some
instances, such vehicle may not approximate the human driving
behavior that is commonplace and appropriate in Manila,
Philippines. For example, human drivers in Manila may be less
averse to changing lanes, drive closer together,
accelerate/decelerate faster, etc. Thus, to automatically tune the
autonomous vehicle for autonomous driving in Manila, a human driver
can operate a vehicle in Manila to generate a humanly-controlled
session log that is indicative of appropriate human driving
behavior in Manila (that is, driving behavior that is "good"
driving from the perspective of a Manila resident or driver). The
cost function gains of the autonomous vehicle can be automatically
tuned based on such Manila session logs. After tuning, the
autonomous vehicle motion planning system can generate autonomous
motion paths that approximate appropriate human driving behavior in
Manila. In other implementations, it is not required that the human
driver actually be physically located in Manila, but instead that
the driver simply operate the vehicle in the style of the residents
of Manila to generate the Manila session logs.
[0151] According to another aspect, a plurality of sets of tuned
gains that respectively correspond to a plurality of different
locations can be stored in memory. A particular set of gains can be
selected based on the location of the autonomous vehicle and the
selected set of gains can be loaded into the autonomous vehicle
motion planning system for use, thereby enabling an autonomous
vehicle to change driving behavior based on its current
location.
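By way of illustration only, location-based selection among stored sets of tuned gains can be sketched as follows. The location names and gain values are hypothetical placeholders:

```python
# Hypothetical per-location gain sets, each tuned offline from
# humanly-controlled session logs collected in that location.
TUNED_GAINS = {
    "pittsburgh": [1.0, 2.5, 0.8],
    "manila": [0.6, 1.2, 1.5],
}

def gains_for_location(location, default="pittsburgh"):
    """Load the gain set matching the vehicle's current location,
    falling back to a default when no locally tuned set exists."""
    return TUNED_GAINS.get(location.lower(), TUNED_GAINS[default])
```

The same lookup pattern could select among gain sets keyed by driving behavior profile or vehicle type, as described below.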
[0152] FIG. 9 depicts a flowchart diagram of an example method 900
to train an autonomous vehicle motion planning system to
approximate human driving behavior associated with a target driving
style profile according to example embodiments of the present
disclosure.
[0153] At 902, a computing system collects humanly-controlled
driving session logs that are descriptive of appropriate driving
behavior of a human driving behavior profile. At 904, the computing
system uses the collected session logs to automatically tune gains
of one or more cost functions used by an autonomous vehicle motion
planning system.
[0154] More particularly, as an example, human drivers can be
requested to operate vehicles according to different human driving
behavior profiles (e.g., sporty versus cautious). A corpus of
humanly-controlled session logs can be collected for each driving
behavior profile. Thereafter, the cost function gains of an
autonomous vehicle motion planning system can be automatically
tuned to approximate one of the driving behavior profiles. For
example, the cost function gains of an autonomous vehicle motion
planning system can be automatically tuned based on session logs
that correspond to sporty human driving behavior. Thereafter, the
tuned autonomous vehicle motion planning system can generate
autonomous motion plans that fit the sporty driving behavior
profile.
[0155] In one example implementation of the above, a plurality of
different sets of gains that respectively correspond to the
different human driving behavior profiles can be respectively
automatically tuned and then stored in memory. A passenger of the
autonomous vehicle can select (e.g., through an interface of the
autonomous vehicle) which of the human driving behavior profiles
they would like the autonomous vehicle to approximate. In response,
the autonomous vehicle can load the particular gains associated
with the selected behavior profile and can generate autonomous
motion plans using such gains. Therefore, a human passenger can be
given the ability to select the style of driving that she
prefers.
[0156] FIG. 10 depicts a flowchart diagram of an example method
1000 to train an autonomous vehicle motion planning system to
approximate human driving behavior associated with a target vehicle
type according to example embodiments of the present
disclosure.
[0157] At 1002, a computing system collects humanly-controlled
driving session logs that are descriptive of appropriate driving
behavior for a particular vehicle type or model. At 1004, the
computing system uses the collected session logs to automatically
tune gains of one or more cost functions used by an autonomous
vehicle motion planning system.
[0158] More particularly, as an example, human drivers can be
requested to operate different vehicle types or models. A corpus of
humanly-controlled session logs can be collected for each vehicle
type or model. Thereafter, the cost function gains of an autonomous
vehicle motion planning system can be automatically tuned to
approximate human driving of one of the vehicle types or models. For
example, the cost function gains of an autonomous vehicle motion
planning system can be automatically tuned based on session logs
that correspond to human operation of a delivery truck.
[0159] To provide an example for the purpose of illustration, an
autonomous vehicle motion planning system tuned based on data and
testing performed by a sedan may approximate human driving behavior
that is appropriate for driving a sedan. However, in some
instances, such motion planning system may not provide autonomous
motion plans that are appropriate for a large truck. For example,
human drivers of large trucks might take wider turns, leave more
space between themselves and the nearest vehicle, apply braking earlier, etc.
Thus, to automatically tune the autonomous vehicle motion planning
system for use in a large truck, a human driver can operate a large
truck to generate a humanly-controlled session log that is
indicative of appropriate human driving behavior in a large truck.
The cost function gains of the autonomous vehicle can be
automatically tuned based on such large truck human driving session
logs. After tuning, the autonomous vehicle motion planning system
can generate autonomous motion paths that approximate appropriate
human driving behavior for large trucks, rather than sedans.
Additional Disclosure
[0160] The technology discussed herein makes reference to servers,
databases, software applications, and other computer-based systems,
as well as actions taken and information sent to and from such
systems. The inherent flexibility of computer-based systems allows
for a great variety of possible configurations, combinations, and
divisions of tasks and functionality between and among components.
For instance, processes discussed herein can be implemented using a
single device or component or multiple devices or components
working in combination. Databases and applications can be
implemented on a single system or distributed across multiple
systems. Distributed components can operate sequentially or in
parallel.
[0161] While the present subject matter has been described in
detail with respect to various specific example embodiments
thereof, each example is provided by way of explanation, not
limitation of the disclosure. Those skilled in the art, upon
attaining an understanding of the foregoing, can readily produce
alterations to, variations of, and equivalents to such embodiments.
Accordingly, the subject disclosure does not preclude inclusion of
such modifications, variations and/or additions to the present
subject matter as would be readily apparent to one of ordinary
skill in the art. For instance, features illustrated or described
as part of one embodiment can be used with another embodiment to
yield a still further embodiment. Thus, it is intended that the
present disclosure cover such alterations, variations, and
equivalents.
[0162] In particular, although FIGS. 7-10 respectively depict steps
performed in a particular order for purposes of illustration and
discussion, the methods of the present disclosure are not limited
to the particularly illustrated order or arrangement. The various
steps of the methods 700, 800, 900, and/or 1000 can be omitted,
rearranged, combined, and/or adapted in various ways without
deviating from the scope of the present disclosure.
* * * * *