U.S. patent application number 17/140422 was filed with the patent office on 2021-01-04 and published on 2021-04-29 for calibration of laser and vision sensors.
This patent application is currently assigned to SZ DJI TECHNOLOGY CO. LTD. The applicant listed for this patent is SZ DJI TECHNOLOGY CO. LTD. Invention is credited to Lu Ma, Kanzhi Wu.
Application Number | 20210124029 17/140422 |
Family ID | 1000005316500 |
Filed Date | 2021-01-04 |
United States Patent
Application |
20210124029 |
Kind Code |
A1 |
Wu; Kanzhi ; et al. |
April 29, 2021 |
CALIBRATION OF LASER AND VISION SENSORS
Abstract
Automatic calibration between laser and vision sensors carried
by a mobile platform, and associated systems and methods are
disclosed herein. A representative method includes evaluating
depth-based feature points obtained from the laser sensor with edge
information obtained from the vision sensor and generating
calibration rules based thereon.
Inventors: |
Wu; Kanzhi; (Shenzhen City,
CN) ; Ma; Lu; (Shenzhen City, CN) |
|
Applicant: |
Name | City | State | Country | Type |
SZ DJI TECHNOLOGY CO. LTD. | Shenzhen City | | CN | |
Assignee: |
SZ DJI TECHNOLOGY CO. LTD.
Shenzhen City
CN
|
Family ID: |
1000005316500 |
Appl. No.: |
17/140422 |
Filed: |
January 4, 2021 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
16556984 | Aug 30, 2019 | 10884110 |
17140422 | | |
15730617 | Oct 11, 2017 | 10436884 |
16556984 | | |
PCT/CN2017/082604 | Apr 28, 2017 | |
15730617 | | |
Current U.S.
Class: |
1/1 |
Current CPC
Class: |
G05D 1/024 20130101;
G05D 2201/0213 20130101; G06T 2207/10028 20130101; G06T 2207/10021
20130101; G01S 17/931 20200101; G01S 7/4808 20130101; G06T
2207/10152 20130101; G01S 7/4972 20130101; G06T 2207/30252
20130101; G05D 1/0202 20130101; G01S 17/86 20200101; G06T
2207/10024 20130101; G06T 2207/10032 20130101; G01S 17/93 20130101;
G06K 9/46 20130101; G06T 2207/30248 20130101; G05D 1/0248 20130101;
G06T 7/136 20170101; G06T 7/85 20170101; G05D 1/0246 20130101; G06T
7/13 20170101; G01S 17/89 20130101; G01S 7/497 20130101 |
International
Class: |
G01S 7/497 20060101
G01S007/497; G01S 17/93 20060101 G01S017/93; G06K 9/46 20060101
G06K009/46; G06T 7/136 20060101 G06T007/136; G06T 7/13 20060101
G06T007/13; G06T 7/80 20060101 G06T007/80; G05D 1/02 20060101
G05D001/02; G01S 7/48 20060101 G01S007/48; G01S 17/89 20060101
G01S017/89; G01S 17/86 20060101 G01S017/86 |
Claims
1-106. (canceled)
107. A computer-implemented method for generating a point cloud,
the method comprising: obtaining observation data generated by at
least one vision sensor within a time period; evaluating states
associated with a laser unit at different points in time within the
time period based at least on the observation data; determining one
or more transformation rules based at least on the states
associated with the laser unit for transforming between one or more
reference systems and a target reference system each of which being
associated with the laser unit, wherein the one or more reference
systems are associated with the laser unit at the different points
in time within the time period and the target reference system is
associated with the laser unit at a target point in time within the
time period; and generating the point cloud by transforming data
obtained by the laser unit to the target reference system based at
least on the one or more transformation rules, the data obtained by
the laser unit corresponding to the different points in time within
the time period.
108. The method of claim 107, wherein determining the one or more
transformation rules further comprises: computing transformation
matrices for the laser unit at the different points in time with
respect to the target point in time, wherein each transformation
matrix is computed using a corresponding state associated with the
laser unit at a corresponding point in time.
109. The method of claim 108, wherein transforming data obtained by
the laser unit based at least on the one or more transformation
rules to the target reference system further comprises:
transforming the data obtained by the laser unit at the
corresponding point in time to the target point in time using a
corresponding transformation matrix.
110. The method of claim 107, wherein the at least one vision
sensor and the laser unit are carried by a mobile platform.
111. The method of claim 107, wherein the at least one vision
sensor comprises at least one of a stereo camera or a monocular
camera.
112. The method of claim 107, wherein obtaining the observation
data comprises obtaining the observation data at different data
acquisition rates from at least two different vision sensors.
113. The method of claim 107, wherein the laser unit has a
different data acquisition rate than the at least one vision
sensor.
114. The method of claim 107, wherein the states associated with
the laser unit are evaluated based on states associated with the at
least one vision sensor, or wherein the states associated with the
laser unit include at least one of a position, a speed, or a
rotation of the laser unit.
115. The method of claim 107, further comprising selecting one or
more feature points from the point cloud based at least on one or
more depth differences between points within the point cloud.
116. The method of claim 115, wherein selecting the one or more
feature points from the point cloud is further based on a
relationship between the one or more depth differences and a
threshold discontinuity in depth measurement.
117. The method of claim 115, further comprising evaluating the
selected feature points, using edge information obtained from the
at least one vision sensor based at least on a target function, the
target function defined at least by positions of the selected
feature points when projected to a reference system associated with
the at least one vision sensor.
118. The method of claim 117, further comprising: generating at
least one calibration rule for calibration between the laser unit
and the at least one vision sensor based at least on evaluating the
selected feature points using the edge information; and causing the
calibration between the laser unit and the at least one vision
sensor using the at least one calibration rule.
119. The method of claim 107, further comprising: converting an
image obtained from the at least one vision sensor into a grayscale
image; and determining edge information based at least on a
difference between at least one pixel of the grayscale image and
one or more pixels within a threshold proximity of the at least one
pixel.
120. The method of claim 107, wherein the one or more
transformation rules are at least partially defined in accordance
with a position and an orientation of the at least one vision
sensor relative to a mobile platform.
121. The method of claim 107, wherein the method further comprises:
selecting one or more feature points from the point cloud; and
evaluating the selected feature points, using edge information
obtained from the at least one vision sensor.
122. The method of claim 121, wherein the method further comprises
generating at least one calibration rule for calibration between
the laser unit and the at least one vision sensor based at least on
evaluating the selected feature points using the edge
information.
123. A non-transitory computer-readable medium storing
computer-executable instructions that, when executed, cause one or
more processors associated with a mobile platform to perform
operations, the operations comprising: obtaining observation data
generated by at least one vision sensor within a time period;
evaluating states associated with a laser unit at different points
in time within the time period based at least on the observation
data; determining one or more transformation rules based at least
on the states associated with the laser unit for transforming
between one or more reference systems and a target reference system
each of which being associated with the laser unit, wherein the one
or more reference systems are associated with the laser unit at the
different points in time within the time period, and the target
reference system is associated with the laser unit at a target
point in time within the time period; and generating the point
cloud by transforming data obtained by the laser unit to the target
reference system based at least on the one or more transformation
rules, the data obtained by the laser unit corresponding to the
different points in time within the time period.
124. The computer-readable medium of claim 123, wherein the
operations further comprise: selecting one or more feature points
from the point cloud; and evaluating the selected feature points,
using edge information obtained from the at least one vision sensor
based at least on a target function, the target function defined at
least by positions of the selected feature points when projected to
a reference system associated with the at least one vision
sensor.
125. An apparatus including a programmed controller that at least
partially controls one or more motions of the apparatus, wherein
the programmed controller includes one or more processors to
perform operations, the operations comprising: obtaining
observation data generated by at least one vision sensor within a
time period; evaluating states associated with a laser unit at
different points in time within the time period based at least on
the observation data; determining one or more transformation rules
based at least on the states associated with the laser unit for
transforming between one or more reference systems and a target
reference system each of which being associated with the laser
unit, wherein the one or more reference systems are associated with
the laser unit at the different points in time within the time
period and the target reference system is associated with the laser
unit at a target point in time within the time period; and
generating the point cloud by transforming data obtained by the
laser unit to the target reference system based at least on the one
or more transformation rules, the data obtained by the laser unit
corresponding to the different points in time within the time
period.
126. The apparatus of claim 125, being coupled to a vehicle.
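As an illustration of claims 107 to 109, the following minimal Python sketch merges laser frames captured at different times into the laser frame at the target time. It assumes planar motion, so each state is a hypothetical (x, y, yaw) pose of the laser unit in a fixed world frame. The function names `to_target_frame` and `combine_frames` are illustrative and do not appear in the application.

```python
import math

def to_target_frame(point, state, target_state):
    """Map a 2-D point from the laser frame at `state` into the frame at `target_state`.

    Each state is an assumed (x, y, yaw) pose of the laser unit in a fixed world frame.
    """
    x, y, yaw = state
    tx, ty, tyaw = target_state
    # Laser frame at this point in time -> world frame.
    wx = x + math.cos(yaw) * point[0] - math.sin(yaw) * point[1]
    wy = y + math.sin(yaw) * point[0] + math.cos(yaw) * point[1]
    # World frame -> laser frame at the target time (inverse rigid transform).
    dx, dy = wx - tx, wy - ty
    return (math.cos(tyaw) * dx + math.sin(tyaw) * dy,
            -math.sin(tyaw) * dx + math.cos(tyaw) * dy)

def combine_frames(frames, states, target_index=0):
    """Merge per-time laser frames into one point cloud in the target frame."""
    target_state = states[target_index]
    cloud = []
    for points, state in zip(frames, states):
        cloud.extend(to_target_frame(p, state, target_state) for p in points)
    return cloud

# Two frames observe the same wall point while the platform moves 1 m forward.
frames = [[(5.0, 0.0)], [(4.0, 0.0)]]
states = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cloud = combine_frames(frames, states)
```

In this toy case both observations land on the same coordinates in the target frame, which is the property that lets sparse frames be accumulated into a denser point cloud.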
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application is a continuation of International
Patent Application No. PCT/CN17/82604, filed Apr. 28, 2017, which
is incorporated herein by reference.
TECHNICAL FIELD
[0002] The present technology is generally directed to calibration
between an emitter/detector sensor (e.g., a laser sensor) and an
optical detection sensor (e.g., a vision sensor such as a camera)
that are carried by a mobile platform.
BACKGROUND
[0003] The operations of mobile platforms are typically facilitated
by obtaining position information of objects in a surrounding
environment, using a combination of sensors. The information
obtained regarding the positions of objects can facilitate
detecting pedestrians and/or vehicles in the environment, thereby
allowing the mobile platforms to avoid obstacles during navigation.
Typical optical detection sensors, such as monocular cameras, can
detect an object based on computer vision and machine learning
algorithms, but cannot consistently provide three-dimensional
position information of the target. Emitter/detector sensors, such
as LiDAR sensors, typically transmit a pulsed signal (e.g., a laser
signal) outwards, detect the pulsed signal reflections, and measure
three-dimensional information (e.g., laser scanning points) in the
environment to facilitate mapping the environment. Typical
emitter/detector sensors can provide three-dimensional geometry
information of the environment, but object detection based thereon
is relatively difficult. Additionally, conventional
omni-directional laser sensors with 360-degree horizontal field of
view (FOV) can be expensive and non-customizable. Accordingly,
there remains a need for improved sensing techniques and devices
for mobile platforms.
SUMMARY
[0004] The following summary is provided for the convenience of the
reader and identifies several representative embodiments of the
disclosed technology.
[0005] In some embodiments, a computer-implemented method for
automatically calibrating at least an emitter/detector unit and an
optical detection unit, both carried by a common mobile platform,
includes combining one or more sets of point information obtained
from the emitter/detector unit to form a point cloud in a reference
system associated with the mobile platform; selecting a subset of
feature points from the point cloud; evaluating the subset of
feature points with edge information obtained from the optical
detection unit; and generating at least one calibration rule for
calibration between the emitter/detector unit and the optical
detection unit based at least in part on evaluating the feature
points with the edge information. In some embodiments, the method
further includes transforming the subset of feature points based at
least in part on a set of transformation rules, which is at least
partially defined in accordance with a position and orientation of
the optical detection unit relative to the mobile platform. In some
embodiments, the reference system associated with the mobile
platform comprises a coordinate system. In some embodiments, the
method further includes selecting the subset of feature points
based at least in part on one or more depth differences between
points within the point cloud based on a relationship between the
one or more depth differences and a threshold value. In some
embodiments, the method further includes converting an image
obtained from the optical detection unit into a grayscale image;
and determining the edge information based at least in part on a
difference between at least one pixel of the grayscale image and
one or more pixels within a threshold proximity of the at least one
pixel. In some embodiments, evaluating the feature points with the
edge information comprises projecting the feature points to
respective positions in an image obtained from the optical
detection unit. In some embodiments, evaluating the feature points
with the edge information further comprises evaluating a target
function defined at least in part by the projected positions of the
feature points, wherein generating at least one calibration rule
comprises optimizing the target function and wherein optimizing the
target function comprises optimizing the target function in
accordance with at least six degrees of freedom. In some
embodiments, the at least one calibration rule includes a rule for
transformation between a reference system associated with the
emitter/detector unit and the reference system associated with the
optical detection unit. In some embodiments, the method further
includes detecting a difference between (a) the generated at least
one calibration rule and (b) one or more previously generated
calibration rules. In some embodiments, the method further includes
causing calibration between the emitter/detector unit and the
optical detection unit in accordance with the at least one
calibration rule.
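The depth-difference selection described in this paragraph can be sketched as follows. This is a minimal Python illustration, assuming the points arrive ordered along a scan line as (x, y, depth) tuples. The function name, the threshold value, and the choice to keep the nearer point of each discontinuity are assumptions made for the sketch and are not specified in the source.

```python
def select_feature_points(scan, depth_threshold):
    """Return points whose depth jumps relative to a neighbour on the same scan line.

    `scan` is an ordered list of (x, y, depth) tuples; the ordering and the
    threshold value are assumptions made for this sketch.
    """
    features = []
    for prev, curr in zip(scan, scan[1:]):
        if abs(curr[2] - prev[2]) > depth_threshold:
            # Assumed policy: keep the nearer point, which lies on the
            # boundary of the foreground object.
            features.append(prev if prev[2] < curr[2] else curr)
    return features

# A foreground object (depth about 4) in front of a wall (depth about 10).
scan = [(0, 0, 10.0), (1, 0, 10.1), (2, 0, 4.0), (3, 0, 4.1), (4, 0, 10.2)]
features = select_feature_points(scan, depth_threshold=1.0)
```

Points selected this way tend to sit on object boundaries, which is why they can later be compared with image edges.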
[0006] In some embodiments, a non-transitory computer-readable
medium stores computer-executable instructions. The
computer-executable instructions, when executed, cause one or more
processors associated with a mobile platform to perform actions
including combining one or more sets of point information obtained
from an emitter/detector unit to form a point cloud in a reference
system associated with the mobile platform; selecting a subset of
feature points from the point cloud; evaluating the feature points
with edge information obtained from the optical detection unit; and
generating at least one calibration rule for calibration between
the emitter/detector unit and the optical detection unit based at
least in part on evaluating the feature points with the edge
information. In some embodiments, the actions further include
transforming the subset of feature points based at least in part on
a set of transformation rules, which are at least partially defined
in accordance with a position and orientation of the optical
detection unit relative to the mobile platform. In some
embodiments, the reference system associated with the mobile
platform comprises a coordinate system. In some embodiments, the
actions further include selecting the subset of feature points
based at least in part on one or more depth differences between
points within the point cloud based on a relationship between the
one or more depth differences and a threshold value. In some
embodiments, the actions further include converting an image
obtained from the optical detection unit into a grayscale image;
and determining the edge information based at least in part on a
difference between at least one pixel of the grayscale image and
one or more pixels within a threshold proximity of the at least one
pixel. In some embodiments, evaluating the feature points with the
edge information comprises projecting the feature points to
respective positions in an image obtained from the optical
detection unit. In some embodiments, evaluating the feature points
with the edge information further comprises evaluating a target
function defined at least in part by the projected positions of the
feature points, wherein generating at least one calibration rule
comprises optimizing the target function and wherein optimizing the
target function comprises optimizing the target function in
accordance with at least six degrees of freedom. In some
embodiments, the at least one calibration rule includes a rule for
transformation between a reference system associated with the
emitter/detector unit and the reference system associated with the
optical detection unit. In some embodiments, the actions further
include detecting a difference between (a) the generated at least
one calibration rule and (b) one or more previously generated
calibration rules. In some embodiments, the actions further include
causing calibration between the emitter/detector unit and the
optical detection unit in accordance with the at least one
calibration rule.
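The grayscale conversion and neighbor-difference edge step can be illustrated with a short Python sketch. It assumes an image stored as rows of (r, g, b) tuples, uses the common ITU-R BT.601 luminance weights for the conversion, and treats "threshold proximity" as a square window of radius 1. These choices and the function names are assumptions, since the source does not fix them.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to luminance using the ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

def edge_image(gray, radius=1):
    """Score each pixel by its largest absolute difference to neighbours within `radius`."""
    height, width = len(gray), len(gray[0])
    edges = [[0.0] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            for dv in range(-radius, radius + 1):
                for du in range(-radius, radius + 1):
                    nv, nu = v + dv, u + du
                    if 0 <= nv < height and 0 <= nu < width:
                        diff = abs(gray[v][u] - gray[nv][nu])
                        edges[v][u] = max(edges[v][u], diff)
    return edges

# A dark left half next to a bright right half yields a vertical edge.
rgb = [[(0, 0, 0), (0, 0, 0), (255, 255, 255), (255, 255, 255)] for _ in range(3)]
edges = edge_image(to_grayscale(rgb))
```

Only the two columns adjacent to the brightness step receive a non-zero score; the uniform regions on either side score zero.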
[0007] In some embodiments, a vehicle includes a programmed
controller that at least partially controls one or more motions of
the vehicle. The programmed controller includes one or more
processors configured to combine temporally sequenced sets of point
information obtained from a measurement unit to form a point cloud
in a reference system associated with the vehicle; transform a
subset of the point cloud into a plurality of feature points in a
reference system associated with an optical detection unit;
evaluate the feature points with edge information obtained from the
optical detection unit; and generate at least one calibration rule
for calibration between the measurement unit and the optical
detection unit based at least in part on evaluating the feature
points with the edge information. In some embodiments, transforming
a subset of the point cloud is based at least in part on a set of
transformation rules, which comprises a transformation matrix. In
some embodiments, selecting the subset of the point cloud comprises
selecting a portion of the subset of points based at least in part
on one set of the temporally sequenced sets of point information.
In some embodiments, the measurement unit comprises at least one
laser sensor that has a field of view (FOV) smaller than at least
one of 360 degrees, 180 degrees, 90 degrees, or 60 degrees. In some
embodiments, the optical detection unit includes a monocular
camera. In some embodiments, the one or more processors are further
configured to convert an image obtained from the optical detection
unit into a grayscale image and determine the edge information
based at least in part on a difference between at least one pixel
of the grayscale image and one or more pixels within a threshold
proximity of the at least one pixel. In some embodiments, evaluating
the feature points with the edge information comprises projecting
the feature points to respective positions in an image obtained
from the optical detection unit. In some embodiments, the vehicle
corresponds to at least one of an unmanned aerial vehicle (UAV), a
manned aircraft, an autonomous car, a self-balancing vehicle, or a
robot.
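Projecting feature points into the image and scoring them against edge information can be sketched as below. This is a simplified Python illustration: it assumes a pinhole camera with a hypothetical focal length and principal point, defines the target function as the sum of edge strengths at the projected pixels, and searches only a handful of candidate translations along one axis. The source describes optimization over six degrees of freedom; rotation is omitted here purely to keep the example short.

```python
def project(point, translation, focal, center):
    """Pinhole projection of a 3-D point after a candidate translation (rotation omitted)."""
    x = point[0] + translation[0]
    y = point[1] + translation[1]
    z = point[2] + translation[2]
    if z <= 0:
        return None  # The point is behind the camera.
    return (int(round(focal * x / z + center[0])),
            int(round(focal * y / z + center[1])))

def target_function(points, edges, translation, focal, center):
    """Sum the edge strength found at each projected feature point."""
    score = 0.0
    for p in points:
        pixel = project(p, translation, focal, center)
        if pixel is None:
            continue
        u, v = pixel
        if 0 <= v < len(edges) and 0 <= u < len(edges[0]):
            score += edges[v][u]
    return score

def best_translation(points, edges, candidates, focal, center):
    """Pick the candidate that maximises the target function (a coarse search)."""
    return max(candidates,
               key=lambda t: target_function(points, edges, t, focal, center))

# Toy 5 x 5 edge image with a vertical edge in column 3.
edges = [[1.0 if u == 3 else 0.0 for u in range(5)] for _ in range(5)]
points = [(0.0, -1.0, 1.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
candidates = [(dx, 0.0, 0.0) for dx in (-1.0, 0.0, 1.0)]
best = best_translation(points, edges, candidates, focal=1.0, center=(2, 2))
```

The candidate that moves all three feature points onto the edge column scores highest, which mirrors the idea that a good calibration aligns depth discontinuities with image edges.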
[0008] In some embodiments, a computer-implemented method for
generating a combined point cloud for a measurement unit carried by
a mobile platform includes obtaining observation data generated
from a plurality of observation sensors carried by the mobile
platform, wherein the observation data corresponds to a time
period; evaluating states associated with the measurement unit at
different points in time within the time period based at least in
part on the observation data; determining one or more
transformation rules for transforming between reference systems
associated with the measurement unit at different points in time
within the time period to a target reference system associated with
the measurement unit; transforming data obtained by the measurement
unit at different points in time within the time period based at
least in part on the one or more transformation rules; and
generating the combined point cloud using at least a portion of the
transformed data. In some embodiments, the measurement unit emits
and detects signals. In some embodiments, the plurality of
observation sensors comprises at least one of a stereo camera, an
inertial measurement unit, a wheel encoder, or a global positioning
system. In some embodiments, obtaining observation data comprises
obtaining observation data at different rates from at least two
different observation sensors. In some embodiments, the measurement
unit has a different data acquisition rate than at least one
observation sensor. In some embodiments, the states associated with
the measurement unit are based on states associated with at least
one observation sensor. In some embodiments, the states associated
with the measurement unit include at least one of a position,
speed, or rotation. In some embodiments, evaluating the states
associated with the measurement unit comprises evaluating a
probability model. In some embodiments, evaluating the states
associated with the measurement unit further comprises evaluating
the states based at least in part on Gaussian white noise. In some
embodiments, evaluating the states associated with the measurement
unit further comprises determining optimal values for the states
associated with the measurement unit. In some embodiments,
evaluating the states associated with the measurement unit is based
at least in part on a maximum-a-posteriori method. In some embodiments,
the time period includes a target point in time that corresponds to
the target reference system, wherein the target point in time
corresponds to an initial point of the time period. In some
embodiments, transforming data obtained by the measurement unit at
different points in time further comprises projecting at least a
portion of the data obtained by the measurement unit in accordance
with one or more transformation matrices.
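To make the maximum-a-posteriori idea concrete, the following Python sketch estimates a single scalar state. It assumes a Gaussian prior on the state and a direct measurement corrupted by Gaussian white noise, in which case the posterior mode is the precision-weighted average of the two. The scalar model, the variable names, and the numeric variances are assumptions for illustration; the source does not specify the probability model at this level of detail.

```python
def map_estimate(prior_mean, prior_var, measurement, measurement_var):
    """MAP estimate of a scalar state with a Gaussian prior and Gaussian measurement noise.

    Model (assumed): measurement = state + noise, noise ~ N(0, measurement_var).
    The posterior is Gaussian, so its mode equals its mean.
    """
    gain = prior_var / (prior_var + measurement_var)
    mean = prior_mean + gain * (measurement - prior_mean)
    var = (1.0 - gain) * prior_var
    return mean, var

# Position predicted from wheel odometry (assumed variance 4.0), corrected by
# a camera-based position fix (assumed variance 1.0).
mean, var = map_estimate(prior_mean=10.0, prior_var=4.0,
                         measurement=12.0, measurement_var=1.0)
```

The estimate moves most of the way toward the measurement because the measurement is assumed to be the more certain of the two sources, and the posterior variance is smaller than either input variance.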
[0009] In some embodiments, a non-transitory computer-readable
medium stores computer-executable instructions. The
computer-executable instructions, when executed, cause one or more
processors associated with a mobile platform to perform actions
including: obtaining observation data generated from a plurality of
observation sensors carried by the mobile platform, wherein the
observation data corresponds to a time period; evaluating states
associated with a measurement unit at different points in time
within the time period based at least in part on the observation
data; determining one or more transformation rules for transforming
between reference systems associated with the measurement unit at
different points in time within the time period to a target
reference system associated with the measurement unit; transforming
data obtained by the measurement unit at different points in time
within the time period based at least in part on the one or more
transformation rules; and generating the combined point cloud using
at least a portion of the transformed data. In some embodiments,
the measurement unit measures at least one object by emitting and
detecting one or more signals. In some embodiments, the plurality
of observation sensors comprises at least one of a stereo camera,
an inertial measurement unit, a wheel encoder, or a global
positioning system. In some embodiments, obtaining observation data
comprises obtaining observation data at different rates from at
least two different observation sensors. In some embodiments, the
measurement unit has a different data acquisition rate than at
least one observation sensor. In some embodiments, the states
associated with the measurement unit are based on states associated
with at least one observation sensor. In some embodiments, the
states associated with the measurement unit include at least one of
a position, speed, or rotation. In some embodiments, evaluating the
states associated with the measurement unit comprises evaluating a
probability model. In some embodiments, evaluating the states
associated with the measurement unit further comprises evaluating
the states based at least in part on Gaussian white noise. In some
embodiments, evaluating the states associated with the measurement
unit further comprises determining optimal values for the states
associated with the measurement unit. In some embodiments,
evaluating the states associated with the measurement unit is based
at least in part on a maximum-a-posteriori method. In some embodiments,
the time period includes a target point in time that corresponds to
the target reference system, wherein the target point in time
corresponds to an initial point of the time period. In some
embodiments, transforming data obtained by the measurement unit at
different points in time further comprises projecting at least a
portion of the data obtained by the measurement unit in accordance
with one or more transformation matrices.
[0010] In some embodiments, a vehicle includes a programmed
controller that at least partially controls one or more motions of
the vehicle. The programmed controller includes one or more
processors configured to obtain observation data generated from a
plurality of observation sensors carried by the vehicle, wherein
the observation data corresponds to a time period; evaluate states
associated with a measurement unit at different points in time
within the time period based at least in part on the observation
data; determine one or more transformation rules for transforming
between reference systems associated with the measurement unit at
different points in time within the time period to a target
reference system associated with the measurement unit; transform
data obtained by the measurement unit at different points in time
within the time period based at least in part on the one or more
transformation rules; and generate the combined point cloud using
at least a portion of the transformed data. In some embodiments,
the plurality of observation sensors exclude the measurement unit.
In some embodiments, the plurality of observation sensors comprises
at least one of a stereo camera, an inertial measurement unit, a
wheel encoder, or a global positioning system. In some embodiments,
obtaining observation data comprises obtaining observation data at
different rates from at least two different observation sensors. In
some embodiments, the measurement unit has a different data
acquisition rate than at least one observation sensor. In some
embodiments, the states associated with the measurement unit are
based on states associated with at least one observation sensor. In
some embodiments, the states associated with the measurement unit
include at least one of a position, speed, or rotation. In some
embodiments, evaluating the states associated with the measurement
unit comprises evaluating a probability model. In some embodiments,
evaluating the states associated with the measurement unit further
comprises evaluating the states based at least in part on Gaussian
white noise. In some embodiments, evaluating the states associated
with the measurement unit further comprises determining optimal
values for the states associated with the measurement unit. In some
embodiments, evaluating the states associated with the measurement unit
is based at least in part on a maximum-a-posteriori method. In some
embodiments, the time period includes a target point in time that
corresponds to the target reference system, wherein the target
point in time corresponds to an initial point of the time period.
In some embodiments, transforming data obtained by the measurement
unit at different points in time further comprises projecting at
least a portion of the data obtained by the measurement unit in
accordance with one or more transformation matrices.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIG. 1A illustrates a scanning pattern of a laser sensor
that can be utilized in accordance with some embodiments of the
presently disclosed technology.
[0012] FIG. 1B illustrates a frontal view of a three-dimensional
point cloud generated by a laser sensor, in accordance with some
embodiments of the presently disclosed technology.
[0013] FIG. 1C illustrates an angled view of a three-dimensional
point cloud generated by a laser sensor, in accordance with some
embodiments of the presently disclosed technology.
[0014] FIG. 2 illustrates a mobile platform with a laser sensor and
a vision sensor in accordance with some embodiments of the
presently disclosed technology.
[0015] FIG. 3 illustrates a sequence of frames of scanning point
data generated by a laser sensor that is carried by a mobile
platform, which moves during a period of time, in accordance with
some embodiments of the presently disclosed technology.
[0016] FIG. 4 illustrates a calibration process for calibration
between a laser unit (e.g., including one or more laser sensors)
and a vision unit (e.g., including one or more vision sensors) in
accordance with some embodiments of the presently disclosed
technology.
[0017] FIG. 5 illustrates one frame of laser scanning points
produced by a laser unit in accordance with some embodiments of the
presently disclosed technology.
[0018] FIG. 6 illustrates a combined point cloud generated in
accordance with some embodiments of the presently disclosed
technology.
[0019] FIG. 7A illustrates a grayscale image that is captured or
otherwise obtained from a vision unit, in accordance with some
embodiments of the presently disclosed technology.
[0020] FIG. 7B illustrates an edge image that can be determined
from the grayscale image of FIG. 7A, in accordance with some
embodiments of the presently disclosed technology.
[0021] FIG. 7C illustrates a position set (projections of feature
points 615 of FIG. 6) in the edge image of FIG. 7B, in accordance
with some embodiments of the presently disclosed technology.
[0022] FIG. 8 illustrates a mobile platform that carries multiple
sensors in addition to a laser unit (or laser sensor), in
accordance with some embodiments of the presently disclosed
technology.
[0023] FIG. 9 illustrates information that can be provided by the
multiple sensors of FIG. 8, in accordance with some embodiments of
the presently disclosed technology.
[0024] FIG. 10 illustrates data collection frequency differences of
the multiple sensors and the laser unit of FIG. 8, in accordance
with some embodiments of the presently disclosed technology.
[0025] FIG. 11 illustrates a process for combining time-sequenced
point information to form a point cloud in accordance with some
embodiments of the presently disclosed technology.
[0026] FIG. 12 illustrates examples of mobile platforms configured
in accordance with some embodiments of the presently disclosed
technology.
[0027] FIG. 13 is a block diagram illustrating a representative
architecture for a computer system or device that can be utilized
to implement various portions of the presently disclosed
technology.
DETAILED DESCRIPTION
1. Overview
[0028] To facilitate efficient and accurate object detection for
mobile platforms while overcoming the deficiencies associated with
omni-directional laser sensors, the presently disclosed technology
is directed to calibrating emitter/detector sensor(s) (e.g., laser
sensor(s) with a limited FOV) with optical detection sensor(s) to
provide position information (including distance information) of
objects in the environment surrounding the mobile platform. Laser
sensors with a limited FOV (e.g., small-angle laser sensors) can be
significantly cheaper than omni-directional laser sensors and as
used herein typically refer to laser sensors with a horizontal
field of view (FOV) smaller than 360 degrees, 180 degrees, 90
degrees, or 60 degrees.
[0029] Laser sensors with a limited FOV typically generate a more
limited number of laser scanning points (and a sparser distribution
of laser scanning points) than an omni-directional LiDAR. These
factors may make it difficult to develop a stable corresponding
relationship between the laser sensor and a camera. With respect to
this problem, the presently disclosed technology can use an
advanced visual inertial navigation technology in combination with
sensors carried by the mobile platform to stably generate and/or
update six-degrees-of-freedom transformation information (e.g.,
transformation matrix) for transforming between coordinate systems
associated with the laser sensor and the camera, based on certain
positioning information of the mobile platform body. Additionally,
the disclosed technology can detect external interferences (e.g.,
external vibration and/or other disturbances during the deployment
of the mobile platform) to the laser sensor and/or the camera based
on changes to the calibrated transformation information. The
disclosed technology can enable accurate calibration and
interference detection in real time, further contributing to the
reliability and safety of the mobile platform.
[0030] Several details describing structures and/or processes that
are well-known and often associated with mobile platforms (e.g.,
UAVs or other types of movable objects) and corresponding systems
and subsystems, but that may unnecessarily obscure some significant
aspects of the presently disclosed technology, are not set forth in
the following description for purposes of clarity. Moreover,
although the following disclosure sets forth several embodiments of
different aspects of the presently disclosed technology, several
other embodiments can have different configurations or different
components than those described herein. Accordingly, the presently
disclosed technology may have other embodiments with additional
elements and/or without several of the elements described below
with reference to FIGS. 1-13.
[0031] FIGS. 1-13 are provided to illustrate representative
embodiments of the presently disclosed technology. Unless provided
for otherwise, the drawings are not intended to limit the scope of
the claims in the present application.
[0032] Many embodiments of the technology described below may take
the form of computer- or controller-executable instructions,
including routines executed by a programmable computer or
controller. The programmable computer or controller may or may not
reside on a corresponding mobile platform. For example, the
programmable computer or controller can be an onboard computer of
the mobile platform, or a separate but dedicated computer
associated with the mobile platform, or part of a network or cloud
based computing service. Those skilled in the relevant art will
appreciate that the technology can be practiced on computer or
controller systems other than those shown and described below. The
technology can be embodied in a special-purpose computer or data
processor that is specifically programmed, configured or
constructed to perform one or more of the computer-executable
instructions described below. Accordingly, the terms "computer" and
"controller" as generally used herein refer to any data processor
and can include Internet appliances and handheld devices (including
palm-top computers, wearable computers, cellular or mobile phones,
multi-processor systems, processor-based or programmable consumer
electronics, network computers, mini computers and the like).
Information handled by these computers and controllers can be
presented at any suitable display medium, including an LCD (liquid
crystal display). Instructions for performing computer- or
controller-executable tasks can be stored in or on any suitable
computer-readable medium, including hardware, firmware or a
combination of hardware and firmware. Instructions can be contained
in any suitable memory device, including, for example, a flash
drive, USB (universal serial bus) device, and/or other suitable
medium.
2. Representative Embodiments
[0033] FIG. 1A illustrates a scanning pattern 102a of a laser
sensor that can be utilized in accordance with some embodiments of
the presently disclosed technology. As illustrated in FIG. 1A, the
FOV of an example laser sensor is no larger than 60 degrees in
either the horizontal or the vertical direction.
[0034] FIG. 1B illustrates a frontal view of a three-dimensional
point cloud generated by a laser sensor (e.g., the laser sensor
illustrated in FIG. 1A). Compared with a conventional
omni-directional laser sensor that can provide a dense and
uniformly-distributed, 360-degree three-dimensional point cloud
(e.g., a single frame may provide at least 200,000 scanning points
within 0.1 second), the illustrative laser sensor of FIG. 1B
generates sparser point clouds (e.g., a single frame may provide
only 2000 scanning points within 0.1 second), with non-uniform or
uneven point distribution (e.g., points are relatively concentrated
in the central region of the sensor's FOV and are relatively sparse
in the peripheral regions of the sensor's FOV).
[0035] FIG. 1C illustrates an angled view of a three-dimensional
point cloud generated by a laser sensor (e.g., the laser sensor
illustrated in FIG. 1A). As discussed earlier (and unlike the
uniform angular distribution of a laser beam generated by a typical
omni-directional LiDAR), the distribution of laser scanning points
generated by certain laser sensors can be non-uniform or uneven.
Illustratively, with reference to FIG. 1C, the points are
relatively sparse in a peripheral area 110, and are relatively
dense in a frontal area 120.
[0036] Conventional methods for calibration between an
omni-directional LiDAR and a monocular camera divide single frame
LiDAR observation data (e.g., laser scanning data obtained within
0.1 second) into individual laser beams, and detect
depth-discontinuous points (sometimes referred to herein as
"feature points") on individual laser beams. However, applying
these conventional methods to laser sensors with a limited FOV can
be difficult, due to the point cloud characteristics discussed
earlier with reference to FIGS. 1A to 1C (e.g., the non-uniform
distribution and/or limited number of points in point cloud
data).
[0037] The presently disclosed technology can use multiple sensors
carried by the mobile platform, and can apply an advanced data
fusion method to combine multiple frames of laser scanning data and
establish dense point cloud information. The presently disclosed
technology includes a new method for detecting feature points
within point clouds, which can account for point cloud distribution
characteristics of laser sensors with a limited FOV and planar
distribution characteristics in an environment. In combination with
methods for extracting edge information in an image, embodiments of
the disclosed technology evaluate a match or correlation between
the feature points and the edge information, for example, via an
exhaustion based method, and generate calibration rules for
calibrating, for example, between a laser sensor and a monocular
camera.
[0038] FIG. 2 illustrates a mobile platform 210 with a laser sensor
215 (e.g., a small-angle LiDAR sensor) and a vision sensor 225
(e.g., a monocular camera) in accordance with some embodiments of
the presently disclosed technology. The mobile platform, laser
sensor and the vision sensor can be associated with respective
coordinate systems. Hereinafter, F.sup.r, F.sup.l, and F.sup.c are
used to represent coordinate systems of the mobile platform 210,
the laser sensor 215, and the vision sensor 225, respectively. In
some embodiments, the initial value .sub.rT.sub.c of a
transformation matrix between coordinate systems of the vision
sensor 225 and the mobile platform 210, and the initial value
.sub.rT.sub.l of a transformation matrix between coordinate systems
of the laser sensor 215 and the mobile platform 210 can be known or
predetermined, for example, based on their relative position and
orientation. Based on these, an initial value .sub.cT.sub.l of a
transformation matrix between the coordinate systems of the vision
sensor 225 and the laser sensor 215 can be calculated.
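The calculation of the initial value .sub.cT.sub.l described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes 4x4 homogeneous matrices and assumes that .sub.rT.sub.c and .sub.rT.sub.l map points from the vision sensor frame and the laser sensor frame, respectively, into the mobile platform frame. The helper names and the mounting offsets are hypothetical.

```python
import numpy as np

def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def invert_rigid(T):
    """Invert a rigid transform using R^T instead of a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

def initial_c_T_l(r_T_c, r_T_l):
    """Assumed convention: r_T_x maps points from frame x into platform frame r,
    so the laser-to-camera transform is inverse(r_T_c) @ r_T_l."""
    return invert_rigid(r_T_c) @ r_T_l

# Hypothetical mounting offsets in meters (illustration only).
r_T_c = make_transform(np.eye(3), [0.10, 0.00, 0.20])
r_T_l = make_transform(np.eye(3), [0.30, 0.00, 0.20])
c_T_l = initial_c_T_l(r_T_c, r_T_l)
```

Under the opposite convention (matrices mapping platform coordinates into sensor coordinates), the product order would be reversed.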
[0039] FIG. 3 illustrates a sequence of frames of scanning point
data generated by a laser sensor 315 that is carried by a mobile
platform 310, which moves during a period of time. As illustrated
in FIG. 3, the laser sensor 315 (e.g., corresponding to the laser
sensor 215 illustrated in FIG. 2) carried by the mobile platform
310 (e.g., corresponding to the mobile platform 210 illustrated in
FIG. 2) generates multiple frames 320 of scanning point data during
a time period from t.sub.i to t.sub.i+k. For example, frame 320a is
generated at time t.sub.i with the mobile platform 310 (and the
laser sensor 315) situated in a first position/orientation, frame
320b is generated at a subsequent time t.sub.i+1 with the mobile
platform 310 (and the laser sensor 315) situated in a second
position/orientation, and frame 320c is generated at a subsequent
moment t.sub.i+2 with the mobile platform 310 (and the laser sensor
315) situated in a third position/orientation. As will be discussed
in detail below with reference to FIGS. 8-11, portions of the
presently disclosed technology can generate a combined point cloud
based on temporally sequenced sets of point data, such as the
sequence of frames 320. Also, the relative position
T.sub.t.sub.i.sup.r of the mobile platform 310 in a reference
coordinate system at any point in time t.sub.i within the time
period can be calculated based thereon.
[0040] FIG. 4 illustrates a calibration process for calibration
between a laser unit (e.g., including one or more laser sensors 215
illustrated in FIG. 2) and a vision unit (e.g., including one or
more vision sensors 225 illustrated in FIG. 2) in accordance with
some embodiments of the presently disclosed technology. The
calibration process of FIG. 4 can be implemented by a controller
(e.g., an onboard computer of a mobile platform, an associated
computing device, and/or an associated computing service).
[0041] In step 405, the process includes combining temporally
sequenced sets of point information obtained from the laser unit to
form a point cloud in a reference system. For example, FIG. 5
illustrates one frame of laser scanning points 510 produced by the
laser unit at a point in time (e.g., time t.sub.i as illustrated in
FIG. 3), in accordance with some embodiments of the presently
disclosed technology. Illustratively, individual scanning points
within a frame may not be generated simultaneously. For example, in
some embodiments, although laser sensor data (e.g., scanning
points) is collected continuously, frames of scanning points are
generated or transmitted in accordance with some discrete time
intervals. In other words, a frame may correspond to a set of laser
sensor data (e.g., scanning points) accumulated in a certain
duration of time (e.g., 0.1 second). With reference to FIG. 5,
illustratively a sparse set of laser scanning points 510 is
distributed in a three-dimensional coordinate system 520 in a
non-uniform manner. As discussed earlier, in some embodiments, the
sparse and non-uniform distribution of points 510 may not provide
enough data for the desired calibration between the laser unit and
the vision unit. FIG. 6 illustrates a combined point cloud
generated in accordance with some embodiments of the presently
disclosed technology. As illustrated in FIG. 6, a dense set of
laser scanning points 610 that combines multiple sets (e.g., 10
consecutive frames) of laser scanning points (e.g., similar to the
set of points 510 in FIG. 5) is distributed in a three-dimensional
coordinate system 620 in a relatively uniform manner to provide
comprehensive three-dimensional environmental information.
[0042] Embodiments of the combining process will be discussed in
further detail below with reference to FIGS. 8-11. To combine
multiple frames of point data in a manner that reduces noise and/or
errors, embodiments of the presently disclosed technology include
estimating a relative transformation matrix between successive
frames by using multiple types of sensors carried by the mobile
platform.
[0043] In some embodiments, step 405 determines relative positions
T.sub.t.sub.i.sup.r, T.sub.t.sub.i+1.sup.r, . . . ,
T.sub.t.sub.i+k.sup.r of the mobile platform body at respective
points in time with or without actually combining the multiple
frames of scanning points. In these embodiments, feature points can
be selected from each frame of point data and combined based on the
relative positions T.sub.t.sub.i.sup.r, T.sub.t.sub.i+1.sup.r, . .
. , T.sub.t.sub.i+k.sup.r. For example, given two relative
positions T.sub.t.sub.i.sup.r and T.sub.t.sub.i+1.sup.r, the
controller can calculate transformation matrix
t.sub.iT.sub.t.sub.i+1.sup.r for transforming between the mobile
platform coordinate systems at times t.sub.i and t.sub.i+1. Also,
using a suitable default or initial transformation between coordinate
systems of the laser unit and the mobile platform, the controller
can align feature points in frames of different times in a mobile
platform coordinate system at a particular time (e.g.,
t.sub.i).
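The alignment described above can be sketched as follows, under stated assumptions: each relative position is taken to be a 4x4 homogeneous matrix mapping platform-frame points at that time into the reference coordinate system, and r_T_l is taken to map laser-frame points into the platform frame. The function names are hypothetical.

```python
import numpy as np

def relative_transform(T_i, T_j):
    """Transform taking platform-frame points at time j into the platform
    frame at time i (assumes each T maps platform frame -> reference frame)."""
    return np.linalg.inv(T_i) @ T_j

def align_to_time_i(points_laser_j, r_T_l, T_i, T_j):
    """Express (N, 3) laser-frame points captured at time j in the platform
    coordinate system at time i."""
    pts = np.asarray(points_laser_j, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    laser_j_to_platform_i = relative_transform(T_i, T_j) @ r_T_l
    return (laser_j_to_platform_i @ homogeneous.T).T[:, :3]

# Illustration: the platform advances 1 m along x between times i and j.
T_i = np.eye(4)
T_j = np.eye(4)
T_j[:3, 3] = [1.0, 0.0, 0.0]
aligned = align_to_time_i([[0.0, 0.0, 2.0]], np.eye(4), T_i, T_j)
```

Applying the same function to the feature points of every frame in the time period yields a single set expressed in the coordinate system at time t.sub.i.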
[0044] In step 410, the calibration process includes selecting a
subset of feature points from the point cloud. Illustratively,
feature points can be identified in multiple frames of scanning
points. In addition to a depth difference between neighboring or
continuous points, the presently disclosed technology can account
for at least two aspects:
[0045] 1) as distance in depth increases, laser scanning points
become sparser, and thus the distance between two neighboring or
continuous points increases; and
[0046] 2) as laser scanning points approach the periphery of the
FOV (e.g., an angle between the laser beam line and the laser unit
orientation (e.g., laser unit main axis) becomes larger), distance
between two neighboring points increases.
[0047] Based on the above, the process can include calculating the
greater distance between two pairs of neighboring or continuous
points in individual frames according to the following formula:
d.sub.i=max(|p.sub.i-p.sub.i+1|, |p.sub.i-p.sub.i-1|)
wherein |p.sub.i-p.sub.i+1| denotes a distance between two points i
and i+1. Then, the controller determines two scaling
parameters:
$$\epsilon_d \propto z_i \quad \text{and} \quad \epsilon_\gamma \propto \arccos\left(\frac{p_i \cdot n}{|p_i|\,|n|}\right).$$
[0048] The first parameter .epsilon..sub.d is proportional to the
z-direction distance to a point (e.g., along the laser beam axis),
and the second parameter .epsilon..sub..gamma. is proportional to an
angle between a corresponding laser beam and the laser unit
orientation n. The controller can calculate a normalized
depth-discontinuous value
$$\bar{d}_i = \frac{d_i}{\epsilon_d \, \epsilon_\gamma},$$
which can be compared to a threshold to filter out those values
that are smaller than the threshold. In this manner, the controller
identifies feature points (that correspond to relatively large
normalized values {overscore (d)}.sub.i) from a frame of points. Illustratively,
black solid points 515 represent a subset of feature points
identified from scanning points 510 in the frame of FIG. 5. In some
embodiments, this selecting process can be applied to a combined
point cloud if it is generated in step 405. Illustratively, black
points 615 represent a subset of feature points identified from a
combined point cloud 610 of FIG. 6.
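The feature point selection of step 410 can be sketched as follows. This is an illustrative sketch rather than the disclosed implementation. It assumes that the scanning points of a frame are ordered so that consecutive rows are neighboring points, that both proportionality constants equal 1, and that the laser unit orientation is the z axis by default. The small floor value that guards against division by zero for points on the main axis is an addition not found in the description.

```python
import numpy as np

def select_feature_points(points, threshold, axis=(0.0, 0.0, 1.0), floor=1e-6):
    """Return indices and normalized depth-discontinuity values of feature points.

    points: (N, 3) ordered scanning points in the laser frame.
    axis:   laser unit orientation n (assumed to be the z axis by default).
    """
    pts = np.asarray(points, dtype=float)
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)

    # step[k] = |p_{k+1} - p_k|; end points lack one neighbor and are skipped.
    step = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    inner = pts[1:-1]
    d = np.maximum(step[1:], step[:-1])  # max(|p_i - p_{i+1}|, |p_i - p_{i-1}|)

    # eps_d grows with depth along the axis; eps_gamma with the off-axis angle.
    eps_d = np.maximum(inner @ n, floor)
    cos_angle = (inner @ n) / np.maximum(np.linalg.norm(inner, axis=1), floor)
    eps_gamma = np.maximum(np.arccos(np.clip(cos_angle, -1.0, 1.0)), floor)

    d_norm = d / (eps_d * eps_gamma)
    keep = d_norm >= threshold           # drop values smaller than the threshold
    return np.flatnonzero(keep) + 1, d_norm[keep]

# Synthetic scan line with a depth jump from z = 5 m to z = 8 m.
x = 1.0 + 0.1 * np.arange(10)
z = np.where(np.arange(10) < 5, 5.0, 8.0)
scan = np.column_stack([x, np.zeros(10), z])
feature_idx, feature_values = select_feature_points(scan, threshold=1.0)
```

In this synthetic scan only the two points on either side of the depth jump are selected; the normalization keeps evenly spaced points from being flagged merely because they are distant or far off axis.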
[0049] According to (1) the known transformation initial value
.sub.rT.sub.l for transforming between coordinate systems of the
mobile platform and the laser unit, and (2) relative positions
T.sub.t.sub.i.sup.r, T.sub.t.sub.i+1.sup.r, . . . ,
T.sub.t.sub.i+k.sup.r of the mobile platform body at respective
points in time (e.g., as determined in step 405 or calculated by an
associated attitude estimation unit), the controller can project
feature points identified from frames at different points in time
into an initial mobile platform coordinate system
F.sub.t.sub.i.sup.r that corresponds to time t.sub.i (i.e., the
beginning moment of a time period from t.sub.i to t.sub.i+k).
Depending on the orientation of the initial mobile platform
coordinate system F.sub.t.sub.i.sup.r, the projected feature points
from multiple frames can appear similar to the black points 615
illustrated in FIG. 6.
[0050] The controller can then determine a position of the vision
unit relative to the initial mobile platform coordinate system
F.sub.t.sub.i.sup.r based on (1) the relative positions
T.sub.t.sub.i.sup.r, T.sub.t.sub.i+1.sup.r, . . . ,
T.sub.t.sub.i+k.sup.r of the mobile platform body and (2) the
initial value .sub.rT.sub.c of transformation matrix between
coordinate systems of the vision unit and the mobile platform, and
project the feature points into coordinate systems of the vision
unit at different points in time.
[0051] In step 415, the calibration process includes deriving edge
information from one or more image(s) obtained from the vision
unit. Illustratively, the vision unit captures color images (which
can be converted to corresponding grayscale images) or grayscale
images at different times from t.sub.i to t.sub.i+k. For example,
FIG. 7A illustrates a grayscale image that is captured or otherwise
obtained from the vision unit, in accordance with some embodiments
of the presently disclosed technology.
[0052] For each grayscale image captured at a particular point in
time, the controller derives edge information. In some embodiments,
for each pixel of the image, the controller determines the maximum
difference between the grayscale values of the pixel and any of its
neighboring pixels (e.g., within a threshold proximity) in
accordance with the following formula:
$$e_{i,j} = \max_{g_{m,n} \in G} \left| g_{i,j} - g_{m,n} \right|$$
wherein G denotes a neighborhood area around g.sub.i,j. An edge
image E indicating all e.sub.i,j values can be generated to
describe edge information derived from a corresponding image. In
some embodiments, the controller may optionally smooth the image E
to help improve the matching between edge information and feature
points in the following step. FIG. 7B illustrates an edge image E
that can be determined from the grayscale image of FIG. 7A, in
accordance with some embodiments of the presently disclosed
technology. Representative edges 712 (in lighter tone) are
identified in FIG. 7B.
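The edge value described above can be sketched as follows. The sketch assumes a square neighborhood G of a given pixel radius and replicates border pixels at the image boundary; both choices are assumptions, as the description leaves the neighborhood shape and boundary handling open.

```python
import numpy as np

def edge_image(gray, radius=1):
    """Per-pixel maximum absolute grayscale difference to any neighbor within
    a (2 * radius + 1)-wide square neighborhood."""
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    padded = np.pad(gray, radius, mode="edge")  # replicate border pixels
    edges = np.zeros_like(gray)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            np.maximum(edges, np.abs(gray - shifted), out=edges)
    return edges

# A vertical step edge: the two columns next to the step get the value 10.
step_image = np.array([[0, 0, 10, 10]] * 4)
E = edge_image(step_image)
```

Each offset is processed as a whole-array operation, which is also the structure that lends itself to block-wise parallel processing.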
[0053] Those of skill in the relevant art may use other suitable
edge detection techniques to obtain edge information from the
vision unit. Additionally, the extraction of edge information can
be performed via associated GPU parallelism, so that the image can
be divided into blocks for parallel processing to quickly extract
the edge information.
[0054] In step 420, the calibration process includes generating
calibration rules based on evaluating a match between feature
points and edge information. Illustratively, based on (a) relative
positions r.sub.xT.sub.c, x=t.sub.i, . . . , t.sub.i+k of the
vision unit at different times and (b) corresponding internal
parameters, the controller can project feature points in the
feature point subset P.sup.f that is obtained in step 410 onto
individual edge images E.sub.i, . . . , E.sub.i+k obtained in step
415. The projection can produce a position set p.sup.f of
two-dimensional points (corresponding to the three-dimensional
feature points) in a respective edge image. For example, FIG. 7C
illustrates a position set 715 (a projection of feature points 615
of FIG. 6) in the edge image of FIG. 7B, in accordance with some
embodiments of the presently disclosed technology.
[0055] With respect to each point p.sub.j.sup.f .di-elect
cons.p.sup.f, where p.sub.j.sup.f=[u.sub.j.sup.f, v.sub.j.sup.f],
the controller can identify an edge value e.sub.u.sub.j.sup.f,
v.sub.j.sup.f of the pixel in the corresponding edge image E.sub.i.
Based on the normalized depth-discontinuous value {overscore (d)}.sub.j for each
feature point as calculated in step 410, the controller can
evaluate the following target function:
$$V = \sum_{i=1}^{k} \sum_{j=1}^{n} f\left(e_{i,j}, \bar{d}_j\right)$$
wherein i denotes an index of an image obtained by the vision unit,
k denotes the number of images in a time period (e.g., a
time-domain window W.sub.t of 10 or 20 seconds), j denotes an index
of a feature point, n denotes the number of points in the
feature point subset P.sup.f, e.sub.i,j denotes an edge value of a
pixel (corresponding to a projection of feature point j) in image
i, and {overscore (d)}.sub.j denotes the normalized depth-discontinuous value of
feature point j. In some embodiments, f(e.sub.i,j, {overscore (d)}.sub.j) can be
defined as e.sub.i,j{overscore (d)}.sub.j. In various embodiments, edge points in
an image correspond to depth-discontinuous points in a
corresponding three-dimensional space; therefore, a higher value of
V indicates a more accurate calibration between the laser unit and
the vision unit.
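The evaluation of the target function can be sketched as follows, taking f as the product of the edge value and the normalized depth-discontinuous value. The sketch assumes a pinhole camera model with a 3x3 intrinsic matrix K, feature points already expressed in the camera frame of each image, and nearest-pixel lookup. Points behind the camera or outside the image are skipped, which is an assumption not stated in the description.

```python
import numpy as np

def target_function(edge_images, points_cam_per_image, d_norm, K):
    """Sum of edge value times normalized depth-discontinuity value over all
    images i and feature points j."""
    d_norm = np.asarray(d_norm, dtype=float)
    K = np.asarray(K, dtype=float)
    total = 0.0
    for E, pts in zip(edge_images, points_cam_per_image):
        E = np.asarray(E, dtype=float)
        pts = np.asarray(pts, dtype=float)
        h, w = E.shape
        in_front = pts[:, 2] > 1e-9               # ignore points behind the camera
        uvw = (K @ pts[in_front].T).T
        u = np.rint(uvw[:, 0] / uvw[:, 2]).astype(int)
        v = np.rint(uvw[:, 1] / uvw[:, 2]).astype(int)
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        total += float(np.sum(E[v[inside], u[inside]] * d_norm[in_front][inside]))
    return total

# Toy example: unit focal length, principal point at pixel (1, 1).
K_toy = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
E_toy = np.zeros((3, 3))
E_toy[1, 1], E_toy[1, 2] = 5.0, 2.0
pts_toy = np.array([[0, 0, 1], [1, 0, 1], [0, 0, -1], [10, 0, 1]], dtype=float)
V_toy = target_function([E_toy], [pts_toy], [2.0, 3.0, 7.0, 7.0], K_toy)
```

In the toy example only the first two points land inside the image, so the result is 5 x 2 + 2 x 3 = 16.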
[0056] To generate calibration rules (e.g., transformation matrix
.sub.cT.sub.l for transforming between coordinate systems of the
vision unit and the laser unit), the controller can implement an
exhaustion based method. On the basis of a given initial value
.sub.cT.sub.l for the transformation matrix, the controller may
generate a set of m transformation matrices
$$\{{}_cT_l^{(1)}, {}_cT_l^{(2)}, \ldots, {}_cT_l^{(m)}\}$$
by introducing disturbances such that .sub.cT.sub.l.sup.(i)=.sub.cT.sub.l.DELTA.T.sub.i,
where .DELTA.T.sub.i can be a randomly generated disturbance factor
within a threshold. In some embodiments, the transformation matrix
has six degrees of freedom and can therefore generally be calculated
by adding randomized noise to a translation vector [t.sub.x,
t.sub.y, t.sub.z] and to Euler angles [.alpha., .beta., .gamma.],
respectively. In some embodiments, this approach uses an initial
value .sub.cT.sub.l that is not too far away (e.g. within a
threshold proximity) from the truth value .sub.cT.sub.l, that is,
the truth value is in a neighborhood of a parameter space where the
initial value is located.
[0057] For each transformation matrix in the set, the controller can
calculate a respective value V.sub.i of the target function. Among
all transformation matrices in the set, the controller can select the
transformation matrix corresponding to a maximum value V.sub.max to be
.sub.cT.sub.l. In some embodiments, the controller can calibrate
the laser unit with the vision unit based on the generated
calibration rules. For example, the controller may use the
determined transformation matrix .sub.cT.sub.l to correlate (a)
scanning point data generated by the laser unit with (b) image
data (such as pixels) generated by the vision unit.
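The exhaustion based search can be sketched as follows. This is a simplified illustration under several assumptions: disturbances are drawn uniformly within fixed bounds, the Euler angles are composed in z-y-x order, the initial value itself is kept as a candidate, and the scoring callable stands in for the target function V. The function names and default bounds are hypothetical.

```python
import numpy as np

def euler_to_rotation(alpha, beta, gamma):
    """Rotation Rz(gamma) @ Ry(beta) @ Rx(alpha); the axis order is an assumption."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def random_disturbance(rng, max_translation, max_angle):
    """Random six-degrees-of-freedom disturbance bounded by the given thresholds."""
    dT = np.eye(4)
    dT[:3, :3] = euler_to_rotation(*rng.uniform(-max_angle, max_angle, size=3))
    dT[:3, 3] = rng.uniform(-max_translation, max_translation, size=3)
    return dT

def search_calibration(c_T_l_init, score, m=200,
                       max_translation=0.05, max_angle=0.02, seed=0):
    """Score m disturbed candidates plus the initial value; return the best."""
    rng = np.random.default_rng(seed)
    candidates = [c_T_l_init @ random_disturbance(rng, max_translation, max_angle)
                  for _ in range(m)]
    candidates.append(c_T_l_init)
    values = [score(T) for T in candidates]
    best = int(np.argmax(values))
    return candidates[best], values[best]

# Stand-in score for illustration: closeness to a hypothetical true translation.
true_t = np.array([0.02, 0.0, 0.0])
toy_score = lambda T: -float(np.linalg.norm(T[:3, 3] - true_t))
best_T, best_V = search_calibration(np.eye(4), toy_score)
```

Because the candidates are sampled only near the initial value, the search can succeed only when the truth value lies within the sampled neighborhood, consistent with the requirement stated above.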
[0058] In some embodiments, noise in the observation data may cause
the target function value to appear smaller when evaluated with the
truth value .sub.cT.sub.l than with certain non-truth values. This
situation may be more apparent if the time-domain window is
relatively short (e.g., a time period limited to include only one
or two frames of image generated by the vision unit). To mitigate
this problem, the presently disclosed technology can include using
a longer time-domain window (e.g., a time period to include tens or
hundreds of frames of image generated by the vision unit) in order
to select an optimal transformation matrix .sub.cT.sub.l. A longer
time-domain window may enhance the robustness of the calibration
process and possibly avoid local maximum issues.
[0059] In step 425, the calibration process includes comparing
newly generated calibration rules against previously generated
calibration rules. Generally speaking, the laser unit and the
vision unit are both fixed to the mobile platform body during its
movement. Under usual circumstances, .sub.cT.sub.l may not change
substantially and/or abruptly, but may change slightly due to
vibrations. .sub.cT.sub.l may change substantially and/or abruptly
when the mobile platform and/or the units receive some significant
external impact.
[0060] The controller can compare a newly determined transformation
matrix .sub.cT.sub.l against those determined in an initial round
of calibration, a most recent round of calibration, an average or
weighted average of several recent rounds, or the like. In some
embodiments, the calibration process uses a sliding time-domain
window method to detect, within the sliding time-domain window,
whether a currently determined optimal .sub.c{tilde over (T)}.sub.l
is evidently different (e.g., with respect to a threshold) from the
truth value(s) estimated previously.
[0061] In step 430, the calibration process includes determining
whether the difference that results from the comparison in step 425
exceeds a threshold. If not, the process proceeds to step 405 for a
new round of calibration. If the difference exceeds the threshold,
the process proceeds to step 435.
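The comparison of steps 425 and 430 can be sketched as follows. The description does not specify how the difference between two transformation matrices is measured, so the metric here (translation distance and rotation angle of the relative transform) and the threshold values are assumptions chosen for illustration only.

```python
import numpy as np

def transform_difference(T_new, T_ref):
    """Translation distance and rotation angle (radians) between two 4x4
    rigid transforms, measured on the relative transform inv(T_ref) @ T_new."""
    delta = np.linalg.inv(T_ref) @ T_new
    translation = float(np.linalg.norm(delta[:3, 3]))
    cos_angle = (np.trace(delta[:3, :3]) - 1.0) / 2.0
    angle = float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return translation, angle

def exceeds_threshold(T_new, T_ref, max_translation=0.05, max_angle=np.deg2rad(2.0)):
    """True if either component of the difference is above its threshold."""
    translation, angle = transform_difference(T_new, T_ref)
    return bool(translation > max_translation or angle > max_angle)

# Illustration: a 1 cm shift passes, a 5-degree rotation about z does not.
T_ref = np.eye(4)
T_shift = np.eye(4)
T_shift[:3, 3] = [0.01, 0.0, 0.0]
a = np.deg2rad(5.0)
T_rot = np.eye(4)
T_rot[:2, :2] = [[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]]
```

A result of False corresponds to returning to step 405 for a new round of calibration, and a result of True corresponds to proceeding to step 435.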
[0062] In step 435, the calibration process includes taking one or
more further actions. The difference exceeding the threshold may
indicate that the laser unit and the vision unit cannot be reliably
calibrated with each other. For example, the physical position or
orientation of at least one of the two units may have deviated
substantially from a preset configuration. In this case, the
controller may issue a warning to an operator of the mobile
platform. Alternatively, the controller may suspend the navigation
or other functions of the mobile platform in a safe manner.
[0063] As discussed earlier, in the use of certain laser units or
sensors, the number and/or distribution of laser scanning points in
a single frame may not provide a sufficiently dense point cloud to
facilitate calibration, mapping, object detection, and/or
positioning. This problem may be particularly apparent in the use
of low-cost small-angle LiDAR sensors. For example, for a typical
low-cost small-angle LiDAR, the number of laser points in a single
frame is usually limited to be fewer than 4000 or even 2000,
whereas a more expensive omni-directional LiDAR may produce 288000
laser scanning points in a single frame. To combine multiple frames
of point data in a manner that reduces noise and error, the
presently disclosed technology includes estimating a relative
transformation matrix between successive frames by using multiple
types of sensors carried by a mobile platform.
[0064] FIG. 8 illustrates a mobile platform 820 that carries
multiple sensors in addition to a laser unit (or sensor), in
accordance with some embodiments of the presently disclosed
technology. As illustrated, the mobile platform 820 may carry a
stereo camera 804, an inertial measurement unit 806, a wheel
encoder 810, and/or a global positioning system (GPS) 802, in
addition to a laser unit 808. Those of skill in the relevant art
will appreciate that fewer, more, or alternative sensors may be
used by the presently disclosed technology. For example, instead of
using the stereo camera 804, a set, array, or system of multiple
cameras can be used.
[0065] FIG. 9 illustrates information that can be provided by the
multiple sensors of FIG. 8, in accordance with some embodiments of
the presently disclosed technology. The stereo camera 804 can
provide three-dimensional coordinates of environmental features 902
(e.g., one or more distinctive points in the three-dimensional space
of the surrounding environment), which may establish a constraint
relationship between successive frames (e.g., corresponding to
observations from two different positions 920a and 920b).
Illustratively, the sampling frequency or data acquisition rate of
the stereo camera 804 is between 20 Hz and 40 Hz. The inertial
measurement unit 806 can provide high-frequency acceleration
information and angular velocity information. Illustratively, the
sampling frequency or data acquisition rate of the inertial
measurement unit is 200 Hz or higher. Via integration, a
transformation matrix of the mobile platform 820 between two
successive frames can be calculated. The wheel encoder 810 can
provide the rotation speed of the powered wheels (e.g., rear
wheels) and steering information of the front wheels, and can
provide, according to a known wheel size, constraints on forward
speeds and deflection angles between successive frames.
Illustratively, the sampling frequency or data acquisition rate of
the wheel encoder is about 20 Hz. Depending on outdoor signal
conditions, the GPS 802 can provide the position of the mobile
platform 820 and attitude information thereof in a global coordinate system.
Illustratively, the sampling frequency or data acquisition rate of
the GPS is below 5 Hz. Illustratively, the laser unit 808 (e.g.,
including one or more LiDAR sensors) has a sampling frequency or
data acquisition rate of 10 Hz.
[0066] The table below summarizes typical data acquisition
frequency information of the representative sensors illustrated in
FIGS. 8 and 9:
TABLE-US-00001
Sensor                       Frequency
Laser                        10 Hz
Stereo camera                20 Hz to 40 Hz
Inertial measurement unit    >200 Hz
Wheel encoder                approximately 20 Hz
Global positioning system    5 Hz
[0067] FIG. 10 illustrates data collection frequency differences of
the multiple sensors and the laser unit of FIG. 8, in accordance
with some embodiments of the presently disclosed technology.
[0068] FIG. 11 illustrates a process for combining time sequenced
point information generated by a laser unit to form a point cloud
in accordance with some embodiments of the presently disclosed
technology. The process can be implemented by a controller (e.g.,
an onboard computer of a mobile platform, an associated computing
device, and/or an associated computing service). As part of the
presently disclosed technology, generating a combined point cloud
can include estimating relative states associated with the laser
unit over a period of time, instead of estimating all subsequent
states with respect to a global coordinate system. Illustratively,
embodiments of the presently disclosed technology estimate relative
position information of the laser unit with respect to two or more
different frames that it generates in the period of time, thereby
enabling accurate accumulation of laser point data from different
frames in this period of time. This approach can facilitate or
enhance subsequent calibration, object detection, mapping, and/or
positioning operations.
[0069] Step 1105 of the process includes obtaining observation
data, corresponding to a period of time, from multiple observation
sensors (e.g., the multiple sensors as illustrated in FIG. 8). In
some embodiments, methods in accordance with the presently
disclosed technology may assume, as an approximation, that observation
data from different sensors is synchronized. For example, in a
representative case, the data acquisition frequency of the target
laser unit is 10 Hz, the frequency of the stereo camera is 40 Hz,
the frequency of the wheel encoder is 20 Hz, the frequency of the
inertial measurement unit is 200 Hz, and the frequency of the GPS
is 5 Hz. As an approximation, observation data from different
sensors can be considered as accurately aligned according to
different frequency multiples. Accordingly, using a 1-second time
window as an example, the controller can obtain 200 accelerometer
and gyroscope readings (from the inertial measurement unit), 40
frames of stereo camera observation, 20 groups of speed and
deflection angle observations (from the wheel encoder), and 5
pieces of GPS positioning information. Based on these, embodiments
of the presently disclosed technology can estimate relative
positions between 10 laser unit data acquisition events or
positions thereof with respect to a particular local coordinate
system (such as a local coordinate system corresponding to the
first of the 10 data acquisition events).
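As an illustrative sketch only, the per-window observation counts in the representative case above follow directly from the stated frequencies under the synchronization approximation. The names RATES_HZ, observations_in_window, and samples_per_laser_frame are hypothetical and are not part of the disclosed embodiments.

```python
# Representative acquisition rates (Hz) taken from the example above.
RATES_HZ = {
    "laser": 10,
    "stereo_camera": 40,
    "wheel_encoder": 20,
    "imu": 200,
    "gps": 5,
}


def observations_in_window(rates_hz, window_s):
    """Samples delivered by each sensor in a window of window_s seconds,
    assuming perfectly periodic, synchronized sampling."""
    return {name: int(round(rate * window_s)) for name, rate in rates_hz.items()}


def samples_per_laser_frame(rates_hz):
    """Average number of samples of each other sensor per laser frame."""
    laser_rate = rates_hz["laser"]
    return {
        name: rate / laser_rate
        for name, rate in rates_hz.items()
        if name != "laser"
    }


counts = observations_in_window(RATES_HZ, 1.0)
ratios = samples_per_laser_frame(RATES_HZ)
print(counts)
print(ratios)
```

Under these assumed rates, a ratio below one (as for the GPS) indicates that some laser frames have no observation of that type within their own interval.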
[0070] In some embodiments, the presently disclosed technology
includes a further approximation that the position of the laser
unit coincides with that of the stereo camera, thereby further
simplifying the problem to be solved. As discussed with reference
to FIG. 9, the observation data from the different sensors can be
described mathematically as follows:
[0071] 1) According to the observation data from the stereo camera,
illustratively, three-dimensional coordinates and/or descriptor(s)
of one or more environmental features (e.g., feature 902) can be
extracted from frames produced by the camera at positions 920a and
920b, respectively. These coordinates and/or descriptor(s) can be
matched with respect to the feature 902. In an objective function
for optimization, this type of observation can be embodied by an
error term relating to the re-projection of feature(s) onto the
camera coordinate systems at different positions. For example, the
cost term based on an environmental feature and two consecutive
frames of stereo camera observation includes 3 parts: (a) a
re-projection error between the left camera and right camera at a
frame corresponding to position 920a; (b) a re-projection error
between the left camera and right camera at a frame corresponding
to position 920b; and (c) a re-projection error between the left
(or right) camera at the two positions 920a and 920b.
[0072] 2) According
to the observation data from the inertial measurement unit with
known timestamps and initial values, a constraint relationship of a
rotation matrix, a translation vector, and a speed between two
consecutive camera frames can be calculated, for example, by using
suitable integration techniques known to those of skill in the
relevant art. This type of observation can be embodied by an error
term between the post-integration state and a real state in the
objective function. Illustratively, the variables to be estimated
at each frame, e.g., camera frames corresponding to positions 920a
and 920b, include the camera's orientation (e.g., an element of
the special orthogonal group SO(3)), and position and velocity
(e.g., elements of $\mathbb{R}^3$). Integration using observations
captured from the inertial measurement unit provides the
constraints between the variables explained above. In some
embodiments, while a state is optimized iteratively, a suitable
pre-integration technique is adopted to improve computational
efficiency.
[0073] 3) A motion model including the speed and
deflection angle of the mobile platform can be derived based on
observation data from the wheel encoder. Similarly, via
integration, a state constraint between consecutive camera frames
can be obtained, and the expression of this type of observation can
be similar to that of the inertial measurement unit. In some
embodiments, in contrast to the situation of the inertial
measurement unit, only a sub-space of the state is constrained
(e.g., the position and the yaw angle of the mobile platform) based
on the wheel encoder observations. Due to possible noise of the
wheel encoder, the covariance of this error term can be set to be
relatively large in some embodiments.
[0074] 4) The observation
data from the GPS can directly provide a constraint on a state of
the mobile platform at a particular time. In the objective
function, this type of observation can be expressed as an error
between an estimated state provided by the GPS and a real state
value. Due to the low data acquisition frequency of the GPS in some
embodiments, the GPS observation may only be used when its noise
level is lower than a certain threshold and/or its accuracy is
guaranteed to be within a certain range.
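As an illustrative sketch only, the wheel-encoder motion model of item 3) above can be integrated between two camera frames as planar dead reckoning. The sketch assumes that each wheel-encoder sample provides a speed and a heading change accumulated over one sampling interval; the function name integrate_wheel_odometry and this sample layout are hypothetical and are not taken from the disclosure. Consistent with item 3), only the planar position and the yaw angle are constrained.

```python
import math


def integrate_wheel_odometry(samples, dt, x=0.0, y=0.0, yaw=0.0):
    """Dead-reckon a planar pose from (speed, heading_change) samples.

    Assumption: each sample holds a speed in m/s and the heading change in
    radians accumulated over one sampling interval of dt seconds.
    """
    for speed, dyaw in samples:
        # Advance along the heading at the midpoint of the interval.
        mid_yaw = yaw + 0.5 * dyaw
        x += speed * dt * math.cos(mid_yaw)
        y += speed * dt * math.sin(mid_yaw)
        yaw += dyaw
    return x, y, yaw


# One second of 20 Hz samples while driving straight at 1 m/s.
straight = integrate_wheel_odometry([(1.0, 0.0)] * 20, dt=0.05)

# Ten samples that together turn the platform by a quarter turn.
turn = integrate_wheel_odometry([(1.0, math.pi / 20)] * 10, dt=0.05)
print(straight, turn)
```

The integrated pose change between two camera frames can then serve as the measurement in the corresponding error term, with a relatively large covariance to reflect encoder noise.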
[0075] In embodiments for which the position of the laser unit
approximately coincides with that of the stereo camera, a
controller (e.g., an onboard computer of the mobile platform, an
associated computing device, and/or an associated computing
service) obtains observation data that can be provided by the
sensors for a period of time from time 1 until time $k$. The
observation data can be expressed as follows:
$$Z_k = \{C_{1:k},\ I_{1:k-1},\ W_{1:p},\ G_{1:q}\}$$
where
[0076] 1) the first element $C_{1:k}$ denotes observation
information obtained by the stereo camera, and may be defined as
follows:
$$C_i = \{z_{i,1},\ z_{i,2},\ \ldots,\ z_{i,l}\}$$
where $z_{i,j}$ denotes an observation of the $j^{\text{th}}$
feature in the $i^{\text{th}}$ frame by the stereo camera;
[0077] 2) the second element $I_{1:k-1}$ denotes a set of data
acquired by the inertial measurement unit until the $k^{\text{th}}$
point in time, where
$I_i = \{I_i,\ I_{i+1},\ I_{i+2},\ \ldots,\ I_{i+m}\}$ denotes a
set of all observations by the inertial measurement unit between
the $i^{\text{th}}$ frame produced by the camera and the
$(i+1)^{\text{th}}$ frame produced by the camera (e.g., a total of
20 readings from the inertial measurement unit between 2 successive
camera observations);
[0078] 3) the third element $W_{1:p}$ denotes the observation by
the wheel encoder, which may be expressed as follows:
$$W_{i,j} = [v_{i,j}^{W},\ q_{i,j}^{W}]$$
where $v_{i,j}^{W}$ denotes speed information obtained by the
wheel encoder at the $i^{\text{th}}$ point in time and the
$j^{\text{th}}$ point in time, and $q_{i,j}^{W}$ denotes a rotation
transformation (e.g., a quaternion expression), which can be
derived or otherwise obtained by a deflection angle calculation,
between the $i^{\text{th}}$ point in time and the $j^{\text{th}}$
point in time; and
[0079] 4) the last element $G_{1:q}$ expresses the observation
obtained by the GPS:
$$G_i = [p_i^{G},\ q_i^{G}]$$
where $p_i^{G}$ denotes a global position at the $i^{\text{th}}$
point in time, and $q_i^{G}$ denotes rotation with respect to a
global coordinate system.
[0080] Step 1110 of the process includes evaluating states
associated with the laser unit at different points in time within
the time period based on the observation data. Using a factor
graph, the controller may establish a relationship between an a
priori probability and an a posteriori probability associated with
states
$$X_k = \{x_k\}_{k=1,\ldots,n}$$
of the laser unit (coincident with the stereo camera):
$$p(X_k \mid Z_k) \propto p(X_0)\, p(Z_k \mid X_k) = p(X_0) \prod_{i=1}^{k-1} p(I_i \mid x_i, x_{i+1}) \prod_{i,j \in \mathcal{K}} p(W_{i,j} \mid x_i, x_j) \prod_{i \in \mathcal{M}} p(G_i \mid x_i) \prod_{i \in \mathcal{K}} \prod_{l \in C_i} p(z_{i,l} \mid x_i)$$
where $\mathcal{K} = \{1, 2, \ldots, k\}$ denotes the set of
observation indices of the camera, $\mathcal{M}$ denotes the set of
observation indices of the GPS, and a state of the laser unit can
be expressed as:
$$x_k = [p_k,\ v_k,\ q_k]$$
where $p_k$, $v_k$, and $q_k$ respectively denote a position, a
speed, and a quaternion (rotation) of the laser unit with respect
to a particular coordinate system at the $k^{\text{th}}$ point in
time. In the above formula, each $p(\cdot)$ is called a factor of
the factor graph.
[0081] In some embodiments, using a mathematical derivation based
on an assumption of zero-mean Gaussian white noise, the controller
may compute a maximum-a-posteriori (MAP) estimate for the above
factor-graph-based formula by solving for a minimum of the
following formula:
$$X_k^{*} = \arg\min_{X_k}\; -\log p(X_k \mid Z_k) = \arg\min_{X_k}\; \|r_0\|_{\Sigma_0}^{2} + \sum_{i=1}^{k-1} \|r_{I_i}\|_{\Sigma_{I_i}}^{2} + \sum_{i,j \in \mathcal{K}} \|r_{W_{i,j}}\|_{\Sigma_{W_{i,j}}}^{2} + \sum_{i \in \mathcal{M}} \|r_{G_i}\|_{\Sigma_{G_i}}^{2} + \sum_{i \in \mathcal{K}} \sum_{l \in C_i} \|r_{C_{i,l}}\|_{\Sigma_{C_{i,l}}}^{2}$$
where $\mathcal{K}$ and $\mathcal{M}$ are the sets of observation
indices of the camera and the GPS, respectively, $r_*$ represents
the different residual types, and $\Sigma_*$ denotes the covariance
matrices corresponding to the different types of residuals, which
describe the uncertainty of the observations. Each term is a
squared Mahalanobis norm, $\|r\|_{\Sigma}^{2} = r^{\top}\Sigma^{-1}r$.
In this regard, those of skill in
the relevant art can determine residual models for different
sensors and determine Jacobian matrices between optimization
iterations. The controller can calculate optimal values for the
laser unit states based on the minimization, for example, based on
a gradient-based optimization method.
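As an illustrative sketch only, the minimization above can be demonstrated on a toy one-dimensional problem in which each state is a scalar position, the factors are a prior, two odometry constraints, and one GPS fix, and each covariance reduces to a scalar variance. The standard deviations, the measurements, and the names residuals and solve_map are hypothetical; plain gradient descent stands in for whichever gradient-based method an embodiment uses.

```python
SIGMA_PRIOR = 0.1  # assumed standard deviations (hypothetical values)
SIGMA_ODOM = 0.1
SIGMA_GPS = 0.1


def residuals(x, odom, gps, prior=0.0):
    """Return (residual, sigma, {state_index: d residual / d state}) triples."""
    terms = [(x[0] - prior, SIGMA_PRIOR, {0: 1.0})]  # prior factor r_0
    for i, u in enumerate(odom):  # relative-motion factors between states
        terms.append((x[i + 1] - x[i] - u, SIGMA_ODOM, {i: -1.0, i + 1: 1.0}))
    for i, z in gps.items():  # absolute-position (GPS-like) factors
        terms.append((x[i] - z, SIGMA_GPS, {i: 1.0}))
    return terms


def cost(x, odom, gps):
    """Sum of squared residuals, each weighted by its inverse variance."""
    return sum((r / s) ** 2 for r, s, _ in residuals(x, odom, gps))


def solve_map(x0, odom, gps, step=1e-3, iterations=2000):
    """Minimize the cost by fixed-step gradient descent."""
    x = list(x0)
    for _ in range(iterations):
        grad = [0.0] * len(x)
        for r, s, jac in residuals(x, odom, gps):
            for i, d in jac.items():
                grad[i] += 2.0 * r * d / (s * s)
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


odom = [1.0, 1.0]  # measured displacement between consecutive states
gps = {2: 2.2}     # absolute fix on the last state
estimate = solve_map([0.0, 0.0, 0.0], odom, gps)
print(estimate, cost(estimate, odom, gps))
```

Because all four factors carry the same assumed variance here, the 0.2 disagreement between the odometry chain and the GPS fix is spread evenly across the residuals. In practice the states are poses and velocities, and the residual models and Jacobians are sensor-specific.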
[0082] Step 1115 of the process includes determining transformation
rules for transforming between multiple reference systems (e.g., at
different points in time) and a target reference system.
Illustratively, according to the following approximations: (1) the
positions of the stereo camera and laser unit coincide with each
other; and (2) timestamps of data acquired by the laser unit and
data acquired by the camera are exactly the same, the controller
can compute relative transformation matrices for the laser unit at
different points in time with respect to a target point in time
(i.e., when the subject period of time starts, half-way through the
subject time period, or when the subject period of time ends) using
corresponding states as determined.
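As an illustrative sketch only, a relative transformation for the laser unit can be derived from two estimated states that each consist of a position and a unit quaternion expressed in the same local coordinate system. The quaternion ordering (w, x, y, z) and the names quat_to_rot and relative_transform are assumptions made for this example.

```python
import math


def quat_to_rot(q):
    """Rotation matrix of a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def relative_transform(p_i, q_i, p_t, q_t):
    """Return (R, t) mapping points from frame i into the target frame t.

    With X_local = R_i X_i + p_i for each pose, a point X_i maps to
    X_t = R_t^T (R_i X_i + p_i - p_t) in the target frame.
    """
    r_i, r_t = quat_to_rot(q_i), quat_to_rot(q_t)
    # R = R_t^T R_i
    rot = [
        [sum(r_t[k][a] * r_i[k][b] for k in range(3)) for b in range(3)]
        for a in range(3)
    ]
    diff = [p_i[k] - p_t[k] for k in range(3)]
    # t = R_t^T (p_i - p_t)
    trans = [sum(r_t[k][a] * diff[k] for k in range(3)) for a in range(3)]
    return rot, trans


half = math.sqrt(0.5)
rot, trans = relative_transform(
    p_i=(1.0, 0.0, 0.0), q_i=(half, 0.0, 0.0, half),  # yawed 90 degrees
    p_t=(0.0, 0.0, 0.0), q_t=(1.0, 0.0, 0.0, 0.0),    # target: identity pose
)
print(rot, trans)
```

The same function can be evaluated for every frame in the subject time period against whichever target point in time is selected.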
[0083] In some embodiments, the approximations that (1) the
positions of the stereo camera and laser unit coincide with each
other; and (2) timestamps of data acquired by the laser unit and
data acquired by the camera are exactly the same are not used. In
these embodiments, the presently disclosed technology can account
for two factors: (1) relative changes (e.g., the transformation
matrix ${}_{c}T_{l}$ between the stereo camera and the laser unit);
and (2) a timestamp difference between different sensors. Regarding
the first factor (1), because the laser unit and the stereo camera
are not likely to move relative to each other during the subject
period of time, the controller may calculate a relative position of
the laser unit at any $q^{\text{th}}$ point in time with respect to
any $p^{\text{th}}$ point in time during the subject time period by
simply calculating a relative position of the camera at time $q$
with respect to time $p$. As for the second factor (2), where
timestamps between different
sensors cannot be perfectly synchronized, the controller may use
interpolation (e.g., based on a polynomial fitting) to compute
relative position information in a coordinate system (e.g., a
coordinate system of the mobile platform) at the time of any
specified timestamp.
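As an illustrative sketch only, position information at an arbitrary timestamp can be obtained by fitting a polynomial through nearby estimated positions and evaluating it at the requested time. The sketch uses Lagrange interpolation applied independently to each coordinate; the names lagrange_interpolate and interpolate_position and the sample values are hypothetical. Rotation is not interpolated here and would need separate treatment.

```python
def lagrange_interpolate(times, values, t):
    """Evaluate at time t the polynomial through (times[i], values[i])."""
    total = 0.0
    for i, (ti, vi) in enumerate(zip(times, values)):
        weight = 1.0
        for j, tj in enumerate(times):
            if j != i:
                weight *= (t - tj) / (ti - tj)
        total += weight * vi
    return total


def interpolate_position(times, positions, t):
    """Interpolate each coordinate of a position track independently."""
    dims = len(positions[0])
    return [
        lagrange_interpolate(times, [p[k] for p in positions], t)
        for k in range(dims)
    ]


# Positions estimated at three camera timestamps (hypothetical values).
camera_times = [0.0, 1.0, 2.0]
camera_positions = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (4.0, 4.0, 0.0)]

# Position at a laser timestamp that falls between camera frames.
laser_position = interpolate_position(camera_times, camera_positions, 1.5)
print(laser_position)
```

A low polynomial order over a short window of neighboring samples is usually preferable, since high-order fits can oscillate between samples.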
[0084] Step 1120 of the process includes transforming data obtained
by the laser unit at different points in time based on the
transformation rules. Illustratively, using the relative
transformation matrices as determined in step 1115, the controller
can re-project data (e.g., laser scanning points) acquired at
different points in time (e.g., different frames) in the subject
time period, to the target point in time. In some embodiments, the
controller can exclude certain points in time from the
re-projection process due to excessive noise, data error, or other
factors. Step 1125 of the process includes generating a combined
point cloud using the transformed data. Illustratively, the
controller can add the re-projected data from multiple (selected)
frames to the frame of point data initially associated with the
target point in time, thereby accumulating temporally sequenced
frames of data to form a combined point cloud as if the data were
all acquired by the laser unit at the target point in time.
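As an illustrative sketch only, steps 1120 and 1125 can be expressed as applying a rigid transformation to each selected frame of laser points and appending the result to the frame associated with the target point in time. The names transform_points and combine_point_cloud, the representation of each transformation as a rotation matrix and translation pair, and the sample data are hypothetical.

```python
def transform_points(points, rot, trans):
    """Apply X' = R X + t to every 3D point of a frame."""
    return [
        tuple(sum(rot[a][b] * p[b] for b in range(3)) + trans[a] for a in range(3))
        for p in points
    ]


def combine_point_cloud(frames, transforms, target_index, skip=()):
    """Re-project the selected frames into the target frame and merge them.

    frames: list of point lists, one list per laser frame.
    transforms: list of (R, t) pairs mapping each frame into the target frame.
    skip: indices of frames to exclude, e.g., because of excessive noise.
    """
    combined = list(frames[target_index])
    for i, (points, (rot, trans)) in enumerate(zip(frames, transforms)):
        if i == target_index or i in skip:
            continue
        combined.extend(transform_points(points, rot, trans))
    return combined


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frames = [
    [(1.0, 0.0, 0.0)],  # frame at the target point in time
    [(0.0, 0.0, 0.0)],  # later frame, platform moved 0.5 along x
    [(9.0, 9.0, 9.0)],  # frame rejected as too noisy
]
transforms = [
    (IDENTITY, [0.0, 0.0, 0.0]),
    (IDENTITY, [0.5, 0.0, 0.0]),
    (IDENTITY, [1.0, 0.0, 0.0]),
]
cloud = combine_point_cloud(frames, transforms, target_index=0, skip={2})
print(cloud)
```

The combined list then holds all retained points in the coordinate system of the target frame, as if they had been acquired at the target point in time.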
[0085] FIG. 12 illustrates examples of mobile platforms configured
in accordance with various embodiments of the presently disclosed
technology. As illustrated, a representative mobile platform as
disclosed herein may include at least one of an unmanned aerial
vehicle (UAV) 1202, a manned aircraft 1204, an autonomous car 1206,
a self-balancing vehicle 1208, a terrestrial robot 1210, a smart
wearable device 1212, a virtual reality (VR) head-mounted display
1214, or an augmented reality (AR) head-mounted display 1216.
[0086] FIG. 13 is a block diagram illustrating an example of the
architecture for a computer system or other control device 1300
that can be utilized to implement various portions of the presently
disclosed technology. In FIG. 13, the computer system 1300 includes
one or more processors 1305 and memory 1310 connected via an
interconnect 1325. The interconnect 1325 may represent any one or
more separate physical buses, point-to-point connections, or both,
connected by appropriate bridges, adapters, or controllers. The
interconnect 1325, therefore, may include, for example, a system
bus, a Peripheral Component Interconnect (PCI) bus, a
HyperTransport or industry standard architecture (ISA) bus, a small
computer system interface (SCSI) bus, a universal serial bus (USB),
an IIC (I2C) bus, or an Institute of Electrical and Electronics
Engineers (IEEE) standard 1394 bus, sometimes referred to as
"Firewire".
[0087] The processor(s) 1305 may include central processing units
(CPUs) to control the overall operation of, for example, the host
computer. In certain embodiments, the processor(s) 1305 accomplish
this by executing software or firmware stored in memory 1310. The
processor(s) 1305 may be, or may include, one or more programmable
general-purpose or special-purpose microprocessors, digital signal
processors (DSPs), programmable controllers, application specific
integrated circuits (ASICs), programmable logic devices (PLDs), or
the like, or a combination of such devices.
[0088] The memory 1310 can be or include the main memory of the
computer system. The memory 1310 represents any suitable form of
random access memory (RAM), read-only memory (ROM), flash memory,
or the like, or a combination of such devices. In use, the memory
1310 may contain, among other things, a set of machine instructions
which, when executed by processor 1305, causes the processor 1305
to perform operations to implement embodiments of the present
invention.
[0089] Also connected to the processor(s) 1305 through the
interconnect 1325 is an (optional) network adapter 1315. The
network adapter 1315 provides the computer system 1300 with the
ability to communicate with remote devices, such as storage clients
and/or other storage servers, and may be, for example, an Ethernet
adapter or Fibre Channel adapter.
[0090] The techniques introduced herein can be implemented by, for
example, programmable circuitry (e.g., one or more microprocessors)
programmed with software and/or firmware, or entirely in
special-purpose hardwired circuitry, or in a combination of such
forms. Special-purpose hardwired circuitry may be in the form of,
for example, one or more application-specific integrated circuits
(ASICs), programmable logic devices (PLDs), field-programmable gate
arrays (FPGAs), etc.
[0091] Software or firmware for use in implementing the techniques
introduced here may be stored on a machine-readable storage medium
and may be executed by one or more general-purpose or
special-purpose programmable microprocessors. A "machine-readable
storage medium," as the term is used herein, includes any mechanism
that can store information in a form accessible by a machine (a
machine may be, for example, a computer, network device, cellular
phone, personal digital assistant (PDA), manufacturing tool, any
device with one or more processors, etc.). For example, a
machine-accessible storage medium includes
recordable/non-recordable media (e.g., read-only memory (ROM);
random access memory (RAM); magnetic disk storage media; optical
storage media; flash memory devices; etc.), etc.
[0092] The term "logic," as used herein, can include, for example,
programmable circuitry programmed with specific software and/or
firmware, special-purpose hardwired circuitry, or a combination
thereof.
[0093] Some embodiments of the disclosure have other aspects,
elements, features, and steps in addition to or in place of what is
described above. These potential additions and replacements are
described throughout the rest of the specification. Reference in
this specification to "various embodiments," "certain embodiments,"
or "some embodiments" means that a particular feature, structure,
or characteristic described in connection with the embodiment is
included in at least one embodiment of the disclosure. These
embodiments, even alternative embodiments (e.g., referenced as
"other embodiments") are not mutually exclusive of other
embodiments. Moreover, various features are described which may be
exhibited by some embodiments and not by others. Similarly, various
requirements are described which may be requirements for some
embodiments but not other embodiments.
[0094] As discussed above, the disclosed technology can achieve
high-precision calibration between laser sensors (e.g., low-cost
laser sensors with limited FOV) and vision sensors (e.g., monocular
cameras), which may use combined point clouds generated in
accordance with point data obtained at different times. While
advantages associated with certain embodiments of the technology
have been described in the context of those embodiments, other
embodiments may also exhibit such advantages, and not all
embodiments need necessarily exhibit such advantages to fall
within the scope of the present technology. For example, the
disclosed technology can be applied to achieve calibration between
any two types of sensors with different data collection resolutions
and/or rates. Accordingly, the present disclosure and associated
technology can encompass other embodiments not expressly shown or
described herein.
[0095] To the extent any materials incorporated herein conflict
with the present disclosure, the present disclosure controls.
* * * * *