U.S. patent application number 17/353769 was filed with the patent office on 2021-06-21 and published on 2021-10-07 for imaging system and method. The applicant listed for this patent is SZ DJI TECHNOLOGY CO., LTD. Invention is credited to Zhiyuan WU, Honghui ZHANG, and Zhenyu ZHU.

Application Number: 20210314543 / 17/353769
Family ID: 1000005669614
Publication Date: 2021-10-07

United States Patent Application 20210314543
Kind Code: A1
ZHU; Zhenyu; et al.
October 7, 2021
IMAGING SYSTEM AND METHOD
Abstract
A method of distance measuring includes obtaining a depth map
and a stereo pair of images of a scene of interest, and enhancing a
precision of the depth map based on disparity values of
corresponding points between the images. The images have a higher
resolution than the depth map. Enhancing the precision of the depth
map includes optimizing an energy function of the images over a
predetermined range of disparity values to obtain an optimized
energy function; determining the disparity values based on the
optimized energy function; and replacing low precision values of
the depth map with corresponding high precision values based on the
disparity values.
Inventors: ZHU; Zhenyu; (Shenzhen, CN); ZHANG; Honghui; (Shenzhen, CN); WU; Zhiyuan; (Shenzhen, CN)

Applicant:
Name: SZ DJI TECHNOLOGY CO., LTD.
City: Shenzhen
Country: CN

Family ID: 1000005669614
Appl. No.: 17/353769
Filed: June 21, 2021
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
16109211 (parent of 17353769) | Aug 22, 2018 | 11044452
PCT/CN2016/074520 (parent of 16109211) | Feb 25, 2016 |
Current U.S. Class: 1/1
Current CPC Class: H04N 2013/0081 20130101; G06T 7/593 20170101; H04N 13/271 20180501; H04N 13/239 20180501; H04N 13/243 20180501; G06T 2207/10012 20130101; G06T 2207/10028 20130101; G06T 2207/10032 20130101; H04N 13/128 20180501
International Class: H04N 13/128 20060101 H04N013/128; G06T 7/593 20060101 G06T007/593; H04N 13/271 20060101 H04N013/271; H04N 13/239 20060101 H04N013/239; H04N 13/243 20060101 H04N013/243
Claims
1. A method of distance measuring, comprising: obtaining a depth
map and a stereo pair of images of a scene of interest, the images
having a higher resolution than the depth map; and enhancing a
precision of the depth map based on disparity values of
corresponding points between the images, including: optimizing an
energy function of the images over a predetermined range of
disparity values to obtain an optimized energy function;
determining the disparity values based on the optimized energy
function; and replacing low precision values of the depth map with
corresponding high precision values based on the disparity
values.
2. The method of claim 1, wherein optimizing the energy function
over the predetermined range of disparity values includes
optimizing the energy function over a range of disparity values
within a predetermined disparity threshold.
3. The method of claim 2, wherein the predetermined disparity
threshold corresponds to a predetermined threshold distance.
4. The method of claim 1, wherein optimizing the energy function
over the predetermined range of disparity values includes
determining a similarity component of the energy function over the
predetermined range of disparity values, the similarity component
reflecting correspondences between pixel intensities of the
images.
5. The method of claim 4, wherein determining the similarity
component includes determining a sum of absolute differences of a
pixel dissimilarity metric.
6. The method of claim 5, wherein determining the sum of the
absolute differences of the pixel dissimilarity metric includes
determining a sum of absolute differences of a Birchfield-Tomasi
pixel dissimilarity metric.
7. The method of claim 1, wherein optimizing the energy function
includes determining a smoothness component of the energy function
reflecting continuity of depth values within the depth map.
8. The method of claim 7, wherein the smoothness component is a
weighted sum of trigger functions.
9. The method of claim 8, wherein each of the trigger functions is
a function of a disparity difference between a disparity value
corresponding to a pixel within the depth map and a disparity value
corresponding to one of a plurality of neighboring pixels of the
pixel within the depth map.
10. The method of claim 9, wherein a first weight is applied to one
or more of the trigger functions of disparity differences that are
equal to a non-zero threshold, and a second weight is applied to
another one or more of the trigger functions of disparity
differences that are larger than the non-zero threshold.
11. The method of claim 1, wherein optimizing the energy function
includes optimizing the energy function using at least one of
dynamic programming or non-local optimization.
12. The method of claim 11, wherein optimizing the energy function
includes optimizing the energy function using recursive
filtering.
13. The method of claim 1, wherein replacing the low precision
values of the depth map with the corresponding high precision
values includes: replacing all low precision values of the depth
map with corresponding high precision values.
14. The method of claim 1, wherein replacing the low precision
values of the depth map with the corresponding high precision
values includes: replacing selected low precision values of the
depth map with corresponding high precision values based on the low
precision values being within a predetermined threshold
disparity.
15. The method of claim 1, wherein replacing the low precision
values of the depth map with the corresponding high precision
values includes: replacing selected low precision values of the
depth map with corresponding high precision values based on the low
precision values being within a disparity range that corresponds to
a predetermined threshold distance.
16. The method of claim 1, wherein: the stereo pair of images are a
first stereo pair of images of the scene of interest; and obtaining
the depth map includes obtaining the depth map from a second stereo
pair of images of the scene of interest, the second stereo pair of
images having a same resolution as the depth map.
17. The method of claim 16, wherein: the depth map is a first depth
map; and obtaining the first depth map includes obtaining the first
depth map from the second stereo pair of images and a second depth
map having a lower resolution than the second stereo pair of
images.
18. The method of claim 1, further comprising: rectifying the
stereo pair of images prior to enhancing the precision of the depth
map.
19. An imaging system, comprising: a pair of imaging devices
configured to obtain a stereo pair of images of a scene of
interest; and one or more processors configured to enhance a
precision of a depth map of the scene of interest based on
disparity values of corresponding points between the images, the
images having a higher resolution than the depth map, and enhancing
the precision of the depth map includes: optimizing an energy
function of the images over a predetermined range of disparity
values to obtain an optimized energy function; determining the
disparity values based on the optimized energy function; and
replacing low precision values of the depth map with corresponding
high precision values based on the disparity values.
20. A non-transitory computer readable storage medium, comprising:
instructions for obtaining a depth map and a stereo pair of images
of a scene of interest, the images having a higher resolution than
the depth map; and instructions for enhancing a precision of the
depth map based on disparity values of corresponding points between
the images, including: instructions for optimizing an energy
function of the images over a predetermined range of disparity
values to obtain an optimized energy function; instructions for
determining the disparity values based on the optimized energy
function; and instructions for replacing low precision values of the
depth map with corresponding high precision values based on the
disparity values.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. application Ser.
No. 16/109,211, filed on Aug. 22, 2018, which is a continuation of
International Application No. PCT/CN2016/074520, filed on Feb. 25,
2016, the entire contents of both of which are incorporated herein
by reference.
COPYRIGHT NOTICE
[0002] A portion of the disclosure of this patent document contains
material which is subject to copyright protection. The copyright
owner has no objection to the facsimile reproduction by anyone of
the patent document or the patent disclosure, as it appears in the
Patent and Trademark Office patent file or records, but otherwise
reserves all copyright rights whatsoever.
FIELD
[0003] The disclosed embodiments relate generally to digital
imaging and more particularly, but not exclusively, to systems and
methods for enhancing precision of depth perception in stereoscopic
imaging.
BACKGROUND
[0004] Stereoscopic imaging, a technique whereby multiple imaging
devices are used to form a three dimensional image through
stereopsis, is becoming increasingly common in many fields.
Stereoscopic imaging is particularly useful in robotics, where it
is often desirable to gather three-dimensional information about a
machine's environment. Stereoscopic imaging simulates the binocular
vision of human eyes, which applies the principle of stereopsis to
achieve depth perception. This technique can be reproduced by
artificial imaging devices by viewing a given object of interest
using multiple imaging devices from slightly different vantage
points. Differences between varying views of the object of interest
convey depth information about the object, thereby enabling
three-dimensional imaging of the object.
[0005] The ability of stereoscopic imaging to resolve depth is a
function of the resolution of images that are taken from different
vantage points and compared. Higher resolution images yield more
precise depth measurements. Obtaining greater precision of depth
perception is especially important in applications for viewing
distant objects, such as in outdoor imaging applications. However,
existing methods of determining depth by stereoscopic imaging scale
poorly as image resolution increases, and are ill-suited for such
imaging applications. Accordingly, there is a need for systems and
methods that more efficiently increase depth perception precision
in stereo imaging.
SUMMARY
[0006] In accordance with a first aspect disclosed herein, there is
set forth a method of distance measuring, comprising: obtaining a
depth map and a stereo pair of images of a scene of interest, the
images having a higher resolution than the depth map; and enhancing
a precision of the depth map based on disparity values of
corresponding points between the images.
[0007] In accordance with another aspect disclosed herein, there is
set forth an imaging system, comprising: a pair of imaging devices
configured to obtain a stereo pair of images of a scene of
interest; and one or more processors configured to enhance a
precision of a depth map of the scene of interest based on
disparity values of corresponding points between the images,
wherein the images have a higher resolution than the depth map.
[0008] In accordance with another aspect disclosed herein, there is
set forth an apparatus for imaging, comprising one or more
processors configured to: obtain a depth map of a scene of
interest; obtain a stereo pair of images of the scene of interest,
the images having a higher resolution than the depth map; and
enhance a resolution of the depth map based on disparity values of
corresponding points between the images.
[0009] In accordance with another aspect disclosed herein, there is
set forth a computer readable storage medium, comprising:
instructions for obtaining a depth map and a stereo pair of images
of a scene of interest, the images having a higher resolution than
the depth map; and instructions for enhancing a resolution of the
depth map based on disparity values of corresponding points between
the images.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] FIG. 1 is an exemplary top-level block diagram illustrating
an embodiment of a stereoscopic imaging system including a
plurality of imaging devices.
[0011] FIG. 2 is an exemplary diagram illustrating the stereoscopic
imaging system of FIG. 1 as used in determining an object distance
using triangulation.
[0012] FIG. 3 is an exemplary diagram illustrating the stereoscopic
imaging system of FIG. 1 as used in determining an object distance
using triangulation based on a disparity.
[0013] FIG. 4 is an exemplary diagram illustrating a method of
determining a depth map using a pair of corresponding images.
[0014] FIG. 5 is an exemplary diagram illustrating a method for
determining a depth map using a lower resolution depth map and a
pair of corresponding images.
[0015] FIG. 6 is an exemplary flow chart illustrating the method of
FIG. 5 for determining a depth map using a lower resolution depth
map and a pair of corresponding images.
[0016] FIG. 7 is an exemplary flow chart illustrating an embodiment
of the method of FIG. 6, wherein disparity values are determined by
optimizing an energy function.
[0017] FIG. 8 is an exemplary diagram illustrating another
embodiment of the method of FIG. 6, wherein pixels of a low
resolution depth map are replaced with high precision values.
[0018] FIG. 9 is an exemplary diagram illustrating another
embodiment of the method of FIG. 6, wherein precision of a depth
map is enhanced using interval sampling.
[0019] FIG. 10 is an exemplary diagram illustrating an embodiment
of the stereoscopic imaging system of FIG. 1, as mounted aboard an
unmanned aerial vehicle.
[0020] FIG. 11 is an exemplary diagram illustrating experimental
results obtained using the method of FIG. 5 for determining a depth
map using a lower resolution depth map and a pair of corresponding
images.
[0021] It should be noted that the figures are not drawn to scale
and that elements of similar structures or functions are generally
represented by like reference numerals for illustrative purposes
throughout the figures. It also should be noted that the figures
are only intended to facilitate the description of the embodiments.
The figures do not illustrate every aspect of the described
embodiments and do not limit the scope of the present
disclosure.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0022] The present disclosure sets forth systems and methods for
enhancing the precision of depth measurements obtained using
stereoscopic imaging, which overcome limitations of traditional
systems and methods. More particularly, prior systems and methods
for finding a disparity between corresponding points in two
separate images are inefficient, scaling with the cube of the
resolution of the images. For example, increasing image resolution
from 320×240 pixels to 640×480 pixels can increase
computational costs by a factor of eight, even though the precision
of the resulting depth map is increased only by a factor of two.
The present systems and methods significantly enhance efficiency of
obtaining high precision depth information.
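The eight-fold figure follows from the size of the search space: a brute-force stereo match visits every pixel at every candidate disparity, and doubling the resolution doubles all three of those dimensions. The following Python sketch of this cost model is illustrative only; the formula and the disparity-range values are assumptions for exposition, not figures from the disclosure.

```python
# Illustrative cost model (an assumption, not the patent's formula):
# a brute-force stereo search visits every pixel at every candidate
# disparity, so cost ~ width * height * disparity_range.

def matching_cost(width, height, disparity_range):
    """Number of pixel-disparity comparisons for a full search."""
    return width * height * disparity_range

low = matching_cost(320, 240, 64)    # hypothetical disparity range of 64
high = matching_cost(640, 480, 128)  # doubled in x, y, and disparity

print(high // low)  # 8-fold cost for a 2-fold precision gain
```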
[0023] Turning now to FIG. 1, an exemplary imaging system 100 is
shown as including a plurality of imaging devices 110a, 110b. The
imaging devices 110a and 110b can each be configured to acquire
corresponding images 200a, 200b (shown in FIG. 3) of a scene of
interest 10. For purposes of stereoscopic depth perception, a
disparity d between the positions of an object of interest 15
within the scene of interest 10 can be found by comparison of the
images 200a, 200b. The disparity d can be used to find a distance Z
between the object of interest 15 and the imaging devices 110a,
110b.
[0024] The imaging system 100 can include any number of imaging
devices 110, as desired, though two imaging devices 110a and 110b
are shown for illustrative purposes only. For example, the imaging
system 100 can have 2, 3, 4, 5, 6, or even a greater number of
imaging devices 110. The imaging devices 110 can be arranged in any
desired manner in the imaging system 100. The specific arrangement
of imaging devices 110 can depend on the imaging application. In
some embodiments, for example, a pair of imaging devices 110 can be
positioned side-by-side as a left imaging device 110a and a right
imaging device 110b. In some embodiments, the imaging devices 110a
and 110b can be configured to have parallel optical axes (not shown
in FIG. 1).
[0025] Each imaging device 110 can perform the function of sensing
light and converting the sensed light into electronic signals that
can be ultimately rendered as an image. Exemplary imaging devices
110 suitable for use with the disclosed systems and methods,
include, but are not limited to, commercially-available cameras and
camcorders. Suitable imaging devices 110 can include analog imaging
devices (for example, video camera tubes) and/or digital imaging
devices (for example, charge-coupled device (CCD), complementary
metal-oxide-semiconductor (CMOS), N-type metal-oxide-semiconductor
(NMOS) imaging devices, and hybrids/variants thereof). Digital
imaging devices, for example, can include a two-dimensional grid or
array of photosensor elements (not shown) that can each capture one
pixel of image information. In some embodiments, each imaging
device 110 has a resolution of at least 0.01 Megapixels, 0.02
Megapixels, 0.05 Megapixels, 0.1 Megapixels, 0.5 Megapixels, 1
Megapixel, 2 Megapixels, 5 Megapixels, 10 Megapixels, 20
Megapixels, 50 Megapixels, 100 Megapixels, or an even greater
number of pixels. Exemplary image resolutions that can be used for
the present systems and methods include 320×240 pixels,
640×480 pixels, 800×600 pixels, 1024×786 pixels,
1280×960 pixels, 1536×1180 pixels, 2048×1536
pixels, 2560×1920 pixels, 3032×2008 pixels,
3072×2304 pixels, 3264×2448 pixels, and other image
resolutions.
[0026] Each imaging device 110 can also include a lens 105 for
focusing light onto the photosensor elements, such as a digital
single-lens reflex (DSLR) lens, pin-hole lens, biological lens,
simple convex glass lens, macro lens, zoom lens, telephoto lens,
fisheye lens, wide-angle lens, or the like.
[0027] Each imaging device 110 can also include apparatus (not
shown) that separates and/or filters the sensed light based on
color and directs the light onto the appropriate photosensor
elements. For example, the imaging device 110 can include a color
filter array that passes red, green, or blue light to selected
pixel sensors and forms an interlaced color mosaic grid in a Bayer
pattern. Alternatively, for example, each imaging device 110 can
include an array of layered pixel photosensor elements that
separates light of different wavelengths based on the properties of
the photosensor elements.
[0028] Each imaging device 110 can have specialty functions for use
in various applications such as thermography, creation of
multi-spectral images, infrared detection, gamma detection, x-ray
detection, and the like. Each imaging device 110 can include, for
example, electro-optical sensors, thermal/infrared sensors, color
or monochrome sensors, multi-spectral imaging sensors,
spectrophotometers, spectrometers, thermometers, and/or
illuminometers.
[0029] As shown in FIG. 1, the imaging devices 110 can interface
with one or more processors 120. Although a single processor 120 is
shown for illustrative purposes only, the imaging system 100 can
include any number of processors 120, as desired. Without
limitation, each processor 120 can include one or more general
purpose microprocessors (for example, single or multi-core
processors), application-specific integrated circuits (ASIC),
field-programmable gate arrays (FPGA), application-specific
instruction-set processors, digital signal processing units,
coprocessors, network processing units, audio processing units,
encryption processing units, and the like. In certain embodiments,
the processor 120 can include an image processing engine or media
processing unit, which can include specialized hardware for
enhancing the speed and efficiency of focusing, image capture,
filtering, Bayer transformations, demosaicing operations, noise
reduction operations, image sharpening operations, image softening
operations, and the like. The processors 120 can be configured to
perform any of the methods described herein, including but not
limited to a variety of operations relating to stereoscopic imaging
and/or depth precision enhancement. In some embodiments, the
processors 120 can include specialized software and/or hardware for
processing operations relating to stereoscopic imaging and/or depth
precision enhancement.
[0030] In some embodiments, the processor 120 is physically located
adjacent to the imaging devices 110, in which case data between the
processor 120 and the imaging devices 110 can be communicated
locally. An advantage of local communication is that transmission
delay can be reduced to facilitate real-time image processing and
depth precision enhancement. In other embodiments, the processor
120 can be located remotely from the imaging devices 110. Remote
processing may be preferable, for example, because of weight
restrictions or other reasons relating to an operational
environment of the imaging system 100. As a non-limiting example,
if the imaging devices 110 are mounted aboard a mobile platform,
such as an unmanned aerial vehicle 50 (UAV) (shown in FIG. 10),
conveying imaging data to a remote terminal (not shown) for
centralized processing, such as a ground terminal or base station,
can be desirable. Various communication protocols can be used for
remote communication between the imaging devices 110 and the
processors 120. Suitable communication protocols include, for
example, radio, Wireless Fidelity (Wi-Fi), cellular, satellite,
broadcasting, and others.
[0031] As shown in FIG. 2, the imaging system 100 can include one
or more memories 130 (alternatively referred to herein as a
computer readable storage medium). Suitable memories 130 can
include, for example, random access memory (RAM), static RAM,
dynamic RAM, read-only memory (ROM), programmable ROM, erasable
programmable ROM, electrically erasable programmable ROM, flash
memory, secure digital (SD) card, and the like. Image data from the
imaging devices 110a, 110b can be transmitted to and stored within
the memory 130. The memory 130 can also be used to store a depth
map (for example, a depth map 300 shown in FIG. 4) both prior to
and after depth precision enhancement. Furthermore, instructions for
performing any of the methods described herein can be stored in the
memory 130. The memory 130 is in operative communication with the
processors 120, and instructions can be transmitted from the memory
130 to the processors 120 for execution.
[0032] Data from the processors 120 and/or the memories 130 can be
communicated with one or more input/output devices 140 (for
example, buttons, a keyboard, keypad, trackball, displays, and/or a
monitor). The input/output devices 140 can each have a suitable
interface to deliver content to a user 20. The input/output devices
140 can be used to provide a user interface for interacting with
the user 20 to obtain images and control a process for enhancing
depth precision. Various user interface elements (for example,
windows, buttons, menus, icons, pop-ups, tabs, controls, cursors,
insertion points, and the like) can be used to interface with the
user 20. The imaging system 100 can further include
one or more additional hardware components (not shown), as
desired.
[0033] Turning now to FIG. 2, a method of ascertaining an object
distance Z using stereoscopic imaging is illustrated therein with
reference to two imaging devices 110: a left imaging device 110a;
and a right imaging device 110b. Each of the imaging devices 110a
and 110b perceives the same object of interest 15, but in different
spatial coordinates as illustrated by the coordinate axes (x_1,
y_1, z_1) and (x_2, y_2, z_2). The imaging
devices 110a and 110b perceive the object of interest 15 along
respective optical axes 130a and 130b and thereby arrive at two
different two-dimensional images 200a and 200b of the same object
of interest 15. The two-dimensional images 200a and 200b are
typically different, being taken from different positions, unless
the imaging devices 110a and 110b are positioned such that their
optical axes 130a and 130b coincide. Accordingly, under most
circumstances, a disparity d can be found between the corresponding
positions of the object of interest 15 within the images 200a and
200b.
[0034] Turning now to FIG. 3 to further illustrate depth
measurement using stereoscopic imaging, a left image 200a and right
image 200b can be compared to ascertain an object distance Z
between a pair of imaging devices 110a and 110b (or equivalently, the
imaging system 100) and the object of interest 15. A method of
triangulation can be used to ascertain the object distance Z using
a disparity d between the images 200a, 200b for each object of
interest 15 within a scene of interest 10. Specifically, the
position of a particular object of interest 15 having an index i,
represented by coordinates (X_i, Y_i, Z_i), can be given as
follows:

X_i = (b / d_i)(x_i^l - c_x),   Equation (1)

Y_i = (b / d_i)(y_i^l - c_y),   Equation (2)

Z_i = (b / d_i) f,   Equation (3)

where c_x and c_y represent respective center coordinates of the
imaging devices 110a and 110b, x_i^l and y_i^l represent the
coordinates of the object of interest 15 in the left image 200a, b
is the baseline (in other words, the distance between the center
coordinates of the imaging devices 110a and 110b), f is the focal
length of each imaging device 110a and 110b (assuming here that the
imaging devices have the same focal length), i is an index over the
objects of interest 15, and d_i is the disparity of the object of
interest 15 between the images 200a and 200b, represented as:

d_i = x_i^l - x_i^r.   Equation (4)
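Equations (1) through (4) can be applied directly once a pixel correspondence is known. The following Python sketch is for illustration only; the baseline, focal length, and principal-point values in the example are hypothetical, not parameters from the disclosure.

```python
def triangulate(x_l, y_l, x_r, b, f, c_x, c_y):
    """Recover (X, Y, Z) for a matched pixel pair per Equations (1)-(4).

    x_l, y_l: pixel coordinates of the object in the left image
    x_r:      x coordinate of the matching pixel in the right image
    b:        baseline between the imaging devices
    f:        focal length (in pixels), assumed equal for both devices
    c_x, c_y: center coordinates of the imaging devices
    """
    d = x_l - x_r                 # Equation (4): disparity
    X = (b / d) * (x_l - c_x)     # Equation (1)
    Y = (b / d) * (y_l - c_y)     # Equation (2)
    Z = (b / d) * f               # Equation (3)
    return X, Y, Z

# Hypothetical values: a point 10 px right of center with a 20 px
# disparity, a 0.1 m baseline, and a 400 px focal length:
X, Y, Z = triangulate(330, 240, 310, 0.1, 400, 320, 240)
print(round(Z, 3))  # 2.0 (metres)
```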
[0035] Turning now to FIG. 4, a construction of a depth map from
left and right images acquired by binocular imaging is shown. The
top of FIG. 4 shows a low resolution left image 200a (for example,
320×240 pixels 210a) and a low resolution right image 200b
(for example, 320×240 pixels 210b). A corresponding low
resolution depth map 300, which shows object distance at each pixel
310 (darker pixels here correspond to closer objects), can be
constructed from the low resolution left and right images 200a,
200b. For example, a given pixel 210a of the low resolution left
image 200a can be used to search for a corresponding pixel 210b in
the low resolution right image 200b. Alternatively, and/or
additionally, a given pixel 210b of the low resolution right image
200b can (by symmetry) be used to search for a corresponding pixel
210a in the low resolution left image 200a. Corresponding pixels
within the low resolution images 200a, 200b can be located using,
for example, local block matching techniques and/or global
optimization techniques.
[0036] In some embodiments, the low resolution left and right
images 200a, 200b can be rectified prior to searching for pixel
correspondence, so as to improve search performance. For example,
the left and right images 200a, 200b can be rotated such that the
horizontal axes of the images are parallel to each other. The left
and right images 200a, 200b can be rectified prior to performing
depth measurement precision enhancement, as described herein.
[0037] After a corresponding pixel is located, a disparity d can be
found between corresponding pixels. The disparity d can be
represented as a number of pixels or as an absolute distance (where
the physical width of each pixel is known). As each pixel 210a,
210b produces a depth measurement for a corresponding pixel 310 in
the low resolution depth map 300, the x-y resolution of the low
resolution depth map 300 is dependent on the x-y resolution of the
image pair 200a, 200b. Similarly, the depth precision of the low
resolution depth map 300 is also dependent on the x-y resolution of
the image pair 200a, 200b, as the depth precision is determined by
the granularity of the disparity d. Thus, the low resolution images
200a, 200b can generate a corresponding low resolution depth map
300.
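As an illustration of the local block matching technique mentioned above, the following hypothetical Python sketch searches one image row for the disparity that minimizes a sum-of-absolute-differences cost. It is a simplified stand-in for exposition, not the matcher of the disclosed embodiments, and the window size and disparity range are assumed values.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def best_disparity(left_row, right_row, x, window, max_d):
    """Disparity minimising SAD for the pixel at x in the left row."""
    ref = left_row[x:x + window]
    best_d, best_cost = 0, float("inf")
    for d in range(min(max_d, x) + 1):   # candidate match lies at x - d
        cost = sad(ref, right_row[x - d:x - d + window])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Simulate a 3-pixel disparity: the left row is the right row
# shifted right by 3 pixels.
right = [10, 10, 10, 80, 90, 80, 10, 10, 10, 10, 10, 10]
left = right[-3:] + right[:-3]
print(best_disparity(left, right, 6, 3, 5))  # 3
```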
[0038] Depth precision of a depth map can be increased using higher
resolution binocular image pairs (for example, 640×480 pixels
rather than 320×240 pixels). As shown in FIG. 4, a high
resolution depth map 350 can be constructed from high resolution
left and right images 250a, 250b. For example, a given pixel 260a
of the high resolution left image 250a can be used to search for a
corresponding pixel 260b in the high resolution right image 250b.
Alternatively, and/or additionally, a given pixel 260b of the high
resolution right image 250b can (by symmetry) be used to search for
a corresponding pixel 260a in the high resolution left image 250a.
In some embodiments, the high resolution left and right images
250a, 250b can be rectified prior to searching for pixel
correspondence, so as to improve search performance.
[0039] Values of disparity d for each pair of corresponding pixels
can be used to produce a high resolution depth map 350, where each
pixel 360 of the high resolution depth map 350 conveys the
disparity d for a given location. In this example, the disparity
d in the high resolution depth map 350 has twice the precision of
the low resolution depth map 300, since the disparity range for any
given object distance is represented by twice the number of pixels.
However, to achieve this two-fold increase in depth precision,
computational intensity increases by a factor of eight because pixel
correspondence is searched in the x, y, and depth dimensions.
[0040] Turning now to FIG. 5, an improved construction of a depth
map from high resolution left and right images 250a, 250b acquired
by binocular imaging is shown. The improved depth map construction
uses as an input a low resolution depth map 300 that has a lower
resolution relative to the high resolution images 250a, 250b. In
some embodiments, the resolution of the high resolution images
250a, 250b can be an integer multiple of the resolution of the low
resolution depth map 300. For example, the resolution of the high
resolution images 250a, 250b can be 640×480 pixels, while the
resolution of the low resolution depth map 300 can be 320×240
pixels. In some embodiments, the high resolution images 250a, 250b
can have the same aspect ratio (for example, three by two or four
by three) as the low resolution depth map 300. In other
embodiments, the high resolution images 250a, 250b can have a
different aspect ratio than the low resolution depth map 300.
[0041] The low resolution depth map 300 can be obtained by any
suitable means. For example, the low resolution depth map 300 can be
acquired from low resolution images 200a, 200b (shown in FIG. 4)
having the same low resolution as the low resolution depth map 300.
For example, a 320.times.240 pixel low resolution depth map 300 can
be acquired from 320.times.240 pixel resolution images through
stereopsis, as discussed above with reference to FIG. 4. In some
embodiments, the low resolution images 200a, 200b can be acquired
using the same imaging devices 110 that are used to acquire the
high resolution images 250a, 250b. For example, a 320.times.240
pixel resolution images can be acquired using an imaging device 110
having a 640.times.480 pixel or higher resolution. An initial
640.times.480 pixel image acquired by the imaging device 110 can be
scaled down in resolution to a 320.times.240 pixel resolution using
suitable images processing techniques (for example, averaging over
pixels). In other embodiments, the low resolution images 200a, 200b
and the high resolution images 250a, 250b can be acquired using
different imaging devices 110. For example, one or more
320.times.240 resolution imaging devices 110 can be used to acquire
320.times.240 pixel resolution images, while separate 640.times.480
pixel resolution imaging devices 110 can be used to acquire
640.times.480 pixel resolution images.
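The averaging-over-pixels downscale mentioned above can be sketched as follows. This is a minimal illustration rather than the patent's specified implementation; it assumes a single-channel (grayscale) image whose dimensions are exact multiples of the scale factor, and the function name is a hypothetical one chosen here:

```python
import numpy as np

def downscale_by_averaging(image: np.ndarray, factor: int) -> np.ndarray:
    """Reduce resolution by averaging over non-overlapping factor x factor blocks."""
    h, w = image.shape
    assert h % factor == 0 and w % factor == 0, "dimensions must divide evenly"
    # Split each axis into (blocks, factor) and average within each block.
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# A 640x480 pixel image (480 rows x 640 columns) becomes 320x240 (240 x 320)
# when averaged over 2x2 blocks.
high = np.arange(480 * 640, dtype=np.float64).reshape(480, 640)
low = downscale_by_averaging(high, 2)
```

Averaging (rather than simple decimation) acts as a crude low-pass filter, which reduces aliasing in the reduced-resolution images.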
[0042] In some embodiments, the low resolution depth map 300 can be
obtained using the present systems and methods by a "bootstrapping"
process using, as input, a depth map having a still lower
resolution, as well as a pair of low resolution images 200a, 200b
having the same resolution as the low resolution depth map 300.
For example, a 320×240 pixel depth map can be constructed from a
160×120 pixel depth map, as well as a pair of images 200a, 200b
having a 320×240 pixel resolution. The bootstrapping process can
continue for multiple iterations. For example, a 160×120 pixel
resolution depth map can be constructed from an 80×60 pixel depth
map, as well as a pair of images having a 160×120 pixel resolution,
and so forth. In some embodiments, a pair of images can be used as
input for multiple levels of this bootstrapping process. For
example, a given pair of 640×480 pixel resolution images 250a, 250b
can be processed to reduce resolution to 320×240 pixels as input
for one level of the bootstrapping process, reduced to a resolution
of 160×120 pixels for a subsequent level of the process, and so
forth. This bootstrapping process advantageously enables efficient
scaling for obtaining more precise depth measurements during
stereoscopic imaging.
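The bootstrapping loop above can be sketched as a coarse-to-fine pyramid. This is an illustrative skeleton only: `refine` stands in for whatever per-level disparity-based enhancement is used, the nearest-neighbor upsampling is an assumption made here for simplicity, and all names are hypothetical:

```python
import numpy as np

def upsample_nearest(depth: np.ndarray, factor: int = 2) -> np.ndarray:
    """Enlarge a depth map by repeating each element `factor` times per axis."""
    return np.repeat(np.repeat(depth, factor, axis=0), factor, axis=1)

def bootstrap_depth(coarse_depth, image_pyramid, refine):
    """Run the bootstrapping loop: at each level (ordered coarse to fine),
    upsample the previous depth map and refine it against that level's
    stereo pair (left, right)."""
    depth = coarse_depth
    for left, right in image_pyramid:
        depth = refine(upsample_nearest(depth), left, right)
    return depth

# Starting from an 80x60 depth map with stereo pairs at 160x120 and 320x240,
# two iterations yield a 320x240 result (arrays are rows x columns).
identity_refine = lambda d, left, right: d  # placeholder refinement step
start = np.zeros((60, 80))
pyramid = [(np.zeros((120, 160)),) * 2, (np.zeros((240, 320)),) * 2]
out = bootstrap_depth(start, pyramid, identity_refine)
```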
[0043] Accordingly, turning now to FIG. 6, a method 1000 is shown
for efficiently enhancing stereoscopic depth measurement precision
using the above-described techniques. At 1100, a low resolution
depth map 300 and a stereo pair of high resolution images 250a,
250b of a scene of interest 10 are obtained, the high resolution
images 250a, 250b having a higher resolution than the low
resolution depth map 300. The high resolution images 250a, 250b can
be obtained, for example, using a pair of imaging devices 110a,
110b, as discussed above in reference to FIGS. 1-3. The depth map
300, which has a lower resolution than the images 250a, 250b, can
be obtained using any suitable means, as described above with
reference to FIG. 5. At 1200, the precision of the depth map 300 is
enhanced based on disparity values d of corresponding points 260a,
260b between the images 250a, 250b.
[0044] Corresponding pixels 260a, 260b between the images 250a,
250b can be identified and/or acquired using any suitable method,
such as machine vision and/or artificial intelligence methods, and
the like. Suitable methods include feature detection, extraction
and/or matching techniques such as RANSAC (RANdom SAmple
Consensus), Shi & Tomasi corner detection, SURF blob (Speeded
Up Robust Features) detection, MSER blob (Maximally Stable Extremal
Regions) detection, SURF (Speeded Up Robust Features) descriptors,
SIFT (Scale-Invariant Feature Transform), FREAK (Fast REtinA
Keypoint) descriptors, BRISK (Binary Robust Invariant Scalable
Keypoints) descriptors, HOG (Histogram of Oriented Gradients)
descriptors, and the like. Size and shape filters can be applied
to identify corresponding pixels 260a, 260b between the images
250a, 250b, as desired.
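As one deliberately simple alternative to the descriptor-based methods listed above, corresponding pixels on a rectified scanline can be found by exhaustive window matching. This sketch is not one of the named techniques; the window size and search range are arbitrary illustration values:

```python
import numpy as np

def best_disparity(left, right, x, y, half=2, max_d=16):
    """Return the disparity d minimizing the sum of absolute differences
    (SAD) between a window around left[y, x] and one around right[y, x - d]."""
    patch = left[y - half:y + half + 1, x - half:x + half + 1]
    best_d, best_cost = 0, float("inf")
    # Search candidate disparities without running off the left image border.
    for d in range(min(max_d, x - half) + 1):
        cand = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
        cost = np.abs(patch - cand).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Synthetic test: the left image is the right image shifted by 3 pixels,
# so the recovered disparity at an interior pixel should be 3.
rng = np.random.default_rng(0)
right = rng.random((20, 40))
left = np.empty_like(right)
left[:, 3:] = right[:, :-3]
left[:, :3] = right[:, :3]
```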
[0045] Turning now to FIG. 7, step 1200 is shown in more detail for
enhancing the precision of a depth map 300 based on disparity
values d. At 1210, the disparity values d can be determined by
optimizing an energy function E(d) (also known as a cost function
or objective function) of the images 200a, 200b. An exemplary
energy function is shown in Equation (5):
E(d) = E_d(d) + p·E_s(d)   Equation (5)
[0046] wherein E_d(d) is a similarity component reflecting
correspondences between pixel intensities of the images 200a, 200b,
E_s(d) is a smoothness component reflecting continuity of depth
transitions between elements of the depth map 300, and p is a
weighing term. The energy function E(d) is a function of the
disparity values d of the depth map 300, such that optimizing the
energy function E(d) can yield disparity values d that best reflect
actual distances of objects imaged. In some embodiments, the
similarity component E_d(d) can include a sum of absolute
differences (SAD) of a pixel dissimilarity metric, such as a
Birchfield-Tomasi (BT) pixel dissimilarity metric. An exemplary
similarity component E_d(d) that includes a sum of absolute
differences of a Birchfield-Tomasi pixel dissimilarity metric
E_BT-SAD is shown in Equations (6)-(10) below:
E_d(d) = Σ E_BT-SAD(x, y, d(x,y)=d)   Equation (6)

E_BT-SAD(x, y, d(x,y)=d) = Σ min{C_1, C_2}   Equation (7)

E_BT(x, y, d(x,y)=d) = min{C_1, C_2}   Equation (8)

C_1 = min_{x-d-0.5 ≤ x' ≤ x-d+0.5} |I_L(x) - I_R(x')|   Equation (9)

C_2 = min_{x-0.5 ≤ x' ≤ x+0.5} |I_L(x') - I_R(x-d)|   Equation (10)
wherein x and y are pixel coordinates, d is the disparity, I_L
is an array of image pixel intensities of a left image 200a, and
I_R is an array of image pixel intensities of a right image
200b. Although a Birchfield-Tomasi pixel dissimilarity metric is
shown herein for illustrative purposes only, any suitable pixel
dissimilarity metric can be used for the present systems and
methods.
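A minimal one-dimensional sketch of the Birchfield-Tomasi terms C_1 and C_2 of Equations (9)-(10) follows. It uses the common interpolated-extrema formulation of the half-pixel minimum, clamps at scanline borders, and is an illustration under those assumptions, not the patent's exact implementation:

```python
def bt_cost(IL, IR, x, d):
    """Birchfield-Tomasi dissimilarity between IL[x] and IR[x - d] on a
    1-D scanline, using linearly interpolated half-pixel extrema."""
    def extrema(I, i):
        # min/max of I over [i - 0.5, i + 0.5], clamped at the borders
        lo = (I[i] + I[max(i - 1, 0)]) / 2.0
        hi = (I[i] + I[min(i + 1, len(I) - 1)]) / 2.0
        return min(lo, I[i], hi), max(lo, I[i], hi)

    r_min, r_max = extrema(IR, x - d)              # C1: IL[x] vs interpolated IR
    c1 = max(0.0, IL[x] - r_max, r_min - IL[x])
    l_min, l_max = extrema(IL, x)                  # C2: IR[x - d] vs interpolated IL
    c2 = max(0.0, IR[x - d] - l_max, l_min - IR[x - d])
    return min(c1, c2)

# A scanline shifted by 2 pixels has zero cost at d = 2 and nonzero at d = 0.
IR = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
IL = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
```

The interpolated extrema make the metric insensitive to sub-pixel sampling differences between the two images, which is the metric's main advantage over a plain absolute difference.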
[0047] In some embodiments, the smoothness component E_s(d) can
be based on a sum of trigger functions. An exemplary smoothness
component E_s(d) that is based on a sum of trigger functions is
shown in Equation (11) below:

E_s(d) = Σ p_1·T(|d(x, y) - d(x', y')| == 1) + p_2·T(|d(x, y) - d(x', y')| > 1)   Equation (11)

wherein T is a trigger function, p_1 and p_2 are weighing
terms, and the sum is taken over neighboring pixels (for example,
four neighboring pixels) of a pixel at pixel coordinates (x, y).
Although a smoothness component E_s(d) based on a sum of
trigger functions is shown herein for illustrative purposes only,
any suitable smoothness component E_s(d) can be used for the
present systems and methods.
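Equation (11) can be sketched directly, with T as an indicator (trigger) function. Counting each neighboring pair once (right and down neighbors only) is a bookkeeping choice made here, not something the text specifies:

```python
def smoothness(d, p1, p2):
    """Sum trigger-function penalties over neighboring disparity pairs:
    p1 for a disparity jump of exactly 1, p2 for a jump greater than 1."""
    h, w = len(d), len(d[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # count each neighboring pair once
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    gap = abs(d[y][x] - d[ny][nx])
                    if gap == 1:
                        total += p1
                    elif gap > 1:
                        total += p2
    return total
```

The two-tier penalty (small p_1 for one-level steps, larger p_2 for bigger jumps) tolerates gently sloped surfaces while still discouraging spurious depth discontinuities.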
[0048] At 1220, low precision values of the depth map 300 can be
replaced with corresponding high precision depth values based on
the disparity values d determined at 1210. In some embodiments, all
low precision values of the depth map 300 can be replaced with
corresponding high precision depth values. In some embodiments,
some, but not all, low precision values of the depth map 300 can be
replaced with corresponding high precision depth values. In some
embodiments, selected low precision values of the low precision
depth map 300 can be replaced with corresponding high precision
values based on the low precision values being within a
predetermined threshold disparity d_T. In some embodiments, the
threshold disparity d_T can correspond to a predetermined
threshold distance D_T.
[0049] The method of replacing low precision values with high
precision values in a depth map 300 is illustrated with reference
to FIG. 8. An exemplary depth map 300 is shown on the left side of
FIG. 8 as including pixels 310 having low depth precision. In
particular, the low precision pixels 310 of the depth map 300
include low precision distant pixels 310a (light) and low precision
nearby pixels 310b (dark). In one embodiment, shown in the upper
right portion of FIG. 8, all of the low precision pixels 310 of the
depth map 300 are replaced with high precision pixels 330,
regardless of whether the pixel is a distant pixel 310a or a nearby
pixel 310b. Accordingly, the low precision distant pixels 310a are
resolved into high precision pixels 330a, 330b, and the low
precision nearby pixels 310b are resolved into high precision
pixels 330c, 330d.
[0050] In another embodiment, shown in the lower right portion of
FIG. 8, only select low precision pixels 310 of the depth map 300
are replaced with high precision pixels 330. Low precision pixels
310 can be selectively replaced based on whether the pixel is a
distant pixel 310a or a nearby pixel 310b. In particular, low
precision distant pixels 310a can advantageously be selectively
replaced with high precision pixels 330, thereby increasing depth
precision for distant objects of interest while avoiding the costs
of precision enhancement for nearby objects. Accordingly, the low
precision distant pixels 310a are resolved into high precision
pixels 330a, 330b, while the intensities of the low precision
nearby pixels 310b are unaffected. A predetermined threshold
disparity d_T can be used as a cutoff value for selecting which low
precision pixels 310 to replace. The threshold disparity d_T
can take any suitable value, depending on the application, the
desired level of depth precision, and the imaging resolution.
Exemplary threshold disparity d_T values include 1 pixel, 2 pixels,
4 pixels, 6 pixels, 8 pixels, 10 pixels, 12 pixels, 20 pixels, 40
pixels, 60 pixels, 80 pixels, 100 pixels, or an even greater
number of pixels.
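The selective replacement of distant (small-disparity) values can be sketched with a boolean mask. The function and array names here are illustrative assumptions; in stereopsis, smaller disparity corresponds to greater distance, so values below the threshold d_T are the distant ones:

```python
import numpy as np

def selective_replace(low_depth, high_depth, disparity, d_t):
    """Replace low-precision depth values with refined high-precision values
    only where the disparity falls below the threshold d_t (distant points)."""
    out = low_depth.copy()
    distant = disparity < d_t
    out[distant] = high_depth[distant]
    return out

# Only the entries with disparity below d_t = 8 are replaced.
low = np.array([10.0, 20.0, 30.0])
high = np.array([11.0, 21.0, 31.0])
disp = np.array([2, 9, 3])
out = selective_replace(low, high, disp, d_t=8)
```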
[0051] In some embodiments, the efficiency of enhancing depth
precision can be improved by optimizing the energy function E(d)
over a predetermined range of disparity values d (rather than, for
example, optimizing over all possible disparity values d). In some
embodiments, the energy function can be optimized over a range of
disparity values that are within a predetermined disparity
threshold d_T. The predetermined disparity threshold d_T
can correspond, for example, to a predetermined threshold distance
D_T. For example, to resolve distance measurements for distant
objects, a disparity threshold d_T of 8 pixels can be preset
that corresponds to objects, for example, 100 meters or more
from the imaging device. Accordingly, only disparities from 0
to 7 pixels are sampled when optimizing the data term E(x,
y, d) with respect to the disparity d. Optimization of the energy
function E(d) over a predetermined range of disparity values d can
advantageously reduce computational costs.
[0052] In some embodiments, the efficiency of enhancing depth
precision can be improved by optimizing the energy function E(d)
using an interval sampling technique, as illustrated in FIG. 9. The
interval sampling of an image 200 can be based on a resolution of a
depth map 300. The top of FIG. 9 shows an exemplary low precision
depth map 300 (for example, a 320×240 pixel depth map). The
bottom of FIG. 9 shows three exemplary ways of interval sampling a
high resolution image 200 (for example, a 640×480 pixel
image) based on the lower resolution of the depth map 300. In some
embodiments, the high resolution image 200 can be sampled
horizontally. Here, horizontal sampling of the exemplary
640×480 pixel image based on the 320×240 pixel depth
map yields sampling every other row of pixels 210. In some
embodiments, the high resolution image 200 can be sampled
vertically. Here, vertical sampling of the exemplary 640×480
pixel image based on the 320×240 pixel depth map yields
sampling every other column of pixels 210. In some embodiments, the
high resolution image 200 can be sampled both horizontally and
vertically. Here, horizontal and vertical sampling of the exemplary
640×480 pixel image based on the 320×240 pixel depth
map yields sampling of the pixels 210 in a grid-like pattern, as
shown. More generally, for a low resolution depth map of dimensions
(w_1, h_1) and a high resolution image of dimensions
(w_2, h_2), interval sampling of a data term E(x, y, d) can
be represented as:

E(x, y, d),  (x, y) ∈ { (x_i, y_i) | mod(x_i, w_2/w_1) = 0, mod(y_i, h_2/h_1) = 0 }   Equation (12)
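The coordinate set of Equation (12) can be sketched as follows, assuming the high resolution dimensions are integer multiples of the low resolution ones (the function name is chosen here for illustration):

```python
def sample_coordinates(w1, h1, w2, h2):
    """High-resolution (x, y) coordinates kept by interval sampling, stepping
    by the ratio of high-res (w2, h2) to low-res (w1, h1) dimensions."""
    sx, sy = w2 // w1, h2 // h1
    return [(x, y) for y in range(0, h2, sy) for x in range(0, w2, sx)]

# Sampling a 640x480 image against a 320x240 depth map keeps every other
# row and column, i.e. exactly one sample per depth-map element.
pts = sample_coordinates(320, 240, 640, 480)
```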
[0053] The energy function E(d) can be optimized using any suitable
technique. In some embodiments, the energy function E(d) can be
optimized using dynamic programming. An exemplary dynamic
programming technique is based on the recurrence relation below:

L(x, y, d) = E_S(x, y, d) + min{ L(x-1, y, d), L(x-1, y, d-1) + p_1, L(x-1, y, d+1) + p_1, min_{d'} L(x-1, y, d') + p_2 } - min_{d'} L(x-1, y, d')   Equation (13)

[0054] wherein optimal values of the disparity d* can be given
by:

d* = argmin_d Σ L(x, y, d)   Equation (14)
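The recurrence of Equation (13) can be sketched for a single scanline as follows. Here `cost[x, d]` plays the role of E_S(x, y, d) at a fixed y; the vectorized inner step and the initialization at x = 0 are implementation choices made for this sketch:

```python
import numpy as np

def scanline_dp(cost, p1, p2):
    """Dynamic-programming aggregation along one scanline (Equation (13)).
    cost[x, d] is the data term; returns the aggregated costs L[x, d]."""
    W, D = cost.shape
    L = np.zeros_like(cost)
    L[0] = cost[0]
    for x in range(1, W):
        prev = L[x - 1]
        m = prev.min()                       # min over d' of L(x-1, y, d')
        minus = np.empty(D)                  # transition from d-1, penalty p1
        minus[0] = np.inf
        minus[1:] = prev[:-1] + p1
        plus = np.empty(D)                   # transition from d+1, penalty p1
        plus[-1] = np.inf
        plus[:-1] = prev[1:] + p1
        jump = np.full(D, m + p2)            # any larger jump, penalty p2
        L[x] = cost[x] + np.minimum.reduce([prev, minus, plus, jump]) - m
    return L

# With a data term that always favors d = 2, the aggregated argmin stays at 2.
cost = np.full((5, 4), 10.0)
cost[:, 2] = 0.0
L = scanline_dp(cost, p1=1.0, p2=3.0)
```

Subtracting the running minimum (the final term of Equation (13)) keeps L from growing without bound along the scanline without changing which disparity is optimal.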
[0055] In some embodiments, the energy function E(d) can be further
optimized using non-local optimization. An exemplary non-local
optimization is recursive filtering. In some embodiments, non-local
optimization of the energy function E(d) can be performed according
to Equation (15) as follows:
E(d) = Σ |d(x, y) - d'(x, y)|² + Σ exp(|I_L(x, y) - I_L(x', y')| + |x'-x| + |y'-y|) · |d(x, y) - d(x', y')|   Equation (15)
[0056] Depth precision enhancement according to the present systems
and methods can be used for images taken by mobile platforms. In
some embodiments, the mobile platform is an unmanned aerial vehicle
(UAV) 50, as shown in FIG. 10, showing imaging devices 110a, 110b
mounted aboard the UAV 50 for imaging a scene of interest 10. UAVs
50, colloquially referred to as "drones," are aircraft without a
human pilot onboard the vehicle whose flight is controlled
autonomously or by a remote pilot (or sometimes both). UAVs 50 are
now finding increased usage in civilian applications involving
various aerial operations, such as data-gathering or delivery. The
present depth precision enhancement systems and methods are
suitable for use with many types of UAVs 50 including, without
limitation, quadcopters (also referred to as quadrotor helicopters
or quadrotors), single rotor, dual rotor, trirotor, hexarotor, and
octorotor rotorcraft UAVs, fixed wing UAVs, and hybrid
rotorcraft-fixed wing UAVs. Other suitable mobile platforms for use
with the present depth precision enhancement systems and methods
include, but are not limited to, bicycles, automobiles, trucks,
ships, boats, trains, helicopters, aircraft, various hybrids
thereof, and the like.
Example 1
[0057] Turning now to FIG. 11, an example of depth precision
enhancement using the present systems and methods is shown. Left
and right high resolution images 250a, 250b having 640×480 pixel
resolutions and a low precision depth map 300 having a
320×240 pixel resolution are used as inputs. A high precision
depth map 320 is shown as the output, which has visibly greater
depth resolution than the input depth map 300. In this example, the
present depth enhancement technique improved performance over prior
techniques by 25%.
[0058] The disclosed embodiments are susceptible to various
modifications and alternative forms, and specific examples thereof
have been shown by way of example in the drawings and are herein
described in detail. It should be understood, however, that the
disclosed embodiments are not to be limited to the particular forms
or methods disclosed, but to the contrary, the disclosed
embodiments are to cover all modifications, equivalents, and
alternatives.
* * * * *