U.S. patent application number 15/569259 was published by the patent office on 2018-05-03 as publication number 2018/0122086 for image processing apparatus, image processing method, and program. The applicant listed for this application is SONY CORPORATION. The invention is credited to YASUTAKA HIRASAWA, AKIHIKO KAINO, YUHI KONDO, YING LU, and AYAKA NAKATANI.
United States Patent Application: 20180122086
Kind Code: A1
Application Number: 15/569259
Family ID: 57248184
First Named Inventor: LU, YING; et al.
Publication Date: May 3, 2018
IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND PROGRAM
Abstract
A normal-line information generation block 30 generates
normal-line information for a frame subject to detection. A data
storage block 50 stores normal-line information and the like of a
key frame. A motion amount detection block 40 detects a motion
amount of an imaging position of the frame subject to detection
relative to an imaging position of the key frame on the basis of
the normal-line information of the key frame stored in the data
storage block 50 and the normal-line information of the frame
subject to detection generated by the normal-line information
generation block 30. Even if a positional difference of the same point between a taken image of the key frame and a taken image of a current frame is small, or a luminance difference occurs between them, the motion amount of the imaging position of the frame subject to detection relative to the imaging position of the key frame can be accurately detected on the basis of the normal-line information. Therefore, an observation position can be accurately detected.
Inventors: LU, YING (Tokyo, JP); HIRASAWA, YASUTAKA (Tokyo, JP); KONDO, YUHI (Kanagawa, JP); NAKATANI, AYAKA (Kanagawa, JP); KAINO, AKIHIKO (Kanagawa, JP)

Applicant: SONY CORPORATION, Tokyo, JP

Family ID: 57248184
Appl. No.: 15/569259
Filed: March 1, 2016
PCT Filed: March 1, 2016
PCT No.: PCT/JP2016/056193
371 Date: October 25, 2017
Current U.S. Class: 1/1
Current CPC Class: G06T 2207/10028 (20130101); G06T 2207/30244 (20130101); G06T 7/73 (20170101); G06T 7/20 (20130101); G06T 7/246 (20170101); G06T 7/579 (20170101)
International Class: G06T 7/246 (20060101); G06T 7/73 (20060101)

Foreign Application Data

Date: May 14, 2015 | Code: JP | Application Number: 2015-099021
Claims
1. An image processing apparatus comprising: a normal-line
information generation block configured to generate normal-line
information of a scene at an observation position; and a
self-localization estimation block configured to estimate the
observation position on the basis of the normal-line information
generated by the normal-line information generation block.
2. The image processing apparatus according to claim 1, wherein the
normal-line information generation block generates normal-line
information of a scene at a reference position different from the
observation position, and the self-localization estimation block
estimates the observation position on the basis of the normal-line
information of the scene at the observation position and the
normal-line information of the scene at the reference position
generated by the normal-line information generation block.
3. The image processing apparatus according to claim 2, wherein the
self-localization estimation block has a motion amount detection
block configured to detect an amount of motion from the reference
position to the observation position, and an absolute position
estimation block configured to estimate the observation position in
a world coordinate system on the basis of the motion amount and the
reference position in the world coordinate system.
4. The image processing apparatus according to claim 3, wherein the
normal-line information generation block generates normal-line
information of a scene corresponding to a frame subject to
detection imaged at the observation position and normal-line
information of a scene corresponding to a key frame imaged at a
reference position different from the observation position, and the
self-localization estimation block estimates the observation
position on the basis of the normal-line information of the scene
corresponding to the frame subject to detection and the normal-line
information of the scene corresponding to the key frame generated
by the normal-line information generation block.
5. The image processing apparatus according to claim 4, wherein the
motion amount detection block has a feature point detection block
configured to detect a feature point from an image of the frame
subject to detection, a feature point matching block configured to
execute matching processing between the feature point detected by
the feature point detection block and a feature point detected from
an image of the key frame so as to detect a pair of feature points
corresponding to each other between the key frame and the frame
subject to detection, a rotational amount detection block
configured to detect a rotational amount of an imaging position of
the frame subject to detection relative to an imaging position of
the key frame on the basis of normal-line information of each of
the feature points of the key frame and the frame subject to
detection that are detected by the feature point matching block,
and a travel amount detection block configured to detect a travel
amount of the imaging position of the frame subject to detection
relative to the imaging position of the key frame on the basis of
the rotational amount detected by the rotational amount detection
block, the feature point detected by the feature point matching
block, and a three-dimensional position of the feature point of the
key frame or a two-dimensional position on an image.
6. The image processing apparatus according to claim 4, wherein the
motion amount detection block has a feature point detection block
configured to detect a feature point from an image of the frame
subject to detection, a feature point matching block configured to
execute matching processing between the feature point detected by
the feature point detection block and a feature point detected from
an image of the key frame so as to detect a pair of feature points
corresponding to each other between the key frame and the frame
subject to detection, and a rotational/travel amount detection
block configured to detect a rotational and/or travel motion amount
of an imaging position of the frame subject to detection relative
to an imaging position of the key frame on the basis of normal-line
information of each of the feature points of the key frame and the
frame subject to detection that are detected by the feature point
matching block and a three-dimensional position of the detected
feature point of the key frame or a two-dimensional position on an
image.
7. The image processing apparatus according to claim 4, wherein the
motion amount detection block has a rotational/travel amount
detection block configured to detect a rotational and/or travel
motion amount of an imaging position of the frame subject to
detection relative to an imaging position of the key frame on the
basis of an image and normal-line information of the frame subject to detection, and an image, normal-line information, and a depth of the key frame.
8. The image processing apparatus according to claim 4, wherein the
motion amount detection block has a feature point detection block
configured to detect a feature point from a normal-line image based
on normal-line information of the frame subject to detection
generated by the normal-line information generation block, a
feature point matching block configured to execute matching
processing between the feature point detected by the feature point
detection block and a feature point detected from an image of the
key frame so as to detect a pair of feature points corresponding to
each other between the key frame and the frame subject to
detection, and a motion amount detection processing block
configured to detect a motion amount of an imaging position of the
frame subject to detection relative to an imaging position of the
key frame on the basis of normal-line information of each of the
feature points of the key frame and the frame subject to detection
that are detected by the feature point matching block.
9. The image processing apparatus according to claim 1, wherein the
normal-line information generation block generates the normal-line
information by use of a plurality of polarization images having
different polarization directions of the scene at the observation
position.
10. The image processing apparatus according to claim 9, wherein
the normal-line information of the scene at the observation
position generated by the normal-line information generation block
has indefiniteness, and the motion amount detection block resolves the indefiniteness of the normal-line information of the scene at the observation position so as to detect the motion amount by use of the normal-line information with the indefiniteness resolved.
11. The image processing apparatus according to claim 9, further
comprising: a polarization image acquisition block configured to
acquire the plurality of polarization images having different
polarization directions of the scene at the observation
position.
12. The image processing apparatus according to claim 2, further
comprising: a data storage block configured to store data including
at least the normal-line information of the scene at the reference
position, wherein the self-localization estimation block estimates
the observation position by use of the data of the scene at the
reference position stored in the data storage block.
13. The image processing apparatus according to claim 1, further
comprising: an environment mapping block configured to compute a
depth of the scene at the observation position on the basis of the
observation position estimated by the self-localization estimation
block so as to add a three-dimensional point group based on the
computed depth and the observation position to an environment
map.
14. The image processing apparatus according to claim 13, wherein
the environment map includes a three-dimensional position and
normal-line information of the three-dimensional point group.
15. An image processing method comprising: generating, by a
normal-line information generation block, normal-line information
of a scene at an observation position; and estimating, by a
self-localization estimation block, the observation position on the
basis of the normal-line information generated by the normal-line
information generation block.
16. A program for making a computer execute: a normal-line
information generation procedure for generating normal-line
information of a scene at an observation position; and a
self-localization estimation procedure for estimating the
observation position on the basis of the normal-line information
generated by the normal-line information generation procedure.
Description
TECHNICAL FIELD
[0001] The present technology relates to an image processing
apparatus, an image processing method, and a program and is
intended to detect observation positions accurately.
BACKGROUND ART
[0002] Simultaneous localization and mapping (SLAM), which is capable of simultaneously executing self-localization estimation and environment mapping, has been conventionally used in the fields of robotics and autonomous operation. In NPL 1 below, for example, self-localization estimation and environment mapping are executed on the basis of feature points. Further, in NPL 2, self-localization estimation and environment mapping are executed without the use of feature points.
CITATION LIST
Non Patent Literature
[NPL 1]
[0003] Georg Klein and David Murray, "Parallel Tracking and Mapping
for Small AR Workspaces," In Proc. International Symposium on Mixed
and Augmented Reality (ISMAR'07, Nara)
[NPL 2]
[0004] Richard A. Newcombe, Steven J. Lovegrove, and Andrew J. Davison, "DTAM: Dense Tracking and Mapping in Real-Time," IEEE International Conference on Computer Vision (ICCV 2011), pages 2320-2327.
SUMMARY
Technical Problems
[0005] Meanwhile, with the related-art SLAM, by use of an imaging
apparatus having a general image sensor and on the basis of
information about the luminance of a taken image and a pixel
position therein, a rotational amount and a travel amount of the
imaging apparatus are simultaneously estimated so as to create an
environment map. Therefore, if a positional difference at the same point between a taken image of a key frame and a taken image of a current frame is small, for example, it is difficult to determine the motion amount of an imaging position (an observation position), namely, the rotational amount and the travel amount of the imaging apparatus. In addition, since the related-art SLAM presupposes that the luminance of the same point viewed from different viewpoints remains unchanged, if a luminance difference occurs at the same point between a taken image of the key frame and a taken image of the current frame, an error is caused in the detection of a motion amount.
[0006] Therefore, it is an object of the present technology to
provide an image processing apparatus, an image processing method,
and a program that are capable of detecting observation positions
with enhanced accuracy.
Solution to Problems
[0007] In carrying out the present technology and according to a
first aspect thereof, there is provided an image processing
apparatus. This image processing apparatus has a normal-line
information generation block configured to generate normal-line
information of a scene at an observation position and a
self-localization estimation block configured to estimate the
observation position on the basis of the normal-line information
generated by the normal-line information generation block.
[0008] In this technology, two or more polarization images having
different polarization directions in a frame subject to detection,
for example, are acquired at an imaging position (an observation
position) by a polarization image acquisition block and normal-line
information is generated by the normal-line information generation
block on the basis of the acquired two or more polarization images.
In a motion amount detection block, a motion amount of an imaging
position of the frame subject to detection relative to an imaging
position of the key frame is detected on the basis of normal-line
information of the key frame imaged at an imaging position (a
reference position) different from the observation position stored
in a data storage block or the like and the normal-line information
of the frame subject to detection generated by the normal-line
information generation block. If the normal-line information generated by the normal-line information generation block has indefiniteness, the motion amount detection block resolves the indefiniteness and detects a motion amount by use of the normal-line information with the indefiniteness resolved.
[0009] For example, the motion amount detection block has a feature
point detection block configured to detect a feature point from an
image of the frame subject to detection, a feature point matching
block configured to execute matching processing between the feature
point detected by the feature point detection block and a feature
point detected from an image of the key frame so as to detect a
pair of feature points corresponding to each other between the key
frame and the frame subject to detection, a rotational amount
detection block configured to detect a rotational amount of an
imaging position of the frame subject to detection relative to an
imaging position of the key frame on the basis of normal-line
information of each of the feature points of the key frame and the
frame subject to detection that are detected by the feature point
matching block, and a travel amount detection block configured to
detect a travel amount of the imaging position of the frame subject
to detection relative to the imaging position of the key frame on
the basis of the rotational amount detected by the rotational
amount detection block, the feature point detected by the feature
point matching block, and a three-dimensional position of the
feature point of the key frame or a two-dimensional position on an
image.
[0010] Alternatively, the motion amount detection block has a
feature point detection block configured to detect a feature point
from an image of the frame subject to detection, a feature point
matching block configured to execute matching processing between
the feature point detected by the feature point detection block and
a feature point detected from an image of the key frame so as to
detect a pair of feature points corresponding to each other between
the key frame and the frame subject to detection, and a
rotational/travel amount detection block configured to detect a
rotational and/or travel motion amount of an imaging position of
the frame subject to detection relative to an imaging position of
the key frame on the basis of normal-line information of each of
the feature points of the key frame and the frame subject to
detection that are detected by the feature point matching block and
a three-dimensional position of the detected feature point of the
key frame or a two-dimensional position on an image.
[0011] Alternatively, the motion amount detection block has a
rotational/travel amount detection block configured to detect a
rotational and/or travel motion amount of an imaging position of
the frame subject to detection relative to an imaging position of
the key frame on the basis of an image and normal-line information
of the frame subject to detection and an image, normal-line
information and a depth of the key frame.
[0012] Alternatively, the motion amount detection block has a
feature point detection block configured to detect a feature point
from a normal-line image based on normal-line information of the
frame subject to detection generated by the normal-line information
generation block, a feature point matching block configured to
execute matching processing between the feature point detected by
the feature point detection block and a feature point detected from
an image of the key frame so as to detect a pair of feature points
corresponding to each other between the key frame and the frame
subject to detection, and a motion amount detection processing
block configured to compute a motion amount of an imaging position
of the frame subject to detection relative to an imaging position
of the key frame on the basis of normal-line information of each of
the feature points of the key frame and the frame subject to
detection that are detected by the feature point matching
block.
[0013] Further, the image processing apparatus further has an
environment mapping block configured to compute a depth of the
frame subject to detection on the basis of the motion amount
detection result from the motion amount detection block and the
depth of the key frame so as to add a three-dimensional point group
based on the computed depth and the motion amount detection result
to an environment map including a three-dimensional position and
normal-line information of a three-dimensional point group, and a
data storage block configured to store data of the key frame and
the environment map.
[0014] In carrying out the present technology and according to a
second aspect thereof, there is provided an image processing
method. This image processing method includes generating, by a
normal-line information generation block, normal-line information
of a scene at an observation position, and estimating, by a
self-localization estimation block, the observation position on the
basis of the normal-line information generated by the normal-line
information generation block.
[0015] In carrying out the present technology and according to a
third aspect thereof, there is provided a program for making a
computer execute a normal-line information generation procedure for
generating normal-line information of a scene at an observation
position, and a self-localization estimation procedure for
estimating the observation position on the basis of the normal-line
information generated by the normal-line information generation
procedure.
[0016] It should be noted that the program according to the present technology can be provided, in a computer-readable format, to a general-purpose computer capable of executing various program codes, through storage media such as an optical disc, a magnetic disc, or a semiconductor memory, or through communication media such as a network. Providing the program in a computer-readable format realizes the processing in accordance with the program on the computer.
Advantageous Effects of Invention
[0017] According to the present technology, normal-line information
of a scene at an observation position is generated and, on the
basis of the generated normal-line information, the observation
position is estimated. For example, on the basis of normal-line
information of a scene corresponding to a frame subject to
detection imaged at the observation position and normal-line
information of a scene corresponding to a key frame imaged at a
reference position that differs from the observation position, a
motion amount of an imaging position of the frame subject to
detection relative to an imaging position of the key frame is
accurately detected so as to estimate the observation position.
That is, the observation position can be accurately detected. It should be noted that the effects described herein are illustrative only and not restrictive. There may be additional effects other than those described herein.
BRIEF DESCRIPTION OF DRAWINGS
[0018] FIG. 1 is a diagram indicative of a configuration of an
image processing apparatus.
[0019] FIG. 2 is a diagram indicative of a configuration of a first
embodiment.
[0020] FIG. 3 depicts diagrams describing a polarization image
acquired by a polarization image acquisition block.
[0021] FIG. 4 is a diagram describing the shape of an object and a
polarization image thereof.
[0022] FIG. 5 is a diagram illustrating a relation between
luminance and a polarization angle.
[0023] FIG. 6 is a diagram indicative of a relation between the
degree of polarization and a zenith angle.
[0024] FIG. 7 is a diagram describing indefiniteness of 180
degrees.
[0025] FIG. 8 is a diagram describing a difference in normal-line
information when observing a same object from different
viewpoints.
[0026] FIG. 9 is a diagram describing computation of the depth of a
current frame.
[0027] FIG. 10 is a diagram describing generation processing of a
high-density depth map.
[0028] FIG. 11 is a flowchart indicative of an operation in the
first embodiment.
[0029] FIG. 12 is a flowchart indicative of update processing that
is executed in an environment mapping block.
[0030] FIG. 13 is a diagram indicative of a configuration of a
first example in the first embodiment.
[0031] FIG. 14 is a diagram indicative of a configuration of a
rotational amount detection block.
[0032] FIG. 15 is a diagram indicative of an example of an
indefiniteness resolution method.
[0033] FIG. 16 is a diagram describing detection of a travel
amount.
[0034] FIG. 17 is a flowchart indicative of an operation of a
motion amount detection block.
[0035] FIG. 18 is a flowchart indicative of a rotational amount
detection operation.
[0036] FIG. 19 is a diagram indicative of a configuration of a
second example in the first embodiment.
[0037] FIG. 20 is a diagram illustrating a case in which there are
two points at which an object looks the same.
[0038] FIG. 21 is a flowchart indicative of an operation of a
motion amount detection block.
[0039] FIG. 22 is a flowchart indicative of a detection operation
of a motion amount.
[0040] FIG. 23 is a diagram indicative of a configuration of a
third example in the first embodiment.
[0041] FIG. 24 is a flowchart indicative of an operation of a
motion amount detection block.
[0042] FIG. 25 is a diagram indicative of a configuration of a
second embodiment.
DESCRIPTION OF EMBODIMENTS
[0043] The following describes embodiments of the present technology. It should be noted that the description is given in the following sequence.
1. Configuration of Image Processing Apparatus
2. First Embodiment
3. First Example in First Embodiment
4. Second Example in First Embodiment
5. Third Example in First Embodiment
6. Second Embodiment
7. Other Embodiments
1. Configuration of Image Processing Apparatus
[0044] FIG. 1 depicts a configuration of an image processing
apparatus. The image processing apparatus 10 has a normal-line
information generation block 30 and a motion amount detection block
40. The normal-line information generation block 30 generates
normal-line information of a scene at an observation position, for
example, a scene corresponding to a frame subject to detection
(hereafter simply referred to as the "frame subject to detection"). The motion amount detection block 40 detects a motion amount of an imaging position of the frame subject to detection relative to an imaging position of a key frame on the basis of the normal-line information of a scene at a reference position different from the observation position, for example, a scene corresponding to a key frame imaged at such a reference position (hereafter simply referred to as the "key frame"), stored in a data storage block 50 or the like, together with the normal-line information of the frame subject to detection generated by the normal-line information generation block 30. That is, the motion amount
detection block 40 detects an amount of change in the position or
attitude of an imaging block that generates an image of the frame
subject to detection from an imaging position of the imaging block
that generates a key frame image. In addition, by detecting a
motion amount of the imaging position of the frame subject to
detection relative to the imaging position of the key frame, the
position and attitude at the time of the generation of an image of
the frame subject to detection with reference to the position and
attitude at the time of the generation of the key frame image can
be detected. Thus, the image processing apparatus detects an
observation position at the time of the generation of a frame image
on the basis of the normal-line information generated by the
normal-line information generation block.
[0045] In the first embodiment, the following describes a case in which normal-line information is generated and a motion amount is detected by use of polarization images. Then, in the second embodiment, a case is described in which a motion amount is detected on the basis of normal-line information without the use of polarization images.
2. First Embodiment
[0046] FIG. 2 depicts a configuration of the first embodiment. The
image processing apparatus 10 has the normal-line information
generation block 30 and the motion amount detection block 40. In
addition, the image processing apparatus 10 may have a polarization
image acquisition block 20, a data storage block 50, and an
environment mapping block 60. The following describes the case in
which the polarization image acquisition block 20, the data storage
block 50, and the environment mapping block 60 are also
arranged.
[0047] The polarization image acquisition block 20 acquires a
polarization image of a frame subject to detection. The
polarization image acquisition block 20 acquires two or more
polarization images having different polarization directions;
polarization images having three or more polarization directions,
for example. The polarization image acquisition block 20 may have a
configuration in which an imaging block that generates polarization
images having three or more polarization directions is arranged or
a configuration in which polarization images having three or more
polarization directions are acquired from an external device, a
recording medium or the like. The polarization image acquisition
block 20 uses a current frame polarization image generated by an
imaging block as a polarization image of the frame subject to
detection, for example. In what follows, a polarization image of
the frame subject to detection is described as a current frame
polarization image.
[0048] FIG. 3 depicts a diagram describing a polarization image
that is acquired by the polarization image acquisition block. As
depicted in FIG. 3(a), for example, a polarization image is
generated by rotating a polarization plate PL arranged in front of
an imaging block CM so as to image an object in each of three or
more polarization directions. In addition, as depicted in FIG.
3(b), a polarization image may be generated by imaging an object
after arranging a polarization filter 111 on an image sensor 110,
the polarization filter 111 having a pixel configuration of three
or more polarization directions. It should be noted that FIG. 3(b) illustrates, for example, a case in which the polarization filter 111, in which each pixel has any one of four different polarization directions (indicated by arrow marks), is arranged on the front face of the image sensor 110.
In addition, as depicted in FIG. 3(c), a polarization image may be
generated by sequentially imaging an object by imaging blocks CM1
through CM4 from a same position after arranging polarization
plates PL1 through PL4 having different polarization directions in
front of the imaging blocks CM1 through CM4. Further, as depicted
in FIG. 3(d), two or more polarization images having different
polarization directions may be generated by use of a multi-lens
array configuration. For example, by arranging two or more lenses
112 (four in the diagram) in front of the image sensor 110, an
optical image of an object is focused on the imaging surface of the
image sensor 110 through each lens 112. In addition, by arranging
polarization plates 113 in front of the respective lenses 112, the
polarization directions of the polarization plates 113 are made
different from each other. This configuration can generate
polarization images having different polarization directions by use
of the image sensor 110.
[0049] Further, if no color filter is used by the imaging block,
then the polarization image acquisition block 20 can acquire a
luminance polarization image. In the cases of FIG. 3(a) and FIG.
3(c), an image equivalent to an ordinary luminance image that is
not polarized can be acquired by averaging the luminance levels of
luminance polarization images having different polarization
directions for each pixel. Still further, in the case of FIG. 3(b),
an image equivalent to an ordinary luminance image having no
polarization can be acquired by averaging the luminance levels of
the four pixels adjacent to each other with different polarization
directions. Yet further, in the case of FIG. 3(d), if the distance between the lenses 112 is short enough, relative to the distance to the object, to be ignored, the parallax between the two or more polarization images having different polarization directions may be ignored.
having different polarization directions allows the acquisition of
an image equivalent to an ordinary luminance image having no
polarization. If the parallax cannot be ignored, an image
equivalent to an ordinary luminance image having no polarization
can be acquired by aligning the polarization images having
different polarization directions in accordance with an amount of
parallax and averaging the luminance levels of the aligned
polarization images. Further, the polarization image acquisition
block 20 may generate not only luminance polarization images but
also three-primary-color images at the same time by arranging color
filters in the imaging block or generate infrared images or the
like at the same time.
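As an illustration of the averaging just described, the following minimal sketch (Python with NumPy; the layout of four polarization channels at 0, 45, 90, and 135 degrees is an assumption matching FIG. 3(b), and the names are hypothetical) recovers an image equivalent to an ordinary non-polarized luminance image. Averaging works because, for polarization angles spaced 45 degrees apart over a 180-degree cycle, the polarization-dependent cosine term cancels, leaving the mean luminance.

    import numpy as np

    def unpolarized_luminance(pol_stack):
        """Average co-located polarization samples; for channels at
        0, 45, 90, and 135 degrees the polarization-dependent term
        cancels, leaving the mean luminance at each pixel.

        pol_stack: float array of shape (4, H, W) holding the four
        polarization images (hypothetical layout, for illustration).
        """
        return pol_stack.mean(axis=0)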
[0050] The normal-line information generation block 30 generates
normal-line information from two or more polarization images having
different polarization directions acquired by the polarization
image acquisition block 20. The following describes the shape of an
object and a polarization image thereof with reference to FIG. 4.
As depicted in FIG. 4, for example, an object OB is illuminated by
use of a light source LT and an imaging block CM images the object
OB through a polarization plate PL. In this case, with a taken
image, the luminance of the object OB varies in accordance with
the polarization direction of the polarization plate PL. It should be noted that, for ease of description, it is assumed that two or more polarization images are acquired by rotating the polarization plate PL during imaging, the highest luminance being Imax and the lowest luminance being Imin, for example. It is also assumed that, with the x-axis and the y-axis of a two-dimensional coordinate system lying on the plane of the polarization plate PL, the angle measured in the y-axis direction relative to the x-axis as the polarization plate PL is rotated is the polarization angle ν.
[0051] The polarization plate PL returns to its original polarization state when rotated by 180 degrees, so the luminance variation has a cycle of 180 degrees. Further, let the polarization angle ν at which the maximum luminance Imax is observed be the azimuth angle α. Under the definitions given above, the luminance I observed when the polarization plate PL is rotated is expressed by equation (1) below. It should be noted that FIG. 5 depicts an example of the relation between luminance and polarization angle. Further, this example depicts a model of diffuse reflection; in the case of specular reflection, the azimuth angle is offset by 90 degrees relative to the polarization angle.
[Math. 1]

I = \frac{I_{max} + I_{min}}{2} + \frac{I_{max} - I_{min}}{2} \cos(2\nu - 2\alpha)   (1)
[0052] In equation (1), the polarization angle ν is known at the time of the generation of a polarization image, while the maximum luminance Imax, the minimum luminance Imin, and the azimuth angle α are variables. Therefore, by fitting the luminance of polarization images having three or more polarization directions to the model expression indicated in equation (1), the azimuth angle α, namely the polarization angle providing the maximum luminance, can be determined on the basis of the model expression indicative of the relation between luminance and polarization angle.
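Because equation (1) can be rewritten as I = a + b cos 2ν + c sin 2ν, with Imax = a + √(b² + c²), Imin = a − √(b² + c²), and α = ½ atan2(c, b), the fitting reduces to linear least squares. A minimal sketch (NumPy; the function and variable names are assumptions, not from the patent):

    import numpy as np

    def fit_polarization_model(nu, intensity):
        """Fit I = a + b*cos(2*nu) + c*sin(2*nu) per pixel by linear
        least squares, given three or more polarization angles.

        nu:        (K,) polarization angles in radians
        intensity: (K, N) luminance samples for N pixels
        Returns (alpha, i_max, i_min), each of shape (N,).
        """
        A = np.stack([np.ones_like(nu), np.cos(2 * nu), np.sin(2 * nu)],
                     axis=1)
        coef, *_ = np.linalg.lstsq(A, intensity, rcond=None)  # (3, N)
        a, b, c = coef
        amp = np.hypot(b, c)            # (Imax - Imin) / 2
        alpha = 0.5 * np.arctan2(c, b)  # azimuth, defined modulo 180 deg
        return alpha, a + amp, a - amp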
[0053] It is also assumed that the normal line of the surface of an object is expressed in a polar coordinate system by the azimuth angle α and the zenith angle θ. It should be noted that the zenith angle θ is the angle measured from the z-axis toward the normal line and the azimuth angle α is the angle in the y-axis direction relative to the x-axis as described above. Here, if the minimum luminance Imin and the maximum luminance Imax are obtained when the polarization plate PL is rotated, then the polarization degree ρ can be computed on the basis of equation (2) below.
[Math. 2]

\rho = \frac{I_{max} - I_{min}}{I_{max} + I_{min}}   (2)
[0054] The relation between the polarization degree and the zenith angle is known, from the Fresnel equations in the case of diffuse reflection, to have a characteristic such as that depicted in FIG. 6. Therefore, from the characteristic depicted in FIG. 6, the zenith angle θ can be determined on the basis of the polarization degree ρ. It should be noted that the characteristic depicted in FIG. 6 is illustrative only and varies in accordance with the refractive index and the like of the object. For example, as the refractive index increases, the polarization degree increases.
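The patent relies on the FIG. 6 characteristic rather than on a closed-form expression. For illustration, the widely used Fresnel-based diffuse-reflection model of the polarization degree can be inverted numerically with a lookup table; the formula below is that standard model, assumed here, and is not taken from the patent text:

    import numpy as np

    def zenith_from_degree(rho, n=1.5):
        """Invert the diffuse-reflection polarization-degree curve
        rho(theta) to obtain the zenith angle theta (radians).

        rho: array of polarization degrees in [0, 1)
        n:   assumed refractive index of the object surface
        """
        theta = np.linspace(0.0, np.deg2rad(89.0), 1024)
        s2 = np.sin(theta) ** 2
        curve = ((n - 1 / n) ** 2 * s2) / (
            2 + 2 * n ** 2 - (n + 1 / n) ** 2 * s2
            + 4 * np.cos(theta) * np.sqrt(n ** 2 - s2)
        )
        # the curve rises monotonically, so interpolation inverts it
        return np.interp(rho, curve, theta)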
[0055] The normal-line information generation block 30 computes the azimuth angle α and the zenith angle θ as described above and generates normal-line information indicative of the computed azimuth angle α and zenith angle θ. It should be noted that the normal-line information indicative of the azimuth angle α and the zenith angle θ provides a normal-line angle.
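Taken together, a normal-line angle (α, θ) corresponds to a unit normal vector in the camera coordinate system, which the later processing blocks operate on. A minimal conversion sketch (the axis convention is an assumption matching the zenith definition above):

    import numpy as np

    def normal_from_angles(alpha, theta):
        """Convert azimuth/zenith normal-line angles (radians) to
        unit normals of shape (..., 3) in camera coordinates."""
        return np.stack([np.cos(alpha) * np.sin(theta),
                         np.sin(alpha) * np.sin(theta),
                         np.cos(theta)], axis=-1)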
[0056] The normal-line information generated by the normal-line
information generation block 30 as described above has
indefiniteness of 180 degrees. FIG. 7 depicts a diagram describing
the indefiniteness of 180 degrees. In the case where an object OB depicted in FIG. 7(a) is imaged by an imaging block CM so as to compute normal lines, the luminance variation in accordance with the rotation of the polarization direction has a cycle of 180 degrees. Therefore, as depicted in FIG. 7(a), in a region GA that is the upper half of the object OB, the normal-line direction (indicated by arrow marks) is the correct direction, whereas in a region GB that is the lower half, the normal-line direction may be reversed.
[0057] The motion amount detection block 40 uses the phenomenon that the normal-line direction of the same object observed from different viewpoints depends only on the variation in the rotational amount of the imaging block, so as to detect, from the variation in the normal-line direction, a rotational amount of the imaging block, namely, a rotational amount of an imaging position of a current frame relative to a key frame. To be more specific, on
the basis of the normal-line information generated from the
acquired current frame polarization image and the data stored in
the data storage block 50, the motion amount detection block 40
detects a rotational amount of an imaging position of the current
frame polarization image relative to an imaging position of a key
frame polarization image. It should be noted that the rotation of
the imaging block denotes, if an optical axis is the z-axis, a
motion around a rotation axis that is at least one of the z-axis,
the x-axis orthogonal to the z-axis, and the y-axis orthogonal to
the z-axis and the x-axis. Further, a motion in at least one of the
axis directions of the x-axis, the y-axis, and the z-axis is a
travel of the imaging block.
[0058] FIG. 8 depicts a diagram describing the difference in the normal-line information obtained when the same object is observed from different viewpoints. The normal line of an object OB obtained in a key frame v is expressed in the camera coordinate system of the key-frame viewpoint, namely, a coordinate system in which the optical axis of the viewpoint V of the key frame is the Z^v-axis, the rightward direction of the image surface perpendicular to that optical axis is the X^v-axis, and the upward direction perpendicular to the X^v-axis in the image surface is the Y^v-axis. Likewise, the normal line of the object OB obtained in a current frame l is expressed in the camera coordinate system of the current-frame viewpoint, namely, a coordinate system in which the optical axis of the viewpoint of the current frame is the Z^l-axis, the rightward direction of the image surface perpendicular to that optical axis is the X^l-axis, and the upward direction perpendicular to the X^l-axis in the image surface is the Y^l-axis. That is, the normal-line direction of the observed object OB is a direction in the camera coordinate system of the observing viewpoint. For example, let the viewpoint variations of the current frame l relative to the key frame v be the rotational amount R_lv and the travel amount T_lv. Then, if the normal line at one point of the object is N^v in the coordinate system of the key frame, the normal line N^l at the same point of the object in the coordinate system of the current frame can be computed on the basis of equation (3) below.
[Math. 3]

N^l = R_{lv}^{-1} N^v   (3)
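Equation (3) shows that matched normals constrain the rotational amount independently of the travel amount T_lv. As an illustrative assumption (the patent does not prescribe a particular solver), the rotational amount can be recovered from matched normal pairs with the standard SVD-based (Kabsch) alignment:

    import numpy as np

    def rotation_from_normals(n_key, n_cur):
        """Estimate R_lv such that n_key ~= R_lv @ n_cur, i.e.
        equation (3) rearranged as N^v = R_lv N^l, by least squares.

        n_key, n_cur: (K, 3) arrays of corresponding unit normals
        from the key frame and the current frame.
        """
        h = n_cur.T @ n_key                      # 3x3 correlation matrix
        u, _, vt = np.linalg.svd(h)
        d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against reflections
        return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

Note that, because normals carry no positional information, this recovers only the rotational amount; the travel amount must be detected separately, as the travel amount detection block described later does.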
[0059] Further, the normal-line information generated from a
polarization image has indefiniteness of 180 degrees as described
above. Hence, in the motion amount detection block 40, the
indefiniteness of the normal-line information is resolved. If a
point of an object corresponding to the key frame is in the current
frame, for example, the motion amount detection block 40 resolves
the indefiniteness of the normal line of the point in the object of
the current frame by use of the normal line of the point in the
object of the key frame. It should be noted that, if, at the start
of SLAM, the normal-line information of the key frame has
indefiniteness and the depth of the key frame is not known, the
indefiniteness of the normal-line information is resolved by use of
a statistical method or the like that will be described later.
[0060] On the basis of a result of motion amount detection obtained
by the motion amount detection block 40, the environment mapping
block 60 generates a depth corresponding to the current frame
polarization image. In addition, if the depth of the key frame has not been acquired, the environment mapping block 60 obtains the depth of the key frame and the depth of the current frame by a method such as stereo matching, using the result of the motion amount detection by the motion amount detection block 40 and the images of the current frame and the key frame.
[0061] If the depth of the key frame has been acquired, the depth of the current frame can be obtained by re-projecting the three-dimensional position corresponding to the depth of the key frame into the current frame on the basis of the imaging position of the current frame polarization image relative to the imaging position of the key frame polarization image. Equation (4) is the computation expression for the depth of the current frame. FIG. 9 depicts a diagram describing the computation of the depth of the current frame. In equation (4), D_v(u) denotes the depth at a pixel position u (a pixel position indicative of an object position OBu) in the key frame and D_l(u') denotes the depth at the corresponding pixel position u' (a pixel position indicative of the object position OBu) in the current frame. [X_l(u') Y_l(u') D_l(u')] denotes the three-dimensional coordinates of the pixel position u' in the camera coordinate system of the current frame. π denotes the function that projects a three-dimensional position (x, y, z) to (x/z, y/z). π^{-1}(u, D_v(u)) denotes the function that back-projects the pixel position u into the camera space given that the pixel position u has depth D_v(u), and K denotes the internal parameter matrix of the camera, carrying information such as the focal distance. G_lv(ψ) denotes the motion amount (rotational amount R_lv and travel amount T_lv) from the key frame to the current frame, as indicated in equation (5).
[Math. 4]

[X_l(u')\; Y_l(u')\; D_l(u')]^T = K\, G_{lv}(\psi)\, \pi^{-1}(u, D_v(u))   (4)

G_{lv}(\psi) = G(R_{lv}, T_{lv}) = \begin{bmatrix} R_{11} & R_{12} & R_{13} & T_1 \\ R_{21} & R_{22} & R_{23} & T_2 \\ R_{31} & R_{32} & R_{33} & T_3 \\ 0 & 0 & 0 & 1 \end{bmatrix}   (5)
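As a concrete reading of equations (4) and (5), the following sketch back-projects a key-frame pixel with a known depth, moves it by G_lv(ψ), and projects it into the current frame (the 3x3 intrinsic matrix K and the 4x4 matrix G_lv are assumed inputs; names are hypothetical):

    import numpy as np

    def reproject_depth(u, depth_key, K, G_lv):
        """Re-project key-frame pixel u with depth D_v(u) into the
        current frame, following equations (4)-(5).

        u:         (2,) pixel position in the key frame
        depth_key: scalar depth D_v(u)
        K:         (3, 3) camera internal parameter matrix
        G_lv:      (4, 4) motion matrix [R_lv | T_lv; 0 1]
        Returns (u_cur, depth_cur): pixel u' and depth D_l(u').
        """
        # pi^{-1}(u, D_v(u)): back-project into key-frame camera space
        p_key = depth_key * (np.linalg.inv(K) @ np.array([u[0], u[1], 1.0]))
        # move the point into the current-frame camera coordinate system
        p_cur = (G_lv @ np.append(p_key, 1.0))[:3]
        # pi: perspective projection back onto the image plane
        xyz = K @ p_cur
        return xyz[:2] / xyz[2], p_cur[2]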
[0062] Meanwhile, in the case where depths are computed by a method such as stereo matching, pixels for which no depth has been obtained may occur in an image region having no texture, for example; if this happens, the depth map indicative of the relation between pixel position and depth becomes a low-density map. Therefore, on the basis of the depth obtained by the processing described above and the normal-line information with the indefiniteness resolved, obtained by the motion amount detection block 40, the environment mapping block 60 computes a depth for each pixel so as to generate a high-density depth map.
[0063] FIG. 10 is a diagram describing the processing of generating a high-density depth map. It should be noted that, for ease of description, the processing of one line, for example, will be described. Assume that, from a polarization image of the object OB generated by the imaging block CM as depicted in FIG. 10(a), the depth depicted in FIG. 10(b) is obtained by the environment mapping block 60 and the normal-line information with the indefiniteness resolved by the motion amount detection block 40 is obtained as depicted in FIG. 10(c). Also assume that the depth obtained from the polarization image is of low density; for example, the depth value for the left-end pixel is "2 (meters)" and no depth value has been obtained for the other pixels, indicated by ×. On the basis of the normal-line information, the environment mapping block 60 estimates the surface shape of the object OB. Here, the second pixel from the left end can be determined, on the basis of its normal-line information, to be equivalent to a slope surface approaching the imaging block from the object surface corresponding to the left-end pixel. Therefore, the environment mapping block 60 follows the surface shape of the object OB starting from the left-end pixel as the origin so as to estimate the depth value of the second pixel from the left end, obtaining the value "1.5 (meters)," for example. Further, the third pixel from the left end can be determined, on the basis of its normal-line direction, to be equivalent to a surface facing the imaging block. Therefore, the environment mapping block 60 likewise follows the surface shape from the left-end pixel so as to estimate the depth value of the third pixel from the left end, obtaining the value "1 (meter)," for example. The fourth pixel from the left end can be determined to be equivalent to a slope surface receding from the imaging block from the object surface corresponding to the third pixel; following the surface shape in the same manner yields the value "1.5 (meters)," for example. In addition, the environment mapping block 60 estimates the depth value of the fifth pixel from the left end in the same manner, obtaining "2 (meters)," for example. Thus, the environment mapping block 60 generates the high-density depth depicted in FIG. 10(d) from the low-density depth depicted in FIG. 10(b) by use of the normal-line information with the indefiniteness resolved generated by the motion amount detection block 40.
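The scanline walk described above can be sketched as integrating, pixel by pixel, the slope implied by each normal, starting from a pixel whose depth is known. The one-increment-per-pixel orthographic step below is a simplification for illustration, not the patent's exact rule:

    import numpy as np

    def densify_scanline(depth, normals):
        """Fill missing depths (NaN) along one image row by following
        the surface slope implied by the normal at each pixel.

        depth:   (W,) sparse depths with NaN where no depth was obtained
        normals: (W, 3) unit normals (nx, ny, nz) in camera coordinates

        Uses the orthographic approximation dZ/dx = -nx / nz, i.e. one
        depth increment per pixel step along the row; nz is assumed to
        be bounded away from zero for visible surfaces.
        """
        out = depth.copy()
        for i in range(1, len(out)):
            if np.isnan(out[i]):
                nx, _, nz = normals[i]
                out[i] = out[i - 1] - nx / nz  # follow the slope leftward
        return out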
[0064] Using the generated depth, the environment mapping block 60 updates the environment map stored in the data storage block 50. The environment map is information in which three-dimensional point groups are stored in a database, indicating the coordinate positions (three-dimensional positions) of the points in the world coordinate system together with their normal-line information. On the basis of the generated high-density depth of the current frame and the result of the motion amount detection obtained by the motion amount detection block 40, the environment mapping block 60 transforms each point for which a depth has been obtained from the camera coordinate system into the world coordinate system. That is, the environment mapping block 60 estimates the observation position in the world coordinate system. Further, the environment mapping block 60 adds the coordinate position of each point in the world coordinate system and the normal-line information to the environment map. It should be noted that the world coordinate system is assumed to be, for example, the camera coordinate system of the key frame at the time of the start of SLAM. Therefore, by sequentially acquiring polarization images with the polarization image acquisition block 20 and executing the processing mentioned above with the normal-line information generation block 30 and the motion amount detection block 40, the information indicative of the surface shapes of the objects included in the polarization images is sequentially stored in the environment map. It should be noted that the environment map may include not only the three-dimensional position and normal-line information of an object but also color information and the like of each point indicated by a three-dimensional position.
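The coordinate transform described above amounts to applying the accumulated camera-to-world motion to each point and rotating its normal accordingly. A minimal sketch, assuming G_wc is the 4x4 camera-to-world matrix of the current frame derived from the detected motion amounts (all names are hypothetical):

    import numpy as np

    def add_to_environment_map(env_map, points_cam, normals_cam, G_wc):
        """Append a new point group to the environment map.

        env_map:     list of (position_world, normal_world) tuples
        points_cam:  (N, 3) points in camera coordinates
        normals_cam: (N, 3) unit normals in camera coordinates
        G_wc:        (4, 4) camera-to-world transform of the frame
        """
        R, t = G_wc[:3, :3], G_wc[:3, 3]
        pts_world = points_cam @ R.T + t   # rigid transform of positions
        nrm_world = normals_cam @ R.T      # normals rotate only
        env_map.extend(zip(pts_world, nrm_world))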
[0065] In addition, the environment mapping block 60 updates the
key frame polarization image stored in the data storage block 50 by
use of the current frame polarization image, thereby providing the
current frame polarization image as a new key frame polarization
image.
[0066] FIG. 11 is a flowchart indicative of an operation of the
first embodiment. In step ST1, the image processing apparatus
acquires a current frame polarization image. The polarization image
acquisition block 20 of the image processing apparatus 10 acquires
two or more polarization images having different polarization
directions in the current frame and goes to step ST2.
[0067] In step ST2, the image processing apparatus generates
normal-line information. By use of the luminance of two or more
polarization images having different polarization directions, the
normal-line information generation block 30 of the image processing
apparatus 10 executes fitting and the like on a model expression so
as to generate the normal-line information of the current frame,
going to step ST3.
[0068] In step ST3, the image processing apparatus acquires data of
the key frame. The motion amount detection block 40 of the image
processing apparatus 10 acquires the data of the key frame stored
in the data storage block 50 and then goes to step ST4.
[0069] In step ST4, the image processing apparatus detects a motion amount. On the basis of the normal-line information generated in step ST2 and the data of the key frame acquired in step ST3, the motion amount detection block 40 of the image processing apparatus 10 detects the motion amount of the imaging position of the current frame polarization image relative to the imaging position of the key frame polarization image and then goes to step ST5.
[0070] In step ST5, the image processing apparatus executes update
processing. The environment mapping block 60 of the image
processing apparatus 10 updates the environment map and the key
frame.
[0071] FIG. 12 illustrates the update processing executed by the
environment mapping block. In step ST11, the environment mapping
block 60 detects a depth. The environment mapping block 60 detects
the depth of the current frame and then goes to step ST12. It
should be noted that, if the depth of the key frame has not been
detected, the environment mapping block 60 also detects the depth
of the key frame.
[0072] In step ST12, the environment mapping block 60 adds a new
point group. The environment mapping block 60 adds a new point
group with the depth detected in step ST11 to the environment map
and then goes to step ST13.
[0073] In step ST13, the environment mapping block 60 executes
storage processing. The environment mapping block 60 stores the
environment map with the new point group added in step ST12 into
the data storage block 50, thereby updating the environment map
stored in the data storage block 50.
[0074] In addition, the environment mapping block 60 updates the
key frame polarization image stored in the data storage block 50 by
use of the current frame polarization image.
[0075] According to the first embodiment described above, a motion amount of the imaging position of the current frame relative to the imaging position of the key frame can be detected by use of the normal-line information. Further, if a motion amount is detected on the basis of the luminance of a taken image and the information of a pixel position in the taken image, as with the related-art SLAM, it is difficult to detect the motion amount when the positional difference of the same point between the taken image of the key frame and the taken image of the current frame is small. Also, the related-art SLAM presupposes that the luminance of the same point viewed from different viewpoints does not change, so that an error is caused in the detection result if the luminance changes.
However, with the first embodiment, a motion amount is detected by
use of normal-line information, so that, even with a small
positional difference of the same point, a motion amount can be
accurately detected if the normal-line direction changes. In
addition, with the first embodiment, a motion amount is detected by
use of normal-line information, so that, even if the luminance of
the same point varies due to motion, a motion amount can be
accurately detected.
[0076] Further, with the related-art SLAM, a high-density environment map is generated by smoothing the depth by use of the information about luminance variation. However, smoothing the depth with the information about luminance variation may eliminate minute shape variations, thereby causing depth distortion. By contrast, since the normal-line information includes minute shape variations, the first embodiment allows more accurate detection of environment shapes, thereby creating an environment map close to the true values.
3. First Example in First Embodiment
[0077] In the first example in the first embodiment, a case is
described in which, on the basis of the normal-line information and
the like of the respective feature points corresponding to each
other between a key frame and a current frame, a rotational amount
and a travel amount of an imaging position of the current frame
relative to an imaging position of the key frame are sequentially
detected.
[0078] FIG. 13 depicts a configuration of the first example in the
first embodiment. An image processing apparatus of the first
example has a polarization image acquisition block 20, a
normal-line information generation block 30, a motion amount
detection block 41, a data storage block 50, and an environment
mapping block 60.
[0079] The polarization image acquisition block 20 acquires a
current frame polarization image. The polarization image
acquisition block 20 acquires two or more polarization images
having different polarization directions; polarization images
having three or more polarization directions, for example. The
normal-line information generation block 30 generates normal-line
information from the two or more polarization images having
different polarization directions acquired by the polarization
image acquisition block 20. The data storage block 50 stores data
of the key frame, an environment map, and so on. The environment
mapping block 60 generates a depth corresponding to the current
frame polarization image and executes processing of updating the
environment map by use of the generated depth, for example.
[0080] On the basis of a variation in the normal-line direction of
the same object imaged from different viewpoints, the motion amount
detection block 41 sequentially detects a rotational amount and a
travel amount from an imaging position at which an image of the key
frame has been generated to an imaging position at which an image
of the current frame has been generated. The motion amount
detection block 41 has a feature point detection block 401, a
feature point matching block 402, a rotational amount detection
block 403, and a travel amount detection block 404.
[0081] The feature point detection block 401 detects a feature
point from a polarization image acquired by the polarization image
acquisition block 20. By use of the polarization images of the current frame acquired by the polarization image acquisition block 20, the feature point detection block 401 averages the pixel values of the two or more polarization images at each pixel position, or performs other processing, so as to generate a non-polarization image equivalent to an image taken without a polarization plate, a polarization filter, or the like. In addition, the feature point detection block 401
detects a feature point of a predetermined type from the
non-polarization image by use of such a method as Scale-Invariant
Feature Transform (SIFT), Speeded-Up Robust Features (SURF), or
Features from Accelerated Segment Test (FAST).
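For instance, with OpenCV (the averaging step reuses the idea from the polarization image acquisition block; apart from the OpenCV calls, the names are hypothetical), the feature point detection block could be sketched as follows:

    import cv2
    import numpy as np

    def detect_features(pol_stack):
        """Detect FAST feature points on the non-polarization image
        formed by averaging the polarization images.

        pol_stack: (4, H, W) uint8 polarization images
        Returns the averaged image and its FAST keypoints.
        """
        gray = pol_stack.mean(axis=0).astype(np.uint8)
        detector = cv2.FastFeatureDetector_create()
        return gray, detector.detect(gray)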
[0082] The feature point matching block 402 executes matching
processing on the feature point of the key frame stored in the data
storage block 50 and the feature point detected by the feature
point detection block 401. The feature point matching block 402
executes feature point matching processing by use of such a method
as sum of absolute differences (SAD) or normalized cross
correlation (NCC), for example, thereby detecting a pair of feature
points corresponding to each other between the key frame and the
current frame.
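A patch-based NCC matcher in the spirit of this block might look like the following sketch (pure NumPy; the patch size, threshold, and exhaustive search are illustrative choices, not the patent's):

    import numpy as np

    def ncc(a, b):
        """Normalized cross correlation of two equally sized patches."""
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
        return (a * b).sum() / denom

    def match_features(img_key, pts_key, img_cur, pts_cur,
                       half=8, thresh=0.8):
        """Pair each key-frame feature with the current-frame feature
        whose surrounding patch maximizes NCC; returns index pairs."""
        def patch(img, p):
            y, x = int(p[1]), int(p[0])
            return img[y - half:y + half + 1,
                       x - half:x + half + 1].astype(float)

        pairs = []
        for i, pk in enumerate(pts_key):
            ref = patch(img_key, pk)
            best_j, best_s = -1, thresh
            for j, pc in enumerate(pts_cur):
                cand = patch(img_cur, pc)
                if cand.shape != ref.shape:  # skip truncated border patches
                    continue
                s = ncc(ref, cand)
                if s > best_s:
                    best_j, best_s = j, s
            if best_j >= 0:
                pairs.append((i, best_j))
        return pairs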
[0083] The rotational amount detection block 403 detects a
rotational amount on the basis of the normal-line information
having indefiniteness generated by the normal-line information
generation block 30, the pair of feature points obtained by the
feature point matching block 402, and the normal-line information
of the feature point of the key frame stored in the data storage
block 50.
[0084] FIG. 14 depicts a configuration of the rotational amount
detection block. The rotational amount detection block 403 has a
feature point indefiniteness resolution block 4031 and a
computation block 4032.
[0085] The feature point indefiniteness resolution block 4031
executes the processing of resolving indefiniteness on the
normal-line information having indefiniteness. FIG. 15 depicts one
example of a method of resolving indefiniteness. As depicted in
FIG. 15, a feature point u' is a feature point in the current frame
l and a feature point u is a feature point of the key frame v
corresponding to the feature point u'. Further, let a normal-line
angle having indefiniteness of the feature point u' be
N'.sub.u'.sup.l(.alpha.'.sub.u'.sup.l, .theta..sub.u'.sup.l) and a
normal-line angle having no indefiniteness be
N.sub.u'.sup.l(.alpha..sub.u'.sup.l, .theta..sub.u'.sup.l). Let a
normal-line angle having no indefiniteness of the feature point u
be N.sub.u.sup.v(.alpha..sub.u.sup.v, .theta..sub.u.sup.v). A
rotational amount of the current frame relative to the key frame is
R.sub.lv and a travel amount is T.sub.lv.
[0086] If an azimuth angle .alpha..sub.u.sup.v of the normal-line
angle N.sub.u.sup.v(.alpha..sub.u.sup.v, .theta..sub.u.sup.v) of
the key frame has no indefiniteness and a motion amount of an
imaging block is small, then the feature point indefiniteness
resolution block 4031 resolves the indefiniteness on the basis of
equation (6) below, thereby generating the normal-line angle
N.sub.u'.sup.l(.alpha..sub.u'.sup.l, .theta..sub.u'.sup.l). That is, the
feature point indefiniteness resolution block 4031 references the
azimuth angle .alpha..sub.u.sup.v of the normal-line angle of the
key frame so as to resolve the indefiniteness of an azimuth angle
.alpha.'.sub.u'.sup.l of the normal-line angle
N'.sub.u'.sup.l(.alpha.'.sub.u'.sup.l, .theta..sub.u'.sup.l) having
the indefiniteness of the corresponding feature point in the
current frame, thereby providing an azimuth angle
.alpha..sub.u'.sup.l.
[Math. 5]

$$\alpha_{u'}^{l}=\begin{cases}\alpha'^{\,l}_{u'}+180, & \text{if }\left|\alpha'^{\,l}_{u'}+180-\alpha_{u}^{v}\right|<\left|\alpha'^{\,l}_{u'}-\alpha_{u}^{v}\right|\\ \alpha'^{\,l}_{u'}, & \text{otherwise}\end{cases}\qquad(6)$$
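Equation (6) transcribes directly into the following Python sketch, assuming azimuth angles in degrees; the function name is illustrative.

```python
def resolve_azimuth(alpha_cur_ambiguous, alpha_key):
    """Resolve the 180-degree azimuth indefiniteness of a current-frame normal
    by referencing the unambiguous key-frame azimuth, per equation (6)."""
    flipped = alpha_cur_ambiguous + 180.0
    # Keep whichever candidate lies closer to the key-frame azimuth.
    if abs(flipped - alpha_key) < abs(alpha_cur_ambiguous - alpha_key):
        return flipped  # wrap into [0, 360) afterwards if desired
    return alpha_cur_ambiguous
```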
[0087] On the other hand, if the normal-line information of the key
frame has indefiniteness or a motion amount of the imaging block is
large, then the feature point indefiniteness resolution block 4031
resolves the indefiniteness of the normal line of the feature
points of the key frame and the current frame in a statistical
manner by use of a relation between the positional change of
viewpoint and the change in the normal-line direction as observed
from each viewpoint.
[0088] Here, if the normal-line information of the current frame
has indefiniteness, the normal line N.sub.u'.sup.l of the feature
point u' can take either of two opposite directions. If the normal
line of the key frame has indefiniteness, the normal line
N.sub.u.sup.v of the feature point u can likewise take either of
two opposite directions. Therefore, the number of candidates for a
combination of normal lines for each feature point is two (if the
normal line of the key frame has no indefiniteness) or four (if the
normal line of the key frame has indefiniteness). That is, the
number of computed rotational amounts may be up to four. Hence, the
feature point indefiniteness resolution block 4031 executes
statistical processing over all corresponding feature points to
obtain a most likely rotational amount as the rotational amount
R.sub.lv. That is, the feature point indefiniteness resolution
block 4031 computes a rotational amount for each combination of the
normal lines of the key frame and the current frame for each
feature point and detects the most likely rotational amount by
statistically processing the computed rotational amounts, thereby
providing the rotational amount R.sub.lv. In addition, the feature
point indefiniteness resolution block 4031 takes the pair
(N.sub.u'.sup.l, N.sub.u.sup.v) of normal lines of each feature
point that is consistent with the rotational amount R.sub.lv as the
normal lines whose indefiniteness has been resolved.
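As one possible reading of the statistical processing above, the sketch below selects, from rotational amounts pooled over all candidate normal-line combinations of all feature-point pairs, the rotation with the most angular neighbours. The angular-voting scheme and the tolerance are assumptions of this sketch; the disclosure states only that the most likely rotational amount is obtained statistically.

```python
import numpy as np

def most_likely_rotation(candidate_rotations, angle_tol_deg=5.0):
    """candidate_rotations: list of 3x3 rotation matrices, one per candidate
    combination of normal lines for each feature-point pair."""
    def angular_distance_deg(Ra, Rb):
        # Geodesic distance between two rotations via the trace identity.
        cos = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    # Each candidate is scored by how many other candidates agree with it.
    votes = [sum(angular_distance_deg(R, S) < angle_tol_deg
                 for S in candidate_rotations)
             for R in candidate_rotations]
    return candidate_rotations[int(np.argmax(votes))]
```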
[0089] The computation block 4032 detects the rotational amount
R.sub.lv by use of the normal-line information of the current frame
and the key frame with the indefiniteness resolved. To be more
specific, the computation block 4032 computes the rotational amount
R.sub.lv on the basis of equation (7) below by use of the normal
line N.sub.u'.sup.l without indefiniteness of the current frame and
the normal line N.sub.u.sup.v without indefiniteness of the key
frame.
[Math. 6]

$$R_{lv}=\underset{R_{lv}}{\arg\min}\sum_{\text{all}\,(u,u')}\left\|N_{u'}^{l}-R_{lv}^{-1}N_{u}^{v}\right\|^{2}\qquad(7)$$
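The minimization in equation (7) is a rotation-alignment (Wahba) problem over the paired unit normals and admits a closed-form solution by the Kabsch/SVD method. The solver choice below is an assumption of this sketch, as the disclosure states only the minimization itself.

```python
import numpy as np

def rotation_from_normals(normals_cur, normals_key):
    """normals_cur, normals_key: (n, 3) arrays of paired unit normals with the
    indefiniteness already resolved. Returns R minimizing
    sum ||R N_cur - N_key||^2, which is equivalent to equation (7)."""
    H = normals_cur.T @ normals_key             # cross-covariance of the pairs
    U, _, Vt = np.linalg.svd(H)
    # The sign correction guards against a reflection (det = -1) solution.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # corresponds to R_lv
```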
[0090] It should be noted that, in solving equation (7) and
resolving the indefiniteness of the normal line of each feature
point, if the azimuth angle .alpha..sub.u.sup.v of the normal-line
angle N.sub.u.sup.v(.alpha..sub.u.sup.v, .theta..sub.u.sup.v) of
the key frame has no indefiniteness and a motion amount of the
imaging block is small, then only one pair of corresponding feature
points is enough for the resolution of the indefiniteness of the
feature point of the current frame. In addition, if the normal-line
information of the key frame has indefiniteness or a motion amount
of the imaging block is large, at least two pairs of feature points
are required for the computation of a most likely rotational amount
by statistical processing. The number of feature-point pairs
required by the rotational amount detection block 403 is therefore
one in the former case and two or more in the latter.
[0091] The travel amount detection block 404 detects a travel
amount on the basis of the rotational amount R.sub.lv detected by
the rotational amount detection block 403, the feature point
detected by the feature point matching block 402, and the
three-dimensional or two-dimensional position of the feature point
of the key frame stored in the data storage block 50.
[0092] With respect to the position of the feature point of the key
frame, the necessary number of feature points depends on whether
the three-dimensional position is known or not. In what follows, a
case where the three-dimensional position of the feature point is
known and a case where it is unknown (the two-dimensional position
is known) are described.
[0093] For example, at the time of the start of SLAM, the
three-dimensional position of the feature point in the key frame is
unknown. Therefore, a method for the case where the
three-dimensional position of a feature point is unknown is
applied. Further, if the three-dimensional position of the feature
point of the key frame has been obtained by processing two or more
frames, a method for the case where the three-dimensional position
of a feature point is known is applied. In addition, the depth of
the key frame can be obtained at the start of SLAM by use of a
marker (an augmented reality (AR) marker, for example) for
specifying a position through image recognition or the like.
Therefore, also in such a case, the method for the case where the
three-dimensional position of a feature point is known is
applied.
[0094] Operation to be Executed if the Three-Dimensional Position
of a Feature Point in the Key Frame is Known
[0095] The travel amount detection block 404 detects a travel
amount by minimizing the difference between the position at which
each feature point of the key frame is re-projected to the current
frame and the position of the corresponding feature point in the
current frame. That is, the travel amount detection block 404
computes the travel amount T.sub.lv by use of equation (8) below.
In equation (8), .pi. is indicative of a function for projecting a
three-dimensional position (x, y, z) to (x/z, y/z). .pi..sup.-1 (u,
D.sub.v(u)) is indicative of a function for returning the feature
point u on an image of the key frame to the camera space when the
feature point u has the depth D.sub.v(u) as depicted in FIG. 16(a), and
K is indicative of an internal parameter matrix of the camera.
G.sub.lv (.psi.) is indicative of a motion amount (rotational
amount R.sub.lv and travel amount T.sub.lv) from the key frame to
the current frame as indicated in equation (9) below. Further, qa
in equation (8) and equations described later is indicative of all
feature points to be paired.
[Math. 7]

$$T_{lv}=\arg\min F(\psi)=\arg\min\left(\frac{1}{2}\sum_{u\in qa}\left(f_{u}(\psi)\right)^{2}\right)=\arg\min\left(\frac{1}{2}\sum_{u\in qa}\left(\pi\!\left(K\,G_{lv}(\psi)\,\pi^{-1}(u,D_{v}(u))\right)-u'\right)^{2}\right)\qquad(8)$$

$$G_{lv}(\psi)=G(R_{lv},T_{lv})=\begin{bmatrix}R_{11}&R_{12}&R_{13}&T_{1}\\ R_{21}&R_{22}&R_{23}&T_{2}\\ R_{31}&R_{32}&R_{33}&T_{3}\\ 0&0&0&1\end{bmatrix}\qquad(9)$$
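A hedged NumPy/SciPy sketch of the minimization in equation (8), with the rotational amount R.sub.lv held fixed, is given below; the parameterization and the solver are assumptions of the sketch.

```python
import numpy as np
from scipy.optimize import least_squares

def estimate_travel_amount(R_lv, K, pts_key, depths_key, pts_cur):
    """pts_key, pts_cur: (n, 2) pixel coordinates of paired feature points;
    depths_key: (n,) known depths D_v(u) of the key-frame feature points."""
    K_inv = np.linalg.inv(K)
    ones = np.ones((len(pts_key), 1))
    # pi^{-1}(u, D_v(u)): back-project key-frame pixels into camera space.
    rays = (K_inv @ np.hstack([pts_key, ones]).T).T
    X_key = rays * depths_key[:, None]

    def residuals(T):
        X_cur = (R_lv @ X_key.T).T + T          # apply G_lv(psi) with R fixed
        proj = (K @ X_cur.T).T
        uv = proj[:, :2] / proj[:, 2:3]          # pi: (x, y, z) -> (x/z, y/z)
        return (uv - pts_cur).ravel()            # f_u(psi) over all u in qa

    # Two or more pairs give >= 4 equations for the 3 unknowns T1, T2, T3.
    return least_squares(residuals, x0=np.zeros(3)).x
```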
[0096] When the travel amount T.sub.lv is computed by use of
equation (9), there are three variables, T.sub.1, T.sub.2, and
T.sub.3. Further, since each pair of feature points yields two
equations through equation (8), one for the x direction and one for
the y direction of the image, two or more pairs of feature points
provide four or more equations, from which the variables T.sub.1,
T.sub.2, and T.sub.3 of equation (9) can be determined. Therefore,
the travel amount detection block 404 computes the travel amount
T.sub.lv by use of the variables T.sub.1, T.sub.2, and T.sub.3
determined from two or more pairs of feature points.
[0097] Operation to be Executed if the Three-Dimensional Position
of a Feature Point in the Key Frame is Unknown (a Two-Dimensional
Position is Known)
[0098] If the three-dimensional position of a feature point in the
key frame is unknown, the travel amount detection block 404 detects
the travel amount by minimizing the difference between the position
at which each feature point of the key frame is re-projected to the
current frame and the position of the corresponding feature point
in the current frame. That is, the travel amount detection block
404 computes the travel amount T.sub.lv by use of equation (10). It
should be noted that, in equation (10), the depth of the key frame
is unknown, so that the depth of the feature point u of the key
frame is D.sub.v(u)'' as depicted in FIG. 16(b).
[Math. 8]

$$(T_{lv},D_{v}'')=\arg\min F(\psi)=\arg\min\left(\frac{1}{2}\sum_{u\in qa}\left(f_{u}(\psi)\right)^{2}\right)=\arg\min\left(\frac{1}{2}\sum_{u\in qa}\left(\pi\!\left(K\,G_{lv}(\psi)\,\pi^{-1}(u,D_{v}(u)'')\right)-u'\right)^{2}\right)\qquad(10)$$
[0099] Since the rotational amount R.sub.lv has been obtained, the
number of variables of equation (10) is n (the depth of each
feature point)+3 (the travel amount). That is, if the number of
feature points is n, 2n equations are obtained; if 2n ≥ n+3,
namely, n ≥ 3, then the depth value of each feature point in the
key frame and the travel amount are obtained. Further, if the depth
of each feature point in the key frame is obtained,
three-dimensional coordinates [X.sub.l(u') Y.sub.l(u') D.sub.l(u')]
in the camera coordinate system of the current frame with respect
to the feature point u' of the current frame corresponding to the
feature point u of the key frame can be computed on the basis of
equation (11). It should be noted that D.sub.l(u') is the depth of
the feature point u' in equation (11).
[Math. 9]

$$[X_{l}(u')\ \ Y_{l}(u')\ \ D_{l}(u')]^{T}=K\,G_{lv}(\psi)\,\pi^{-1}(u,D_{v}(u)'')\qquad(11)$$
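For the unknown-depth case of equations (10) and (11), the per-feature depths can be stacked into the parameter vector alongside the travel amount, so that n ≥ 3 pairs yield 2n ≥ n+3 equations. The sketch below reuses the conventions of the previous sketch and is likewise an illustration under stated assumptions, not the disclosed implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def estimate_travel_and_depth(R_lv, K, pts_key, pts_cur, depth_init=1.0):
    """Jointly estimate T_lv and the key-frame feature depths D_v''."""
    n = len(pts_key)
    K_inv = np.linalg.inv(K)
    rays = (K_inv @ np.hstack([pts_key, np.ones((n, 1))]).T).T

    def residuals(params):
        T, depths = params[:3], params[3:]
        X_cur = (R_lv @ (rays * depths[:, None]).T).T + T
        proj = (K @ X_cur.T).T
        return (proj[:, :2] / proj[:, 2:3] - pts_cur).ravel()

    x0 = np.concatenate([np.zeros(3), np.full(n, depth_init)])
    sol = least_squares(residuals, x0).x
    T_lv, depths = sol[:3], sol[3:]
    # Equation (11): 3-D coordinates of each u' in the current-frame camera system.
    X_cur = (K @ ((R_lv @ (rays * depths[:, None]).T).T + T_lv).T).T
    return T_lv, depths, X_cur
```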
[0100] Executing the processing as described above by the motion
amount detection block 41 allows the individual detection of a
rotational amount and a travel amount on the basis of the feature
points of the current frame and the key frame and the normal-line
information of the feature points.
[0101] FIG. 17 is a flowchart indicative of an operation of the
motion amount detection block. In step ST21, the motion amount
detection block 41 detects a feature point. The motion amount
detection block 41 detects a feature point from a polarization
image acquired by the polarization image acquisition block 20 and
then goes to step ST22.
[0102] In step ST22, the motion amount detection block 41 executes
feature point matching processing. The motion amount detection
block 41 executes matching processing between the feature point of
the key frame stored in the data storage block 50 and the feature
point detected in step ST21 so as to detect a pair of corresponding
feature points between the key frame and the current frame and then
goes to step ST23.
[0103] In step ST23, the motion amount detection block 41 detects a
rotational amount. The motion amount detection block 41 detects a
rotational amount on the basis of the normal-line information
having indefiniteness generated by the normal-line information
generation block 30, the pair of feature points detected in step
ST22, and the normal-line information of the feature point of the
key frame stored in the data storage block 50.
[0104] FIG. 18 is a flowchart indicative of a rotational amount
detection operation. In step ST31, the motion amount detection
block 41 resolves the indefiniteness of the normal-line
information. The motion amount detection block 41 executes the
processing described above with reference to the feature point
indefiniteness resolution block 4031 so as to resolve the
indefiniteness of the normal-line information and then goes to step
ST32.
[0105] In step ST32, the motion amount detection block 41 computes
a rotational amount. The motion amount detection block 41 executes
the processing described above with reference to the computation
block 4032 so as to compute a rotational amount and then goes to
step ST24 in FIG. 17.
[0106] In step ST24, the motion amount detection block 41 detects a
travel amount. The motion amount detection block 41 detects a
travel amount on the basis of the rotational amount R.sub.lv
detected in step ST23, the feature point detected in step ST22, and
the three-dimensional or two-dimensional position of the feature
point of the key frame stored in the data storage block 50.
[0107] According to the first example as described above, the
effects provided by the first embodiment described above can be
obtained. Further, according to the first example, the rotational
amount and the travel amount of the imaging position of the current
frame relative to the imaging position of the key frame can be
individually detected, so that the rotational amount of the imaging
block can be correctly detected without being affected by the
travel of the imaging block, for example. Therefore, as compared
with the related-art SLAM in which a motion amount is detected on
the basis of the information about the luminance of a taken image
and the pixel position therein, the accuracy of motion amount
detection can be enhanced. In addition, if a motion amount is
detected on the basis of the information about the luminance of a
taken image and the pixel position therein, a taken image with less
texture makes it difficult to find a corresponding point between
the current frame and the key frame. However, since the first
example uses normal-line information, even a taken image with less
texture allows the detection of a corresponding point between the
current frame and the key frame if a variation in shape takes
place, thereby detecting a motion amount. Further, including
polarization information in the descriptor of a feature point
enhances matching accuracy as compared with executing matching
processing on feature points by use of only an ordinary
image. It should be noted that the related-art SLAM needs at least
four pairs of feature points in the detection of a motion amount;
however, the first example allows the detection of a motion amount
only with two pairs of feature points as described above.
4. Second Example in First Embodiment
[0108] In the second example in the first embodiment, a case will
be described in which a rotational and/or travel motion amount of
the imaging position of a current frame relative to the imaging
position of a key frame is detected on the basis of the normal-line
information corresponding to each other between the key frame and
the current frame and the three-dimensional position of a detected
feature point of the key frame or the two-dimensional position on
an image. A motion amount detection result in this case is
equivalent to a result of simultaneous detection and integration of
a rotational amount and a travel amount.
[0109] FIG. 19 depicts a configuration of the second example in the
first embodiment. An image processing apparatus in the second
example has a polarization image acquisition block 20, a
normal-line information generation block 30, a motion amount
detection block 42, a data storage block 50, and an environment
mapping block 60.
[0110] The polarization image acquisition block 20 acquires a
current frame polarization image. The polarization image
acquisition block 20 acquires two or more polarization images
having different polarization directions; for example, polarization
images having three or more polarization directions. The
normal-line information generation block 30 generates normal-line
information from the two or more polarization images having
different polarization directions acquired by the polarization
image acquisition block 20. The data storage block 50 stores data
of the key frame, an environment map, and so on. The environment
mapping block 60 generates a depth corresponding to the current
frame polarization image and executes processing of updating the
environment map by use of the generated depth, for example.
[0111] The motion amount detection block 42 detects a rotational
and/or travel motion amount of an imaging position of the current
frame relative to an imaging position of the key frame on the basis
of a variation in the normal line of a same object imaged from
different viewpoints.
[0112] The motion amount detection block 42 has a feature point
detection block 401, a feature point matching block 402, and a
rotational/travel amount detection block 405.
[0113] The feature point detection block 401 detects a feature
point from the polarization image acquired by the polarization
image acquisition block 20. By use of the polarization image of the
current frame acquired by the polarization image acquisition block
20, the feature point detection block 401 performs processing such
as averaging the pixel values of the two or more polarization
images at each pixel position and generates a non-polarization
image equivalent to an image taken without use of a polarization
plate, a polarization filter, or the like. In addition, the feature
point detection block 401 detects a
feature point of a predetermined type from the non-polarization
image by use of a method such as SIFT or SURF.
[0114] The feature point matching block 402 executes matching
processing between the feature point of the key frame stored in the
data storage block 50 and the feature point detected by the feature
point detection block 401. The feature point matching block 402
executes feature point matching processing by use of SAD or the
like so as to detect a pair of feature points corresponding to each
other between the key frame and the current frame.
[0115] The rotational/travel amount detection block 405 executes
generally the same processing as that of the rotational amount
detection block 403 described above on the normal-line information
having indefiniteness generated by the normal-line information
generation block 30 so as to resolve the indefiniteness of the
normal-line information relative to the current frame and the key
frame. Next, the rotational/travel amount detection block 405
detects a motion amount.
[0116] With generally the same method (hereafter referred to as
"first related-art technique") as that described in NPL 1, a motion
amount is detected by use of the position at which each feature
point of the key frame is re-projected to the current frame and the
position of the corresponding feature point in the current frame.
That is, a motion amount G.sub.lv can be computed by use of
equation (12) below. In equation (12), .pi. is indicative of a
function for projecting a three-dimensional position (x, y, z) to
(x/z, y/z). .pi..sup.-1(u, D.sub.v(u)) is indicative of a function
for returning a point u on an image to the camera space when the
point u has the depth D.sub.v(u), and K is indicative of an
internal parameter matrix of the camera. G.sub.lv(.psi.) is
indicative of a motion amount (rotational amount R.sub.lv and
travel amount T.sub.lv) from the key frame to the current frame as
indicated in equation (13).
[Math. 10]

$$G_{lv}=\arg\min F(\psi)=\arg\min\left(\frac{1}{2}\sum_{u\in qa}\left(f_{u}(\psi)\right)^{2}\right)=\arg\min\left(\frac{1}{2}\sum_{u\in qa}\left(\pi\!\left(K\,G_{lv}(\psi)\,\pi^{-1}(u,D_{v}(u))\right)-u'\right)^{2}\right)\qquad(12)$$

$$G_{lv}(\psi)=G(R_{lv},T_{lv})=\begin{bmatrix}R_{11}&R_{12}&R_{13}&T_{1}\\ R_{21}&R_{22}&R_{23}&T_{2}\\ R_{31}&R_{32}&R_{33}&T_{3}\\ 0&0&0&1\end{bmatrix}\qquad(13)$$
[0117] Here, the number of feature point pairs necessary for
solving equation (12) depends on whether the depth of the feature
point of the key frame is known or not. For example, in equation
(12), if the depth D.sub.v(u) of the feature point u is known, then
there are six variables (travel amount of three degrees of
freedom+rotational amount of three degrees of freedom). Therefore,
equation (12) could in principle be solved with three pairs of
feature points; however, as depicted in FIG. 20, if only the
positions of the feature points on the image are considered, there
are two viewpoints from which the object looks the same. It should
be noted that, in FIG. 20, one of
the points at which the object looks the same is assumed to be an
imaging position of an imaging block CMa and the other point is
assumed to be an imaging position of an imaging block CMb. In this
case, the positions of the feature points in images taken by the
imaging blocks CMa and CMb are the same, but the normal-line maps
are different from each other because of different normal-line
directions. It should also be noted that, in FIG. 20, with the
imaging blocks CMa and CMb, the zenith side of a quadrangular
pyramid is the image sensor side and the bottom surface side is the
lens side. In the normal-line maps, the difference in normal-line
direction is indicated by different line types.
[0118] As described above, if there are two viewpoints from which
the object looks the same, obtaining a unique viewpoint requires
one more feature point in addition to the three feature points.
That is, in order to obtain a unique G.sub.lv(.psi.), four or more
feature points are necessary.
[0119] On the other hand, if the depth D.sub.v(u) of the feature
point u in the key frame is unknown, then equation (12) can be
described as equation (14). For example, if the number of feature
points is n, then the number of variables is six (travel amount of
three degrees of freedom+rotational amount of three degrees of
freedom)+n (a depth of n pairs of feature points). Therefore, if
n ≥ 6, then the depth value of each feature point and the travel
amount and rotational amount of the camera can be obtained. It
should be noted that D.sub.v(u)'' is indicative of the unknown
depth of the feature point u.
[Math. 11]

$$(G_{lv},D_{v}'')=\arg\min F(\psi)=\arg\min\left(\frac{1}{2}\sum_{u\in qa}\left(f_{u}(\psi)\right)^{2}\right)=\arg\min\left(\frac{1}{2}\sum_{u\in qa}\left(\pi\!\left(K\,G_{lv}(\psi)\,\pi^{-1}(u,D_{v}(u)'')\right)-u'\right)^{2}\right)\qquad(14)$$
[0120] Since normal lines can be obtained at the feature points of
the key frame and the current frame, the rotational/travel amount
detection block 405 adds normal-line constraints on the
corresponding feature points to the first related-art technique
described above by use of the different normal lines of a same
point obtained from different viewpoints. Equation (15) is obtained
by adding these normal-line constraints to the first related-art
technique.
That is, in equation (15), restrictions for minimizing a difference
between the normal line R.sub.lv(u)N.sub.l(u') of a position at
which each feature point u' of the current frame is re-projected to
the key frame and the normal line N.sub.v(u) of a position of the
corresponding feature point u in the key frame are added. It should
be noted that the normal lines N.sub.v(u), N.sub.l(u) and so on in
the second example and the third example to be described later are
equivalent to the normal lines N.sub.u.sup.v, N.sub.u.sup.l, and so
on in the first example.
[Math. 12]

$$G_{lv}=\arg\min F(\psi)=\arg\min\left(\frac{1}{2}\sum_{u\in qa}\left[\left(f_{u}(\psi)\right)^{2}+h_{u}(\psi)^{2}\right]\right)=\arg\min\left(\frac{1}{2}\sum_{u\in qa}\left[\left(\pi\!\left(K\,G_{lv}(\psi)\,\pi^{-1}(u,D_{v}(u))\right)-u'\right)^{2}+\left(R_{lv}(u)N_{l}(u')-N_{v}(u)\right)^{2}\right]\right)\qquad(15)$$
[0121] Further, the number of feature points necessary for solving
equation (15) depends on whether the depth of a feature point of
the key frame is known or not. For example, in equation (15), if
the depth of the feature point u is known, then there are six
variables (travel amount of three degrees of freedom+rotational
amount of three degrees of freedom) like the first related-art
technique. However, as depicted in FIG. 20, the normal-line
directions differ even where the object looks the same. Unlike the
first related-art technique, in which only the positional
information of feature points is used, the rotational/travel amount
detection block 405 also uses the normal-line directions of the
feature points, so that, if there are three pairs of feature
points, a unique solution of equation (15) can be obtained.
Further, if the depth D.sub.v(u) of the feature point u is unknown,
then equation (15) is replaced by equation (16) below.
[Math. 13]

$$(G_{lv},D_{v}'')=\arg\min F(\psi)=\arg\min\left(\frac{1}{2}\sum_{u\in qa}\left[\left(f_{u}(\psi)\right)^{2}+h_{u}(\psi)^{2}\right]\right)=\arg\min\left(\frac{1}{2}\sum_{u\in qa}\left[\left(\pi\!\left(K\,G_{lv}(\psi)\,\pi^{-1}(u,D_{v}(u)'')\right)-u'\right)^{2}+\left(R_{lv}(u)N_{l}(u')-N_{v}(u)\right)^{2}\right]\right)\qquad(16)$$
[0122] Here, if there are n feature points, the number of variables
is six (travel amount of three degrees of freedom+rotational amount
of three degrees of freedom)+n (a depth of n pairs of feature
points). Therefore, if n ≥ 6, then the depth value of each
feature point and the rotational and/or travel motion amount of the
imaging position can be obtained. In addition, since normal-line
constraints are added in this method, the accuracy of motion
amount detection can be enhanced as compared with the first
related-art technique.
[0123] FIG. 21 is a flowchart indicative of an operation of the
motion amount detection block. In step ST41, the motion amount
detection block 42 detects a feature point. The motion amount
detection block 42 detects a feature point from the polarization
image acquired by the polarization image acquisition block 20 and
then goes to step ST42.
[0124] In step ST42, the motion amount detection block 42 executes
feature point matching processing. The motion amount detection
block 42 executes matching processing between the feature point of
the key frame stored in the data storage block 50 and the feature
point detected in step ST41 so as to detect a pair of corresponding
feature points between the key frame and the current frame, and
then goes to step ST43.
[0125] In step ST43, the motion amount detection block 42 detects a
motion amount. The motion amount detection block 42 detects a
motion amount on the basis of the normal-line information having
indefiniteness generated by the normal-line information generation
block 30, the feature point detected in step ST42, and the
normal-line information of the feature point of the key frame
stored in the data storage block 50.
[0126] FIG. 22 is a flowchart indicative of a motion amount
detection operation. In step ST51,
the motion amount detection block 42 resolves the indefiniteness of
the normal-line information. The motion amount detection block 42
executes generally the same processing as that of the feature point
indefiniteness resolution block 4031 described above so as to
resolve the indefiniteness of the normal-line information and then
goes to step ST52.
[0127] In step ST52, the motion amount detection block 42 detects a
motion amount. On the basis of the normal-line information with the
indefiniteness resolved and the three-dimensional position of the
feature point detected from the key frame or the two-dimensional
position on the image, the motion amount detection block 42 detects
a rotational and/or travel motion amount as described above by use
of the different normal lines of the same point obtained from
different viewpoints.
[0128] According to the second example as described above, the
effects provided by the first embodiment described above can be
obtained. Further, since the second example uses normal-line
information, a rotational and/or travel motion amount can be
detected more accurately than with
the related-art techniques. Still further, since the second example
uses normal-line information, even with a taken image having less
texture, a corresponding point between the current frame and the
key frame can be detected if there occurs a variation in shape,
thereby detecting a motion amount as in the first example. In
addition, including polarization information in the descriptor of a
feature point enhances matching accuracy as compared with executing
matching processing on feature points by use of only an ordinary
image.
5. Third Example in First Embodiment
[0129] In the third example in the first embodiment, a case will be
described in which, unlike the feature-point-based techniques of
the first and second examples, a motion amount is detected by use
of the image over the entire screen.
[0130] FIG. 23 depicts a configuration of the third example in the
first embodiment. An image processing apparatus of the third
example has a polarization image acquisition block 20, a
normal-line information generation block 30, a motion amount
detection block 43, a data storage block 50, and an environment
mapping block 60.
[0131] The polarization image acquisition block 20 acquires a
current frame polarization image. The polarization image
acquisition block 20 acquires two or more polarization images
having different polarization directions; for example, polarization
images having three or more polarization directions. The
normal-line information generation block 30 generates normal-line
information from the two or more polarization images having
different polarization directions acquired by the polarization
image acquisition block 20. The data storage block 50 stores data
of the key frame, an environment map, and so on. The environment
mapping block 60 generates a depth corresponding to the current
frame polarization image and executes processing of updating the
environment map by use of the generated depth, for example.
[0132] The motion amount detection block 43 has a rotational/travel
amount detection block 406. On the basis of an image and
normal-line information of the current frame and an image,
normal-line information and a depth of the key frame, the
rotational/travel amount detection block 406 detects a rotational
and/or travel motion amount of an imaging position of the current
frame relative to an imaging position of the key frame.
[0133] Presupposing that the three-dimensional positions in the key
frame are known, the rotational/travel amount detection block 406
detects a motion amount by use of luminance information and
normal-line information of the key frame and the current frame. It
should be noted that if a three-dimensional position of the key
frame is unknown, then a motion amount is computed after obtaining
a depth of the key frame by applying the technique of the first
example or the second example.
[0134] With a technique based on NPL 2 (hereafter referred to as
"second related-art technique"), a rotational amount and a travel
amount can be computed from an ordinary image of the current frame,
an ordinary image of the key frame, and a depth of the key frame.
For a point u in the key frame at which the depth is known,
minimizing the difference between the luminance of the point
re-projected to the current frame and the luminance of the point u
in the key frame can be expressed by equation (17) below (see FIG.
9).
[Math. 14]

$$G_{lv}=\arg\min F(\psi)=\arg\min\left(\frac{1}{2}\sum_{u\in qb}\left(f_{u}(\psi)\right)^{2}\right)=\arg\min\left(\frac{1}{2}\sum_{u\in qb}\left(I_{l}\!\left(\pi\!\left(K\,G_{lv}(\psi)\,\pi^{-1}(u,D_{v}(u))\right)\right)-I_{v}(u)\right)^{2}\right)\qquad(17)$$
[0135] In equation (17), I stands for the luminance of an image and
qb for all points on the key frame at which the depth is known.
Therefore, solving equation (17) yields a motion amount. However,
the second related-art technique supposes that the luminance of a
same point remains unchanged between the key frame and the current
frame even though the viewpoint changes; in a real environment, the
luminance may vary with the viewpoint relative to the same point,
so that the luminance variation causes an error in the result of
motion amount detection. By contrast, the rotational/travel amount
detection block 406 detects a motion amount more accurately than
the second related-art technique by use of normal-line
information.
[0136] The rotational/travel amount detection block 406 executes
processing of resolving the indefiniteness of the normal-line
information having indefiniteness generated by the normal-line
information generation block 30. If a motion amount of the imaging
block is small, for example, the rotational/travel amount detection
block 406 executes generally the same processing as that of the
rotational amount detection block 403 described above so as to
resolve the indefiniteness of the normal-line information for the
current frame. In addition, on the basis of image information, the
rotational/travel amount detection block 406 may obtain an
approximate value of the motion amount so as to resolve the
indefiniteness of the normal line of the current frame by use of
the obtained approximate value as indicated in equation (18) below.
It should be noted that, in equation (18), G.sub.lv(.psi.)'' is
indicative of the approximate motion amount obtained by the second
related-art technique and u' is indicative of a position obtained
by equation (19) below.
[Math. 15]

$$\alpha_{u'}^{l}=\begin{cases}\alpha'^{\,l}_{u'}+180, & \text{if }\left|\alpha'^{\,l}_{u'}+180-\alpha_{u}^{v}\right|<\left|\alpha'^{\,l}_{u'}-\alpha_{u}^{v}\right|\\ \alpha'^{\,l}_{u'}, & \text{otherwise}\end{cases}\qquad(18)$$

$$u'=\pi\!\left(K\,G_{lv}(\psi)''\,\pi^{-1}(u,D_{v}(u))\right)\qquad(19)$$
[0137] After resolving the indefiniteness of the normal line of the
current frame, the rotational/travel amount detection block 406
executes the computation of equation (20) below by use of the
ordinary images of the current frame and the key frame, the
normal-line information with the indefiniteness resolved, and a
depth of the key frame. By executing the computation of equation
(20), the rotational/travel amount detection block 406 computes the
motion amount G.sub.lv of the imaging block that minimizes both the
difference in luminance and the difference in normal line between
the point at which each point of the key frame is re-projected to
the current frame and the corresponding point in the current
frame.
[Math. 16]

$$G_{lv}=\arg\min F(\psi)=\arg\min\left(\frac{1}{2}\sum_{u\in qb}\left[\left(f_{u}(\psi)\right)^{2}+\left(h_{u}(\psi)\right)^{2}\right]\right)=\arg\min\left(\frac{1}{2}\sum_{u\in qb}\left[\left(I_{l}\!\left(\pi\!\left(K\,G_{lv}(\psi)\,\pi^{-1}(u,D_{v}(u))\right)\right)-I_{v}(u)\right)^{2}+\left\|R_{lv}(u)\,N_{l}\!\left(\pi\!\left(K\,G_{lv}(\psi)\,\pi^{-1}(u,D_{v}(u))\right)\right)-N_{v}(u)\right\|^{2}\right]\right)\qquad(20)$$
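The sketch below illustrates the residual structure of equation (20) for key-frame pixels with known depth. The nearest-neighbour sampling of the current frame, the pose parameterization, and the zero initial guess are simplifying assumptions of this sketch; in practice, bilinear interpolation and a coarse initialization, for example by the second related-art technique, would be used.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def direct_motion_with_normals(K, I_key, D_key, N_key, I_cur, N_cur, pts):
    """I_*: HxW float luminance; N_*: HxWx3 normals; D_key: HxW depth;
    pts: (n, 2) integer pixel coordinates u in the key frame with known depth."""
    K_inv = np.linalg.inv(K)
    rays = (K_inv @ np.hstack([pts, np.ones((len(pts), 1))]).T).T
    depths = D_key[pts[:, 1], pts[:, 0]][:, None]

    def residuals(params):
        R = Rotation.from_rotvec(params[:3]).as_matrix()
        X_cur = (R @ (rays * depths).T).T + params[3:]
        proj = (K @ X_cur.T).T
        uv = proj[:, :2] / proj[:, 2:3]
        # Nearest-neighbour lookup of the warped positions, clamped to the image.
        x = np.clip(np.round(uv[:, 0]).astype(int), 0, I_cur.shape[1] - 1)
        y = np.clip(np.round(uv[:, 1]).astype(int), 0, I_cur.shape[0] - 1)
        photo = I_cur[y, x] - I_key[pts[:, 1], pts[:, 0]]           # luminance term
        normal = ((R @ N_cur[y, x].T).T
                  - N_key[pts[:, 1], pts[:, 0]]).ravel()            # normal term
        return np.concatenate([photo, normal])

    return least_squares(residuals, x0=np.zeros(6)).x
```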
[0138] As described above, by adding the normal-line constraints,
the rotational/travel amount detection block 406 can detect a
motion amount of an imaging position of the current frame relative
to an imaging position of the key frame more accurately than the
second related-art technique.
[0139] FIG. 24 is a flowchart indicative of an operation of the
motion amount detection block. In step ST61, the motion amount
detection block 43 resolves the indefiniteness of the normal-line
information of the current frame. The motion amount detection block
43 executes generally the same processing as that of the motion
amount detection block 41 described above so as to resolve the
indefiniteness of the normal-line information of the current frame
and then goes to step ST62.
[0140] In step ST62, the motion amount detection block 43 detects a
motion amount. The motion amount detection block 43 detects a
motion amount on the basis of the ordinary images of the current
frame and the key frame, the normal-line information of the key
frame and the normal-line information of the current frame with the
indefiniteness resolved in step ST61, and a depth of the key frame.
[0141] According to the third example as described above, the
effects indicated by the first embodiment described above can be
obtained. Further, according to the third example, a motion amount
can be obtained by use of the image over the entire screen without
executing feature point detection and feature point matching
processing.
6. Second Embodiment
[0142] In the second embodiment, a case will be described in which
a motion amount is detected on the basis of normal-line information
without use of a polarization image.
[0143] FIG. 25 depicts a configuration of the second embodiment. An
image processing apparatus has a normal-line information generation
block 30, a motion amount detection block 44, and a data storage
block 50. It should be noted that the normal-line information
generation block 30 and the data storage block 50 are configured in
generally the same manners as those of the first embodiment
described above and therefore the motion amount detection block 44
will be described below.
[0144] On the basis of the normal-line information of each of the
feature points corresponding between a normal-line image based on
the normal-line information of the current frame and a normal-line
image of the key frame, the motion amount detection block 44
detects a motion amount of an imaging position of the current frame
relative to an imaging position of the key frame. The motion amount
detection block 44 has a feature point detection block 441, a
feature point matching block 442, a rotational amount detection
block 443 and a travel amount detection block 444, the rotational
amount detection block 443 and the travel amount detection block
444 making up a motion amount detection processing block.
[0145] The feature point detection block 441 detects a feature
point from the normal-line information of the current frame
generated by the normal-line information generation block 30. The
feature point detection block 441 transforms the normal-line
information into a normal-line image, for example, and detects a
feature point from the normal-line image by use of a feature point
detection method such as SIFT or SURF. Equation (21) below is
indicative of a transform equation for transforming the normal-line
information of each pixel into a normal-line image. For example,
the feature point detection block 441 transforms the normal-line
information into a color normal-line image by setting a red level
according to a normal-line component in x direction, a green level
according to a normal-line component in y direction, and a blue
level according to a normal-line component in z direction.
[Math. 17]

$$\begin{bmatrix}r\\ g\\ b\end{bmatrix}=\left(\begin{bmatrix}N_{x}\\ N_{y}\\ N_{z}\end{bmatrix}\times 0.5+0.5\right)\times 255\qquad(21)$$
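Equation (21) transcribes directly into the following sketch, assuming unit-normal components in [-1, 1] stored as an (H, W, 3) array. A feature point detection method such as SIFT or SURF can then be run on the resulting three-channel image.

```python
import numpy as np

def normals_to_image(normals):
    """Map unit normals (Nx, Ny, Nz) to an RGB normal-line image per equation (21)."""
    return ((normals * 0.5 + 0.5) * 255).astype(np.uint8)
```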
[0146] The feature point matching block 442 executes matching
processing between the feature point of the key frame stored in the
data storage block 50 and the feature point detected by the feature
point detection block 441. The feature point matching block 442
executes generally the same processing as that of the feature point
matching block 402 described in the first example of the first
embodiment so as to detect a pair of feature points corresponding
to each other between the current frame and the key frame.
[0147] The rotational amount detection block 443 detects a
rotational amount on the basis of the feature point detected by the
feature point matching block 442, the normal-line information of
the current frame generated by the normal-line information
generation block 30, and the normal-line information of the feature
point of the key frame stored in the data storage block 50. The
rotational amount detection block 443 detects a rotational amount
by executing generally the same processing as that of the
rotational amount detection block 403 described in the first
example described above.
[0148] The travel amount detection block 444 detects a travel
amount on the basis of the rotational amount detected by the
rotational amount detection block 443, the pair of feature points
obtained by the feature point matching block 442, and a
three-dimensional position or a two-dimensional position of the
feature point of the key frame stored in the data storage block 50.
The travel amount detection block 444 detects a travel amount by
executing generally the same processing as that of the travel
amount detection block 404 described in the first example described
above.
[0149] According to the second embodiment as described above, a
rotational and/or travel motion amount of an imaging position of
the current frame relative to an imaging position of the key frame
can be detected on the basis of normal-line information without use
of a polarization image. In addition, since the second embodiment
uses normal-line information, generally the same effects as those
of the first embodiment can be obtained. It should be noted that,
in the second embodiment, generally the same processing as that of
the second example of the first embodiment may be executed by use
of a feature point detected from a normal-line image so as to
detect a motion amount. In this case, the motion amount detection
processing block is configured in generally the same manner as the
rotational/travel amount detection block 405 of the second
example.
7. Other Embodiments
[0150] In the first and second embodiments described above, the
cases have been described in which normal-line information is
generated by use of a polarization image; however, the generation
of normal-line information is not restricted to one using a
polarization image. For example, normal-line information may be
generated by a technique known as photometric stereo. Further,
normal-line information may be generated on the basis of a depth
obtained by a technique such as time of flight (TOF), in which the
time for projected light to be reflected by an object and return is
measured. Still further, indefiniteness in normal-line
information may be resolved by executing object recognition by use
of image recognition or the like and identifying a normal line by
referencing a shape presented by the recognized object.
[0151] The sequence of processing operations described herein can
be executed by hardware, software, or a combination of both. When
the processing is executed by software, a program recording the
processing sequence is installed into a memory of a computer built
into dedicated hardware and executed. Alternatively, the program
may be installed in and executed by a general-purpose computer
capable of executing various kinds of processing operations.
[0152] For example, a program may be recorded in a hard disk unit
as a recording medium, a solid state drive (SSD), or a read only
memory (ROM) in advance. Alternatively, a program can be stored (or
recorded) temporarily or permanently in a flexible disc, a compact
disc ROM (CD-ROM), a magneto-optical (MO) disc, a digital versatile
disc (DVD), a Blu-ray disc (BD) (registered trademark), a magnetic
disc, a semiconductor memory card, or other removable recording
media. Such removable recording media can be provided as so-called
package software.
[0153] Further, a program may be not only installed from removable
recording media into a computer but also transferred from a
download site to a computer through a network such as a local area
network (LAN) or the Internet in a wired or wireless manner. The
computer can receive a program transferred in such a manner and
install it into a recording medium such as a hard disk unit built
into the computer.
[0154] The effects described herein are illustrative only and not
limited thereto, and therefore additional effects not described
above may be provided. The present technology should not be
interpreted only in the range of the above-mentioned embodiments of
the technology. The embodiments of this technology disclose the
present technology in the form of illustration, so that it is to be
understood by those skilled in the art that changes and variations
may be made without departing from the spirit of the present
technology. That is, judgement of the spirit of the present
technology should be based on the reference to the scope of the
claims attached hereto.
[0155] In addition, the image processing apparatus of the present
technology can also take the following configuration.
[0156] (1) An image processing apparatus including:
[0157] a normal-line information generation block configured to
generate normal-line information of a scene at an observation
position; and
[0158] a self-localization estimation block configured to estimate
the observation position on the basis of the normal-line
information generated by the normal-line information generation
block.
[0159] (2) The image processing apparatus according to (1) above,
in which
[0160] the normal-line information generation block generates
normal-line information of a scene at a reference position
different from the observation position, and
[0161] the self-localization estimation block estimates the
observation position on the basis of the normal-line information of
the scene at the observation position and the normal-line
information of the scene at the reference position generated by the
normal-line information generation block.
[0162] (3) The image processing apparatus according to (2) above,
in which
[0163] the self-localization estimation block has [0164] a motion
amount detection block configured to detect an amount of motion
from the reference position to the observation position, and [0165]
an absolute position estimation block configured to estimate the
observation position in a world coordinate system on the basis of
the motion amount and the reference position in the world
coordinate system.
[0166] (4) The image processing apparatus according to (3) above,
in which
[0167] the normal-line information generation block generates
normal-line information of a scene corresponding to a frame subject
to detection imaged at the observation position and normal-line
information of a scene corresponding to a key frame imaged at a
reference position different from the observation position, and
[0168] the self-localization estimation block estimates the
observation position on the basis of the normal-line information of
the scene corresponding to the frame subject to detection and the
normal-line information of the scene corresponding to the key frame
generated by the normal-line information generation block.
[0169] (5) The image processing apparatus according to (4) above,
in which
[0170] the motion amount detection block has [0171] a feature point
detection block configured to detect a feature point from an image
of the frame subject to detection, [0172] a feature point matching
block configured to execute matching processing between the feature
point detected by the feature point detection block and a feature
point detected from an image of the key frame so as to detect a
pair of feature points corresponding to each other between the key
frame and the frame subject to detection, [0173] a rotational
amount detection block configured to detect a rotational amount of
an imaging position of the frame subject to detection relative to
an imaging position of the key frame on the basis of normal-line
information of each of the feature points of the key frame and the
frame subject to detection that are detected by the feature point
matching block, and [0174] a travel amount detection block
configured to detect a travel amount of the imaging position of the
frame subject to detection relative to the imaging position of the
key frame on the basis of the rotational amount detected by the
rotational amount detection block, the feature point detected by
the feature point matching block, and a three-dimensional position
of the feature point of the key frame or a two-dimensional position
on an image.
[0175] (6) The image processing apparatus according to (4) above,
in which
[0176] the motion amount detection block has [0177] a feature point
detection block configured to detect a feature point from an image
of the frame subject to detection, [0178] a feature point matching
block configured to execute matching processing between the feature
point detected by the feature point detection block and a feature
point detected from an image of the key frame so as to detect a
pair of feature points corresponding to each other between the key
frame and the frame subject to detection, and [0179] a
rotational/travel amount detection block configured to detect a
rotational and/or travel motion amount of an imaging position of
the frame subject to detection relative to an imaging position of
the key frame on the basis of normal-line information of each of
the feature points of the key frame and the frame subject to
detection that are detected by the feature point matching block and
a three-dimensional position of the detected feature point of the
key frame or a two-dimensional position on an image.
[0180] (7) The image processing apparatus according to (4) above,
in which
[0181] the motion amount detection block has [0182] a
rotational/travel amount detection block configured to detect a
rotational and/or travel motion amount of an imaging position of
the frame subject to detection relative to an imaging position of
the key frame on the basis of an image and normal-line information
of the frame subject to detection and an image, normal-line
information and a depth of the key frame.
[0183] (8) The image processing apparatus according to (4) above,
in which
[0184] the motion amount detection block has [0185] a feature point
detection block configured to detect a feature point from a
normal-line image based on normal-line information of the frame
subject to detection generated by the normal-line information
generation block, [0186] a feature point matching block configured
to execute matching processing between the feature point detected
by the feature point detection block and a feature point detected
from an image of the key frame so as to detect a pair of feature
points corresponding to each other between the key frame and the
frame subject to detection, and [0187] a motion amount detection
processing block configured to detect a motion amount of an imaging
position of the frame subject to detection relative to an imaging
position of the key frame on the basis of normal-line information
of each of the feature points of the key frame and the frame
subject to detection that are detected by the feature point
matching block.
[0188] (9) The image processing apparatus according to any one of
(1) through (8) above, in which
[0189] the normal-line information generation block generates the
normal-line information by use of a plurality of polarization
images having different polarization directions of the scene at the
observation position.
[0190] (10) The image processing apparatus according to (9) above,
in which
[0191] the normal-line information of the scene at the observation
position generated by the normal-line information generation block
has indefiniteness, and
[0192] the motion amount detection block resolves the
indefiniteness of the normal-line information of the scene at the
observation position so as to detect the motion amount by use of
the normal-line information with the indefiniteness resolved.
[0193] (11) The image processing apparatus according to (9) above,
further including:
[0194] a polarization image acquisition block configured to acquire
the plurality of polarization images having different polarization
directions of the scene at the observation position.
[0195] (12) The image processing apparatus according to (2) above,
further including:
[0196] a data storage block configured to store data including at
least the normal-line information of the scene at the reference
position, in which
[0197] the self-localization estimation block estimates the
observation position by use of the data of the scene at the
reference position stored in the data storage block.
[0198] (13) The image processing apparatus according to any one of
(1) through (12) above, further including:
[0199] an environment mapping block configured to compute a depth
of the scene at the observation position on the basis of the
observation position estimated by the self-localization estimation
block so as to add a three-dimensional point group based on the
computed depth and the observation position to an environment
map.
[0200] (14) The image processing apparatus according to (13) above,
in which
[0201] the environment map includes a three-dimensional position
and normal-line information of the three-dimensional point
group.
INDUSTRIAL APPLICABILITY
[0202] In the image processing apparatus, the image processing
method, and the program according to the present technology,
normal-line information of a scene at an observation position is
generated and, on the basis of the generated normal-line
information, the observation position is estimated. Hence, on the
basis of the normal-line information of a scene corresponding to a
frame subject to detection taken at an observation position and the
normal-line information of a scene corresponding to a key frame
taken at a reference position differing from the observation
position, for example, a motion amount of an imaging position of
the frame subject to detection relative to an imaging position of
the key frame is accurately detected so as to estimate the
observation position. That is, observation positions can be
accurately detected. Therefore, the present technology is
applicable to, for example, devices that generate environment maps
and to robots and unattended-operation devices that require the
function of simultaneously executing self-localization estimation
and environment mapping.
REFERENCE SIGNS LIST
[0203] 10 . . . Image processing apparatus [0204] 20 . . .
Polarization image acquisition block [0205] 30 . . . Normal-line
information generation block [0206] 40, 41, 42, 43, 44 . . . Motion
amount detection block [0207] 50 . . . Data storage block [0208] 60
. . . Environment mapping block [0209] 110 . . . Image sensor
[0210] 111 . . . Polarization filter [0211] 112 . . . Lens [0212]
113 . . . Polarization plate [0213] 401, 441 . . . Feature point
detection block [0214] 402, 442 . . . Feature point matching block
[0215] 403, 443 . . . Rotational amount detection block [0216] 404,
444 . . . Travel amount detection block [0217] 405, 406 . . .
Rotational/travel amount detection block [0218] 4031 . . . Feature
point indefiniteness resolution block [0219] 4032 . . . Computation
block
* * * * *