U.S. patent application number 15/037625, for a method of estimating the speed of displacement of a camera, was published on 2016-10-06.
This patent application is currently assigned to Universite de Nice (UNS). The applicants listed for this patent are CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE and UNIVERSITE DE NICE (UNS). The invention is credited to Andrew Comport and Maxime Meilland.
Application Number: 20160292883 / 15/037625
Document ID: /
Family ID: 50424399
Filed Date: 2016-10-06

United States Patent Application 20160292883
Kind Code: A1
Comport; Andrew; et al.
October 6, 2016
METHOD OF ESTIMATING THE SPEED OF DISPLACEMENT OF A CAMERA
Abstract
This method comprises estimating the speed x.sub.vR of displacement
of a camera by searching for the speed x.sub.vR that minimizes a
difference directly between: a first value of a physical quantity
at the level of a first point (p*) of a reference image, and a
second value of the same physical quantity at the level of a second
point (p.sup.w2) of a current image, the first value of the
physical quantity at the level of the first point (p*) of the
reference image being constructed: by selecting neighbour points of
the first point (p*) as a function of the speed x.sub.vR and of a
time t.sub.e equal to the exposure time of the first camera, then
by averaging the values of the physical quantity at the level of
the selected neighbour points and of the first point so as to
generate a new value of the physical quantity at the level of the
first point.
Inventors: Comport; Andrew (Biot, FR); Meilland; Maxime (Biot, FR)
Applicant: UNIVERSITE DE NICE (UNS), Nice, FR; CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE, Paris, FR
Assignee: Universite de Nice (UNS), Nice, FR; Centre National de la Recherche Scientifique, Paris, FR
Family ID: 50424399
Appl. No.: 15/037625
Filed: November 17, 2014
PCT Filed: November 17, 2014
PCT No.: PCT/EP2014/074764
371 Date: May 18, 2016
Current U.S. Class: 1/1
Current CPC Class: G06T 2207/30241 20130101; G06T 7/74 20170101; G06T 7/248 20170101; H04N 5/247 20130101; G06T 2207/30244 20130101
International Class: G06T 7/20 20060101 G06T007/20; H04N 5/247 20060101 H04N005/247; G06T 7/00 20060101 G06T007/00

Foreign Application Data
Date | Code | Application Number
Nov 18, 2013 | FR | 1361306
Claims
1-15. (canceled)
16. A method for estimating the speed of movement of a first video
camera at the moment at which that first video camera captures a
current image of a three-dimensional scene, this method including:
a) storing in an electronic memory a reference image corresponding
to an image of the same scene captured by a second video camera in
a different pose, the reference image including pixels organized in
parallel rows, the memory containing for each pixel of the
reference image the measurement of a physical quantity measured by
that pixel, that physical quantity being chosen in the group made
up of the intensity of radiation emitted by the point photographed
by that pixel and of a depth separating that pixel from the point
of the scene photographed by that pixel, b) storing in the
electronic memory the current image, the current image including
pixels organized in parallel rows, the memory containing for each
pixel of the current image the measurement of a physical quantity
measured by that pixel, that physical quantity being the same as
the physical quantity measured by the pixels of the reference
image, c) storing in the electronic memory for each pixel of the
reference image or of the current image the measurement of a depth
that separates that pixel from the point of the scene photographed
by that pixel, d) estimating the pose x.sub.pR of the first video
camera, e) estimating the speed x.sub.vR of movement of the first
video camera during the capture of the current image, wherein the
step e) is executed by seeking the speed x.sub.vR that minimizes,
for N points of the reference image, where N is an integer greater
than 10% of the number of pixels of the reference image, a
difference directly between: a first value of the physical quantity
at the level of a first point of the reference image, that first
value being constructed from at least one measurement of that
physical quantity stored in that reference image, and a second
value of the same physical quantity at the level of a second point
of the current image, that second value being constructed from
measurements of that physical quantity stored in the current image
and the coordinates of the second point, the coordinates of the
second point being obtained from a projection of the point of the
scene photographed by the first point onto the plane of the current
image, this projection being a function of the estimated pose
x.sub.pR and of the measurements of the depths stored in the
current or reference image, the first value of the physical
quantity at the level of the first point of the reference image
being constructed: by selecting points adjacent the first point,
each adjacent point corresponding to the projection onto the plane
of the reference image of a third point the coordinates of which
are obtained by shifting the first point a distance
T.sub.2(-tx.sub.vR), where t is a time elapsed since the beginning
of an exposure time t.sub.e, that time being less than or equal to
the exposure time t.sub.e, and T.sub.2( . . . ) is a function that
integrates the speed x.sub.vR during the time t, each adjacent
point corresponding to a respective value of the time t and the
time t.sub.e being equal to the exposure time of the first video
camera, then by averaging the values of the physical quantity at
the level of the selected adjacent points and the first point so as
to generate a new value of the physical quantity at the level of
the first point, that new value constituting an estimate of that
which would be measured if the exposure time of the pixels of the
second video camera were equal to t.sub.e and if the second video
camera were to move at the speed x.sub.vR during the exposure time
t.sub.e, the values of the physical quantity at the level of the
adjacent points being obtained from the measurements stored in the
reference image and the coordinates of the adjacent points.
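The blur-simulating construction of the first value in step e) can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the image stores a scalar physical quantity, `shift` stands in for the function T.sub.2(-tx.sub.vR), `project` maps the shifted third point back onto the reference image plane, and sampling is nearest-neighbour. All function and parameter names are illustrative, not from the patent.

```python
import numpy as np

def blurred_reference_value(ref_image, project, shift, p_star, x_vR, t_e, n_samples=8):
    """Estimate the motion-blurred value at reference point p_star.

    Samples times t in (0, t_e], shifts p_star by the stand-in for
    T2(-t * x_vR), projects each shifted point onto the image plane to
    obtain an adjacent point, then averages the sampled values together
    with the value at the first point itself.
    """
    values = [sample(ref_image, p_star)]          # value at the first point
    for t in np.linspace(0.0, t_e, n_samples + 1)[1:]:
        p3d = shift(p_star, -t * x_vR)            # third point, shifted by T2(-t x_vR)
        p_adj = project(p3d)                      # adjacent point on the image plane
        values.append(sample(ref_image, p_adj))
    return np.mean(values, axis=0)

def sample(image, p):
    """Nearest-neighbour read of the stored physical quantity at point p."""
    u, v = int(round(p[0])), int(round(p[1]))
    h, w = image.shape[:2]
    return image[min(max(v, 0), h - 1), min(max(u, 0), w - 1)]
```

The average approximates what the pixel would have measured had the second video camera moved at the speed x.sub.vR during the exposure time t.sub.e.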
17. The method as claimed in claim 16, wherein the method includes:
providing a current image in which the rows of pixels have been
captured one after the other so that a non-zero time t.sub..DELTA.
elapses between the moments of capture of two successive rows of
the current image, and obtaining the coordinates of the second
point in the plane of the current image: by determining the
coordinates of a third point in the plane of the current image that
corresponds to the projection onto that plane of the point of the
scene photographed by the first point, those coordinates being
determined from the estimated pose x.sub.pR and the measurements of
the depths stored in the current image or the reference image, then
by shifting the third point a distance equal and opposite to the
distance travelled by the first video camera between a time t.sub.1
at which a first row of the current image is captured and a time
t.sub.i at which the row of pixels to which the third point belongs
was captured, that distance being a function of the time
t.sub..DELTA. and the speed x.sub.vR, and finally by projecting the
third point shifted in this way onto the plane of the current image
to obtain the coordinates of the second point.
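The row-dependent correction of claim 17 can be sketched as follows. This sketch assumes T.sub.2 reduces to a pure translation tau * x.sub.vR, whereas the patent's T.sub.2 integrates a full six-coordinate velocity; the function names are illustrative.

```python
import numpy as np

def row_capture_delay(row_index, t_delta):
    """Time elapsed between capture of the first row (t_1) and of the row
    containing the third point (t_i): rows are read out one after the
    other, t_delta apart."""
    return row_index * t_delta

def rolling_shutter_shift(p3, x_vR, tau):
    """Shift the third point by a distance equal and opposite to the one
    travelled by the camera during tau, i.e. by the stand-in for
    T2(-tau * x_vR), here approximated as a pure translation."""
    return p3 - tau * x_vR
```

The second point is then obtained by projecting the shifted third point back onto the plane of the current image.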
18. The method as claimed in claim 17, wherein, during the step e),
the coordinates of the second point are obtained with the aid of
the following relation: p.sup.w2=w.sub.2(T.sub.2(-.tau.x.sub.vR),
p.sup.w1), where: p.sup.w2 and p.sup.w1 are respectively the
coordinates of the second and third points in the plane of the
current image, .tau. is the time that has elapsed between the time
t.sub.1 and the time t.sub.i, T.sub.2(-.tau.x.sub.vR) is a function
that returns the opposite of the distance travelled by the first
video camera between the times t.sub.1 and t.sub.i by integrating
the speed -x.sub.vR during the time .tau., and w.sub.2( . . . ) is
a central projection that returns the coordinates in the plane of
the current image of the third point after it has been shifted by
the distance T.sub.2(-.tau.x.sub.vR), this central projection being
a function of intrinsic parameters of the first video camera
notably including its focal length.
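The claim only states that w.sub.2( . . . ) is a central projection depending on intrinsic parameters such as the focal length. The standard pinhole model below is one common form such a projection can take; it is an assumption for illustration, not the patent's definition of w.sub.2.

```python
import numpy as np

def central_projection(X, fx, fy, cx, cy):
    """Pinhole (central) projection of a camera-frame 3-D point onto the
    image plane, using focal lengths (fx, fy) and principal point
    (cx, cy) as the intrinsic parameters."""
    x, y, z = X
    return np.array([fx * x / z + cx, fy * y / z + cy])
```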
19. The method as claimed in claim 16, wherein the speed x.sub.vR
is a vector with six coordinates coding the speed of movement in
translation and in rotation of the first video camera along three
mutually orthogonal axes so that during the step e) the speed in
translation and in rotation of the first video camera is
estimated.
20. The method as claimed in claim 16, wherein during the step e)
the coordinates of the pose x.sub.pR are considered as being
unknowns to be estimated so that the steps d) and e) are then
executed simultaneously by simultaneously seeking the pose x.sub.pR
and the speed x.sub.vR that minimize the difference between the
first and second values of the physical quantity.
21. The method as claimed in claim 20, wherein, while simultaneously
seeking the pose x.sub.pR and the speed x.sub.vR, the coordinates of
the pose x.sub.pR are defined by the relation
x.sub.pR=t.sub.p x.sub.vR+x.sub.pR-1, where x.sub.pR-1 is the
estimate of the pose of the first video camera at the moment at
which that first video camera captured the preceding current image
and t.sub.p is the time that separates the moment of capture of the
current image from the moment of capture of the preceding current
image by the first video camera so that only six coordinates are to
be estimated during the steps d) and e) to obtain simultaneously
estimates of the speed x.sub.vR and the pose x.sub.pR.
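The pose parameterization of claim 21 can be written componentwise over the six-coordinate vectors; a minimal sketch (names illustrative):

```python
def predicted_pose(x_vR, x_pR_prev, t_p):
    """Pose prediction x_pR = t_p * x_vR + x_pR-1, componentwise over
    6-vectors. Tying the pose to the velocity this way leaves only the
    six velocity coordinates as unknowns in steps d) and e)."""
    return [t_p * v + p for v, p in zip(x_vR, x_pR_prev)]
```

For example, with a previous pose, a candidate velocity, and the inter-frame time t.sub.p, the pose is fully determined, so the joint search of steps d) and e) runs over six coordinates instead of twelve.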
22. The method as claimed in claim 16, wherein the reference image
is an image captured by a second immobile camera.
23. A method for estimating the speed of movement of a first video
camera at the moment at which that first video camera captures a
current image of a three-dimensional scene, this method including:
a) storing in an electronic memory a reference image corresponding
to an image of the same scene captured by a second video camera in
a different pose, the reference image including pixels organized in
parallel rows, the memory containing for each pixel of the
reference image the measurement of a physical quantity measured by
that pixel, that physical quantity being chosen in the group made
up of the intensity of radiation emitted by the point photographed
by that pixel and a depth separating that pixel from the point of
the scene photographed by that pixel, b) storing in the electronic
memory the current image, the current image including pixels
organized in parallel rows, the memory containing for each pixel of
the current image the measurement of a physical quantity measured
by that pixel, that physical quantity being the same as the
physical quantity measured by the pixels of the reference image, c)
storing in the electronic memory for each pixel of the reference
image or of the current image the measurement of a depth that
separates that pixel from the point of the scene photographed by
that pixel, d) estimating a pose x.sub.pR of the first video
camera, e) estimating the speed x.sub.vR of movement of the first
video camera during the capture of the current image, wherein the
step e) is executed by seeking the speed x.sub.vR that minimizes,
for N points of the current image, where N is an integer greater
than 10% of the number of pixels of the current image, a difference
directly between: a first value of the physical quantity at the
level of a first point of the current image, that first value being
constructed from at least one measurement of that physical quantity
stored in that current image, and a second value of the same
physical quantity at the level of a second point of the reference
image, that second value being constructed from measurements of
that physical quantity stored in the reference image and the
coordinates of the second point in the plane of the reference
image, the coordinates of the second point being obtained from a
projection of the point of the scene photographed by the first
point onto the plane of the reference image, this projection being
a function of the estimated pose x.sub.pR and of the measurements
of the depths stored in the current or reference image, the second
value of the physical quantity at the level of the second point of
the reference image being constructed: by selecting points adjacent
the second point, each adjacent point corresponding to the
projection onto the plane of the reference image of a third point
the coordinates of which are obtained by shifting the second point
a distance T.sub.2(-tx.sub.vR), where t is a time elapsed since the
beginning of an exposure time t.sub.e, that time being less than or
equal to the exposure time t.sub.e, and T.sub.2( . . . ) is a
function that integrates the speed x.sub.vR during the time t, each
adjacent point corresponding to a respective value of the time t
and the time t.sub.e being equal to the exposure time of the first
video camera, then by averaging the values of the physical quantity
at the level of the selected adjacent points and the second point
so as to generate a new value of the physical quantity at the level
of the second point, that new value constituting an estimate of
that which would be measured if the exposure time of the pixels of
the second video camera were equal to t.sub.e and if the second
video camera were to move at the speed x.sub.vR during the exposure
time t.sub.e, the values of the physical quantity at the level of
the adjacent points being obtained from the measurements stored in
the reference image and the coordinates of the adjacent points.
24. The method as claimed in claim 23, wherein the method includes:
providing a current image in which the rows of pixels have been
captured one after the other so that a non-zero time t.sub..DELTA.
elapses between the moments of capture of two successive rows of
the current image, and obtaining the coordinates of the second
point in the plane of the reference image: by determining the
coordinates of a third point in the plane of the reference image
that corresponds to the projection onto that plane of the point of
the scene photographed by the first point, those coordinates being
determined from the estimated pose x.sub.pR and the measurements of
the depths stored in the current image or the reference image, then
by shifting the third point a distance equal to and in the same
direction as the distance traveled by the first video camera
between a time t.sub.1 at which a first row of the current image is
captured and a time t.sub.i at which the row of pixels to which the
first point belongs was captured, that distance being a function of
the time t.sub..DELTA. and the speed x.sub.vR, and finally by
projecting the third point shifted in this way onto the plane of
the reference image to obtain the coordinates of the second
point.
25. The method as claimed in claim 24, wherein, during the step e),
the coordinates of the second point are obtained with the aid of
the following relation: p.sup.w5=w.sub.5(T.sub.2(.tau.x.sub.vR),
p.sup.w4), where: p.sup.w5 and p.sup.w4 are respectively the
coordinates of the second and third points in the plane of the
reference image, .tau. is the time that has elapsed between the
time t.sub.1 and the time t.sub.i, T.sub.2(.tau.x.sub.vR) is a
function that returns the distance travelled by the first video
camera between the times t.sub.1 and t.sub.i by integrating the
speed x.sub.vR during the time .tau., and w.sub.5( . . . ) is a
central projection that returns the coordinates in the plane of the
reference image of the third point after it has been shifted by the
distance T.sub.2(.tau.x.sub.vR), this central projection being a
function of intrinsic parameters of the second video camera notably
including its focal length.
26. The method as claimed in claim 23, wherein the speed x.sub.vR
is a vector with six coordinates coding the speed of movement in
translation and in rotation of the first video camera along three
mutually orthogonal axes so that during the step e) the speed in
translation and in rotation of the first video camera is
estimated.
27. The method as claimed in claim 23, wherein during the step e)
the coordinates of the pose x.sub.pR are considered as being
unknowns to be estimated so that the steps d) and e) are then
executed simultaneously by simultaneously seeking the pose x.sub.pR
and the speed x.sub.vR that minimize the difference between the
first and second values of the physical quantity.
28. The method as claimed in claim 27, wherein, while simultaneously
seeking the pose x.sub.pR and the speed x.sub.vR, the coordinates of
the pose x.sub.pR are defined by the relation
x.sub.pR=t.sub.p x.sub.vR+x.sub.pR-1, where x.sub.pR-1 is the
estimate of the pose of the first video camera at the moment at
which that first video camera captured the preceding current image
and t.sub.p is the time that separates the moment of capture of the
current image from the moment of capture of the preceding current
image by the first video camera so that only six coordinates are to
be estimated during the steps d) and e) to obtain simultaneously
estimates of the speed x.sub.vR and the pose x.sub.pR.
29. The method as claimed in claim 23, wherein the reference image
is an image captured by a second immobile camera.
30. The method according to claim 20, wherein the method further
comprises a construction of a trajectory of the first video
camera, said construction including: a) acquiring a
three-dimensional model of the scene, b) storing in an electronic
memory a succession of temporally ordered images captured by the
first video camera during its movement within the scene, each image
including pixels organized in parallel rows, the memory containing
for each pixel of the current image a measurement of a physical
quantity chosen in the group made up of the intensity of radiation
emitted by the point photographed by that pixel and a depth
separating that pixel from the photographed point of the scene, c)
for each current image: constructing or selecting from the
three-dimensional model of the scene a reference image including
pixels that have photographed the same points of the scene as the
pixels of the current image, estimating said pose x.sub.pR of the
first video camera at the moment at which the latter captures that
current image, constructing the trajectory of the first video
camera from the various estimated poses of the first video
camera.
31. The method according to claim 27, wherein the method further
comprises a construction of a trajectory of the first video camera,
that method including: a) acquiring a three-dimensional model of
the scene, b) storing in an electronic memory a succession of
temporally ordered images captured by the first video camera during
its movement within the scene, each image including pixels
organized in parallel rows, the memory containing for each pixel of
the current image a measurement of a physical quantity chosen in
the group made up of the intensity of radiation emitted by the
point photographed by that pixel and a depth separating that pixel
from the photographed point of the scene, c) for each current
image: constructing or selecting from the three-dimensional model
of the scene a reference image including pixels that have
photographed the same points of the scene as the pixels of the
current image, estimating said pose x.sub.pR of the first video
camera at the moment at which the latter captures that current
image, constructing the trajectory of the first video camera from
the various estimated poses of the first video camera.
32. The method according to claim 16, wherein the method comprises
a processing of a current image of a three-dimensional scene, the
current image including pixels organized in parallel rows, that
method including: a) estimating the speed x.sub.vR of movement of a
first video camera at the moment at which that video camera
captured the current image, b) automatically modifying the current
image to correct the current image as a function of the estimated
speed x.sub.vR so as to limit the distortions of the current image
caused by the motion blur.
33. The method according to claim 23, wherein the method comprises
a processing of a current image of a three-dimensional scene, the
current image including pixels organized in parallel rows, that
method including: a) estimating the speed x.sub.vR of movement of a
first video camera at the moment at which that video camera
captured the current image, b) automatically modifying the current
image to correct the current image as a function of the estimated
speed x.sub.vR so as to limit the distortions of the current image
caused by the motion blur.
34. An information storage medium containing instructions for the
execution of a method as claimed in claim 16 when those
instructions are executed by an electronic computer.
35. A system for estimating the speed of movement of a first video
camera at the moment at which that first video camera captures a
current image of a three-dimensional scene, that system including:
an electronic memory containing: a reference image corresponding to
an image of the same scene captured by a second video camera in a
different pose, the reference image including pixels organized in
parallel rows, the memory containing for each pixel of the
reference image the measurement of a physical quantity measured by
that pixel, that physical quantity being chosen in the group made
up of the intensity of radiation emitted by the point photographed
by that pixel and a depth separating that pixel from the point of
the scene photographed by that pixel, the current image, the
current image including pixels organized in parallel rows, the
memory containing for each pixel of the current image the
measurement of a physical quantity measured by that pixel, that
physical quantity being the same as the physical quantity measured
by the pixels of the reference image, for each pixel of the
reference image or the current image, a measurement of a depth that
separates that pixel from the point of the scene photographed by
that pixel, an information processing unit adapted to: estimate a
pose x.sub.pR of the first video camera, estimate the speed
x.sub.vR of movement of the first video camera during the capture
of the current image, wherein the information processing unit is
able to estimate the speed x.sub.vR by seeking the speed x.sub.vR
that minimizes, for N points of the reference image, where N is an
integer greater than 10% of the number of pixels of the reference
image, a difference directly between: a first value of the physical
quantity at the level of a first point of the reference image, that
first value being constructed from at least one measurement of that
physical quantity stored in that reference image, and a second
value of the same physical quantity at the level of a second point
of the current image, that second value being constructed from
measurements of that physical quantity stored in the current image
and the coordinates of the second point, the coordinates of the
second point being obtained from a projection of the point of the
scene photographed by the first point onto the plane of the current
image, this projection being a function of the estimated pose
x.sub.pR and of the measurements of the depths stored in the
current or reference image, the first value of the physical
quantity at the level of the first point of the reference image
being constructed: by selecting points adjacent the first point,
each adjacent point corresponding to the projection onto the plane
of the reference image of a third point the coordinates of which
are obtained by shifting the first point a distance
T.sub.2(-tx.sub.vR), where t is a time elapsed since the beginning
of an exposure time t.sub.e, that time being less than the exposure
time t.sub.e, and T.sub.2( . . . ) is a function that integrates
the speed -x.sub.vR during the time t, each adjacent point
corresponding to a respective value of the time t and the time
t.sub.e being equal to the exposure time of the first video camera,
then by averaging the values of the physical quantity at the level
of the selected adjacent points and the first point so as to
generate a new value of the physical quantity at the level of the
first point, that new value constituting an estimate of that which
would be measured if the exposure time of the pixels of the second
video camera were equal to t.sub.e and if the second video camera
were to move at the speed x.sub.vR during the exposure time
t.sub.e, the values of the physical quantity at the level of the
adjacent points being obtained from the measurements stored in the
reference image and the coordinates of the adjacent points.
36. A system for estimating the speed of movement of a first video
camera at the moment at which that first video camera captures a
current image of a three-dimensional scene, that system including:
an electronic memory containing: a reference image corresponding to
an image of the same scene captured by a second video camera in a
different pose, the reference image including pixels organized in
parallel rows, the memory containing for each pixel of the
reference image the measurement of a physical quantity measured by
that pixel, that physical quantity being chosen in the group made
up of the intensity of radiation emitted by the point photographed
by that pixel and a depth separating that pixel from the point of
the scene photographed by that pixel, the current image, the
current image including pixels organized in parallel rows, the
memory containing for each pixel of the current image the
measurement of a physical quantity measured by that pixel, that
physical quantity being the same as the physical quantity measured
by the pixels of the reference image, for each pixel of the
reference image or the current image, a measurement of a depth that
separates that pixel from the point of the scene photographed by
that pixel, an information processing unit adapted to: estimate the
pose x.sub.pR of the first video camera, estimate the speed
x.sub.vR of movement of the first video camera during the capture
of the current image, wherein the processing unit is able to
estimate the speed x.sub.vR by seeking the speed x.sub.vR that
minimizes, for N points of the current image, where N is an integer
greater than 10% of the number of pixels of the current image, a
difference directly between: a first value of the physical quantity
at the level of a first point of the current image constructed from
at least one measurement of that physical quantity stored in that
current image, and a second value of the same physical quantity at
the level of a second point of the reference image, that second
value being constructed from measurements of that physical quantity
stored in the reference image and the coordinates of the second
point, the coordinates of the second point being obtained from a
projection of the point of the scene photographed by the first
point onto the plane of the reference image, this projection being
a function of the estimated pose x.sub.pR and of the measurements
of the depths stored in the current or reference image, the second
value of the physical quantity at the level of the second point of
the reference image being constructed: by selecting points adjacent
the second point, each adjacent point corresponding to the
projection onto the plane of the reference image of a third point
the coordinates of which are obtained by shifting the second point
a distance T.sub.2(-tx.sub.vR), where t is a time elapsed since the
beginning of an exposure time t.sub.e, that time being less than
the exposure time t.sub.e, and T.sub.2( . . . ) is a function that
integrates the speed x.sub.vR during the time t, each adjacent
point corresponding to a respective value of the time t and the
time t.sub.e being equal to the exposure time of the first video
camera, then by averaging the values of the physical quantity at
the level of the selected adjacent points and the second point so
as to generate a new value of the physical quantity at the level of
the second point, that new value constituting an estimate of that
which would be measured if the exposure time of the pixels of the
second video camera were equal to t.sub.e and if the second video
camera were to move at the speed x.sub.vR during the exposure time
t.sub.e, the values of the physical quantity at the level of the
adjacent points being obtained from the measurements stored in the
reference image and the coordinates of the adjacent points.
Description
RELATED APPLICATIONS
[0001] This application is the national stage, under 35 U.S.C. § 371, of
PCT application PCT/EP2014/074764, filed on Nov. 17, 2014, which
claims the benefit of the Nov. 18, 2013 priority date of French
application 1361306, the content of which is herein incorporated by
reference.
FIELD OF INVENTION
[0002] The invention concerns a method and a system for estimating
the speed of movement of a video camera at the moment when that
video camera is capturing a current image of a three-dimensional
scene. The invention also concerns a method for constructing the
trajectory of a video camera and a method for processing an image
using the method for estimating the speed of movement. The
invention further concerns an information storage medium for
implementing those methods.
BACKGROUND
[0003] It is well known that moving a video camera while it is
capturing an image distorts the captured image. For example,
"motion blur" appears. This is caused by the fact that, to measure
the luminous intensity of a point of a scene, each pixel must
remain exposed to the light emitted by that point for an exposure
time t.sub.e. If the video camera moves during this time t.sub.e,
the pixel is not exposed to light from a single point but to the
light emitted by a plurality of points. The luminous intensity
measured by this pixel is then a mixture of the contributions of
those points, which causes motion blur to appear.
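The integration described above can be made concrete with a toy example; the intensity values below are hypothetical, for illustration only:

```python
# A pixel stays exposed for the whole exposure time t_e. If the camera
# moves, the pixel successively receives light from several scene points,
# and what it records is the average of those contributions.
intensities_seen = [200.0, 180.0, 150.0, 120.0]   # one sample per instant of t_e
measured = sum(intensities_seen) / len(intensities_seen)
# `measured` mixes several points' intensities: this mixing is motion blur.
```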
[0004] Nowadays, there also exist increasing numbers of rolling
shutter video cameras. In those video cameras, the rows of pixels
are captured one after the other, so that, in the same image, the
moment of capturing one row of pixels is offset temporally by a
time t.sub..DELTA. from the moment of capturing the next row of
pixels. If the video camera moves during the time t.sub..DELTA.,
that creates distortion of the captured image even if the exposure
time t.sub.e is considered negligible.
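The size of the rolling-shutter effect is easy to estimate. The numbers below are hypothetical (a 480-row image at 30 frames per second whose readout spans the whole frame period):

```python
# Rolling shutter: row i is captured i * t_delta after the first row,
# so even with a negligible exposure time t_e, camera motion during
# readout skews the image.
t_delta = (1.0 / 30.0) / 480.0        # inter-row delay, in seconds
speed = 1.0                           # camera translation speed, in m/s
readout_span = 479 * t_delta          # delay between first and last row
displacement = speed * readout_span   # metres moved during readout
```

Even at walking speed the camera moves several centimetres between the first and last row, which is why the speed x.sub.vR must be estimated to correct the distortion.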
[0005] To correct such distortion, it is necessary to estimate
correctly the speed of the video camera at the moment at which it
captures the image.
[0006] To this end, methods known to the inventors for estimating
the speed of movement of a first video camera at the moment when
that first video camera is capturing a current image of a
three-dimensional scene have been developed. These known methods,
called feature-based methods, include steps of extracting
particular points, known as features, in each image. The features
extracted from the reference image and the current image must then
be matched. These steps of extracting and matching features are
ill-conditioned, affected by noise, and not robust. They are
therefore complex to implement.
[0007] The speed of movement of the first video camera is then
estimated on the basis of the speed of movement of these features
from one image to the next. However, it is desirable to simplify
the known methods.
SUMMARY
[0008] To this end, the invention concerns a first method in
accordance with claim 1 for estimating the speed of movement of a
first video camera at the moment at which that first video camera
is capturing a current image of a three-dimensional scene.
[0009] The invention also consists in a second method in accordance
with claim 4 for estimating the speed of movement of a first video
camera at the moment at which that first video camera is capturing
a current image of a three-dimensional scene.
[0010] The above methods do not use any step of extracting features
in the images or of matching those features between successive
images. Instead, they directly minimize a difference between measured
physical quantities in the reference image and in the current image
for a large number of pixels of those images. This simplifies the
method.
[0011] Moreover, given that the difference between the physical
quantities is calculated for a very large number of points of the
images, i.e. for more than 10% of the pixels of the current or
reference image, the number of differences to be minimized is much
greater than the number of unknowns to be estimated. In particular,
the number of differences taken into account to estimate the speed
is much greater than in the case of feature-based methods. There is
therefore a higher level of information redundancy, which makes the
above method more robust than the feature-based methods.
[0012] It will also be noted that in the above method only one of
the images has to associate a depth with each pixel. The first or
second video camera can therefore be a simple monocular video
camera incapable of measuring the depth that separates it from the
photographed scene.
[0013] Finally, the above methods make it possible to estimate the
speed accurately even in the presence of a motion blur in the
current image. To this end, in the above methods, a corresponding
motion blur is added to the values of the physical quantity
constructed from the measurements stored in the current or
reference image. The first and second values of the physical
quantity are therefore both affected by the same motion blur, which
improves the estimate of the speed.
[0014] The embodiments of these methods may include one or more of
the features of the dependent claims.
[0015] These embodiments of the methods for estimating the speed
moreover have the following advantages: [0016] determining the
coordinates of one of the first and second points by taking into
account the displacement of the first camera for the duration
t.sub..DELTA. makes it possible to improve the estimation of the
speed in the presence of a deformation of the current image caused
by the rolling shutter capture of the pixels; [0017] carrying out the
steps d) and e) simultaneously makes it possible to estimate
simultaneously the pose and the speed of the first video camera and
therefore to reconstruct its trajectory in the photographed scene
without recourse to additional sensors such as an inertial sensor;
[0018] estimating the pose from the speed x.sub.vR and the time
t.sub.p elapsed between capturing two successive current images
makes it possible to limit the number of unknowns to be estimated,
which simplifies and accelerates the estimation of this speed
x.sub.vR; [0019] taking the speed of movement in translation and in
rotation as unknowns makes it possible to estimate simultaneously
the speed in translation and in rotation of the first video
camera.
[0020] The invention further consists in a method in accordance
with claim 11 for constructing the trajectory of a first video
camera.
[0021] The invention further consists in a method in accordance
with claim 12 for processing a current image of a three-dimensional
scene.
[0022] The invention further consists in an information storage
medium containing instructions for executing one of the above
methods when those instructions are executed by an electronic
computer.
[0023] The invention further consists in a first system in
accordance with claim 14 for estimating the speed of movement of a
first video camera at the moment at which that first video camera
is capturing a current image of a three-dimensional scene.
[0024] Finally, the invention further consists in a second system
in accordance with claim 15 for estimating the speed of movement of
a first video camera at the moment at which that first video camera
is capturing a current image of a three-dimensional scene.
[0025] The invention will be better understood on reading the
following description, which is given by way of nonlimiting example
only and with reference to the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0026] FIG. 1 is a diagrammatic illustration of a system for
estimating the speed of movement of a video camera at the moment at
which the latter is capturing an image and for processing and
correcting the images so captured;
[0027] FIGS. 2A and 2B are timing diagrams showing the moments of
acquisition of different rows of pixels, firstly in the case of a
rolling shutter video camera and secondly in the case of a global
shutter video camera;
[0028] FIG. 3 is a diagrammatic illustration of a step for
determining corresponding points between a reference image and a
current image;
[0029] FIG. 4 is a flowchart of a method for estimating the speed
of a video camera and for processing the images captured by that
video camera;
[0030] FIG. 5 is an illustration of another embodiment of a video
camera that can be used in the system from FIG. 1;
[0031] FIG. 6 is a flowchart of a method for estimating the speed
of the video camera from FIG. 5;
[0032] FIG. 7 is a partial illustration of another method for
estimating the speed of the video camera of the system from FIG.
1;
[0033] FIG. 8 is a diagrammatic illustration of a step of
determining corresponding points in the current image and the
reference image;
[0034] FIG. 9 is a timing diagram showing the evolution over time
of the error between the estimated speed and the real speed in four
different situations.
[0035] In these figures, the same references are used to designate
the same elements.
DETAILED DESCRIPTION
[0036] In the remainder of this description, features and functions
well known to a person skilled in the art are not described in
detail. For a description of the technological background and the
notation and concepts used in this description, the reader may
refer to the following book L1: Y. MA, S. SOATTO, J. KOSECKA &
S. SHANKAR SASTRY, "An Invitation to 3-D Vision: From Images to
Geometric Models", Springer, 2004.
[0037] FIG. 1 represents an image processing system 2 for
estimating the pose x.sub.pR and the speed x.sub.vR of a video
camera at the moment at which the latter is acquiring a current
image. This system is also adapted to use the estimated pose
x.sub.pR and speed x.sub.vR to construct the trajectory of the
video camera and/or to process the current images in order to
correct them.
[0038] This system 2 includes a video camera 4 that captures a
temporally ordered series of images of a three-dimensional scene 6.
The video camera 4 is mobile, i.e. it is movable within the scene 6
along a trajectory that is not known in advance. For example, the
video camera 4 is transported and moved by hand by a user or fixed
to a robot or a remote-controlled vehicle that moves inside the
scene 6. Here the video camera 4 is freely movable in the scene 6
so that its pose x.sub.pR, i.e. its position and its orientation,
is a vector in six unknowns.
[0039] The scene 6 is a three-dimensional space. It may be a space
situated inside a building such as an office, a kitchen or
corridors. It may equally be an exterior space such as a road, a
town or a terrain.
[0040] The video camera 4 records the ordered series of captured
images in an electronic memory 10 of an image processing unit 12.
Each image includes pixels organized into parallel rows. Here,
these pixels are organized in columns and rows. Each pixel
corresponds to an individual sensor that measures a physical
quantity. Here the measured physical quantity is chosen from
radiation emitted by a point of the scene 6 and the distance
separating this pixel from this point of the scene 6. This distance
is referred to as the "depth". In this first embodiment, the pixels
of the video camera 4 measure only the intensity of the light
emitted by the photographed point of the scene. Here each pixel
measures in particular the color of the point of the scene
photographed by this pixel. This color is coded using the RGB
(Red-Green-Blue) model, for example.
[0041] Here the video camera 4 is a rolling shutter video camera.
In such a video camera the rows of pixels are captured one after
the other, in contrast to what happens in a global shutter video
camera.
[0042] FIGS. 2A and 2B show more precisely the features of the
video camera 4 compared to those of a global shutter video camera.
In the graphs of FIGS. 2A and 2B the horizontal axis represents
time and the vertical axis represents the number of the row of
pixels. In these graphs each row of pixels, and therefore each
image, is captured with a period t.sub.p. The time necessary for
capturing the luminous intensity measured by each pixel of the same
row is represented by a shaded block 30. Each block 30 is preceded
by a time t.sub.e of exposure of the pixels to the light rays to be
measured. This time t.sub.e is represented by rectangles 32. Each
exposure time t.sub.e is itself preceded by a time for
reinitialization of the pixels represented by blocks 34.
[0043] In FIG. 2A, the pixels are captured by the video camera 4
and in FIG. 2B the pixels are captured by a global shutter video
camera. Accordingly, in FIG. 2A the blocks 30 are offset temporally
relative to one another because the various rows of the same image
are captured one after the other and not simultaneously as in FIG.
2B.
[0044] The time t.sub..DELTA. that elapses between the moments of
capturing two successive rows of pixels is non-zero in FIG. 2A.
Here it is assumed that the time t.sub..DELTA. is the same
whichever pair of successive rows may be selected in the image
captured by the video camera 4.
[0045] Moreover, it is assumed hereinafter that the time
t.sub..DELTA. is constant over time. Because of the existence of
this time t.sub..DELTA., a complete image can be captured by the
video camera 4 only in a time t.sub.r equal to the sum of the times
t.sub..DELTA. that separate the moments of capture of the various
rows of the complete image.
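The row timing described above can be sketched in a few lines; this is a minimal pure-Python model, assuming a constant inter-row delay t.sub..DELTA., and all function names are ours (none appear in the application):

```python
def row_capture_times(n_rows, t_delta, t_start=0.0):
    """Capture instant of each pixel row of a rolling shutter camera:
    rows are read one after the other, each offset by t_delta from the
    previous one (illustrative model)."""
    return [t_start + i * t_delta for i in range(n_rows)]

def readout_time(n_rows, t_delta):
    """Total time t_r needed to capture a complete image: the sum of
    the n_rows - 1 inter-row delays t_delta."""
    return (n_rows - 1) * t_delta

times = row_capture_times(480, 50e-6)   # 480 rows, 50 us between rows
t_r = readout_time(480, 50e-6)
```

A global shutter camera corresponds to the degenerate case t_delta = 0, for which all rows share the same capture instant and t_r vanishes.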
[0046] As indicated above, it is well known that if the video
camera 4 moves between the moments of capturing one row and the
next, this introduces distortion into the captured image. This
distortion is referred to hereinafter as "RS distortion".
[0047] Moreover, it is also well known that if the video camera 4
moves during the exposure time t.sub.e, this causes the appearance
of motion blur in the image. This distortion is referred to
hereinafter as "MB distortion".
[0048] In the remainder of this description it is assumed that the
images captured by the video camera 4 are simultaneously affected
by these two types of distortion, i.e. by RS distortion and MB
distortion. The following methods therefore take account
simultaneously of these two types of distortion.
[0049] Conventionally, each video camera is modeled by a model
making it possible to determine from the coordinates of a point of
the scene the coordinates of the point in the image plane that has
photographed that point. The plane of an image is typically the
plane of the space situated between a projection center C and the
photographed scene onto which a central projection, with center C,
of the scene makes it possible to obtain an image identical to that
photographed by the video camera. For example, the pinhole model is
used. More information on this model can be found in the following
papers: [0050] FAUGERAS, O. (1993). Three-Dimensional Computer
Vision: A Geometric Viewpoint. MIT Press, Cambridge, MA. [0051]
HARTLEY, R. I. & ZISSERMAN, A. (2004). Multiple View Geometry
in Computer Vision. Cambridge University Press, 2nd edn.
[0052] In these models, the position of each pixel is identified by
the coordinates of a point p in the image plane. Hereinafter, to
simplify the description, this point p is considered to be located
at the intersection of an axis AO passing through the point PS of
the scene photographed by this pixel (FIG. 1) and a projection
center C. The projection center C is located at the intersection of
all the optical axes of all the pixels of the image. The position
of the center C relative to the plane of the image in a
three-dimensional frame of reference F tied with no degree of
freedom to the video camera 4 is an intrinsic feature of the video
camera 4. This position depends on the focal length of the video
camera, for example. All the intrinsic parameters of the video
camera 4 that make it possible to locate the point p corresponding
to the projection of the point PS onto the plane PL along the axis
OA are typically grouped together in a matrix known as the matrix
of the intrinsic parameters of the video camera or the "intrinsic
matrix". This matrix is denoted K. It is typically written in the
following form:
K = \begin{bmatrix} f & s & u_0 \\ 0 & f\,r & v_0 \\ 0 & 0 & 1 \end{bmatrix}
where:
[0053] f is the focal length of the video camera expressed in
pixels,
[0054] s is the shear factor,
[0055] r is the dimensions ratio of a pixel, and
[0056] the pair (u.sub.0, v.sub.0) corresponds to the position
expressed in pixels of the principal point, i.e. typically the
center of the image.
[0057] For a video camera of good quality the shear factor is
generally zero and the dimensions ratio close to 1. This matrix K
is notably used to determine the coordinates of the point p
corresponding to the projection of the point PS onto the plane PL
of the video camera. For example, the matrix K may be obtained
during a calibration phase. For example, such a calibration phase
is described in the following papers: [0058] TSAI, R. Y. (1992).
A versatile camera calibration technique for high-accuracy 3D
machine vision metrology using off-the-shelf TV cameras and lenses.
In Radiometry, 221-244. [0059] HEIKKILA, J. & SILVEN, O. (1997).
A four-step camera calibration procedure with implicit image
correction. In IEEE International Conference on Computer Vision and
Pattern Recognition, 1106-1112. [0060] ZHANG, Z. (1999). Flexible
camera calibration by viewing a plane from unknown orientations. In
International Conference on Computer Vision, 666-673.
[0061] It is equally possible to obtain this matrix K from an image
of an object or from a calibration pattern the dimensions of which
are known, such as a checkerboard or circles.
[0062] For a fixed focal length lens, this matrix is constant over
time. To facilitate the following description, it will therefore be
assumed that this matrix K is constant and known.
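The role of the matrix K can be illustrated with a minimal pinhole-projection sketch in pure Python; the function names and numeric values are ours, with the shear s zero and the dimensions ratio r equal to 1, as is typical for a good-quality camera:

```python
def make_K(f, s, r, u0, v0):
    """Intrinsic matrix K as a 3x3 list of rows, in the form given
    above: [[f, s, u0], [0, f*r, v0], [0, 0, 1]]."""
    return [[f,     s, u0],
            [0.0, f*r, v0],
            [0.0, 0.0, 1.0]]

def project(K, v):
    """Central (pinhole) projection of a 3-D point v = (X, Y, Z),
    expressed in the camera frame F, onto the image plane:
    p ~ K @ (X/Z, Y/Z, 1)."""
    X, Y, Z = v
    x, y = X / Z, Y / Z
    u = K[0][0] * x + K[0][1] * y + K[0][2]   # f*x + s*y + u0
    w = K[1][1] * y + K[1][2]                 # f*r*y + v0
    return (u, w)

K = make_K(f=500.0, s=0.0, r=1.0, u0=320.0, v0=240.0)
p = project(K, (0.2, -0.1, 2.0))   # a scene point 2 m in front of the camera
```

Dividing by Z first and then applying K reproduces the central projection through the center C onto the plane PL described in the text.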
[0063] In this description, the pose of the video camera 4 is
denoted x.sub.pR, i.e. its position and its orientation in a frame
of reference R tied with no degree of freedom to the scene 6. Here
the frame of reference R includes two mutually orthogonal
horizontal axes X and Y and vertical axis Z. The pose x.sub.pR is
therefore a vector with six coordinates of which three are for
representing its position in the frame of reference R plus three
other coordinates for representing the inclination of the video
camera 4 relative to the axes X, Y and Z. For example, the position
of the video camera 4 is identified in the frame of reference R by
the coordinates of its projection center. Similarly, by way of
illustration, the axis used to identify the inclination of the
video camera 4 relative to the axes in the frame of reference R is
the optical axis of the video camera 4.
[0064] Hereinafter, it is assumed that the video camera 4 is
capable of moving with six degrees of freedom, so that the six
coordinates of the pose of the video camera 4 are unknowns that
must be estimated.
[0065] Also x.sub.vR denotes the speed of the video camera 4, i.e.
its speed in translation and in rotation expressed in the frame of
reference R. The speed x.sub.vR is a vector with six coordinates,
three of which coordinates correspond to the speed of the video
camera 4 in translation along the axes X, Y and Z, and of which
three other coordinates correspond to the angular speeds of the
video camera 4 about its axes X, Y and Z.
[0066] For each image captured by the video camera 4 and for each
pixel of that image, the following information is stored in the
memory 10:
[0067] the coordinates in the plane of the image of a point p
identifying the position of the pixel in the plane PL of the
image,
[0068] a measurement of the luminous intensity I(p) measured by
this pixel.
[0069] Here I( . . . ) is the function that associates with each
point of the plane PL of the image the measured or interpolated
intensity at that point.
[0070] The processing unit 12 is a unit capable of processing the
images captured by the video camera 4 to estimate the pose x.sub.pR
and the speed x.sub.vR of that video camera at the moment at which
it captures an image. Moreover the unit 12 is also capable here
of:
[0071] constructing the trajectory of the video camera 4 in the
frame of reference R on the basis of the successive estimated poses
x.sub.pR, and
[0072] correcting the images captured by the video camera 4 to
eliminate or limit the RS or MB distortions.
[0073] To this end, the unit 12 includes a programmable electronic
calculator 14 capable of executing instructions stored in the
memory 10. The memory 10 notably contains the instructions
necessary for executing any one of the methods from FIGS. 4, 6 and
7.
[0074] The system 2 also includes a device 20 used to construct a
three-dimensional model 16 of the scene 6. The model 16 makes it
possible to construct reference augmented images. Here "augmented
image" designates an image including, for each pixel, in addition
to the intensity measured by that pixel, a measurement of the depth
that separates that pixel from the point of the scene that it
photographs. The measurement of the depth therefore makes it
possible to obtain the coordinates of the scene photographed by
that pixel. Those coordinates are expressed in the
three-dimensional frame of reference tied with no degree of freedom
to the video camera that has captured this augmented image. Those
coordinates typically take the form of a triplet (x, y, D(p)),
where: [0075] x and y are the coordinates of the pixel in the plane
PL of the image, and [0076] D(p) is the measured depth that
separates this pixel from the point PS of the scene that it has
photographed.
[0077] The function D associates with each point p of the augmented
image the measured or interpolated depth D(p).
[0078] Here the device 20 includes an RGB-D video camera 22 and a
processing unit 24 capable of estimating the pose of the video
camera 22 in the frame of reference R. The video camera 22 is a
video camera that measures both the luminous intensity I*(p*) of
each point of the scene and the depth D*(p*) that separates that
pixel from the photographed point of the scene. The video camera 22
is preferably a global shutter video camera. Such video cameras are
sold by the company Microsoft.RTM., such as the Kinect video
camera, for example, or by the company ASUS.RTM..
[0079] Hereinafter, the coordinates of the point PS are called
vertices and denoted "v*" when they are expressed in the frame of
reference F* tied with no degree of freedom to the video camera 22
and "v" when they are expressed in the frame of reference F. In a
similar way, all the data relating to the video camera 22 is
followed by the symbol "*" to differentiate it from the same data
relating to the video camera 4.
[0080] For example, the unit 24 is equipped with a programmable
electronic calculator and a memory containing the instructions
necessary for executing a simultaneous localization and mapping
(SLAM) process. For more details of these simultaneous localization
and mapping processes, the reader may refer to the introduction to
the following paper A1: M. Meilland and A. I. Comport, "On unifying
key-frame and voxel-based dense visual SLAM at large scales", IEEE
International Conference on Intelligent Robots and Systems, Nov.
3-8, 2013, Tokyo.
[0081] For example, the unit 24 is in the video camera 22.
[0082] The model 16 is constructed by the device 20 and stored in
the memory 10. In this embodiment, the model 16 is a database in
which the various reference images I* are stored. Moreover, in this
database, the pose x.sub.pR* of the video camera 22 at the moment
at which the latter captured the image I* is associated with each
of those images I*. Such a three-dimensional model of the scene 6
is known as a key-frame model. More information about such a model
can be found in the paper A1 cited above.
[0083] The operation of the system 2 will now be described with
reference to the method of FIG. 4.
[0084] The method begins with a learning phase 50 in which the
model 16 is constructed and then stored in the memory 10. To this
end, during a step 52, for example, the video camera 22 is moved
within the scene 6 to capture numerous reference images I* based on
numerous different poses. During this step, the video camera 22 is
moved slowly so that there is negligible motion blur in the
reference images. Moreover, such a slow movement also eliminates
the distortions caused by the rolling shutter effect.
[0085] In parallel with this, during a step 54, the unit 24
estimates the successive poses of the video camera 22 for each
captured reference image I*. It will be noted that this step 54 may
also be carried out after the step 52, i.e. once all the reference
images have been captured.
[0086] Then, in a step 56, the model 16 is constructed and then
stored in the memory 10. To this end, a plurality of reference
images and the pose of the video camera 22 at the moment at which
those reference images were captured are stored in a database.
[0087] Once the learning phase has ended, a utilization phase 60
may then follow.
[0088] During this phase 60, and to be more precise during a step
62, the video camera 4 is moved along an unknown trajectory within
the scene 6. As the video camera 4 is moved, it captures a temporal
succession of images based on different unknown poses. Each
captured image is stored in the memory 10. During the step 62 the
video camera 4 is moved at a high speed, i.e. a speed sufficient
for RS and MB distortions to be perceptible in the captured
images.
[0089] In parallel with this, during a step 64, the unit 12
processes each image acquired by the video camera 4 in real time to
estimate the pose x.sub.pR and the speed x.sub.vR of the video
camera 4 at the moment at which that image was captured.
[0090] Here "in real time" refers to the fact that the estimation
of the pose x.sub.pR and the speed x.sub.vR of the video camera 4
is effected as soon as an image is captured by the video camera 4
and terminates before the next image is captured by the same video
camera 4. Thereafter, the image captured by the video camera 4 used
to determine the pose of that video camera at the moment of the
capture of that image is referred to as the "current image".
[0091] For each current image acquired by the video camera 4, the
following operations are reiterated. During an operation 66, the
unit 12 selects or constructs a reference image I* that has
photographed a large number of points of the scene 6 common with
those that have been photographed by the current image. For
example, to this end, a rough estimate is obtained of the pose
x.sub.pR of the video camera 4 after which there is selected in the
model 16 the reference image whose pose is closest to this rough
estimate of the pose x.sub.pR. The rough estimate is typically
obtained by interpolation based on the latest poses and speeds
estimated for the video camera 4. For example, in a simplified
situation, the rough estimate of the pose x.sub.pR is taken as
equal to the last pose estimated for the video camera 4. Such an
approximation is acceptable because the current image capture
frequency is high, i.e. greater than 10 Hz or 20 Hz. For example,
in the situation described here, the acquisition frequency is
greater than or equal to 30 Hz.
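The selection of the reference image whose pose is closest to the rough estimate of x.sub.pR could be sketched as below. The application does not specify the distance used, so the Euclidean distance between 6-vector poses is only an illustrative proxy for "closest pose", and all names are ours:

```python
import math

def closest_keyframe(keyframes, rough_pose):
    """Pick the reference image whose stored pose is nearest the rough
    estimate of x_pR.  'keyframes' is a list of (pose, image) pairs as
    stored in the model 16; the Euclidean distance on 6-vector poses is
    only an illustrative proxy for the notion of 'closest pose'."""
    def dist(pose):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(pose, rough_pose)))
    return min(keyframes, key=lambda kf: dist(kf[0]))

# toy model with two stored reference images
model = [((0, 0, 0, 0, 0, 0), "I*_a"),
         ((1, 0, 0, 0, 0, 0), "I*_b")]
pose_ref, image_ref = closest_keyframe(model, (0.9, 0, 0, 0, 0, 0))
```

In the simplified situation described above, rough_pose would simply be the last pose estimated for the video camera 4.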
[0092] Once the reference image has been selected during an
operation 68, the pose x.sub.pR and the speed x.sub.vR are
estimated. To be more precise, here there are estimated the
variations x.sub.p and x.sub.v of the pose and the speed,
respectively, of the video camera 4 since the latest pose and speed
that were estimated, i.e. the variations of the pose and the speed
since the latest current image that was captured. Accordingly,
x.sub.pR=x.sub.pR-1+x.sub.p and x.sub.vR=x.sub.vR-1+x.sub.v, where
x.sub.pR-1 and x.sub.vR-1 are the pose and the speed of the video
camera 4 estimated for the preceding current image.
[0093] To this end, for each pixel of the reference image I* that
corresponds to a pixel in the image I, a search is performed for
the pose x.sub.p and the speed x.sub.v that minimize the difference
E.sub.1 between the terms I.sub.w(x, p*) and I*.sub.b(x, p*), where
x is a vector that groups the unknowns to be estimated. In this
embodiment, the vector x groups the coordinates of the pose x.sub.p
and the speed x.sub.v. The variable x therefore includes twelve
coordinates to be estimated.
[0094] However, in this first embodiment, to limit the number of
unknowns to be estimated and therefore to enable faster estimation
of the pose x.sub.p and the speed x.sub.v, it is assumed here that
the speed x.sub.vR is constant between the moments of capturing two
successive current images. Under these conditions, the pose x.sub.p
is linked to the estimate of the speed x.sub.vR by the following
equation: x.sub.vR=x.sub.p/t.sub.p, where t.sub.p is the current
image acquisition period. Consequently, in this first embodiment,
there are only six coordinates to be estimated, for example the six
coordinates of the speed x.sub.v.
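Equivalently, the constant-speed link x.sub.vR=x.sub.p/t.sub.p lets the pose increment be derived from the speed, so only the six speed coordinates remain as unknowns; a trivial sketch with our own names:

```python
def pose_increment_from_speed(x_v, t_p):
    """Under the constant-speed assumption between two current images,
    the pose variation is x_p = x_vR * t_p, leaving only the six
    coordinates of the speed as unknowns (illustrative helper)."""
    return [c * t_p for c in x_v]

# three translational speeds (m/s) and three angular speeds (rad/s)
x_v = [0.3, 0.0, 0.1, 0.0, 0.02, 0.0]
x_p = pose_increment_from_speed(x_v, 1 / 30)   # 30 Hz acquisition period
```
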
[0095] The difference E.sub.1 is minimized by successive
iterations. To be more precise, to this end, the following
operations are reiterated:
[0096] 1) Choosing a value for the speed x.sub.v,
[0097] 2) Calculating the value of the difference E.sub.1 for that
value.
[0098] The operations 1) and 2) are reiterated in a loop. During
the operation 1) the chosen value is modified on each iteration to
attempt each time to find a new value of the speed x.sub.v that
reduces the difference E.sub.1 more than the previous values
attempted.
[0099] Typically, the iterations are stopped when a stopping
criterion is satisfied. For example, the iterations are stopped
when a value of the speed x.sub.v makes it possible to obtain a
value of the difference E.sub.1 below a predetermined threshold
S.sub.1. Another possible stopping criterion consists in
systematically stopping the iterations of the operations 1) and 2)
if the number of iterations carried out is above a predetermined
threshold S.sub.2.
[0100] During the first iteration, an initial value must be
assigned to the speed x.sub.v. For example, that initial value is
taken equal to zero, i.e. to a first approximation the speed
x.sub.vR is taken not to have varied since it was last
estimated.
[0101] After an iteration of the operations 1) and 2), the
automatic choice of a new value of the speed x.sub.v likely to
minimize the difference E.sub.1 is a well known difference
minimizing operation. Methods making it possible to choose this new
value of the speed x.sub.v are described in the following
bibliographic references, for example: [0102] MALIS, E. (2004).
Improving vision-based control using efficient second-order
minimization techniques. In IEEE International Conference on
Robotics and Automation, 1843-1848. [0103] BENHIMANE, S. &
MALIS, E. (2004). Real-time image-based tracking of planes using
efficient second-order minimization. In IEEE International
Conference on Intelligent Robots and Systems, 943-948.
[0104] Other, even more robust methods are described in the
following bibliographic references: [0105] HAGER, G. &
BELHUMEUR, P. (1998). Efficient region tracking with parametric
models of geometry and illumination. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 20, 1025-1039. [0106]
COMPORT, A. I., MALIS, E. & RIVES, P. (2010). Real-time
quadrifocal visual odometry. The International Journal of Robotics
Research, 29, 245-266. [0107] ZHANG, Z. (1995). Parameter
Estimation Techniques: A Tutorial with Application to Conic
Fitting. Tech. Rep. RR-2676, INRIA.
[0108] Consequently, choosing a new value for the speed x.sub.v
after each iteration will not be described in more detail here.
There will only be described now the detailed method for
calculating the various terms of the difference E.sub.1 for a given
value of the speed x.sub.v.
[0109] The term I.sub.w(x,p*) corresponds to the value of the
luminous intensity of the point p* in the reference image
constructed from the luminous intensities measured in the current
image taking into account the RS distortion. The construction of
the value of this term from a given value of the speed x.sub.v is
illustrated diagrammatically in FIG. 3. To simplify FIG. 3, only a
square of 3 by 3 pixels is represented for each image I and I*.
[0110] Here I.sub.w(x,p*) corresponds to the following composition
of functions:
I(w.sub.2(T.sub.2(-.tau.x.sub.vR), w.sub.1(T.sub.1, v*))).
[0111] These various functions will now be explained. The vertex v*
corresponds to the coordinates expressed in the frame of reference
F*, of the point PS photographed by the pixel centered on the point
p* of the image plane PL*.
[0112] First, the unit 12 seeks the point p.sup.w1 (FIG. 3) of the
current image corresponding to the point p*, by first assuming that
the time t.sub..DELTA. is zero. The points p.sup.w1
and p* correspond if they both photograph the same point PS of the
scene 6. If the time t.sub..DELTA. is zero, numerous known
algorithms make it possible to find the coordinates of the point
p.sup.w1 in the image I corresponding to the point p* in the image
I*. Consequently, here, only general information on one possible
method for doing this is given.
[0113] For example, the unit 12 selects in the reference image the
coordinates v* of the point PS associated with the point p*. After
that, the unit 12 effects a change of frame of reference to obtain
the coordinates v of the same point PS expressed in the frame of
reference F of the video camera 4. A pose matrix T.sub.1 is used
for this. Pose matrices are well known. The reader may consult
chapter 2 of the book L1 for more information.
[0114] The pose matrices take the following form if homogeneous
coordinates are used:
T = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}
where:
[0115] R is a rotation matrix, and
[0116] t is a translation vector.
[0117] The matrix R and the vector t are functions of the pose
x.sub.pR* and the pose x.sub.pR associated with the images I* and
I, respectively. The pose x.sub.pR* is known from the model 16. The
pose x.sub.pR is equal to x.sub.pR-1+x.sub.p.
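The pose matrix and the change of frame it performs, v = Rv* + t in homogeneous coordinates, can be sketched as follows; the helper names are ours:

```python
def make_pose(R, t):
    """Homogeneous 4x4 pose matrix T = [R t; 0 1], built from a 3x3
    rotation matrix R (list of rows) and a translation vector t."""
    return [R[0] + [t[0]],
            R[1] + [t[1]],
            R[2] + [t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def transform(T, v):
    """Change of frame v = R v* + t: coordinates of a scene point moved
    from the frame F* of the reference camera into the frame F of the
    video camera 4 (illustrative helper)."""
    x, y, z = v
    return tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3]
                 for i in range(3))

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T1 = make_pose(I3, [0.5, 0.0, 0.0])   # pure translation along X
v = transform(T1, (1.0, 2.0, 3.0))    # -> (1.5, 2.0, 3.0)
```
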
[0118] Once the coordinates v of the point PS in the frame of
reference F have been obtained, they are projected by a function
onto the plane PL of the image I to obtain the coordinates of a
point p.sup.w1. The point p.sup.w1 is the point that corresponds to
the intersection of the plane PL and the axis AO that passes
through the center C and the point PS of the scene 6.
[0119] The function w.sub.1( . . . ) that returns the coordinates
of the point p.sup.w1 corresponding to the point p* is known as
warping. It is typically a central projection with center C, whose
parameters are set by the pose matrix T.sub.1. Accordingly,
p.sup.w1=w.sub.1(T.sub.1,p*).
[0120] At this stage, it will already have been noted that the
point p.sup.w1 is not necessarily at the center of a pixel of the
image I.
[0121] Because of the rolling shutter effect, the row of pixels to
which the point p.sup.w1 of the image belongs was not captured at
the same time as the first row of the image, but at a time .tau.
after that first row was captured. Here the first row of the image
I captured is the row at the bottom of the image, as shown in FIG.
2A. The pose x.sub.pR that is estimated is the pose of the video
camera 4 at the moment at which the latter captures the bottom row
of the current image.
[0122] The time .tau. may be calculated as being equal to
(n+1)t.sub..DELTA., where n is the number of rows of pixels that
separate the row to which the point p.sup.w1 belongs from the first
row captured. Here the number n is determined from the ordinate of
the point p.sup.w1. To be more precise, a function e.sub.1( . . . )
is defined that returns the number n+1 as a function of the
ordinate of the point p.sup.w1 in the plane of the image. The time
.tau. is therefore given by the following equation:
.tau.=t.sub..DELTA.e.sub.1(p.sup.w1).
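A sketch of this calculation, assuming a top-left row origin so that the bottom row, which is captured first, has the largest index (the image height is an assumed parameter):

```python
def capture_delay(row, height, t_delta):
    """tau = t_delta * e1(p): delay of the row containing p^w1 relative
    to the first row captured.  Per the text the bottom row is captured
    first; with a top-left origin (an assumed convention) that row has
    index height - 1.  e1 returns n + 1, where n is the number of rows
    separating the point's row from the first row captured."""
    n = (height - 1) - row
    return t_delta * (n + 1)

# With 480 rows and t_delta = 30 microseconds, the bottom row itself
# already has a delay of one t_delta, and the top row of (height)*t_delta.
tau_bottom = capture_delay(row=479, height=480, t_delta=30e-6)
tau_top = capture_delay(row=0, height=480, t_delta=30e-6)
```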
[0123] Moreover, as the video camera 4 moves at the speed x.sub.vR
during the capture of the image, the pixel containing the point
p.sup.w1 has photographed not the point PS of the scene 6 but
another point of the scene after the video camera 4 has been moved
a distance .tau.x.sub.vR. To find the point p.sup.w2 that has
photographed the point PS, it is therefore necessary to move the
point p.sup.w1 in the opposite direction and then to project it
again onto the plane PL.
[0124] This is effected with the aid of the following composition
of functions:
w.sub.2(T.sub.2(-.tau.x.sub.vR),p.sup.w1)
where:
[0125] T.sub.2(-.tau.x.sub.vR) is a function that returns the
coordinates of a point p.sup.T2(.tau.xvR) of the three-dimensional
space corresponding to the position of the point p.sup.w1 after the
latter has been moved in the direction opposite the movement of the
video camera 4 during the time .tau.,
[0126] w.sub.2( . . . ) is a warping function that returns the
coordinates of the point p.sup.w2 corresponding to the projection
of the point p.sup.T2(.tau.xvR) onto the plane PL.
[0127] The point p.sup.w2 is at the intersection of the plane of
the current image and an optical axis passing through the center C
and the point p.sup.T2(.tau.xvR).
[0128] It will be noted that the symbol "-" in the expression
"-.tau.x.sub.vR" indicates that it is a movement in the opposite
direction to the movement .tau.x.sub.vR. The function T.sub.2
integrates the speed -x.sub.vR over the time .tau. to obtain a
displacement equal to the displacement of the video camera 4 during
the time .tau. but in the opposite direction. Here the speed
x.sub.vR is considered as constant during the time .tau.. The
distance travelled by the video camera 4 during the time .tau. at
the speed x.sub.vR is calculated by integrating that speed over the
time .tau.. For example, to this end, the function T.sub.2( . . . )
is the following exponential matrix:
T.sub.2(-.tau.x.sub.vR)=exp(-.tau.[x.sub.vR])
where:
[0129] exp( . . . ) is the exponential function, and:
[0130] [x.sub.vR] is defined by the following matrix:
[x.sub.vR] = [ [.omega.].sub.x  v
                     0          0 ]  ∈ se(3)
[0131] In the above equation, the vector v corresponds to the three
coordinates of the speed in translation of the video camera 4 and
the symbol [.omega.].sub.x is the skew-symmetric matrix of the
angular speed of the video camera 4, i.e. the following matrix:
[.omega.].sub.x = [       0         -.omega..sub.z    .omega..sub.y
                    .omega..sub.z         0          -.omega..sub.x
                   -.omega..sub.y    .omega..sub.x         0        ]
in which .omega..sub.x, .omega..sub.y and .omega..sub.z are the
angular speeds of the video camera 4 about the axes X, Y and Z,
respectively, of the frame of reference R.
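The twist matrix and its exponential can be sketched as follows; a plain truncated Taylor series stands in for the matrix exponential so that the snippet needs only NumPy:

```python
import numpy as np

def skew(w):
    """[w]_x: skew-symmetric matrix of the angular speed w = (wx, wy, wz)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def twist(x_v):
    """[x_v]: 4x4 element of se(3) built from the translation speed v
    (first three coordinates) and the angular speed w (last three)."""
    v, w = x_v[:3], x_v[3:]
    M = np.zeros((4, 4))
    M[:3, :3] = skew(w)
    M[:3, 3] = v
    return M

def expm(A, terms=25):
    """Matrix exponential by truncated Taylor series; accurate enough
    for the small twists tau * x_v encountered here."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def T2(x_v, tau):
    """T2(-tau * x_v): displacement opposite to the camera motion,
    obtained by integrating the speed -x_v over the time tau."""
    return expm(-tau * twist(x_v))

# Pure translation at 1 m/s along X during tau = 0.01 s moves a point
# by -1 cm, with no rotation.
T = T2(np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 0.01)
```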
[0132] Like the point p.sup.w1, the point p.sup.w2 does not
necessarily fall at the center of a pixel. It is therefore then
necessary to estimate the luminous intensity at the level of the
point p.sup.w2 from the luminous intensities stored for the
adjacent pixels in the current image. This is the role of the
function I( . . . ) that returns the luminous intensity at the
level of the point p.sup.w2, interpolated on the basis of the
luminous intensities stored for the pixels adjacent that point.
Numerous interpolation functions are known. For example, the
simplest consists in returning the stored intensity for the pixel
within which the point p.sup.w2 is situated.
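A common choice, bilinear interpolation over the four adjacent pixels, can be sketched as follows (the clamping convention at the image border is an assumption):

```python
import numpy as np

def interp(I, p):
    """Bilinear estimate of the intensity at a sub-pixel point p = (x, y),
    built from the intensities stored for the four adjacent pixels.
    I is indexed I[row, col]; clamping at the border is an assumption."""
    x, y = p
    x0 = int(np.clip(np.floor(x), 0, I.shape[1] - 2))
    y0 = int(np.clip(np.floor(y), 0, I.shape[0] - 2))
    ax, ay = x - x0, y - y0
    return ((1 - ax) * (1 - ay) * I[y0, x0]
            + ax * (1 - ay) * I[y0, x0 + 1]
            + (1 - ax) * ay * I[y0 + 1, x0]
            + ax * ay * I[y0 + 1, x0 + 1])

I = np.array([[10.0, 20.0],
              [30.0, 40.0]])
v_center = interp(I, (0.0, 0.0))   # exactly on a pixel: the stored value
v_mid = interp(I, (0.5, 0.5))      # midpoint: the mean of the four pixels
```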
[0133] If it is assumed that the light radiated by the point PS of
the scene 6 does not vary over time and that it is the same
whatever the point of view, then the intensity I(p.sup.w2) must be
the same as the intensity I*(p*) stored in the reference image I*
after taking account of the RS distortion.
[0134] Nevertheless, it has been assumed here that motion blur in
the current image is not negligible. The luminous intensities
measured by the pixels of the current image are therefore affected
by motion blur whereas the luminous intensities stored for the
pixels of the reference image are not affected by motion blur. The
estimated intensity I(p.sup.w2) is therefore affected by motion
blur because it is constructed from the luminous intensities of the
pixels of the current image in which the MB distortion has not been
corrected. Consequently, if the exposure time t.sub.e is not
negligible, the intensity I(p.sup.w2) therefore does not correspond
exactly to the intensity I*(p*) even if the RS distortion has been
eliminated or at least reduced.
[0135] In this embodiment, it is for this reason that it is not the
difference between the terms I.sub.w(x, p*) and I*(p*) that is
minimized directly but the difference between the terms I.sub.w(x,
p*) and I*.sub.b(x, p*).
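The quantity minimized can be sketched as a stack of per-point residuals; the functions I_w and I*_b here are assumed callables supplied by the rest of the pipeline, not the patent's full warp and blur chains:

```python
def residual_E1(I_w, I_star_b, x, points):
    """E1: per-point differences I_w(x, p*) - I*_b(x, p*).  The estimate
    of the pose and speed is the x that minimizes this stack, e.g. in a
    least-squares sense.  I_w and I_star_b are assumed callables."""
    return [I_w(x, p) - I_star_b(x, p) for p in points]

# Toy check: when the warped current intensity and the synthetically
# blurred reference intensity agree, every residual vanishes.
r = residual_E1(lambda x, p: x * p, lambda x, p: x * p, 2.0, [1.0, 2.0, 3.0])
```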
[0136] The term I*.sub.b(x, p*) is a value of the luminous
intensity that would be measured at the level of the point p* if
the exposure time of the pixels of the video camera 22 were equal
to that of the video camera 4 and if the video camera 22 were moved
at the speed x.sub.vR during the capture of the reference image I*.
In other words, the image I*.sub.b corresponds to the image I*
after a motion blur identical to that affecting the current image I
has been added to the reference image.
[0137] To simulate the MB distortion in the reference image, the
term I*.sub.b(x, p*) is constructed: [0138] by selecting points
adjacent the point p* of the image I* that would have photographed
the same point of the scene 6 as photographed by the point p* if
the video camera 22 were moved at the speed x.sub.vR during the
exposure time t.sub.e, and then [0139] by combining the intensities
of the adjacent points so selected with that of the point p* so as
to generate a new intensity at the level of the point p* with
motion blur.
[0140] Here the coordinates of the adjacent points are obtained
with the aid of the composition of functions
w.sub.3(T.sub.1.sup.-1T.sub.2(-tx.sub.vR)T.sub.1, p*). The
composition of functions T.sub.1.sup.-1T.sub.2(-tx.sub.vR)T.sub.1
performs the following operations:
[0141] the pose matrix T.sub.1 transforms the coordinates v* of the
point PS of the scene 6 expressed in the frame of reference F* into
coordinates v of that same point expressed in the frame of
reference F, where v* are the coordinates of the point PS
photographed by the point p*,
[0142] T.sub.2(-tx.sub.vR) moves the point PS a distance that is a
function of a time t and the speed x.sub.vR, to obtain the
coordinates of a new point p.sup.T2(-txvR) expressed in the frame
of reference F, where t is a time between zero and the exposure
time t.sub.e of the pixel, and
[0143] the pose matrix T.sub.1.sup.-1 transforms the coordinates of
the point p.sup.T2(-txvR) expressed in the frame of reference F
into coordinates expressed in the frame of reference F*.
[0144] Here the fact is exploited that moving the video camera a
distance tx.sub.vR relative to a fixed scene is equivalent to
moving the fixed scene a distance -tx.sub.vR relative to a fixed
video camera.
[0145] The functions T.sub.1 and T.sub.2 are the same as those
described above. The function T.sub.1.sup.-1 is the inverse of the
pose matrix T.sub.1.
[0146] The function w.sub.3( . . . ) is a warping function that
projects a point of the scene onto the plane PL* to obtain the
coordinates of a point that photographs that point of the scene.
This is typically a central projection with center C*. The point
p.sup.w3 is therefore here the point situated at the intersection
of the plane PL* and the axis passing through the center C* and the
point p.sup.T2(-txvR).
[0147] The coordinates of a point adjacent the point p* are
obtained for each value of the time t. In practice, at least 5, 10
or 20 values of the time t regularly distributed in the range [0;
t.sub.e] are used.
[0148] The intensity at the level of a point in the reference image
is obtained with the aid of a function I*(p.sup.w3). The function
I*( . . . ) is the function that returns the intensity at the level
of the point p.sup.w3 in the reference image I*. The point p.sup.w3
is not necessarily at the center of a pixel. Accordingly, like the
function I( . . . ) described above, the function I*( . . . )
returns an intensity at the level of the point p.sup.w3 constructed
by interpolation from the intensity stored for the pixels adjacent
the point p.sup.w3.
[0149] The intensity I*.sub.b(p*) is then taken as equal to the
mean of the intensities I*(p.sup.w3) calculated for the various
times t. For example, here this is the arithmetic mean using the
same weighting coefficient for each term.
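The blur-synthesis procedure of paragraphs [0137] to [0149] can be sketched as follows; to keep the sketch self-contained, the full warp chain w.sub.3(T.sub.1.sup.-1T.sub.2(-tx.sub.vR)T.sub.1, p*) is replaced by a constant one-dimensional drift in the image plane, which is an assumption, not the patent's 3-D warp:

```python
import numpy as np

def blur_intensity(I_star, p, speed_px, t_e, samples=10):
    """I*_b(p): arithmetic mean, with equal weights, of the intensities
    I*(p^w3) sampled at `samples` times regularly distributed in [0, t_e].
    For this sketch the warp chain is replaced by a constant drift of
    `speed_px` pixels per second along a 1-D row -- an assumption that
    stands in for the full 3-D warp of the patent."""
    ts = np.linspace(0.0, t_e, samples)
    vals = []
    for t in ts:
        x = p + speed_px * t               # adjacent point for this time t
        x0 = int(np.clip(np.floor(x), 0, len(I_star) - 2))
        a = x - x0
        vals.append((1 - a) * I_star[x0] + a * I_star[x0 + 1])  # interpolate
    return float(np.mean(vals))            # equal-weight arithmetic mean

I_star = np.array([0.0, 10.0, 20.0, 30.0])  # a linear intensity ramp
sharp = blur_intensity(I_star, 1.0, speed_px=0.0, t_e=0.02)   # no motion
blurred = blur_intensity(I_star, 1.0, speed_px=50.0, t_e=0.02)
```

With zero speed the "blurred" value reduces to the stored value at p*, which matches the remark later in the text that setting t.sub.e to zero makes I*.sub.b equal to I*.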
[0150] After each iteration that minimizes the difference E.sub.1,
the pose matrix T.sub.1 is updated with the new estimate of the
pose x.sub.pR obtained from the new estimate of the speed
x.sub.vR.
[0151] Following a number of iterations of the operation 68, the
pose x.sub.pR and the speed x.sub.vR can be used for various additional
processing operations. These additional processing operations are
typically performed in real time if they do not take too long to
execute. Otherwise they are executed off-line, i.e. after all the
poses x.sub.pR and speeds x.sub.vR of the video camera 4 have been
calculated. Here, by way of illustration, only a step 70 of
constructing the trajectory of the video camera 4 is performed in
real time.
[0152] During the operation 70, after each new estimate of the pose
x.sub.pR and the speed x.sub.vR, the unit 12 stores the succession
of estimated poses x.sub.pR in the form of a temporally ordered
series. This temporally ordered series then constitutes the
trajectory constructed for the video camera 4.
[0153] By way of illustration, the unit 12 also effects various
processing operations off-line. For example, during a step 72, the
unit 12 processes the current image to limit the RS distortion
using the estimate of the speed x.sub.vR. To this end, the pixels
of the current image are typically shifted as a function of the
speed x.sub.vR and the time .tau.. For example, this shifting of
each pixel is estimated by the function
w.sub.2(T.sub.2(-.tau.x.sub.vR),p) for each pixel p of the current
image. Such image processing methods are known and are therefore
not described in more detail here. For example, such methods are
described in the following paper: S. Baker, E. P. Bennett, S. B.
Kang, and R. Szeliski, "Removing rolling shutter wobble", IEEE
Conference on Computer Vision and Pattern Recognition, 2010.
[0154] In parallel with this, during a step 74, the unit 12 also
processes the current image to limit the distortion caused by
motion blur. Such image processing methods based on the estimate of
the speed x.sub.vR of the video camera at the moment at which it
captured the image are known. For example, such methods are
described in the following papers:
[0155] N. Joshi, S. B. Kang, C. L. Zitnick, R. Szeliski, "Image
deblurring using inertial measurement sensors", ACM Siggraph, 2010,
and
[0156] F. Navarro, F. J. Seron and D. Gutierrez, "Motion blur
rendering: state of the art", Computer Graphics Forum, 2011.
[0157] FIG. 5 represents a system identical to that from FIG. 1
except that the video camera 4 is replaced by a video camera 80. To
simplify FIG. 5, only the video camera 80 is shown. This video
camera 80 is a video camera identical to the video camera 22 or
simply the same video camera as the video camera 22. In the video
camera 80, the depth is acquired by the rolling shutter effect as
described with reference to FIG. 2A. The rows of pixels capture the
depths one after the other. The time between the moments of capture
of the depth by two successive rows of pixels is denoted
t.sub..DELTA.d. The time t.sub..DELTA.d may or may not be equal to
the time t.sub..DELTA. used for the capture of the intensities.
[0158] The exposure time of the pixels for capturing the depth is
denoted t.sub.ed. The time t.sub.ed may or may not be equal to the
exposure time t.sub.e. The video camera 80 acquires for each pixel
the same information as the video camera 4 and additionally a
vertex v that codes, in the frame of reference F tied with no
degree of freedom to the video camera 80, the depth of the point of
the scene photographed by that pixel.
[0159] The operation of the system from FIG. 1 in which the video
camera 4 is replaced by the video camera 80 will now be explained
with reference to the FIG. 6 method. This method is identical to
that from FIG. 4 except that the operation 68 is replaced by an
operation 84. During the operation 84, the pose x.sub.p and the
speed x.sub.v are estimated by minimizing, in addition to the
difference E.sub.1 described above, a difference E.sub.2 between
the following terms: D.sub.w(x,p*) and D*.sub.b(x,p*). The term
D.sub.w(x,p*) corresponds to the estimate of the depth measured at
the level of the point p* of the reference image constructed from
the depths stored in the current image and taking account of the RS
distortion.
[0160] Here the term D.sub.w(x,p*) is the composition of the
following functions:
D(w.sub.2(T.sub.2(-.tau..sub.dx.sub.vR),w.sub.1(T.sub.1,v*)))
[0161] Here this is the same composition of functions as described
above for the intensity I( . . . ) but with the function I( . . . )
replaced by the function D( . . . ). The function D( . . . )
returns the value of the depth at the level of the point p.sup.w2.
Like the intensities, the depth at the level of the point p.sup.w2
is estimated by interpolation from the depths measured by the
pixels adjacent the point p.sup.w2 in the current image. The time
.tau..sub.d is calculated like the time .tau. but replacing
t.sub..DELTA. by t.sub..DELTA.d.
[0162] The term D.sub.w(x,p*) is therefore an approximation of the
depth at the level of the point p* in the reference image
constructed from the depths measured by the video camera 80.
[0163] The term D*.sub.b(x,p*) corresponds to the depths that would
be measured at the level of the point p* if the video camera 22
were moved at the speed x.sub.vR and if the exposure time of the
pixels of the video camera 22 for measuring the depth were equal to
the exposure time t.sub.ed. Here the term D*.sub.b(x,p*) is
constructed in a similar manner to that described for the term
I*.sub.b(x,p*). The term D*.sub.b(x,p*) is therefore constructed:
[0164] by selecting points adjacent the point p* of the image I*
that would have photographed the same point of the scene as
photographed by the point p* if the video camera 22 were moved at
the speed x.sub.vR during the exposure time t.sub.ed, then [0165]
by combining the depths of the adjacent points so selected with
that of the point p* so as to generate a new depth at the level of
the point p* with motion blur.
[0166] The adjacent points are selected in exactly the same way as
described above for the term I*.sub.b(x,p*) except that the time
t.sub.e is replaced by the time t.sub.ed. The depth measured by the
adjacent points is obtained with the aid of a function D*( . . . ).
The function D*(p.sup.w3) is the function that returns the depth at
the level of the point p.sup.w3 based on the depths measured for
the pixels adjacent the point p.sup.w3.
[0167] Moreover, in this particular case, it is assumed that the
times t.sub..DELTA., t.sub..DELTA.d, t.sub.e and t.sub.ed are
unknowns. The variable x therefore includes in addition to the six
coordinates of the speed x.sub.v four coordinates intended to code
the values of the times t.sub..DELTA., t.sub..DELTA.d, t.sub.e and
t.sub.ed. The steps of simultaneous minimization of the differences
E.sub.1 and E.sub.2 therefore lead also to estimating in addition
to the speed x.sub.v the value of the times t.sub..DELTA.,
t.sub..DELTA.d, t.sub.e and t.sub.ed.
[0168] FIG. 7 shows another method for estimating the pose x.sub.pR
and the speed x.sub.vR of the video camera 4 with the aid of the
system 2. This method is identical to that from FIG. 4 except that
the operation 68 is replaced by an operation 90. To simplify FIG.
7, only the portion of the method including the operation 90 is
shown. The other portions of the method are identical to those
described above.
[0169] The operation 90 will now be explained with reference to
FIG. 8. In FIG. 8, the same simplifications have been applied as in
FIG. 3.
[0170] During the step 90, the pose x.sub.p and the speed x.sub.v
are estimated by minimizing a difference E.sub.3 between the
following terms: I*.sub.w(x,p) and I(p).
[0171] The term I(p) is the intensity measured at the level of the
point p by the video camera 4.
[0172] The term I*.sub.w(x,p) corresponds to the estimate of the
intensity at the level of the point p of the current image
constructed from the intensities stored in the reference image I*
taking account of the RS and MB distortion of the video camera 4.
Here the term I*.sub.w(x,p) corresponds to the following
composition of functions:
I*.sub.b(w.sub.5(T.sub.2(.tau.x.sub.vR),w.sub.4(T.sub.1.sup.-1,v)))
[0173] The vertex v contains the coordinates in the frame of
reference F of the point PS photographed by the point p. This
vertex v is estimated from the vertices v* of the reference image.
For example, there is initially a search for the points p.sup.w1
closest to the point p, after which the vertex v is estimated by
interpolation from the coordinates T.sub.1v* of the vertices
associated with these closest points p.sup.w1.
[0174] T.sub.1.sup.-1 is the pose matrix that is the inverse of the
matrix T.sub.1. It therefore transforms the coordinates of the
vertex v expressed in the frame of reference F into coordinates v*
expressed in the frame of reference F*. The function w.sub.4 is a
warping function that projects the vertex v* onto the plane PL* of
the reference image to obtain the coordinates of a point p.sup.w4
(FIG. 8). The function w.sub.4( . . . ) is identical to the
function w.sub.3, for example.
[0175] In this embodiment, the aim is to obtain the intensity that
would have been measured at the level of the point p.sup.w4 if the
video camera 22 were a rolling shutter video camera identical in
this regard to the video camera 4. To this end, it is necessary to
shift the point p.sup.w4 as a function of .tau. and the speed
x.sub.vR. Here this shift is T.sub.2(.tau.x.sub.vR), i.e. the same
as for the method from FIG. 4 but in the opposite direction. After
shifting the point p.sup.w4 by T.sub.2(.tau.x.sub.vR) a point
p.sup.T2(.tau.xvR) is obtained. After projection of the point
p.sup.T2(.tau.xvR) onto the plane PL* by the function
w.sub.5( . . . ), the coordinates of the point p.sup.w5 are obtained.
[0176] The function I*.sub.b( . . . ) is the same as that defined
above, i.e. it makes it possible to estimate the value of the
intensity at the level of the point p.sup.w5 that would be measured
by the pixels of the video camera 22 if its exposure time were
equal to t.sub.e and if the video camera 22 were moved at the speed
x.sub.vR. The function I*.sub.b( . . . ) therefore introduces the
same motion blur into the reference image as that observed in the
current image.
[0177] The values of the pose x.sub.p and the speed x.sub.v that
minimize the difference E.sub.3 are estimated as in the case
described for the difference E.sub.1.
[0178] FIG. 9 represents the evolution over time of the difference
between the angular speed estimated using the method from FIG. 4
and the real angular speed of the video camera 4. The curves
represented were obtained experimentally. Each curve represented
was obtained using the same sequence of current images and the same
reference images. In FIG. 9, the abscissa axis represents the
number of current images processed and the ordinate axis represents
the error between the real angular speed and the estimated angular
speed. Here that error is expressed in the form of a root mean
square error (RMSE).
[0179] The curve 91 represents the evolution of the error without
correcting the RS and MB distortions. In this case, the estimates
of the speed x.sub.v are obtained using the method from FIG. 4, for
example, but taking the time t.sub.e and the time t.sub..DELTA. as
equal to zero.
[0180] The curve 92 corresponds to the situation where only the RS
distortion is corrected. This curve is obtained by executing the
method from FIG. 4 taking a non-zero value for the time
t.sub..DELTA. and fixing the time t.sub.e at zero.
[0181] The curve 94 corresponds to the situation in which only the
MB distortion is corrected. This curve is obtained by executing the
method from FIG. 4 taking a non-zero value for the time t.sub.e and
fixing the time t.sub..DELTA. at zero.
[0182] Finally, the curve 96 corresponds to the situation in which
the RS and MB distortions are corrected simultaneously. This curve
is obtained by executing the method from FIG. 4 taking non-zero
values for the time t.sub.e and the time t.sub..DELTA..
[0183] As these curves illustrate, in the situation tested
experimentally improved results are obtained as soon as the RS
and/or MB distortion is taken into account. Unsurprisingly, the
best results are obtained when both the RS and MB distortions are
taken into account simultaneously. In the situation tested, taking
into account only the MB distortion (curve 94) gives better results
than taking into account only the RS distortion (curve 92).
[0184] Numerous other embodiments are possible. For example, the
unit 24 may be inside the video camera 22. The video camera 22 can
also include sensors that directly measure its pose within the
scene 6 without having to perform any image processing for this.
For example, one such sensor is an inertial sensor that measures
the acceleration of the video camera 22 along three orthogonal
axes.
[0185] The video cameras 22 and 4 may be identical or different.
The video cameras 22 and 4 may measure the intensity of the
radiation emitted by a point of the scene at wavelengths other than
those visible by a human. For example, the video cameras 22 and 4
may operate in the infrared.
[0186] The unit 12 may be inside the video camera 4 to perform the
processing operations in real time. Nevertheless, the unit 12 may
also be mechanically separate from the video camera 4. In this
latter case, the images captured by the video camera 4 are
downloaded into the memory 10 in a second step and then processed
by the unit 12 afterwards.
[0187] The three-dimensional model 16 of the scene 6 may differ
from a model based on reference images. For example, the model 16
may be replaced by a three-dimensional volumetric computer model of
the scene 6 produced using computer-assisted design, for example.
Thereafter, each reference image is constructed on the basis of
this mathematical model. More details of these other types of
three-dimensional models can be found in the paper A1.
[0188] The reference image may be an image selected in the model 16
or an image constructed from the images contained in the model 16.
For example, the reference image can be obtained by combining a
plurality of images contained in the model 16, as described in the
paper A1, so as to obtain a reference image the pose of which is
closer to the estimated pose of the current image.
[0189] In another variant, the model 16 is not necessarily
constructed beforehand during a learning phase. On the contrary, it
may be constructed as the video camera 80 is moved within the scene
6. The simultaneous construction of the trajectory of the video
camera 80 and of the map of the scene 6 is known as simultaneous
localization and mapping (SLAM). In this case, the images from the
video camera 80 are added to the model 16 as the camera is moved
within the scene 6, for example. Before adding a reference image to
the model 16, that image is preferably processed to limit the RS
and/or MB distortion as described in the steps 72 and 74. In this
variant the phase 50 and the device 20 are omitted.
[0190] Numerous other embodiments of the method are equally
possible. For example, the estimate x.sub.pR of the pose of the
video camera 4 may be obtained in a different manner. For example,
the video camera 4 is equipped with a sensor measuring its pose in
the scene, such as an inertial sensor, and the pose x.sub.pR is
estimated from measurements from this sensor inside the video
camera 4.
[0191] In another embodiment, the differences E.sub.1 or E.sub.3
are calculated not for a single reference image but for a plurality
of reference images. Thus in the method from FIG. 4 the difference
E.sub.1 is then replaced by the differences E.sub.1.1 or E.sub.1.2
where the difference E.sub.1.1 is calculated from a first reference
image and the difference E.sub.1.2 is calculated from a second
reference image separate from the first.
[0192] It is equally possible to use models other than the pinhole
model to model a video camera. In particular, the pinhole model is
preferably complemented by a model of the radial distortions to
correct the aberrations or distortions caused by the lenses of the
video camera. Such distortion models can be found in the following
reference: C. C. Slama (ed.), Manual of Photogrammetry, 4th edn.,
American Society of Photogrammetry, 1980.
[0193] Alternatively, the coordinates of the pose x.sub.pR may be
considered as being independent of the speed x.sub.vR. In this
case, the same method as described above is used except that the
variable x will contain the six coordinates of the pose x.sub.p as
well as the six coordinates of the speed x.sub.v. Conversely, it is
possible for the number of degrees of freedom of the video camera 4
or 80 to be less than 6. This is the case if the video camera can
move only in a horizontal plane or cannot turn on itself, for
example. This limitation of the number of degrees of freedom in
movement is then taken into account by reducing the number of
unknown coordinates necessary for determining the pose and the
speed of the video camera. Similarly, in another variant, if it is
necessary to estimate the acceleration x.sub.a of the video camera
4 at the moment at which it captures the image, six additional
coordinates may be added to the variable x each corresponding to
one of the coordinates of the acceleration x.sub.a. The
acceleration x.sub.a corresponds to the linear acceleration along
the axes X, Y and Z and the angular acceleration about those same
axes.
[0194] The various differences E.sub.1, E.sub.2 and E.sub.3
described above may be used in combination or alternately. For
example, the speed x.sub.v may be determined using only the
difference E.sub.2 between the depths. In this case it is not
necessary for the video cameras 4 and 22 to measure and to store
intensities for each pixel. Similarly, the method from FIG. 7 may
be adapted to the situation in which the physical quantity measured
by the video camera 4 is the depth and not the luminous intensity.
If the video camera 80 is used, it is not necessary for the
reference image to include a depth associated with each pixel. In
fact, the vertex v is then known and the method from FIG. 7 may be
used without having to use the vertices v*, for example.
[0195] Other functions are possible for estimating the opposite
movement of the video camera 4 or 80 while it is capturing the
current image. For example, instead of using the transformation
T.sub.2(.tau.x.sub.vR), the transformation
T.sub.2.sup.-1(.tau.x.sub.vR) may also be used.
[0196] Nor is it necessary to use all of the pixels of the
reference images and the current image that match. Alternatively,
to reduce the number of calculations necessary to estimate the
speed x.sub.v, only 10% or 50% or 70% or 90% of the pixels of one
of the images having corresponding pixels in the other image are
taken into account when minimizing the differences E.sub.1, E.sub.2
or E.sub.3.
[0197] If the motion blur in the images captured by the video
camera 4 is negligible, then the function I*.sub.b( . . . ) may be
taken as equal to the function I*( . . . ). This therefore amounts
to setting the time t.sub.e at zero in the equations described
above.
[0198] Conversely, if the RS distortion is negligible in the images
captured by the video camera 4, or merely if that video camera 4 is
a global shutter video camera, the function I.sub.w(p*) is taken
as equal to the function I(w.sub.1(T.sub.1,v*)). This therefore
simply amounts to taking the value of the time t.sub..DELTA. and/or
the time t.sub..DELTA.d as equal to zero in the previous
embodiments.
[0199] The times t.sub..DELTA., t.sub..DELTA.d, t.sub.e and
t.sub.ed may be measured during the learning phase or estimated
during the first iterations in the utilization phase 60.
[0200] The estimate of the speed x.sub.vR may be used for image
processing operations other than those described above.
[0201] The speed x.sub.vR and the pose x.sub.pR are not necessarily
estimated in real time. For example, they may be estimated when the
capture of the images by the video camera 4 or 80 has finished.
* * * * *