U.S. patent application number 11/249,972, for a method for motion estimation, was published by the patent office on 2006-04-20 as publication number 20060083407. The invention is credited to Muhammad Siddiqui and Klaus Zimmermann.

United States Patent Application 20060083407
Kind Code: A1
Zimmermann; Klaus; et al.
April 20, 2006
Method for motion estimation
Abstract
A method for motion estimation of sequences of images is
proposed, wherein for consecutive frames (f1, f2) respective
corresponding consecutive Fourier transformed frames (F1, F2) are
determined, and wherein motion parameters for translation, rotation
and/or scaling are derived based on a phase relationship
between said respective corresponding consecutive Fourier
transformed frames (F1, F2), and in particular based on
translational, rotational and/or scaling properties of the Fourier
transforming process.
Inventors: Zimmermann; Klaus (Deizisau, DE); Siddiqui; Muhammad (Stuttgart, DE)
Correspondence Address: FROMMER LAWRENCE & HAUG LLP, 745 FIFTH AVENUE, NEW YORK, NY 10151, US
Family ID: 34926999
Appl. No.: 11/249,972
Filed: October 13, 2005
Current U.S. Class: 382/107
Current CPC Class: H04N 19/527 20141101; H04N 19/547 20141101; G06T 7/262 20170101; H04N 19/523 20141101
Class at Publication: 382/107
International Class: G06K 9/00 20060101 G06K009/00

Foreign Application Data
Date: Oct 15, 2004; Code: EP; Application Number: 04024615.9
Claims
1. Method for motion estimation of sequences of images, wherein for
consecutive frames (f1, f2) of a sequence of images/frames
respective corresponding consecutive Fourier transformed frames
(F1, F2) are determined, and wherein motion parameters for--in
particular global and/or local--translation, rotation and/or
scaling are derived or derivable based on a phase relationship
between said respective corresponding consecutive Fourier
transformed frames (F1, F2).
2. Method according to claim 1, wherein said motion parameters are
derived or derivable essentially based on translational, rotational
and/or scaling properties of the Fourier transformation.
3. Method according to claim 1, wherein in the presence of a pure--in particular global and/or local--spatial translation between two frames (f1, f2), in particular according to f2(x,y) ≡ f1(x-x0, y-y0) (1.1) with f1, f2 denoting the frames, with x, y denoting the spatial coordinates or pixel coordinates within the frames (f1, f2), and with x0, y0 describing spatial translational parameters along the x direction and the y direction, respectively, a process (I) of estimating the translational parameters (x0, y0) is performed, wherein said process (I) of estimating said translational parameters (x0, y0) comprises: a step of determining values (Z1, Z2) of a cross power spectrum (Z(·,·)) for respective corresponding consecutive Fourier transformed frames (F1, F2) with respect to two values (u1, v1; u2, v2) for the respective frequency variables (u, v), in particular according to the following formula (A) Z1:=Z(u1,v1) and Z2:=Z(u2,v2), (A) with Z(·,·) denoting the cross power spectrum function, with Z1, Z2 denoting the values of the cross power spectrum (Z(·,·)), and with u1, v1 and u2, v2 denoting first and second values for the respective frequency variables (u, v), a step of determining phase values (φ1, φ2) for the respective cross power spectrum values (Z1, Z2), in particular according to the following formula (B) φ1:=arg(Z(u1,v1))/2π and φ2:=arg(Z(u2,v2))/2π, (B) with φ1, φ2 denoting the respective phase values and with arg() denoting the argument function providing the phase value of any complex number, and a step of determining translational parameters (x0, y0), in particular according to the following formulas (C1) and (C2): y0 = (u1·φ2 - u2·φ1) / (u1·v2 - u2·v1) (C1) and x0 = φ2/u2 - (v2/u2) · (u1·φ2 - u2·φ1) / (u1·v2 - u2·v1). (C2)
4. Method according to claim 3, wherein each value (Z) of the cross power spectrum (Z(·,·)) of two functions (F1, F2) for two frequency coordinates (u, v) is defined by a process (II) according to the following formula (D): Z(u,v) := F1(u,v)·F2*(u,v) / |F1(u,v)·F2*(u,v)|, (D) with F1, F2 denoting the two respective functions, with u, v denoting the two respective frequency coordinates, with |·| denoting the process of taking the absolute value, and with * denoting the process of taking the conjugate complex value.
5. Method according to claim 1, wherein in the presence of--in particular global and/or local--translation and scaling and in the absence of--in particular global and/or local--rotation, in particular according to the following relation (2.1.1) between consecutive frames (f1, f2) f2(x,y) ≡ f1(a·x - x0, b·y - y0), (2.1.1) with f1, f2 denoting the frames, with x, y denoting the spatial coordinates or pixel coordinates within the frames (f1, f2), with x0, y0 describing spatial translational parameters along the x direction and along the y direction, respectively, and with a, b describing the spatial scaling parameters or scaling factors along the x direction and along the y direction, respectively, a process (III) of estimating said scaling parameters (a, b) is performed and wherein said process (III) of estimating said scaling parameters (a, b) comprises: a step of determining values for--in particular global--pseudo translational parameters (c, d) by applying said process (I) to respective magnitude functions (M1, M2) for the respective corresponding consecutive Fourier transformed frames (F1, F2), in particular based on logarithmic frequency variables (û, v̂), in particular according to the following formula (E) û=log(u) and v̂=log(v), (E) instead of applying said process (I) to said respective corresponding consecutive Fourier transformed frames (F1, F2) directly and a step of determining said global scaling parameters (a, b) by applying an exponentiation process to said pseudo translational parameters (c, d), respectively, in particular according to the following formula (F): a=e^c and b=e^d, (F) with c, d denoting said pseudo translational parameters, and in particular a step of determining spatial translation parameters (x0, y0) by applying phase correlation on respective scaling compensated frames or images.
6. Method according to claim 1, wherein in the presence of--in particular global--translation and rotation and in the absence of--in particular global--scaling between consecutive frames (f1, f2), in particular according to the following relation (2.2.1): f2(x,y) ≡ f1(x·cos(θ0) + y·sin(θ0) - x0, -x·sin(θ0) + y·cos(θ0) - y0) (2.2.1) with f1, f2 denoting the frames, with x, y denoting the spatial coordinates or pixel coordinates within the frames (f1, f2), with x0, y0 denoting the spatial translational parameters along the x direction and along the y direction, respectively, and with θ0 describing the rotational angle between the consecutive frames (f1, f2), a process (IV) of estimating rotational parameters (θ0) is performed, and wherein said process (IV) of estimating rotational parameters (θ0) comprises: a step of determining values for--in particular global--pseudo translational parameters (θ̂0) by applying said first process (I) to magnitude functions (M1, M2) for respective corresponding consecutive Fourier transformed frames (F1, F2), in particular based on polar frequency coordinates (ρ, θ), in particular according to the following formula (G): u=ρ·cos(θ) and v=ρ·sin(θ), (G) with u, v denoting the frequency coordinates and with ρ, θ denoting the polar coordinates, and a step of determining said pseudo translational parameter (θ̂0) as said rotational parameter (θ0), and in particular a step of determining spatial translation parameters (x0, y0) by applying phase correlation on respective rotation compensated frames or images.
7. Method according to claim 1, wherein in the presence of--in particular global--translation, rotation and scaling between consecutive frames (f1, f2), in particular according to the following relation (2.3.1): f2(x,y) ≡ f1(a·x·cos(θ0) + a·y·sin(θ0) - x0, -a·x·sin(θ0) + a·y·cos(θ0) - y0) (2.3.1) with f1, f2 denoting the frames, with x, y denoting the spatial coordinates or pixel coordinates within the frames (f1, f2), with x0, y0 denoting the spatial translational parameters along the x direction and along the y direction, respectively, with a describing the spatial scaling parameter or scaling factor along the x direction and along the y direction, and with θ0 describing the rotational angle or parameter between the consecutive frames (f1, f2), a process (V) of estimating rotational parameters (θ0) is performed, and wherein said process (V) of estimating rotational parameters (θ0) comprises: a step of determining values for--in particular global--first and second pseudo translational parameters (θ̂0, n) by applying said first process (I) to magnitude functions (M1, M2) for respective corresponding consecutive Fourier transformed frames (F1, F2), in particular based on logarithmic polar frequency coordinates (m, n), in particular according to the following formulas (H1) and (H2): u=ρ·cos(θ), v=ρ·sin(θ), (H1) and m=log(ρ), n=log(a), (H2) with u, v denoting the frequency coordinates, with ρ, θ denoting the polar coordinates, and with m, n denoting logarithmic polar frequency coordinates, a step of determining said first pseudo translational parameter (θ̂0) as said rotational parameter (θ0), a step of determining from said second pseudo translational parameter (n) said scaling parameter (a), in particular according to the following formula (J): a=e^n, (J) and in particular a step of determining spatial translational parameters (x0, y0) by applying phase correlation on respective scaling compensated and/or rotation compensated frames or images.
8. Method according to claim 1, wherein a high pass filtering process is involved in order to boost high frequency details of logarithmic magnitude Fourier spectra, in particular based on a transfer function which is given according to the following formula (3.1.1): H(u,v) = [1 - cos(πu)·cos(πv)]·[2 - cos(πu)·cos(πv)], (3.1.1) wherein -0.5 ≤ u,v ≤ +0.5.
9. Method according to claim 1, wherein before applying a Fourier
transform a process of windowing with a raised cosine window
function is applied, in particular with respect to the frames (f1,
f2).
10. Method according to claim 1, wherein the frames (f1, f2) or
images/pictures are processed based on blocks thereof.
11. Method according to claim 10, wherein the size of the blocks is set according to the motion present in the signal, frames or images, e. g. to 64×64 pixels.
12. Method according to claim 10, wherein a process of zero padding
is applied, in particular on all sides and further in particular in
order to make the dimensions of input frames or pictures/images an
integer multiple of the block size, e. g. of 64.
13. Method according to claim 1, wherein a bicubic interpolation is
involved in order to estimate sub-pixel values from an original
frame or image/picture.
14. Method according to claim 1, wherein for a Cartesian to logarithmic polar coordinate transformation only valid values for the radius coordinate (ρ) are chosen, with 0 ≤ θ ≤ 2π for the angular coordinate.
15. Method according to claim 1, wherein an input frame or picture/image or the blocks thereof are converted into blocks of a given block size, e. g. of 64×64 pixel blocks, in the log-polar domain or log(ρ),θ-domain.
16. Method according to claim 1, wherein local translations,
rotations, and/or scalings with respect to two consecutive frames
or pictures/images are handled in the same manner as global
translations, rotations, and/or scalings.
17. Method according to claim 1, wherein local translations,
rotations, and/or scalings with respect to two consecutive frames
or pictures/images are handled after an estimation and compensation
of global translations, rotations, and/or scalings.
18. System or apparatus for motion estimation, which is adapted
and/or arranged and which comprises means for carrying out a method
for motion estimation according to claim 1 and the steps
thereof.
19. Computer program product comprising computer program means
which is adapted and/or arranged in order to perform a method for
motion estimation according to claim 1 and the steps thereof.
20. Computer-readable storage medium comprising a computer program
product according to claim 19.
Description
[0001] The present invention relates to a method for motion estimation of sequences of images. More particularly, the present invention relates to methods for global and local motion estimation for translation, rotation, and/or scaling, in particular in video scenes or video sequences.
[0002] For certain technical applications in the field of analyzing
visual data and in particular in the field of analyzing video
scenes or sequences of images or pictures, processes of motion
estimation are involved in order to derive parameters for certain
types of motion, e. g. translation, rotation and/or scaling in a
global and/or in a local manner.
[0003] However, conventional processes of motion estimation in the
area of video signal processing or image or picture processing are
not capable of distinguishing between global and local motion
and/or between the different types of motion as being translation,
rotation and/or scaling.
[0004] It is therefore an object underlying the present invention to provide a method for motion estimation of sequences of images which can estimate motion parameters describing motion aspects between consecutive frames in a comparably easy, reliable and fast manner.
[0005] The object is achieved by a method of motion estimation of
sequences of images according to independent claim 1. The object is
further achieved by a system or an apparatus for motion estimation,
by a computer program product, as well as by a computer-readable
storage medium according to independent claims 18, 19, and 20,
respectively.
[0006] According to the present invention a method for motion
estimation of sequences of frames, pictures or images is proposed
wherein for consecutive frames f1, f2 of a sequence of pictures,
images and/or frames respective corresponding consecutive Fourier
transformed frames F1, F2 are determined and wherein motion
parameters for--in particular global and/or local--translation,
rotation and/or scaling are derived or derivable based on a phase
relationship between said respective corresponding consecutive
Fourier transformed frames F1, F2.
[0007] It is therefore a basic aspect of the present invention to
base a method for motion estimation of sequences of frames,
pictures or images with respect to translation, rotation and/or
scaling on a phase relationship between consecutive Fourier
transformed frames F1, F2 with respect to respective corresponding
frames f1, f2.
[0008] Preferably, said motion parameters are derived or derivable
essentially based on translational, rotational and/or scaling
properties of the Fourier transformation.
[0009] In the presence of a pure--in particular global and/or local--spatial translation between two frames--in particular according to f2(x,y) ≡ f1(x-x0, y-y0) (1.1) with f1, f2 denoting the frames, with x, y denoting the spatial coordinates or pixel coordinates within the frames f1, f2, and with x0, y0 describing spatial translational parameters along the x direction and the y direction, respectively--a process (I) of estimating the translational parameters x0, y0 may be performed, wherein said process (I) of estimating the translational parameters x0, y0 may comprise: [0010] a step of determining values Z1, Z2 of a cross power spectrum Z(·,·) for respective corresponding consecutive Fourier transformed frames F1, F2 with respect to two values u1, v1; u2, v2 for the respective frequency variables u, v, in particular according to the following formula (A) Z1:=Z(u1,v1) and Z2:=Z(u2,v2), (A) [0011] with Z(·,·) denoting the cross power spectrum function, with Z1, Z2 denoting the values of the cross power spectrum Z(·,·), and with u1, v1 and u2, v2 denoting first and second values for the respective frequency variables u, v, [0012] a step of determining phase values φ1, φ2 for the respective cross power spectrum values Z1, Z2, in particular according to the following formula (B) φ1:=arg(Z(u1,v1))/2π and φ2:=arg(Z(u2,v2))/2π, (B) [0013] with φ1, φ2 denoting the respective phase values and with arg() denoting the argument function providing the phase value of any complex number, and [0014] a step of determining translational parameters x0, y0, in particular according to the following formulas (C1) and (C2): y0 = (u1·φ2 - u2·φ1) / (u1·v2 - u2·v1) (C1) and x0 = φ2/u2 - (v2/u2) · (u1·φ2 - u2·φ1) / (u1·v2 - u2·v1). (C2) In this case each value Z of the cross power spectrum Z(·,·) of two functions F1 and F2 for two frequency coordinates u, v may be defined by a process (II) according to the following formula (D): Z(u,v) := F1(u,v)·F2*(u,v) / |F1(u,v)·F2*(u,v)|, (D) with F1, F2 denoting the two respective functions, with u, v denoting the two respective frequency coordinates, with |·| denoting the process of taking the absolute value, and with * denoting the process of taking the conjugate complex value.
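The two-point evaluation of formulas (A), (B), (C1) and (C2) can be sketched in NumPy. This is a minimal sketch, not the patented implementation: it assumes cyclic integer shifts, normalized DFT frequencies u = k/N and v = l/M, and frequency points chosen so that no phase wrapping occurs; the function name estimate_translation is ours.

```python
import numpy as np

def estimate_translation(f1, f2):
    """Process (I): recover (x0, y0) from the cross power spectrum phase
    at two frequency points, per formulas (A), (B), (C1) and (C2)."""
    N, M = f1.shape
    F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
    P = F1 * np.conj(F2)                    # numerator of formula (D)
    Z = P / np.maximum(np.abs(P), 1e-12)    # cross power spectrum, formula (D)
    (k1, l1), (k2, l2) = (1, 1), (1, 2)     # two DFT bins with u1*v2 - u2*v1 != 0
    u1, v1, u2, v2 = k1 / N, l1 / M, k2 / N, l2 / M
    phi1 = np.angle(Z[k1, l1]) / (2 * np.pi)   # formula (B): phi = u*x0 + v*y0
    phi2 = np.angle(Z[k2, l2]) / (2 * np.pi)
    y0 = (u1 * phi2 - u2 * phi1) / (u1 * v2 - u2 * v1)   # formula (C1)
    x0 = phi2 / u2 - (v2 / u2) * y0                      # formula (C2)
    return x0, y0
```

For a cyclically shifted frame the two equations are exact up to rounding; for natural images, noise at the two chosen bins makes the peak-based search of paragraph [0016] more robust.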
[0015] Alternatively or additionally, the parameters x0 and y0 may be determined from the following circumstances:
[0016] The cross power spectrum Z(·,·) of F1 and F2 gives e.g. Z(u,v) = exp(j·2π·(u·x0 + v·y0)). If one generates the inverse Fourier transform z(·,·) of this expression for Z(·,·), one obtains z(x,y) = dirac(x-x0, y-y0), with the definition of the Dirac function: dirac(a,b) := 1 if (a = 0) and (b = 0), and 0 otherwise. In this way, one can derive the values for x0 and y0.
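The Dirac-impulse route just described can likewise be sketched: inverse-transform the normalized cross power spectrum and read off the peak position. A sketch under the same cyclic-shift assumption; conjugating F1 rather than F2 is our choice so that, with NumPy's inverse-FFT sign convention, the peak lands at (+x0, +y0).

```python
import numpy as np

def translation_by_peak(f1, f2):
    """Paragraph [0016]: the inverse Fourier transform of the cross power
    spectrum is (ideally) a Dirac impulse located at (x0, y0)."""
    F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
    P = np.conj(F1) * F2                    # conjugation order chosen so the
    Z = P / np.maximum(np.abs(P), 1e-12)    # peak appears at (+x0, +y0)
    z = np.real(np.fft.ifft2(Z))            # z(x,y) ~ dirac(x-x0, y-y0)
    return np.unravel_index(np.argmax(z), z.shape)
```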
[0017] In the presence of--in particular global and/or
local--translation and scaling and in the absence of--in particular
global--rotation, in particular according to the following relation
(2.1.1) between consecutive frames f1, f2
f2(x,y).ident.f1(ax-x0,by-y0), (2.1.1) with f1, f2 denoting the
frames, with x, y denoting the spatial coordinates or pixel
coordinates within the frames f1, f2, with x0, y0 describing
spatial translational parameters along the x direction and along
the y direction, respectively, and with a, b describing the spatial
scaling parameters or scaling factors along the x direction and
along the y direction, respectively, a process (III) of estimating
said scaling parameters a, b may be performed, wherein said process
(III) of estimating said scaling parameters a, b may comprise:
[0018] a step of determining values for--in particular global--pseudo translational parameters c, d by applying said process (I) to respective magnitude functions M1, M2 for the respective corresponding consecutive Fourier transformed frames F1, F2, in particular based on logarithmic frequency variables û, v̂, in particular according to the following formula (E) û=log(u) and v̂=log(v), (E) [0019] instead of applying said process (I) to said respective corresponding consecutive Fourier transformed frames F1, F2 directly and [0020] a step of determining said global scaling parameters a, b by applying an exponentiation process to said pseudo translational parameters c, d, respectively, in particular according to the following formula (F): a=e^c and b=e^d, (F) [0021] with c, d denoting said pseudo translational parameters, and [0022] in particular a step of determining spatial translation parameters by applying phase correlation on respective scaling compensated frames or images.
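Steps (E) and (F) can be illustrated on magnitude functions sampled directly on logarithmic frequency axes, where scaling by a, b becomes a pure shift of (log a, log b). This sketch bypasses the image-domain pipeline (FFT, magnitude, log-axis resampling) and feeds analytically generated magnitude samples, so the recovered scale is exact at grid resolution; the function name, the regularization constant, and the test function are ours.

```python
import numpy as np

def scales_from_log_magnitudes(M1_log, M2_log, step):
    """Process (III), core step: M2(u,v) = M1(u/a, v/b) is a translation of
    (log a, log b) on the log axes (formula (E)); phase correlation finds the
    shift (c, d), and formula (F) gives a = e^c and b = e^d."""
    F1, F2 = np.fft.fft2(M1_log), np.fft.fft2(M2_log)
    P = np.conj(F1) * F2
    Z = P / (np.abs(P) + 1e-8 * np.abs(P).max())   # regularized normalization
    z = np.real(np.fft.ifft2(Z))
    i, j = np.unravel_index(np.argmax(z), z.shape)
    n, m = z.shape
    c = (i - n if i > n // 2 else i) * step        # shift in log-u, i.e. log(a)
    d = (j - m if j > m // 2 else j) * step        # shift in log-v, i.e. log(b)
    return np.exp(c), np.exp(d)                    # formula (F)

# Synthetic magnitude samples on a common logarithmic grid:
axis = np.linspace(-4.0, 4.0, 128)
step = axis[1] - axis[0]
U, V = np.meshgrid(axis, axis, indexing="ij")
M1 = np.exp(-10 * (U**2 + V**2))                   # a smooth magnitude bump
log_a, log_b = 10 * step, 6 * step                 # true log-scales, on-grid
M2 = np.exp(-10 * ((U - log_a)**2 + (V - log_b)**2))
a, b = scales_from_log_magnitudes(M1, M2, step)
```

In a full pipeline the log-axis samples would come from interpolating |F1| and |F2|, and the accuracy of a, b is then limited by the interpolation and grid resolution.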
[0023] In the presence of--in particular global--translation and
rotation and in the absence of--in particular global--scaling
between consecutive frames f1, f2, in particular according to the
following relation (2.2.1):
f2(x,y) ≡ f1(x·cos(θ0) + y·sin(θ0) - x0, -x·sin(θ0) + y·cos(θ0) - y0) (2.2.1) with f1, f2 denoting the frames, with x, y denoting the spatial coordinates or pixel coordinates within the frames f1, f2, with x0, y0 denoting the spatial translational parameters along the x direction and along the y direction, respectively, and with θ0 describing the rotational angle between the consecutive frames f1, f2, a process (IV) of estimating rotational parameters θ0 may be performed, wherein said process (IV) of estimating rotational parameters θ0 may comprise: [0024] a step of determining values for--in particular global--pseudo translational parameters θ̂0 by applying said first process (I) to magnitude functions M1, M2 for respective corresponding consecutive Fourier transformed frames F1, F2, in particular based on polar frequency coordinates ρ, θ, in particular according to the following formula (G): u=ρ·cos(θ) and v=ρ·sin(θ), (G) [0025] with u, v denoting the frequency coordinates and with ρ, θ denoting the polar coordinates, and [0026] a step of determining said pseudo translational parameter θ̂0 as said rotational parameter θ0, and [0027] in particular a step of determining spatial translation parameters x0, y0 by applying phase correlation on respective rotation compensated frames or images.
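The polar substitution (G) can be sketched as follows: resample two magnitude functions onto a (ρ, θ) grid, project onto the θ axis, and find the cyclic shift by 1-D phase correlation. A sketch with assumed helper names, using SciPy's map_coordinates for the interpolation and fed an analytically rotated test pattern; note that real Fourier magnitudes are point-symmetric, so this simple form carries an inherent 180° ambiguity that a practical implementation must resolve.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def to_polar(M, n_rho=64, n_theta=180):
    """Formula (G): sample M(u,v) at u = rho*cos(theta), v = rho*sin(theta),
    measured from the centre of the array."""
    c = M.shape[0] / 2.0
    rho = np.linspace(1.0, c - 2.0, n_rho)
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    R, T = np.meshgrid(rho, theta, indexing="ij")
    coords = np.array([c + R * np.cos(T), c + R * np.sin(T)])
    return map_coordinates(M, coords, order=1, mode="nearest")

def estimate_rotation(M1, M2, n_theta=180):
    """Process (IV): a rotation becomes a cyclic shift along the theta axis
    of the polar resampling; 1-D phase correlation recovers it."""
    s1 = to_polar(M1, n_theta=n_theta).sum(axis=0)   # project onto theta
    s2 = to_polar(M2, n_theta=n_theta).sum(axis=0)
    P = np.conj(np.fft.fft(s1)) * np.fft.fft(s2)
    Z = P / (np.abs(P) + 1e-8 * np.abs(P).max())
    k = int(np.argmax(np.real(np.fft.ifft(Z))))
    return 2.0 * np.pi * k / n_theta                 # theta shift = angle

# Analytic test pattern: a bump at angle 0, then the same pattern rotated 20 deg.
n = 128
x = np.arange(n) - n / 2.0
X, Y = np.meshgrid(x, x, indexing="ij")
t0 = np.deg2rad(20.0)
M1 = np.exp(-((X - 20.0)**2 + Y**2) / 60.0)
Xr = X * np.cos(t0) + Y * np.sin(t0)      # content rotated by +t0
Yr = -X * np.sin(t0) + Y * np.cos(t0)
M2 = np.exp(-((Xr - 20.0)**2 + Yr**2) / 60.0)
theta0 = estimate_rotation(M1, M2)
```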
[0028] In the presence of--in particular global--translation,
rotation and scaling between consecutive frames f1, f2, in
particular according to the following relation (2.3.1):
f2(x,y) ≡ f1(a·x·cos(θ0) + a·y·sin(θ0) - x0, -a·x·sin(θ0) + a·y·cos(θ0) - y0) (2.3.1) with f1, f2 denoting the frames, with x, y denoting the spatial coordinates or pixel coordinates within the frames f1, f2, with x0, y0 denoting the spatial translational parameters along the x direction and along the y direction, respectively, with a describing the spatial scaling parameter or scaling factor along the x direction and along the y direction, and with θ0 describing the rotational angle or parameter between the consecutive frames f1, f2, a process (V) of estimating rotational parameters θ0 may be performed, wherein said process (V) of estimating rotational parameters θ0 may comprise: [0029] a step of determining values for--in particular global--first and second pseudo translational parameters θ̂0, n by applying said first process (I) to magnitude functions M1, M2 for respective corresponding consecutive Fourier transformed frames F1, F2, in particular based on logarithmic polar frequency coordinates m, n, in particular according to the following formulas (H1) and (H2): u=ρ·cos(θ), v=ρ·sin(θ), (H1) and m=log(ρ), n=log(a), (H2) [0030] with u, v denoting the frequency coordinates, with ρ, θ denoting the polar coordinates, and with m, n denoting logarithmic polar frequency coordinates, [0031] a step of determining said first pseudo translational parameter θ̂0 as said rotational parameter θ0, [0032] a step of determining from said second pseudo translational parameter n said scaling parameter a, in particular according to the following formula (J): a=e^n, (J) and [0033] in particular a step of determining spatial translational parameters x0, y0 by applying phase correlation on respective scaling compensated and/or rotation compensated frames or images.
[0034] Preferably, a high pass filtering process may be involved in order to boost high frequency details of logarithmic magnitude Fourier spectra, in particular based on a transfer function which is given according to the following formula (3.1.1): H(u,v) = [1 - cos(πu)·cos(πv)]·[2 - cos(πu)·cos(πv)], (3.1.1) wherein -0.5 ≤ u,v ≤ +0.5.
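For reference, the transfer function (3.1.1) can be tabulated directly. A small sketch (the function name is ours), assuming u and v are sampled uniformly over [-0.5, +0.5]:

```python
import numpy as np

def highpass_emphasis(n):
    """Formula (3.1.1): H(u,v) = [1 - cos(pi u)cos(pi v)][2 - cos(pi u)cos(pi v)]
    for -0.5 <= u, v <= +0.5; H vanishes at DC and rises to 2 at the band edges."""
    u = np.linspace(-0.5, 0.5, n)
    U, V = np.meshgrid(u, u, indexing="ij")
    C = np.cos(np.pi * U) * np.cos(np.pi * V)   # C lies in [0, 1] on this square
    return (1.0 - C) * (2.0 - C)

H = highpass_emphasis(65)   # odd size so the DC sample u = v = 0 is on the grid
```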
[0035] It is of particular advantage to alternatively or additionally apply, before the Fourier transform, a process of windowing with a raised cosine window function, in particular with respect to the frames f1, f2.
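The windowing step can be illustrated with a separable raised cosine (Hann) window. A sketch, assuming the window is simply multiplied into each block so that it tapers to zero at the block border before the FFT:

```python
import numpy as np

def raised_cosine_window(n):
    """Separable 2-D raised cosine window; multiplying a block by it before
    the Fourier transform suppresses leakage from the block borders."""
    w = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)   # periodic Hann
    return np.outer(w, w)

W = raised_cosine_window(64)
windowed_block = W * np.ones((64, 64))   # applied to a (here trivial) block
```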
[0036] The frames f1, f2 or images/pictures may be processed based
on blocks thereof, i.e. in a block wise manner.
[0037] The size of the blocks may be set according to the motion present in the signal, frames or images, e. g. to 64×64 pixels.
[0038] Additionally or alternatively, a process of zero padding may
be applied, in particular on all sides and further in particular in
order to make the dimensions of input frames or pictures/images an
integer multiple of the block size, e. g. of 64.
[0039] Further additionally or alternatively, a bicubic
interpolation is involved in order to estimate sub-pixel values
from an original frame or image/picture.
[0040] For a Cartesian to logarithmic polar coordinate
transformation only valid values for the radius coordinate .rho.
may be chosen with 0.ltoreq..theta..ltoreq.2.pi. for the angular
coordinate.
[0041] An input frame or picture/image or the blocks thereof may be converted into blocks of a given block size, e. g. of 64×64 pixel blocks, in the log-polar domain or log(ρ),θ-domain.
[0042] Local translations, rotations, and/or scalings with respect
to two consecutive frames or pictures/images may be handled in the
same manner as global translations, rotations, and/or scalings.
[0043] Local translations, rotations, and/or scalings with respect
to two consecutive frames or pictures/images may be handled after
an estimation and compensation of global translations, rotations,
and/or scalings.
[0044] It is a further aspect of the present invention to provide a
system or apparatus for motion estimation which are adapted and/or
arranged and which comprise means for carrying out the inventive
method for motion estimation and the steps thereof.
[0045] Further, a computer program product is provided comprising
computer program means which is adapted and/or arranged in order to
perform the inventive method for motion estimation and the steps
thereof.
[0046] Additionally, a computer readable storage medium is provided
comprising the inventive computer program product.
[0047] These and further aspects of the present invention will be
further discussed in the following:
[0048] Introduction:
[0049] The present invention relates in particular to global motion estimation for translation, rotation, and/or scaling in video scenes and, further in particular, to global and local motion estimation for translation, rotation, and/or scaling in video scenes. [0050] On the one hand, conventional motion estimators in the area of video signal processing are only capable of tracing translatory or translational motion. The presented approach, however, can estimate the motion in case of translation, rotation, and/or scaling. The approach can be used to trace the global motion in video scenes of a given video sequence. [0051] The presented motion estimation technique is FFT-based, resulting in far superior estimation results in case of translatory motion when compared with traditional motion estimators. [0052] The presented approach can inter alia estimate the motion in case of translation, rotation, and/or scaling and can be used to trace the global and local motion in video scenes of a given video sequence.
[0053] Prior art:
[0054] A phase correlation based motion estimation technique was proposed in "Television motion measurement for DATV and other applications", G. A. Thomas, BBC Research Department Report, 1987. This FFT based technique uses the property of the Fourier transform which states that a translational movement in the spatial domain corresponds to a phase shift in the frequency domain. The phase correlation method, a block based motion estimation method, is therefore used to estimate the translational motion between the two fields directly from their phases.
[0055] The phase correlation based motion estimation technique
consists of two steps:
[0056] In the first step, the input picture is divided into blocks of fairly large size (for instance 64 by 64 pixels). Then the Two-Dimensional Discrete Fourier Transform (2D-DFT) of each block in the input picture frames is taken. The 2D-DFT of the current frame is then multiplied with the conjugate complex of the 2D-DFT of the previous frame. The magnitude is normalized by dividing this product by the absolute value of the product of the two 2D-DFTs. The correlation surface for each block is then obtained by taking the inverse 2D Fourier transform of the above product. The correlation surface consists of only real values. The positions of the dominant peaks in the correlation surface, which indicate the motion of the objects, are then located. Several trial or candidate motion vectors for each block are obtained in this way. This step is referred to as the phase correlation stage.
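The phase correlation stage described above can be sketched as follows. This is a minimal NumPy sketch of the prior-art step; the small regularization constant and the function name are our additions.

```python
import numpy as np

def candidate_vectors(prev_blk, curr_blk, n_candidates=3):
    """Phase correlation stage: multiply the 2D-DFT of the current block by
    the conjugate 2D-DFT of the previous block, normalize the magnitude,
    inverse-transform, and return the strongest peaks as trial vectors."""
    F_prev, F_curr = np.fft.fft2(prev_blk), np.fft.fft2(curr_blk)
    P = F_curr * np.conj(F_prev)
    Z = P / (np.abs(P) + 1e-8 * np.abs(P).max())   # magnitude normalization
    surface = np.real(np.fft.ifft2(Z))             # correlation surface
    n, m = surface.shape
    order = np.argsort(surface.ravel())[::-1][:n_candidates]
    vectors = []
    for idx in order:
        i, j = divmod(int(idx), m)
        dy = i - n if i > n // 2 else i            # unwrap to signed shifts
        dx = j - m if j > m // 2 else j
        vectors.append((dy, dx))
    return vectors
```

For a cyclically shifted block the first candidate is the true shift; for natural blocks the remaining candidates capture competing local motions.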
[0057] In the second step, each block of the current picture is shifted by the amount of each trial vector and compared with the block of the previous picture in terms of its luminance; the modulus of the luminance difference is calculated in this way. An error surface is obtained for each block for every candidate vector, which shows how well the vector matches different areas of the picture.
[0058] The vector that gives the smallest error for each area of the picture is finally assigned to that area of the picture. This step is referred to as the image correlation or vector assignment stage.
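The vector assignment stage can be sketched in the same vein. This is a simplified whole-picture version (real implementations evaluate the error over sub-areas of the picture and assign vectors per area); the function name is ours.

```python
import numpy as np

def assign_vector(prev, curr, candidates):
    """Image correlation / vector assignment stage: shift the current picture
    back by each trial vector and keep the vector whose mean modulus of the
    luminance difference against the previous picture is smallest."""
    best, best_err = None, np.inf
    for dy, dx in candidates:
        undone = np.roll(curr, (-dy, -dx), axis=(0, 1))   # undo candidate motion
        err = np.mean(np.abs(undone - prev))              # modulus of difference
        if err < best_err:
            best, best_err = (dy, dx), err
    return best
```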
[0059] The motion estimation algorithms utilized within Sony are based on block-matching. This technique is widely used in the video processing domain and has reached a mature state over the years. However, the approach has fundamental limitations that can be overcome by the proposed invention. The fundamental difference between block-based motion estimation and phase correlation-based motion estimation is that the first technique processes a given signal in the spatial domain while the second technique processes the signal in the frequency domain.
[0060] Furthermore, block-based motion estimators do not
distinguish between global and local motion. The presented approach
is used to detect the global motion in video scenes allowing the
accuracy of a subsequent local motion estimation to be
improved.
[0061] Problem:
[0062] Existing phase correlation motion estimation methods measure translational motion only when no scaling or rotation is present in the current frame. However, real picture sequences may contain zooming and/or rotation of the picture. The existing algorithms fail completely to estimate translational motion under these conditions.
[0063] An extension to the existing method is proposed to estimate translation as well as scaling and rotation. Three modifications to the existing phase correlation method are used, which solve the following problems:
[0064] 1. estimation of translation and rotation in the absence of
scale change,
[0065] 2. estimation of translation and scaling in the absence of
rotation, and
[0066] 3. joint measurement of rotation, scaling and
translation.
[0067] Solution:
[0068] In the following, processes for global (A) and for local (B) motion estimation are discussed separately.
[0069] A. Global Motion Estimation
[0070] A.1. Introduction
[0071] The motion estimation technique proposed in this invention
uses a Fourier domain approach in order to estimate translation,
rotation and scaling by exploiting Fourier translation, rotation
and scaling properties.
[0072] Let f1(x,y) and f2(x,y) denote a previous and a current frame in a given input picture sequence, and let x and y represent the spatial coordinates, i.e. pixel coordinates or positions within the given frames f1 and f2.
[0073] Further, let F1(u,v) and F2(u,v) be the corresponding
Fourier transforms of f1 and f2 and let u and v represent the
frequency domain coordinates.
[0074] Now assume a spatial displacement or shift of x0 in the x direction and y0 in the y direction. If an input frame is shifted by such an amount x0, y0 with respect to the previous frame according to f2(x,y) ≡ f1(x-x0, y-y0), (1.1) then according to the shift or displacement property of Fourier transforms the corresponding Fourier transforms F1 and F2 of the previous frame f1 and the current frame f2 fulfill F2(u,v) = exp[-j2π(ux0+vy0)] F1(u,v). (1.2)
[0075] The cross power spectrum Z(u,v) of the two frames f1 and f2 is given as Z(u,v) = F2(u,v) F1*(u,v) / |F2(u,v) F1*(u,v)| = exp[-j2π(ux0+vy0)], (1.3) wherein * denotes the complex conjugate of the 2D FFT. The phase of said cross power spectrum Z(u,v) gives the phase difference between the two transformed frames F1, F2. The inverse Fourier transform of said cross power spectrum Z(u,v) leads to an impulse, as the inverse Fourier transform of a complex valued exponential function is a Dirac function, i.e. an impulse. The coordinates of the respective peak or impulse yield the estimated shift. This process is called phase correlation.
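The phase correlation process described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation of the invention; the function and variable names are our own, and the frames are assumed to be related by a circular shift.

```python
import numpy as np

def phase_correlate(f1, f2):
    """Estimate the shift (x0, y0) such that f2(x, y) ~ f1(x - x0, y - y0).

    The normalised cross power spectrum of the two frames is a complex
    exponential whose inverse FFT is an impulse at the displacement.
    """
    F1 = np.fft.fft2(f1)
    F2 = np.fft.fft2(f2)
    Z = F2 * np.conj(F1)
    Z /= np.abs(Z) + 1e-12           # keep only the phase difference
    corr = np.real(np.fft.ifft2(Z))  # impulse at the displacement
    r, c = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # FFT indices wrap around, so map large indices to negative shifts.
    y0 = r - h if r > h // 2 else r
    x0 = c - w if c > w // 2 else c
    return x0, y0
```

For a frame pair related by a pure circular shift the peak coordinates reproduce the displacement exactly; for real frames, the windowing of section A.3.2 reduces edge effects.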
[0076] A.2. Modification of phase correlation motion estimation
technique
[0077] The proposed modification will cover the following three
cases
[0078] Case 1: Motion estimation in the presence of translation and
scaling.
[0079] Case 2: Motion estimation in the presence of translation and
rotation.
[0080] Case 3: Motion estimation in the presence of translation,
rotation and scaling.
[0081] A.2.1 Motion estimation with scaling
[0082] In the absence of rotation and scaling, translation motion
can be estimated using the existing phase correlation method.
[0083] However, if f2(x,y) is scaled and translated with respect to f1(x,y), then it is given as f2(x,y) ≡ f1(ax-x0, by-y0), (2.1.1) where a and b are the scaling factors for the horizontal or x and vertical or y directions. Using the scaling and shift properties of Fourier transforms, F1(u,v) and F2(u,v) are related according to F2(u,v) = (1/ab) exp[-j2π((u/a)x0 + (v/b)y0)] F1(u/a, v/b). (2.1.2) Let M1(u,v) and M2(u,v) be the magnitudes of F1(u,v) and F2(u,v), respectively; then the relation M2(u,v) = (1/ab) M1(u/a, v/b) (2.1.3) holds.
[0084] Now, the frequency coordinates u and v are changed to a logarithmic scale, i.e. the following substitutions are performed: u → log(u) and v → log(v). (2.1.4') If the u axis and the v axis are converted to a logarithmic scale according to expression (2.1.4'), then scaling is converted into a translational motion, i.e. M2(log(u), log(v)) = (1/ab) M1(log(u/a), log(v/b)) = (1/ab) M1(log(u)-log(a), log(v)-log(b)), (2.1.4) i.e. the logarithmic frequency coordinates log(u) and log(v) are shifted by the logarithmic frequency displacements log(a) and log(b), respectively.
[0085] Let x = log(u) and y = log(v); ignoring the factor 1/ab, expression (2.1.4) changes to M2(x,y) = M1(x-c, y-d), (2.1.5) where c = log(a) and d = log(b). Expression (2.1.5) has the same form as (1.1), and therefore c and d can be found using expressions (1.2) and (1.3) and taking the inverse Fourier transform (see section A.1). The values of the scaling factors can be found by taking the inverse logarithm of c and d: a = e^c and b = e^d, (2.1.6a, 2.1.6b) where e is the base of the natural logarithm.
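The scale estimation of section A.2.1 can be illustrated as follows. This is our own sketch under simplifying assumptions: nearest-neighbour log resampling of the positive-frequency quadrant is used where a real implementation would interpolate, and the helper names are hypothetical.

```python
import numpy as np

def pc_peak(g1, g2):
    """Signed (row, col) displacement of g2 relative to g1, found as
    the peak of the phase correlation surface (section A.1)."""
    Z = np.fft.fft2(g2) * np.conj(np.fft.fft2(g1))
    Z /= np.abs(Z) + 1e-12
    corr = np.real(np.fft.ifft2(Z))
    r, c = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    return (r - h if r > h // 2 else r), (c - w if c > w // 2 else c)

def estimate_scale(f1, f2, n=64):
    """Estimate the scale factors (a, b) between f1 and f2 (section
    A.2.1) by phase-correlating log-resampled magnitude spectra."""
    M1 = np.abs(np.fft.fftshift(np.fft.fft2(f1)))
    M2 = np.abs(np.fft.fftshift(np.fft.fft2(f2)))
    h, w = M1.shape
    # Nearest-neighbour resampling onto logarithmic frequency axes.
    rows = (h // 2 + np.logspace(0, np.log10(h // 2 - 1), n)).astype(int)
    cols = (w // 2 + np.logspace(0, np.log10(w // 2 - 1), n)).astype(int)
    L1, L2 = M1[np.ix_(rows, cols)], M2[np.ix_(rows, cols)]
    # On log axes the scaling has become the translation (log a, log b).
    dr, dc = pc_peak(L1, L2)
    step_r = np.log(h // 2 - 1) / (n - 1)  # natural-log spacing per sample
    step_c = np.log(w // 2 - 1) / (n - 1)
    return np.exp(dc * step_c), np.exp(dr * step_r)  # (a, b)
```

The inverse logarithm of expressions (2.1.6a, 2.1.6b) appears here as the final exponentiation of the measured shift.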
[0086] A.2.2 Motion estimation with rotation
[0087] If translation and rotation of f2(x,y) with respect to f1(x,y) are considered, the relation f2(x,y) ≡ f1(x cos(θ0) + y sin(θ0) - x0, -x sin(θ0) + y cos(θ0) - y0) (2.2.1) holds, where θ0 is the angle of rotation. Using the shift and rotation properties of Fourier transforms, F1(u,v) and F2(u,v) are related according to F2(u,v) = exp[-j2π(ux0+vy0)] F1(u cos(θ0) + v sin(θ0), -u sin(θ0) + v cos(θ0)). (2.2.2)

[0088] Let M1(u,v) and M2(u,v) be the magnitudes of F1(u,v) and F2(u,v); then one has the relation M2(u,v) = M1(u cos(θ0) + v sin(θ0), -u sin(θ0) + v cos(θ0)). (2.2.3)
[0089] The magnitudes M1(u,v) and M2(u,v) are rotated replicas of each other, and the rotational motion in the frequency domain can be estimated by using a coordinate conversion from Cartesian to polar coordinates according to u = ρ cos(θ) and v = ρ sin(θ), (2.2.4') which results in M2(ρ,θ) = M1(ρ, θ-θ0). (2.2.4)

[0090] The rotation angle θ0 is now a shift in the frequency domain and can be determined by using phase correlation according to section A.1.
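The polar-coordinate approach of section A.2.2 can be sketched as follows (our own illustrative code with nearest-neighbour polar sampling, not the implementation of the invention):

```python
import numpy as np

def estimate_rotation(f1, f2, n=64):
    """Estimate the rotation angle theta0 (section A.2.2): resample the
    magnitude spectra onto polar (rho, theta) axes, where rotation
    becomes a circular shift along theta, then phase-correlate."""
    def polar(mag):
        h, w = mag.shape
        rho = np.linspace(1, min(h, w) // 2 - 1, n)
        theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
        r = (h // 2 + rho[:, None] * np.sin(theta)).astype(int)
        c = (w // 2 + rho[:, None] * np.cos(theta)).astype(int)
        return mag[r, c]  # nearest-neighbour sampling, illustration only

    P1 = polar(np.abs(np.fft.fftshift(np.fft.fft2(f1))))
    P2 = polar(np.abs(np.fft.fftshift(np.fft.fft2(f2))))
    Z = np.fft.fft2(P2) * np.conj(np.fft.fft2(P1))
    Z /= np.abs(Z) + 1e-12
    corr = np.real(np.fft.ifft2(Z))
    _, t = np.unravel_index(np.argmax(corr), corr.shape)
    if t > n // 2:
        t -= n
    return t * 2 * np.pi / n  # rotation angle in radians
```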
[0091] A.2.3 Motion estimation with scaling and rotation
[0092] If translation, rotation and scaling of f2(x,y) with respect to f1(x,y) are simultaneously considered, the relation f2(x,y) ≡ f1(ax cos(θ0) + ay sin(θ0) - x0, -ax sin(θ0) + ay cos(θ0) - y0) (2.3.1) holds, where θ0 is the rotation angle and a is the scaling factor. By using the shift, scaling and rotation properties of Fourier transforms, F1(u,v) and F2(u,v) are related according to F2(u,v) = (1/a²) exp[-j2π((u/a)x0 + (v/a)y0)] F1((u/a) cos(θ0) + (v/a) sin(θ0), -(u/a) sin(θ0) + (v/a) cos(θ0)). (2.3.2) Let M1(u,v) and M2(u,v) be the magnitudes of F1(u,v) and F2(u,v). Then M1(u,v) and M2(u,v) fulfill M2(u,v) = (1/a²) M1((u/a) cos(θ0) + (v/a) sin(θ0), -(u/a) sin(θ0) + (v/a) cos(θ0)). (2.3.3)
[0093] Performing a coordinate conversion from Cartesian to polar coordinates according to u = ρ cos(θ) and v = ρ sin(θ) (2.3.4') yields M2(ρ,θ) = (1/a²) M1(ρ/a, θ-θ0). (2.3.4)
[0094] To convert scaling to translation, the ρ axis is changed to a logarithmic scale according to ρ → log(ρ). (2.3.5') It follows that M2(log(ρ), θ) = (1/a²) M1(log(ρ/a), θ-θ0) = (1/a²) M1(log(ρ)-log(a), θ-θ0) (2.3.5) is fulfilled.
[0095] Let m = log(ρ) and n = log(a). From (2.3.5) it follows that M2(m,θ) = (1/a²) M1(m-n, θ-θ0) (2.3.6) is fulfilled. This can be evaluated according to section A.1 in order to obtain the rotation angle θ0 and the scaling factor a by using phase correlation.
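The joint estimation of section A.2.3 can be illustrated by a log-polar resampling followed by one 2D phase correlation, whose peak coordinates encode (log a, θ0) simultaneously. Again this is a rough sketch with nearest-neighbour sampling and hypothetical helper names, not the patent's implementation.

```python
import numpy as np

def log_polar(mag, n=64):
    """Nearest-neighbour resampling of a centred magnitude spectrum
    onto (log rho, theta) axes; returns the map and the log-step."""
    h, w = mag.shape
    r_max = min(h, w) // 2 - 1
    rho = np.logspace(0.0, np.log10(r_max), n)           # log-spaced radii
    theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
    r = (h // 2 + rho[:, None] * np.sin(theta)).astype(int)
    c = (w // 2 + rho[:, None] * np.cos(theta)).astype(int)
    return mag[r, c], np.log(r_max) / (n - 1)

def estimate_rotation_scale(f1, f2, n=64):
    """Jointly estimate (theta0, a) as the 2D peak of the phase
    correlation of the log-polar magnitude spectra (section A.2.3)."""
    L1, step = log_polar(np.abs(np.fft.fftshift(np.fft.fft2(f1))), n)
    L2, _ = log_polar(np.abs(np.fft.fftshift(np.fft.fft2(f2))), n)
    Z = np.fft.fft2(L2) * np.conj(np.fft.fft2(L1))
    Z /= np.abs(Z) + 1e-12
    corr = np.real(np.fft.ifft2(Z))
    m, t = np.unravel_index(np.argmax(corr), corr.shape)
    m = m - n if m > n // 2 else m
    t = t - n if t > n // 2 else t
    return t * 2 * np.pi / n, np.exp(m * step)  # (theta0, scale a)
```

Once θ0 and a are known, the current frame can be counter-transformed and the remaining translation recovered by a second phase correlation, as in FIG. 1.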
[0096] A.3. Simulation model
[0097] Before presenting the software model, we will discuss some
important implementation issues.
[0098] A.3.1 High pass filtering
[0099] A simple high pass filter is used to boost the high frequency details of the log magnitude Fourier spectra. Its transfer function is given as H(u,v) = [1 - cos(πu)cos(πv)] [2 - cos(πu)cos(πv)], (3.1.1) where -0.5 ≤ u,v ≤ +0.5.
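Expression (3.1.1) can be evaluated on a discrete grid as follows; the grid sampling of the normalised frequencies is our own choice for illustration.

```python
import numpy as np

def highpass(shape):
    """High pass filter H(u,v) = (1 - cos(pi u)cos(pi v)) *
    (2 - cos(pi u)cos(pi v)) sampled on -0.5 <= u, v <= 0.5."""
    h, w = shape
    u = np.linspace(-0.5, 0.5, h)[:, None]
    v = np.linspace(-0.5, 0.5, w)[None, :]
    x = np.cos(np.pi * u) * np.cos(np.pi * v)
    return (1.0 - x) * (2.0 - x)  # near 0 at DC, near 2 at the corners
```

The filtered spectrum is then simply `highpass(M.shape) * M` for a log-magnitude block M.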
[0100] A.3.2 Raised cosine windowing
[0101] The input picture portion is windowed with a raised cosine window before taking the FFT, which causes the image to fade to zero at the edges. This is done to avoid sharp luminance transitions at the edges of the block, which would result in noise due to the periodic nature of the Fourier transform.
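The text does not specify the exact roll-off of the raised cosine window; a separable Hann window, a common special case of the raised cosine family, illustrates the idea (our assumption, sketch only):

```python
import numpy as np

def raised_cosine_window(h, w):
    """Separable 2D raised cosine (Hann) window that fades the block
    to zero at its edges, as described in section A.3.2."""
    wy = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(h) / (h - 1))
    wx = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(w) / (w - 1))
    return wy[:, None] * wx[None, :]  # outer product of 1D windows
```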
[0102] A.3.3 Other issues
[0103] A.3.3.1 Block size
[0104] The input picture is divided into blocks of 64 by 64 pixels. Zero padding is applied equally on all sides to make the dimensions of the input picture an integer multiple of 64.
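The padding rule can be sketched as follows (the helper name is our own):

```python
import numpy as np

def pad_to_multiple(img, block=64):
    """Zero-pad as equally as possible on all sides so that both
    dimensions become an integer multiple of `block` (section A.3.3.1)."""
    h, w = img.shape
    ph = (-h) % block          # total rows of padding needed
    pw = (-w) % block          # total columns of padding needed
    top, left = ph // 2, pw // 2
    return np.pad(img, ((top, ph - top), (left, pw - left)))
```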
[0105] A.3.3.2 Picture transformation
[0106] Bicubic interpolation is used to estimate the sub pixel
values from the original image. Affine transformation model is used
to translate, rotate and scale the input picture.
[0107] A.3.3.3 Cartesian to log-polar conversion
[0108] In the Cartesian to log-polar conversion, only valid values of ρ are selected, with 0 ≤ θ ≤ 2π. The input picture block is converted into a 64 by 64 block in the log-polar or (log(ρ), θ) domain.
[0109] A.4. Simulation results
[0110] The block diagram of the simulation model that we have implemented is shown in FIG. 1. We have used different test sequences to check the performance of the motion estimation algorithm. FIG. 2 shows the 64×64 current frame of the picture sequence. FIGS. 3, 4, 5 and 6 show previous frames with different known transformation parameters.
[0111] We have considered the highest peak in the correlation surface as the valid motion vector. We have applied our algorithm to estimate translation, rotation and scaling by first estimating the rotation and scaling and then applying an affine transformation to the current frame and again performing phase correlation to estimate the translational motion (see FIG. 1). Results of the estimated motion parameters are shown in tables 1, 2, 3 and 4.
[0112] The slight difference between the actual and the measured values is due to the non-uniformity of the scale. We have successfully measured translation (horizontal and vertical) of 21 pixels, rotation of 90° and scaling of 1.65 using this algorithm, e.g. for a block size of 64×64. The limiting factor is the loss of relevant information present in the frames if the transformation parameters are increased further. In our implementation we have used the single-peak approach (i.e. we have only taken the first peak in the correlation surface as the valid peak). This reduces the amount of computation needed to estimate the motion vectors.
[0113] A.5. Conclusion
[0114] An extension to the phase correlation based motion estimation technique was presented in this report. The new technique is capable of jointly detecting translational, scaling and rotational motion (see section A.2.3). The range of translation, rotation and scaling that can be measured is also mentioned in this report (see section A.4). To increase the range of motion parameters that can be measured by our system, we would have to consider more peaks in the correlation surface and then find the parameters that produce the highest peak.
[0115] Existing phase correlation motion estimation techniques fail completely if scaling or rotation or both are present. An algorithm to estimate translation in the presence of scaling only (see section A.2.1) can easily be implemented by performing a logarithmic transformation of both axes; however, it will be sensitive to rotation. Similarly, the technique described in section A.2.2 can easily be implemented by a Cartesian to polar conversion; it will estimate motion in the presence of rotation only, but it will fail if scaling is present. Our algorithm provides a robust method for the estimation of all three motion parameters (i.e. translation, rotation and scaling), either separately or jointly.
[0116] B Local Motion Estimation
[0117] B.1. Introduction
[0118] The motion estimation technique proposed in this invention report uses the Fourier domain approach to estimate translation, rotation and scaling by exploiting the Fourier translation, rotation and scaling properties. The concept of phase correlation based motion estimation and its modification to estimate global translational, rotational and scaling motion has already been described above with respect to the estimation of global motion. An extension to the existing method is being proposed to estimate global and local translation as well as scaling and rotation.
[0119] B.2. Modification of phase correlation motion estimation
technique
[0120] The proposed modification will cover the following three
cases
[0121] Case 1: Global and local motion estimation in the presence
of translation and scaling.
[0122] Case 2: Global and local motion estimation in the presence
of translation and rotation.
[0123] Case 3: Global and local motion estimation in the presence
of translation, rotation and scaling.
[0124] B.2.1 Global and local motion estimation with scaling
[0125] An algorithm for global motion estimation with scaling was described above with respect to the estimation of global motion. In order to estimate global and local motion we propose the following method steps:

[0126] Estimate the global translation and scaling using the technique as described above with respect to the estimation of global motion.

[0127] Perform the compensation on the current frame using the estimated global translation and scaling values.

[0128] Compute the absolute difference of the globally compensated current frame and the previous frame.

[0129] Threshold the difference image to get a binary image.

[0130] Use image segmentation on the difference image to find the position and size of the local objects.

[0131] Estimate the local translation and scaling of the objects using the technique as described above with respect to the estimation of global motion.
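The difference, threshold and segmentation steps above can be sketched with SciPy's connected-component labelling. This is an illustrative sketch under our own assumptions (the threshold value and the helper name are hypothetical, not taken from the patent):

```python
import numpy as np
from scipy import ndimage

def local_motion_regions(prev, curr_compensated, threshold=10.0):
    """Locate locally moving objects after global compensation:
    absolute difference -> binary threshold -> segmentation.

    Returns one (row_slice, col_slice) bounding box per object.
    """
    diff = np.abs(curr_compensated.astype(float) - prev.astype(float))
    binary = diff > threshold          # threshold value is illustrative
    labels, _ = ndimage.label(binary)  # connected-component segmentation
    return ndimage.find_objects(labels)
```

Each returned box can then be fed back into the global estimator of section A to measure the local translation and scaling of that object.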
[0132] B.2.2 Global and local motion estimation with rotation
[0133] An algorithm for global motion estimation in the presence of rotation was presented above with respect to the estimation of global motion. In order to estimate global and local translational and rotational motion, the following method steps are proposed:

[0134] Estimate the global translation and rotation using the technique as described above with respect to the estimation of global motion.

[0135] Perform the compensation on the current frame using the estimated global translation and rotation values.

[0136] Compute the absolute difference of the globally compensated current frame and the previous frame.

[0137] Threshold the difference image to get a binary image.

[0138] Use image segmentation on the difference image to find the position and size of the local objects.

[0139] Estimate the local translation and rotation of the objects using the technique as described above with respect to the estimation of global motion.
[0140] B.2.3 Global and local motion estimation with scaling and
rotation
[0141] Finally, we consider the case of global and local translation, rotation and scaling motion estimation. Details of the method for global motion measurement can be found above with respect to the estimation of global motion. Here we present an extended technique to measure local and global motion:

[0142] Estimate the global translation, rotation and scaling using the technique as described above with respect to the estimation of global motion.

[0143] Perform the compensation on the current frame using the estimated global translation, rotation and scaling values.

[0144] Compute the absolute difference of the globally compensated current frame and the previous frame.

[0145] Threshold the difference image to get a binary image.

[0146] Use image segmentation on the difference image to find the position and size of the local objects.

[0147] Estimate the local translation, rotation and scaling of the local objects using the technique as described above with respect to the estimation of global motion.
[0148] B.3. Simulation model
[0149] The block diagram of the simulation model for the motion estimation system described in section B.2 is shown in FIG. 11. The simulation model used for the modified phase correlation based motion estimation system has been explained above with respect to the estimation of global motion.
[0150] The input picture is divided into blocks of 128 by 128
pixels. Zero padding is applied equally on all sides to make the
dimensions of the input picture an integer multiple of 128.
[0151] A bicubic interpolation is used to estimate the sub pixel
values from the original image. An affine transformation model is
used to translate, rotate and scale the input picture.
[0152] The delay block shifts the input picture by one frame/field period. Any suitable thresholding and segmentation technique can be used to detect the position and size of the object block from the binary image. For instance, we may find the threshold using the histogram, and we may use binary mathematical morphology based operations for object detection.
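The patent leaves the choice of threshold and morphology operators open. As one plausible concrete combination (our assumption, not mandated by the text), a histogram-based Otsu threshold followed by a binary opening can be sketched as:

```python
import numpy as np
from scipy import ndimage

def otsu_threshold(img):
    """Histogram-based threshold maximising the between-class variance
    (Otsu's method) -- one possible choice for the threshold step."""
    hist, edges = np.histogram(img, bins=256)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)            # cumulative class probability
    m = np.cumsum(p * centers)   # cumulative class mean
    mt = m[-1]                   # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mt * w0 - m) ** 2 / (w0 * (1 - w0))
    return centers[np.nanargmax(between)]

def clean_binary(binary):
    """Binary opening removes isolated noise pixels before the
    segmentation step."""
    return ndimage.binary_opening(binary, structure=np.ones((3, 3)))
```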
[0153] B.4. Simulation results
[0154] We have used different test sequences to check the performance of our motion estimation algorithm. FIG. 12 shows the 128×128 current frame of the picture sequence. A 32×32 block with its center lying at the center of the 128×128 frame is considered an object. FIGS. 13 and 14 show previous frames with different known global and local transformation parameters, i.e. translation, rotation and scaling.
[0155] We have considered the highest peak in the correlation surface as the valid motion vector. We have applied our algorithm to estimate global translation, rotation and scaling by first estimating the rotation and scaling, then applying an affine transformation to the current frame and again performing phase correlation to estimate the translational motion. Then we have estimated the local motion after applying the transformation to the current and previous frames, see FIG. 11. Results of the estimated motion parameters are shown in tables 5 and 6 in FIGS. 15 and 16. A negative sign indicates counter-clockwise rotation, whereas a positive sign indicates clockwise rotation. The transformation parameters are applied to the previous frames shown in FIG. 13 and in FIG. 14.
[0156] The two tables show the actual global and local motion parameters, the global and local motion values measured using our algorithm, and the peak values in the correlation surfaces. "RotScl" and "Transl" indicate peak values in the correlation surface obtained in the process of detecting "rotation scaling" and "translation", respectively. The slight difference between the actual and the measured values is due to the non-uniformity of the scale. Furthermore, the difference between the actual and the estimated values for local motion is larger due to the fact that the block size is smaller for these objects. We have successfully measured global and local translation, rotation and scaling using this algorithm.
[0157] The limiting factor is the loss of relevant information present in the frames. Also, it should be noted that local motion estimates for smaller objects cannot be obtained, since the correlation between the blocks is lost and they are no longer correlated in the current and the previous frames. Another important limitation is that if the number of objects is so large that they occupy a significant area, the local motion of this large number of objects affects the global motion estimates. In our implementation we have used the single-peak approach, i.e. we have only taken the first peak in the correlation surface as the valid peak. This reduces the amount of computation required to estimate the motion vectors.
[0158] B.5. Conclusion
[0159] Existing phase correlation motion estimation techniques fail completely if scaling or rotation or both are present. The modified phase correlation based motion estimation system presented above with respect to the estimation of global motion is capable of detecting global motion. However, it fails to measure the local motion of objects.
[0160] A further extension to this phase correlation based motion
estimation technique was presented in this invention report. The
new technique is capable of jointly detecting global and local
translation, scaling and rotation motion (see section B.2.3).
Different global and local translation, rotation and scaling values measured using this algorithm are mentioned in this report (see section B.4). To increase the range of motion parameters that can
be measured by our system, we could consider more peaks in the
correlation surface and then find the parameters that produce the
highest peak.
[0161] The algorithm to estimate global and local translation in the presence of scaling only (see section B.2.1) and to estimate global and local translation in the presence of rotation only (see section B.2.2) can easily be implemented by changing the modified phase correlation based motion estimator as described above with respect to the estimation of global motion. However, it will then be unable to jointly detect translation, rotation and scaling. Our presented algorithm provides a robust method for estimating all three types of global and local motion parameters, i.e. translation, rotation and scaling, either separately or jointly.
[0162] In the following these and further aspects of the present
invention will be explained in more detail based on preferred
embodiments of the present invention and by taking reference to the
accompanying figures which schematically demonstrate aspects of the
present invention.
[0163] FIG. 1 is a schematical block diagram describing a preferred
embodiment of the inventive method for global motion
estimation.
[0164] FIGS. 2-6 demonstrate by means of a sequence of images
application aspects of an embodiment of the inventive method for
global motion estimation.
[0165] FIGS. 7-10 summarize by means of respective tables the
actual and the measured geometrical relationships between the
images shown in FIGS. 2 to 6.
[0166] FIG. 11 is a schematical block diagram describing a
preferred embodiment of the inventive method for global and local
motion estimation.
[0167] FIGS. 12-14 demonstrate by means of a sequence of images
application aspects of an embodiment of the inventive method for
global and local motion estimation.
[0168] FIGS. 15, 16 summarize by means of respective tables the actual and the measured geometrical relationships between the images shown in FIGS. 12 to 14.
[0169] FIGS. 17-19 demonstrate by means of graphical
representations the definition of certain geometrical aspects
between consecutive frames or images.
[0170] FIG. 20 is a schematical block diagram elucidating further
general aspects of the inventive method for motion estimation.
[0171] In the following, structural and/or functional elements which are comparable, similar or equivalent to each other will be denoted by identical reference symbols. A detailed description will not be repeated for each of their occurrences.
[0172] Before going into detail, reference is taken to FIG. 20
which demonstrates by means of a schematical block diagram in a
broader sense basic aspects of the inventive method for motion
estimation.
[0173] FIG. 20 gives a rough sketch of some basic aspects of the
present invention. In a first process S1 video input data and in
particular consecutive frames f1, f2 are received from which
corresponding consecutive Fourier transformed frames F1, F2 are
derived in a following process S2. Based on phase relationships between said corresponding consecutive Fourier transformed frames F1, F2, global motion parameters are derived in a next process S3, based on which the global motion contained in the consecutive frames f1, f2 is removed or compensated in process S4, so that only local motion aspects remain in the thereby obtained global motion compensated consecutive frames f1', f2'. Again, consecutive Fourier transformed
frames F1', F2' are derived in a further process S5 which
correspond to said global motion compensated consecutive frames
f1', f2'. Based on phase relationships between said consecutive
Fourier transformed frames F1', F2' corresponding to said global
motion compensated consecutive frames f1', f2' local motion
parameters are derived in a next process S6. Thereby, according to
the present invention global and local motion parameters with
respect to translation, rotation and scaling can be determined in
an easy, reliable and consistent manner and essentially based on
translation, rotation and scaling properties of the Fourier
transforming process only.
[0174] FIG. 1 is a schematical block diagram describing some
aspects of a preferred embodiment of the inventive method for
motion estimation. The method described in FIG. 1 consists of a
sequence of procedural steps 1-1 to 1-16.
[0175] In a first step 1-1, video input data in the form of a sequence of frames, images, and/or pictures are supplied to a process of block based raised cosine windowing. The windowed data are provided to a following step 1-2 of block based logarithmic magnitude fast Fourier transformation.
[0176] The Fourier transformed data are supplied to a following
step 1-3 of block based high pass filtering.
[0177] The output data of step 1-3 are fed into a process 1-4 of
block based logarithmic polar conversion.
[0178] The results of step 1-4 are again block based raised cosine
windowed in a following step 1-5.
[0179] In the following step 1-6, the output data of process 1-5 are block based fast Fourier transformed. The output data are supplied twice to a following step 1-7 for the determination of cross power spectral data, i.e. as the original output data from step 1-6 and as the output data of step 1-6 to which a delay process 1-6a has been applied.
[0180] The cross power spectral data are supplied to a following
step 1-8 of block based inverse fast Fourier transformation.
[0181] In the following step 1-9 the output data of step 1-8 are
supplied to a block based peak detection process, from the output
of which a rotational angle and a scaling factor can be
derived.
[0182] Based on the rotational angle and the scaling factor and the
original video input data a block based transformation is applied
thereto in step 1-10.
[0183] The output of step 1-10 is fed into a block based raised
cosine windowing process of step 1-11.
[0184] Then, a block based fast Fourier transformation follows in
step 1-12.
[0185] Again, a cross power spectral analysis is applied in step
1-14 to the block based fast Fourier transformed output data of
step 1-10, i. e. to the original output data of step 1-12 and to
the output data of step 1-12 to which the delay process according
to step 1-12a has been applied.
[0186] The cross power spectral data of step 1-14 are fed into a process of block based inverse fast Fourier transformation in step 1-15. The output data of step 1-15 are fed into a process of block based peak detection according to step 1-16, from which the translational parameters can be obtained.
[0187] FIG. 1 also describes a preferred embodiment of an apparatus
for carrying out the inventive method. It can inter alia be applied
to the application areas of camera shaking/moving compensation,
source coding for video, and video rate and scan conversion.
[0188] FIGS. 2 to 6 demonstrate by means of photographs frames, pictures, and/or images which have certain geometrical relationships with respect to each other. FIG. 2 may be referred to as a current frame of a certain object without any rotation, scaling, and/or translation. With respect to FIG. 2, FIG. 3 is scaled by a factor of 1.25, rotated by 17.2°, and horizontally and vertically translated by 21 pixels.
[0189] With respect to FIG. 2, FIG. 4 is scaled by a factor of about 1.65, rotated by an angle of 7.2°, and horizontally and vertically translated by 5 pixels.

[0190] With respect to FIG. 2, FIG. 5 is scaled by a scaling factor of about 1.1, rotated by an angle of 90°, and horizontally and vertically translated by 5 pixels.

[0191] With respect to FIG. 2, FIG. 6 is scaled by a scaling factor of about 1.6, rotated by an angle of 28.6°, and horizontally and vertically translated by 21 pixels.
[0192] FIGS. 7 to 10 show by means of tables 1 to 4 the calculation results for the translational, rotational, and scaling parameters of FIGS. 3, 4, 5 and 6, respectively, in each case with respect to FIG. 2, which is referred to as the current frame. That means that the parameters shown in FIGS. 7 to 10 and the respective tables 1 to 4 have to be compared to the parameters which have been used for translation, rotation and scaling in order to derive from FIG. 2 as the current frame the previous frames shown in FIGS. 3, 4, 5 and 6, respectively.
[0193] FIG. 11 is a schematical block diagram which elucidates
another preferred embodiment of the present inventive method for
motion estimation and in particular when distinguishing between
global and local motion estimation.
[0194] According to FIG. 11 from a provided video input in a first
process 11-1 global estimation parameters are derived as described
above by using the process of modified phase correlation-based
motion estimation.
[0195] The output data of the process 11-1 are fed into a further process 11-2 for global motion compensation, together with a video input from which the global motion between consecutive frames has to be removed.
[0196] The result from the global motion compensation according to
process 11-2 is fed into a following process 11-3 of deriving an
absolute difference between the consecutive frames, i.e. the
compensated video input is provided as original data and after
application of a delay process 11-2a in delayed form in order to
compare consecutive frames of the globally compensated video
input.
[0197] The output data after forming the absolute difference between consecutive frames in step 11-3 are fed into a following process 11-4 evaluating threshold properties.

[0198] The thresholded data of step or process 11-4 are supplied to a following process 11-5 of image segmentation. The output of process 11-2 is fed into an object extraction process 11-6.
[0199] Then, based on a modified phase correlation-based motion estimation process, local motion estimates are derived in order to present as a result the local parameters for local translation, rotation, and/or scaling.
[0200] FIGS. 12 to 14 show frames to which certain geometrical
processes have been applied in order to test the inventive method
for motion estimation. In the sequence of FIGS. 12 to 14 FIG. 12
serves as a current frame whereas FIGS. 13 and 14 serve as previous
frames which are globally and locally scaled, rotated and/or
translated with respect to the frame shown in FIG. 12.
[0201] Tables 5 and 6 as shown in FIGS. 15 and 16 demonstrate the
numerical results which are obtained by applying the inventive
method for motion estimation with respect to the pairs of frames of
FIGS. 12 and 13 and 12 and 14, respectively.
[0202] FIGS. 17, 18 and 19 give a rough definition of the geometrical parameters, i.e. of the translational, rotational and scaling parameters, which are used to describe the effect of applying the inventive method for estimating motion between consecutive frames.
CITED REFERENCES

[0203] [1] G. A. Thomas, "Television Motion Measurement for DATV and other Applications", BBC Research Department, Research Report 1987/11.

[0204] [2] B. Reddy and B. Chatterji, "An FFT-based Technique for Translation, Rotation and Scale-invariant Image Registration", IEEE Trans. on Image Processing, 5:8, pp. 1266-1271, 1996.

[0205] [3] L. Hill and T. Vlachos, "On the Estimation of Global Motion using Phase Correlation for Broadcast Applications", IEEE International Conference on Image Processing and its Applications (IPA 99), pp. 721-725, 1999.
REFERENCE SYMBOLS

[0206] a scaling parameter along x direction
[0207] b scaling parameter along y direction
[0208] CI current image/frame
[0209] f1 first or previous image/frame
[0210] F1 Fourier transformed first or previous image/frame
[0211] f2 second or current image/frame
[0212] F2 Fourier transformed second or current image/frame
[0213] PI previous image/frame
[0214] x0 translational parameter in x direction
[0215] y0 translational parameter in y direction
[0216] φ0 rotational parameter
* * * * *