U.S. patent application number 13/775,301 was published by the patent office on 2013-10-17 for dominant motion estimation for image sequence processing.
The applicant listed for this patent is Google Inc. Invention is credited to Andrew Crawford, Hugh Denman, Francis Kelly, Anil Kokaram, and Francois Pitie.
United States Patent Application 20130271666
Kind Code: A1
Crawford; Andrew; et al.
October 17, 2013
DOMINANT MOTION ESTIMATION FOR IMAGE SEQUENCE PROCESSING
Abstract
Herein is described a method of estimating dominant motion
between a current frame n and another frame m of an image sequence
having a plurality of frames, the method comprising generating
integral projections of the images and using gradients of those
projections and using differences between the projections. The
input may be any sequence of image frames from an image source,
such as a video camera, IR or X-ray imagery, radar, or from a
storage medium such as computer disk memory, video tape or a
computer graphics generator.
Inventors: Crawford; Andrew (Mountain View, CA); Kokaram; Anil (Sunnyvale, CA); Kelly; Francis (London, GB); Denman; Hugh (San Francisco, CA); Pitie; Francois (Dublin, IE)
Applicant: Google Inc., Mountain View, CA, US
Family ID: 33485094
Appl. No.: 13/775,301
Filed: February 25, 2013
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
11/577,779 | Jan 22, 2008 | 8,385,418
PCT/IE2005/000117 | Oct 20, 2005 |
13/775,301 | |
Current U.S. Class: 348/699
Current CPC Class: H04N 19/527 20141101; H04N 5/145 20130101; G06T 7/269 20170101; H04N 19/80 20141101
Class at Publication: 348/699
International Class: H04N 5/14 20060101 H04N005/14

Foreign Application Data

Date | Code | Application Number
Oct 22, 2004 | GB | 0423578.4
Claims
1-20. (canceled)
21. A method of estimating a dominant motion between a current
frame n and another frame m of an image sequence having a plurality
of frames, each of the frames having a plurality of pixels, the
method comprising: generating integral projections of the current
frame n and the other frame m; and estimating the dominant motion
with a processor by matching the generated integral projections,
the dominant motion being a motion associated with most of the
pixels of current frame n.
22. The method of claim 21, wherein matching the generated integral
projections includes directly matching the generated integral
projections.
23. The method of claim 22, wherein the direct matching is
performed using a coarse version of the current frame n and other
frame m.
24. The method of claim 22, wherein matching the generated integral
projections further includes refining the direct matching of the
generated integral projections using gradients of the generated
integral projections.
25. The method of claim 24, wherein refining the direct matching of
the generated integral projections is performed at successively
higher frame resolutions of the current frame n and other frame
m.
26. The method of claim 21, further comprising: determining two or
more projection angles for generating the integral projections of
the current frame n and the other frame m.
27. The method of claim 21, further comprising: normalizing at
least one of the generated integral projections before estimating
the dominant motion.
28. The method of claim 21, wherein matching the generated integral
projections is performed using an initial estimate and gradients of
the generated integral projections.
29. The method of claim 21, further comprising: applying a weight to at least one of pixels of the current frame n or the other frame m, or to the integral projections, to suppress an effect of local motion when estimating the dominant motion.
30. An apparatus for estimating a dominant motion between a current
frame n and another frame m of an image sequence having a plurality
of frames, each of the frames having a plurality of pixels, the
apparatus comprising: a memory; and one or more processors
configured to execute instructions stored in the memory to:
generate integral projections of the current frame n and the other
frame m, and estimate the dominant motion by matching the generated
integral projections, the dominant motion being a motion associated
with most of the pixels of current frame n.
31. The apparatus of claim 30, wherein matching the generated
integral projections includes directly matching the generated
integral projections.
32. The apparatus of claim 31, wherein the direct matching is
performed using a coarse version of the current frame n and other
frame m.
33. The apparatus of claim 31, wherein matching the generated
integral projections further includes refining the direct matching
of the generated integral projections using gradients of the
generated integral projections.
34. The apparatus of claim 33, wherein refining the direct matching
of the generated integral projections is performed at successively
higher frame resolutions of the current frame n and other frame
m.
35. The apparatus of claim 30, wherein the one or more processors
are further configured to execute instructions to: normalize at
least one of the generated integral projections before estimating
the dominant motion.
36. A non-transitory computer readable medium including program
instructions executable by one or more processors that, when
executed, cause the one or more processors to perform operations
for estimating a dominant motion between a current frame n and
another frame m of an image sequence having a plurality of frames,
each of the frames having a plurality of pixels, the operations
comprising: generating integral projections of the current frame n
and the other frame m; and estimating the dominant motion by
matching the generated integral projections, the dominant motion
being a motion associated with most of the pixels of current frame
n.
37. The non-transitory computer readable medium of claim 36,
wherein matching the generated integral projections includes
directly matching the generated integral projections.
38. The non-transitory computer readable medium of claim 37,
wherein matching the generated integral projections further
includes refining the direct matching of the generated integral
projections using gradients of the generated integral
projections.
39. The non-transitory computer readable medium of claim 38,
wherein the direct matching is performed using a coarse version of
the current frame n and other frame m and refining the direct
matching of the generated integral projections is performed at
successively higher frame resolutions of the current frame n and
other frame m.
40. The non-transitory computer readable medium of claim 36,
wherein the operations further include: normalizing at least one of
the generated integral projections before estimating the dominant
motion.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation application of co-pending
U.S. patent application Ser. No. 11/577,779, which is the National
Stage of International Application No. PCT/IE2005/000117, filed
Oct. 20, 2005, which in turn claims the benefit of the priority of
United Kingdom Application No. 0423578.4, filed on Oct. 22,
2004.
[0002] This invention relates to image and video processing and is
concerned with measuring the global, dominant or camera motion
between any pair of frames in an image sequence. Prior art reveals
work in this area for compensating random displacements due to
unwanted camera motion, for improving MPEG4 coding and for
detecting events in a video stream (e.g. scene cuts). The dominant
motion in a scene is that motion component that can be ascribed to
most of the picture material in an image. Terms like global motion
and camera motion are often used synonymously, but they do not
quite express the fact that the dominant motion in a
scene can be a combination of both camera behaviour and apparent
object behaviour. Thus in an image sequence showing a head and
shoulders shot of a person taken with a static camera, the movement
of the head is likely to be the dominant motion in the scene as it
is the largest moving object. In the recording of a tennis match,
any camera motion is the dominant motion since most of the scene
content is background (the court) and that background will move
relatively with the camera. However, consider that the camera zooms
in from a wide view of the court to a close up of the player. The
dominant motion is initially the camera zoom, but as the player's
body fills the field of view toward the end of the shot, the body
motion becomes dominant later on.
[0003] Dominant motion information has long been recognised as an
important feature in many video processing tasks. This motion
embodies information about the video event, hence it is a useful
feature for content based retrieval [3]. Similarly, because of the
large picture area that can be ascribed to dominant motion, it can
(in general) be estimated more robustly than local motion, and is
useful for compression as in MPEG4[4].
[0004] One embodiment of this invention involves image
stabilisation. Image instability manifests as a random, unwanted
fluctuation in the dominant motion of a scene. Shake is very common
in footage from both hand-held cameras and fixed cameras (despite
the image stabilisation technology on most cameras). Instability
can be caused by external factors such as wind and unsteadiness in
the camera's movement (any instability is magnified at high zoom).
Archived video sequences also suffer from unsteadiness introduced
during filming or during the digitization of film. As most common
compression systems utilise the similarity between consecutive
frames, random dominant motion has a large effect on the
compressibility of video data since more bandwidth is consumed
unnecessarily representing motion locally. Removal of this
fluctuation therefore has wide application in a number of different
areas.
[0005] There are two issues in video stabilisation. Firstly, the
dominant motion must be estimated. The unwanted component of this
dominant motion must then be extracted and removed, while
preserving intentional motion such as pan. To achieve this, it is
assumed that the two components of motion have different
statistics.
[0006] There are many possibilities for estimating dominant motion.
These can be split into two main categories: feature based and
image based. Feature based methods, typically employed in computer
vision, attempt to locate and match important features, e.g.
corners in image pairs, and hence extract the image geometry and
eventually the perspective distortion [12]. Image based methods
rely on direct transformation of the image grid and minimize some
image difference criterion. The technique discussed here is an
image based method.
[0007] Early image based methods include the work described by
Dufaux et al [4] (2000) and Odobez et al [9] (1995). These are both
very similar and rely on a gradient based approximation to image
warping. [9] correctly points out that accurate estimation of
dominant motion requires the design of a technique that can
suppress the motion of the smaller objects in the scene i.e the
Local Motion. Both [9] and [4] propose weighting schemes which are
applied to the 2D image plane in order to remove the effect of
image motion. These weights are derived from measurements made at
single pixel sites only.
[0008] As part of video stabilisation systems, several prior art
publications mention global motion estimation. In
GB2307133 Video camera image stabilisation system, KENT PHILIP
JOHN; SMITH ROBERT WILLIAM MACLAUGHL, 1997 a global rotation
measurement is claimed based on using histograms of edge
orientations. There is no consideration of translational or general
affine treatment. In EP0986252, System and method for electronic
image stabilization HANNA KEITH JAMES (US), BURT PETER JEFFREY
(US), SARNOFF CORP (US), 2000 a generic claim is made for global
motion estimation using a recursive refinement of an initial
estimate which may be zero. This concept is well established in
prior available literature, also for global motion [9] 1995. Even
more generically it is known as an idea for generating motion
information since 1987[2]. The present invention presents a new
means for creating updates and the updates themselves do not apply
to the entire 2D image surface, but instead to extracted
measurement vectors. In WO2004056089, FRETWELL PAUL, FAULKNER DAVID
ANDREW ALEXANDER (GB) et al, 2004 a claim is made for a method that
uses a mask to remove the effect of local motion in estimating
global motion. That idea is the same as the weights used by [9],
1995; for the same purpose. However, in [9], the weights are
adaptive while in WO2004056089 the weights comprise a fixed, binary
mask. Adaptive weights are generally a superior mechanism for
coping with global motion, even though more computationally
expensive. Finally, in GB2365244, Image stabilisation, LEBBELL MARK
(GB); TASKER DAVID (GB), 2002 mention is made about using global
motion for video stabilisation but there is no claim regarding the
mechanism used for making the global motion measurement.
[0009] Direct matching techniques can be attempted for dominant
motion estimation. This implies exhaustively searching for the best
motion component that would exactly match two image frames. This is
known as Block Matching. It is very simple to implement but
computationally expensive because of the exhaustive nature of the
search process. Since 1992[5], ad-hoc developments in an
alternative strategy for direct matching have emerged. Note that
all of these developments have addressed only the problem of
discovering the image translation between two images that are
identical except for the relative displacement between them. The
application domain was not realistic image sequences but instead
targeted the image registration problem in satellite imagery. The
idea is that, instead of matching the entire 2D images, it is sensible
to match the vertical and horizontal summations of the image.
Intuitively this makes sense. Consider that the vertical image
projection is the sum of the image intensities along columns.
Similarly the horizontal projection is the sum along rows. If an
image moves upwards, then its horizontal projection also moves
upwards. Thus instead of matching an N×M image containing N rows of M
columns of digital data, one could just match two vectors containing
N and M entries respectively. This is a vast saving in
computational cost.
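As a minimal sketch of this idea (Python; the function names and the sum-of-absolute-differences cost are illustrative choices, not mandated by the text), two frames can be compared through their projections alone:

```python
import numpy as np

def integral_projections(img):
    """Vertical projection: sum along columns; horizontal projection: sum along rows."""
    p_x = img.sum(axis=0)   # length-M vector, one entry per column
    p_y = img.sum(axis=1)   # length-N vector, one entry per row
    return p_x, p_y

def match_projection(p_cur, p_prev, max_shift):
    """Exhaustive 1D search for s minimising the mean absolute difference
    between p_cur(k) and p_prev(k + s), following the model p_n(k) = p_{n-1}(k + d)."""
    n = len(p_cur)
    best_s, best_cost = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -s), min(n, n - s)   # overlapping samples only
        cost = np.abs(p_cur[lo:hi] - p_prev[lo + s:hi + s]).mean()
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s
```

Matching two vectors of N and M entries replaces matching N×M pixels, which is where the computational saving described above comes from.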
[0010] Since 1992, more schemes have emerged that properly
recognise the relationship to motion estimation: 1996[11], 2002
[7]. However these papers all deal with i) direct matching of
integral projections using an exhaustive search and ii) no local
motion in the blocks. In the former case, computational expense is
lower than for direct matching of 2D images, but the cost is still
significant, especially at high resolution. In the latter case these papers do
not consider the problem of dominant motion estimation.
[0011] Milanfar et al [8, 10] have placed some structure on the
previously ad-hoc work. They do so by showing that the integral
projections approach can be derived from a Radon Transform of the
image. Their work leads to unification of previous approaches and
the introduction of the idea that projections along non-cartesian
directions could be better in some cases. Again this work does not
consider local motion as an issue.
1 Estimating Dominant Motion: The invention
[0012] This invention discloses a new means for estimating dominant
motion that is more computationally efficient. One embodiment of
the invention results in a system using general purpose hardware,
that removes random, unwanted global motion at rates in excess of
25 frames per second operating on standard-definition 720×576
digital television images. The input may be any sequence of image
frames from an image source, such as a video camera, IR or X-ray
imagery, radar, or from a storage medium such as computer disk
memory, video tape or a computer graphics generator.
[0013] One component of this invention, is a new perspective on
Integral Projections which is much simpler to follow than the
Transform domain exposition [8]. It is different in that it leads
directly to a gradient based approach to matching integral
projections. This is computationally cheaper. The gradient based
aspect is another component of the invention, along with a
refinement process for treating large displacement. In addition,
the new result allows a measure to be derived that can check the
validity of a projection before motion estimation begins. The
invention also incorporates the use of weights in the image space
to remove the effect of local motion on the integral projection.
Finally, one embodiment of the invention is the use of the Graphics
Hardware available in general purpose PCs, PDAs and game consoles
(e.g. Sony Playstation) for implementing the projection and
compensation unit for an image stabiliser.
[0014] An overview of the process is shown in FIG. 1. The figure
shows the overall system invention, in an embodiment for
translational motion. The frame buffer unit is an image delay that
can manifest as a framestore holding one previous frame in memory.
The frames input to the system need not be consecutive however. The
Image Projections and Projection Shift units create and translate
projections respectively. These units may be implemented within the
Graphics hardware of modern computers and games consoles. The
Gradient Based matching unit calculates the shift between current
and previous image frame projections using the method described in
this invention.
[0015] Dominant motion is estimated based on a single large
N×N block centred on each frame. In one embodiment of the
invention, a value of N=512 pixels is used for a 720×576
image. This block size is arbitrary and depends on the size of the
overall picture. It generally should occupy 90% of the area of the
image. All methods described use one dimensional, Integral
Projections of this block to estimate global motion. The directions
of the projections need not be vertical and horizontal. They may be
any set of directions, preferably two orthogonal directions.
Consider an integral projection of the image I_n(h,k), where n
is the frame index and h, k are pixel coordinates. The horizontal
projection is calculated by summing along rows (horizontal
direction) and is given by p_n^y(h) = Σ_k I_n(h,k), while the
vertical projection results from summing along columns (vertical
direction): p_n^x(k) = Σ_h I_n(h,k).
[0016] To relate the use of these projections to motion estimation,
express the image sequence as obeying the following law
I_n(x) = I_{n-1}(x + d) + ε(x)   (1)

where x = [h, k], d is the dominant image displacement and
ε(x) ~ N(0, σ_e²) is Gaussian noise. d consists of two components
[d_1, d_2], the horizontal and vertical components of motion.
[0017] Consider that an initial estimate of d exists. The initial
estimate may be zero. Define this to be d.sub.0. Further, consider
that it is required to update this estimate such that the result is
the actual displacement: d=d.sub.0+u, where u=[u.sub.x, u.sub.y] is
the update displacement vector. Therefore, the image sequence model
can be written as
I_n(x) = I_{n-1}((x + d_0) + u) + ε(x)   (2)

[0018] Using the Taylor series expansion to linearise the right-hand
side about x + d_0 gives:

I_n(x) = I_{n-1}(x + d_0) + u^T ∇I_{n-1}(x + d_0) + ε(x)   (3)

[0019] Let Z_n(x) = I_n(x) - I_{n-1}(x + d_0):

Z_n(x) = u^T ∇I_{n-1}(x + d_0) + ε(x)   (4)

[0020] Writing the ∇ operator in full:

Z_n(h,k) = u_x G_x(h,k) + u_y G_y(h,k) + ε(h,k)   (5)
where G_x(h,k) and G_y(h,k) are the horizontal and vertical
gradients at image pixel (h,k) respectively, given as follows:

G_y(h,k) = ∂I_{n-1}(h,k)/∂y and G_x(h,k) = ∂I_{n-1}(h,k)/∂x   (6)
[0021] The crucial step is to recognise that, assuming the motion is
the same over a large image area, summing in a particular direction
allows useful approximations. To simplify matters assume
Σ_h ε(h,k) = 0, although it is possible to proceed without this
assumption. Summing horizontally along rows with respect to h:

Σ_h Z_n(h,k) = u_x Σ_h G_x(h,k) + u_y Σ_h G_y(h,k)   (7)

where the left-hand side and the two right-hand terms are labelled
(i), (ii) and (iii) respectively.
[0022] A similar expression exists for summing in the vertical
direction. If it were possible to ignore one of the two terms (ii)
or (iii), each component of motion could be solved for separately.
The table below shows the ratio Σ_h G_y / Σ_h G_x for a number of
test images which are used as standards in the image processing
industry.
Image | Ratio
Lena | 7.1
Sailboat | 24.2
Peppers | 76.9
[0023] The table shows that term (iii) is more significant than
term (ii) in general. This makes sense since summing with respect
to h followed by calculating the gradient also with respect to h is
equivalent to applying a low-pass filter along the rows followed by
a high-pass filter in the same direction. Such a cascade will
produce a low energy output. It is sensible then to assume that
(ii)=0, which yields the following simplification.
Σ_h Z_n(h,k) = u_y Σ_h G_y(h,k)   (8)
[0024] Defining z_n^x(k) = Σ_h Z_n(h,k) and g_y^x(k) = Σ_h G_y(h,k)
allows this expression at a single row k to be written as follows.

z_n^x(k) = u_y g_y^x(k)   (9)
[0025] Each such equation at each row can be stacked into a vector
to yield a set of equations as follows.
[z_n^x(0), z_n^x(1), ..., z_n^x(N-1)]^T = u_y [g_y^x(0), g_y^x(1), ..., g_y^x(N-1)]^T   (10)
where there are N rows in the block being analysed. This equation
can be represented in vector form as
z_n^x = u_y g_y^x   (11)
[0026] Using the pseudoinverse, an estimate for u_y can then be
generated using the following expression.

u_y = ((g_y^x)^T z_n^x) / ((g_y^x)^T g_y^x)   (12)
[0027] At this point it is vital to recognise that the elements of
the vectors z_n^x and g_y^x can be calculated using integral
projections.

z_n^x(k) = p_n^x(k) - p_{n-1}^x(k)   (13)

g_y^x(k) = p_{n-1}^x(k) - p_{n-1}^x(k-1)   (14)
[0028] Thus u_y can be calculated using integral projections.
u_x can be calculated similarly, summing along columns (over k).
Hence the connection between integral projections and motion
estimation.
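The gradient-based solution of equations (12)-(14), iterated from an initial estimate d_0, can be sketched as follows (Python; the iteration count, the convergence threshold, and the linear interpolation used to displace the projection are implementation choices not specified in the text):

```python
import numpy as np

def gradient_shift(p_cur, p_prev, iters=10):
    """Estimate the 1D shift d such that p_cur(k) ~ p_prev(k + d) by
    repeated gradient-based updates (pseudoinverse solution of eq. 12)."""
    n = len(p_prev)
    k = np.arange(n)
    d = 0.0                                    # initial estimate d0 (may be zero)
    for _ in range(iters):
        shifted = np.interp(k + d, k, p_prev)  # p_{n-1} displaced by current estimate
        z = p_cur - shifted                    # eq. (13): projection difference
        g = shifted - np.roll(shifted, 1)      # eq. (14): projection gradient
        g[0] = 0.0                             # discard the wrapped-around sample
        u = float(g @ z) / float(g @ g)        # eq. (12): least-squares update
        d += u
        if abs(u) < 1e-4:
            break
    return d
```

Each iteration solves a scalar least-squares problem on two short vectors, which is the source of the computational saving over 2D gradient-based matching.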
[0029] In addition, for any transformation of the image that can be
effectively linearized by the Taylor series expansion, this idea
holds. Consider that the dominant motion is due to an affine
transformation given by a 2D matrix A, as follows.
A = [a_11  a_12; a_21  a_22]   (15)
[0030] Affine motion generalises zoom, rotation, and skew
transformations of the image. For instance a.sub.11=a.sub.22=0.5;
a.sub.12=a.sub.21=0 causes a zoom of factor two between images.
Assuming translational motion as well, the image model can
therefore be written as
I_n(x) = I_{n-1}(Ax + d) + ε(x) = I_{n-1}(a_11 h + a_12 k + d_1, a_21 h + a_22 k + d_2) + ε(h,k)   (16)
[0031] Again, the Taylor series expansion can be used to expand the
expression above about an initial estimate. However the initial
motion estimate is now A_0, d_0, since both affine and translational
motion must be accounted for. Exactly the same steps as
above can then be followed, including summing along particular
directions to yield a solution for the parameters A, d. In this
formulation however it is not possible to straightforwardly
separate estimation of each parameter into separate equations even
after summation along the projection directions. Nevertheless
summation does yield simplification and again a projection based
motion estimate results.
1.1 A test
[0032] It is possible to use projection directions which are not
vertical or horizontal. In fact this is advantageous in order to
increase the validity of the crucial assumption in equation 8. To
validate a particular projection direction, the ratio
Σ_h G_k / Σ_h G_h can be measured. If this value is too low, another
projection angle should be used. This ratio can also be used as a
prior step before motion estimation to decide on suitable projection
directions.
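A sketch of such a validity check (Python; using simple first differences for the gradients, which the text does not prescribe):

```python
import numpy as np

def direction_validity_ratio(img):
    """Ratio of the summed cross-direction gradient to the summed
    same-direction gradient. Summing d/dh over h telescopes to a boundary
    difference, so a large ratio supports dropping term (ii) in eq. (7)."""
    g_h = np.diff(img, axis=0)                 # gradient with respect to h (rows)
    g_k = np.diff(img, axis=1)                 # gradient with respect to k (columns)
    s_same = np.abs(g_h.sum(axis=0)).sum()     # sum over h of d/dh: small
    s_cross = np.abs(g_k.sum(axis=0)).sum()    # sum over h of d/dk: large
    return s_cross / s_same
```

For the standard test images the text reports ratios from about 7 (Lena) to 77 (Peppers); a value near or below 1 would argue for choosing a different projection angle.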
1.2 Multiresolution Refinement Step
[0033] The Taylor series expansion holds only for small values of
dominant motion. This problem can be circumvented by using
multiresolution techniques. Coarse to fine refinement of motion
estimates on a pyramid of images is one mechanism for dealing with
large displacement in the gradient estimation context. Here a 4
level pyramid is employed with a maximum of 10 iterations at each
level. The method is called Multi-Res in subsequent sections. A
further computational saving is obtained by noting that the pyramid
can be generated in the 1D projection space rather than in the 2D
image space. Thus the pyramid is built by downsampling 1D projections
rather than projecting downsampled images. The saving is on the
order of N²/3 multiply-adds.
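A sketch of the 1D multiresolution scheme (Python; the [1/4, 1/2, 1/4] smoothing kernel, the search radius, and integer-only matching at each level are illustrative simplifications, not the method's prescribed parameters):

```python
import numpy as np

def pyramid_1d(p, levels=4):
    """Low-pass filter and downsample the 1D projection, never the 2D image."""
    pyr = [np.asarray(p, dtype=float)]
    kern = np.array([0.25, 0.5, 0.25])
    for _ in range(levels - 1):
        pyr.append(np.convolve(pyr[-1], kern, mode='same')[::2])
    return pyr                                 # pyr[0] finest, pyr[-1] coarsest

def _best_shift(p_cur, p_prev, centre, radius):
    """Search a small window of integer shifts s around centre,
    comparing p_cur(k) with p_prev(k + s)."""
    n = len(p_cur)
    def cost(s):
        lo, hi = max(0, -s), min(n, n - s)
        return np.abs(p_cur[lo:hi] - p_prev[lo + s:hi + s]).mean()
    return min(range(centre - radius, centre + radius + 1), key=cost)

def multires_shift(p_cur, p_prev, levels=4, radius=2):
    """Coarse-to-fine refinement: estimate at the coarsest level, then
    double the estimate and refine it at each finer level."""
    pyr_c, pyr_p = pyramid_1d(p_cur, levels), pyramid_1d(p_prev, levels)
    d = 0
    for lvl in range(levels - 1, -1, -1):
        d = _best_shift(pyr_c[lvl], pyr_p[lvl], 2 * d, radius)
    return d
```

Because every level lives in the projection space, each refinement touches O(N) samples rather than O(N²) pixels.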
[0034] Because the manipulation of integral projections requires so
little computation, it is possible to propose another, hybrid
technique. Direct matching on the projections using for example
cross correlation is performed, at the integer pixel resolution.
This leads to an estimate d.sub.0. The resulting estimate of motion
is then used to initialise the gradient based estimator above. This
method allows the gradient based method to concentrate on the
relatively small motion adjustments required after the gross direct
matching is achieved.
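The hybrid scheme can be sketched as follows (Python; the cross-correlation stage mentioned in the text is replaced here by an equivalent exhaustive SAD search, and the refinement is reduced to a single gradient step; both are illustrative substitutions):

```python
import numpy as np

def hybrid_shift(p_cur, p_prev, max_shift=16):
    """Stage 1: exhaustive integer match of the projections (gross motion d0).
    Stage 2: one gradient-based step for the small remaining adjustment."""
    n = len(p_cur)
    k = np.arange(n)

    def cost(s):                               # compare p_cur(j) with p_prev(j + s)
        lo, hi = max(0, -s), min(n, n - s)
        return np.abs(p_cur[lo:hi] - p_prev[lo + s:hi + s]).mean()

    d0 = min(range(-max_shift, max_shift + 1), key=cost)

    shifted = np.interp(k + d0, k, p_prev)     # p_prev displaced by d0
    z = p_cur - shifted                        # residual projection difference
    g = np.gradient(shifted)                   # projection gradient
    u = float(g @ z) / float(g @ g)            # one least-squares update (eq. 12)
    return d0 + u
```

The direct stage absorbs the gross displacement, so the gradient stage only has to recover a sub-pixel correction well within the validity of the Taylor expansion.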
1.3 Weights
[0035] Weights can be used to reduce the effect of objects
undergoing local motion on the estimation of global motion.
Weighting can be done either in the projections themselves or in
the 2D image space. The idea of weighted estimation for this
purpose can be found in [9]. This invention applies that idea for
use with the projections based, gradient technique given here.
[0036] Applied to the image space, a weight w(h,k) representing a
confidence between 0 and 1 can be associated with each pixel site.
Each weight can be derived as a function of the observed displaced
frame difference (DFD) ε(x) = I_n(x) - I_{n-1}(x + d) at
that site at each iteration. Note that the DFD is measured by
warping the 2D image I_{n-1} with the current estimate of global
motion and subtracting that from the current image I_n. Large
DFD is mapped to low weights and vice versa. One possibility for
mapping DFD to weights is the function
w(h,k) = 2/(1 + exp(α ε(h,k))), where α adjusts how fast
the weights go to 0 as the DFD ε gets larger. Many other functions
can be used, the essential idea being that a large DFD probably
indicates a poor image match, hence residual motion, hence local
motion. These
weights are then used to remove the effect of the corresponding
pixels in the integral projections by premultiplying the image with
the weights before summation. Each projection element must be
scaled by the sum of the weights along the relevant row or
column.
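A sketch of the image-space weighting (Python; the value of α and the small clipping floor are illustrative, and the absolute value of the DFD is used here so that negative differences are down-weighted too):

```python
import numpy as np

def dfd_weights(cur, prev_warped, alpha=0.05):
    """w = 2 / (1 + exp(alpha * |DFD|)): weight 1 where the frames agree,
    tending to 0 where the displaced frame difference is large
    (which probably indicates local motion)."""
    return 2.0 / (1.0 + np.exp(alpha * np.abs(cur - prev_warped)))

def weighted_projection(img, w, axis=0):
    """Weighted sum along the projection direction, with each projection
    element scaled by the sum of the weights along that row or column."""
    return (w * img).sum(axis=axis) / np.maximum(w.sum(axis=axis), 1e-8)
```

With all weights equal to 1 this reduces to an ordinary (mean-normalised) projection; pixels flagged by a large DFD simply stop contributing.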
[0037] In a similar fashion, weights can be applied directly in the
projections space by applying them to modulate gradients and z.
Thus a weight is associated with each projection bin by using the
same means as mentioned previously except the error measure (DFD)
is the difference between current and previous projections
(displaced by current motion estimates). Both the gradient and
difference vector are multiplied by the weights before a solution
is generated for the global motion. This results in a matching
process robust to large deviations in the projections space
presumably caused by local motion.
1.4 Real Time Implementation and Computation
[0038] The video frame rate must be maintained for a real-time
implementation. To achieve real-time operation at the PAL frame
rate (25 fps), each frame must be processed in less than 40 ms.
[0039] The table below compares the computational complexity of
block matching with that of each of the methods proposed as
embodiments of the invention. The first column gives the number of
operations required based on a single N×N block, with a search
range of (±w), where i is the number of iterations and t is the
number of taps in the low-pass filter used by the multiresolution
method. This does not include the number of
computations required to calculate the projections (2N²). The
ratio of computations w.r.t. block matching is also shown
(including the calculation of the projections) given values of
N=512, w=32, i=20 and t=15. A ratio less than 1 indicates
that the algorithm requires proportionately fewer operations than
BM. It is clear from these values that the use of integral
projections provides a huge reduction in computational complexity.
Method | Operations | Ratio to BM
BM | (2w + 1)² N² | 1
Gradient based | 2i(7N) | 0.00060
Hybrid | 8wN + 8N + 14iN | 0.00073
Multi-Res | 1 15/16 N(t + 14i) | 0.00074
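These ratios can be reproduced directly (Python; the Multi-Res operation count is read from the garbled table entry as 1 15/16 · N(t + 14i), which is an interpretation, and the BM baseline is taken as (2w+1)²N²):

```python
# Operation counts for one N x N block, search range +/-w, i iterations,
# t filter taps; forming the two projections costs 2N^2 and is included
# in each ratio, per the text.
N, w, i, t = 512, 32, 20, 15

bm = (2 * w + 1) ** 2 * N ** 2                 # exhaustive 2D block matching
proj = 2 * N ** 2                              # cost of forming the projections

ops = {
    "Gradient based": 2 * i * (7 * N),
    "Hybrid": 8 * w * N + 8 * N + 14 * i * N,
    "Multi-Res": (31 / 16) * N * (t + 14 * i),  # 1 15/16 N(t + 14i)
}
ratios = {m: (c + proj) / bm for m, c in ops.items()}
```

Under this reading all three methods come out three to four orders of magnitude cheaper than block matching, consistent with the table.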
1.4.1 Separating Unwanted Components of Motion
[0040] Global motion can be caused by: (1) intentional effects like
a pan, and (2) the unsteadiness of the camera which is
unintentional. The first effect is generally low frequency and
exhibits slow temporal variations, whereas the second effect can
be temporally impulsive.
stabilisation, after the dominant motion estimation step the
measured motion is a combination of unwanted and wanted components.
For instance, if a person is holding a camera and pans from left to
right, a shaking hand will cause the deviation of the global motion
away from the desired pan motion due to the (perhaps) random hand
movements. The random hand motion component is unwanted while the
pan is desired. The dominant motion estimator will yield a motion
estimate that is the sum of these two motions. Thus removing all
dominant motion in this case does stabilise the sequence but it
also removes the desired pan.
[0041] In one embodiment of the invention, the dominant motion
estimator can be coupled with a process for removing unwanted
components of motion. It is possible to extract the low frequency
(desired) signal by means of a low pass filter [6]. The motion
estimate that is required for stabilisation can then be found by
simple difference of the output of this filter and the measured
motion.
[0042] As the shake in hand-held cameras is not extreme and only
past estimates are available in a real-time system, a simple IIR
low-pass filter is sufficient, where the coefficients of the filter
could be as follows.

H(z) = (0.0201 + 0.0402 z⁻¹ + 0.0201 z⁻²) / (1 - 1.5610 z⁻¹ + 0.6414 z⁻²)   (17)
[0043] In another situation, the unintentional motion could last
for a single frame or be completely random. This is the case in
film scanning when frames are displaced randomly from each other
because of scanner malfunction or the degradation of the film guide
holes. In this situation the filter above cannot reject the
impulsive, random component on its own especially when that
component is large. A solution is to use a median filter as a
detector of large deviations in global motion. Thus the motion
estimates are first filtered with a median filter (having at least
3 taps, and preferably 5 taps). This will reject large deviations
in the observed global motion. The difference between that median
filtered output and the original motion signature will be large at
the instances of large impulsive deviation, but small otherwise. By
thresholding this difference signal, it is possible to switch
between the IIR filter output and the median filter output. Thus
the desired component of motion can be estimated regardless of the
size and randomness of the global motion.
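The detector described above can be sketched as follows. This is an illustrative reading of the scheme, with an assumed threshold value: a 5-tap median filter rejects large impulsive deviations, and wherever the measured motion differs from the median output by more than the threshold, the median output is selected in place of the IIR output.

```python
def median5(x):
    """Centred 5-tap median filter with edge replication at the
    sequence boundaries."""
    out = []
    for n in range(len(x)):
        window = [x[min(max(n + k, 0), len(x) - 1)] for k in range(-2, 3)]
        out.append(sorted(window)[2])
    return out

def select_smooth_motion(measured, iir_smoothed, threshold=5.0):
    """Switch between the IIR and median outputs: a large difference
    between the measured motion and its median-filtered version flags
    an impulsive deviation, where the median output is more reliable.
    The threshold value is an illustrative assumption."""
    med = median5(measured)
    return [m if abs(raw - m) > threshold else s
            for raw, m, s in zip(measured, med, iir_smoothed)]
```

With this switch, a single-frame displacement (e.g. from a film-scanner malfunction) is rejected by the median branch, while ordinary shake is handled by the IIR branch.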
[0044] Finally, it is noted that when there are changes in the
average brightness of the image, the iterative refinement global
motion estimate process described above may not converge well. This
problem can occur during scene change effects like fades, or if
there is degradation of the image leading to brightness
fluctuations. This lack of convergence can occur because changes in
brightness can cause a fixed offset in z which in turn ensures that
the update motion u may not ever become zero. To alleviate this
problem it is preferable to normalise the projections to have the
same mean and variance before proceeding with the matching
step.
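The normalisation step can be sketched as follows, assuming the projections are held as one-dimensional arrays; rescaling both projections to zero mean and unit variance removes the fixed offset that brightness fluctuations would otherwise introduce into the matching.

```python
import numpy as np

def normalise_projection(p):
    """Rescale an integral projection to zero mean and unit variance
    so that global brightness changes (fades, flicker) do not bias
    the subsequent matching step."""
    p = np.asarray(p, dtype=float)
    std = p.std()
    return (p - p.mean()) / std if std > 0 else p - p.mean()
```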
1.4.2 Event Spotting
[0045] The ability to automatically spot an important event in a
video sequence is useful for surveillance and summarisation
applications. In sports for instance, a rapid zoom in could
indicate an important object is in view. In cricket, a zoom in
followed by a zoom out indicates a bowler's run-up and delivery
sequence [1]. In addition, large apparent translations could
indicate people entering or leaving a room. For this reason the
dominant motion estimation process described here can be used for
event spotting since it yields a feature that could be correlated
to important events in the video.
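As a hedged illustration of how the dominant motion estimates could drive event spotting, the sketch below flags frames whose translation magnitude or zoom change exceeds a threshold; the threshold values and function names are assumptions for illustration, not part of the described method.

```python
import math

def spot_events(translations, zooms, t_thresh=20.0, z_thresh=0.05):
    """translations: list of per-frame (dx, dy) dominant translations;
    zooms: per-frame zoom factors (1.0 = no zoom). Returns the indices
    of frames flagged as candidate events, e.g. a rapid zoom in or a
    large apparent translation."""
    events = []
    for n, ((dx, dy), z) in enumerate(zip(translations, zooms)):
        if math.hypot(dx, dy) > t_thresh or abs(z - 1.0) > z_thresh:
            events.append(n)
    return events
```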
1.5 Image Compensation and the GPU
[0046] To create the final images for output, each image must be
shifted to compensate for the unwanted motion component estimated
in previous sections. In order to accurately represent the global
motion of a frame, a sub-pixel accurate motion vector is typically
required. Interpolation of the image signal is required to motion
compensate a frame with a fractional motion vector. Typically
bilinear interpolation is sufficient. However this interpolation is
computationally very demanding and can be a bottleneck in a
real-time shake reduction scheme.
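For reference, the interpolation in question can be sketched on the CPU as follows: a frame is shifted by a fractional motion vector using standard bilinear interpolation, with out-of-frame samples clamped to the border (the border handling is an assumption for illustration). This is the operation that, as described below, a GPU texture unit performs in hardware.

```python
import numpy as np

def compensate(frame, dx, dy):
    """Shift a 2-D image by the sub-pixel vector (dx, dy) using
    bilinear interpolation; source samples falling outside the frame
    are clamped to the nearest border pixel."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Each output pixel samples the source displaced by the motion.
    sx, sy = xs - dx, ys - dy
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 1)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    fx = np.clip(sx - x0, 0.0, 1.0)
    fy = np.clip(sy - y0, 0.0, 1.0)
    # Blend horizontally on the two bracketing rows, then vertically.
    top = frame[y0, x0] * (1 - fx) + frame[y0, x1] * fx
    bot = frame[y1, x0] * (1 - fx) + frame[y1, x1] * fx
    return top * (1 - fy) + bot * fy
```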
[0047] Modern graphics hardware contains very efficient
interpolation units which are used in the texture mapping stage of
the graphics pipeline. The graphics hardware can compensate each
frame with bilinear interpolation accuracy. This can be done much
faster than real-time with the motion compensated sequence
displayed on screen. Each motion compensated frame can also be
retrieved from the graphics hardware and saved to file if
necessary. Because the graphics hardware can work in parallel with
the CPU, using it for motion compensation also frees up valuable
CPU cycles for other processes. We do not present here the details
of the GPU code needed to achieve this. This code will change with
generations of GPUs. The point to be made here is that it is one
embodiment of this invention that the interpolation unit of the GPU
can be used as part of the pipeline for dominant motion estimation
and subsequent video stabilisation as required. GPUs produced by
NVIDIA.TM. and ATI.TM. are good vehicles for this implementation.
The Sony Playstation.TM. is also suitable.
[0048] In addition, dedicated hardware can be built to perform
these functions including a combination of FPGA and DSP blocks.
REFERENCES
[0049] [1] A. Kokaram and P. Delacourt. A new global estimation
algorithm and its application to retrieval in sport events. In IEEE
International Workshop on Multimedia Signal Processing, MMSP '01,
pages 3-5, October 2001. [0050] [2] J. Biemond, L. Looijenga, and
D. E. Boekee. A pel-recursive Wiener-based displacement estimation
algorithm. Signal Processing, 1987. [0051] [3] P. Bouthemy, M.
Gelgon, and F. Ganansia. A unified approach to shot change
detection and camera motion characterization. IEEE Transactions on
Circuits and Systems for Video Technology, 9:1030-1044, 1999.
[0052] [4] F. Dufaux and J. Konrad. Efficient, robust and fast
global motion estimation for video coding. IEEE Transactions on
Image Processing, 9:497-501, 2000. [0053] [5] J.-S. Kim and R.-H.
Park. A fast feature-based block matching algorithm using integral
projections. IEEE J. Selected Areas in Communications,
10(5):968-971, June 1992.
[0054] [6] A. Kokaram, R. Dahyot, F. Pitie, and H. Denman.
Simultaneous luminance and position stabilization for film and
video. In Visual Communications and Image Processing, San Jose,
Calif. USA, January 2003. [0055] [7] J. H. Lee and J. B. Ra. Block
motion estimation based on selective integral projections. In IEEE
ICIP, volume I, pages 689-693, 2002. [0056] [8] P. Milanfar. A
model of the effect of image motion in the radon transform domain.
IEEE Trans. on Image Processing, 8(9):1276-1281, 1999. [0057] [9]
J-M. Odobez and P. Bouthemy. Robust multiresolution estimation of
parametric motion models. Journal of Visual Communication and Image
Representation, 6:348-365, 1995. [0058] [10] Dirk Robinson and
Peyman Milanfar. Fast local and global projection-based methods for
affine motion estimation. Journal of Mathematical Imaging and
Vision, 18:35-54, 2003. [0059] [11] K. Sauer and B. Schwartz.
Efficient block motion estimation using integral projections. IEEE
Trans. Circuits and Systems for Video Technology, 6(5):513-518,
October 1996. [0060] [12] P. H. S. Torr. Geometric motion
segmentation and model selection. Philosophical Transactions of the
Royal Society A, pages 1321-1340, 1998.
* * * * *