U.S. patent application number 13/411,255 was published by the patent office on 2013-05-16 as publication number 20130121616 for a method and apparatus for estimating rotation, focal lengths and radial distortion in panoramic image stitching.
The applicant listed for this patent is Hailin Jin. The invention is credited to Hailin Jin.
Application Number | 13/411,255
Publication Number | 20130121616 (Kind Code A1)
Family ID | 45757969
Publication Date | May 16, 2013
Inventor | Jin; Hailin
METHOD AND APPARATUS FOR ESTIMATING ROTATION, FOCAL LENGTHS AND
RADIAL DISTORTION IN PANORAMIC IMAGE STITCHING
Abstract
Method and apparatus for estimating relative three-dimensional
(3D) camera rotations, focal lengths, and radial (lens) distortions
from point-correspondences in pairwise (two image) image alignment.
A core estimator takes a minimal (three) number of
point-correspondences and returns a rotation, lens (radial)
distortion and two focal lengths. The core estimator solves for
relative 3D camera rotation, focal lengths, and lens distortion from
three point-correspondences in two images in the presence of noise in
point-correspondences. A robust estimator may be based on or may be
"wrapped around" the core estimator to handle noise and errors in
point-correspondences. The robust estimator may determine an
alignment model for a pair of images from the rotation, distortion,
and focal lengths.
Inventors: | Jin; Hailin (Campbell, CA)
Applicant: | Jin; Hailin, Campbell, CA, US
Family ID: | 45757969
Appl. No.: | 13/411,255
Filed: | March 2, 2012
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
12/035,954 | Feb 22, 2008 | 8,131,113
13/411,255 | |
60/991,108 | Nov 29, 2007 |
Current U.S. Class: | 382/284
Current CPC Class: | G06T 2207/30244 20130101; G06T 7/74 20170101; G06T 7/337 20170101
Class at Publication: | 382/284
International Class: | G06K 9/36 20060101 G06K009/36
Claims
1-22. (canceled)
23. A method, comprising: performing, by one or more computing
devices: generating, for a pair of images, a plurality of estimates
of relative rotation, focal lengths, and radial distortion
according to a plurality of sets of three point-correspondences for
the pair of images, where each point-correspondence represents a
different feature point that occurs in both of the images;
generating a plurality of alignment models for the pair of images
according to the plurality of estimates of relative rotation, focal
lengths, and radial distortion; and determining which of the
plurality of alignment models are to be used for processing the
pair of images.
24. The method as recited in claim 23, wherein each alignment model
is a mathematical model that defines a geometric relationship
between the two images.
25. The method as recited in claim 23, wherein said determining
includes determining a best alignment model for the pair of images
by verifying each of the plurality of alignment models against a
plurality of point-correspondences for the pair of images.
26. The method as recited in claim 23, wherein generating estimates
of relative rotation, focal lengths, and radial distortion
according to a set of three point-correspondences comprises:
composing a set of parametric equations from the three
point-correspondences; solving the set of parametric equations to
generate the estimates of focal lengths and radial distortion; and
computing the estimate of relative rotation from the three
point-correspondences and the estimates of focal lengths and radial
distortion.
27. The method as recited in claim 23, further comprising stitching
the pair of images in accordance with the determined alignment
model to form a composite image from the pair of images.
28. The method as recited in claim 23, further comprising:
performing said generating a plurality of estimates of relative
rotation, focal lengths, and radial distortion, said generating a
plurality of alignment models, and said determining for at least
two pairs of images in a plurality of component images; and
generating a panoramic image from the plurality of component images
according to the determined alignment models.
29. The method as recited in claim 23, wherein said generating a
plurality of estimates of relative rotation, focal lengths, and
radial distortion is performed by a core estimator implemented with
a robust estimator, wherein the robust estimator performs said
generating a plurality of alignment models and said determining
which of the plurality of alignment models are to be used for
processing the pair of images.
30. The method as recited in claim 23, wherein the pair of images
are overlapping images from a plurality of component images taken
of a panoramic scene.
31. A system, comprising: at least one processor; and a memory
comprising program instructions, wherein the program instructions
are executable by the at least one processor to: generate, for a
pair of images, a plurality of estimates of relative rotation,
focal lengths, and radial distortion according to a plurality of
sets of three point-correspondences for the pair of images, where
each point-correspondence represents a different feature point that
occurs in both of the images; generate a plurality of alignment
models for the pair of images according to the plurality of
estimates of relative rotation, focal lengths, and radial
distortion; and determine which of the plurality of alignment
models are to be used to form a composite image from the pair of
images.
32. The system as recited in claim 31, wherein each alignment model
is a mathematical model that defines a geometric relationship
between the two images.
33. The system as recited in claim 31, wherein the determination
includes a determination of a best alignment model for the pair of
images, and the program instructions are executable by the at least one
processor to verify each of the plurality of alignment models
against a plurality of point-correspondences for the pair of
images.
34. The system as recited in claim 31, wherein, to generate
estimates of relative rotation, focal lengths, and radial
distortion according to a set of three point-correspondences, the
program instructions are executable by the at least one processor
to: compose a set of parametric equations from the three
point-correspondences; solve the set of parametric equations to
generate the estimates of focal lengths and radial distortion; and
compute the estimate of relative rotation from the three
point-correspondences and the estimates of focal lengths and radial
distortion.
35. The system as recited in claim 31, wherein the program
instructions are executable by the at least one processor to stitch
the pair of images in accordance with the determined alignment
model to form the composite image from the pair of images.
36. The system as recited in claim 31, wherein the program
instructions are executable by the at least one processor to:
perform said generating a plurality of estimates of relative
rotation, focal lengths, and radial distortion, said generating a
plurality of alignment models, and said determining a best
alignment model for at least two different pairs of images in a
plurality of component images; and generate a panoramic image from
the plurality of component images according to the determined
alignment models.
37. A computer-readable memory medium storing program instructions,
wherein the program instructions are computer-executable to
implement: generating, for a pair of images, a plurality of
estimates of relative rotation, focal lengths, and radial
distortion according to a plurality of sets of three
point-correspondences for the pair of images, where each
point-correspondence represents a different feature point that
occurs in both of the images; generating a plurality of alignment
models for the pair of images according to the plurality of
estimates of relative rotation, focal lengths, and radial
distortion; and determining, from the plurality of alignment models
for the pair of images, a best alignment model for the pair of
images.
38. The computer-readable memory medium as recited in claim 37,
wherein each alignment model is a mathematical model that defines a
geometric relationship between the two images.
39. The computer-readable memory medium as recited in claim 37,
wherein, in said determining a best alignment model for the pair of
images, the program instructions are computer-executable to
implement verifying each of the plurality of alignment models
against a plurality of point-correspondences for the pair of
images.
40. The computer-readable memory medium as recited in claim 37,
wherein, in generating estimates of relative rotation, focal
lengths, and radial distortion according to a set of three
point-correspondences, the program instructions are
computer-executable to implement: composing a set of parametric
equations from the three point-correspondences; solving the set of
parametric equations to generate the estimates of focal lengths and
radial distortion; and computing the estimate of relative rotation
from the three point-correspondences and the estimates of focal
lengths and radial distortion.
41. The computer-readable memory medium as recited in claim 37,
wherein the program instructions are computer-executable to
implement stitching the pair of images in accordance with the best
alignment model to form a composite image from the pair of
images.
42. The computer-readable memory medium as recited in claim 37,
wherein the program instructions are computer-executable to
implement: performing said generating a plurality of estimates of
relative rotation, focal lengths, and radial distortion, said
generating a plurality of alignment models, and said determining a
best alignment model for at least two different pairs of images in
a plurality of component images; and generating a panoramic image
from the plurality of component images according to the determined
best alignment models.
Description
PRIORITY INFORMATION
[0001] This application is a Continuation of U.S. application Ser.
No. 12/035,954, filed Feb. 22, 2008, which claims benefit of
priority of U.S. Provisional Application Ser. No. 60/991,108, filed
Nov. 29, 2007, both entitled "Method and Apparatus for Estimating
Rotation, Focal Lengths and Radial Distortion in Panoramic Image
Stitching", the contents of which are incorporated by reference
herein in their entirety.
BACKGROUND
[0002] 1. Field of the Invention
[0003] This invention relates to computer systems, specifically to
computer-aided image processing, and more specifically to the
merging of images to form a composite image.
[0004] 2. Description of the Related Art
[0005] Image capture devices, such as cameras, may be used to
capture an image of a section of a view or scene, such as a section
of the front of a house. The section of the view or scene whose
image is captured by a camera is known as the field of view of the
camera. Adjusting a lens associated with a camera may increase the
field of view. However, there is a limit beyond which the field of
view of the camera cannot be increased without compromising the
quality, or "resolution", of the captured image. Further, some
scenes or views may be too large to capture as one image with a
given camera at any setting. Thus, it is sometimes necessary to
capture an image of a view that is larger than can be captured
within the field of view of a camera. In these instances, multiple
overlapping images of segments of the view or scene may be taken,
and then these component images may be joined together, or merged,
to form a composite image.
[0006] One type of composite image is known as a panoramic image. A
panoramic image may have a rightmost and leftmost image that each
overlap only one other image, or alternatively the images may
complete 360°, where all images overlap at least two other
images. In the simplest type of panoramic image, there is one row
of images, with each image at most overlapping two other images.
However, more complex composite images may be captured that have
two or more rows of images; in these composite images, each image
may potentially overlap more than two other images. For example, a
motorized camera may be configured to scan a scene according to an
M×N grid, capturing an image at each position in the grid.
Other geometries of composite images may be captured.
[0007] Computer programs and algorithms exist for assembling a
single composite image from multiple potentially overlapping
component images. A general paradigm for automatic image stitching
techniques is to first detect features in individual images;
second, to establish feature correspondences and geometric
relationships between pairs of images (pairwise stage); and third,
to use the feature correspondences and geometric relationships
between pairs of images found at the pairwise stage to infer the
geometric relationship among all the images (multi-image
stage).
[0008] Image stitching is thus a technique to combine and create
images with large fields of view. Feature-based stitching
techniques are image stitching techniques that use
point-correspondences, instead of image pixels directly, to
estimate the geometric transformations between images. An
alternative is intensity-based stitching techniques that use image
pixels to infer the geometric transformations. Many image stitching
implementations make assumptions that images are related either by
2D projective transformations or 3D rotations. However, there are
other types of deformations in images that are not captured by the
aforementioned two, for instance, lens distortions.
[0009] Panoramic image alignment is the problem of computing
geometric relationships among a set of component images for the
purpose of stitching the component images into a composite image.
Feature-based techniques have been shown to be capable of handling
large scene motions without initialization. Most feature-based
methods are typically done in two stages: pairwise alignment and
multi-image alignment. The pairwise stage starts from feature
(point) correspondences, which are obtained through a separate
feature extraction and feature matching process or stage, and
returns an estimate of the alignment parameters and a set of
point-correspondences that are consistent with the parameters.
Various robust estimators or hypothesis testing frameworks may be
used to handle outliers in point-correspondences.
[0010] The multi-image stage may use various techniques to further
refine the alignment parameters, jointly over all the images, based
on the consistent point-correspondences retained in the pairwise
stage. It is known that the convergence of the multi-image stage
depends on how good the initial guesses are. However, an equally
important fact that is often overlooked is that the quality of the
final result from the multi-image stage depends on the number of
consistent point-correspondences retained in the pairwise stage.
When the number of consistent point-correspondences is low, the
multi-image alignment will still succeed, but the quality of the
final result may be poor.
[0011] In the pairwise stage, it is commonly assumed that an
imaging system satisfies an ideal pinhole model. As a result, many
conventional methods only estimate either 3×3 homographies or
"rotation+focal lengths". However, real imaging systems have some
amount of lens distortion. Moreover, wide-angle lenses that are
commonly used for shooting panoramic images may introduce larger
distortions than regular lenses. Modeling lens distortion is
critical for obtaining high-quality alignment. It may appear that
it is sufficient to model lens distortion at the multi-image
alignment stage. This strategy may work if all the most correct
correspondences are kept at the pairwise alignment. However,
without modeling lens distortion at the pairwise stage, it may not
be possible to retain all of the most correct correspondences.
Among those most correct correspondences that may be rejected by
the model without lens distortion, many may be ones close to image
borders, because lens distortion effects are more pronounced for
the points close to image borders than those close to image
centers. Correspondences that have points close to image borders
are, on the other hand, more important for estimating lens
distortion, for the same reason that lens distortion effects are
larger there. Losing them at the pairwise stage makes it difficult
for the multi-image stage to correctly estimate lens distortion. As
a result, misalignment may show up when images are stitched
together, particularly along the image borders. Therefore, it is
important to estimate the lens distortion jointly with other
alignment parameters at the pairwise stage.
RANSAC
[0012] RANSAC is an exemplary robust estimator or hypothesis
testing framework. RANSAC is an abbreviation for "RANdom SAmple
Consensus". RANSAC provides a hypothesis testing framework that may
be used, for example, to estimate parameters of a mathematical
model from a set of observed data which contains outliers.
EXIF
[0013] EXIF stands for Exchangeable Image File Format, and is a
standard for storing interchange information in image files,
especially those using Joint Photographic Experts Group (JPEG)
compression. Most digital cameras now use the EXIF format. The
format is part of the Design rule for Camera File system (DCF)
standard created by Japan Electronics and Information Technology
Industries Association (JEITA) to encourage interoperability
between imaging devices.
SUMMARY
[0014] Various embodiments of a method and apparatus for estimating
relative three-dimensional (3D) camera rotations, focal lengths,
and radial (lens) distortions from point-correspondences in
pairwise (two image) image alignment are described. Embodiments may
provide a core estimator that takes a minimal (three) number of
point-correspondences and returns a rotation, lens (radial)
distortion and two focal lengths. In embodiments, a robust
estimator may be based on or may be "wrapped around" the core
estimator to handle noise and errors in the point-correspondences.
Embodiments may be implemented in composite image generation
systems used to generate composite images from sets of input
component images.
[0015] Embodiments may provide a three-point minimal solution for
panoramic stitching with lens distortion. Embodiments may be
directed at panoramic image alignment, which is the problem of
computing geometric relationships among images for the purpose of
stitching the images into composites. In particular, embodiments
may be directed at feature-based techniques. Embodiments may
provide a minimal solution (a core estimator) for aligning two
images taken by a rotating camera from point-correspondences.
Embodiments in particular address the case where there is lens
distortion in the images. The two camera centers may be assumed to
be known, but not the focal lengths, and the focal lengths may be
allowed to vary. Embodiments may provide a core estimator that uses
a minimal number (three) of point-correspondences, and that is well
suited for use in a hypothesis testing framework (i.e., a robust
estimator). The three-point minimal solution provided by
embodiments of the core estimator does not suffer from numerical
instabilities observed in conventional algebraic minimal
solvers.
[0016] Embodiments of a core estimator may estimate rotation, focal
lengths, and radial distortion using three point-correspondences,
which is minimal, and do so at the pairwise stage of a composite
image generation process. Thus, embodiments of the core estimator
may provide a three-point minimal solution. Some embodiments of the
core estimator may work with more than three point-correspondences.
An embodiment of the core estimator is described that is not based
on Algebraic Geometry but is instead based on nonlinear
optimization, and thus does not suffer from numerical instabilities
observed in many conventional minimal solvers. In addition,
embodiments of the core estimator address lens distortion in the
panoramic image alignment problem.
[0017] In one embodiment, for each pair of overlapping images in a
set of component images, a plurality of point-correspondences may
be generated, for example by a feature extraction and feature
matching stage of a composite image generation process. Feature
extraction extracts features from the pair of images, and feature
matching generates the actual point-correspondences from the
extracted features. For each pair of overlapping images in the set
of component images, relative rotation, focal lengths, and radial
distortion for the pair of images may be estimated by the core
estimator from sets of three point-correspondences for the two
images. In one embodiment, a robust estimator or hypothesis testing
framework may select sets of three point-correspondences and feed
the sets to the core estimator. For each set of three
point-correspondences for the pair of overlapping images, an
alignment model for the pair of images may be generated by the
robust estimator from the corresponding relative rotation, focal
lengths, and radial distortion as estimated and output by the core
estimator. An alignment model is a mathematical model that defines
the geometric relationship between two images and that may be
applied to the image data to adjust one or both images into
alignment as part of the process of merging component images into a
composite or panoramic image. In embodiments, an alignment model is
a combination of relative rotation, focal lengths, and radial
distortion.
[0018] An embodiment may use the robust estimator to generate sets
of alignment models for each pair of overlapping images, with the
robust estimator using the core estimator to estimate relative
rotation, focal lengths, and radial distortion (an alignment model)
for each set of three point-correspondences input to the core
estimator. The robust estimator may determine a best alignment
model (best combination of relative rotation, focal lengths, and
radial distortion) for each pair of overlapping images from the
generated alignment models for the overlapping pairs of images and
output the determined best alignment models for all pairs of
overlapping images to a multi-image processing stage of the
composite image generation process. A composite image may then be
generated from the set of component images in accordance with the
determined best alignment models for the set of component
images.
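The pairwise flow described above can be sketched as a robust-estimator loop wrapped around a core estimator. The sketch below is illustrative only: the names (`toy_core_estimator`, `robust_estimate`) are hypothetical, and the minimal model is a single 2D rotation angle rather than the patent's rotation, focal lengths, and distortion; only the wrapper structure (sample a minimal set of point-correspondences, hypothesize a model, verify against all correspondences, keep the best) mirrors the described process.

```python
import math
import random

def toy_core_estimator(sample):
    # Stand-in for the core estimator: from a minimal set of
    # point-correspondences, hypothesize model parameters. Here the
    # "alignment model" is a single 2D rotation angle; the patent's
    # estimator instead returns a 3D rotation, two focal lengths, and
    # a radial distortion coefficient.
    angles = [math.atan2(q[1], q[0]) - math.atan2(p[1], p[0])
              for p, q in sample]
    return sum(angles) / len(angles)

def residual(theta, corr):
    # Reprojection error of one correspondence under the hypothesis.
    (px, py), (qx, qy) = corr
    rx = px * math.cos(theta) - py * math.sin(theta)
    ry = px * math.sin(theta) + py * math.cos(theta)
    return math.hypot(qx - rx, qy - ry)

def robust_estimate(corrs, trials=200, tol=0.05, sample_size=3):
    # RANSAC-style wrapper: hypothesize from minimal samples, verify
    # each hypothesis against all correspondences, keep the best.
    rng = random.Random(0)
    best_model, best_inliers = None, []
    for _ in range(trials):
        sample = rng.sample(corrs, sample_size)
        model = toy_core_estimator(sample)
        inliers = [c for c in corrs if residual(model, c) < tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

The returned best model and its consistent inlier set correspond to the "determined best alignment model" and retained point-correspondences that are passed on to the multi-image stage.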
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] FIG. 1 illustrates an exemplary composite image generation
system that includes an exemplary core estimator according to one
embodiment.
[0020] FIG. 2 illustrates an exemplary robust estimator and an
exemplary core estimator in a pairwise stage of a composite image
generation system according to one embodiment.
[0021] FIG. 3 is a flowchart of a method for estimating rotation,
focal lengths, and lens distortion in panoramic image stitching
according to one embodiment.
[0022] FIG. 4 is a flowchart of a method for composite image
generation that uses a core estimator as described herein,
according to one embodiment.
[0023] FIG. 5 is a plot that illustrates the convergence rate of an
embodiment of the core estimator against the distortion coefficient
on random geometry.
[0024] FIGS. 6A and 6B illustrate performance comparisons of
embodiments of the core estimator and a conventional three-point
algorithm.
[0025] FIGS. 7A through 7C illustrate a comparison on real images
without lens distortion estimation and with radial distortion
estimation according to one embodiment.
[0026] FIGS. 8A and 8B illustrate the application of multi-image
bundle adjustment according to one embodiment.
[0027] FIGS. 9A through 9C illustrate real-image examples of
multi-image stitching with lens distortion accounted for using a
core estimator according to one embodiment.
[0028] FIG. 10 illustrates an exemplary computer system that may be
used in embodiments.
[0029] While the invention is described herein by way of example
for several embodiments and illustrative drawings, those skilled in
the art will recognize that the invention is not limited to the
embodiments or drawings described. It should be understood that
the drawings and detailed description thereto are not intended to
limit the invention to the particular form disclosed, but on the
contrary, the intention is to cover all modifications, equivalents
and alternatives falling within the spirit and scope of the present
invention. The headings used herein are for organizational purposes
only and are not meant to be used to limit the scope of the
description. As used throughout this application, the word "may" is
used in a permissive sense (i.e., meaning having the potential to),
rather than the mandatory sense (i.e., meaning must). Similarly,
the words "include", "including", and "includes" mean including,
but not limited to.
DETAILED DESCRIPTION OF EMBODIMENTS
[0030] Various embodiments of a method and apparatus for estimating
relative three-dimensional (3D) camera rotations, focal lengths,
and radial (lens) distortions from point-correspondences in
pairwise (two image) image alignment are described. Embodiments may
provide a core estimator that takes a minimal (three) number of
point-correspondences and returns a rotation, lens (radial)
distortion and two focal lengths (one for each image). Embodiments
may include a robust estimator or hypothesis testing framework that
may be based on or may be "wrapped around" the core estimator to
handle noise and errors in point-correspondences. Embodiments may be
implemented in composite image generation systems used to generate
composite images from sets of input component images.
[0031] Using three points may significantly reduce the number of
trials needed by a robust estimator such as the RANSAC (RANdom
SAmple Consensus) algorithm. Embodiments may estimate camera
rotations, focal lengths and lens distortion directly and therefore
may avoid problems that may occur in two-step or other conventional
approaches. Embodiments may handle errors in point-correspondences.
Results of the core estimator and robust estimator may be fed into
any of various algorithms for multi-image stitching. While RANSAC
is used herein as an example of a robust estimator or hypothesis
testing framework that may be used in embodiments, other robust
estimators or hypothesis testing frameworks may be used.
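The claim that three points reduce the number of trials can be made concrete with the standard RANSAC sample-count formula (textbook RANSAC, not taken from this application): to obtain at least one outlier-free sample with confidence p, given inlier ratio w and sample size s, roughly log(1-p)/log(1-w^s) trials are needed.

```python
import math

def ransac_trials(inlier_ratio, sample_size, confidence=0.99):
    # Number of random samples needed so that, with probability
    # `confidence`, at least one sample contains only inliers.
    return math.ceil(math.log(1.0 - confidence)
                     / math.log(1.0 - inlier_ratio ** sample_size))

# With 50% inliers, a 3-point minimal solver needs roughly half the
# trials of a 4-point one:
print(ransac_trials(0.5, 3))  # 35
print(ransac_trials(0.5, 4))  # 72
```

The gap widens quickly as the inlier ratio drops, which is why a minimal (three-point) sampler matters in practice.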
[0032] Embodiments provide a core estimator that includes
correction for lens distortion and that may use a minimum number
(three) of point-correspondences. Embodiments may provide a core
estimator for simultaneous estimation of a single radial distortion
coefficient, a rotation and two focal lengths. Embodiments of the
core estimator may use only three point-correspondences and are
suited for use in hypothesis testing frameworks such as RANSAC.
Although it is possible to solve the resulting polynomial equations
using a Gröbner basis or a numerical solver such as HOMPACK, and
embodiments are described that do so, embodiments that solve the
problem using a nonlinear optimization core estimator are also
described. Advantages of using nonlinear optimization to solve the
resulting polynomial equations include being able to make use of
prior knowledge and being free from numerical instability issues.
The cost optimized by embodiments is a geometric one instead of an
algebraic one. Although embodiments of the core estimator may be
more expensive than some other solvers such as a conventional
three-point algorithm, embodiments of the core estimator may be
much faster when the entire pairwise image processing stage
including the operations of a hypothesis testing framework (e.g.,
RANSAC) process is considered for images with lens distortion
because embodiments of the core estimator may be able to find more
of the best or most correct point-correspondences for a pair of
images in fewer trials.
[0033] Three-Point Minimal Solution for Panoramic Stitching with
Lens Distortion
[0034] Embodiments may provide a three-point minimal solution for
panoramic stitching with lens distortion. Embodiments may be
directed at panoramic image alignment, which is the problem of
computing geometric relationships among images for the purpose of
stitching them into composites. In particular, embodiments may be
directed at feature-based techniques. Embodiments may provide a
minimal solution (a core estimator) for aligning two images taken
by a rotating camera from point-correspondences. Embodiments in
particular may address the case where there is lens distortion in
the images. The two camera centers may be assumed to be known, but
not the focal lengths, and the focal lengths may be allowed to
vary. Embodiments may provide a core estimator that uses a minimal
number (three) of point-correspondences, and that is well suited
for use in a hypothesis testing framework (i.e., a robust
estimator). The three-point minimal solution provided by
embodiments of the core estimator may not suffer from numerical
instabilities observed in conventional algebraic minimal solvers
and also may be more efficient when compared to conventional
methods. The three-point minimal solution provided by embodiments
of the core estimator may be applied in multi-image panoramic
stitching on real images with lens distortion, as illustrated in
several examples presented in FIGS. 7A through 7C, FIGS. 8A and 8B,
and FIGS. 9A through 9C, which are further described below.
[0035] Embodiments of a core estimator as described herein may
estimate rotation, focal lengths, and radial distortion using three
point-correspondences, which is minimal, and do so at the pairwise
stage. Thus, embodiments of the core estimator may provide a
three-point minimal solution. Some embodiments of the core
estimator may work with more than three point-correspondences.
[0036] Some embodiments may use the division model for radial
distortion, or variations thereof, for panoramic image alignment,
although other models or algorithms may be used in embodiments. An
embodiment of the core estimator is presented that is not based on
Algebraic Geometry but is instead based on nonlinear optimization,
and thus does not suffer from numerical instabilities observed in
many conventional minimal solvers. In addition, embodiments of the
core estimator address lens distortion in the panoramic image
alignment problem.
The Core Two-View Problem
[0037] In this section, a core problem in the pairwise alignment
stage--how to relate lens distortion to point-correspondences along
with other geometric parameters--is addressed. Two cameras are
considered with coincident optical centers viewing three points
$P_1$, $P_2$ and $P_3$. Let $X_1 \in \mathbb{R}^3$ be the
coordinates of $P_1$ with respect to the reference frame of the
first camera. The imaging process is modeled as an ideal pinhole
projection plus radial distortion. In particular, the pinhole model
says that the projection of $P_1$ on the imaging plane of the
first camera, $q_1 \in \mathbb{R}^2$, is related to $X_1$ by a
perspective projection:

$$q_1 = \pi(X_1) = \left[ \frac{X_{11}}{X_{13}},\ \frac{X_{12}}{X_{13}} \right]^T \qquad (1)$$
where $X_1 = [X_{11}, X_{12}, X_{13}]^T$. The radial distortion may
be modeled, for example, with the division model:

$$q_1 = \frac{p_1}{1 + \kappa_1 \| p_1 \|^2} \qquad (2)$$

where $p_1 \in \mathbb{R}^2$ is the radially distorted point and
$\kappa_1 \in \mathbb{R}$ is the radial distortion coefficient. Note,
however, that other distortion models may be used in embodiments.
The measurement X.sub.1.epsilon.R2, in image coordinates, is
related to p.sub.1 through a linear transformation K.sub.1
(intrinsic calibration):
x 1 = K 1 p 1 [ f 1 .sigma. 1 0 s 1 f 1 ] p 1 + c 1 ( 3 )
##EQU00003##
where f.sub.1 is the focal length, c.sub.1 is the camera center,
s.sub.1 is the aspect ratio, and .sigma..sub.1 is the skew of the
pixel. K.sub.1 is invertible and its inverse K.sub.1.sup.-1 is
given by:
p_1 = K_1^{-1} x_1 = \begin{bmatrix} f_1 & \sigma_1 \\ 0 & s_1 f_1 \end{bmatrix}^{-1} (x_1 - c_1)   (4)
[0038] Combining equations (1), (2), and (3), the following is
obtained:
X_1 \sim \begin{bmatrix} K_1^{-1} x_1 \\ 1 + \kappa_1 \|K_1^{-1} x_1\|^2 \end{bmatrix}   (5)
where .about. indicates similarity relationship, i.e. the
quantities are equal up to a scale. Let X.sub.2 be the coordinates
of P.sub.1 with respect to the reference frame of the second camera
and x.sub.2 be the radially distorted projection. The following is
obtained:
X_2 \sim \begin{bmatrix} K_2^{-1} x_2 \\ 1 + \kappa_2 \|K_2^{-1} x_2\|^2 \end{bmatrix}   (6)
where .kappa..sub.2 and K.sub.2 are the radial distortion
coefficient and the intrinsic calibration of the second camera
respectively. The two cameras are related by a rotation, R
.epsilon.SO(3); therefore, X.sub.1=RX.sub.2.
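The lifting in equations (5) and (6) can be made concrete with a short sketch. It assumes the simplified calibration introduced below (known center c, square pixels, no skew, so that K^{-1}x = (x - c)/f); the function name and interface are illustrative, not from the patent:

```python
import numpy as np

def back_project(x, f, c, kappa):
    """Lift a radially distorted image measurement x to a unit viewing
    direction via equation (5), under the simplified calibration (known
    center c, square pixels, no skew, so K^{-1} x = (x - c) / f)."""
    p = (np.asarray(x, dtype=float) - c) / f            # K^{-1} x, equation (4)
    X = np.array([p[0], p[1], 1.0 + kappa * p.dot(p)])  # equation (5), up to scale
    return X / np.linalg.norm(X)
```

With kappa = 0 this reduces to the ordinary pinhole ray, so the division model enters only through the third coordinate.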
[0039] Considering a second point P.sub.2 which has coordinates
Y.sub.1 and Y.sub.2 with respect to the two reference frames, a key
idea for eliminating the rotation is to notice that rotations
preserve angles between vectors:
\theta_{X_1 Y_1} = \theta_{X_2 Y_2}   (7)
where \theta_{X_1 Y_1} measures the angle between X.sub.1 and
Y.sub.1. Using equations (5) and (6), angles can be expressed using
distorted projections as:

\cos\theta_{X_1 Y_1} = \frac{\langle X_1, Y_1 \rangle}{\|X_1\|\,\|Y_1\|} = \frac{\left\langle \begin{bmatrix} K_1^{-1} x_1 \\ 1 + \kappa_1 \|K_1^{-1} x_1\|^2 \end{bmatrix}, \begin{bmatrix} K_1^{-1} y_1 \\ 1 + \kappa_1 \|K_1^{-1} y_1\|^2 \end{bmatrix} \right\rangle}{\left\| \begin{bmatrix} K_1^{-1} x_1 \\ 1 + \kappa_1 \|K_1^{-1} x_1\|^2 \end{bmatrix} \right\| \left\| \begin{bmatrix} K_1^{-1} y_1 \\ 1 + \kappa_1 \|K_1^{-1} y_1\|^2 \end{bmatrix} \right\|}

\cos\theta_{X_2 Y_2} = \frac{\langle X_2, Y_2 \rangle}{\|X_2\|\,\|Y_2\|} = \frac{\left\langle \begin{bmatrix} K_2^{-1} x_2 \\ 1 + \kappa_2 \|K_2^{-1} x_2\|^2 \end{bmatrix}, \begin{bmatrix} K_2^{-1} y_2 \\ 1 + \kappa_2 \|K_2^{-1} y_2\|^2 \end{bmatrix} \right\rangle}{\left\| \begin{bmatrix} K_2^{-1} x_2 \\ 1 + \kappa_2 \|K_2^{-1} x_2\|^2 \end{bmatrix} \right\| \left\| \begin{bmatrix} K_2^{-1} y_2 \\ 1 + \kappa_2 \|K_2^{-1} y_2\|^2 \end{bmatrix} \right\|}   (8)
where y.sub.1, y.sub.2.epsilon.R.sup.2 are the radially distorted
projections of P.sub.2 in the two respective cameras.
[0040] To further simplify the problem, the following assumptions may be made:
[0041] the two camera centers are known and coincide with the respective image centers;
[0042] there is no pixel skew and the pixel aspect ratio is 1, i.e. pixels are square; and
[0043] the focal lengths for the two cameras may vary but the radial distortion coefficients are the same.
[0044] While the assumptions of known camera centers and square
pixels are typical for image stitching algorithms, it may appear
that the assumption of varying focal lengths contradicts that of
constant distortion coefficients. Indeed, it is true that the
distortion coefficient changes when a lens zooms. However, when a
lens does not zoom or the zoom amount is small, the distortion
coefficient approximately stays constant, which is the most common
scenario for panoramic stitching: people do not typically zoom when
they shoot panoramas. Note that it should not be assumed that the
focal lengths stay the same because they may vary when the camera
focuses on objects with different depths even under the same zoom.
Under these assumptions, K_i^{-1} x_i reduces to \bar{x}_i / f_i,
where \bar{x}_i := x_i - c_i. Equation (8) may be rewritten as:

\frac{\frac{1}{f_1^2}\langle \bar{x}_1, \bar{y}_1 \rangle + \left(1 + \frac{\kappa}{f_1^2}\|\bar{x}_1\|^2\right)\left(1 + \frac{\kappa}{f_1^2}\|\bar{y}_1\|^2\right)}{\sqrt{\frac{1}{f_1^2}\|\bar{x}_1\|^2 + \left(1 + \frac{\kappa}{f_1^2}\|\bar{x}_1\|^2\right)^2}\,\sqrt{\frac{1}{f_1^2}\|\bar{y}_1\|^2 + \left(1 + \frac{\kappa}{f_1^2}\|\bar{y}_1\|^2\right)^2}} = \frac{\frac{1}{f_2^2}\langle \bar{x}_2, \bar{y}_2 \rangle + \left(1 + \frac{\kappa}{f_2^2}\|\bar{x}_2\|^2\right)\left(1 + \frac{\kappa}{f_2^2}\|\bar{y}_2\|^2\right)}{\sqrt{\frac{1}{f_2^2}\|\bar{x}_2\|^2 + \left(1 + \frac{\kappa}{f_2^2}\|\bar{x}_2\|^2\right)^2}\,\sqrt{\frac{1}{f_2^2}\|\bar{y}_2\|^2 + \left(1 + \frac{\kappa}{f_2^2}\|\bar{y}_2\|^2\right)^2}}   (9)
where .kappa.=.kappa..sub.1=.kappa..sub.2. An additional point
P.sub.3 yields two more equations:
\frac{\frac{1}{f_1^2}\langle \bar{y}_1, \bar{z}_1 \rangle + \left(1 + \frac{\kappa}{f_1^2}\|\bar{y}_1\|^2\right)\left(1 + \frac{\kappa}{f_1^2}\|\bar{z}_1\|^2\right)}{\sqrt{\frac{1}{f_1^2}\|\bar{y}_1\|^2 + \left(1 + \frac{\kappa}{f_1^2}\|\bar{y}_1\|^2\right)^2}\,\sqrt{\frac{1}{f_1^2}\|\bar{z}_1\|^2 + \left(1 + \frac{\kappa}{f_1^2}\|\bar{z}_1\|^2\right)^2}} = \frac{\frac{1}{f_2^2}\langle \bar{y}_2, \bar{z}_2 \rangle + \left(1 + \frac{\kappa}{f_2^2}\|\bar{y}_2\|^2\right)\left(1 + \frac{\kappa}{f_2^2}\|\bar{z}_2\|^2\right)}{\sqrt{\frac{1}{f_2^2}\|\bar{y}_2\|^2 + \left(1 + \frac{\kappa}{f_2^2}\|\bar{y}_2\|^2\right)^2}\,\sqrt{\frac{1}{f_2^2}\|\bar{z}_2\|^2 + \left(1 + \frac{\kappa}{f_2^2}\|\bar{z}_2\|^2\right)^2}}   (10)

\frac{\frac{1}{f_1^2}\langle \bar{z}_1, \bar{x}_1 \rangle + \left(1 + \frac{\kappa}{f_1^2}\|\bar{z}_1\|^2\right)\left(1 + \frac{\kappa}{f_1^2}\|\bar{x}_1\|^2\right)}{\sqrt{\frac{1}{f_1^2}\|\bar{z}_1\|^2 + \left(1 + \frac{\kappa}{f_1^2}\|\bar{z}_1\|^2\right)^2}\,\sqrt{\frac{1}{f_1^2}\|\bar{x}_1\|^2 + \left(1 + \frac{\kappa}{f_1^2}\|\bar{x}_1\|^2\right)^2}} = \frac{\frac{1}{f_2^2}\langle \bar{z}_2, \bar{x}_2 \rangle + \left(1 + \frac{\kappa}{f_2^2}\|\bar{z}_2\|^2\right)\left(1 + \frac{\kappa}{f_2^2}\|\bar{x}_2\|^2\right)}{\sqrt{\frac{1}{f_2^2}\|\bar{z}_2\|^2 + \left(1 + \frac{\kappa}{f_2^2}\|\bar{z}_2\|^2\right)^2}\,\sqrt{\frac{1}{f_2^2}\|\bar{x}_2\|^2 + \left(1 + \frac{\kappa}{f_2^2}\|\bar{x}_2\|^2\right)^2}}   (11)
where z.sub.1, z.sub.2.epsilon.R.sup.2 are the radially distorted
projections of P.sub.3 in the two respective cameras. There are three unknowns (f.sub.1, f.sub.2
and .kappa.) in equations (9-11). These three equations are
generally independent and sufficient to determine the unknowns. On
the other hand, it would not be possible to derive three equations
from fewer than three point-correspondences. Therefore, three is the
minimal number of point-correspondences.
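Equations (9-11) can be expressed compactly in code. The sketch below evaluates, for hypothesized values of f_1, f_2, and kappa, the difference between the corresponding angle-cosine expressions for the three point pairs; a root of this residual vector is a solution of (9-11). The function names are illustrative, not from the patent:

```python
import numpy as np

def cos_angle(a, b, f, kappa):
    """Cosine of the angle between the lifted viewing directions of two
    centered image measurements a, b (the bracketed vectors in (9))."""
    u = np.array([a[0] / f, a[1] / f, 1.0 + kappa * a.dot(a) / f**2])
    v = np.array([b[0] / f, b[1] / f, 1.0 + kappa * b.dot(b) / f**2])
    return u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))

def residuals(corrs, f1, f2, kappa):
    """Equations (9)-(11): for each pair of the three points, the angle
    measured in camera 1 must match the angle measured in camera 2.
    corrs = [(x1, x2), (y1, y2), (z1, z2)] of centered coordinates."""
    (x1, x2), (y1, y2), (z1, z2) = corrs
    return np.array([
        cos_angle(x1, y1, f1, kappa) - cos_angle(x2, y2, f2, kappa),
        cos_angle(y1, z1, f1, kappa) - cos_angle(y2, z2, f2, kappa),
        cos_angle(z1, x1, f1, kappa) - cos_angle(z2, x2, f2, kappa),
    ])
```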
Core Estimators
[0045] Several methods for solving equations (9-11) are described
that may be used in various embodiments of a core estimator. Some
embodiments of a core estimator may be based on a computer program
or application that solves numerical equations, such as HOMPACK,
which uses homotopy methods. An embodiment of a core estimator
based on HOMPACK is described. In addition, a core estimator based
on a Gröbner basis is described. An embodiment of a core estimator
based on nonlinear optimization is also described.
[0046] Equations (9-11) may be rewritten into a set of polynomial
equations by squaring both sides and re-arranging the terms. The
result is equations (12):

(\langle \bar{x}_1, \bar{y}_1 \rangle + F_1(1 + \lambda_1\|\bar{x}_1\|^2)(1 + \lambda_1\|\bar{y}_1\|^2))^2 (\|\bar{x}_2\|^2 + F_2(1 + \lambda_2\|\bar{x}_2\|^2)^2)(\|\bar{y}_2\|^2 + F_2(1 + \lambda_2\|\bar{y}_2\|^2)^2)
= (\langle \bar{x}_2, \bar{y}_2 \rangle + F_2(1 + \lambda_2\|\bar{x}_2\|^2)(1 + \lambda_2\|\bar{y}_2\|^2))^2 (\|\bar{x}_1\|^2 + F_1(1 + \lambda_1\|\bar{x}_1\|^2)^2)(\|\bar{y}_1\|^2 + F_1(1 + \lambda_1\|\bar{y}_1\|^2)^2)

(\langle \bar{y}_1, \bar{z}_1 \rangle + F_1(1 + \lambda_1\|\bar{y}_1\|^2)(1 + \lambda_1\|\bar{z}_1\|^2))^2 (\|\bar{y}_2\|^2 + F_2(1 + \lambda_2\|\bar{y}_2\|^2)^2)(\|\bar{z}_2\|^2 + F_2(1 + \lambda_2\|\bar{z}_2\|^2)^2)
= (\langle \bar{y}_2, \bar{z}_2 \rangle + F_2(1 + \lambda_2\|\bar{y}_2\|^2)(1 + \lambda_2\|\bar{z}_2\|^2))^2 (\|\bar{y}_1\|^2 + F_1(1 + \lambda_1\|\bar{y}_1\|^2)^2)(\|\bar{z}_1\|^2 + F_1(1 + \lambda_1\|\bar{z}_1\|^2)^2)

(\langle \bar{z}_1, \bar{x}_1 \rangle + F_1(1 + \lambda_1\|\bar{z}_1\|^2)(1 + \lambda_1\|\bar{x}_1\|^2))^2 (\|\bar{z}_2\|^2 + F_2(1 + \lambda_2\|\bar{z}_2\|^2)^2)(\|\bar{x}_2\|^2 + F_2(1 + \lambda_2\|\bar{x}_2\|^2)^2)
= (\langle \bar{z}_2, \bar{x}_2 \rangle + F_2(1 + \lambda_2\|\bar{z}_2\|^2)(1 + \lambda_2\|\bar{x}_2\|^2))^2 (\|\bar{z}_1\|^2 + F_1(1 + \lambda_1\|\bar{z}_1\|^2)^2)(\|\bar{x}_1\|^2 + F_1(1 + \lambda_1\|\bar{x}_1\|^2)^2)   (12)

where F_i := f_i^2 and \lambda_i := \kappa / f_i^2, i = 1, 2. F_i and
\lambda_i are related by:

\lambda_1 F_1 = \lambda_2 F_2   (13)
[0047] It can be verified that equations (12) and (13) are indeed
sufficient to determine all four unknowns, F.sub.1, F.sub.2,
.lamda..sub.1 and .lamda..sub.2. It is possible to further
constrain the problem by noticing the following relationship:
\frac{[X_1, Y_1, Z_1]}{\|X_1\|\,\|Y_1\|\,\|Z_1\|} = \frac{[X_2, Y_2, Z_2]}{\|X_2\|\,\|Y_2\|\,\|Z_2\|}   (14)

where [X, Y, Z] denotes the scalar triple product \langle X, Y \times Z \rangle,
for any vectors X,Y,Z .epsilon.R.sup.3. This triple product based
constraint is not algebraically independent but can be used to
remove extraneous solutions nevertheless. To be more precise, there
are 96 solutions, both real and complex, to equations (12) and
(13), out of which 54 satisfy (14).
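The triple-product constraint (14) lends itself to a simple numerical filter over candidate solutions. A minimal sketch with illustrative names; U and V hold the recovered viewing directions of the three points in the two cameras:

```python
import numpy as np

def normalized_triple(M):
    """[X, Y, Z] / (||X|| ||Y|| ||Z||) for the rows X, Y, Z of M, where
    [X, Y, Z] = <X, Y x Z> is the scalar triple product."""
    t = np.dot(M[0], np.cross(M[1], M[2]))
    return t / np.prod(np.linalg.norm(M, axis=1))

def satisfies_triple_constraint(U, V, tol=1e-6):
    """Equation (14): keep a candidate solution only if the normalized
    triple products of the viewing directions agree across cameras."""
    return abs(normalized_triple(U) - normalized_triple(V)) < tol
```

Rotations preserve the normalized triple product, so valid solutions pass; mirrored (reflected) configurations flip its sign and are rejected.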
[0048] Equations (12) may be solved jointly. From the solution, the
following may be computed:
f_1 = \sqrt{F_1}, \quad f_2 = \sqrt{F_2}
Numerical Solution-Based Core Estimators
[0049] The equations (12) may be solved numerically, for example
using a computer program or application that solves numerical
equations. An example of such a program that may be used is
HOMPACK. HOMPACK is a suite of subroutines for solving nonlinear
systems of equations using homotopy methods. Other methods for
solving the equations may be used. However, numerical solutions such
as those based on HOMPACK may suffer from numerical
instability.
Gröbner Basis Core Estimator
[0050] It is possible to construct a Gröbner basis from equations
(12) and (13) and solve for the unknowns. However, Gröbner
basis-based methods tend to suffer from considerable numerical
instabilities for problems of high degree when implemented
numerically.
[0051] Solving the equations numerically, for example using
HOMPACK, generally results in multiple solution triples. However,
not all the solutions are real, and real solutions are sought.
Moreover, solutions are sought that satisfy:
F.sub.1>0
F.sub.2>0
as only those solutions will lead to real f.sub.1 and f.sub.2. Once
f.sub.1, f.sub.2 and .kappa. are found, any one of several methods
or algorithms may be used to compute R. For example, any one of
several feature-based techniques may be used to compute R.
Heuristics to Remove Uninteresting Solutions
[0052] For solutions that are produced by solving the equations
numerically, since real-world focal lengths tend to be within a
range (for instance, it may be safely assumed that common focal
lengths are within the range of 5 mm to 1000 mm), in one
embodiment, solutions that have unrealistic focal lengths (focal
lengths outside a given range) may be removed. In one embodiment,
the lens distortion parameter may be assumed to be within a range,
for example -1 to 1, and solutions with lens distortion parameters
outside of the assumed range may be removed.
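These heuristics amount to a range check on each candidate solution; a minimal sketch, with the ranges from the text as illustrative defaults:

```python
def plausible(f1_mm, f2_mm, kappa,
              f_range=(5.0, 1000.0), k_range=(-1.0, 1.0)):
    """Range check from paragraph [0052]: discard solutions whose focal
    lengths (mm-equivalent) or distortion coefficient fall outside
    plausible ranges.  The defaults are the example ranges from the
    text, not fixed constants of the method."""
    return (f_range[0] <= f1_mm <= f_range[1]
            and f_range[0] <= f2_mm <= f_range[1]
            and k_range[0] <= kappa <= k_range[1])
```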
Pre-Normalization
[0053] To make the equations well behaved, in one embodiment, the
coordinates of points in the three point-correspondences may be
pre-normalized by estimates of the focal lengths. The estimates
may, for example, be obtained based on Exchangeable Image File
Format (EXIF) data in the images, from image data in some other
format than EXIF, or from image dimensions. For instance, according
to EXIF data, a rough estimate for the focal length may be
calculated. If EXIF (or other) data are not available, f may be
estimated to be half of the sum of image width and height, which
approximately corresponds to 30 mm focal length on full-frame
digital SLRs. Similar estimations may be applied for digital SLRs
of different form factors. The form factor of digital SLR cameras
may be defined as the relative physical size of an imaging sensor
with respect to that of a 35 mm film camera. For instance, for a
camera with form factor 1.6, the formula ((width+height)/2*1.6) may
be used to estimate an effective equivalent to 30 mm focal length
on a full-frame digital SLR. Assuming f.sub.0 is the
pre-normalization constant, u.sub.i<-u.sub.i/f.sub.0 may be
applied to pre-normalize. At the end of the calculation,
f.sub.i<-f.sub.i*f.sub.0 may be applied to recover the focal lengths.
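A sketch of this pre-normalization follows. The (width + height)/2 baseline and the form-factor scaling come from the text; the way an EXIF focal length is folded in here (scaling the baseline by the reported focal length relative to 30 mm) is an assumption of this sketch only:

```python
def focal_estimate(width, height, exif_focal_mm=None, form_factor=1.0):
    """Rough focal-length estimate in pixels for pre-normalization.
    (width + height) / 2, scaled by the sensor form factor, approximates
    a 30 mm-equivalent lens on a full-frame camera (paragraph [0053]).
    Scaling that baseline by the EXIF focal length relative to 30 mm is
    an illustrative assumption, not the patent's formula."""
    f0 = (width + height) / 2.0 * form_factor
    if exif_focal_mm is not None:
        f0 *= exif_focal_mm / 30.0
    return f0

def prenormalize(coords, f0):
    """u_i <- u_i / f0; after estimation, apply f_i <- f_i * f0."""
    return [u / f0 for u in coords]
```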
Core Estimator Based on Nonlinear Optimization
[0054] Embodiments of a core estimator based on nonlinear
optimization are described, which may be referred to herein as a
nonlinear optimization core estimator. In addition to suffering
from numerical instability issues, the previously described methods
such as those based on a Gröbner basis make no use of prior
knowledge in a given problem. For instance, in the absence of any
prior knowledge, it is still known that the two focal lengths are
real and positive and that the distortion coefficient is a small
real number around 0. In practice, known ranges for the focal
lengths and distortion coefficients can often be obtained, for
example from EXIF data in the images. A more efficient core
estimator may be obtained by taking advantage of the prior
knowledge.
[0055] The root-seeking problem is cast into an optimization
framework. In particular, the following objective function is
minimized:
(\theta_{X_1 Y_1} - \theta_{X_2 Y_2})^2 + (\theta_{Y_1 Z_1} - \theta_{Y_2 Z_2})^2 + (\theta_{Z_1 X_1} - \theta_{Z_2 X_2})^2   (15)
[0056] It is obvious that the roots to equations (12) and (13) are
the minima. Note that cost (15) is not an arbitrary algebraic
quantity, but is geometrically meaningful. In fact, it measures the
cumulative difference between corresponding angles. Since cost (15)
is in a form of nonlinear least squares, a method, for example a
Levenberg-Marquardt algorithm, with analytical derivatives may be
used to perform the optimization. Other methods may be used as
well. The initial values for the unknowns may be obtained as
follows. Prior knowledge for .kappa. may be used as the initial value
(.kappa..sup.0) since the distortion coefficient usually does not
vary significantly. In the absence of prior knowledge,
.kappa..sup.0=0 may be used. Equations (12) may then be solved,
assuming .kappa. is known, to obtain initial values for (f.sub.1,
f.sub.2). In particular, given .kappa.=.kappa..sup.0, equations
(12) may be reduced to:
(\langle \tilde{x}_1, \tilde{y}_1 \rangle + \tilde{F}_1)^2 (\|\tilde{x}_2\|^2 + \tilde{F}_2)(\|\tilde{y}_2\|^2 + \tilde{F}_2) = (\langle \tilde{x}_2, \tilde{y}_2 \rangle + \tilde{F}_2)^2 (\|\tilde{x}_1\|^2 + \tilde{F}_1)(\|\tilde{y}_1\|^2 + \tilde{F}_1)
(\langle \tilde{y}_1, \tilde{z}_1 \rangle + \tilde{F}_1)^2 (\|\tilde{y}_2\|^2 + \tilde{F}_2)(\|\tilde{z}_2\|^2 + \tilde{F}_2) = (\langle \tilde{y}_2, \tilde{z}_2 \rangle + \tilde{F}_2)^2 (\|\tilde{y}_1\|^2 + \tilde{F}_1)(\|\tilde{z}_1\|^2 + \tilde{F}_1)
(\langle \tilde{z}_1, \tilde{x}_1 \rangle + \tilde{F}_1)^2 (\|\tilde{z}_2\|^2 + \tilde{F}_2)(\|\tilde{x}_2\|^2 + \tilde{F}_2) = (\langle \tilde{z}_2, \tilde{x}_2 \rangle + \tilde{F}_2)^2 (\|\tilde{z}_1\|^2 + \tilde{F}_1)(\|\tilde{x}_1\|^2 + \tilde{F}_1)   (16)

where:

\tilde{x}_i = \frac{\bar{x}_i / f_i^p}{1 + \kappa^0 \|\bar{x}_i / f_i^p\|^2}, \quad \tilde{y}_i = \frac{\bar{y}_i / f_i^p}{1 + \kappa^0 \|\bar{y}_i / f_i^p\|^2}, \quad \tilde{z}_i = \frac{\bar{z}_i / f_i^p}{1 + \kappa^0 \|\bar{z}_i / f_i^p\|^2}   (17)
and f.sub.1.sup.p and f.sub.2.sup.p are given by the prior
knowledge (f.sub.1.sup.p=1 and f.sub.2.sup.p=1 may be used in the
absence of prior knowledge). {tilde over (F)}.sub.1 and {tilde over
(F)}.sub.2 may be solved using any one of several techniques.
Finally, the initial values for f.sub.1 and f.sub.2 may be given
by:
f_i^0 = f_i^p \sqrt{\tilde{F}_i}, \quad i = 1, 2.   (18)
[0057] Note that the Levenberg-Marquardt part is a fairly small
problem (three unknowns and three squared terms) and may be
implemented very efficiently.
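A hedged sketch of the nonlinear optimization core estimator follows: cost (15) is minimized over (f_1, f_2, kappa) with a small hand-rolled Levenberg-Marquardt loop and forward-difference Jacobians. Input coordinates are assumed centered and pre-normalized, and the initialization follows the text (kappa^0 = 0 and unit focal priors absent other knowledge). This is an illustrative implementation, not the patent's code:

```python
import numpy as np

def angle_resid(params, corrs):
    """Cost (15): the vector of differences of corresponding angles for
    the three point pairs, in nonlinear least-squares form."""
    f1, f2, kappa = params
    def ang(a, b, f):
        # lifted viewing directions, equation (5) under the simplified model
        u = np.array([a[0] / f, a[1] / f, 1.0 + kappa * a.dot(a) / f**2])
        v = np.array([b[0] / f, b[1] / f, 1.0 + kappa * b.dot(b) / f**2])
        c = u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.arccos(np.clip(c, -1.0, 1.0))
    (x1, x2), (y1, y2), (z1, z2) = corrs
    return np.array([ang(x1, y1, f1) - ang(x2, y2, f2),
                     ang(y1, z1, f1) - ang(y2, z2, f2),
                     ang(z1, x1, f1) - ang(z2, x2, f2)])

def core_estimate(corrs, f0=(1.0, 1.0), kappa0=0.0, iters=100):
    """Minimal Levenberg-Marquardt loop with numerical Jacobians; only
    cost-decreasing steps are accepted."""
    p = np.array([f0[0], f0[1], kappa0], dtype=float)
    lam = 1e-3
    for _ in range(iters):
        r = angle_resid(p, corrs)
        J = np.empty((3, 3))
        for j in range(3):               # forward-difference Jacobian
            dp = np.zeros(3)
            dp[j] = 1e-6 * max(1.0, abs(p[j]))
            J[:, j] = (angle_resid(p + dp, corrs) - r) / dp[j]
        step = np.linalg.solve(J.T @ J + lam * np.eye(3), -J.T @ r)
        if np.sum(angle_resid(p + step, corrs) ** 2) < np.sum(r ** 2):
            p = p + step
            lam *= 0.5
        else:
            lam *= 10.0
        if np.linalg.norm(step) < 1e-12:
            break
    return p  # [f1, f2, kappa]
```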
Solving for the Rotation
[0058] Once the focal lengths and the distortion coefficient are
known, the rotation may be computed. Using equation (5),
X_1 / \|X_1\|
may be computed as follows:

\frac{X_1}{\|X_1\|} = \frac{1}{\sqrt{\frac{1}{f_1^2}\|\bar{x}_1\|^2 + \left(1 + \frac{\kappa}{f_1^2}\|\bar{x}_1\|^2\right)^2}} \begin{bmatrix} \bar{x}_1 / f_1 \\ 1 + \frac{\kappa}{f_1^2}\|\bar{x}_1\|^2 \end{bmatrix}   (19)
Similarly,

[0059] X_2 / \|X_2\|, Y_1 / \|Y_1\|, Y_2 / \|Y_2\|, Z_1 / \|Z_1\|, and Z_2 / \|Z_2\|

may be computed. Any of various mechanisms may then be invoked to
obtain the rotation.

Robust Solutions and Bundle Adjustment

Core estimators such as the nonlinear optimization-based core
estimator presented above are not intended to be used directly on
point-correspondences because core estimators may be limited in the
number of points they can handle and do not handle outliers or
noise in point-correspondences. Embodiments of the core estimator
may thus be used in a hypothesis testing framework or robust
estimator, such as RANSAC, so that the robust estimator may handle
outliers and noise.
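Returning to the rotation step: once the unit viewing directions of the three points are known in both frames, one of the "various mechanisms" for obtaining R is the SVD-based orthogonal Procrustes (Kabsch) solve sketched below. This particular choice is an illustration, not necessarily the mechanism used in an embodiment:

```python
import numpy as np

def solve_rotation(U, V):
    """Recover R in SO(3) such that u_k = R v_k for matching rows of U
    and V (unit viewing directions in the two camera frames), via the
    SVD-based orthogonal Procrustes / Kabsch method."""
    H = V.T @ U                                   # 3x3 cross-covariance
    A, _, Bt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Bt.T @ A.T))
    return Bt.T @ np.diag([1.0, 1.0, d]) @ A.T    # reflection guard
```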
[0060] It may be necessary or desirable to further refine the
parameters obtained by the robust solutions for better results.
This step is known as bundle adjustment. In general, there are two
categories or types of bundle adjustments: pairwise and multi-image
bundle adjustments. Multi-image bundle adjustment is described
briefly. Pairwise bundle adjustment may be considered as a special
case. In some embodiments, a multi-image bundle adjustment
algorithm may be used to optimize the following geometric cost
function:
\sum_{i=1}^{M} \sum_{j=1}^{N} w_{ij} \|\hat{x}_{ij}(\theta_j, \phi_j; R_i, f_i, k_i \,|\, c_i) - x_{ij}\|^2   (20)
where M is the number of images and N is the number of chains of
consistent point-correspondences. Consistent means that all the
points are projections of the same point in space. This point is
denoted as X.sub.j which is parameterized by spherical coordinates
(.theta..sub.j, .phi..sub.j) with respect to a chosen reference
frame, i.e.:
X_j = [\cos(\theta_j)\cos(\phi_j), \; \cos(\theta_j)\sin(\phi_j), \; \sin(\theta_j)]^T

where x.sub.ij is the measured projection of X.sub.j in the i-th
image and w.sub.ij is the associated weight: w.sub.ij=0 if X.sub.j
does not appear in the i-th image; otherwise, it is a positive
number. R.sub.i, f.sub.i, k.sub.i, and c.sub.i are the rotation,
focal length, radial distortion coefficient and image center of the
i-th image respectively. {circumflex over (x)}.sub.ij is the
measurement equation given by:

\hat{x}_{ij}(\theta_j, \phi_j; R_i, f_i, k_i \,|\, c_i) = f_i \hat{k}_i(\pi(R_i X_j); k_i) + c_i   (21)
where:
\hat{k}_i(q; k_i) = q(1 + k_{i1}\|q\|^2 + k_{i2}\|q\|^4)

for any q.epsilon.R.sup.2, where k.sub.i=[k.sub.i1,
k.sub.i2].sup.T. In one embodiment, a distortion model as described
by Zhang (Z. Zhang. A flexible new technique for camera
calibration. IEEE Trans. on Pattern Analysis and Machine
Intelligence, 22(11):1330-1334, November 2000) may be used rather
than the division model that was previously described because
Zhang's distortion model has two parameters and better represents
the distortion effects. Note, however, that embodiments are not
limited to a particular distortion model. It is possible to go from
the division model to Zhang's distortion model by noticing the
following relationship in equation (2):
p = q(1 + \kappa\|p\|^2) = q(1 + \kappa\|q\|^2 + 2\kappa^2\|q\|^4 + O(\|q\|^6))   (22)
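Per equation (22), truncating the expansion at fourth order maps the division-model coefficient to Zhang coefficients k1 = kappa and k2 = 2 kappa^2. A small sketch with an illustrative function name:

```python
def zhang_from_division(kappa):
    """Equation (22) truncated at fourth order: the division-model
    coefficient kappa maps to the two Zhang coefficients
    k1 = kappa, k2 = 2 * kappa^2."""
    return kappa, 2.0 * kappa ** 2
```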
[0061] The unknowns in equation (20) are .theta..sub.j,
.phi..sub.j, j=1, . . . , N, and R.sub.i, f.sub.i, k.sub.i, i=1, . . . ,
M. Observing that cost (20) is in a nonlinear least squares form,
it may be optimized, for example by using Levenberg-Marquardt which
can be implemented efficiently using sparse techniques. Other
methods to optimize cost (20) may be used.
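The measurement equation (21) and cost (20) can be sketched directly. Zhang's two-parameter model is applied after the pinhole projection; the function names and data layout below are illustrative assumptions:

```python
import numpy as np

def project(theta, phi, R, f, k, c):
    """Measurement equation (21): project the unit-sphere point
    X_j(theta, phi) through rotation R, apply Zhang's two-parameter
    distortion k = (k1, k2), scale by focal length f, offset by c."""
    X = np.array([np.cos(theta) * np.cos(phi),
                  np.cos(theta) * np.sin(phi),
                  np.sin(theta)])            # point on the unit sphere
    Y = R @ X                                # rotate into camera frame
    q = Y[:2] / Y[2]                         # pinhole projection pi(.)
    r2 = q.dot(q)
    return f * q * (1.0 + k[0] * r2 + k[1] * r2 ** 2) + c

def ba_cost(points_sph, cams, obs, weights):
    """Cost (20): weighted squared reprojection error over M cameras
    (cams[i] = (R, f, k, c)) and N point chains; obs[i][j] and
    weights[i][j] follow the text."""
    total = 0.0
    for i, (R, f, k, c) in enumerate(cams):
        for j, (theta, phi) in enumerate(points_sph):
            if weights[i][j] > 0:
                e = project(theta, phi, R, f, k, c) - obs[i][j]
                total += weights[i][j] * e.dot(e)
    return total
```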
Implementations
[0062] Embodiments provide a core estimator that includes a
correction for lens distortion and that may use a minimum number
(three) of point-correspondences. Embodiments provide a core
estimator for simultaneous estimation of a single radial distortion
coefficient, a rotation and two focal lengths. Embodiments of the
core estimator, for example an embodiment of a nonlinear
optimization core estimator as described above, may be implemented
in any of various image processing applications or systems that
perform panoramic image stitching. As mentioned, embodiments of the
core estimator may be implemented with or wrapped in a "robust
estimator" or hypothesis testing framework that handles noise and
errors in point-correspondences in the pairwise (two image) stage.
An exemplary hypothesis testing framework or robust estimator that
may be used is RANSAC.
[0063] FIG. 1 illustrates an exemplary composite image generation
system that includes an exemplary core estimator according to one
embodiment. Composite image generation system 200 may be a computer
program, application, system or component that may execute on a
computer system or combination of two or more computer systems. An
exemplary computer system on which composite image generation
system 200 may be implemented is illustrated in FIG. 10. A set of
two or more component images 100 may be input into the composite
image generation system 200.
[0064] In one embodiment, pairwise stage 220 may implement a robust
estimator 204. Robust estimator 204 may be implemented as a
hypothesis testing framework such as RANSAC. In one embodiment, for
each pair of overlapping images in component images 100, robust
estimator 204 takes a large number of point-correspondences between
the two images (e.g., using random selection) and attempts to find
a best alignment model by iterating through small subsets of the
point-correspondences. The robust estimator 204 starts with a small
set of point-correspondences, finds an alignment model for the set,
and verifies the alignment model against the entire set of
point-correspondences. This is repeated for other small sets of
point-correspondences (each set containing three
point-correspondences). An alignment model is a mathematical model
that defines the geometric relationship between two images and that
may be applied to the image data to adjust one or both images into
alignment as part of the process of merging component images into a
composite or panoramic image. In embodiments, an alignment model is
a combination of rotation, focal lengths, and radial
distortion.
[0065] Core estimator 202, for example a nonlinear optimization
core estimator as described above, may be used in finding alignment
models corresponding to sets of point-correspondences. Core
estimator 202 accepts three point-correspondences (the minimum) and
estimates rotation, focal lengths, and radial distortion for the
corresponding pair of images using three-point correspondence.
Several embodiments of a core estimator 202 that may be used are
described above. The robust estimator 204 determines, tracks, and
records the best or most correct sets of point-correspondences that
are estimated and output by core estimator 202. From a best or most
correct set of point-correspondences, a best alignment model for
the two images can be found. Thus, robust estimator 204 tries many
sets of point-correspondences using core estimator 202 to estimate
corresponding rotation, focal lengths, and radial distortion, finds
a best or most correct three-point correspondence, and thus
determines a best alignment model corresponding to the best or most
correct three-point correspondence.
[0066] Once pairwise stage 220 has processed all of the component
images 100, the determined geometric relationships 112 (e.g.,
alignment models) may be passed to multi-image stage 208, which may
then generate the composite image 114 from the component images 100
using the information in geometric relationships 112.
[0067] FIG. 2 illustrates an exemplary robust estimator and an
exemplary core estimator in a pairwise stage of a composite image
generation system according to one embodiment. A set of component
images 100 may be input into a composite image generation system.
Feature extraction and feature matching 102 may be performed to
extract features and generate point-correspondences from the
extracted features for pairs of images 100 that overlap. Robust
estimator 204 may be implemented as a hypothesis testing framework
such as RANSAC. In one embodiment, for each pair of overlapping
images in component images 100, robust estimator 204 takes a large
number of point-correspondences between the two images (e.g., using
random selection or some other selection method) and attempts to
find a best alignment model (rotation, focal lengths, and radial
distortion) by iterating through small subsets of the
point-correspondences. The robust estimator 204 starts with a set
of three point-correspondences 104 for two images, finds an
alignment model for the two images from the set 104 using core
estimator 202, and verifies the alignment model against all
point-correspondences for the two images. This is repeated for
other small sets of point-correspondences 104 for the two images,
each set containing three point-correspondences.
[0068] Core estimator 202 accepts a set of three
point-correspondences 104 and estimates rotation, focal lengths,
and radial distortion using three-point correspondence. Several
embodiments of a core estimator 202 that may be used are described
above. For example, in one embodiment, a nonlinear optimization
core estimator may be used. The robust estimator 204 determines,
tracks, and records the best or most correct sets of
point-correspondences that are estimated and output by core
estimator 202. From a most correct set of point-correspondences, a
best alignment model for the two images can be found. The robust
estimator may then move to a next pair of overlapping images 100
and repeat the process until all pairs of images are processed.
Thus, robust estimator 204 tries many sets of three
point-correspondences 104 for each pair of overlapping images 100
using core estimator 202, finds a most correct point correspondence
for each pair of overlapping images, and thus a best alignment
model for each pair of overlapping images.
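The loop described above can be sketched as a generic RANSAC-style wrapper. The callables core_estimator and inlier_test stand in for the core estimator and the model-verification step; they are assumptions of this sketch, not interfaces defined by the patent:

```python
import random

def ransac_alignment(correspondences, core_estimator, inlier_test,
                     trials=500):
    """Sketch of the robust estimator loop: feed random minimal sets of
    three point-correspondences to the core estimator, score each
    returned model by its inlier count over all correspondences, and
    keep the best."""
    best_model, best_inliers = None, []
    for _ in range(trials):
        sample = random.sample(correspondences, 3)
        model = core_estimator(sample)      # e.g. rotation, focals, kappa
        if model is None:                   # degenerate sample
            continue
        inliers = [c for c in correspondences if inlier_test(model, c)]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

In practice the trial count and confidence threshold (500 and 0.995 in the experiments below) govern when the loop may stop early.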
[0069] Once all of the pairs of overlapping images in component
images 100 have been processed, the alignment models 110 may be
passed to a multi-image processing stage of the composite image
generation system, which may generate a composite image from the
component images 100 using at least the alignment models 110.
[0070] FIG. 3 is a flowchart of a method for estimating rotation,
focal lengths, and lens distortion in panoramic image stitching
according to one embodiment. As indicated at 300, relative
rotation, focal lengths, and radial distortion may be estimated for
a pair of images from sets of three point-correspondences. In
embodiments, a core estimator as described herein, such as the
nonlinear optimization core estimator, may be used to estimate
relative rotation, focal lengths, and radial distortion for each
set of three point-correspondences. In one embodiment, the core
estimator may be "wrapped" in a robust estimator, for example a
hypothesis testing framework. RANSAC is an exemplary hypothesis
testing framework that may be used as a robust estimator in
embodiments, but other hypothesis testing frameworks or robust
estimators may be used. The robust estimator may feed the sets of
three point-correspondences to the core estimator, which may output
the estimated relative rotation, focal lengths, and radial
distortion for each set to the robust estimator.
[0071] As indicated at 302, for each set of three
point-correspondences, an alignment model for the pair of images
may be generated from the corresponding estimated relative
rotation, focal lengths, and radial distortion. In one embodiment,
the robust estimator may generate the alignment models. A best
alignment model for the pair of images may be determined from the
generated alignment models, as indicated at 304.
[0072] The pair of images may, for example, be overlapping images
from a plurality of component images taken of a panoramic scene.
The robust estimator and core estimator may perform the method
described above for each discrete pair of overlapping images in the
plurality of component images in a pairwise stage of a composite
image generation process. The output from the method or pairwise
stage (geometric relationships among the images, including but not
limited to the alignment models) may then be applied to the input
component images in a multi-image state to stitch the plurality of
component images into a panoramic image.
[0073] FIG. 4 is a flowchart of a method for composite image
generation that uses a core estimator as described herein,
according to one embodiment. As indicated at 400, for each pair of
overlapping images in a set of component images, a plurality of
point-correspondences may be generated, for example in a feature
extraction and feature matching stage of a composite image
generation process. As indicated at 402, for each pair of
overlapping images in the set of component images, relative
rotation, focal lengths, and radial distortion for the pair of
images may be estimated from sets of three point-correspondences
for the two images. In one embodiment, a robust estimator or
hypothesis testing framework may select sets of three
point-correspondences and feed the sets to an embodiment of the
core estimator as described herein, such as a nonlinear
optimization core estimator. The core estimator may estimate
relative rotation, focal lengths, and radial distortion from each
set of three point-correspondences.
[0074] As indicated at 404, for each set of three
point-correspondences for the pair of overlapping images, an
alignment model for the pair of images may be generated by the
robust estimator from the corresponding relative rotation, focal
lengths, and radial distortion as estimated and output by the core
estimator. The robust estimator may thus generate sets of alignment
models for each pair of overlapping images, using the core
estimator to estimate relative rotation, focal lengths, and radial
distortion for each set of three point-correspondences input to the
core estimator. As indicated at 406, the robust estimator may
determine a best alignment model for each pair of overlapping
images from the generated alignment models for the overlapping
pairs of images.
[0075] The robust estimator may output the best alignment models to
a multi-image processing stage of the composite image generation
process. A composite image may then be generated from the set of
component images in accordance with the determined best alignment
models for the set of component images, as indicated at 408.
[0076] While embodiments of the core estimator are generally
described as working with sets of three point-correspondences, some
embodiments of the core estimator may accept more than three
point-correspondences for a pair of images and estimate relative
rotation, focal lengths, and radial distortion for the pair of
images according to the input more than three
point-correspondences.
[0077] FIG. 5 is a plot that illustrates the convergence rate of an
embodiment of the core estimator against the distortion coefficient
on random geometry. An experiment was performed to test the
convergence rate of the nonlinear optimization-based two-view core
estimator. To that end, synthetic data was used that provided
ground truth. For a given distortion coefficient, three noise-free
point-correspondences were generated from random geometry according
to equation (2). In particular, three points in space were randomly
generated whose projections in one image were uniformly distributed
in [-0.5, 0.5].times.[-0.5, 0.5] and whose depths were uniformly
distributed in [1.3, 1.7]; the axis of the rotation between two
images was randomly sampled within a 30.degree. cone around the
y-axis. The two focal lengths were randomly sampled in [0.5, 1.5],
which corresponds to a range from 17 mm to 50 mm for 35 mm film
cameras. These settings are typical for panoramas. The
point-correspondences were fed into the core estimator, and it was
recorded whether the algorithm found the best solution. For each distortion
coefficient, the test was repeated 10,000 times and the whole
process was repeated for 51 values of the distortion coefficient
ranging uniformly from -0.25 to 0.25. The results are presented in
FIG. 5. As can be seen in FIG. 5, the core estimator is able to
converge correctly over 80% of the time for distortion coefficients
from -0.14 to 0.25.
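The synthetic-data setup of this experiment can be sketched as follows. The sampling ranges come from the text; the rotation-angle range and the way the axis is drawn inside the 30-degree cone are simplifying assumptions of this sketch:

```python
import numpy as np

def make_synthetic_pair(kappa, rng):
    """Noise-free three-point test data following the experiment setup:
    projections uniform in [-0.5, 0.5]^2, depths in [1.3, 1.7], rotation
    axis within a 30-degree cone around the y-axis, focal lengths in
    [0.5, 1.5], distortion applied with the division model (eq. 2)."""
    def distort(q):
        # invert q = p / (1 + kappa ||p||^2) along the ray direction
        r = np.linalg.norm(q)
        if r < 1e-12 or kappa == 0:
            return q
        beta = (1.0 - np.sqrt(1.0 - 4.0 * kappa * r * r)) / (2.0 * kappa * r)
        return q / r * beta
    f1, f2 = rng.uniform(0.5, 1.5, size=2)
    ang = rng.uniform(0.0, np.radians(30.0))
    axis = np.array([np.sin(ang), np.cos(ang), 0.0])   # unit axis near y (assumed in-plane)
    th = rng.uniform(-0.3, 0.3)                        # rotation angle range (assumed)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    R = np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)  # Rodrigues
    corrs = []
    for _ in range(3):
        q = rng.uniform(-0.5, 0.5, size=2)
        X1 = rng.uniform(1.3, 1.7) * np.array([q[0], q[1], 1.0])
        X2 = R.T @ X1                                  # X1 = R X2
        corrs.append((f1 * distort(X1[:2] / X1[2]),
                      f2 * distort(X2[:2] / X2[2])))
    return corrs, (f1, f2, R)
```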
[0078] The performance of some embodiments of the core estimator
may degrade for distortion coefficients lower than -0.14. Some
embodiments of the core estimator may perform better for pincushion
distortion (positive .kappa.) than barrel distortion (negative
.kappa.). Note that it is not necessary to have a 100% convergence
rate because the core estimator is generally intended for use in a
hypothesis testing, or robust estimator, framework.
[0079] FIGS. 6A and 6B illustrate performance comparisons of
embodiments of the core estimator and a conventional three-point
algorithm. An experiment was performed to check if an embodiment of
the core estimator was able to retain more correct correspondences
than a conventional algorithm that did not estimate lens
distortion. The comparison algorithm was a conventional
three-point algorithm. Again, synthetic data was used to provide
ground truth. Both the embodiment of the core estimator and the
conventional three-point algorithm were wrapped in a RANSAC
framework. For each distortion coefficient, 200 noisy
point-correspondences were generated from random geometry, which is
the same as in the first test. The noise added to
point-correspondences was zero-mean Gaussian with standard
deviation set to 0.1% of the image width. The maximum number of
trials for RANSAC was set to 500 and the desired confidence was set
to 0.995. For each distortion coefficient, the test was repeated
10,000 times. The results are presented in FIGS. 6A and 6B, where
the solid lines are the results of the embodiment of the core
estimator and the dashed lines are the results of the conventional
three-point algorithm. FIG. 6A shows the percentage of best or most
correct correspondences that the core estimator (solid line) and
the conventional three-point algorithm (dashed line) may retain for
distortion ranging from -0.25 to 0.25. The core estimator averages
above 75%, while the conventional three-point algorithm
is considerably lower for most distortion coefficients. FIG. 6B
shows the number of trials needed to obtain a RANSAC confidence of
0.995. The core estimator (solid line) needs only about 15 trials
on average, while the conventional three-point algorithm (dashed
line) needs many more trials on average.
[0080] From the above, the tested embodiment of the core estimator
outperforms the conventional three-point algorithm. In particular,
the core estimator is able to retain over 75% of the best or most
correct point-correspondences in about 15 trials on average. An important
implication of these two plots is that, although the core estimator
may be more expensive than a conventional three-point algorithm,
the entire RANSAC process with the core estimator on images with
lens distortion may be significantly faster because of fewer trials
and a higher inlier ratio.
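The link between inlier ratio and trial count follows from the standard RANSAC stopping criterion: the number of trials N needed to draw at least one all-inlier sample with confidence p is N = log(1 - p) / log(1 - w^s), where w is the inlier ratio and s the minimal sample size (three here). A quick sketch:

```python
import math

def ransac_trials(confidence, inlier_ratio, sample_size):
    """Trials needed so that, with the given confidence, at least one
    random minimal sample consists entirely of inliers."""
    p_good_sample = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))

# ~75% inliers (core estimator) vs. a lower, hypothetical ratio when
# distortion is ignored:
print(ransac_trials(0.995, 0.75, 3))  # -> 10
print(ransac_trials(0.995, 0.40, 3))  # -> 81
```

With the roughly 75% inlier ratio reported above, a three-point sampler reaches 0.995 confidence in about ten trials, consistent with the ~15 trials observed; the 0.40 ratio is a hypothetical value chosen to show how quickly the required trial count grows as the inlier ratio drops.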
[0081] FIGS. 7A through 7C illustrate a comparison on real images
without lens distortion estimation and with radial distortion
estimation according to one embodiment. The two images in FIG. 7A
are input images. SIFT features may be used. SIFT (Scale-Invariant
Feature Transform) is a computer vision algorithm that detects and
describes local features in images. The image in FIG. 7B is the
result obtained without lens distortion estimation. The composition
mode is cylindrical. The two images from FIG. 7A are alpha-blended
with equal weights in the overlapping regions. There are visible
misalignments, for example in the crosswalk region. The image in
FIG. 7C is the result obtained with lens distortion estimation
using an embodiment of the core estimator as described herein.
Again, the two images from FIG. 7A are alpha-blended with equal
weights in the overlapping regions. There is no visible
misalignment in the image in FIG. 7C.
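The equal-weight alpha blending used for FIGS. 7B and 7C can be sketched as follows, assuming the two images have already been warped into a common (e.g., cylindrical) composite frame with validity masks; the warping step itself is omitted:

```python
import numpy as np

def blend_equal(img_a, img_b, mask_a, mask_b):
    """Alpha-blend two aligned images with equal weights where they overlap.

    img_a, img_b : (H, W, C) float arrays in a common composite frame
    mask_a, mask_b : (H, W) boolean arrays marking valid (warped-in) pixels
    """
    out = np.zeros_like(img_a)
    overlap = mask_a & mask_b
    only_a = mask_a & ~mask_b
    only_b = mask_b & ~mask_a
    out[only_a] = img_a[only_a]          # pixels covered only by image A
    out[only_b] = img_b[only_b]          # pixels covered only by image B
    out[overlap] = 0.5 * img_a[overlap] + 0.5 * img_b[overlap]
    return out
```

Visible misalignment shows up as ghosting in exactly these overlap pixels, which is why equal-weight blending is a useful diagnostic for alignment quality.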
[0082] FIGS. 8A and 8B illustrate the application of multi-image
bundle adjustment according to one embodiment. The composite image
in FIG. 8A was generated with pairwise bundle adjustment but
without multi-image bundle adjustment, while the composite image in
FIG. 8B was generated with both pairwise bundle adjustment and
multi-image bundle adjustment. Lens distortion is estimated in both
cases. Images are simply stacked one onto another without
alpha-blending. The alignment is observably better in the composite
image of FIG. 8B to which multi-image bundle adjustment was
applied.
[0083] FIGS. 9A through 9C illustrate real-image examples of
multi-image stitching with lens distortion accounted for using a
core estimator according to one embodiment.
[0084] FIG. 9A shows a composite image of the Golden Gate Bridge
stitched from six input images. The image shown in FIG. 9B is a
stitch of the Copacabana beach from 35 input images. The image
shown in FIG. 9C is a full 360° panorama stitched from 23
input images. Features were extracted using SIFT, and blending was
performed. Note that various methods for extracting features and/or
for blending may be used in embodiments.
Exemplary System
[0085] Various components of embodiments of a method and apparatus
for estimating rotation, focal lengths, and lens distortion in
panoramic image stitching may be executed on one or more computer
systems, which may interact with various other devices. One such
computer system is illustrated by FIG. 10. In the illustrated
embodiment, computer system 700 includes one or more processors 710
coupled to a system memory 720 via an input/output (I/O) interface
730. Computer system 700 further includes a network interface 740
coupled to I/O interface 730, and one or more input/output devices
750, such as cursor control device 760, keyboard 770, audio device
790, and display(s) 780. It is contemplated that some embodiments
may be implemented using a single instance of
computer system 700, while in other embodiments multiple such
systems, or multiple nodes making up computer system 700, may be
configured to host different portions or instances of embodiments.
For example, in one embodiment some elements may be implemented via
one or more nodes of computer system 700 that are distinct from
those nodes implementing other elements.
[0086] In various embodiments, computer system 700 may be a
uniprocessor system including one processor 710, or a
multiprocessor system including several processors 710 (e.g., two,
four, eight, or another suitable number). Processors 710 may be any
suitable processor capable of executing instructions. For example,
in various embodiments, processors 710 may be general-purpose or
embedded processors implementing any of a variety of instruction
set architectures (ISAs), such as the x86, PowerPC, SPARC, or MIPS
ISAs, or any other suitable ISA. In multiprocessor systems, each of
processors 710 may commonly, but not necessarily, implement the
same ISA.
[0087] System memory 720 may be configured to store program
instructions and/or data accessible by processor 710. In various
embodiments, system memory 720 may be implemented using any
suitable memory technology, such as static random access memory
(SRAM), synchronous dynamic RAM (SDRAM), nonvolatile/Flash-type
memory, or any other type of memory. In the illustrated embodiment,
program instructions and data implementing desired functions, such
as those described above for a method and apparatus for estimating
rotation, focal lengths, and lens distortion in panoramic image
stitching, are shown stored within system memory 720 as program
instructions 725 and data storage 735, respectively. In other
embodiments, program instructions and/or data may be received, sent
or stored upon different types of computer-accessible media or on
similar media separate from system memory 720 or computer system
700. Generally speaking, a computer-accessible medium may include
storage media or memory media such as magnetic or optical media,
e.g., disk or CD/DVD-ROM coupled to computer system 700 via I/O
interface 730. Program instructions and data stored via a
computer-accessible medium may be transmitted by transmission media
or signals such as electrical, electromagnetic, or digital signals,
which may be conveyed via a communication medium such as a network
and/or a wireless link, such as may be implemented via network
interface 740.
[0088] In one embodiment, I/O interface 730 may be configured to
coordinate I/O traffic between processor 710, system memory 720,
and any peripheral devices in the device, including network
interface 740 or other peripheral interfaces, such as input/output
devices 750. In some embodiments, I/O interface 730 may perform any
necessary protocol, timing or other data transformations to convert
data signals from one component (e.g., system memory 720) into a
format suitable for use by another component (e.g., processor 710).
In some embodiments, I/O interface 730 may include support for
devices attached through various types of peripheral buses, such as
a variant of the Peripheral Component Interconnect (PCI) bus
standard or the Universal Serial Bus (USB) standard, for example.
In some embodiments, the function of I/O interface 730 may be split
into two or more separate components, such as a north bridge and a
south bridge, for example. In addition, in some embodiments some or
all of the functionality of I/O interface 730, such as an interface
to system memory 720, may be incorporated directly into processor
710.
[0089] Network interface 740 may be configured to allow data to be
exchanged between computer system 700 and other devices attached to
a network, such as other computer systems, or between nodes of
computer system 700. In various embodiments, network interface 740
may support communication via wired or wireless general data
networks, such as any suitable type of Ethernet network, for
example; via telecommunications/telephony networks such as analog
voice networks or digital fiber communications networks; via
storage area networks such as Fibre Channel SANs, or via any other
suitable type of network and/or protocol.
[0090] Input/output devices 750 may, in some embodiments, include
one or more display terminals, keyboards, keypads, touchpads,
scanning devices, voice or optical recognition devices, or any
other devices suitable for entering or retrieving data by one or
more computer systems 700. Multiple input/output devices 750 may be
present in computer system 700 or may be distributed on various
nodes of computer system 700. In some embodiments, similar
input/output devices may be separate from computer system 700 and
may interact with one or more nodes of computer system 700 through
a wired or wireless connection, such as over network interface
740.
[0091] As shown in FIG. 10, memory 720 may include program
instructions 725, configured to implement embodiments of a method
and apparatus for estimating rotation, focal lengths, and lens
distortion in panoramic image stitching as described herein, and
data storage 735, comprising various data accessible by program
instructions 725. In one embodiment, program instructions 725 may
include software elements of a method and apparatus for estimating
rotation, focal lengths, and lens distortion in panoramic image
stitching as illustrated in the above Figures. Data storage 735 may
include data that may be used in embodiments. In other embodiments,
other or different software elements and data may be included.
[0092] Those skilled in the art will appreciate that computer
system 700 is merely illustrative and is not intended to limit the
scope of a method and apparatus for estimating rotation, focal
lengths, and lens distortion in panoramic image stitching as
described herein. In particular, the computer system and devices
may include any combination of hardware or software that can
perform the indicated functions, including computers, network
devices, internet appliances, PDAs, wireless phones, pagers, etc.
Computer system 700 may also be connected to other devices that are
not illustrated, or instead may operate as a stand-alone system. In
addition, the functionality provided by the illustrated components
may in some embodiments be combined in fewer components or
distributed in additional components. Similarly, in some
embodiments, the functionality of some of the illustrated
components may not be provided and/or other additional
functionality may be available.
[0093] Those skilled in the art will also appreciate that, while
various items are illustrated as being stored in memory or on
storage while being used, these items or portions of them may be
transferred between memory and other storage devices for purposes
of memory management and data integrity. Alternatively, in other
embodiments some or all of the software components may execute in
memory on another device and communicate with the illustrated
computer system via inter-computer communication. Some or all of
the system components or data structures may also be stored (e.g.,
as instructions or structured data) on a computer-accessible medium
or a portable article to be read by an appropriate drive, various
examples of which are described above. In some embodiments,
instructions stored on a computer-accessible medium separate from
computer system 700 may be transmitted to computer system 700 via
transmission media or signals such as electrical, electromagnetic,
or digital signals, conveyed via a communication medium such as a
network and/or a wireless link. Various embodiments may further
include receiving, sending or storing instructions and/or data
implemented in accordance with the foregoing description upon a
computer-accessible medium. Accordingly, the present invention may
be practiced with other computer system configurations.
CONCLUSION
[0094] Generally speaking, a computer-accessible medium may include
storage media or memory media such as magnetic or optical media,
e.g., disk or DVD/CD-ROM, volatile or non-volatile media such as
RAM (e.g., SDRAM, DDR, RDRAM, SRAM, etc.), ROM, etc., as well as
transmission media or signals such as electrical, electromagnetic,
or digital signals, conveyed via a communication medium such as a
network and/or a wireless link.
[0095] The various methods as illustrated in the Figures and
described herein represent exemplary embodiments of methods. The
methods may be implemented in software, hardware, or a combination
thereof. The order of the methods may be changed, and various elements
may be added, reordered, combined, omitted, modified, etc.
[0096] Various modifications and changes may be made as would be
obvious to a person skilled in the art having the benefit of this
disclosure. It is intended that the invention embrace all such
modifications and changes and, accordingly, the above description
is to be regarded in an illustrative rather than a restrictive
sense.
* * * * *