U.S. patent application number 11/286524, for deghosting mosaics using multiperspective plane sweep, was published by the patent office on 2006-04-06.
This patent application is currently assigned to Microsoft Corporation. The invention is credited to Sing Bing Kang, Richard S. Szeliski, and Matthew T. Uyttendaele.
Application Number: 20060072851 (Ser. No. 11/286,524)
Family ID: 29733192
Publication Date: 2006-04-06

United States Patent Application 20060072851
Kind Code: A1
Kang; Sing Bing; et al.
April 6, 2006
Deghosting mosaics using multiperspective plane sweep
Abstract
A system and method for deghosting mosaics provides a novel
multiperspective plane sweep approach for generating an image
mosaic from a sequence of still images, video images, scanned
photographic images, computer generated images, etc. This
multiperspective plane sweep approach uses virtual camera positions
to compute depth maps for columns of overlapping pixels in adjacent
images. Object distortions and ghosting caused by image parallax
when generating the image mosaics are then minimized by blending
pixel colors, or grey values, for each computed depth to create a
common composite area for each of the overlapping images. Further,
the multiperspective plane sweep approach described herein is both
computationally efficient and applicable both to the case of
limited overlap between the images used for creating the image
mosaics and to the case of extensive or increased image
overlap.
Inventors: Kang; Sing Bing (Redmond, WA); Szeliski; Richard S. (Redmond, WA); Uyttendaele; Matthew T. (Seattle, WA)

Correspondence Address: MICROSOFT CORPORATION, C/O LYON & HARR, LLP, 300 ESPLANADE DRIVE, SUITE 800, OXNARD, CA 93036, US

Assignee: Microsoft Corporation, Redmond, WA
Family ID: 29733192
Appl. No.: 11/286,524
Filed: November 22, 2005
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
10/172,859 | Jun 15, 2002 |
11/286,524 | Nov 22, 2005 |
Current U.S. Class: 382/294; 382/284
Current CPC Class: G06T 5/003 20130101; G06K 9/32 20130101; G06K 2009/2045 20130101; G06T 3/00 20130101; G06T 7/30 20170101; G06T 7/55 20170101; G06T 5/20 20130101; G06T 15/503 20130101
Class at Publication: 382/294; 382/284
International Class: G06K 9/32 20060101 G06K009/32; G06K 9/36 20060101 G06K009/36
Claims
1. A system for blending pixels in overlapping images of a scene
comprising using a computing device to perform the following steps:
determining an area of overlap between at least two images having a
common viewing plane; dividing the area of overlap into at least
one column of image pixels, each column having a width of at least
one pixel; computing a depth map for each column; for each column,
identifying pixels from at least one image that correspond to the
depth map computed for each column; and blending at least two
identified pixels in each column.
2. The system of claim 1 wherein a composite image pixel is
produced by blending the at least two identified pixels.
3. The system of claim 2 further comprising assigning each
composite pixel to the column used for identifying the pixels
blended to produce the composite pixel.
4. The system of claim 3 wherein the composite pixel is assigned to
a portion of each overlapping image represented by the column so
that the composite pixel is common to each overlapping image.
5. The system of claim 1 wherein at least two overlapping images of
a scene having different perspective viewing planes are warped to
bring each image into the common viewing plane prior to determining
the area of overlap between the at least two images.
6. The system of claim 1 wherein the depth map for each column is
computed by using a plane sweep from a virtual origin perpendicular
to each column.
7. The system of claim 1 wherein the depth map for each column is
computed using pixel distances measured by a laser range finder at
the time each image was acquired.
8. The system of claim 1 wherein the depth map for each column is
computed using pixel distances measured by a radar range finder at
the time each image was acquired.
9. The system of claim 1 further comprising aligning overlapping
image areas prior to dividing the area of overlap into at least one
column.
10. The system of claim 1 further comprising compensating for
exposure variation between overlapping images by adjusting an
exposure of at least one of the overlapping images so that the
exposure is consistent between the overlapping images.
11. The system of claim 1 further comprising weighting each
identified pixel prior to pixel blending.
12. The system of claim 1 wherein a linear weighting that is a
function of a proximity to an edge of the area of overlap and to
the point of origin for each image is applied to each identified
pixel prior to pixel blending.
13. The system of claim 1 wherein pixel blending is achieved by
averaging the pixels that are blended.
14. The system of claim 11 wherein pixel blending is achieved by
averaging the weighted pixels that are blended.
15. The system of claim 5 further comprising warping each image
back to its original perspective viewing plane after pixel
blending.
16. The system of claim 1 wherein at least one of the overlapping
images of a scene is acquired using at least one camera.
17. The system of claim 1 wherein at least two cameras having
overlapping fields of view are used to acquire overlapping images
of a scene.
18-28. (canceled)
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a Continuation Application of U.S.
patent application Ser. No. 10/172,859, filed on Jun. 15, 2002 by
Kang, et al., and entitled "A SYSTEM AND METHOD FOR DEGHOSTING
MOSAICS USING MULTIPERSPECTIVE PLANE SWEEP".
BACKGROUND
[0002] 1. Technical Field
[0003] The invention is related to a system for mosaicing images,
and in particular, to a system and method for minimizing object
distortions and ghosting caused by image parallax.
[0004] 2. Related Art
[0005] In general, image mosaics are a combination of two or more
overlapping images that serve to present an overall view of a scene
from perspectives other than those of the individual images used to
generate the mosaic. In other words, image-based rendering
techniques such as the creation of image mosaics are used to render
photorealistic novel views from collections of real or pre-rendered
images which allow a user or viewer to look in any desired
direction. Such novel views are useful for virtual travel,
architectural walkthroughs, video games, or simply for examining a
scene or area from perspectives not originally captured or
otherwise rendered. Typically, better final mosaicing results for a
given scene or area are achieved by using many overlapping images
having a large percentage of overlap between the images.
[0006] Unfortunately, using large sets of overlapping images having
a high degree of overlap for generating mosaics is typically
computationally expensive. Further, where the set of overlapping
images available for generating a mosaic comprises a sparse or
limited set of images taken at slightly displaced locations, the
problem of ghosting due to the presence of parallax becomes a major
concern. In general, ghosting can be described as a visual artifact
resulting from parallax that is frequently observed when images
captured from different camera positions are either stitched,
mosaiced, or otherwise combined. Specifically, any deviations from
a pure parallax-free motion model or an ideal pinhole camera model
can result in local misregistrations between the combined images.
These misregistrations are typically visible as a loss of detail,
such as blurring, or as two or more overlapping semi-transparent
regions in the mosaiced images, i.e., ghosting.
[0007] There are several existing schemes for addressing ghosting
when mosaicing images. For example, one conventional scheme uses a
local patch-based deghosting technique in an attempt to address the
problem. This scheme provides a system for constructing panoramic
image mosaics from sequences of images. This scheme constructs a
full view panorama using a rotational mosaic representation that
associates a rotation matrix and, optionally, a focal length, with
each input image in a sequence of images.
[0008] This scheme then uses a patch-based alignment algorithm
which uses motion models to align two sequential images. In order
to reduce accumulated registration errors between such images, a
global alignment, or "block adjustment" is first applied to the
whole sequence of images, which results in an optimally registered
image mosaic. To compensate for small amounts of motion parallax
introduced by translations of the camera and other unmodeled
distortions, a local alignment technique for deghosting the
combined images is used. This local alignment technique warps each
image based on the results of pairwise local image registrations.
Combining both the global and local alignment, serves to improve
the quality of image mosaics generated using this scheme.
[0009] Unfortunately, while useful, because the aforementioned
patch-based deghosting technique is purely image-based, it is only
capable of addressing small amounts of motion parallax.
Consequently, this scheme cannot fully address significant
parallax problems. Further, the corrective warping used in this
patch-based deghosting technique often produces unrealistic-looking
results. In addition, the patch-based deghosting technique
summarized above tends to be computationally expensive.
[0010] Another conventional scheme for addressing the problem of
parallax induced ghosting in stitched or mosaiced images involves
the use of dense sampling to overcome the ghosting problem.
Effectively, this dense sampling requires the use of images having
significant overlapping regions. Specifically, this scheme provides
for synthesizing an image from a new viewpoint using data from
multiple overlapping reference images. This synthesized image is
constructed from a dataset which is essentially a single image that
is produced by combining samples from multiple viewpoints into a
single image. Unfortunately, this scheme cannot provide a
satisfactory solution in the case of sparse sampling, such as where
overlap between images is 50% or less and where parallax is a
significant concern. In addition, because of the dense sampling,
the aforementioned scheme tends to be computationally
expensive.
[0011] Therefore, what is needed is a computationally efficient
system and method for deghosting image mosaics. Further, this
system and method should be capable of deghosting image mosaics
even in the case where there is significant parallax, or where
there is limited overlap between images used for creating the image
mosaics.
SUMMARY
[0012] A system and method for deghosting mosaics as described
herein solves the aforementioned problems, as well as other
problems that will become apparent from an understanding of the
following description by providing a novel "multiperspective plane
sweep" approach for generating an image mosaic from a sequence of
still images, video images, scanned photographic images, computer
generated images, etc. This multiperspective plane sweep approach
uses virtual camera positions to compute depth maps for strips of
overlapping pixels in adjacent images. These strips, which are at
least one pixel in width, are perpendicular to camera motion. For
horizontal camera motion, these strips correspond to pixel columns.
Even if the camera motion is not horizontal, the images are warped
or "rectified" to produce an effective horizontal camera motion.
From this point on, the discussion assumes horizontal camera motion
for ease of discussion. However, as should be appreciated by those
skilled in the art, the system and method for deghosting mosaics as
described herein, applies to arbitrary camera motions and
translations.
[0013] Object distortions and ghosting caused by image parallax
when generating the image mosaics are then minimized by blending
pixel colors, or grey values, for each computed depth to create a
common composite area for each of the overlapping images. Further,
the multiperspective plane sweep approach described herein is both
computationally efficient, and applicable to the case of limited
overlap between the images used for creating the image mosaics.
Note that the multiperspective plane sweep approach described herein
also works well in cases of increased image overlap.
[0014] In general, the multiperspective plane sweep (MPPS)
technique described herein addresses the problem of ghosting and
distortion resulting from image parallax effects by considering the
problem from a geometric point of view. Specifically, given two or
more images that are to be stitched or combined to form a
composite mosaic image, a perspective warping is first applied to
the images to put them into a common plane. Overlapping regions of
the warped images are then identified. These overlapping regions
are then subdivided into columns having one or more pixels in
width. Virtual camera positions are then associated with each
column and used with a multiperspective plane sweep to determine a
relative depth for each of the pixels in each column. The relative
depth is then used in combination with each of the virtual camera
positions to identify particular pixels for blending to create a
composite overlapping region common to each of the overlapping
images.
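As a concrete illustration, the pipeline just described can be sketched for two already-rectified grayscale images with purely horizontal camera motion. The function below is a hypothetical toy, not the patent's implementation: the plane sweep is reduced to testing a few integer disparities per one-pixel-wide overlap column, and the final blending is a plain average of the matched pixels.

```python
import numpy as np

def mpps_composite(left, right, overlap, disparities):
    """Toy multiperspective-plane-sweep composite for two grayscale
    images whose last/first `overlap` columns view the same scene.
    For each overlapping column a best disparity is chosen by sweeping
    candidate depth planes, then the matched pixels are averaged."""
    w = left.shape[1]
    lo = left[:, w - overlap:]          # overlap region seen by left camera
    ro = right[:, :overlap]             # overlap region seen by right camera
    out = np.zeros_like(lo, dtype=float)
    for c in range(overlap):            # one virtual camera per column
        best_cost, best_d = np.inf, 0
        for d in disparities:           # candidate depth planes
            rc = min(max(c + d, 0), overlap - 1)
            cost = np.mean(np.abs(lo[:, c].astype(float) - ro[:, rc]))
            if cost < best_cost:
                best_cost, best_d = cost, d
        rc = min(max(c + best_d, 0), overlap - 1)
        out[:, c] = 0.5 * (lo[:, c] + ro[:, rc])   # blend matched pixels
    return out
```

With two consistent views, the zero-disparity plane wins and the composite reproduces the shared scene content.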
[0015] The perspective warping of the images requires that certain
information regarding the camera used to acquire an image, such as,
for example, camera position, focal length, field of view, and
orientation, is known. Similarly, in the case of computer generated
or rendered images, the equivalent information is typically
available as if a virtual camera having known parameters at a known
point in space had been used to acquire the image. Note that any
discussions throughout this description that refer to a camera
location for acquiring an image also apply equally to virtual
viewing origins for computer generated images produced without the
use of an actual camera.
[0016] In either case, perspective warping of an image simply means
to digitally process the image so that it appears that the image
was captured or rendered from the perspective of a different camera
location or point of view, rather than at the position or point of
view from which the image was either originally captured or
rendered. For example, with respect to the MPPS techniques
described herein, perspective warping of images is used to warp
overlapping images so that each image appears to be in the same
plane.
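A minimal sketch of such a perspective warp, assuming the warp is expressed as a known 3x3 homography H and using nearest-neighbor resampling (a practical implementation would typically interpolate):

```python
import numpy as np

def warp_perspective(img, H, out_shape):
    """Inverse-map a grayscale image through a 3x3 homography H with
    nearest-neighbor sampling: for every output pixel, find the source
    pixel that H maps onto it. Pixels mapping outside the source stay 0."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts                      # destination -> source
    sx = np.round(src[0] / src[2]).astype(int)        # dehomogenize
    sy = np.round(src[1] / src[2]).astype(int)
    ok = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros(out_shape, dtype=img.dtype)
    out[ys.ravel()[ok], xs.ravel()[ok]] = img[sy[ok], sx[ok]]
    return out
```

An identity homography leaves the image unchanged; a translation homography shifts it, which is the degenerate case of warping into a common plane.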
[0017] Once the images have been warped, any of a number of
conventional alignment techniques is used to identify overlapping
regions between two or more images which are to be composited.
Non-overlapping regions are then associated with their respective
original camera locations (or rendering origins), while each column
of pixels in overlapping areas of the images is associated with a
virtual camera location existing between the two original camera
locations. The use of such virtual camera locations serves to
minimize the unavoidable object distortion while producing a
practically seamless composite image. Computing the appearance of
each column within the overlapping region is accomplished using a
modification of a conventional plane sweep technique. This
modification is termed "multi-perspective plane sweep" (MPPS),
because the plane sweep for every column in the overlapping region
is computed using a different virtual camera position.
[0018] Conventional plane sweep algorithms are used for computing a
relative depth of pixels in overlapping images. In particular,
plane sweep algorithms operate by considering each candidate
disparity as defining a plane in space, and project all images to
be matched onto that plane using a planar perspective transform
(homography). A per-pixel fitness metric (e.g., the variance of the
corresponding collection of pixels) is first computed, and this is
then aggregated spatially using an efficient convolution algorithm
such as a moving average box filter or some other technique. After
all the cost functions have been computed, a winning disparity is
chosen. If the planes are processed in front to back order,
occlusion relationships can also be included. Note that such plane
sweep techniques are well known to those skilled in the art, and
will not be discussed in detail herein.
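The steps just described (per-pixel variance as the fitness metric, box-filter aggregation, winner-take-all selection) can be sketched as follows. For simplicity the per-plane homographies are replaced by integer horizontal shifts; that substitution is an illustrative assumption, not the general algorithm.

```python
import numpy as np

def plane_sweep_depth(images, shifts, win=3):
    """Winner-take-all plane sweep over integer horizontal shifts.
    Cost per candidate plane = variance across the reprojected images,
    aggregated with a moving-average box filter along each row."""
    h, w = images[0].shape
    costs = np.empty((len(shifts), h, w))
    for i, s in enumerate(shifts):
        # "project" each image onto the candidate plane (pure x-shift here)
        stack = np.stack([np.roll(img, k * s, axis=1)
                          for k, img in enumerate(images)])
        per_pixel = stack.var(axis=0)            # per-pixel fitness metric
        kernel = np.ones(win) / win              # box-filter aggregation
        costs[i] = np.apply_along_axis(
            lambda r: np.convolve(r, kernel, mode='same'), 1, per_pixel)
    winner = np.argmin(costs, axis=0)            # winning disparity per pixel
    return np.asarray(shifts)[winner]
```

When the second image is the first shifted by two pixels, the shift-2 plane yields zero cost everywhere and wins at every pixel.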
[0019] Also, as noted above, other methods for computing the depth
of overlapping pixels can also be used. For example, in the case of
actual images, a laser or radar range finder can be used with the
camera to accurately measure the true depth of every pixel in the
image. Similarly, in the case of many computer rendered images, the
images are generated based on a three-dimensional model or models
where the relative depth of all pixels in the rendered image is
known at the time the image is rendered. In alternate embodiments,
these depth maps are used in the same manner as the depth maps
generated using the multi-perspective plane sweep.
[0020] In the pixel color assignment step, the computed depth,
whether from the MPPS, or from another depth mapping technique, is
used to index the colors or grey values from the input images.
Specifically, given the computed depth map at each virtual camera
location, a vector is projected from each actual camera location
through the overlapping image region to the computed depth at that
virtual camera location for each pixel in the column. The pixel
values at the points where each of these vectors pass through the
overlapping image region are then blended to create a composite
image pixel at the point on the image plane corresponding to the
virtual camera location. Further, this same pixel value is assigned
to each of the overlapping images such that a common composite area
is created for each of the overlapping images.
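The depth-indexed lookup can be illustrated for a single column using simple rectified-stereo geometry. Both the half-disparity split around the virtual camera and the function itself are hypothetical simplifications of the vector projection described above, not the patent's exact construction.

```python
import numpy as np

def blend_column(left, right, col, depth, baseline, focal):
    """For one overlap column, a scene point at `depth` projects into
    the left and right source images offset by +/- half the stereo
    disparity focal * baseline / depth about the virtual camera; the
    two samples are averaged into the composite column."""
    d = focal * baseline / depth                 # full disparity at this depth
    lc = int(round(col - d / 2)) % left.shape[1]
    rc = int(round(col + d / 2)) % right.shape[1]
    return 0.5 * (left[:, lc] + right[:, rc])    # blended composite pixels
```

A very distant point has near-zero disparity, so both cameras sample the same column; a near point pulls the two samples apart symmetrically.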
[0021] Further, in one embodiment, blending weights are used to
weight the pixels being blended. In particular, those pixels that
are closer to a camera or rendering point are weighted more heavily
than those pixels that are further from a camera or rendering
point. In other words, the pixels are weighted based on their
proximity to the edge of the overlap region and to the camera or
origin used to acquire or render the image. In alternate
embodiments, these blending weights are any conventional linear or
non-linear weighting function.
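One plausible linear weighting of this kind feathers each image's contribution from 1 at its own side of the overlap down to 0 at the far side; this particular ramp is an illustrative assumption, not the patent's exact weighting function.

```python
import numpy as np

def feather_weights(overlap_width):
    """Linear feathering across an overlap of `overlap_width` columns:
    the left image's weight falls from ~1 to ~0 left-to-right while the
    right image's weight rises, so pixels nearer their own camera
    dominate the blend. Returns (left_weights, right_weights)."""
    t = (np.arange(overlap_width) + 0.5) / overlap_width   # 0..1 ramp
    return 1.0 - t, t
```

The two weight vectors sum to one in every column, so the blended exposure level is preserved across the seam.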
[0022] In view of the preceding discussion, it is clear that the
MPPS techniques described herein are advantageous for use in
generating seamless mosaics in cases of sparse sampling, such as
where overlap between images is 50% or less, where parallax is a
significant concern, and where computational resources are
limited.
[0023] In addition to the just described benefits, other advantages
of the multiperspective plane sweep techniques described herein
will become apparent from the detailed description which follows
hereinafter when taken in conjunction with the accompanying drawing
figures.
DESCRIPTION OF THE DRAWINGS
[0024] The specific features, aspects, and advantages of the
present invention will become better understood with regard to the
following description, appended claims, and accompanying drawings
where:
[0025] FIG. 1 is a general system diagram depicting a
general-purpose computing device constituting an exemplary system
for using a multiperspective plane sweep to combine two or more
images into a seamless mosaic.
[0026] FIG. 2 illustrates an exemplary architectural diagram
showing exemplary program modules for using a multiperspective
plane sweep to combine two or more images into a seamless
mosaic.
[0027] FIG. 3A is a schematic representation of two image planes,
each plane having been captured or rendered from different origins
and being disposed at a large angle relative to each other.
[0028] FIG. 3B provides exemplary photographic images corresponding
to the image planes of FIG. 3A.
[0029] FIG. 4A is a schematic representation of the image planes of
FIG. 3A following a perspective warping of the image planes to
place the image planes into a common image plane.
[0030] FIG. 4B provides exemplary photographic images showing the
effect of the perspective warping of the images of FIG. 3B to place
the images into a common image plane.
[0031] FIG. 5 is a schematic diagram that illustrates the use of
depth maps and virtual camera positions for selecting pixels for
blending.
[0032] FIG. 6 illustrates an exemplary system flow diagram for
using a multiperspective plane sweep to combine two or more images
into a seamless mosaic.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0033] In the following description of the preferred embodiments of
the present invention, reference is made to the accompanying
drawings, which form a part hereof, and in which is shown by way of
illustration specific embodiments in which the invention may be
practiced. It is understood that other embodiments may be utilized
and structural changes may be made without departing from the scope
of the present invention.
1.0 Exemplary Operating Environment:
[0034] FIG. 1 illustrates an example of a suitable computing system
environment 100 on which the invention may be implemented. The
computing system environment 100 is only one example of a suitable
computing environment and is not intended to suggest any limitation
as to the scope of use or functionality of the invention. Neither
should the computing environment 100 be interpreted as having any
dependency or requirement relating to any one or combination of
components illustrated in the exemplary operating environment
100.
[0035] The invention is operational with numerous other general
purpose or special purpose computing system environments or
configurations. Examples of well known computing systems,
environments, and/or configurations that may be suitable for use
with the invention include, but are not limited to, personal
computers, server computers, hand-held, laptop or mobile computer
or communications devices such as cell phones and PDAs,
multiprocessor systems, microprocessor-based systems, set top
boxes, programmable consumer electronics, network PCs,
minicomputers, mainframe computers, distributed computing
environments that include any of the above systems or devices, and
the like.
[0036] The invention may be described in the general context of
computer-executable instructions, such as program modules, being
executed by a computer. Generally, program modules include
routines, programs, objects, components, data structures, etc. that
perform particular tasks or implement particular abstract data
types. The invention may also be practiced in distributed computing
environments where tasks are performed by remote processing devices
that are linked through a communications network. In a distributed
computing environment, program modules may be located in both local
and remote computer storage media including memory storage devices.
With reference to FIG. 1, an exemplary system for implementing the
invention includes a general-purpose computing device in the form
of a computer 110.
[0037] Components of computer 110 may include, but are not limited
to, a processing unit 120, a system memory 130, and a system bus
121 that couples various system components including the system
memory to the processing unit 120. The system bus 121 may be any of
several types of bus structures including a memory bus or memory
controller, a peripheral bus, and a local bus using any of a
variety of bus architectures. By way of example, and not
limitation, such architectures include Industry Standard
Architecture (ISA) bus, Micro Channel Architecture (MCA) bus,
Enhanced ISA (EISA) bus, Video Electronics Standards Association
(VESA) local bus, and Peripheral Component Interconnect (PCI) bus
also known as Mezzanine bus.
[0038] Computer 110 typically includes a variety of computer
readable media. Computer readable media can be any available media
that can be accessed by computer 110 and includes both volatile and
nonvolatile media, removable and non-removable media. By way of
example, and not limitation, computer readable media may comprise
computer storage media and communication media. Computer storage
media includes volatile and nonvolatile removable and non-removable
media implemented in any method or technology for storage of
information such as computer readable instructions, data
structures, program modules or other data.
[0039] Computer storage media includes, but is not limited to, RAM,
ROM, EEPROM, flash memory or other memory technology, CD-ROM,
digital versatile disks (DVD) or other optical disk storage,
magnetic cassettes, magnetic tape, magnetic disk storage or other
magnetic storage devices, or any other medium which can be used to
store the desired information and which can be accessed by computer
110. Communication media typically embodies computer readable
instructions, data structures, program modules or other data in a
modulated data signal such as a carrier wave or other transport
mechanism and includes any information delivery media.
[0040] The aforementioned term "modulated data signal" means a
signal that has one or more of its characteristics set or changed
in such a manner as to encode information in the signal. By way of
example, and not limitation, communication media includes wired
media such as a wired network or direct-wired connection, and
wireless media such as acoustic, RF, infrared and other wireless
media. Combinations of any of the above should also be included
within the scope of computer readable media.
[0041] The system memory 130 includes computer storage media in the
form of volatile and/or nonvolatile memory such as read only memory
(ROM) 131 and random access memory (RAM) 132. A basic input/output
system 133 (BIOS), containing the basic routines that help to
transfer information between elements within computer 110, such as
during start-up, is typically stored in ROM 131. RAM 132 typically
contains data and/or program modules that are immediately
accessible to and/or presently being operated on by processing unit
120. By way of example, and not limitation, FIG. 1 illustrates
operating system 134, application programs 135, other program
modules 136, and program data 137.
[0042] The computer 110 may also include other
removable/non-removable, volatile/nonvolatile computer storage
media. By way of example only, FIG. 1 illustrates a hard disk drive
141 that reads from or writes to non-removable, nonvolatile
magnetic media, a magnetic disk drive 151 that reads from or writes
to a removable, nonvolatile magnetic disk 152, and an optical disk
drive 155 that reads from or writes to a removable, nonvolatile
optical disk 156 such as a CD ROM or other optical media. Other
removable/non-removable, volatile/nonvolatile computer storage
media that can be used in the exemplary operating environment
include, but are not limited to, magnetic tape cassettes, flash
memory cards, digital versatile disks, digital video tape, solid
state RAM, solid state ROM, and the like. The hard disk drive 141
is typically connected to the system bus 121 through a
non-removable memory interface such as interface 140, and magnetic
disk drive 151 and optical disk drive 155 are typically connected
to the system bus 121 by a removable memory interface, such as
interface 150.
[0043] The drives and their associated computer storage media
discussed above and illustrated in FIG. 1, provide storage of
computer readable instructions, data structures, program modules
and other data for the computer 110. In FIG. 1, for example, hard
disk drive 141 is illustrated as storing operating system 144,
application programs 145, other program modules 146, and program
data 147. Note that these components can either be the same as or
different from operating system 134, application programs 135,
other program modules 136, and program data 137. Operating system
144, application programs 145, other program modules 146, and
program data 147 are given different numbers here to illustrate
that, at a minimum, they are different copies.
[0044] A user may enter commands and information into the computer
110 through input devices such as a keyboard 162 and pointing
device 161, commonly referred to as a mouse, trackball or touch
pad. Other input devices (not shown) may include a microphone,
joystick, game pad, satellite dish, scanner, or the like. These and
other input devices are often connected to the processing unit 120
through a user input interface 160 that is coupled to the system
bus 121, but may be connected by other interface and bus
structures, such as a parallel port, game port or a universal
serial bus (USB). A monitor 191 or other type of display device is
also connected to the system bus 121 via an interface, such as a
video interface 190. In addition to the monitor, computers may also
include other peripheral output devices such as speakers 197 and
printer 196, which may be connected through an output peripheral
interface 195.
[0045] Further, the computer 110 may also include, as an input
device, a camera 192 (such as a digital/electronic still or video
camera, or film/photographic scanner) capable of capturing a
sequence of images 193. Further, while just one camera 192 is
depicted, multiple cameras could be included as input devices to
the computer 110. The use of multiple cameras provides the
capability to capture multiple views of an image simultaneously or
sequentially, to capture three-dimensional or depth images, or to
capture panoramic images of a scene. The images 193 from the one or
more cameras 192 are input into the computer 110 via an appropriate
camera interface 194. This interface is connected to the system bus
121, thereby allowing the images 193 to be routed to and stored in
the RAM 132, or any of the other aforementioned data storage
devices associated with the computer 110. However, it is noted that
image data can be input into the computer 110 from any of the
aforementioned computer-readable media as well, without requiring
the use of a camera 192.
[0046] The computer 110 may operate in a networked environment
using logical connections to one or more remote computers, such as
a remote computer 180. The remote computer 180 may be a personal
computer, a server, a router, a network PC, a peer device or other
common network node, and typically includes many or all of the
elements described above relative to the computer 110, although
only a memory storage device 181 has been illustrated in FIG. 1.
The logical connections depicted in FIG. 1 include a local area
network (LAN) 171 and a wide area network (WAN) 173, but may also
include other networks. Such networking environments are
commonplace in offices, enterprise-wide computer networks,
intranets and the Internet.
[0047] When used in a LAN networking environment, the computer 110
is connected to the LAN 171 through a network interface or adapter
170. When used in a WAN networking environment, the computer 110
typically includes a modem 172 or other means for establishing
communications over the WAN 173, such as the Internet. The modem
172, which may be internal or external, may be connected to the
system bus 121 via the user input interface 160, or other
appropriate mechanism. In a networked environment, program modules
depicted relative to the computer 110, or portions thereof, may be
stored in the remote memory storage device. By way of example, and
not limitation, FIG. 1 illustrates remote application programs 185
as residing on memory device 181. It will be appreciated that the
network connections shown are exemplary and other means of
establishing a communications link between the computers may be
used.
[0048] The exemplary operating environment having now been
discussed, the remaining part of this description will be devoted
to a discussion of the program modules and processes embodying use
of a multiperspective plane sweep to combine two or more images
into a seamless mosaic.
2.0 Introduction:
[0049] The multiperspective plane sweep techniques described herein
are useful for minimizing object distortions and ghosting caused by
image parallax when generating mosaics from a sequence of still
images. Minimization of object distortions and ghosting is achieved
using virtual camera positions in combination with image depth map
based pixel blending for overlapping portions of images. In a
working embodiment, the image depth maps are created using a
"multi-perspective plane sweep" to determine a relative depth of
overlapping strips or columns of pixels in overlapping images of a
scene. However, other methods for generating depth maps for the
overlapping columns of pixels can also be used in further
embodiments.
[0050] In general, a system and method for deghosting mosaics
provides a novel multiperspective plane sweep approach for
generating an image mosaic from a sequence of still images, video
images, scanned photographic images, computer generated images,
etc. This multiperspective plane sweep approach uses virtual camera
positions to compute depth maps for columns of overlapping pixels
in adjacent images. Object distortions and ghosting caused by image
parallax when generating the image mosaics are then minimized by
blending pixel colors, or grey values, for each computed depth to
create a common composite area for each of the overlapping images.
Further, the multiperspective plane sweep approach described herein
is both computationally efficient, and applicable to both the case
of limited overlap between the images used for creating the image
mosaics, and to the case of extensive or increased image
overlap.
2.1 System Overview:
[0051] In general, the multiperspective plane sweep (MPPS)
technique described herein addresses the problem of ghosting and
distortion resulting from image parallax effects by considering the
problem from a geometric point of view. Specifically, given two or
more images that are to be stitched or combined to form a
composite mosaic image, a perspective warping is first applied to
the images to put them into a common plane. Overlapping regions of
the warped images are then identified. These overlapping regions
are then subdivided into subregions having one or more pixels in
width. For example, in the case of two images with horizontal
camera motions, a subregion corresponds to a pixel column. Further,
in the most general case, with more than two cameras at arbitrary
locations, a subregion can be as small as a single pixel in
overlapping images. Note that even if the camera motion is not
horizontal, the images are warped or "rectified" to produce an
effective horizontal camera motion. For purposes of explanation,
the following discussion addresses the case of two images with
horizontal camera motions as an illustrative example. However, as
should be appreciated by those skilled in the art, the system and
method for deghosting mosaics as described herein applies to
arbitrary camera motions and translations. Virtual camera positions
are then associated with each column and used with a
multiperspective plane sweep to determine a relative depth for each
of the pixels in each column. The relative depth is then used in
combination with each of the virtual camera positions to identify
particular pixels for blending to create a composite overlapping
region common to each of the overlapping images.
2.2 System Architecture:
[0052] The processes summarized above are illustrated by the
general system diagram of FIG. 2. In particular, the system diagram
of FIG. 2 illustrates the interrelationships between program
modules for implementing deghosting using a multi-perspective plane
sweep. It should be noted that the boxes and interconnections
between boxes that are represented by broken or dashed lines in
FIG. 2 represent alternate embodiments of deghosting methods
described herein, and that any or all of these alternate
embodiments, as described below, may be used in combination with
other alternate embodiments that are described throughout this
document.
[0053] In particular, as illustrated by FIG. 2, a system and method
for deghosting mosaics uses images 200 from a database or
collection of images stored in a computer readable medium. These
images 200 are either captured by one or more cameras 210 or
rendered using conventional computer image generation techniques.
To begin, an image acquisition module 220 receives two or more
overlapping images 200, which are either previously stored or
acquired directly from the one or more cameras 210.
[0054] In one embodiment, the images 200 are then provided to a
global exposure compensation module 225 which uses any of a number
of conventional techniques for equalizing or compensating for
exposure differences between each image. The global exposure
compensation module 225 addresses exposure compensation parameters
such as brightness and contrast levels using conventional exposure
compensation techniques that are well known to those skilled in the
art, and will not be discussed in further detail herein.
[0055] An image warping module 230 then applies a perspective warp
to each of the overlapping images 200 so that the images are warped
into a common viewing plane. The image warping module 230
uses well known conventional techniques for warping or "rectifying"
images from one viewing perspective to another using image capture
or rendering parameters 240 which include information, such as, for
example, camera or rendering origin, field of view, focal length,
and orientation.
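For illustration, the warping step can be sketched as applying a 3.times.3 planar homography to each image. The following is a minimal numpy sketch, not the module's actual implementation; the homography H is assumed to have already been computed from the capture or rendering parameters 240, and nearest-neighbor sampling is used for brevity where a real implementation would interpolate.

```python
import numpy as np

def warp_homography(img, H, out_shape):
    """Warp a 2-D image by the 3x3 homography H using inverse mapping
    with nearest-neighbor sampling; pixels that map outside the source
    image are set to zero."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    # Homogeneous coordinates of every destination pixel (3 x N).
    dst = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    src = np.linalg.inv(H) @ dst          # inverse-map into the source
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros(h_out * w_out, dtype=img.dtype)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h_out, w_out)
```

With the identity homography the image is returned unchanged; a pure-translation homography shifts it, which is the degenerate case of the perspective warp described above.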
[0056] The warped images are then provided to an image overlap
module 250 that aligns the overlapping portions of the images using
conventional image alignment techniques. The overlapping portions
of the images are then divided into columns of at least one pixel
in width. Note that at this point most overlapping images will
exhibit signs of ghosting or blurriness in the overlap regions due
to image parallax resulting from capturing or rendering the
overlapping images from even slightly different perspectives. The
ghosting and blurriness noted above is then addressed by the
program modules discussed in the following paragraphs.
[0057] In particular, a depth map module 260 generates a depth map
for the overlapping portions of each image. In a working embodiment
of the deghosting system described herein, a multi-perspective
plane sweep using virtual camera positions for each column of the
overlapping portions of each image is used to compute relative
depth maps. This process is discussed in greater detail in Section
3.1.2. In alternate embodiments, the depth maps can be created
using other techniques, including, for example, the use of laser or
radar range finding equipment for determining the actual depth of
image pixels relative to a camera origin.
[0058] Next, a depth-based pixel blending module 270 uses the depth
maps in combination with pixel information in the overlapping
portions of each image to generate a composite pixel for each
column of the overlapping images. Specifically, for each virtual
camera location, given the computed depth maps for each column, the
depth-based pixel blending module 270 projects a vector from each
actual camera location through the overlapping image region to the
computed depth for each pixel in the column.
[0059] The pixel values at the points where each of these vectors
pass through the overlapping image region are then blended to
create a composite image pixel at the point on an image plane
corresponding to the virtual camera location for each column of the
overlap region. Further, this same pixel value is assigned to each
of the overlapping images such that for each column a common
composite area is created for each of the overlapping images.
Having a composite area serves to reduce or eliminate distortion,
blurring and ghosting resulting from image parallax. This process
is discussed in greater detail in Section 3.1.3.
[0060] In a related embodiment, the depth-based pixel blending
module 270 uses "blending weights" to weight the pixels being
blended. In particular, those pixels that are closer to a camera or
rendering point are weighted more heavily than those pixels that
are further from a camera or rendering point. In other words, the
pixels are weighted based on the proximity to the edge of the
overlap region and to the camera or origin used to acquire or
render the image. In further embodiments, these blending weights
are any conventional linear or non-linear weighting function.
[0061] Finally, a reverse warping module 290 exactly reverses or
inverts the perspective warping applied to the images by the image
warping module 230. Consequently, the images are put back into
their original perspectives. However, these output images differ
from the original input images in that they now have a common
overlap area that differs only in the perspective from which the
images were captured or rendered, with any potential ghosting or
blurring effects minimized or eliminated by the aforementioned
pixel blending procedures. These images 200 are then stored to
computer readable medium for later use in viewing or creating image
mosaics or image panoramas.
3.0 Operation Overview:
[0062] The system and method described herein for deghosting
mosaics is applicable to actual images such as still images, video
images, scanned photographic images, images acquired via film or
digital cameras, etc., and to computer generated or processed
images. However, for ease of explanation, the detailed description
provided herein focuses on mosaicing a set of two or more images
captured using a conventional camera having a known origin, field
of view, focal length, and orientation. The above-described program
modules are employed in a mosaic image deghoster for automatically
deghosting overlapping portions of mosaiced images. This process is
depicted in the flow diagram of FIG. 3 following a detailed
operational discussion of exemplary methods for implementing the
aforementioned program modules.
3.1 Operational Elements:
[0063] In general, the MPPS techniques described herein address
the problem of ghosting and distortion resulting from image
parallax effects by performing a series of operations on
overlapping pictures. In particular, given two or more overlapping
images, a perspective warping is first applied to the images to put
them into a common plane. Overlapping regions of the warped images
are then identified and subdivided into columns of at least one
pixel in width. Virtual camera positions are then associated with
each column and used with a multiperspective plane sweep to
determine a relative depth for each of the pixels in each column.
The relative depth is then used in combination with each of the
virtual camera positions to identify particular pixels for blending
to create a composite overlapping region common to each of the
overlapping images. The following sections describe in detail the
operational elements for implementing an image deghoster using the
processes summarized above and described in detail in the following
sections.
3.1.1 Image Warping:
[0064] Perspective warping of images is a well known conventional
digital imaging technique for warping an image so that it appears
that the image was captured or rendered from a perspective that is
different than the perspective from which the image was actually
captured or rendered. Perspective warping of images requires that
certain information regarding the camera used to acquire an image,
such as, for example, camera position, focal length, field of view,
and orientation is either known, or can be computed or otherwise
approximated. Similarly, in the case of computer generated or
rendered images, the equivalent information is typically available
as if a virtual camera having known parameters at a known point in
space had been used to acquire the image. This type of perspective
warping is also often referred to as image rectification, which is
simply defined as a process of making image data conform to a
desired map projection system. Note that such image warping or
rectification techniques are well known to those skilled in the
art, and will not be discussed in detail herein. It should also be
noted that any discussions throughout this description that refer
to a camera location or position for acquiring an image also apply
equally to virtual viewing origins for computer generated images
produced without the use of an actual camera.
[0065] In either case, as noted above, perspective warping or
rectification of an image simply means to digitally process the
image so that it appears that the image was captured or rendered
from the perspective of a different camera location or point of
view, rather than at the position or point of view from which the
image was either originally captured or rendered. For example, with
respect to the MPPS techniques described herein, perspective
warping of images is used to warp each image so that each image
appears to be in the same plane. This concept is clearly
illustrated by FIG. 3A through FIG. 4B. Specifically, FIG. 3A shows
two image planes, I.sub.1 and I.sub.2, which have been captured or
rendered from origin points O.sub.1 and O.sub.2, respectively. The
shorter portions of both I.sub.1 and I.sub.2 extending past the
point of intersection represent overlapping regions of I.sub.1 and
I.sub.2. Clearly, the image plane I.sub.1 is at a large angle to
the image plane I.sub.2. FIG. 3B provides two photographic images,
with the leftmost image representing the image of plane I.sub.1,
and the rightmost image representing the image of plane I.sub.2.
The dotted rectangles on each of these two images represent the
overlapping areas in common to both images.
[0066] FIG. 4A illustrates the image planes for I.sub.1 and
I.sub.2, as shown in FIG. 3A, after rectification. Note that while
the points of origin for each image plane have not been modified,
the image planes, now I.sub.1, Rect. and I.sub.2, Rect. have been
warped into a common image plane having overlapping regions.
Further, FIG. 4B shows the effect of rectification on the images of
FIG. 3B, with the leftmost image of FIG. 4B corresponding to the
leftmost image of FIG. 3B, and the rightmost image of FIG. 4B
corresponding to the rightmost image of FIG. 3B. Again, the dotted
rectangles on each of these two images represent the overlapping
areas in common to both images.
3.1.2 Multi-Perspective Plane Sweep for Generation of Depth
Maps:
[0067] Conventional plane sweep algorithms are useful for computing
the relative depth of pixels in overlapping images. In particular,
such plane sweep algorithms operate by considering each candidate
disparity as defining a plane in space and projecting all images to
be matched onto that plane using a planar perspective transform
(homography). A per-pixel fitness metric (e.g., the variance of the
corresponding collection of pixels) is first computed, and this is
then aggregated spatially using an efficient convolution algorithm
such as a moving average box filter or some other conventional
technique. After all the cost functions have been computed, a
winning disparity is chosen. If the planes are processed in front
to back order, occlusion relationships can also be included. Note
that such plane sweep techniques are well known to those skilled in
the art, and will not be discussed in detail herein.
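As a concrete illustration of the conventional plane sweep summarized above, the following sketch assumes two rectified grayscale images with purely horizontal camera motion, so each fronto-parallel plane corresponds to a single disparity; the squared-difference cost and box-filter aggregation are illustrative choices, not the only ones the description permits.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def plane_sweep_disparity(img1, img2, max_disp, win=5):
    """Winner-take-all plane sweep for two rectified grayscale images.

    Each candidate disparity d stands in for one fronto-parallel plane:
    img2 is shifted by d pixels, a per-pixel squared-difference cost is
    computed against img1, the cost is aggregated spatially with a box
    filter, and the lowest-cost disparity wins at each pixel."""
    h, w = img1.shape
    costs = np.empty((max_disp + 1, h, w))
    for d in range(max_disp + 1):
        shifted = np.full_like(img2, np.nan)
        shifted[:, d:] = img2[:, :w - d]
        cost = (img1 - shifted) ** 2
        cost[np.isnan(cost)] = 1e9                  # penalize unmatched pixels
        costs[d] = uniform_filter(cost, size=win)   # spatial aggregation
    return np.argmin(costs, axis=0)                 # winning disparity per pixel
```

For a synthetic pair where img2 is img1 shifted by a constant disparity, the interior of the recovered map equals that disparity.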
[0068] Plane sweeps as described herein depart from the
conventional usage described above. In particular, once the images
have been rectified, any of a number of conventional alignment
techniques is used to identify overlapping regions between two or
more images which are to be composited. Non-overlapping regions are
then associated with their respective original camera locations (or
rendering origins), while each column of pixels in overlapping
areas of the images are associated with virtual camera locations
existing between the two original camera locations. Each of these
camera locations, or rendering origins, either actual or virtual is
then used in a plane sweep for computing a depth map for the
overlapping portions of two or more images. For each plane or
"depth," each of the overlapping views is mapped to a single
reference view. The plane or depth resulting in the lowest overall
error is chosen as the correct depth. This process is repeated
using a different virtual camera origin for every column in the
overlapping region. This modification to the conventional plane
sweep is termed "multi-perspective plane sweep" (MPPS), because a
plane sweep for every column in the overlapping region is computed
using a different virtual camera position.
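The MPPS idea can be illustrated with a toy one-dimensional example. Everything below (the linear scene texture, the camera geometry, and the simplification that each virtual camera samples only its central ray) is an illustrative assumption for the sketch, not taken from this description; it shows only the defining feature of MPPS, namely that each overlap column is swept from its own virtual camera position located between the real camera origins.

```python
import numpy as np

F, CX, W = 50.0, 32, 64      # focal length, principal point, image width
Z_TRUE = 10.0                # actual depth of the (fronto-parallel) scene
O1X, O2X = 0.0, 4.0          # x-positions of the two real cameras

def capture(ox):
    """Render a 1-D image of a planar scene with texture tex(x) = x,
    as seen from a camera at (ox, 0) looking along +z."""
    u = np.arange(W)
    return ox + Z_TRUE * (u - CX) / F

def mpps_depth(img1, img2, virt_xs, depths):
    """One plane sweep per overlap column, each from its own virtual
    camera: the virtual ray's intersection with each candidate depth
    plane is projected into both real cameras, the images are sampled
    there, and the depth with the lowest color error wins."""
    winners = []
    for ovx in virt_xs:              # one virtual camera per column
        best_z, best_cost = None, np.inf
        for z in depths:             # sweep candidate depth planes
            x_world = ovx            # central virtual ray hits plane at x = ovx
            u1 = int(round(CX + F * (x_world - O1X) / z))
            u2 = int(round(CX + F * (x_world - O2X) / z))
            cost = (img1[u1] - img2[u2]) ** 2
            if cost < best_cost:
                best_z, best_cost = z, cost
        winners.append(best_z)
    return winners
```

Every virtual camera between the two real origins recovers the true scene depth, because only at the correct plane do the two projected samples land on the same scene point.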
[0069] The use of such virtual camera locations serves to minimize
object distortion, which is otherwise unavoidable, especially given sparsely
sampled images having limited overlap, while producing a
practically seamless composite image. Computing the appearance of
each column within the overlapping region is then accomplished
using the virtual camera positions in combination with the depth
maps for pixel blending as described below in Section 3.1.3.
[0070] Also, as noted above, other methods for computing the depth
of overlapping pixels can also be used. For example, in the case of
actual images, a laser or radar range finder can be used with the
camera to accurately measure the true depth of every pixel in the
image. Similarly, in the case of many computer rendered images, the
images are generated based on a three-dimensional model or models
where the relative depth of all pixels in the rendered image is
known at the time the image is rendered. In alternate embodiments,
these depth maps are used in the same manner as the depth maps
generated using the multi-perspective plane sweep.
3.1.3 Pixel Blending using Depth Maps:
[0071] In the pixel blending step, the computed depth, whether from
the MPPS, or from another depth mapping technique, is used to index
the colors or grey values from the input images. Specifically,
given the computed depth map at each virtual camera location, i.e.,
at each column, at least one vector is projected from each actual
camera location through the overlapping image region to the
computed depths for each pixel in each column. Next, for pixels at
the same level within each column, the pixel values at the points
where each of these vectors pass through the overlapping image
region are then blended to create composite image pixels to replace
each pixel in that column. Further, these same pixel values are
assigned to each of the overlapping images such that a common
composite area is created for each column of the overlapping
images.
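The geometry of this pixel selection can be sketched in two dimensions (an x-z slice). The coordinate conventions and sampling functions below are illustrative assumptions: cameras lie on the line z = 0, the common rectified image plane is z = zp, and each image is modeled as a function that can be sampled at any plane coordinate.

```python
def blend_column(o1x, o2x, ovx, xv, zp, zd, sample1, sample2,
                 w1=0.5, w2=0.5):
    """Blend the two input-image pixels selected by the computed depth.

    Cameras sit on the line z = 0 and the common rectified image plane
    is z = zp. The virtual ray from (ovx, 0) through the plane point
    (xv, zp) is extended to the chosen depth plane z = zd; rays from
    the real cameras back to that point cross the image plane at the
    pixels A_I1 and A_I2, whose values are blended into the composite
    pixel assigned to xv in both images."""
    px = ovx + (xv - ovx) * zd / zp      # intersection with depth plane
    a1 = o1x + (px - o1x) * zp / zd      # A_I1 on the common image plane
    a2 = o2x + (px - o2x) * zp / zd      # A_I2 on the common image plane
    return w1 * sample1(a1) + w2 * sample2(a2)
```

When the computed depth is correct, both rays sample the same scene point, so the composite pixel is consistent with both views, which is the deghosting effect described above.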
[0072] These concepts are illustrated by FIG. 5. In particular,
FIG. 5 is a schematic diagram that illustrates the use of depth
maps and virtual camera positions for selecting pixels to be
blended for creating a common composite area for each of two
overlapping images. Specifically, as illustrated by FIG. 5, two
overlapping images, I.sub.1, Rect. and I.sub.2, Rect., 500 and 510,
respectively have been perspective warped so that they are in the
same plane. Note that the offset between the two image planes, 500
and 510, as shown in FIG. 5, is for purposes of illustration only,
and does not denote an actual offset.
[0073] Using the two image planes, 500 and 510, as an example, it
is clear that there are three distinct regions: the first region,
on the far right, is a portion of I.sub.1, Rect. which is not
overlapped by any portion of I.sub.2, Rect.; the second region, in
the middle, is an area of image overlap between I.sub.1, Rect. and
I.sub.2, Rect.; and finally, the third region is a portion of
I.sub.2, Rect. which is not overlapped by any portion of I.sub.1,
Rect. Given these three regions, the first region, i.e., the
non-overlapped portion of I.sub.1, Rect., is associated with the
original camera 530, at origin O.sub.1, used to capture or acquire
the associated image, and is not further modified during the
blending stage. Similarly, the third region, i.e., the
non-overlapped portion of I.sub.2, Rect., is associated with the
original camera 540, at origin O.sub.2, used to capture or acquire
the associated image, and is not further modified during the
blending stage. Finally, the second region is divided into columns
of at least one pixel in width, with each column
being associated with a separate virtual camera 550, at virtual
origin O.sub.v,i, where i represents the current column from 0 to n
which is being processed.
[0074] Given the previously computed depth map at each virtual
camera location, a vector is then projected from each actual camera
origin, 530 and 540, for each pixel in the column, through the
overlapping image region, i.e., the region of overlap between
I.sub.1, Rect. and I.sub.2, Rect., to the computed depth for each
pixel in the column. For example, as illustrated in FIG. 5, the
relative depth computed for one pixel for the virtual camera 550,
at the virtual origin O.sub.v,i is on relative depth plane 3, where
any number of depth planes 1 to n are available.
[0075] Given this relative depth, a vector is then extended from
the virtual origin O.sub.v,i of the virtual camera 550 to the depth
plane 3. Next, a vector is extended from both of the actual camera
origins, O.sub.1 and O.sub.2, to the point where the vector
extended from the virtual origin O.sub.v,i intersects depth plane
3. The points where the vectors extending from the actual camera
origins, O.sub.1 and O.sub.2, intersect with the overlapping images
correspond to pixels, A.sub.I1 and A.sub.I2, respectively, in the
area of image overlap between I.sub.1, Rect. and I.sub.2, Rect.,
that are blended together to create a composite pixel, PS.sub.i,
for the points on both I.sub.1, Rect. and I.sub.2, Rect. where the
vector extended from the virtual origin O.sub.v,i intersects both
I.sub.1, Rect. and I.sub.2, Rect., respectively.
[0076] This selection of pixels for blending is repeated for each
pixel in each column using the previously computed image depth
maps. Further, as suggested above, the same blended pixel value is
assigned to each of the overlapping images for any given virtual
origin O.sub.v,i such that a common composite area is created for
each of the overlapping images.
3.1.4 Pixel Weighting for Pixel Blending:
[0077] In one embodiment, blending weights are used to weight the
pixels being blended. In general, pixels are weighted based on the
proximity to the edge of the overlap region and to the camera or
origin used to acquire or render the image. In alternate
embodiments, these blending weights are any conventional linear or
non-linear weighting function.
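One such linear weighting, the distance-based blend given as Equation 1 in the following paragraph, can be sketched as follows (the function and variable names are illustrative):

```python
def blend_weighted(c1, c2, lam1, lam2):
    """Linear blend of Equation 1: a pixel lam1 pixels from the overlap
    boundary nearer camera C1 and lam2 pixels from the boundary nearer
    camera C2 weights c1 by lam2/(lam1 + lam2) and c2 by
    lam1/(lam1 + lam2), so each image dominates near its own side."""
    return (lam2 * c1 + lam1 * c2) / (lam1 + lam2)
```

At either boundary of the overlap region the nearer image's color is used unchanged, and the weights transition linearly in between.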
[0078] Specifically, in one embodiment, a simple linear function
for assigning blending weights to pixels is based on the proximity
of each pixel being blended to the edge of the image overlap region
and to the camera used to acquire the image containing that pixel.
For example, if a pixel is located .lamda..sub.1 pixels away from
the left boundary of the overlap region (i.e., the boundary closer
to a camera C.sub.1) and .lamda..sub.2 pixels away from the right
boundary (closer to a camera C.sub.2), and a mapped color from
image I.sub.1, rect is c.sub.1 and a mapped color from image
I.sub.2, rect is c.sub.2, then the blended color of the pixel is
given by Equation 1 as:
(.lamda..sub.2/(.lamda..sub.1+.lamda..sub.2)).times.c.sub.1+(.lamda..sub.1/(.lamda..sub.1+.lamda..sub.2)).times.c.sub.2 (Equation 1)
Note that while this equation represents a simple linear blending
with weighting based on the distance to the camera used to capture
the image, many other well known conventional linear or non-linear
weightings can be used in blending pixel colors or values for
determining the final pixel color or value once the pixels to be
blended have been selected as described in Section 3.1.3.
3.2 System Operation:
[0079] The program modules described in Section 2.2 with reference
to FIG. 2, and in view of the detailed description provided in
Section 3.1, are employed for automatically deghosting overlapping
portions of mosaiced images. This process is depicted in the flow
diagram of FIG. 6. It should be noted that the boxes and
interconnections between boxes that are represented by broken or
dashed lines in FIG. 6 represent alternate embodiments of the
present invention, and that any or all of these alternate
embodiments, as described below, may be used in combination.
[0080] Referring now to FIG. 6 in combination with FIG. 2, the
process can be generally described as a deghosting process for
overlapping portions of images of a scene. In particular, as
illustrated by FIG. 6, a system and method for automatically
deghosting images for use in image mosaics or panoramas uses images
200 from a database or collection of images stored in a computer
readable medium. These images 200 are either captured by one or
more cameras 210 or rendered using conventional computer image
generation techniques. Overlapping images captured using the
cameras 210 are either provided directly to the image processing
modules described above, or saved to image files or databases for
later processing.
[0081] In either case, in one embodiment, once the images are
available, either directly from the cameras 210, or from previously
stored image files 200, any of a number of conventional techniques
for equalizing or compensating for exposure differences between
each image is used to provide a consistent exposure between the
images 605. This exposure compensation 605 addresses exposure
compensation parameters such as brightness and contrast levels
using conventional exposure compensation techniques that are well
known to those skilled in the art, and will not be discussed in
further detail herein.
[0082] Next, two or more overlapping images are input 600 to an
algorithm for performing a perspective warp 610 on the images, so
that the images are warped into a common viewing plane. These
warped images are then further processed to locate and align 620
the overlapping portions of the images, and divide the overlapping
portions of the images into columns of at least one pixel in
width.
[0083] The overlapping portions of the images are then processed
using a multi-perspective plane sweep 630 to generate depth maps
for the overlapping regions. As discussed above, this
multi-perspective plane sweep uses virtual camera positions for
each column of the overlapping portions of each image to compute
relative depth maps for the overlapping portions. Note that this
process is discussed above in greater detail in Section 3.1.2. In
alternate embodiments, also as discussed above, the depth maps are
generated 635 using other techniques, such as, for example, using
laser or radar range finding equipment for determining the actual
depth of image pixels relative to a camera origin.
[0084] Next, using the depth maps generated either by the
multi-perspective plane sweep 630, or other methods 635, image
pixels are selected for color/gray level blending 650.
Specifically, with respect to pixel color or gray level blending
650, the depth maps are used in combination with pixel information
the overlapping portions of each image to generate a common
composite pixel for each column of each of the overlapping
images.
[0085] In particular, given the depth map for each column of the
overlap region, a vector is projected from each actual camera
location through the overlapping image region to the computed depth
for each pixel in each column. Next, for pixels at the same level
within each column, the pixel values at the points where each of
these vectors pass through the overlapping image region are then
blended to create composite image pixels to replace the pixels
comprising each column of the overlap region. The same color or
gray level assigned to the composite pixel is then assigned to each
of the overlapping images at a point corresponding to the current
virtual camera. Consequently, for each column a common composite
area is created for each of the overlapping images. Having a
composite area serves to reduce or eliminate distortion, blurring
and ghosting resulting from image parallax. This process is
discussed in greater detail in Section 3.1.3.
[0086] In a related embodiment, blending weights are used to weight
645 the pixels being blended. In particular, those pixels that are
closer to a camera or rendering point are weighted 645 more heavily
than those pixels that are further from a camera or rendering
point. In other words, the pixels are weighted 645 based on the
proximity to the edge of the overlap region and to the camera or
origin used to acquire or render the image. In further embodiments,
these blending weights are determined using any conventional linear
or non-linear weighting function.
[0087] Finally, a reverse or inverse perspective warp 660 is
applied to the images for exactly reversing or inverting the
original perspective warping 610 applied to the images.
Consequently, the images are put back into their original
perspectives. However, these output images 670 differ from the
original input images in that they now have a common overlap area
that differs only in the perspective from which the images were
captured or rendered, with any potential ghosting or blurring
effects minimized or eliminated by the aforementioned pixel
blending procedures. These output images are then stored to
computer files or databases 200 for later use in viewing or
creating image mosaics or image panoramas.
[0088] The processes described above are then repeated so long as
there are more overlapping images to process 680. Note that
particular images may overlap one or more different images on each
image border, and that particular images may therefore be
repeatedly processed as described above for each unique overlap
case. For example, in a 360-degree panorama, any given picture may
have four or more images at least partially overlapping either all
or part of that image's edges or borders. In such a case, all
overlap cases are processed individually so that in the end, each
of the overlapping images will have at least some common overlap
areas that differ only in the perspective from which the images
were captured or rendered.
[0089] The foregoing description of the invention has been
presented for the purposes of illustration and description. It is
not intended to be exhaustive or to limit the invention to the
precise form disclosed. Many modifications and variations are
possible in light of the above teaching. It is intended that the
scope of the invention be limited not by this detailed description,
but rather by the claims appended hereto.
* * * * *