U.S. patent number 8,953,905 [Application Number 13/490,454] was granted by the patent office on 2015-02-10 for rapid workflow system and method for image sequence depth enhancement.
This patent grant is currently assigned to Legend3D, Inc. The grantees listed for this patent are Tony Baldridge, Craig Cesareo, Anthony Lopez, Jared Sandrew, and Timothy Tranquill. Invention is credited to Tony Baldridge, Craig Cesareo, Anthony Lopez, Jared Sandrew, and Timothy Tranquill.
United States Patent 8,953,905
Sandrew, et al.
February 10, 2015

Rapid workflow system and method for image sequence depth enhancement
Abstract
Movies to be colorized/depth enhanced (2D→3D) are broken into
backgrounds/sets or motion/onscreen-action. Background and motion
elements are combined into a composite frame which becomes a visual
reference database that includes data for all frame offsets used later
for the computer controlled application of masks within a sequence of
frames. Masks are applied to subsequent frames of motion objects based
on various differentiating image processing methods, including
automated mask fitting/reshaping. Colors/depths are automatically
applied with masks throughout a scene from the composite background and
to motion objects. Areas never exposed by motion or foreground objects
may be partially or fully realistically drawn/rendered and applied to
the occluded areas throughout the images to generate artifact-free
secondary viewpoints during 2D→3D conversion. Iterative workflow is
eliminated for simple artifact correction through real-time
manipulation of images to avoid re-rendering of images and associated
delays of sending work product to other workgroups for correction.
Inventors: Sandrew; Jared (San Diego, CA), Baldridge; Tony (San Diego, CA), Lopez; Anthony (San Diego, CA), Tranquill; Timothy (San Diego, CA), Cesareo; Craig (San Diego, CA)

Applicant:

  Name                 City        State   Country
  Sandrew; Jared       San Diego   CA      US
  Baldridge; Tony      San Diego   CA      US
  Lopez; Anthony       San Diego   CA      US
  Tranquill; Timothy   San Diego   CA      US
  Cesareo; Craig       San Diego   CA      US
Assignee: Legend3D, Inc. (Carlsbad, CA)
Family ID: 44224497
Appl. No.: 13/490,454
Filed: June 7, 2012
Prior Publication Data

  Document Identifier   Publication Date
  US 20120242790 A1     Sep 27, 2012
Related U.S. Patent Documents

  Application Number   Filing Date    Patent Number   Issue Date
  13029862             Feb 17, 2011   8385684
  12976970             Dec 22, 2010   8401336
  12913614             Oct 27, 2010   8396328
  12542498             Aug 17, 2009   7907793
  12032969             Feb 18, 2008   7577312
  11324815             Jan 4, 2006    7333670
  10450970                            7181081
  PCT/US02/14192       May 6, 2002
  60288929             May 4, 2001
Current U.S. Class: 382/284; 358/537; 382/283; 358/538; 382/213; 382/274; 358/517
Current CPC Class: H04N 13/266 (20180501); H04N 13/257 (20180501)
Current International Class: G06K 9/36 (20060101)
Field of Search: 382/213,274,275,283,284; 358/517,537,538,452,453
References Cited
[Referenced By]
U.S. Patent Documents
Foreign Patent Documents
  003444353     Jun 1986   DE
  0302454       Feb 1989   EP
  60-52190      Mar 1985   JP
  2003046982    Feb 2003   JP
  2004-207985   Jul 2004   JP
  20120095059   Feb 2012   KR
  20130061289   Nov 2013   KR
  1192168       Nov 1982   SU
  2008/075276   Jun 2008   WO
  2011/029209   Mar 2011   WO
  2012016600    Sep 2012   WO
  2013084234    Jun 2013   WO
Other References
International Search Report Issued for PCT/US2013/072208, dated
Feb. 27, 2014, 6 pages. cited by applicant .
European Office Action dated Jun. 26, 2013, received for EP Appl.
No. 02734203.9 on Jul. 22, 2013, 5 pages. cited by applicant .
International Search Report received for PCT Application No.
PCT/US2011/067024, dated Aug. 22, 2012, 10 pages. cited by
applicant .
International Search Report and Written Opinion issued for
PCT/US2013/072447, dated Mar. 13, 2014, 6 pages. cited by applicant
.
Tam et al., "3D-TV Content Generation: 2D-To-3D Conversion", ICME
2006, p. 1868-1872. cited by applicant .
Harman et al. "Rapid 2D to 3D Conversion", The Reporter, vol. 17,
No. 1, Feb. 2002, 12 pages. cited by applicant .
Legend Films, "System and Method for Conversion of Sequences of
Two-Dimensional Medical Images to Three-Dimensional Images" Sep.
12, 2013, 7 pages. cited by applicant .
International Search Report dated May 10, 2012, 8 pages. cited by
applicant .
Machine translation of JP Patent No. 2004-207985, dated Jul. 22,
2008, 34 pages. cited by applicant .
Office Action for EPO Patent Application No. 02 734 203.9 dated
Sep. 12, 2006. (4 pages). cited by applicant .
Office Action for AUS Patent Application No. 2002305387 dated Mar.
9, 2007. (2 pages). cited by applicant .
Office Action for EPO Patent Application No. 02 734 203.9 dated
Oct. 7, 2010. (5 pages). cited by applicant .
First Examination Report for Indian Patent Application No.
01779/DELNP/2003 dated Mar. 2004. (4 pages). cited by applicant
.
International Search Report Dated Jun. 13, 2003. (3 pages). cited
by applicant .
Declaration of Barbara Frederiksen in Support of In-Three, Inc's
Opposition to Plaintiffs Motion for Preliminary Injunction, Aug. 1,
2005, IMAX Corporation et al v. In-Three, Inc., Case No. CV05 1795
Fmc (Mcx). (25 pages). cited by applicant .
USPTO, Board of Patent Appeals and Interferences, Decision on
Appeal dated Jul. 30, 2010, Ex parte Three-Dimensional Media Group,
Ltd., Appeal 2009-004087, Reexamination Control No. 90/007,578, US
Patent 4,925,294. (88 pages). cited by applicant .
Office Action for Canadian Patent Application No. 2,446,150 dated
Oct. 8, 2010. (6 pages). cited by applicant .
Office Action for Canadian Patent Application No. 2,446,150 dated
Jun. 13, 2011. (4 pages). cited by applicant .
"Nintendo DSi Uses Camera Face Tracking to Create 3D Mirages",
retrieved from www.Gizmodo.com on Mar. 18, 2013, 3 pages. cited by
applicant .
IPER, Mar. 29, 2007, PCT/US2005/014348, 5 pages. cited by applicant
.
IPER, Oct. 5, 2013, PCT/US2011/058182, 6 pages. cited by applicant
.
International Search Report, Jun. 13, 2003, PCT/US02/14192, 4
pages. cited by applicant .
Partial Testimony, Expert: Samuel Zhou, Ph.D., 2005 WL 3940225
(C.D.Cal.), Jul. 21, 2005, 21 pages. cited by applicant .
PCT ISR, Feb. 27, 2007, PCT/US2005/014348, 8 pages. cited by
applicant .
PCT ISR, Sep. 11, 2007, PCT/US07/62515, 9 pages. cited by applicant
.
CA Office Action, Dec. 28, 2011, Appl No. 2,446,150, 4 pages. cited
by applicant .
PCT ISR, Nov. 14, 2007, PCT/US07/62515, 24 pages. cited by
applicant .
PCT IPRP, Jul. 4, 2013, PCT/US2011/067024, 5 pages. cited by
applicant .
Lenny Lipton, "Foundations of the Stereo-Scopic Cinema, a Study in
Depth" With and Appendix on 3D Television, 325 ppages, May 1978.
cited by applicant .
Interpolation (from Wikipedia encyclopedia, article pp. 1-6),
retrieved from Internet
URL:http://en.wikipedia.org/wiki/Interpolation on Jun. 5, 2008.
cited by applicant .
Optical Reader (from Wikipedia encyclopedia, article p. 1),
retrieved from Internet
URL:http://en.wikipedia.org/wiki/Optical_reader on Jun. 5,
2008. cited by applicant .
Declaration of Steven K. Feiner, Exhibit A, 10 pages, Nov. 2, 2007.
cited by applicant .
Declaration of Michael F. Chou, Exhibit B, 12 pages, Nov. 2, 2007.
cited by applicant .
Declaration of John Marchioro, Exhibit C, 3 pages, Nov. 2, 2007.
cited by applicant .
Exhibit 1 to Declaration of John Marchioro, Revised translation of
portions of Japanese Patent Document No. 60-52190 to Hiromae, 3
pages, Nov. 2, 2007. cited by applicant .
U.S. Patent and Trademark Office, Before the Board of Patent
Appeals and Interferences, Ex Parte Three-Dimensional Media Group,
Ltd., Appeal 2009-004087, Reexamination Control No. 90/007,578, US
Patent No. 4,925,294, Decision on Appeal, 88 pages, Jul. 30, 2010.
cited by applicant .
Daniel L. Symmes, Three-Dimensional Image, Microsoft Encarta Online
Encyclopedia (hard copy printed May 28, 2008 and of record, now
indicated by the website indicated on the document to be
discontinued:
http://encarta.msn.com/text_761584746_0/Three-Dimensional_Image.htm).
cited by applicant .
Lenny Lipton, Foundations of the Stereo-Scopic Cinema A Study in
Depth, 1982, Van Nostrand Reinhold Company. cited by applicant
.
U.S. District Court, C.D. California, IMAX v. In-Three, No. 05 CV
1795, 2005, Partial Testimony, Expert: David Geshwind, WestLaw
2005, WL 3940224 (C.D.Cal.), 8 pages. cited by applicant .
U.S. District Court, C.D. California, IMAX Corporation and
Three-Dimensional Media Group, Ltd., v. In-Three, Inc., Partial
Testimony, Expert: Samuel Zhou, Ph.D., No. CV 05-1795 FMC(Mcx),
Jul. 19, 2005, 2005 WL 3940223 (C.D.Cal.), 6 pages. cited by
applicant .
U.S. District Court, C.D. California, IMAX v. In-Three. No. 06 CV
1795. Jul. 21, 2005, Partial Testimony, Expert: Samuel Zhou, Ph.D.,
2005 WL 3940225 (C.D.Cal.), 21 pages. cited by applicant .
U.S. District Court, C.D. California, Western Division, Imax
Corporation, and Three-Dimensional Media Group, Ltd. v. In-Three,
Inc., No. CV05 1795 FMC (Mcx). Jul. 18, 2005. Declaration of
Barbara Frederiksen in Support of In-Three, Inc.'s Opposition to
Plaintiffs' Motion for Preliminary Injunction, 2005 WL 5434580
(C.D.Cal.), 13 pages. cited by applicant .
Lenny Lipton, "Foundations of the Stereo-Scopic Cinema, a Study in
Depth" With and Appendix on 3D Television, 325 pages. cited by
applicant .
Interpolation (from Wikipedia encyclopedia, article pp. 1-6). cited
by applicant .
Optical Reader (from Wikipedia encyclopedia, article p. 1). cited
by applicant .
Declaration of Steven K. Feiner, Exhibit A, 10 pages. cited by
applicant .
Declaration of Michael F. Chou, Exhibit B, 12 pages. cited by
applicant .
Declaration of John Marchioro, Exhibit C, 3 pages. cited by
applicant .
Exhibit 1 to Declaration of John Marchioro, Revised translation of
portions of Japanese Patent Document No. 60-52190 to Hiromae, 3
pages. cited by applicant .
Noll et al., "Stereographic Projections by Digital Computer",
Computers and Automation for May 1965, pp. 32-34. cited by
applicant .
Nell, "Computer-Generated Three-Dimensional Movies" Computers and
Automation for Nov. 1965, pp. 20-23. cited by applicant .
U.S. Patent and Trademark Office, Before the Board of Patent
Appeals and Interferences, Ex Parte Three-Dimensional Media Group,
Ltd., Appeal 2009-004087, Reexamination Control No. 90/007,578, US
Patent No. 4,925,294, Decision on Appeal, 88 pages. cited by
applicant .
Murray et al., Active Tracking, IEEE International Conference on
Intelligent Robots and Systems, Sep. 1993, pp. 1021-1028. cited by
applicant .
Gao et al., Perceptual Motion Tracking from Image Sequences, IEEE,
Jan. 2001, pp. 389-392. cited by applicant .
Yasushi Mae, et al., "Object Tracking in Cluttered Background Based
on Optical Flow and Edges," Proc. 13th Int. Conf. on Pattern
Recognition, vol. 1, pp. 196-200, Apr. 1996. cited by applicant
.
Di Zhong, Shih-Fu Chang, "AMOS: An Active System for MPEG-4 Video
Object Segmentation," ICIP (2) 8: 647-651, Apr. 1998. cited by
applicant .
Hua Zhong, et al., "Interactive Tracker--A Semi-automatic Video
Object Tracking and Segmentation System," Microsoft Research China,
http://research.microsoft.com (Aug. 26, 2003). cited by applicant
.
Eric N. Mortensen, William A. Barrett, "Interactive segmentation
with Intelligent Scissors," Graphical Models and Image Processing,
v.60 n. 5, p. 349-384, Sep. 2002. cited by applicant .
Michael Gleicher, "Image Snapping," SIGGRAPH: 183-190, Jun. 1995.
cited by applicant .
Joseph Weber, et al., "Rigid Body Segmentation and Shape
Description . . . ," IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 19, No. 2, Feb. 1997, pp. 139-143. cited
by applicant .
E. N. Mortensen and W. A. Barrett, "Intelligent Scissors for Image
Composition," Computer Graphics (SIGGRAPH '95), pp. 191-198, Los
Angeles, CA, Aug. 1995. cited by applicant.
Primary Examiner: Kassa; Yosef
Attorney, Agent or Firm: ARC IP Law, PC Mayo; Joseph J.
Parent Case Text
This application is a continuation in part of U.S. Utility patent
application Ser. No. 13/029,862, filed 17 Feb. 2011 now U.S. Pat.
No. 8,385,684, which is a continuation in part of U.S. Utility
patent application Ser. No. 12/976,970, filed 22 Dec. 2010 now U.S.
Pat. No. 8,401,336, which is a continuation in part of U.S. Utility
patent application Ser. No. 12/913,614, filed 27 Oct. 2010 now U.S.
Pat. No. 8,396,328, which is a continuation in part of U.S. Utility
patent application Ser. No. 12/542,498, filed 17 Aug. 2009, now
U.S. Pat. No. 7,907,793, which is a continuation in part of U.S.
Utility patent application Ser. No. 12/032,969, filed 18 Feb. 2008
now U.S. Pat. No. 7,577,312, which is a continuation of U.S.
Utility patent application Ser. No. 11/324,815, issued as U.S. Pat.
No. 7,333,670, filed 4 Jan. 2006, which is a divisional of U.S.
Utility patent application Ser. No. 10/450,970, issued as U.S. Pat.
No. 7,181,081, filed Jun. 18, 2003, which is a national stage
entry of Patent Cooperation Treaty Application Ser. No.
PCT/US02/14192, filed May 6, 2002, which claims the benefit
of U.S. Provisional Patent Application 60/288,929, filed May
4, 2001, the specifications of which are all hereby
incorporated herein by reference.
Claims
What is claimed is:
1. A system configured to modify a set of time ordered digital
images comprising at least one computer configured to: obtain a
source image and at least one mask associated with a region in said
source image; obtain a depth map or translation values associated
with said at least one mask of said source image; obtain a render
of a left viewpoint image of said source image created with ray
tracing to a left virtual camera; obtain a render of a right
viewpoint image of said source image created with ray tracing to a
right virtual camera; modify said at least one mask or an area next
to said at least one mask where missing background information is
located or said depth map or said translation values or any
combination thereof to create at least one modified mask or
modified missing background information or modified depth map or
modified translation values or any combination thereof; and, update
said left viewpoint image and said right viewpoint image based on
said at least one modified mask or said modified depth map or said
modified translation values or said any combination thereof without
another said ray tracing to said left virtual camera and without
said ray tracing to said right virtual camera.
2. The system of claim 1 wherein said depth map comprises depth
information related to said at least one mask and wherein said
depth map is sliced to exclude areas further than a threshold and
wherein said at least one mask is reshaped to create said at least
one modified depth mask.
3. The system of claim 1 wherein said depth map comprises a
distance away from a pair of virtual cameras.
4. The system of claim 1 wherein said translation values comprise a
left translation map and a right translation map that each contain
pixel-by-pixel horizontal offsets of each pixel in said source
image to said respective pixel in said left viewpoint image and
said right viewpoint image respectively.
5. The system of claim 1 wherein said left translation map and said
right translation map are UV maps or U maps.
6. The system of claim 1 wherein said at least one mask is created
on a computer that is located distally to said at least one
computer configured to update said left and right viewpoint
images.
7. The system of claim 1 wherein said depth map or said translation
values are created on a computer that is located distally to said
at least one computer configured to update said left and right
viewpoint images.
8. The system of claim 7 wherein said at least one computer
configured to render said left viewpoint image and said right
viewpoint image is configured to be controlled by a second
laborer.
9. The system of claim 1 wherein said update or said render or said
update and said render of said left and right viewpoint images
utilizes computer generated elements.
10. The system of claim 1 wherein said update of said left and
right viewpoint images utilizes generated missing background
information for at least one portion of said source image that does
not expose at least one area of another layer.
11. The system of claim 1 wherein said update of said left and
right viewpoint images utilizes generated missing background
information or gap fill data for at least one portion of said
source image that does not expose at least one area of another
layer wherein said generated missing background information or said
gap fill data is applied to areas detected in said translation
values that indicate an over threshold left or right shift of image
data and wherein said areas are optionally blurred, optionally
clamped and optionally dilated.
12. The system of claim 1 wherein said update of said left and
right viewpoint images utilizes gap fill data for at least one
portion of said source image that does not expose at least one area
of another layer wherein said gap fill data comprises data that
matches horizontal or vertical data near said missing background
information in said source image.
13. The system of claim 1 wherein said update of said left and
right viewpoint images utilizes gap fill data for at least one
portion of said source image that does not expose at least one area
of another layer wherein said gap fill data comprises blurred data
from said source image.
14. The system of claim 1 wherein update of said left and right
viewpoint images utilizes gap fill data for at least one portion of
said source image that does not expose at least one area of another
layer wherein said gap fill data comprises film grain from a region
of said source image.
15. The system of claim 1 wherein said update of said left and
right viewpoint images utilizes gap fill data for at least one
portion of said source image that does not expose at least one area
of another layer wherein said gap fill data comprises blurred data
from said source image and film grain from a region of said source
image.
16. The system of claim 1 wherein said update of said left and
right viewpoint images utilizes gap fill data for at least one
portion of said source image that does not expose at least one area
of another layer wherein said gap fill data comprises film grain
from a region of said source image that is applied as a function of
underlying luminance.
17. The system of claim 1 wherein said update of said left and
right viewpoint images utilizes modified translation values that
are created by alteration of at least one translation value in a UV
map or U map.
18. The system of claim 1 wherein said update of said left and
right viewpoint images utilizes a modified depth map that is
created by alteration of at least one depth value in said depth
map.
19. The system of claim 1 wherein said update of said left and
right viewpoint images utilizes a modified depth map that is
created through displacement of pixels associated with said at
least one mask left and right respectively by a numerical value
shift in said depth map.
20. The system of claim 1 wherein said computer is further
configured to display a gap with a color based on a function of the
size of said gap.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
One or more embodiments of the invention are related to the field
of image analysis and image enhancement and computer graphics
processing of two-dimensional images into three-dimensional images.
More particularly, but not by way of limitation, one or more
embodiments of the invention enable a rapid workflow system and
method for image sequence depth enhancement that enables high
quality conversion of a large number of two-dimensional images into
corresponding stereoscopic image pairs, or other three-dimensional
viewing enabled images such as an anaglyph, through local
modification of images that eliminates computationally expensive
iterative ray tracing of light paths through each pixel in large
format left and right eye images each time a minor change is made
to depth or to fix artifacts.
2. Description of the Related Art
Known methods for colorizing black and white feature films
involve the identification of gray scale regions within a picture
followed by the application of a pre-selected color transform or
lookup tables for the gray scale within each region defined by a
masking operation covering the extent of each selected region and
the subsequent application of said masked regions from one frame to
many subsequent frames. The primary difference between U.S. Pat.
No. 4,984,072, System And Method For Color Image Enhancement, and
U.S. Pat. No. 3,705,762, Method For Converting Black-And-White
Films To Color Films, is the manner by which the regions of
interest (ROIs) are isolated and masked, how that information is
transferred to subsequent frames and how that mask information is
modified to conform with changes in the underlying image data. In
the U.S. Pat. No. 4,984,072 system, the region is masked by an
operator via a one-bit painted overlay and operator manipulated
using a digital paintbrush method frame by frame to match the
movement. In the U.S. Pat. No. 3,705,762 process, each region is
outlined or rotoscoped by an operator using vector polygons, which
are then adjusted frame by frame by the operator, to create
animated masked ROIs. Various masking technologies are generally
also utilized in the conversion of 2D movies to 3D movies.
In both systems described above, the color transform lookup tables
and regions selected are applied and modified manually to each
frame in succession to compensate for changes in the image data
that the operator detects visually. All changes and movement of the
underlying luminance gray scale are subjectively detected by the
operator and the masks are sequentially corrected manually by the
use of an interface device such as a mouse for moving or adjusting
mask shapes to compensate for the detected movement. In all cases
the underlying gray scale is a passive recipient of the mask
containing pre-selected color transforms with all modifications of
the mask under operator detection and modification. In these prior
inventions the mask information does not contain any information
specific to the underlying luminance gray scale and therefore no
automatic position and shape correction of the mask to correspond
with image feature displacement and distortion from one frame to
another is possible.
Existing systems that are utilized to convert two-dimensional
images to three-dimensional images may also require the creation of
wire frame models for objects in images that define the 3D shape of
the masked objects. The creation of wire frame models is a large
undertaking in terms of labor. These systems also do not utilize
the underlying luminance gray scale of objects in the images to
automatically position and correct the shape of the masks of the
objects to correspond with image feature displacement and
distortion from one frame to another. Hence, great amounts of labor
are required to manually shape and reshape masks for applying depth
or Z-dimension data to the objects. Motion objects that move from
frame to frame thus require a great deal of human intervention. In
addition, there are no known solutions for enhancing
two-dimensional images into three-dimensional images that utilize
composite backgrounds of multiple images in a frame for spreading
depth information to background and masked objects. This includes
data from background objects, whether pre-existing or generated,
for an occluded area where missing data exists, i.e.,
where motion objects never uncover the background. In other words,
known systems gap fill using algorithms for inserting image data
where none exists, which causes artifacts.
Current methods for converting movies from 2D to 3D that include
computer-generated elements or effects generally utilize only the
final sequence of 2D images that make up the movie. This is the
current method used for conversion of all movies from
two-dimensional data to left and right image pairs for
three-dimensional viewing. There are no known current methods that
obtain and make use of metadata associated with the
computer-generated elements for a movie to be converted. This is
the case since studios that own the older 2D movies may not have
retained intermediate data for a movie, i.e., the metadata
associated with computer generated elements, since the amount of
data in the past was so large that the studios would only retain
the final movie data with rendered computer graphics elements and
discard the metadata. For movies having associated metadata that
has been retained, (i.e., intermediate data associated with the
computer-generated elements such as mask, or alpha and/or depth
information), use of this metadata would greatly speed the depth
conversion process.
In addition, typical methods for converting movies from 2D to 3D in
an industrial setting capable of handling the conversion of
hundreds of thousands of frames of a movie with large amounts of
labor or computing power, make use of an iterative workflow. The
iterative workflow includes masking objects in each frame, adding
depth and then rendering the frame into left and right viewpoints
forming an anaglyph image or a left and right image pair. If there
are errors in the edges of the masked objects for example, then the
typical workflow involves an "iteration", i.e., sending the frames
back to the workgroup responsible for masking the objects, (which
can be in a country with cheap unskilled labor half way around the
world), after which the masks are sent to the workgroup responsible
for rendering the images, (again potentially in another country),
wherein rendering is accomplished by ray tracing the path of light
through each pixel in left and right images to simulate the lighting
effects of the objects that the light interacts with, for example
bounces off of or passes through, which is computationally extremely expensive.
After ray tracing, the rendered image pair is sent back to the
quality assurance group. It is not uncommon in this workflow
environment for many iterations of a complicated frame to take
place. This is known as "throw it over the fence" workflow since
different workgroups work independently to minimize their current
work load and not as a team with overall efficiency in mind. With
hundreds of thousands of frames in a movie, the amount of time that
it takes to iterate back through frames containing artifacts can
become high, causing delays in the overall project. Even if the
re-rendering process takes place locally, the amount of time to
re-render or ray-trace all of the images of a scene can cause
significant processing and hence delays on the order of at least
hours. Elimination of iterations such as this would provide a huge
savings in wall-time, or end-to-end time that a conversion project
takes, thereby increasing profits and minimizing the workforce
needed to implement the workflow.
Hence there is a need for a rapid workflow system and method for
image sequence depth enhancement.
BRIEF SUMMARY OF THE INVENTION
Embodiments of the invention generally classify scenes in movies to be
colorized and/or converted from two-dimensional to three-dimensional
into two separate categories. Scenes generally include two or more
images in time sequence, for example. The two categories are
background elements (i.e., sets and foreground elements that are
stationary) and motion elements (e.g., actors, automobiles, etc.) that
move throughout the scene. These
background elements and motion elements are treated separately in
embodiments of the invention similar to the manner in which
traditional animation is produced. In addition, many movies now
include computer-generated elements (also known as computer
graphics or CG, or also as computer-generated imagery or CGI) that
include objects that do not exist in reality, such as robots or
spaceships for example, or which are added as effects to movies,
for example dust, fog, clouds, etc. Computer-generated elements may
include background elements, or motion elements.
Motion Elements: The motion elements are displayed as a series of
sequential tiled frame sets or thumbnail images complete with
background elements. The motion elements are masked in a key frame
using a multitude of operator interface tools common to paint
systems as well as unique tools such as relative bimodal
thresholding in which masks are applied selectively to contiguous
light or dark areas bifurcated by a cursor brush. After the key
frame is fully designed and masked, the mask information from the
key frame is then applied to all frames in the display using mask
fitting techniques that include:
1. Automatic mask fitting using Fast Fourier Transform and Gradient
Descent Calculations based on luminance and pattern matching which
references the same masked area of the key frame followed by all
prior subsequent frames in succession (a minimal illustrative sketch
of such a fit appears after this list). Since the computer system
implementing embodiments of the invention can reshape at least the
outlines of masks from frame to frame, large amounts of labor can
be saved from this process that traditionally has been done by
hand. In 2D to 3D conversion projects, sub-masks can be adjusted
manually within a region of interest when a human recognizable
object rotates for example, and this process can be "tweened" such
that the computer system automatically adjusts sub-masks from frame
to frame between key frames to save additional labor.
2. Bezier curve animation with edge detection as an automatic
animation guide
3. Polygon animation with edge detection as an automatic animation
guide
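The automatic mask fitting referenced in item 1 above can be illustrated with a minimal sketch. The names below are hypothetical and a brute-force patch search stands in for the patent's FFT/gradient-descent calculation: it simply finds the luminance-patch offset with the lowest mean squared error between the key (reference) frame and the next frame, which is then applied to the mask point.

```python
import numpy as np

def fit_mask_offset(ref_luma, search_luma, x0, y0, box=8, search=6):
    """Brute-force stand-in for the FFT/gradient-descent mask fit: find the
    offset (dx, dy) that minimizes the mean squared luminance difference
    between the fit box centered on (x0, y0) in the reference (key) frame
    and the same-sized box in the search frame.  The returned offset is the
    shift that would be applied to the mask point in the new frame."""
    ref_patch = ref_luma[y0 - box:y0 + box, x0 - box:x0 + box].astype(float)
    best_err, best_offset = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            patch = search_luma[y0 + dy - box:y0 + dy + box,
                                x0 + dx - box:x0 + dx + box].astype(float)
            err = np.mean((ref_patch - patch) ** 2)
            if err < best_err:
                best_err, best_offset = err, (dx, dy)
    return best_offset
```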
In one or more embodiments of the invention, computer-generated
elements are imported using RGBAZ files that include an optional
alpha mask and/or depths on a pixel-by-pixel, or
sub-pixel-by-sub-pixel basis for a computer-generated element.
Examples of this type of file include the EXR file format. Any
other file format capable of importing depth and/or alpha
information is in keeping with the spirit of the invention.
Embodiments of the invention import any type of file associated
with a computer-generated element to provide instant depth values
for a portion of an image associated with a computer-generated
element. In this manner, no mask fitting or reshaping is required
for any of the computer-generated elements from frame to frame
since the alpha and depth on a pixel-by-pixel or
sub-pixel-by-sub-pixel basis already exists, or is otherwise
imported or obtained for the computer-generated element. For
complicated movies with large amounts of computer-generated
elements, the import and use of alpha and depth for
computer-generated elements makes the conversion of a
two-dimensional image to a pair of images for right and left eye
viewing economically viable. One or more embodiments of the
invention allow for the background elements and motion elements to
have depths associated with them or otherwise set or adjusted, so
that all objects other than computer-generated objects are
artistically depth adjusted. In addition, embodiments of the
invention allow for the translation, scaling or normalization of
the depths for example imported from an RGBAZ file that are
associated with computer-generated objects so as to maintain the
relative integrity of depth for all of the elements in a frame or
sequence of frames. In addition, any other metadata such as
character mattes or alphas or other masks that exist for elements
of the images that make up a movie can also be imported and
utilized to improve the operator-defined masks used for conversion.
One format of a file that may be imported to obtain metadata for
photographic elements in a scene is the RGBA file format.
layering different objects from deepest to closest, i.e.,
"stacking" and applying any alpha or mask of each element, and
translating the closest objects the most horizontally for left and
right images, a final pair of depth enhanced images is thus created
based on the input image and any computer-generated element
metadata.
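A minimal sketch of the layering described above follows. The function and field names and the simple linear depth-to-shift rule are illustrative assumptions, not the patent's implementation: layers are composited deepest to closest using their alpha masks, and each layer is translated horizontally in proportion to how close it is, so the closest objects shift the most.

```python
import numpy as np

def compose_eye_view(layers, eye=+1, max_shift_px=20):
    """Composite layers deepest-to-closest and shift each horizontally to
    form one eye's image of a stereo pair.  Each layer is a dict holding
    'rgb' (H,W,3 float), 'alpha' (H,W float in 0..1) and 'depth' (scalar in
    0..1, where 1 = farthest from the camera).  eye=+1/-1 selects the
    viewpoint; the linear depth-to-shift rule is an assumption."""
    h, w, _ = layers[0]["rgb"].shape
    out = np.zeros((h, w, 3))
    for layer in sorted(layers, key=lambda l: -l["depth"]):  # farthest first
        shift = int(round(eye * max_shift_px * (1.0 - layer["depth"])))
        rgb = np.roll(layer["rgb"], shift, axis=1)
        alpha = np.roll(layer["alpha"], shift, axis=1)[..., None]
        out = alpha * rgb + (1.0 - alpha) * out  # closer layers overwrite
    return out
```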
In another embodiment of this invention, these background elements
and motion elements are combined separately into single frame
representations of multiple frames, as tiled frame sets or as a
single frame composite of all elements (i.e., including both motion
and backgrounds/foregrounds) that then becomes a visual reference
database for the computer controlled application of masks within a
sequence composed of a multiplicity of frames. Each pixel address
within the reference visual database corresponds to a mask/lookup
table address within the digital frame and an X, Y, Z location of
subsequent "raw" frames that were used to create the reference
visual database. Masks are applied to subsequent frames based on
various differentiating image processing methods such as edge
detection combined with pattern recognition and other sub-mask
analysis, aided by operator segmented regions of interest from
reference objects or frames, and operator directed detection of
subsequent regions corresponding to the original region of
interest. In this manner, the gray scale actively determines the
location and shape of each mask (and corresponding color lookup
from frame to frame for colorization projects or depth information
for two-dimensional to three-dimensional conversion projects) that
is applied in a keying fashion within predetermined and
operator-controlled regions of interest.
Camera Pan Background and Static Foreground Elements: Stationary
foreground and background elements in a plurality of sequential
images comprising a camera pan are combined and fitted together
using a series of phase correlation, image fitting and focal length
estimation techniques to create a composite single frame that
represents the series of images used in its construction. During
the process of this construction the motion elements are removed
through operator adjusted global placement of overlapping
sequential frames.
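The frame-to-frame registration used to build the pan composite can be sketched with a textbook phase correlation (numpy only; the patent's exact implementation and its focal-length estimation are not reproduced here):

```python
import numpy as np

def phase_correlate(frame_a, frame_b):
    """Estimate the (dy, dx) translation between two grayscale frames by
    locating the peak of the normalized cross-power spectrum; textbook
    phase correlation, offered only to illustrate the registration step."""
    fa = np.fft.fft2(frame_a)
    fb = np.fft.fft2(frame_b)
    cross = fa * np.conj(fb)
    cross /= np.abs(cross) + 1e-12          # keep phase information only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap offsets larger than half the frame into negative shifts
    if dy > frame_a.shape[0] // 2:
        dy -= frame_a.shape[0]
    if dx > frame_a.shape[1] // 2:
        dx -= frame_a.shape[1]
    return dy, dx
```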
For colorization projects, the single background image representing
the series of camera pan images is color designed using multiple
color transform look up tables limited only by the number of pixels
in the display. This allows the designer to include as much detail
as desired including air brushing of mask information and other
mask application techniques that provide maximum creative
expression. For depth conversion projects, (i.e., two-dimensional
to three-dimensional movie conversion for example), the single
background image representing the series of camera pan images may
be utilized to set depths of the various items in the background.
Once the background color/depth design is completed the mask
information is transferred automatically to all the frames that
were used to create the single composited image. In this manner,
color or depth is performed once per multiple images and/or scene
instead of once per frame, with color/depth information
automatically spread to individual frames via embodiments of the
invention. Masks from colorization projects may be combined or
grouped for depth conversion projects since the colorization masks
may contain more sub-areas than a depth conversion mask. For
example, for a colorization project, a person's face may have several
masks applied to areas such as lips, eyes, hair, while a depth
conversion project may only require an outline of the person's head
or an outline of a person's nose, or a few geometric shape
sub-masks to which to apply depth. Masks from a colorization
project can be utilized as a starting point for a depth conversion
project since defining the outlines of human recognizable objects
by itself is time consuming and can be utilized to start the depth
conversion masking process to save time. Any computer-generated
elements at the background level may be applied to the single
background image.
In one or more embodiments of the invention, image offset
information relative to each frame is registered in a text file
during the creation of the single composite image representing the
pan and used to apply the single composite mask to all the frames
used to create the composite image.
Since the foreground moving elements have been masked separately
prior to the application of the background mask, the background
mask information is applied wherever there is no pre-existing mask
information.
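A minimal sketch (hypothetical names) of spreading the composite background mask back to individual frames: the per-frame offset recorded during composite construction is applied to the composite mask, and its values are copied only where a frame has no pre-existing motion-object mask.

```python
import numpy as np

def apply_background_mask(frame_mask, composite_mask, offset):
    """Copy the composite-background mask into one frame's mask, shifted by
    that frame's registration offset, but only into pixels that carry no
    pre-existing (motion-object) mask value.  Here 0 means 'unmasked'."""
    dy, dx = offset
    shifted = np.roll(np.roll(composite_mask, dy, axis=0), dx, axis=1)
    out = frame_mask.copy()
    out[frame_mask == 0] = shifted[frame_mask == 0]
    return out
```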
Static Camera Scenes With and Without Film Weave, Minor Camera
Following and Camera Drift: In scenes where there is minor camera
movement or film weave resulting from the sprocket transfer from 35
mm or 16 mm film to digital format, the motion objects are first
fully masked using the techniques listed above. All frames in the
scene are then processed automatically to create a single image
that represents both the static foreground elements and background
elements, eliminating all masked moving objects where they both
occlude and expose the background.
Wherever the masked moving object exposes the background or
foreground, the instance of background and foreground previously
occluded is copied into the single image with priority and proper
offsets to compensate for camera movement. The offset information
is included in a text file associated with each single
representation of the background so that the resulting mask
information can be applied to each frame in the scene with proper
mask offsets.
The single background image representing the series of static
camera frames is color designed using multiple color transform look
up tables limited only by the number of pixels in the display.
Where the motion elements occlude the background elements
continuously within the series of sequential frames, they are seen
as black figures that are ignored and masked over. The black objects
are ignored in colorization-only projects during the masking
operation because the resulting background mask is later applied to
all frames used to create the single representation of the
background only where there is no pre-existing mask. If background
information is created for areas that are never exposed, then this
data is treated as any other background data that is spread through
a series of images based on the composite background. This allows
for minimization of artifacts or artifact-free two-dimensional to
three-dimensional conversion since there is never any need to
stretch objects or extend pixels as for missing data, since image
data that has been generated to be believable to the human observer
is generated for and then taken from the occluded areas when needed
during the depth conversion process. Hence for motion elements and
computer-generated elements, realistic looking data can be utilized
for areas behind these elements when none exists. This allows the
designer to include as much detail as desired including air
brushing of mask information and other mask application techniques
that provide maximum creative expression. Once the background color
design is completed the mask information is transferred
automatically to all the frames that were used to create the single
composited image. For depth projects, the distance from the camera
to each item in the composite frame is automatically transferred to
all the frames that were used to create the single composited
image. By shifting masked background objects horizontally more or
less, their perceived depth is thus set in a secondary viewpoint
frame that corresponds to each frame in the scene. This horizontal
shifting may utilize data generated by an artist for the occluded
or alternatively, areas where no image data exists yet for a second
viewpoint may be marked in one or more embodiments of the invention
using a user defined color that allows for the creation missing
data to ensure that no artifacts occur during the two-dimension to
three-dimension conversion process. Any technique known may be
utilized in embodiments of the invention to cover areas in the
background where unknown data exists, i.e., (as displayed in some
color that shows where the missing data exists) that may not be
borrowed from another scene/frame for example by having artists
create complete backgrounds or smaller occluded areas with artist
drawn objects. After assigning depths to objects in the composite
background, or by importing depths associated with
computer-generated elements at the background depth, a second
viewpoint image may be created for each image in a scene in order
to produce a stereoscopic view of the movie, for example a left eye
view where the original frames in the scene are assigned to the
right eye viewpoint, for example by translating foreground objects
horizontally for the second viewpoint, or alternatively by
translating foreground objects horizontally left and right to
create two viewpoints offset from the original viewpoint.
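The horizontal shifting described above can be sketched as a per-pixel displacement, with any destination pixel that no source pixel lands on flagged in a marker color so that missing background data can be authored for it. The per-row loop, integer disparities, marker color, and lack of depth-ordered overwrite are simplifying assumptions:

```python
import numpy as np

def shift_viewpoint(rgb, disparity, gap_color=(255, 0, 255)):
    """Build a second viewpoint by moving each source pixel horizontally by
    its signed integer disparity.  Pixels that no source pixel lands on are
    filled with a marker color so missing-background data can be generated
    for them later; proper depth-ordered overwrite is omitted for brevity."""
    h, w, _ = rgb.shape
    out = np.zeros_like(rgb)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            nx = x + int(disparity[y, x])
            if 0 <= nx < w:
                out[y, nx] = rgb[y, x]
                filled[y, nx] = True
    out[~filled] = gap_color
    return out, ~filled   # second-eye image plus a mask of the gaps
```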
Embodiments of the invention enable local processing and eliminate
iterative ray tracing. For example, embodiments of the invention
enable local masking of desired regions to be depth enhanced,
realistic gap fill to fill in areas that are not visible in any
other frames in a scene once an object is translated left and right
to add depth, including gap fill with grain and blur to emulate
existing portions of an image that are not occluded. Embodiments of
the invention also enable sophisticated depth dilation and
distortion, and in addition enable the detection of gaps in
translation maps. The system also enables real-time editing of 3D
images without re-rendering for example to alter
layers/colors/masks and/or remove artifacts and to minimize or
eliminate iterative workflow paths back through different
workgroups by generating translation files that can be utilized as
portable pixel-wise editing files. Great amounts of time are saved
by eliminating ray tracing, and enabling adjustments to be made
local to a work group. Embodiments of the system thus greatly aid
the artist in the enhancement of images to include depth by
providing realistic depth modifications to minimize manual
manipulation of images. For example, a mask group takes source
images and creates masks for items, areas or human recognizable
objects in each frame of a sequence of images that make up a movie.
The depth augmentation group applies depths, and for example
shapes, to the masks created by the mask group. When rendering an
image pair, left and right viewpoint images and left and right
translation files may be generated by one or more embodiments of
the invention. The left and right viewpoint images allow 3D viewing
of the original 2D image. The translation files specify the pixel
offsets for each source pixel in the original 2D image, for example
in the form of UV or U maps. These files are generally related to
an alpha mask for each layer, for example a layer for an actress, a
layer for a door, a layer for a background, etc. These translation
files, or maps are passed from the depth augmentation group that
renders 3D images, to the quality assurance workgroup. This allows
the quality assurance workgroup (or other workgroup such as the
depth augmentation group) to perform real-time editing of 3D images
without re-rendering for example to alter layers/colors/masks
and/or remove artifacts such as masking errors without delays
associated with processing time/re-rendering and/or iterative
workflow that requires such re-rendering or sending the masks back
to the mask group for rework, wherein the mask group may be in a
third world country with unskilled labor on the other side of the
globe. In addition, when rendering the left and right images, i.e.,
3D images, the Z depth of regions within the image, such as actors
for example, may also be passed along with the alpha mask to the
quality assurance group, who may then adjust depth as well without
re-rendering with the original rendering software. This may be
performed for example with generated missing background data from
any layer so as to allow "downstream" real-time editing without
re-rendering or ray-tracing for example. Quality assurance may give
feedback to the masking group or depth augmentation group for
individuals so that these individuals may be instructed to produce
work product as desired for the given project, without waiting for,
or requiring the upstream groups to rework anything for the current
project. This allows for feedback yet eliminates iterative delays
involved with sending work product back for rework and the
associated delay for waiting for the reworked work product.
Elimination of iterations such as this provide a huge savings in
wall-time, or end-to-end time that a conversion project takes,
thereby increasing profits and minimizing the workforce needed to
implement the workflow.
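A minimal sketch (numpy, hypothetical names) of re-applying a per-pixel translation file of the kind described above: given a U map of horizontal offsets for a layer, the layer can be re-warped locally after a mask or depth edit instead of being re-rendered with ray tracing. A gather-style warp is used here for brevity; the patent describes offsets of each source pixel to its destination.

```python
import numpy as np

def warp_with_u_map(rgb, u_map):
    """Re-project one layer of the source image using a U map of per-pixel
    horizontal offsets (in pixels).  Because the offsets fully describe the
    layer's displacement, an edited mask or depth can be re-applied locally
    rather than re-rendering the whole frame with ray tracing."""
    h, w, _ = rgb.shape
    xs = np.clip(np.arange(w)[None, :] + np.rint(u_map).astype(int), 0, w - 1)
    ys = np.arange(h)[:, None]
    return rgb[ys, xs]    # gather: out[y, x] = rgb[y, x + u_map[y, x]]
```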
BRIEF DESCRIPTION OF THE DRAWINGS
The patent or application file contains at least one drawing
executed in color. Copies of this patent or patent application
publication with color drawing(s) will be provided by the Office
upon request and payment of the necessary fee.
FIG. 1 shows a plurality of feature film or television film frames
representing a scene or cut in which there is a single instance or
perspective of a background.
FIG. 2 shows an isolated background processed scene from the
plurality of frames shown in FIG. 1 in which all motion elements
are removed using various subtraction and differencing techniques.
The single background image is then used to create a background
mask overlay representing designer selected color lookup tables in
which dynamic pixel colors automatically compensate or adjust for
moving shadows and other changes in luminance.
FIG. 3 shows that a representative sample of each motion object
(M-Object) in the scene receives a mask overlay that represents
designer selected color lookup tables in which dynamic pixel colors
automatically compensate or adjust for moving shadows and other
changes in luminance as the M-Object moves within the scene.
FIG. 4 shows that all mask elements of the scene are then rendered to
create a fully colored frame in which M-Object masks are applied to
each appropriate frame in the scene followed by the background
mask, which is applied only where there is no pre-existing mask in
a Boolean manner.
FIGS. 5A and 5B show a series of sequential frames loaded into
display memory in which one frame is fully masked with the
background (key frame) and ready for mask propagation to the
subsequent frames via automatic mask fitting methods.
FIGS. 6A and 6B show the child window displaying an enlarged and
scalable single image of the series of sequential images in display
memory. The Child window enables the operator to manipulate masks
interactively on a single frame or in multiple frames during real
time or slowed motion.
FIGS. 7A and 7B show a single mask (flesh) that is propagated
automatically to all frames in the display memory.
FIG. 8 shows all masks associated with the motion object are
propagated to all sequential frames in display memory.
FIG. 9A shows a picture of a face.
FIG. 9B shows a close up of the face in FIG. 9A wherein the "small
dark" pixels shown in FIG. 9B are used to calculate a weighed index
using bilinear interpolation.
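The weighted index from the four neighboring pixels can be computed with standard bilinear interpolation; a compact sketch (not the patent's code) follows.

```python
def bilinear_sample(img, x, y):
    """Standard bilinear interpolation of a 2-D array at fractional (x, y):
    the four surrounding pixels are weighted by the fractional distances."""
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * img[y0,     x0    ] +
            fx       * (1 - fy) * img[y0,     x0 + 1] +
            (1 - fx) * fy       * img[y0 + 1, x0    ] +
            fx       * fy       * img[y0 + 1, x0 + 1])
```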
FIGS. 10A-D show searching for a Best Fit on the Error Surface: An
error surface calculation in the Gradient Descent Search method
involves calculating mean squared differences of pixels in the
square fit box centered on reference image pixel (x0, y0), between
the reference image frame and the corresponding (offset) location
(x, y) on the search image frame.
FIGS. 11A-C show a second search box derived from a descent down
the error surface gradient (evaluated separately), for which the
evaluated error function is reduced, possibly minimized, with
respect to the original reference box (evident from visual
comparison of the boxes with the reference box in FIGS. 10A, B, C
and D).
FIG. 12 depicts the gradient component evaluation. The error
surface gradient is calculated as per definition of the gradient.
Vertical and horizontal error deviations are evaluated at four
positions near the search box center position, and combined to
provide an estimate of the error gradient for that position.
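The error surface and gradient described for FIGS. 10A-D and 12 correspond to the following formulas. The notation is assumed rather than taken from the patent, and a central-difference estimate stands in for the four-position evaluation: the error at candidate offset (x, y) is the mean squared difference over the fit box of half-width b centered on reference pixel (x0, y0), and the search descends the estimated gradient of that error.

```latex
E(x, y) = \frac{1}{(2b+1)^2} \sum_{i=-b}^{b} \sum_{j=-b}^{b}
  \left[ I_{\mathrm{ref}}(x_0 + i,\, y_0 + j) - I_{\mathrm{search}}(x + i,\, y + j) \right]^2,
\qquad
\nabla E(x, y) \approx \left( \frac{E(x{+}\delta, y) - E(x{-}\delta, y)}{2\delta},\;
                              \frac{E(x, y{+}\delta) - E(x, y{-}\delta)}{2\delta} \right)
```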
FIG. 13 shows a propagated mask in the first sequential instance
where there is little discrepancy between the underlying image data
and the mask data. The dress mask and hand mask can be clearly seen
to be off relative to the image data.
FIG. 14 shows that by using the automatic mask fitting routine, the
mask data adjusts to the image data by referencing the underlying
image data in the preceding image.
FIG. 15 shows that the mask data in later images within the sequence
shows marked discrepancy relative to the underlying image data. Eye
makeup, lipstick, blush, hair, face, dress and hand image data are
all displaced relative to the mask data.
FIG. 16 shows that the mask data is adjusted automatically based on
the underlying image data from the previous mask and underlying
image data.
FIG. 17 shows the mask data from FIG. 16 with appropriate
color transforms after whole frame automatic mask fitting. The mask
data is adjusted to fit the underlying luminance pattern based on
data from the previous frame or from the initial key frame.
FIG. 18 shows polygons that are used to outline a region of
interest for masking in frame one. The square polygon points snap
to the edges of the object of interest. Using a Bezier curve the
Bezier points snap to the object of interest and the control
points/curves shape to the edges.
FIG. 19 shows that the entire polygon or Bezier curve is carried to a
selected last frame in the display memory where the operator
adjusts the polygon points or Bezier points and curves using the
snap function which automatically snaps the points and curves to
the edges of the object of interest.
FIG. 20 shows that if there is a marked discrepancy between the
points and curves in frames between the two frames where there was
an operator interactive adjustment, the operator will further
adjust a frame in the middle of the plurality of frames where there
is maximum error of fit.
FIG. 21 shows that when it is determined that the polygons or
Bezier curves are correctly animating between the two adjusted
frames, the appropriate masks are applied to all frames.
FIG. 22 shows the resulting masks from a polygon or Bezier
animation with automatic point and curve snap to edges. The brown
masks are the color transforms and the green masks are the
arbitrary color masks.
FIG. 23 shows an example of two pass blending: The objective in
two-pass blending is to eliminate moving objects from the final
blended mosaic. This can be done by first blending the frames so
the moving object is completely removed from the left side of the
background mosaic. As shown in FIG. 23, the character can is
removed from the scene, but can still be seen in the right side of
the background mosaic.
FIG. 24 shows the second pass blend. A second background mosaic is
then generated, where the blend position and width is used so that
the moving object is removed from the right side of the final
background mosaic. As shown in FIG. 24, the character is
removed from the scene, but can still be seen in the left side of the
background mosaic. In the second pass blend as shown in FIG. 24,
the moving character is shown on the left.
FIG. 25 shows the final background corresponding to FIGS. 23-24.
The two-passes are blended together to generate the final blended
background mosaic with the moving object removed from the scene. As
shown in FIG. 25, the moving character is removed from the final
blended background.
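A minimal sketch (hypothetical names) of how the two passes of FIGS. 23 and 24 are combined into the final mosaic of FIG. 25: pass 1 is clean of the moving object to the left of the blend position and pass 2 is clean to the right, so taking each side from its clean pass removes the object. The hard seam is a simplification; a feathered blend could be used instead.

```python
import numpy as np

def combine_two_pass(mosaic_pass1, mosaic_pass2, seam_x):
    """Combine two background mosaics: pass 1 is object-free left of the
    seam (FIG. 23), pass 2 is object-free right of it (FIG. 24), so the
    final mosaic (FIG. 25) takes each side from its clean pass."""
    out = mosaic_pass2.copy()
    out[:, :seam_x] = mosaic_pass1[:, :seam_x]
    return out
```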
FIG. 26 shows an edit frame pair window.
FIG. 27 shows sequential frames representing a camera pan that are
loaded into memory. The motion object (butler moving left to the
door) has been masked with a series of color transform information
leaving the background black and white with no masks or color
transform information applied.
FIG. 28 shows six representative sequential frames of the pan above
are displayed for clarity.
FIG. 29 shows the composite or montage image of the entire camera
pan that was built using phase correlation techniques. The motion
object (butler) is included as a transparency for reference by keeping
the first and last frames and averaging the phase correlation in two
directions. The single montage representation of the pan is color
designed using the same color transform masking techniques as used
for the foreground object.
FIG. 30 shows the sequence of frames in the camera pan after
the background mask color transforms from the montage have been applied
to each frame used to create the montage. The mask is applied where
there is no pre-existing mask, thus retaining the motion object mask
and color transform information while applying the background
information with appropriate offsets.
FIG. 31 shows a selected sequence of frames in the pan for clarity
after the color background masks have been automatically applied to
the frames where there are no pre-existing masks.
FIG. 32 shows a sequence of frames in which all moving objects
(actors) are masked with separate color transforms.
FIG. 33 shows a sequence of selected frames for clarity prior to
background mask information. All motion elements have been fully
masked using the automatic mask-fitting algorithm.
FIG. 34 shows the stationary background and foreground information
minus the previously masked moving objects. In this case, the
single representation of the complete background has been masked
with color transforms in a manner similar to the motion objects.
Note that outlines of removed foreground objects appear truncated
and unrecognizable due to their motion across the input frame
sequence interval, i.e., the black objects in the frame represent
areas in which the motion objects (actors) never expose the
background and foreground. The black objects are ignored during the
masking operation in colorization-only projects because the
resulting background mask is later applied to all frames used to
create the single representation of the background only where there
is no pre-existing mask. In depth conversion projects the missing
data area may be displayed so that image data may be
obtained/generated for the missing data area so as to provide
visually believable image data when translating foreground objects
horizontally to generate a second viewpoint.
FIG. 35 shows the sequential frames in the static camera scene cut
after the background mask information has been applied to each
frame with appropriate offsets and where there is no pre-existing
mask information.
FIG. 36 shows a representative sample of frames from the static
camera scene cut after the background information has been applied
with appropriate offsets and where there is no pre-existing mask
information.
FIGS. 37A-C show embodiments of the Mask Fitting functions,
including calculate fit grid and interpolate mask on fit grid.
FIGS. 38A-B show embodiments of the extract background
functions.
FIGS. 39A-C show embodiments of the snap point functions.
FIGS. 40A-C show embodiments of the bimodal threshold masking
functions, wherein FIG. 40C corresponds to step 2.1 in FIG. 40A,
namely "Create Image of Light/Dark Cursor Shape" and FIG. 40B
corresponds to step 2.2 in FIG. 40A, namely "Apply Light/Dark shape
to mask".
FIGS. 41A-B show embodiments of the calculate fit value
functions.
FIG. 42 shows two image frames that are separated in time by
several frames, of a person levitating a crystal ball wherein the
various objects in the image frames are to be converted from
two-dimensional objects to three-dimensional objects.
FIG. 43 shows the masking of the first object in the first image
frame that is to be converted from a two-dimensional image to a
three-dimensional image.
FIG. 44 shows the masking of the second object in the first image
frame.
FIG. 45 shows the two masks in color in the first image frame
allowing for the portions associated with the masks to be
viewed.
FIG. 46 shows the masking of the third object in the first image
frame.
FIG. 47 shows the three masks in color in the first image frame
allowing for the portions associated with the masks to be
viewed.
FIG. 48 shows the masking of the fourth object in the first image
frame.
FIG. 49 shows the masking of the fifth object in the first image
frame.
FIG. 50 shows a control panel for the creation of three-dimensional
images, including the association of layers and three-dimensional
objects to masks within an image frame, specifically showing the
creation of a Plane layer for the sleeve of the person in the
image.
FIG. 51 shows a three-dimensional view of the various masks shown
in FIGS. 43-49, wherein the mask associated with the sleeve of the
person is shown as a Plane layer that is rotated toward the left
and right viewpoints on the right of the page.
FIG. 52 shows a slightly rotated view of FIG. 51.
FIG. 53 shows a slightly rotated view of FIG. 51.
FIG. 54 shows a control panel specifically showing the creation of
a sphere object for the crystal ball in front of the person in the
image.
FIG. 55 shows the application of the sphere object to the flat mask
of the crystal ball, which is shown within the sphere and as
projected to the front and back of the sphere to show the depth
assigned to the crystal ball.
FIG. 56 shows a top view of the three-dimensional representation of
the first image frame; the Z-dimension assigned to the crystal ball
shows that the crystal ball is in front of the person in the
scene.
FIG. 57 shows the sleeve plane rotating about the X-axis to make
the sleeve appear to come out of the image more.
FIG. 58 shows a control panel specifically showing the creation of
a Head object for application to the person's face in the image,
i.e., to give the person's face realistic depth without requiring a
wire model for example.
FIG. 59 shows the Head object in the three-dimensional view, too
large and not aligned with the actual person's head.
FIG. 60 shows the Head object in the three-dimensional view,
resized to fit the person's face and aligned, e.g., translated to
the position of the actual person's head.
FIG. 61 shows the Head object in the three-dimensional view, with
the Y-axis rotation shown by the circle and Y-axis originating from
the person's head thus allowing for the correct rotation of the
Head object to correspond to the orientation of the person's
face.
FIG. 62 shows the Head object also rotated slightly clockwise,
about the Z-axis to correspond to the person's slightly tilted
head.
FIG. 63 shows the propagation of the masks into the second and
final image frame.
FIG. 64 shows the original position of the mask corresponding to
the person's hand.
FIG. 65 shows the reshaping of the mask, which can be performed
automatically and/or manually, wherein any intermediate frames get
the tweened depth information between the first image frame masks
and the second image frame masks.
FIG. 66 shows the missing information for the left viewpoint as
highlighted in color on the left side of the masked objects in the
lower image when the foreground object, here a crystal ball, is
translated to the right.
FIG. 67 shows the missing information for the right viewpoint as
highlighted in color on the right side of the masked objects in the
lower image when the foreground object, here a crystal ball, is
translated to the left.
FIG. 68 shows an anaglyph of the final depth enhanced first image
frame viewable with Red/Blue 3-D glasses.
FIG. 69 shows an anaglyph of the final depth enhanced second and
last image frame viewable with Red/Blue 3-D glasses, note rotation
of person's head, movement of person's hand and movement of crystal
ball.
FIG. 70 shows the right side of the crystal ball with fill mode
"smear", wherein the pixels with missing information for the left
viewpoint, i.e., on the right side of the crystal ball are taken
from the right edge of the missing image pixels and "smeared"
horizontally to cover the missing information.
FIG. 71 shows a mask, or alpha plane, for an actor's upper torso and
head (and transparent wings). The mask may include opaque areas,
shown as black, and transparent areas, shown as grey.
FIG. 72 shows an occluded area, which corresponds to the actor of
FIG. 71 and which represents an area of the background that is
never exposed in any frame in a scene. This may be a composite
background for example.
FIG. 73 shows the occluded area artistically rendered to generate a
complete and realistic background for use in two-dimensional to
three-dimensional conversion, so as to enable an artifact-free
conversion.
FIG. 73A shows the occluded area partially drawn or otherwise
rendered to generate just enough of a realistic looking background
for use in minimizing artifacts during two-dimensional to
three-dimensional conversion.
FIG. 74 shows a light area of the shoulder portion on the right
side of FIG. 71 that represents a gap where stretching (as is also
shown in FIG. 70) would be used when shifting the foreground object
to the left to create a right viewpoint. The dark portion of the
figure is taken from the background where data is available in at
least one frame of a scene.
FIG. 75 shows an example of the stretching of pixels, i.e.,
smearing, corresponding to the light area in FIG. 74 without the
use of a generated background, i.e., if no background data is
available for an area that is occluded in all frames of a
scene.
FIG. 76 shows a result of a right viewpoint without artifacts on
the edge of the shoulder of the person wherein the dark area
includes pixels available in one or more frames of a scene, and
generated data for always-occluded areas of a scene.
FIG. 77 shows an example of a computer-generated element, here a
robot, which is modeled in three-dimensional space and projected as
a two-dimensional image. If metadata such as alpha, mask, depth or
any combination thereof exists, the metadata can be utilized to
speed the conversion process from two-dimensional image to a pair
of two-dimensional images for left and right eye for
three-dimensional viewing.
FIG. 78 shows an original image separated into background and
foreground elements (mountain and sky in the background and
soldiers in the bottom left; see also FIG. 79) along with the
imported color and depth of the computer-generated element, i.e.,
the robot, with depth automatically set via the imported depth
metadata. As shown in the background, any area that is covered for
the scene can be artistically rendered, for example, to provide
believable missing data, as is shown in FIG. 73 based on the
missing data of FIG. 73A, which results in artifact-free edges as
shown in FIG. 76 for example.
FIG. 79 shows masks associated with the photograph of soldiers in
the foreground to apply depth to the various portions of the
soldiers that lie in depth in front of the computer-generated
element, i.e., the robot. The dashed lines horizontally extending
from the mask areas show where horizontal translation of the
foreground objects takes place and where imported metadata can be
utilized to accurately auto-correct over-painting of depth or color
on the masked objects when metadata exists for the other elements
of a movie, for example, when an alpha exists for the objects that
occur in front of the computer-generated elements. One type of file
that can be utilized to obtain mask edge data is a file with alpha
and/or mask data, such as an RGBA file.
FIG. 80 shows an imported alpha layer which can also be utilized as
a mask layer to limit the operator defined, and potentially less
accurate masks used for applying depth to the edges of the three
soldiers A, B and C. In addition, a computer-generated element for
dust can be inserted into the scene along the line annotated as
"DUST", to augment the reality of the scene.
FIG. 81 shows the result of using the operator-defined masks
without adjustment when overlaying a motion element such as the
soldier on the computer-generated element such as the robot.
Through use of the alpha metadata of FIG. 80 applied to the
operator-defined mask edges of FIG. 79, artifact-free edges on the
overlapping areas are thus enabled.
FIG. 82 shows a source image to be depth enhanced and provided
along with left and right translation files and alpha masks so that
downstream workgroups may perform real-time editing of 3D images
without re-rendering, for example to alter layers/colors/masks
and/or remove and/or adjust depths, without iterative workflow
paths back to the original workgroups.
FIG. 83 shows masks generated by the mask workgroup for the
application of depth by the depth augmentation group, wherein the
masks are associated with objects, such as for example human
recognizable objects in the source image of FIG. 82.
FIG. 84 shows areas where depth is applied generally as darker for
nearer objects and lighter for objects that are further away.
FIG. 85A shows a left UV map containing translations or offsets in
the horizontal direction for each source pixel.
FIG. 85B shows a right UV map containing translations or offsets in
the horizontal direction for each source pixel.
FIG. 85C shows a black value shifted portion of the left UV map of
FIG. 85A to show the subtle contents therein.
FIG. 85D shows a black value shifted portion of the right UV map of
FIG. 85B to show the subtle contents therein.
FIG. 86A shows a left U map containing translations or offsets in
the horizontal direction for each source pixel.
FIG. 86B shows a right U map containing translations or offsets in
the horizontal direction for each source pixel.
FIG. 86C shows a black value shifted portion of the left U map of
FIG. 86A to show the subtle contents therein.
FIG. 86D shows a black value shifted portion of the right U map of
FIG. 86B to show the subtle contents therein.
FIG. 87 shows known uses for UV maps, wherein a three-dimensional
model is unfolded so that an image in UV space can be painted onto
the 3D model using the UV map.
FIG. 88 shows a disparity map showing the areas where the
difference between the left and right translation maps is the
largest.
FIG. 89 shows a left eye rendering of the source image of FIG.
82.
FIG. 90 shows a right eye rendering of the source image of FIG.
82.
FIG. 91 shows an anaglyph of the images of FIG. 89 and FIG. 90 for
use with Red/Blue glasses.
FIG. 92 shows an image that has been masked and is in the process
of depth enhancement for the various layers.
FIG. 93 shows a UV map overlaid onto an alpha mask associated with
the actress shown in FIG. 92 which sets the translation offsets in
the resulting left and right UV maps based on the depth settings of
the various pixels in the alpha mask.
FIG. 94 shows a workspace generated for a second depth enhancement
program, or compositing program such as NUKE.RTM., i.e., generated
for the various layers shown in FIG. 92, i.e., left and right UV
translation maps for each of the alphas wherein the workspace
allows for quality assurance personnel (or other work groups) to
perform real-time editing of 3D images without re-rendering for
example to alter layers/colors/masks and/or remove artifacts or
otherwise adjust masks and hence alter the 3D image pair (or
anaglyph) without iteratively sending fixes to any other
workgroup.
FIG. 95 shows an iterative corrective workflow.
FIG. 96 shows an embodiment of the workflow enabled by one or more
embodiments of the system in that each workgroup can perform
real-time editing of 3D images without re-rendering for example to
alter layers/colors/masks and/or remove artifacts and otherwise
correct work product from another workgroup without iterative
delays associated with re-rendering/ray-tracing or sending work
product back through the workflow for corrections.
FIG. 97 shows an embodiment of the rapid workflow for local
modification of masks, gaps next to masks, depth maps, translation
values such as UV or U Maps or any combination thereof to remove
artifacts, create missing background information or otherwise
adjust, alter, improve or otherwise modify previously ray traced or
rendered images without re-rendering the entire image or
stereoscopic images or otherwise re-ray tracing the image or
stereoscopic images.
FIGS. 98A-D show Z Depth Alpha processing.
FIGS. 99A-D show UV Gap Detection processing.
FIGS. 100A-E show Gap Fill processing.
FIGS. 101A-E show Gap Blur Grain processing.
FIGS. 102A-B show Grain Merge processing.
FIG. 103 shows gap analysis processing that enables color coding of
gaps based on their thickness to enable rapid identification and
local modification of areas where artifacts may be visible or
unacceptable, wherein for example red areas will likely need more
attention than green areas, and wherein the color also gives
artists a quick view of areas of an image where internal portions
of characters/objects are broken, e.g., where regions are not
aligned with their neighbors correctly.
DETAILED DESCRIPTION OF THE INVENTION
Feature Film and TV series Data Preparation for Colorization/Depth
enhancement: Feature films are tele-cined or transferred from 35 mm
or 16 mm film using a high resolution scanner such as a 10-bit
SPIRIT DATACINE.RTM. or similar device to HDTV (1920 by 1080 24P)
or data-cined on a laser film scanner such as that manufactured by
IMAGICA.RTM. Corp. of America at a larger format 2000 lines to 4000
lines and up to 16 bits of grayscale. The high resolution frame
files are then converted to standard digital files such as
uncompressed TIF files or uncompressed TGA files, typically in
16-bit three-channel linear format or 8-bit three-channel linear
format. If the source data is HDTV, the 10-bit HDTV frame files are
converted to similar TIF or TGA uncompressed files at either
16 bits or 8 bits per channel. Each frame pixel is then averaged
such that the three channels are merged to create a single 16-bit
channel or 8-bit channel respectively. Any other scanning
technologies capable of scanning an existing film to digital format
may be utilized. Currently, many movies are generated entirely in
digital format, and thus may be utilized without scanning the
movie. For digital movies that have associated metadata, for
example for movies that make use of computer-generated characters,
backgrounds or any other element, the metadata can be imported for
example to obtain an alpha and/or mask and/or depth for the
computer-generated element on a pixel-by-pixel or
sub-pixel-by-sub-pixel basis. One format of a file that contains
alpha/mask and depth data is the RGBAZ file format, of which one
implementation is the EXR file format.
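
For digital movies delivered with RGBAZ metadata, the imported channels can be used directly to derive a computer-generated element mask and per-pixel depth. The following is a minimal sketch, assuming a hypothetical loader has already decoded the R, G, B, A and Z channels into NumPy arrays; the helper name and the alpha threshold are illustrative assumptions, not part of the patented method.

```python
import numpy as np

def cg_mask_and_depth(rgbaz, alpha_threshold=0.5):
    """Derive a computer-generated-element mask and per-pixel depth
    from imported RGBAZ metadata.

    `rgbaz` is assumed to be a dict of float NumPy arrays keyed by
    channel name ('R', 'G', 'B', 'A', 'Z'), e.g. as decoded from an
    EXR file by whatever loader a production pipeline provides.
    """
    alpha = rgbaz['A']
    depth = rgbaz['Z']
    # Pixels whose alpha exceeds the threshold are treated as belonging
    # to the computer-generated element; such digitally created masks
    # can be used as-is, skipping later reshaping/mask-fitting steps.
    cg_mask = alpha > alpha_threshold
    # Depth is only meaningful where the element is present.
    cg_depth = np.where(cg_mask, depth, np.nan)
    return cg_mask, cg_depth
```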
Digitization Telecine and Format Independence Monochrome elements
of either 35 or 16 mm negative or positive film are digitized at
various resolutions and bit depth within a high resolution film
scanner such as that performed with a SPIRIT DATACINE.RTM. by
PHILIPS.RTM. and EASTMAN KODAK.RTM., which transfers either 525 or
625 formats, HDTV (1280.times.720/60 Hz progressive), 2K, or DTV
(ATSC) formats like 1920.times.1080/24 Hz/25 Hz progressive and
1920.times.1080/48 Hz/50 Hz segmented frame or 1920.times.1080 50I
as examples. The invention provides improved methods for editing
film into motion pictures. Visual images are transferred from
developed motion picture film to a high definition video storage
medium, which is a storage medium adapted to store images and to
display images in conjunction with display equipment having a scan
density substantially greater than that of an NTSC compatible video
storage medium and associated display equipment. The visual images
are also transferred, either from the motion picture film or the
high definition video storage medium to a digital data storage
format adapted for use with digital nonlinear motion picture
editing equipment. After the visual images have been transferred to
the high definition video storage medium, the digital nonlinear
motion picture editing equipment is used to generate an edit
decision list, to which the motion picture film is then conformed.
The high definition video storage medium is generally adapted to
store and display visual images having a scan density of at least
1080 horizontal lines. Electronic or optical transformation may be
utilized to allow use of visual aspect ratios that make full use of
the storage formats used in the method. This digitized film data as
well as data already transferred from film to one of a multiplicity
of formats such as HDTV are entered into a conversion system such
as the HDTV STILL STORE 0 manufactured by AVICA.RTM. Technology
Corporation. Such large scale digital buffers and data converters
are capable of converting digital images to all standard HDTV
formats such as 1080i, 720p and 1080p/24. An Asset
Management System server provides powerful local and server
backups and archiving to standard SCSI devices, C2-level security,
streamlined menu selection and multiple criteria database
searches.
During the process of digitizing images from motion picture film
the mechanical positioning of the film frame in the telecine
machine suffers from an imprecision known as "film weave", which
cannot be fully eliminated. However various film registration and
ironing or flattening gate assemblies are available such as that
embodied in U.S. Pat. No. 5,328,073, Film Registration and Ironing
Gate Assembly, which involves the use of a gate with a positioning
location or aperture for focal positioning of an image frame of a
strip film with edge perforations. Undersized first and second pins
enter a pair of transversely aligned perforations of the film to
register the image frame with the aperture. An undersized third pin
enters a third perforation spaced along the film from the second
pin and then pulls the film obliquely to a reference line extending
between the first and second pins to nest against the first and
second pins the perforations thereat and register the image frame
precisely at the positioning location or aperture. A pair of
flexible bands extending along the film edges adjacent the
positioning location moves progressively into incrementally
increasing contact with the film to iron it and clamp its
perforations against the gate. The pins register the image frame
precisely with the positioning location, and the bands maintain the
image frame in precise focal position. Positioning can be further
enhanced following the precision mechanical capture of images by
methods such as that embodied in U.S. Pat. No. 4,903,131, Method
For The Automatic Correction Of Errors In Image Registration During
Film Scanning.
To remove or reduce the random structure known as grain within
exposed feature film that is superimposed on the image, as well as
scratches or particles of dust or other debris which obscure the
transmitted light, various algorithms will be used such as those
embodied in U.S. Pat. No. 6,067,125, Structure And Method For Film
Grain Noise Reduction, and U.S. Pat. No. 5,784,176,
Method of Image Noise Reduction Processing.
Reverse Editing of the Film Element Preliminary to Visual Database
Creation:
The digital movie is broken down into scenes and cuts. The entire
movie is then processed sequentially for the automatic detection of
scene changes including dissolves, wipe-a-ways and cuts. These
transitions are further broken down into camera pans, camera zooms
and static scenes representing little or no movement. All database
references to the above are entered into an edit decision list
(EDL) within the database based on standard SMPTE time code or
other suitable sequential naming convention. There exist a great
number of technologies for detecting dramatic as well as subtle
transitions in film content, such as:
U.S. Pat. No. 5,959,697 Sep. 28, 1999 Method And System For
Detecting Dissolve Transitions In A Video Signal
U.S. Pat. No. 5,920,360 Jul. 6, 1999 Method And System For
Detecting Fade Transitions In A Video Signal
U.S. Pat. No. 5,841,512 Nov. 24, 1998 Methods Of Previewing And
Editing Motion Pictures
U.S. Pat. No. 5,835,163 Nov. 10, 1998 Apparatus For Detecting A Cut
In A Video
U.S. Pat. No. 5,767,923 Jun. 16, 1998 Method And System For
Detecting Cuts In A Video Signal
U.S. Pat. No. 5,778,108 Jul. 6, 1996 Method And System For
Detecting Transitional Markers Such As Uniform Fields In A Video
Signal
U.S. Pat. No. 5,920,360 Jun. 7, 1999 Method And System For
Detecting Fade Transitions In A Video Signal
All cuts that represent the same content such as in a dialog
between two or more people where the camera appears to volley
between the two talking heads are combined into one file entry for
later batch processing.
An operator checks all database entries visually to ensure
that:
1. Scenes are broken down into camera moves
2. Cuts are consolidated into single batch elements where
appropriate
3. Motion is broken down into simple and complex depending on
occlusion elements, the number of moving objects and the quality of
the optics (e.g., softness of the elements, etc.).
Pre-Production--scene analysis and scene breakdown for reference
frame ID and data base creation:
Files are numbered using sequential SMPTE time code or other
sequential naming convention. The image files are edited together
at 24-frame/sec speed (without field related 3/2 pull down which is
used in standard NTSC 30 frame/sec video) onto a DVD using
ADOBE.RTM. AFTER EFFECTS.RTM. or similar programs to create a
running video with audio of the feature film or TV series. This is
used to assist with scene analysis and scene breakdown.
Scene and Cut Breakdown:
1. A database permits the entering of scene, cut, design, key frame
and other critical data in time code format as well as descriptive
information for each scene and cut.
2. Each scene cut is identified relative to camera technique. Time
codes are noted for pans, zooms, static backgrounds, static
backgrounds with unsteady or drifting camera, and unusual camera
cuts that require special attention.
3. Designers and assistant designers study the feature film for
color clues and color references or for the case of depth projects,
the film is studied for depth clues, generally for non-standard
sized objects. Research is provided for color/depth accuracy where
applicable. The Internet for example may be utilized to determine
the color of a particular item or the size of a particular item.
For depth projects, knowing the size of an object allows for the
calculation of the depth of an item in a scene for example. For
depth projects related to converting two-dimensional movies to
three-dimensional movies where depth metadata is available for
computer-generated elements within the movies, the depth metadata
can be scaled, or translated or otherwise normalized to the
coordinate system or units used for the background and motion
elements for example.
4. Single frames from each scene are selected to serve as design
frames. These frames are color designed or metadata is imported for
depth and/or mask and/or alpha for computer-generated elements, or
depth assignments (see FIGS. 42-70) are made to background elements
or motion elements in the frames to represent the overall look and
feel of the feature film. Approximately 80 to 100 design frames are
typical for a feature film.
5. In addition, single frames called key frames from each cut of
the feature film are selected that contain all the elements within
each cut that require color/depth consideration. There may be as
many as 1,000 key frames. These frames will contain all the
color/depth transform information necessary to apply color/depth to
all sequential frames in each cut without additional color
choices.
Color/Depth Selection:
Historical reference, studio archives and film analysis provide
the designer with color references. Using an input device such as a
mouse, the designer masks features in a selected single frame
containing a plurality of pixels and assigns color to them using an
HSL color space model based on creative considerations and the
grayscale and luminance distribution underlying each mask. One or
more base colors are selected for image data under each mask and
applied to the particular luminance pattern attributes of the
selected image feature. Each color selected is applied to an entire
masked object or to the designated features within the luminance
pattern of the object based on the unique gray-scale values of the
feature under the mask.
A lookup table or color transform for the unique luminance pattern
of the object or feature is thus created which represents the color
to luminance values applied to the object. Since the color applied
to the feature extends over the entire range of potential grayscale
values from dark to light, the designer can ensure that as the
distribution of the gray-scale values representing the pattern
changes homogeneously into dark or light regions within subsequent
frames of the movie, such as with the introduction of shadows or
bright light, the color for each feature also remains consistently
homogeneous and correctly lightens or darkens with the pattern upon
which it is applied.
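
The lookup table described above can be sketched as a 256-entry table indexed by luminance, with the designer's hue and saturation held constant while lightness follows the underlying gray-scale value. The code below is a simplified illustration of that idea, not the patented transform; the HLS handling and function names are assumptions for the sketch.

```python
import numpy as np
import colorsys

def build_color_lut(base_hue, base_saturation, levels=256):
    """Build a 256-entry RGB lookup table for one mask region: the
    designer's hue/saturation is held constant while lightness tracks
    the full range of possible grayscale values, so the applied color
    darkens and lightens with the underlying luminance pattern."""
    lut = np.zeros((levels, 3), dtype=np.float32)
    for level in range(levels):
        lightness = level / (levels - 1)
        lut[level] = colorsys.hls_to_rgb(base_hue, lightness, base_saturation)
    return lut

def apply_color_transform(gray_frame, mask, lut):
    """Colorize the masked pixels of an 8-bit grayscale frame by
    indexing the lookup table with each pixel's luminance value."""
    rgb = np.repeat(gray_frame[..., None] / 255.0, 3, axis=2)
    rgb[mask] = lut[gray_frame[mask]]
    return rgb
```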
Depth can be imported for computer-generated objects where metadata
exists and/or can be assigned to objects and adjusted using
embodiments of the invention, using an input device such as a mouse
to assign particular depths to objects, including contour depths,
e.g., geometric shapes such as an ellipsoid applied to a face. This
allows objects to appear natural when converted to
three-dimensional stereoscopic images. For computer-generated
elements, the imported depth and/or alpha and/or mask shape can be
adjusted if desired. Assigning a fixed distance to foreground
objects tends to make the objects appear as cut-outs, i.e., flat.
See also FIGS. 42-70.
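
As a rough illustration of a contour depth, the sketch below assigns an ellipsoidal depth profile to a masked region (such as a face), so the region bulges toward the camera instead of reading as a flat cut-out. The geometry and parameter names are illustrative assumptions, not the invention's implementation.

```python
import numpy as np

def ellipsoid_contour_depth(mask, base_depth, bulge):
    """Assign a contour depth to a masked region: pixels near the
    center of the mask's bounding box are pushed `bulge` units nearer
    than `base_depth`, falling off ellipsoidally toward the mask edge,
    so a face (for example) reads as rounded rather than flat."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    ry = max((ys.max() - ys.min()) / 2.0, 1.0)
    rx = max((xs.max() - xs.min()) / 2.0, 1.0)
    depth = np.full(mask.shape, np.nan, dtype=np.float32)
    # Normalized squared distance from the ellipse center; <= 1 inside.
    d2 = ((ys - cy) / ry) ** 2 + ((xs - cx) / rx) ** 2
    height = np.sqrt(np.clip(1.0 - d2, 0.0, 1.0))  # ellipsoid cross-section
    depth[ys, xs] = base_depth - bulge * height
    return depth
```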
Propagation of Mask Color Transform/Depth Information from One
Frame to a Series of Subsequent Frames:
The masks representing designed selected color transforms/depth
contours in the single design frame are then copied to all
subsequent frames in the series of movie frames by one or more
methods such as auto-fitting Bezier curves to edges, automatic mask
fitting based on Fast Fourier Transforms and Gradient Descent
Calculation tied to luminance patterns in a subsequent frame
relative to the design frame or successive preceding frames, mask
paint to a plurality of successive frames by painting the object
within only one frame, auto-fitting vector points to edges and
copying and pasting individual masks or a plurality of masks to
selected subsequent frames. In addition, depth information may be
"tweened" to account for forward/backward motion or zooming with
respect to the camera capture location. For computer-generated
elements, the alpha and/or mask data is generally correct and may
be skipped for reshaping processes since the metadata associated
with computer-generated elements is obtained digitally from the
original model of an object and hence does not require adjustment
in general. (See FIG. 37C, step 3710 for setting mask fit location
to border of CG element to potentially skip large amounts of
processing in fitting masks in subsequent frames to reshape the
edges to align a photographic element). Optionally,
computer-generated elements may be morphed or reshaped to provide
special effects not originally in a movie scene.
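
Depth tweening between key frames can be as simple as linear interpolation of an object's assigned depth across the intermediate frames. The sketch below assumes a single scalar depth per key frame and is illustrative only.

```python
def tween_depths(key_depth_a, key_depth_b, num_frames):
    """Linearly interpolate a mask's assigned depth between two key
    frames, returning one depth value per frame (inclusive of both
    keys), to approximate motion toward or away from the camera across
    the intermediate frames of a cut."""
    if num_frames < 2:
        return [key_depth_a]
    step = (key_depth_b - key_depth_a) / (num_frames - 1)
    return [key_depth_a + step * i for i in range(num_frames)]

# Example: an object at 32 feet in the first key frame and 28 feet in
# the second key frame, tweened across 48 frames.
depths = tween_depths(32.0, 28.0, 48)
```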
Single Frame Set Design and Colorization:
In embodiments of the invention, camera moves are consolidated and
separated from motion elements in each scene by the creation of a
montage or composite image of the background from a series of
successive frames into a single frame containing all background
elements for each scene and cut. The resulting single frame becomes
a representation of the entire common background of a multiplicity
of frames in a movie, creating a visual database of all elements
and camera offset information within those frames.
In this manner most set backgrounds can be designed and
colorized/depth enhanced in one pass using a single frame montage.
Each montage is masked without regard to the foreground moving
objects, which are masked separately. The background masks of the
montage are then automatically extracted from the single background
montage image and applied to the subsequent frames that were used
to create the single montage using all the offsets stored in the
image data for correctly aligning the masks to each subsequent
frame.
There is a basic formula in filmmaking that varies little within
and between feature films (except for those films employing
extensive hand-held or stabilized camera shots.) Scenes are
composed of cuts, which are blocked for standard camera moves,
i.e., pans, zooms and static or locked camera angles as well as
combinations of these moves. Cuts are either single occurrences or
a combination of cut-a-ways where there is a return to a particular
camera shot such as in a dialog between two individuals. Such
cut-a-ways can be considered a single scene sequence or single cut
and can be consolidated in one image-processing pass.
Pans can be consolidated within a single frame visual database
using special panorama stitching techniques but without lens
compensation. Each frame in a pan involves:
1. The loss of some information on one side, top and/or bottom of
the frame
2. Common information in the majority of the frame relative to the
immediately preceding and subsequent frames and
3. New information on the other side, top and/or bottom of the
frame.
By stitching these frames together based on common elements within
successive frames and thereby creating a panorama of the background
elements, a visual database is created with all pixel offsets
available for referencing in the application of a single mask
overlay to the complete set of sequential frames.
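
Given the per-frame offsets recovered from the common elements between successive frames, compositing the background montage amounts to pasting each frame into a larger canvas at its offset and keeping only previously unseen pixels. The sketch below assumes the offsets are already known, expressed as non-negative canvas coordinates, and uses a simple first-write-wins rule; it is an illustration, not the patented stitching method.

```python
import numpy as np

def composite_background(frames, offsets, canvas_shape):
    """Paste each grayscale frame into a larger canvas at its stored
    (oy, ox) offset; pixels already written are kept, so only new
    (previously unseen) background pixels are added, yielding a single
    composite frame of the entire pan."""
    canvas = np.full(canvas_shape, np.nan, dtype=np.float32)
    for frame, (oy, ox) in zip(frames, offsets):
        h, w = frame.shape
        region = canvas[oy:oy + h, ox:ox + w]
        # Only fill canvas pixels that have not been written yet.
        np.copyto(region, frame, where=np.isnan(region))
    return canvas
```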
Creation of a Visual Database:
Since each pixel within a single frame visual database of a
background corresponds to an appropriate address within the
respective "raw" (unconsolidated) frame from which it was created,
any designer determined masking operation and corresponding masking
lookup table designation applied to the visual database will be
correctly applied to each pixel's appropriate address within the
raw film frames that were used to create the single frame
composite.
In this manner, sets for each scene and cut are each represented by
a single frame (the visual database) in which pixels have either
single or multiple representations within the series of raw frames
from which they were derived. All masking within a single visual
database frame will create a one-bit mask per region representation
of an appropriate lookup table that corresponds to either common or
unique pixel addresses within the sequential frames that created
the single composite frame. These address-defined masking pixels
are applied to the full resolution frames where total masking is
automatically checked and adjusted where necessary using feature,
edge detection and pattern recognition routines. Where adjustments
are required, i.e., where applied masked region edges do not
correspond to the majority of feature edges within the gray scale
image, a "red flag" exception comment signals the operator that
frame-by-frame adjustments may be necessary.
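
Conversely, a mask painted once on the composite background can be copied back into each raw frame using the stored offsets, writing only where no motion-object mask already exists. The sketch below is a minimal illustration of that address mapping, assuming integer offsets and index-valued masks; the function name is illustrative.

```python
import numpy as np

def apply_background_mask(raw_frame_masks, composite_mask, offsets):
    """Copy the composite-background mask indices back into each raw
    frame at that frame's stored (oy, ox) offset, writing only where
    the frame has no pre-existing motion-object mask (index 0)."""
    for frame_mask, (oy, ox) in zip(raw_frame_masks, offsets):
        h, w = frame_mask.shape
        window = composite_mask[oy:oy + h, ox:ox + w]
        unmasked = frame_mask == 0
        frame_mask[unmasked] = window[unmasked]
    return raw_frame_masks
```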
Single Frame Representation of Motion within Multiple Frames:
The differencing algorithm used for detecting motion objects will
generally be able to differentiate dramatic pixel region changes
that represent moving objects from frame to frame. In cases where
cast shadows on a background from a moving object may be confused
with the moving object, the resulting masks will be assigned to a
default alpha layer that renders that part of the moving object
mask transparent. In some cases an operator using one or more
vector or paint tools will designate the demarcation between the
moving object and cast shadow. In most cases however, the cast
shadows will be detected as an extraneous feature relative to the
two key motion objects. In this invention cast shadows are handled
by the background lookup table that automatically adjusts color
along a luminance scale determined by the spectrum of light and
dark gray scale values in the image.
Action within each frame is isolated via differencing or
frame-to-frame subtraction techniques that include vector (both
directional and speed) differencing (i.e., where action occurs
within a pan) as well as machine vision techniques, which model
objects and their behaviors. Difference pixels are then composited
as a single frame (or isolated in a tiling mode) representing a
multiplicity of frames thus permitting the operator to window
regions of interest and otherwise direct image processing
operations for computer controlled subsequent frame masking.
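
A minimal form of the differencing step is a per-pixel absolute difference against the background (or the preceding frame) followed by a threshold; the flagged regions become the candidate motion elements that the operator then windows. The threshold value below is an arbitrary illustrative assumption.

```python
import numpy as np

def motion_mask(frame, background, threshold=12):
    """Flag pixels whose absolute difference from the background
    exceeds a threshold; connected regions of flagged pixels are the
    candidate motion elements for subsequent masking."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold
```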
As with the set or background montage discussed above, action
taking place in multiple frames within a scene can be represented
by a single frame visual database in which each unique pixel
location undergoes appropriate one bit masking from which
corresponding lookup tables are applied. However, unlike the set or
background montage in which all color/depth is applied and
designated within the single frame pass, the purpose of creating an
action composite visual data base is to window or otherwise
designate each feature or region of interest that will receive a
particular mask and apply region of interest vectors from one key
frame element to subsequent key frame elements, thus providing
operator assistance to the computer processing that will track each
region of interest.
During the design phase, masks are applied to designer designated
regions of interest for a single instance of a motion object
appearing within the background (i.e., a single frame of action
appears within the background or stitched composited background in
the proper x, y coordinates within the background corresponding to
the single frame of action from which it was derived). Using an
input device such as a mouse the operator uses the following tools
in creating the regions of interest for masking. Alternatively,
projects having associated computer-generated element metadata may
import and if necessary, scale the metadata to the units utilized
for depth in the project. Since these masks are digitally created,
they can be assumed to be accurate throughout the scene and thus
the outlines and depths of the computer-generated areas may be
ignored for reshaping operations. Elements that border these
objects may thus be more accurately reshaped since the outlines of
the computer-generated elements are taken as correct. Hence, even
for computer-generated elements having the same underlying gray
scale as a contiguous motion or background element, the shape of
the mask at the junction can be taken to be accurate even though
there is no visual difference at the junction. Again, see FIG. 37C,
step 3710 for setting the mask fit location to the border of a CG
element to potentially skip large amounts of processing in fitting
masks in subsequent frames to reshape the edges to align with a
photographic element.
1. A combination of edge detection algorithms such as standard
Laplacian filters and pattern recognition routines
2. Automatic or assisted closing of regions
3. Automatic seed fill of selected regions
4. Bimodal luminance detection for light or dark regions (see the sketch following this list)
5. An operator-assisted sliding scale and other tools create a
"best fit" distribution index corresponding to the dynamic range of
the underlying pixels as well as the underlying luminance values,
pattern and weighted variables
6. Subsequent analysis of underlying gray scale, luminance, area,
pattern and multiple weighting characteristics relative to
immediately surrounding areas creating a unique
determination/discrimination set called a Detector File.
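
As a rough reading of the bimodal luminance detection in item 4 above (see FIGS. 40A-C for the actual embodiment), the pixels inside an operator-selected region can be split into light and dark populations about a threshold and the chosen population merged into the working mask. The mean-based threshold below is an assumption for the sketch, not the patent's method.

```python
import numpy as np

def bimodal_region_mask(gray_region, keep='dark'):
    """Split the pixels inside an operator-selected region into light
    and dark populations about the region's mean luminance and return
    the chosen population as a mask to be merged into the working mask."""
    threshold = gray_region.mean()
    if keep == 'dark':
        return gray_region <= threshold
    return gray_region > threshold
```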
In the pre-production key frame phase--The composited single,
design motion database described above is presented along with all
subsequent motion inclusive of selected key frame motion objects.
All motion composites can be toggled on and off within the
background or viewed in motion within the background by turning
each successive motion composite on and off sequentially.
Key Frame Motion Object Creation: The operator windows all masked
regions of interest on the design frame in succession and directs
the computer by various pointing instruments and routines to the
corresponding location (regions of interest) on selected key frame
motion objects within the visual database thereby reducing the area
on which the computer must operate (i.e., the operator creates a
vector from the design frame moving object to each subsequent key
frame moving object following a close approximation to the center
of the region of interest represented within the visual database of
the key frame moving object. This operator-assisted method
restricts the required detection operations that must be performed
by the computer in applying masks to the corresponding regions of
interest in the raw frames).
In the production phase--The composited key frame motion object
database described above is presented along with all subsequent
motion inclusive of fully masked selected key frame motion objects.
As above, all motion composites can be toggled on and off within
the background or sequentially turned on and off in succession
within the background to simulate actual motion. In addition, all
masked regions (regions of interest) can be presented in the
absence of their corresponding motion objects. In such cases the
one-bit color masks are displayed as either translucent or opaque
arbitrary colors.
During the production process and under operator visual control,
each region of interest within subsequent motion object frames,
between two key motion object frames undergoes a computer masking
operation. The masking operation involves a comparison of the masks
in a preceding motion object frame with the new or subsequent
Detector File operation and underlying parameters (i.e., mask
dimensions, gray scale values and multiple weighting factors that
lie within the vector of parameters in the subsequent key frame
motion object) in the successive frame. This process is aided by
the windowing or pointing (using various pointing instruments) and
vector application within the visual database. If the values within
an operator-assisted detected region of the subsequent motion
object fall within the range of the corresponding region of the
preceding motion object, relative to the surrounding values, and if
those values fall along a trajectory of values (vectors)
anticipated by a comparison of the first key frame and the second
key frame, then the computer will determine a match and will attempt
a best fit.
The uncompressed, high resolution images all reside at the server
level; all subsequent masking operations on the regions of interest
are displayed on the compressed composited frame in display memory
or on a tiled, compressed frame in display memory so that the
operator can determine correct tracking and matching of regions. A
zoomed region of interest window showing the uncompressed region is
displayed on the screen to determine visually the region of
interest best fit. This high-resolution window is also capable of
full motion viewing so that the operator can determine whether the
masking operation is accurate in motion.
In a first embodiment as shown in FIG. 1, a plurality of feature
film or television film frames 14a-n represent a scene or cut
in which there is a single instance or perspective of a background
16 (FIG. 3). In the scene 10 shown, several actors or motion
elements 18', 18'' and 18''' are moving within an outdoor stage and
the camera is performing a pan left. FIG. 1 shows selected samples
of the 120 total frames 14 making up the 5-second pan.
FIG. 2 shows an isolated background 16, a processed scene from the
plurality of frames 14a-n represented in FIG. 1, in which all motion
elements 18 have been removed using various subtraction and
differencing techniques. The separate frames that created the pan are combined
into a visual database in which unique and common pixels from each
of the 120 frames 14 composing the original pan are represented in
the single composite background image 12 shown in FIG. 3. The
single background image 12 is then used to create a background mask
overlay 20 representing designer selected color lookup tables in
which dynamic pixel colors automatically compensate or adjust for
moving shadows and other changes in luminance. For depth projects,
any object in the background may be assigned any depth. A variety
of tools may be utilized to perform the assignment of depth
information to any portion of the background including paint tools,
geometric icon based tools that allow setting a contour depth to an
object, or text field inputs to allow for numeric inputs. The
composite background shown in FIG. 2 for example may also have a
ramp function assigned to allow for a nearer depth to be assigned
to the left portion of the scene and a linear increase in depth to
the right of the image to be automatically assigned. See also FIGS.
42-70.
In one illustrative embodiment of this invention, operator assisted
and automated operations are used to detect obvious anchor points
represented by clear edge-detected intersects and other contiguous
edges in each frame 14 making up the single composite image 12 and
overlaid mask 20. These anchor points are also represented within
the composite image 12 and are used to aid in the correct
assignment of the mask to each frame 14 represented by the single
composite image 12.
Anchor points and objects and/or areas that are clearly defined by
closed or nearly closed edges are designated as a single mask area
and given a single lookup table. Within those clearly delineated
regions polygons are created of which anchor points are dominant
points. Where there is no clear edge detected to create a perfectly
closed region, polygons are generated using the edge of the applied
mask.
The resulting polygon mesh includes the interior of anchor point
dominant regions plus all exterior areas between those regions.
Pattern parameters created by the distribution of luminance within
each polygon are registered in a database for reference when
corresponding polygonal addresses of the overlying masks are
applied to the appropriate addresses of the frames that were used
to create the composite single image 12.
In FIG. 3, a representative sample of each motion object (M-Object)
18 in the scene 10 receives a mask overlay that represents designer
selected color lookup tables/depth assignments in which dynamic
pixel colors automatically compensate or adjust for moving shadows
and other changes in luminance as the M-Object 18 moves within the
scene 10. The representative samples are each considered Key
M-Objects 18 that are used to define the underlying patterns,
edges, grouped luminance characteristics, etc., within the masked
M-Object 18. These characteristics are used to translate the design
masks from one Key M-Object 18a to subsequent M-Objects 18b along a
defined vector of parameters leading to Key M-Object 18c, each
Subsequent M-Object becoming the new Key M-Object in succession as
masks are applied. As shown, Key M-Object 18a may be assigned a
depth of 32 feet from the camera capture point while Key M-Object
18c may be assigned a depth of 28 feet from the camera capture
point. The various depths of the object may be "tweened" between
the various depth points to allow for realistic three-dimensional
motion to occur within the cut without for example requiring wire
frame models of all of the objects in the objects in a frame.
As with the background operations above, operator assisted and
automated operations are used to detect obvious anchor points
represented by clear edge detected intersects and other contiguous
edges in each motion object used to create a keyframe.
Anchor points and specific regions of interest within each motion
object that are clearly defined by closed or nearly closed edges
are designated as a single mask area and given a single lookup
table. Within those clearly delineated regions, polygons are
created of which anchor points are dominant points. Where there is
no clear edge detected to create a perfectly closed region,
polygons are generated using the edge of the applied mask.
The resulting polygon mesh includes the interior of the anchor
point dominant regions plus all exterior areas between those
regions.
Pattern parameters created by the distribution of luminance values
within each polygon are registered in a database for reference when
corresponding polygonal addresses of the overlying masks are
applied to the appropriate addresses of the frames that were used
to create the composite single frame 12.
The greater the polygon sampling the more detailed the assessment
of the underlying luminance values and the more precise the fit of
the overlying mask.
Subsequent or in-between motion key frame objects 18 are processed
sequentially. The group of masks comprising the motion key frame
object remains in its correct address location in the subsequent
frame 14 or in the subsequent instance of the next motion object
18. The mask is shown as an opaque or transparent color. An
operator indicates each mask in succession with a mouse or other
pointing device, along with its corresponding location in the
subsequent frame and/or instance of the motion object. The computer
then uses the prior anchor point and corresponding polygons
representing both underlying luminance texture and mask edges to
create a best fit to the subsequent instance of the motion
object.
The next instance of the motion object 18 is operated upon in the
same manner until all motion objects 18 in a cut 10 and/or scene
are completed between key motion objects.
In FIG. 4, all mask elements of the scene 10 are then rendered to
create a fully colored and/or depth enhanced frame in which
M-Object 18 masks are applied to each appropriate frame in the
scene followed by the background mask 20, which is applied only
where there is no pre-existing mask in a Boolean manner. Foreground
elements are then applied to each frame 14 according to a
pre-programmed priority set. Aiding the accurate application of
background masks 20 are vector points which are applied by the
designer to the visual database at the time of masking where there
are well defined points of reference such as edges and/or distinct
luminance points. These vectors create a matrix of reference points
assuring accuracy of rendering masks to the separate frames that
compose each scene. The applied depths of the various objects
determine the amount of horizontal translation applied when
generating left and right viewpoints as utilized in
three-dimensional viewing as one skilled in the art will
appreciate. In one or more embodiments of the invention, the
desired objects may be dynamically displayed while being shifted so
that an operator may set and observe a realistic depth. In other embodiments of
the invention, the depth value of an object determines the
horizontal shift applied as one skilled in the art will recognize
and which is taught in at least U.S. Pat. No. 6,031,564, to Ma et
al., the specification of which is hereby incorporated herein by
reference.
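
One simple, purely illustrative way to turn an assigned depth into a horizontal pixel translation is a linear mapping around a convergence depth, clamped to a maximum shift; the actual relation used in production (and the one taught in U.S. Pat. No. 6,031,564 referenced above) may differ, so the constants and function name below are assumptions only.

```python
def pixel_shift_for_depth(depth, convergence_depth=30.0, max_shift_px=20.0):
    """Map an object's assigned depth (in scene units, e.g. feet) to a
    signed horizontal pixel translation. Objects at the convergence
    depth get no shift; nearer objects shift one way, farther objects
    the other, clamped to +/- max_shift_px."""
    shift = max_shift_px * (1.0 - depth / convergence_depth)
    return max(-max_shift_px, min(max_shift_px, shift))

# The left and right viewpoints translate the masked pixels in opposite
# directions by half of the computed shift each.
left_shift = pixel_shift_for_depth(28.0) / 2.0
right_shift = -left_shift
```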
The operator employs several tools to apply masks to successive
movie frames.
Display: A key frame that includes all motion objects for that
frame is fully masked and loaded into the display buffer along with
a plurality of subsequent frames in thumbnail format; typically 2
seconds or 48 frames.
FIGS. 5A and 5B show a series of sequential frames 14a-n loaded
into display memory in which one frame 14 is fully masked with the
background (key frame) and ready for mask propagation to the
subsequent frames 14 via automatic mask fitting methods.
All frames 14 along with associated masks and/or applied color
transforms/depth enhancements can also be displayed sequentially in
real-time (24 frames/sec) using a second (child) window to
determine if the automatic masking operations are working
correctly. In the case of depth projects, stereoscopic glasses or
red/blue anaglyph glasses may be utilized to view both viewpoints
corresponding to each eye. Any type of depth viewing technology may
be utilized to view depth enhanced images, including video displays
that require no stereoscopic glasses yet which utilize more than
two image pairs, which may be created utilizing embodiments of the
invention.
FIGS. 6A and 6B show the child window displaying an enlarged and
scalable single image of the series of sequential images in display
memory. The Child window enables the operator to manipulate masks
interactively on a single frame or in multiple frames during real
time or slowed motion.
Mask Modification: Masks can be copied to all or selected frames
and automatically modified in thumbnail view or in the preview
window. In the preview window mask modification takes place on
either individual frames in the display or on multiple frames
during real-time motion.
Propagation of Masks to Multiple Sequential Frames in Display
Memory: Key Frame masks of foreground motion objects are applied to
all frames in the display buffer using various copy functions:
Copy all masks in one frame to all frames;
Copy all masks in one frame to selected frames;
Copy selected mask or masks in one frame to all frames;
Copy selected mask or masks in one frame to selected frames;
and
Create masks generated in one frame with immediate copy at the same
addresses in all other frames.
Referring now to FIGS. 7A and 7B, a single mask (flesh) is
propagated automatically to all frames 14 in the display memory.
The operator could designate selective frames to apply the selected
mask or indicate that it is applied to all frames 14. The mask is a
duplication of the initial mask in the first fully masked frame.
Modifications of that mask occur only after they have been
propagated.
As shown in FIG. 8, all masks associated with the motion object are
propagated to all sequential frames in display memory. The images
show the displacement of the underlying image data relative to the
mask information.
None of the propagation methods listed above actively fit the masks
to objects in the frames 14. They only apply the same mask shape
and associated color transform information from one frame,
typically the key frame to all other frames or selected frames.
Masks are adjusted to compensate for object motion in subsequent
frames using various tools based on luminance, pattern and edge
characteristics of the image.
Automatic Mask Fitting: Successive frames of a feature film or TV
episode exhibit movement of actors and other objects. These objects
are designed in a single representative frame within the current
embodiment such that operator selected features or regions have
unique color transformations identified by unique masks, which
encompass the entire feature. The purpose of the mask-fitting tool
is to provide an automated means for correct placement and
reshaping of each mask region of interest (ROI) in successive
frames such that the mask accurately conforms to the correct
spatial location and two dimensional geometry of the ROI as it
displaces from the original position in the single representative
frame. This method is intended to permit propagation of a mask
region from an original reference or design frame to successive
frames, automatically enabling the mask to adjust its shape and
location to the displacement of the associated underlying image
feature in each frame. For computer-generated elements, the associated masks are
digitally created and can be assumed to be accurate throughout the
scene and thus the outlines and depths of the computer-generated
areas may be ignored for automatic mask fitting or reshaping
operations. Elements that border these objects, may thus be more
accurately reshaped since the outlines of the computer-generated
elements are taken as correct. Hence, even for computer-generated
elements having the same underlying gray scale of a contiguous
motion or background element, the shape of the mask at the junction
can be taken to be accurate even though there is no visual
difference at the junction. Hence, whenever an automatically fitted
mask shares a border with a computer-generated element
mask, the computer-generated element mask can be utilized to define
the border of the operator-defined mask as per step 3710 of FIG.
37C. This saves processing time since automatic mask fitting in a
scene with numerous computer-generated element masks can be
minimized.
The method for automatically modifying both the location and
correctly fitting all masks in an image to compensate for movement
of the corresponding image data between frames involves the
following:
Set Reference Frame Mask and Corresponding Image Data:
1. A reference frame (frame 1) is masked by an operator using a
variety of means such as paint and polygon tools so that all
regions of interest (i.e., features) are tightly covered.
2. The minimum and maximum x,y coordinate values of each masked
region are calculated to create rectangular bounding boxes around
each masked region encompassing all underlying image pixels of each
masked region.
3. A subset of pixels is identified for each region of interest
within its bounding rectangle (e.g., every 10th pixel).
Copy Reference Frame Mask and Corresponding Image Data To All
Subsequent Frames: The masks, bounding boxes and corresponding
subset of pixel locations from the reference frame are copied over
to all subsequent frames by the operator.
Approximate Offset of Regions Between Reference Frame and the Next
Subsequent Frame:
1. Fast Fourier Transforms (FFTs) are calculated to approximate image
data displacements between frame 1 and frame 2.
2. Each mask in frame 2, with its accompanying bounding box, is
moved to compensate for the displacement of corresponding image
data from frame 1 using the FFT calculation.
3. The bounding box is augmented by an additional margin around the
region to accommodate other motion and shape morphing effects.
Fit Masks to the New Location:
1. Using the offset vector determined by the FFT, a gradient
descent to the minimum error is calculated in the image data
underlying each mask by:
2. Creating a fit box around each pixel within the subset of the
bounding box;
3. Calculating a weighted index of all pixels within the fit box
using a bilinear interpolation method; and
4. Determining the offset and best fit to each subsequent frame
using Gradient Descent calculations to fit the mask to the desired
region.
Mask fit initialization: An operator selects image features in a
single selected frame of a scene (the reference frame) and creates
masks which contain all color transforms (color lookup tables) for
the underlying image data for each feature. The selected image
features that are identified by the operator have well-defined
geometric extents which are identified by scanning the features
underlying each mask for minimum and maximum x, y coordinate
values, thereby defining a rectangular bounding box around each
mask.
The Fit Grid used for Fit Grid Interpolation: For optimization
purposes, only a sparse subset of the relevant mask-extent region
pixels within each bounding box are fit with the method; this
subset of pixels defines a regular grid in the image, as labeled by
the light pixels of FIG. 9A.
The "small dark" pixels shown in FIG. 9B are used to calculate a
weighed index using bilinear interpolation. The grid spacing is
currently set at 10 pixels, so that essentially no more than 1 in
50 pixels are presently fit with a gradient descent search. This
grid spacing could be a user controllable parameter.
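
The bounding-box and fit-grid construction can be sketched as follows: the mask's min/max x,y extents define the box, and every Nth masked pixel inside it becomes a fit center. The spacing value mirrors the 10-pixel grid mentioned above; the function names are illustrative assumptions.

```python
import numpy as np

def mask_bounding_box(mask):
    """Return the min/max x,y extents of a mask region."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

def fit_grid_points(mask, spacing=10):
    """Return the sparse subset of masked pixels lying on a regular
    grid inside the mask's bounding box; only these points are fit
    with the gradient descent search, and the remaining mask pixels
    are interpolated from the fitted grid."""
    x0, y0, x1, y1 = mask_bounding_box(mask)
    points = []
    for y in range(y0, y1 + 1, spacing):
        for x in range(x0, x1 + 1, spacing):
            if mask[y, x]:
                points.append((x, y))
    return points
```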
Fast Fourier Transform (FFT) to Estimate Displacement Values: Masks
with corresponding rectangular bounding boxes and fit grids are
copied to subsequent frames. Forward and inverse FFTs are
calculated between the reference frame and the next subsequent
frame to determine the x,y displacement values of image features
corresponding to each mask and bounding box. This method generates
a correlation surface, the largest value of which provides a "best
fit" position for the corresponding feature's location in the
search image. Each mask and bounding box is then adjusted within
the second frame to the proper x,y locations.
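
A plain phase-correlation reading of this FFT step is sketched below: the peak of the normalized cross-correlation surface between the reference and search regions gives the x,y displacement used to reposition each mask and bounding box. This is an illustration of the general technique under stated assumptions (equal-sized regions, integer shifts), not the exact correlation used in the embodiment.

```python
import numpy as np

def fft_displacement(reference_box, search_box):
    """Estimate the (dx, dy) displacement of the image feature in
    `search_box` relative to `reference_box` from the peak of the FFT
    cross-correlation surface. Both boxes must have the same shape."""
    f_ref = np.fft.fft2(reference_box)
    f_srch = np.fft.fft2(search_box)
    spectrum = f_ref * np.conj(f_srch)
    spectrum /= np.abs(spectrum) + 1e-8          # normalize (phase correlation)
    surface = np.fft.ifft2(spectrum).real
    dy, dx = np.unravel_index(np.argmax(surface), surface.shape)
    h, w = reference_box.shape
    if dy > h // 2:                               # wrap into signed range
        dy -= h
    if dx > w // 2:
        dx -= w
    return dx, dy
```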
Fit Value Calculation (Gradient Descent Search): The FFT provides a
displacement vector, which directs the search for ideal mask
fitting using the Gradient Descent Search method. Gradient descent
search requires that the translation or offset be less than the
radius of the basin surrounding the minimum of the matching error
surface. A successful FFT correlation for each mask region and
bounding box will create the minimum requirements.
Searching for a Best Fit on the Error Surface: An error surface
calculation in the Gradient Descent Search method involves
calculating mean squared differences of pixels in the square fit
box centered on reference image pixel (x0, y0), between the
reference image frame and the corresponding (offset) location (x,
y) on the search image frame, as shown in FIGS. 10A, B, C and
D.
Corresponding pixel values in the two (reference and search) fit
boxes are subtracted, squared, summed/accumulated, and the square
root of the resultant sum is finally divided by the number of
pixels in the box (#pixels = height.times.width = height^2) to
generate the root mean square fit difference ("Error") value at the
selected fit search location:
Error(x0,y0;x,y)=sqrt{sum over i,j of (reference box(x0,y0)pixel[i,j]-search box(x,y)pixel[i,j])^2}/(height^2)
Fit Value Gradient: The displacement vector data derived from the
FFT creates a search fit location, and the error surface
calculation begins at that offset position, proceeding down
(against) the gradient of the error surface to a local minimum of
the surface, which is assumed to be the best fit. This method finds
best fit for each next frame pixel or groups of pixels based on the
previous frame, using normalized squared differences, for instance
in a 10×10 box and finding a minimum down the mean squared
difference gradients. This technique is similar to a cross
correlation but with a restricted sampling box for the calculation.
In this way the corresponding fit pixel in the previous frame can
be checked for its mask index, and the resulting assignment is
complete.
FIGS. 11A, B and C show a second search box derived from a descent
down the error surface gradient (evaluated separately), for which
the evaluated error function is reduced, possibly minimized, with
respect to the original reference box (evident from visual
comparison of the boxes with the reference box in FIGS. 10A, B, C
and D).
The error surface gradient is calculated as per definition of the
gradient. Vertical and horizontal error deviations are evaluated at
four positions near the search box center position, and combined to
provide an estimate of the error gradient for that position. The
gradient component evaluation is explained with the help of FIG.
12.
The gradient of a surface S at coordinate (x, y) is given by the
directional derivatives of the surface:
gradient(x,y)=[dS(x,y)/dx,dS(x,y)/dy],
which for the discrete case of the digital image is provided by:
gradient(x,y)=[(Error(x+dx,y)-Error(x-dx,y))/(2*dx),(Error(x,y+dy)-Error(x,y-dy))/(2*dy)]
where dx, dy are one-half the box-width or box-height, also defined
as the fit-box "box-radius":
box-width = box-height = 2 × box-radius + 1
Note that with increasing box-radius, the fit-box dimensions
increase and consequently the size and detail of an image feature
contained therein increase as well; the calculated fit accuracy is
therefore improved with a larger box and more data to work with,
but the computation time per fit (error) calculation increases as
the square of the radius. If a computer-generated element mask area
pixel is found at a particular pixel x, y location, that location
is taken to be the edge of the overlying operator-defined mask, and
mask fitting continues at other pixel locations until all pixels of
the mask are checked.
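The descent itself can be sketched as follows, reusing the fit_error function from the earlier sketch and starting from the FFT-estimated offset (the step size, iteration limit, and simultaneous x/y update are simplifying assumptions, not the patented procedure):

    def descend_error_surface(ref_img, search_img, x0, y0, x_start, y_start,
                              box_radius=5, step=1, max_iters=50):
        """Walk down the local error-surface gradient from the FFT-estimated offset
        (x_start, y_start) until a local minimum is reached."""
        err = lambda px, py: fit_error(ref_img, search_img, x0, y0, px, py, box_radius)
        x, y = x_start, y_start
        best = err(x, y)
        for _ in range(max_iters):
            # Central-difference gradient of the error surface at (x, y).
            gx = (err(x + step, y) - err(x - step, y)) / (2 * step)
            gy = (err(x, y + step) - err(x, y - step)) / (2 * step)
            nx = x - step if gx > 0 else x + step      # move against the gradient
            ny = y - step if gy > 0 else y + step
            candidate = err(nx, ny)
            if candidate >= best:                      # local minimum: take this as best fit
                break
            x, y, best = nx, ny, candidate
        return x, y, best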
Previous vs. Propagated Reference Images: The reference image
utilized for mask fitting is usually an adjacent frame in a film
image-frame sequence. However, it is sometimes preferable to use an
exquisitely fit mask as a reference image (e.g. a key frame mask,
or the source frame from which mask regions were
propagated/copied). The present embodiment provides a switch to
disable "adjacent" reference frames, using the propagated masks of
the reference image if that frame is defined by a recent
propagation event.
The process of mask fitting: In the present embodiment the operator
loads n frames into the display buffer. One frame includes the
masks that are to be propagated and fitted to all other frames. All
or some of the mask(s) are then propagated to all frames in the
display buffer. Since the mask-fitting algorithm references the
preceding frame or the first frame in the series for fitting masks
to the subsequent frame, the first frame masks and/or preceding
masks must be tightly applied to the objects and/or regions of
interest. If this is not done, mask errors will accumulate and mask
fitting will break down. The operator displays the subsequent
frame, adjusts the sampling radius of the fit and executes a
command to calculate mask fitting for the entire frame. The
execution command can be a keystroke or mouse-hotkey command.
As shown in FIG. 13, a propagated mask is placed in the first
sequential instance, where there is only a small discrepancy
between the underlying image data and the mask data; even so, the
dress mask and hand mask can clearly be seen to be off relative to
the image data.
FIG. 14 shows that by using the automatic mask fitting routine, the
mask data adjusts to the image data by referencing the underlying
image data in the preceding image.
In FIG. 15, the mask data in later images within the sequence show
marked discrepancy relative to the underlying image data. Eye
makeup, lipstick, blush, hair, face, dress and hand image data are
all displaced relative to the mask data.
As shown in FIG. 16, the mask data is adjusted automatically based
on the underlying image data from the previous mask and underlying
image data. In this figure, the mask data is shown with random
colors to show the regions that were adjusted automatically based
on underlying pattern and luminance data. The blush and eye makeup
did not have edge data to reference and were auto-adjusted on the
basis of luminance and grayscale pattern.
In FIG. 17, mask data from FIG. 16 is shown with appropriate color
transforms after whole frame automatic mask fitting. The mask data
is adjusted to fit the underlying luminance pattern based on data
from the previous frame or from the initial key frame.
Mask Propagation With Bezier and Polygon Animation Using Edge Snap:
Masks for motion objects can be animated using either Bezier curves
or polygons that enclose a region of interest. A plurality of
frames are loaded into display memory and either Bezier points and
curves or polygon points are applied close to the region of
interest where the points automatically snap to edges detected
within the image data. Once the object in frame one has been
enclosed by the polygon or Bezier curves the operator adjusts the
polygon or Bezier in the last frame of the frames loaded in display
memory. The operator then executes a fitting routine, which snaps
the polygons or Bezier points plus control curves to all
intermediate frames, animating the mask over all frames in display
memory. The polygon and Bezier algorithms include control points
for rotation, scaling and move-all to handle camera zooms, pans and
complex camera moves.
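An illustrative sketch of the two operations involved, keyframe-to-keyframe tweening of the control points and snapping a point to the nearest detected edge (the edge-pixel array is assumed to come from any edge detector; the linear interpolation and all names are illustrative simplifications):

    import numpy as np

    def tween_polygon(points_first, points_last, n_frames):
        """Linearly interpolate polygon control points between two key frames.
        points_first/points_last: arrays of shape (n_points, 2)."""
        first = np.asarray(points_first, dtype=np.float64)
        last = np.asarray(points_last, dtype=np.float64)
        frames = []
        for i in range(n_frames):
            t = i / max(n_frames - 1, 1)            # 0.0 at first frame, 1.0 at last
            frames.append((1.0 - t) * first + t * last)
        return frames

    def snap_to_edge(point, edge_pixels, max_dist=5.0):
        """Snap a point to the nearest detected edge pixel within max_dist.
        edge_pixels: array of (x, y) coordinates from any edge detector."""
        d = np.linalg.norm(edge_pixels - point, axis=1)
        j = np.argmin(d)
        return edge_pixels[j] if d[j] <= max_dist else point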
In FIG. 18, polygons are used to outline a region of interest for
masking in frame one. The square polygon points snap to the edges
of the object of interest. Using a Bezier curve the Bezier points
snap to the object of interest and the control points/curves shape
to the edges.
As disclosed in FIG. 19, the entire polygon or Bezier curve is
carried to a selected last frame in the display memory where the
operator adjusts the polygon points or Bezier points and curves
using the snap function, which automatically snaps the points and
curves to the edges of the object of interest.
As shown in FIG. 20, if there is a marked discrepancy between the
points and curves in frames between the two frames where there was
an operator interactive adjustment, the operator will further
adjust a frame in the middle of the plurality of frames where there
is maximum error of fit.
As shown in FIG. 21, when it is determined that the polygons or
Bezier curves are correctly animating between the two adjusted
frames, the appropriate masks are applied to all frames. In these
figures, the arbitrary mask color is seen filling the polygon or
Bezier curves.
FIG. 22 shows the resulting masks from a polygon or Bezier
animation with automatic point and curve snap to edges. The brown
masks are the color transforms and the green masks are the
arbitrary color masks. For depth projects, areas that have been
depth assigned may be of one color while those areas that have yet
to be depth assigned may be of another color for example.
Colorization/Depth Enhancement of Backgrounds in feature films and
television episodes: The process of applying mask information to
sequential frames in a feature film or television episode is known,
but is laborious for a number of reasons. In all cases, these
processes involve the correction of mask information from frame to
frame to compensate for the movement of underlying image data. The
correction of mask information not only includes the re-masking of
actors and other moving objects within a scene or cut but also
correction of the background and foreground information that the
moving objects occlude or expose during their movement. This has
been particularly difficult in camera pans where the camera follows
the action to the left, right, up or down in the scene cut. In such
cases the operator must not only correct for movement of the motion
object, the operator must also correct for occlusion and exposure
of the background information plus correct for the exposure of new
background information as the camera moves to new parts of the
background and foreground. Typically these instances greatly
increase the time and difficulty factor of colorizing a scene cut
due to the extreme amount of manual labor involved. Embodiments of
the invention include a method and process for automatically
colorizing/depth enhancing a plurality of frames in scene cuts
that include complex camera movements as well as scene cuts where
there is camera weave or drifting camera movement that follows
erratic action of the motion objects.
Camera Pans: For a pan camera sequence, the background associated
with non-moving objects in a scene forms a large part of the
sequence. In order to colorize/depth enhance a large number of
background objects for a pan sequence, a mosaic that includes the
background objects for an entire pan sequence with moving objects
removed is created. This task is accomplished with a pan background
stitcher tool. Once a background mosaic of the pan sequence is
generated, it can be colorized/depth enhanced once and applied to
the individual frames automatically, without having to manually
colorize/depth assign the background objects in each frame of the
sequence.
The pan background stitcher tool generates a background image of a
pan sequence using two general operations. First, the movement of
the camera is estimated by calculating the transformation needed to
align each frame in the sequence with the previous frame. Since
moving objects form a large portion of cinematic sequences,
techniques are used that minimize the effects of moving objects on
the frame registration. Second, the frames are blended into a final
background mosaic by interactively selecting two pass blending
regions that effectively remove moving objects from the final
mosaic.
Background composite output data includes a greyscale (or, for
depth projects, possibly color) image file of standard digital
format, such as a TIFF image file (bkg.*.tif), comprised of a background image
of the entire pan shot, with the desired moving objects removed,
ready for color design/depth assignments using the masking
operations already described, and an associated background text
data file needed for background mask extraction after associated
background mask/colorization/depth data components (bkg.*.msk,
bkg.*.lut, . . . ) have been established. The background text data
file provides filename, frame position within the mosaic, and other
frame-dimensioning information for each constituent (input) frame
associated with the background, with the following per line (per
frame) content: Frame-filename, frame-x-position, frame-y-position,
frame-width, frame-height, frame-left-margin-x-max,
frame-right-margin-x-min. Each of the data fields is an integer
except for the first (frame-filename), which is a string.
Generating Transforms: In order to generate a background image for
a pan camera sequence, the motion of the camera first is
calculated. The motion of the camera is determined by examining the
transformation needed to bring one frame into alignment with the
previous frame. By calculating the movement for each pair of
consecutive frames in the sequence, a map of transformations giving
each frame's relative position in the sequence can be
generated.
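A minimal sketch of turning the per-pair translations into such a map of relative frame positions (assuming the first frame is the origin; the names are illustrative):

    def accumulate_positions(pair_offsets):
        """Turn per-pair (dx, dy) translations into absolute frame positions,
        with frame 0 at the origin of the mosaic coordinate system."""
        positions = [(0, 0)]
        x, y = 0, 0
        for dx, dy in pair_offsets:      # one (dx, dy) per consecutive frame pair
            x += dx
            y += dy
            positions.append((x, y))
        return positions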
Translation Between Image Pairs: Most image registration techniques
use some form of intensity correlation. Unfortunately, methods
based on pixel intensities will be biased by any moving objects in
the scene, making it difficult to estimate the movement due to
camera motion. Feature based methods have also been used for image
registration. These methods are limited by the fact that most
features occur on the boundaries of moving objects, also giving
inaccurate results for pure camera movement. Manually selecting
feature points for a large number of frames is also too costly.
The registration method used in the pan stitcher uses properties of
the Fourier transform in order to avoid bias towards moving objects
in the scene. Automatic registration of frame pairs is calculated
and used for the final background image assembly.
Fourier Transform of an Image Pair: The first step in the image
registration process consists of taking the Fourier transform of
each image. The camera motion can be estimated as a translation.
The second image is translated by a certain amount given by:
I2(x,y) = I1(x - x0, y - y0). (1)
Taking the Fourier transform of each image in the pair yields the
following relationship:
F2(α,β) = e^(-j2π(αx0 + βy0)) F1(α,β). (2)
Phase Shift Calculation: The next step involves calculating the
phase shift between the images. Doing this results in an expression
for the phase shift in terms of the Fourier transform of the first
and second image:
e^(-j2π(αx0 + βy0)) = F2(α,β) F1*(α,β) / |F2(α,β) F1*(α,β)| (3)
Inverse Fourier Transform
Taking the inverse Fourier transform of the phase shift calculation
given in (3) results in a delta function whose peak is located at
the translation of the second image:
δ(x - x0, y - y0) = F^-1{ e^(-j2π(αx0 + βy0)) } (4)
Peak Location: The two-dimensional surface that results from (4)
will have a maximum peak at the translation point from the first
image to the second image. By searching for the largest value in
the surface, it is simple to find the transform that represents the
camera movement in the scene. Although there will be spikes present
due to moving objects, the dominant motion of the camera should
represent the largest peak value. This calculation is performed for
every consecutive pair of frames in the entire pan sequence.
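A compact NumPy sketch of equations (1)-(4), normalizing the cross-power spectrum and locating the correlation peak (the wrap-around handling and the function name are illustrative assumptions, not the stitcher's actual code):

    import numpy as np

    def phase_correlate(img1, img2):
        """Estimate the (dx, dy) translation between two grayscale frames using
        the normalized cross-power spectrum, per equations (1)-(4)."""
        f1 = np.fft.fft2(img1.astype(np.float64))
        f2 = np.fft.fft2(img2.astype(np.float64))
        cross = f2 * np.conj(f1)
        cross /= np.abs(cross) + 1e-12          # keep only the phase-shift term
        surface = np.abs(np.fft.ifft2(cross))   # correlation surface with a delta-like peak
        dy, dx = np.unravel_index(np.argmax(surface), surface.shape)
        # Peaks past the midpoint correspond to negative (wrapped) translations.
        if dy > surface.shape[0] // 2:
            dy -= surface.shape[0]
        if dx > surface.shape[1] // 2:
            dx -= surface.shape[1]
        return dx, dy, surface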
Dealing with Image Noise: Unfortunately, spurious results can occur
due to image noise which can drastically change the results of the
transform calculation. The pan background stitcher deals with these
outliers using two methods that detect and correct erroneous cases:
closest peak matching and interpolated positions. If these
corrections fail for a particular image pair, the stitching
application has an option to manually correct the position of any
pair of frames in the sequence.
Closest Matching Peak: After the transform is calculated for an
image pair, the percent difference between this transform and the
previous transform is determined. If the difference is higher than
a predetermined threshold, then a search for neighboring peaks is
done. If a peak is found that is a closer match and below the
difference threshold, then this value is used instead of the
highest peak value.
This assumes that for a pan camera shot, the motion will be
relatively steady, and the differences between motions for each
frame pair will be small. This corrects for the case where image
noise may cause a peak that is slightly higher than the true peak
corresponding to the camera transformation.
Interpolating Positions: If the closest matching peak calculation
fails to yield a reasonable result given by the percent difference
threshold, then the position is estimated based on the result from
the previous image pair. Again, this gives generally good results
for a steady pan sequence since the difference between consecutive
camera movements should be roughly the same. The peak correlation
values and interpolated results are shown in the stitching
application, so manual correction can be done if needed.
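A hedged sketch of this outlier handling, combining the closest-peak search with the interpolated fallback (the percent-difference measure, the search window, and all names are assumptions for illustration only):

    import numpy as np

    def correct_offset(candidate, previous, surface, threshold=0.5, search_radius=3):
        """If the candidate (dx, dy) differs too much from the previous pair's
        offset, look for a closer peak on the correlation surface; if none
        qualifies, reuse the previous offset (steady-pan assumption)."""
        def pct_diff(a, b):
            denom = max(np.hypot(b[0], b[1]), 1e-6)
            return np.hypot(a[0] - b[0], a[1] - b[1]) / denom
        if pct_diff(candidate, previous) <= threshold:
            return candidate
        h, w = surface.shape
        best, best_val = None, -np.inf
        # Search a small window of the correlation surface around the previous offset.
        for dy in range(previous[1] - search_radius, previous[1] + search_radius + 1):
            for dx in range(previous[0] - search_radius, previous[0] + search_radius + 1):
                val = surface[dy % h, dx % w]
                if val > best_val:
                    best, best_val = (dx, dy), val
        if best is not None and pct_diff(best, previous) <= threshold:
            return best
        return previous      # interpolated fallback: assume the pan motion is steady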
Generating the Background: Once the relative camera movement for
each consecutive frame pair has been calculated, the frames can be
composited into a mosaic which represents the entire background for
the sequence. Since the moving objects in the scene need to be
removed, different image blending options are used to effectively
remove the dominant moving objects in the sequence.
Assembling the Background Mosaic: First a background image buffer
is generated which is large enough to span the entire sequence. The
background can be blended together in a single pass, or if moving
objects need to be removed, a two-pass blend is used, which is
detailed below. The position and width of the blend can be edited
in the stitching application and can be set globally or
individually for each frame pair. Each blend is accumulated
into the final mosaic and then written out as a single image
file.
Two Pass Blending: The objective in two-pass blending is to
eliminate moving objects from the final blended mosaic. This can be
done by first blending the frames so the moving object is
completely removed from the left side of the background mosaic. An
example is shown in FIG. 23, where the character is removed from
the scene, but can still be seen in the right side of the
background mosaic. In the first pass blend shown in FIG. 23, the
moving character is shown on the stairs to the right.
A second background mosaic is then generated, where the blend
position and width is used so that the moving object is removed
from the right side of the final background mosaic. An example of
this is shown in FIG. 24, where the character is removed from the
scene, but can still be seen on the left side of the background
mosaic. In the second pass blend as shown in FIG. 24, the moving
character is shown on the left.
Finally, the two-passes are blended together to generate the final
blended background mosaic with the moving object removed from the
scene. The final background corresponding to FIGS. 23 and 24 is
shown in FIG. 25; as shown there, the final blended background has
the moving character removed.
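The final combination of the two passes can be sketched as a weighted column blend (grayscale mosaics of equal size are assumed; the linear ramp, blend position, and function name are illustrative, and per-frame blend regions would be handled the same way):

    import numpy as np

    def blend_two_pass(mosaic_a, mosaic_b, blend_x, blend_width):
        """Combine two mosaics: mosaic_a is clean to the left of blend_x,
        mosaic_b is clean to the right; a linear ramp hides the seam."""
        h, w = mosaic_a.shape
        weight = np.zeros(w)
        weight[:blend_x] = 1.0                                    # pure mosaic_a
        weight[blend_x:blend_x + blend_width] = np.linspace(1.0, 0.0, blend_width)
        # Columns past the ramp keep weight 0.0, i.e. pure mosaic_b.
        return weight[np.newaxis, :] * mosaic_a + (1.0 - weight)[np.newaxis, :] * mosaic_b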
In order to facilitate effective removal of moving objects, which
can occupy different areas of the frame during a pan sequence, the
stitcher application has an option to interactively set the
blending width and position for each pass and each frame
individually or globally. An example screen shot from the blend
editing tool, showing the first and second pass blend positions and
widths, can be seen in FIG. 26, which is a screen shot of the
blend-editing tool.
Background Text Data Save: An output text data file containing
parameter values relevant for background mask extraction is
generated from the initialization phase described above. As
mentioned above, each text data record includes: Frame-filename
frame-x-position frame-y-position frame-width frame-height
frame-left-margin-x-max frame-right-margin-x-min.
The output text data filename is composed from the first composite
input frame rootname by prepending the "bkg." prefix and appending
the ".txt" extension.
EXAMPLE
Representative lines of the output text data file "bkg.4.00233.txt",
which may include data from 300 or more frames making up the blended
image:
4.00233.tif 0 0 1436 1080 0 1435
4.00234.tif 7 0 1436 1080 0 1435
4.00235.tif 20 0 1436 1080 0 1435
4.00236.tif 37 0 1436 1080 0 1435
4.00237.tif 58 0 1436 1080 0 1435
Image offset information used to create the composite
representation of the series of frames is contained within a text
file associated with the composite image and used to apply the
single composite mask to all the frames used to create the
composite image.
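A minimal sketch of reading those per-frame records and cutting the matching region of the composite mask out for one constituent frame (the record layout follows the per-line format above; the dictionary keys and function names are illustrative):

    def read_background_records(path):
        """Parse the background text data file: one record per constituent frame."""
        records = []
        with open(path) as fh:
            for line in fh:
                fields = line.split()
                if len(fields) != 7:
                    continue
                name = fields[0]
                x, y, w, h, lmargin, rmargin = (int(v) for v in fields[1:])
                records.append(dict(filename=name, x=x, y=y, width=w, height=h,
                                    left_margin=lmargin, right_margin=rmargin))
        return records

    def extract_frame_mask(composite_mask, record):
        """Cut the region of the composite mask that overlays one constituent frame."""
        x, y = record['x'], record['y']
        return composite_mask[y:y + record['height'], x:x + record['width']]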
In FIG. 27, sequential frames representing a camera pan are loaded
into memory. The motion object (butler moving left to the door) has
been masked with a series of color transform information leaving
the background black and white with no masks or color transform
information applied. Alternatively for depth projects, the motion
object may be assigned a depth and/or depth shape. See FIGS.
42-70.
In FIG. 28, six representative sequential frames of the pan above
are displayed for clarity.
FIG. 29 shows the composite or montage image of the entire camera
pan that was built using phase correlation techniques. The motion
object (butler) is included as a transparency for reference by
keeping the first and last frame and averaging the phase
correlation in two directions. The single montage representation of the pan is color
designed using the same color transform masking techniques as used
for the foreground object.
FIG. 30 shows the sequence of frames in the camera pan after the
background mask color transforms from the montage have been applied
to each frame used to create the montage. The mask is applied where
there is no pre-existing mask thus retaining the motion object mask
and color transform information while applying the background
information with appropriate offsets. Alternatively for depth
projects, the left and right eye views of each frame may be shown
as pairs, or in a separate window for each eye for example.
Furthermore, the images may be displayed on a three-dimensional
viewing display as well.
FIG. 31 shows a selected sequence of frames in the pan, for
clarity, after the colorized/depth-enhanced background masks have
been automatically applied to the frames where there are no
pre-existing masks.
Static and drifting camera shots: Objects which are not moving and
changing in a film scene cut can be considered "background"
objects, as opposed to moving "foreground" objects. If a camera is
not moving throughout a sequence of frames, associated background
objects appear to be static for the sequence duration, and can be
masked and colorized only once for all associated frames. This is
the "static camera" (or "static background") case, as opposed to
the moving (e.g. panning) camera case, which requires stitching
tool described above to generate a background composite.
Cuts or frame sequences involving little or no camera motion
provide the simplest case for generating frame-image background
"composites" useful for cut background colorization. However, since
even a "static" camera experiences slight vibrations for a variety
of reasons, the static background composition tool cannot assume
perfect pixel alignment from frame-to-frame, requiring an
assessment of inter-frame shifts, accurate to 1 pixel, in order to
optimally associate pixels between frames prior to adding their
data contribution into the composite (an averaged value). The
Static Background Composite tool provides this capability,
generating all the data necessary to later colorize and extract
background colorization information for each of the associated
frames.
Moving foreground objects such as actors, etc., are masked leaving
the background and stationary foreground objects unmasked. Wherever
the masked moving object exposes the background or foreground the
instance of background and foreground previously occluded is copied
into the single image with priority and proper offsets to
compensate for movement. The offset information is included in a
text file associated with the single representation of the
background so that the resulting mask information can be applied to
each frame in the scene cut with proper mask offsets.
Background composite output data uses a greyscale TIFF image file
(bkg.*.tif) that includes averaged input background pixel values
lending itself to colorization/depth enhancement, and an associated
background text data file required for background mask extraction
after associated background mask/colorization data/depth
enhancement components (bkg.*.msk, bkg.*.lut, . . . ) have been
established. Background text data provides filename, mask-offset,
and other frame-dimensioning information for each constituent
(input) frame associated with the composite, with the following per
line (per frame) format: Frame-filename frame-x-offset
frame-y-offset frame-width frame-height frame-left-margin-x-max
frame-right-margin-x-min. Each of these data fields is an integer
except for the first (frame-filename), which is a string.
Initialization: Initialization of the static background composition
process involves initializing and acquiring the data necessary to
create the composited background image-buffer and -data. This
requires a loop over all constituent input image frames. Before any
composite data initialization can occur, the composite input frames
must be identified, loaded, and have all foreground objects
identified/colorized (i.e. tagged with mask labels, for exclusion
from composite). These steps are not part of the static background
composition procedure, but occur prior to invoking the composite
tool after browsing a database or directory tree, selecting and
loading relevant input frames, painting/depth assigning the
foreground objects.
Get Frame Shift: Adjacent frames' image background data in a static
camera cut may exhibit small mutual vertical and horizontal
offsets. Taking the first frame in the sequence as a baseline, all
successive frames' background images are compared to the first
frame's, fitting line-wise and column-wise, to generate two
histograms of "measured" horizontal and vertical offsets, from all
measurable image-lines and -columns. The modes of these histograms
provide the most frequent (and likely) assessed frame offsets,
identified and stored in arrays DVx[iframe], DVy[iframe] per frame
[iframe]. These offset arrays are generated in a loop over all
input frames.
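As an illustrative sketch of this histogram-mode offset estimate (a circular roll stands in for a proper windowed line fit, the sign convention depends on how the offset is applied, and the names are not the tool's actual implementation), the horizontal case can be written as:

    import numpy as np

    def most_frequent_shift(baseline, frame, max_shift=8):
        """Estimate the horizontal offset between `frame` and `baseline` by
        fitting each image line separately and taking the mode (histogram peak)
        of the per-line offsets; the same procedure on columns gives DVy."""
        offsets = []
        shifts = list(range(-max_shift, max_shift + 1))
        for row_b, row_f in zip(baseline.astype(np.float64), frame.astype(np.float64)):
            errs = [np.mean((np.roll(row_f, s) - row_b) ** 2) for s in shifts]
            offsets.append(shifts[int(np.argmin(errs))])
        values, counts = np.unique(offsets, return_counts=True)
        return int(values[np.argmax(counts)])      # most frequent = most likely shift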
Get Maximum Frame Shift: While looping over input frames during
initialization to generate the DVx[ ], DVy[ ] offset array data,
the absolute maximum DVxMax, DVyMax values are found from the DVx[
], DVy[ ] values. These are required when appropriately
dimensioning the resultant background composite image to
accommodate all composited frames' pixels without clipping.
Get Frame Margin: While looping over input frames during
initialization, an additional procedure is invoked to find the
right edge of the left image margin as well as the left edge of the
right image margin. As pixels in the margins have zero or near-zero
values, the column indexes to these edges are found by evaluating
average image-column pixel values and their variations. The edge
column-indexes are stored in arrays lMarg[iframe] and rMarg[iframe]
per frame [iframe], respectively.
Extend Frame Shifts with Maximum: The Frame Shifts evaluated in the
GetFrameShift( ) procedure described are relative to the "baseline"
first frame of a composited frame sequence, whereas the sought
frame shift values are shifts/offsets relative to the resultant
background composite frame. The background composite frame's
dimensions equal the first composite frame's dimensions extended by
vertical and horizontal margins on all sides with widths DVxMax,
DVyMax pixels, respectively. Frame offsets must therefore include
margin widths relative to the resultant background frame, and
therefore need to be added, per iframe, to the calculated offset
from the first frame:
DVx[iframe] = DVx[iframe] + DVxMax
DVy[iframe] = DVy[iframe] + DVyMax
Initialize Composite Image: An image-buffer class object instance
is created for the resultant background composite. The resultant
background composite has the dimensions of the first input frame
increased by 2*DVxMax (horizontally) and 2*DVyMax (vertically)
pixels, respectively. The first input frame background image pixels
(mask-less, non-foreground pixels) are copied into the background
image buffer with the appropriate frame offset. Associated pixel
composite count buffer values are initialized to one (1) for pixels
receiving an initialization, zero (0) otherwise. See FIG. 38A for
the flow of the processing for extracting a background, which
occurs by generating a frame mask for all frames of a scene for
example. FIG. 38B illustrates the determination of the amount of
frame shift and margin that is induced, for example, by a camera pan.
The composite image is saved after determining and overlaying the
shifted images from each of the desired frames for example.
FIG. 39A shows the edgeDetection and determination of points to
snap to (1.1 and 1.2 respectively), which are detailed in FIGS. 39B
and 39C respectively and which enable one skilled in the art to
implement an image edge detection routine via Average Filter,
Gradient Filter, Fill Gradient Image and a comparison with a
Threshold. In addition, the GetSnapPoint routine of FIG. 39C shows
the determination of a NewPoint based on the BestSnapPoint as
determined by the RangeImage less than MinDistance as shown.
FIGS. 40A-C show how a bimodal threshold tool is implemented in
one or more embodiments of the invention. Creation of an image of
light and dark cursor shape is implemented with the MakeLightShape
routine wherein the light/dark values for the shape are applied
with the respective routine as shown at the end of FIG. 40A. These
routines are shown in FIGS. 40C and 40B respectively. FIGS. 41A-B
show the calculation of FitValues and gradients for use in one or
more of the above routines.
Composite Frame Loop: Input frames are composited (added)
sequentially into the resultant background via a loop over the
frames. Input frame background pixels are added into the background
image buffer with the relevant offset (DVx[iframe], DVy[iframe])
for each frame, and associated pixel composite count values are
incremented by one (1) for pixels receiving a composite addition (a
separate composite count array/buffer is provided for this). Only
background pixels, those without an associated input mask index,
are composited (added) into the resultant background; pixels with
nonzero (labeled) mask values are treated as foreground pixels and
are therefore not subject to composition into the background; thus
they are ignored. A status bar in the GUI is incremented per pass
through the input frame loop.
Composite Finish: The final step in generating the output composite
image buffer requires evaluating pixel averages which constitute
the composite image. Upon completion of the composite frame loop, a
background image pixel value represents the sum of all contributing
aligned input frame pixels. Since resultant output pixels must be
an average of these, division by a count of contributing input
pixels is required. The count per pixel is provided by the
associated pixel composite count buffer, as mentioned. All pixels
with nonzero composite counts are averaged; other pixels remain
zero.
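A compact sketch of the composite frame loop and finishing average described above (grayscale NumPy frames and integer mask arrays are assumed; the offsets are assumed to already include the DVxMax/DVyMax margins, and all names are illustrative):

    import numpy as np

    def composite_background(frames, masks, offsets, dvx_max, dvy_max):
        """Accumulate background (unmasked) pixels from each aligned frame and
        average them into a single composite image."""
        h, w = frames[0].shape
        comp = np.zeros((h + 2 * dvy_max, w + 2 * dvx_max), dtype=np.float64)
        count = np.zeros_like(comp)
        for frame, mask, (dx, dy) in zip(frames, masks, offsets):
            bg = (mask == 0)                         # nonzero mask values are foreground
            region = comp[dy:dy + h, dx:dx + w]      # view into the composite buffer
            region_count = count[dy:dy + h, dx:dx + w]
            region[bg] += frame[bg]
            region_count[bg] += 1
        nonzero = count > 0
        comp[nonzero] /= count[nonzero]              # average of contributing pixels
        return comp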
Composite Image Save: A TIFF format output gray-scale image with 16
bits per pixel is generated from composite-averaged background
image buffer. The output filename is composed from the first
composite input frame filename by pre-pending the "bkg." prefix
(and appending the usual ".tif" image extension if required), and
writing to the associated background folder at path "../Bckgrnd
Frm", if available, otherwise to the default path (same as input
frames').
Background Text Data Save: An output text data file containing
parameter values relevant for background mask extraction is
generated from the initialization phase described in (40A-C). As
mentioned in the introduction (see FIG. 39A), each text data record
consists of: Frame-filename frame-x-offset frame-y-offset
frame-width frame-height frame-left-margin-x-max
frame-right-margin-x-min.
The output text data filename is composed from the first composite
input frame rootname by prepending the "bkg." prefix and appending
the ".txt" extension, and writing to the associated background
folder at path "../Bckgrnd Frm", if available, otherwise to the
default path (same as input frames').
EXAMPLE
A complete output text data file called "bkg.02.00.06.02.txt":
C:\New_Folder\Static_Backgrounding_Test\02.00.06.02.tif 1 4 1920
1080 0 1919
C:\New_Folder\Static_Backgrounding_Test\02.00.06.03.tif 1 4 1920
1080 0 1919
C:\New_Folder\Static_Backgrounding_Test\02.00.06.04.tif 1 3 1920
1080 0 1919
C:\New_Folder\Static_Backgrounding_Test\02.00.06.05.tif 2 3 1920
1080 0 1919
C:\New_Folder\Static_Backgrounding_Test\02.00.06.06.tif 1 3 1920
1080 0 1919
Data Cleanup: Releases memory allocated to data objects used by the
static background composite procedure. These include the background
composite GUI dialog object and its member arrays DVx[ ], DVy[ ],
lMarg[ ], rMarg[ ], and the background composite image buffer
object, whose contents have previously been saved to disk and are
no longer needed.
Colorization/Depth Assignment of the Composite Background
Once the background is extracted as described above, the single
frame can be masked by an operator with color transforms and/or
depth assignments.
The offset data for the background composite is transferred to the
mask data overlaying the background such that the mask for each
successive frame used to create the composite is placed
appropriately.
The background mask data is applied to each successive frame
wherever there are no pre-existing masks (e.g. the foreground
actors).
FIG. 32 shows a sequence of frames in which all moving objects
(actors) are masked with separate color transforms/depth
enhancements.
FIG. 33 shows a sequence of selected frames for clarity prior to
background mask information. All motion elements have been fully
masked using the automatic mask-fitting algorithm.
FIG. 34 shows the stationary background and foreground information
minus the previously masked moving objects. In this case, the
single representation of the complete background has been masked
with color transforms in a manner similar to the motion objects.
Note that outlines of removed foreground objects appear truncated
and unrecognizable due to their motion across the input frame
sequence interval, i.e., the black objects in the frame represent
areas in which the motion objects (actors in this case) never
expose the background and foreground, i.e., missing background
image data 3401. The black objects are ignored for
colorization-only projects during the masking operation because the
resulting background mask is later applied to all frames used to
create the single representation of the background only where there
is no pre-existing mask. For depth related projects, the black
objects where missing background image data 3401 exists, may
artistically or realistically rendered, for example to fill in
information to be utilized in the conversion of two-dimensional
images into three-dimensional images. Since these areas are areas
where pixels may not be borrowed from other frames since they are
never exposed in a scene, drawing them or otherwise creating
believable images there, allows for all background information to
be present and used for artifact free two-dimensional to
three-dimensional conversion. For example, in order to create
artifact-free three-dimensional image pairs from a two-dimensional
image having areas that are never exposed in a scene, backgrounds
having all or enough required information for the background areas
that are always occluded may be generated. The missing background
image data 3401 may be painted, drawn, created, computer-generated
or otherwise obtained from a studio for example, so that there is
enough information in a background, including the black areas to
translate foreground objects horizontally and borrow generated
background data for the translated edges for occluded areas. This
enables the generation of artifact free three-dimensional image
pairs since translation of foreground objects horizontally, which
may expose areas that are always occluded in a scene, results in
the use of the newly created background data instead of stretching
objects or morphing pixels which creates artifacts that are human
detectable errors. Hence, obtaining backgrounds with occluded areas
filled in, either partially with enough horizontal realistic image
data or fully with all occluded areas rendered into a realistic
enough looking area, i.e., drawn and colorized and/or depth
assigned, thus results in artifact free edges for depth enhanced
frames. See also FIGS. 70 and 71-76 and the associated description
respectively. Generation of missing background data may also be
utilized to create artifact free edges along computer-generated
elements as well.
FIG. 35 shows the sequential frames in the static camera scene cut
after the background mask information has been applied to each
frame with appropriate offsets and where there is no pre-existing
mask information.
FIG. 36 shows a representative sample of frames from the static
camera scene cut after the background information has been applied
with appropriate offsets and where there is no pre-existing mask
information.
Colorization Rendering: After color processing is completed for
each scene, subsequent or sequential color motion masks and related
lookup tables are combined within 24-bit or 48-bit RGB color space
and rendered as TIF or TGA files. These uncompressed,
high-resolution images are then rendered to various media such as
HDTV, 35 mm negative film (via digital film scanner), or a variety
of other standard and non standard video and film formats for
viewing and exhibit.
Process Flow:
Digitization, Stabilization and Noise Reduction:
1. 35 mm film is digitized to 1920×1080×10 in any one
of several digital formats.
2. Each frame undergoes standard stabilization techniques to
minimize natural weaving motion inherent in film as it traverses
camera sprockets as well as any appropriate digital telecine
technology employed. Frame-differencing techniques are also
employed to further stabilize image flow.
3. Each frame then undergoes noise reduction to minimize random
film grain and electronic noise that may have entered into the
capture process.
Pre-Production Movie Dissection Into Camera Elements and Visual
Database Creation:
1. Each scene of the movie is broken down into background and
foreground elements as well as movement objects using various
subtraction, phase correlation and focal length estimation
algorithms. Background and foreground elements may include
computer-generated elements or elements that exist in the original
movie footage for example.
2. Backgrounds and foreground elements in pans are combined into a
single frame using uncompensated (lens) stitching routines.
3. Foregrounds are defined as any object and/or region that move in
the same direction as the background but may represent a faster
vector because of its proximity to the camera lens. In this method
pans are reduced to a single representative image, which contains
all of the background and foreground information taken from a
plurality of frames.
4. Zooms are sometimes handled as a tiled database in which a
matrix is applied to key frames, where vector points of reference
correspond to feature points in the image and to feature points on
the applied composite mask, encompassing any distortion.
5. A database is created from the frames making up the single
representative or composited frame (i.e., each common and novel
pixel during a pan is assigned to the plurality of frames from
which they were derived or which they have in common).
6. In this manner, a mask overlay representing an underlying lookup
table will be correctly assigned to the respective novel and common
pixel representations of backgrounds and foregrounds in
corresponding frames.
Pre-Production Design Background Design:
1. Each entire background is colorized/depth assigned as a single
frame in which all motion objects are removed. Background masking
is accomplished using a routine that employs standard paint, fill,
digital airbrushing, transparency, texture mapping, and similar
tools. Color selection is accomplished using a 24-bit color lookup
table automatically adjusted to match the density of the underlying
gray scale and luminance. Depth assignment is accomplished via
assigning depths, assigning geometric shapes, entry of numeric
values with respect to objects, or in any other manner in the
single composite frame. In this way creatively selected
colors/depths are applied that are appropriate for mapping to the
range of gray scale/depth underlying each mask. The standard color
wheel used to select color ranges detects the underlying grayscale
dynamic range and determines the corresponding color range from
which the designer may choose (i.e., only from those color
saturations that will match the grayscale luminance underlying the
mask.)
2. Each lookup table allows for a multiplicity of colors applied to
the range of gray scale values underlying the mask. The assigned
colors will automatically adjust according to luminance and/or
according to pre-selected color vectors compensating for changes in
the underlying gray scale density and luminance.
Pre-Production Design Motion Element Design:
1. Design motion object frames are created which include the entire
scene background as well as a single representative moment of
movement within the scene in which all characters and elements
within the scene are present. These moving non-background elements
are called Design Frame Objects (DFO).
2. Each DFO is broken down into design regions of interest (regions
of interest) with special attention focused on contrasting elements
within the DFOs that can readily be isolated using various gray
scale and luminance analyses such as pattern recognition and or
edge detection routines. As existing color movies may be utilized
for depth enhancement, regions of interest may be picked with color
taken into account.
3. The underlying gray scale and luminance distribution of each
masked region is displayed graphically as well as other gray scale
analyses including pattern analysis together with a graphical
representation of the region's shape with area, perimeter and
various weighting parameters.
4. Color selection is determined for each region of interest
comprising each object based on appropriate research into the film
genre, period, creative intention, etc. Using a 24-bit color
lookup table automatically adjusted to match the density of the
underlying gray scale and luminance, suitable and creatively
selected colors are applied. The standard color wheel detects the
underlying grayscale range and restricts the designer to choose
only from those color saturations that will match the grayscale
luminance underlying the mask. Depth assignments may be made or
adjusted for depth projects until realistic depth is obtained for
example.
5. This process continues until a reference design mask is created
for all objects that move in the scene.
Pre-Production Design Key Frame Objects Assistant Designer:
1. Once all color selection/depth assignment is generally completed
for a particular scene the design motion object frame is then used
as a reference to create the larger number of key frame objects
within the scene.
2. Key Frame Objects (all moving elements within the scene such as
people, cars, etc that do not include background elements) are
selected for masking.
3. The determining factor for each successive key frame object is
the amount of new information between one key frame and the next
key frame object.
Method of Colorizing/Depth Enhancing Motion Elements in Successive
Frames:
1. The Production Colorist (operator) loads a plurality of frames
into the display buffer.
2. One of the frames in the display buffer will include a key frame
from which the operator obtains all masking information. The
operator makes no creative or color/depth decisions since all color
transform information is encoded within the key frame masks.
3. The operator can toggle from the colorized or applied lookup
tables to translucent masks differentiated by arbitrary but highly
contrasting colors.
4. The operator can view the motion of all frames in the display
buffer observing the motion that occurs in successive frames or
they can step through the motion from one key frame to the
next.
5. The operator propagates (copies) the key frame mask information
to all frames in the display buffer.
6. The operator then executes the mask fitting routine on each
frame successively. FIG. 37A shows the general mask fitting
processing flow chart, which is broken into subsequent detailed flow
charts 37B and 37C. The program makes a best fit based on the
grayscale/luminance, edge parameters and pattern recognition based
on the gray scale and luminance pattern of the key frame or the
previous frame in the display. For computer-generated elements, the
mask fitting routines are skipped since the masks or alphas define
digitally created (and hence non-operator-defined) edges that
accurately define the computer-generated element boundaries. Mask
fitting operations take into account the computer-generated element
masks or alphas and stop when hitting the edge of a
computer-generated element mask since these boundaries are accepted
as accurate irrespective of grey-scale as per step 3710 of FIG.
37C. This enhances the accuracy of mask edges and reshapes when
colors of a computer-generated element and operator-defined mask
are of the same base luminance for example. As shown in FIG. 37A,
the Mask Fit initializes the region and fit grid parameters, then
calls the Calculate fit grid routine and then the Interpolate mask
on fit grid routine, which execute on any computer as described
herein, wherein the routines are specifically configured to
calculate fit grids as specified in FIGS. 37B and 37C. The flow of
processing of FIG. 37B from the Initialize region routine, to the
initialization of image line and image column and reference image
flows into the CalculateFitValue routine which calls the fit
gradient routine which in turn calculates xx and yy as the
difference between the xfit, yfit and gradients for x and y. If the
FitValue is greater than the fit, for x, y and xx and yy, then the
xfit and yfit values are stored in the FitGrid. Otherwise,
processing continues back at the fit gradient routine with new
values for xfit and yfit. When the processing for the size of the
Grid is complete for x and y, then the mask is interpolated as per
FIG. 37C. After initialization, the indices i and j for the
FitGridCell are determined and a bilinear interpolation is
performed at the fitGridA-D locations wherein the Mask is fit up to
any border found for any CG element at 3710 (i.e., for a known
alpha border or border with depth values for example that define a
digitally rendered element that is taken as a certified correct
mask border). The mask fitting interpolation is continued up to the
size of the mask defined by xend and yend.
7. In the event that movement creates large deviations in regions
from one frame to the next the operator can select individual
regions to mask-fit. The displaced region is moved to the
approximate location of the region of interest where the program
attempts to create a best fit. This routine continues for each
region of interest in succession until all masked regions have been
applied to motion objects in all sequential frames in the display
memory.
a. The operator clicks on a single mask in each successive frame on
the corresponding area where it belongs in frame 2. The computer
makes a best fit based on the grayscale/luminance, edge parameters,
gray scale pattern and other analysis.
b. This routine continues for each region in succession until all
regions of interest have been repositioned in frame two.
c. The operator then indicates completion with a mouse click and
masks in frame two are compared with gray scale parameters in frame
three.
d. This operation continues until all motion in all frames between
two or more key frames is completely masked.
8. Where there is an occlusion, a modified best-fit parameter is
used. Once the occlusion is passed, the operator uses the
pre-occlusion frame as a reference for the post occlusion
frames.
9. After all motion is completed, the background/set mask is
applied to each frame in succession. Application is: apply mask
where no mask exists.
10. Masks for motion objects can also be animated using either
Bezier curves or polygons that enclose a region of interest.
a. A plurality of frames are loaded into display memory and either
Bezier points and curves of polygon points are applied close to the
region of interest where the points automatically snap to edges
detected within the image data.
b. Once the object in frame one has been enclosed by the polygon or
Bezier curves the operator adjusts the polygon or Bezier in the
last frame of the frames loaded in display memory.
c. The operator then executes a fitting routine, which snaps the
polygons or Bezier points plus control curves to all intermediate
frames, animating the mask over all frames in display memory.
d. The polygon and Bezier algorithms include control points for
rotation, scaling and move-all to handle zooms, pans and complex
camera moves where necessary.
FIG. 42 shows two image frames that are separated in time by
several frames, of a person levitating a crystal ball wherein the
various objects in the image frames are to be converted from
two-dimensional objects to three-dimensional objects. As shown the
crystal ball moves with respect to the first frame (shown on top)
by the time that the second frame (shown on the bottom) occurs. As
the frames are associated with one another, although separated in
time, much of the masking information can be utilized for both
frames, as reshaped using embodiments of the invention previously
described above. For example, using the mask reshaping techniques
described above for colorization, i.e., using the underlying
grey-scale for tracking and reshaping masks, much of the labor
involved with converting a two-dimensional movie to a
three-dimensional movie is eliminated. This is due to the fact that
once key frames have color or depth information applied to them,
the mask information can be propagated automatically throughout a
sequence of frames which eliminates the need to adjust wire frame
models for example. Although there are only two images shown for
brevity, these images are separated by several other images in time
as the crystal ball slowly moves to the right in the sequence of
images.
FIG. 43 shows the masking of the first object in the first image
frame that is to be converted from a two-dimensional image to a
three-dimensional image. In this figure, the first object masked is
the crystal ball. There is no requirement to mask objects in any
order. In this case a simple free form drawing tool is utilized to
apply a somewhat round mask to the crystal ball. Alternatively, a
circle mask may be dropped on the image and resized and translated
to the correct position to correspond to the round crystal ball.
However, since most objects masked are not simple geometric shapes,
the alternative approach is shown herein. The grey-scale values of
the masked object are thus utilized to reshape the mask in
subsequent frames.
FIG. 44 shows the masking of the second object in the first image
frame. In this figure, the hair and face of the person behind the
crystal ball are masked as the second object using a free form
drawing tool. Edge detection or grey-scale thresholds can be
utilized to accurately set the edges of the masks as has been
previously described above with respect to colorization. There is
no requirement that an object be a single object, i.e., the hair
and face of a person can be masked as a single item, or not and
depth can thus be assigned to both or individually as desired.
FIG. 45 shows the two masks in color in the first image frame
allowing for the portions associated with the masks to be viewed.
This figure shows the masks as colored transparent masks so that
the masks can be adjusted if desired.
FIG. 46 shows the masking of the third object in the first image
frame. In this figure the hand is chosen as the third object. A
free form tool is utilized to define the shape of the mask.
FIG. 47 shows the three masks in color in the first image frame
allowing for the portions associated with the masks to be viewed.
Again, the masks can be adjusted if desired based on the
transparent masks.
FIG. 48 shows the masking of the fourth object in the first image
frame. As shown, the person's jacket forms the fourth object.
FIG. 49 shows the masking of the fifth object in the first image
frame. As shown the person's sleeve forms the fifth object.
FIG. 50 shows a control panel for the creation of three-dimensional
images, including the association of layers and three-dimensional
objects to masks within an image frame, specifically showing the
creation of a Plane layer for the sleeve of the person in the
image. On the right side of the screendump, the "Rotate" button is
enabled, showing a "Translate Z" rotation quantity that indicates
the sleeve is rotated forward, as is shown in the next figure.
FIG. 51 shows a three-dimensional view of the various masks shown
in FIGS. 43-49, wherein the mask associated with the sleeve of the
person is shown as a Plane layer that is rotated toward the left
and right viewpoints on the right of the page. Also, as is shown
the masks associated with the jacket and person's face have been
assigned a Z-dimension or depth that is in front of the
background.
FIG. 52 shows a slightly rotated view of FIG. 51. This figure shows
the Plane layer with the rotated sleeve tilted toward the
viewpoints. The crystal ball is shown as a flat object, still in
two-dimensions as it has not yet been assigned a three-dimensional
object type.
FIG. 53 shows a slightly rotated view of FIGS. 51 (and 52), wherein
the sleeve is shown tilting forward, again without ever defining a
wire frame model for the sleeve. Alternatively, a three-dimensional
object type of column can be applied to the sleeve to make an even
more realistically three-dimensional shaped object. The Plane type
is shown here for brevity.
FIG. 54 shows a control panel specifically showing the creation of
a sphere object for the crystal ball in front of the person in the
image. In this figure, the Sphere three-dimensional object is
created and dropped into the three-dimensional image by clicking
the "create selected" button in the middle of the frame, which is
then shown (after translation and resizing onto the crystal ball in
the next figure).
FIG. 55 shows the application of the sphere object to the flat mask
of the crystal ball, that is shown within the sphere and as
projected to the front and back of the sphere to show the depth
assigned to the crystal ball. The Sphere object can be translated,
i.e., moved in three axis, and resized to fit the object that it is
associated with. The projection of the crystal ball onto the sphere
shows that the Sphere object is slightly larger than the crystal
ball, however this ensures that the full crystal ball pixels are
assigned depths. The Sphere object can be resized to the actual
size of the sphere as well for more refined work projects as
desired.
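As an illustrative sketch of what such a sphere primitive amounts to in per-pixel depth terms (the depth convention, parameters, and function name are assumptions for illustration, not the tool's actual implementation):

    import numpy as np

    def sphere_depth(mask, center_x, center_y, radius, z_center):
        """Assign a depth value to every masked pixel by projecting it onto the
        front surface of a sphere primitive (smaller z = closer to the viewer)."""
        ys, xs = np.nonzero(mask)
        dx = xs - center_x
        dy = ys - center_y
        inside = dx ** 2 + dy ** 2 <= radius ** 2
        depth = np.full(mask.shape, np.nan)          # NaN = no depth assigned
        bulge = np.sqrt(radius ** 2 - dx[inside] ** 2 - dy[inside] ** 2)
        depth[ys[inside], xs[inside]] = z_center - bulge     # front hemisphere
        return depth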
FIG. 56 shows a top view of the three-dimensional representation of
the first image frame showing the Z-dimension assigned to the
crystal ball shows that the crystal ball is in front of the person
in the scene.
FIG. 57 shows the sleeve plane rotating about the X-axis to make
the sleeve appear to be coming out of the image more. The circle
with a line (X axis line) projecting through it defines the plane
of rotation of the three-dimensional object, here a plane
associated with the sleeve mask.
FIG. 58 shows a control panel specifically showing the creation of
a Head object for application to the person's face in the image,
i.e., to give the person's face realistic depth without requiring a
wire model for example. The Head object is created using the
"Created Selected" button in the middle of the screen and is shown
in the next figure.
FIG. 59 shows the Head object in the three-dimensional view, too
large and not aligned with the actual person's head. After creating
the Head object as per FIG. 58, the Head object shows up in the
three-dimensional view as a generic depth primitive that is
applicable to heads in general. This is due to the fact that exact
depth information is not required by the human eye. Hence, in
depth assignments, generic depth primitives may be utilized in
order to eliminate the need for three-dimensional wire frames. The
Head object is translated, rotated and resized in subsequent
figures as detailed below.
FIG. 60 shows the Head object in the three-dimensional view,
resized to fit the person's face and aligned, e.g., translated to
the position of the actual person's head.
FIG. 61 shows the Head object in the three-dimensional view, with
the Y-axis rotation shown by the circle and Y-axis originating from
the person's head thus allowing for the correct rotation of the
Head object to correspond to the orientation of the person's
face.
FIG. 62 shows the Head object also rotated slightly clockwise,
about the Z-axis to correspond to the person's slightly tilted
head. The mask shows that the face does not have to be exactly
lined up for the resulting three-dimensional image to be believable to
the human eye. More exacting rotation and resizing can be utilized
where desired.
FIG. 63 shows the propagation of the masks into the second and
final image frame. All of the methods previously disclosed above
for moving masks and reshaping them are applied not only to
colorization but to depth enhancement as well. Once the masks are
propagated into another frame, all frames between the two frames
may thus be tweened. By tweening the frames, the depth information
(and color information if the movie is not already in color) is thus
applied to non-key frames.
FIG. 64 shows the original position of the mask corresponding to
the person's hand.
FIG. 65 shows the reshaping of the mask, which is performed
automatically and which can be adjusted manually in key frames if
desired, wherein any intermediate frames get the tweened depth
information between the first image frame masks and the second
image frame masks. The automatic tracking of masks and reshaping of
the masks allows for great savings in labor. Allowing manual
refinement of the masks allows for precision work where
desired.
FIG. 66 shows the missing information for the left viewpoint as
highlighted in color on the left side of the masked objects in the
lower image when the foreground object, here a crystal ball, is
translated to the right. In generating the left viewpoint of the
three-dimensional image, the highlighted data must be generated to
fill the missing information from that viewpoint.
FIG. 67 shows the missing information for the right viewpoint as
highlighted in color on the right side of the masked objects in the
lower image when the foreground object, here a crystal ball, is
translated to the left. In generating the right viewpoint of the
three-dimensional image, the highlighted data must be generated to
fill the missing information from that viewpoint. Alternatively, a
single camera viewpoint may be offset from the viewpoint of the
original camera; however, the amount of missing data is larger for the new
viewpoint. This may be utilized if there are a large number of
frames and some of the missing information is found in adjacent
frames for example.
FIG. 68 shows an anaglyph of the final depth enhanced first image
frame viewable with Red/Blue 3-D glasses. The original
two-dimensional image is now shown in three-dimensions.
FIG. 69 shows an anaglyph of the final depth enhanced second and
last image frame viewable with Red/Blue 3-D glasses, note rotation
of person's head, movement of person's hand and movement of crystal
ball. The original two-dimensional image is now shown in
three-dimensions as the masks have been moved/reshaped using the
mask tracking/reshaping as described above and applying depth
information to the masks in this subsequent frame from an image
sequence. As described above, the operations for applying the depth
parameter to a subsequent frame are performed using a general
purpose computer, for example one having a central processing unit
(CPU), memory, and a bus situated between the CPU and memory,
specifically programmed to do so, wherein figures herein that show
computer screen displays are meant to represent such a computer.
FIG. 70 shows the right side of the crystal ball with fill mode
"smear", wherein the pixels with missing information for the left
viewpoint, i.e., on the right side of the crystal ball are taken
from the right edge of the missing image pixels and "smeared"
horizontally to cover the missing information. Any other method for
introducing data into hidden areas is in keeping with the spirit of
the invention. Stretching or smearing pixels where information is
missing creates artifacts that are recognizable to human observers
as errors. By obtaining or otherwise creating realistic data for the
missing information, for example via a generated background with the
missing information filled in, such fill methods can be avoided and
artifacts are thus eliminated. For example, providing a composite
background or frame
with all missing information designated in a way that an artist can
use to create a plausible drawing or painting of a missing area is
one method of obtaining missing information for use in
two-dimensional to three-dimensional conversion projects.
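By way of illustration only, the following sketch shows one way the "smear" fill of FIG. 70 could be performed for a left viewpoint; it assumes a numpy image array and a boolean map of missing pixels, and the function name is merely illustrative rather than part of the invention.

    import numpy as np

    def smear_fill_left_view(image, missing):
        """Fill missing pixels for a left viewpoint by taking the valid pixel
        at the right edge of each gap and smearing it horizontally to the
        left across the missing area (cf. FIG. 70)."""
        filled = image.copy()
        gap = missing.copy()
        h, w = gap.shape
        for y in range(h):
            for x in range(w - 2, -1, -1):            # scan each row right-to-left
                if gap[y, x] and not gap[y, x + 1]:   # right neighbour already valid
                    filled[y, x] = filled[y, x + 1]   # copy it leftward
                    gap[y, x] = False                 # so the smear propagates
        return filled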
FIG. 71 shows a mask or alpha plane for a given frame of a scene,
for an actor's upper torso and head 7101, and transparent wings
7102. The mask may include opaque areas shown as black and
transparent areas that are shown as grey areas. The alpha plane may
be generated for example as an 8 bit grey-scale "OR" of all
foreground masks. Any other method of generating a foreground mask
having motion objects or foreground object related masks defined is
in keeping with the spirit of the invention.
FIG. 72 shows an occluded area, i.e., missing background image data
7201 as a colored sub-area of the actor of FIG. 71 that never
uncovers the underlying background, i.e., where missing information
in the background for a scene or frame occurs. This area is the
area of the background that is never exposed in any frame in a
scene and hence cannot be borrowed from another frame. When for
example generating a composite background, any background pixel not
covered by a motion object mask or foreground mask can have a
simple Boolean TRUE value, all other pixels are thus the occluded
pixels as is also shown in FIG. 34.
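A minimal sketch of how the always-occluded area could be computed from per-frame foreground masks follows; the mask format and function name are assumptions for illustration and do not define the composite background process itself.

    import numpy as np

    def never_exposed_area(foreground_masks):
        """Return True where the background is covered by a foreground or
        motion-object mask in every frame of the scene, i.e. the area that
        can never be borrowed from another frame (cf. FIG. 72)."""
        occluded = None
        for mask in foreground_masks:           # one boolean (H, W) mask per frame
            occluded = mask.copy() if occluded is None else (occluded & mask)
        return occluded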
FIG. 73 shows the occluded area of FIG. 72 with generated data
7201a for missing background image data that is artistically drawn
or otherwise rendered to generate a complete and realistic
background for use in artifact free two-dimensional to
three-dimensional conversion. See also FIG. 34 and the description
thereof. As shown, FIG. 73 also has masks drawn on background
objects, which are shown in colors that differ from the source
image. This allows for colorization or colorization modifications
for example as desired.
FIG. 73A shows the occluded area with missing background image data
7201b partially drawn or otherwise rendered to generate just enough
of a realistic looking background for use in artifact free
two-dimensional to three-dimensional conversion. An artist in this
example may draw narrower versions of the occluded areas, so that
offsets to foreground objects would have enough realistic
background to work with when projecting a second view, i.e.,
translating a foreground object horizontally which exposes occluded
areas. In other words, the edges of the missing background image
data area may be drawn horizontally inward by enough to allow for
some of the generated data to be used, or all of the generated data
to be used in generating a second viewpoint for a three-dimensional
image set.
In one or more embodiments of the invention, a number of scenes
from a movie may be generated for example by computer drawing by
artists or sent to artists for completion of backgrounds. In one or
more embodiments, a website may be created for artists to bid on
background completion projects wherein the website is hosted on a
computer system connected for example to the Internet. Any other
method for obtaining backgrounds with enough information to render
a two-dimensional frame into a three-dimensional pair of viewpoints
is in keeping with the spirit of the invention, including rendering
a full background with realistic data for all of the occluded area
of FIG. 72 (which is shown in FIG. 73) or only a portion of the
edges of the occluded area of FIG. 72, (which is shown as FIG.
73A). By estimating a background depth and a depth to a foreground
object and knowing the offset distance desired for two viewpoints,
it is thus possible to obtain less than the whole occluded area for
use in artifact free two-dimensional to three-dimensional
conversion. In one or more embodiments, a fixed offset, e.g., 100
pixels on each edge of each occluded area, or a percentage of the
size of the foreground object, e.g., 5%, may be flagged to
be created and if more data is needed, then the frame is flagged
for updating, or smearing or pixel stretching may be utilized to
minimize the artifacts of missing data.
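For illustration only, a simplified parallel-camera disparity model can estimate how much of an occluded area is actually needed; the formula, parameters and numbers below are assumptions and not the specific geometry used by the invention.

    def required_fill_width(depth_fg, depth_bg, interaxial, focal_px):
        """Estimate, in pixels, the gap opened next to a foreground object
        when generating a second viewpoint: roughly the difference between
        the foreground and background disparities, using
        disparity ~ interaxial * focal_px / depth."""
        disp_fg = interaxial * focal_px / depth_fg
        disp_bg = interaxial * focal_px / depth_bg
        return abs(disp_fg - disp_bg)

    # Example: object at 2 m over a wall at 10 m, 6.5 cm interaxial, 2000 px
    # focal length -> about 52 px of generated background needed per edge.
    print(required_fill_width(2.0, 10.0, 0.065, 2000))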
FIG. 74 shows a light area of the shoulder portion on the right
side of FIG. 71, where missing background image data 7201 exists
when generating a right viewpoint for a right image of a
three-dimensional image pair. Missing background image data 7201
represents a gap where stretching (as is also shown in FIG. 70) or
other artifact producing techniques would be used when shifting the
foreground object to the left to create a right viewpoint. The dark
portion of the figure is taken from the background where data is
available in at least one frame of a scene.
FIG. 75 shows an example of the stretching of pixels, or "smeared
pixels" 7201c, corresponding to the light area in FIG. 74, i.e.,
missing background image data 7201, wherein the pixels are created
without the use of a generated background, i.e., if no background
data is available for an area that is occluded in all frames of a
scene.
FIG. 76 shows a result of a right viewpoint without artifacts on
the edge of the shoulder of the person through use of generated
data 7201a (or 7201b) for missing background image data 7201 for
always-occluded areas of a scene.
FIG. 77 shows an example of a computer-generated element, here
robot 7701, which is modeled in three-dimensional space and
projected as a two-dimensional image. The background is grey to
signify invisible areas. As is shown in the following figures,
metadata such as alpha, mask, depth or any combination thereof is
utilized to speed the conversion process from two-dimensional image
to a pair of two-dimensional images for left and right eye for
three-dimensional viewing. Masking this character by hand, or even
in a computer-aided manner by an operator is extremely time
consuming since there are literally hundreds if not thousands of
sub-masks required to render depth (and/or color) correctly to this
complex object.
FIG. 78 shows an original image separated into background 7801 and
foreground elements 7802 and 7803, (mountain and sky in the
background and soldiers in the bottom left also see FIG. 79) along
with the imported color and depth of the computer-generated
element, i.e., robot 7803 with depth automatically set via the
imported depth metadata. Although the soldiers exist in the
original image, their depths are set by an operator, and generally
shapes or masks with varying depths are applied at these depths
with respect to the original objects to obtain a pair of stereo
images for left and right eye viewing. (See FIG. 79). As shown in
the background, any area that is covered for the scene such as
outline 7804 (of a soldier's head projected onto the background)
can be artistically rendered for example to provide believable
missing data, as is shown in FIG. 73 based on the missing data of
FIG. 73A, which results in artifact free edges as shown in FIG. 76
for example. Importing data for computer generated elements may
include reading a file that has depth information on a
pixel-by-pixel basis for computer-generated element 7701 and
displaying that information in a perspective view on a computer
display as an imported element, e.g., robot 7803. This import
process saves enormous amounts of operator time and makes
conversion of a two-dimensional movie into a three-dimensional
movie economically viable. One or more embodiments of the invention
store the masks and imported data in computer memory and/or
computer disk drives for use by one or more computers in the
conversion process.
FIG. 79 shows mask 7901 (forming a portion of the helmet of the
rightmost soldier) associated with the photograph of soldiers 7802
in the foreground. Mask 7901, along with all other operator-defined
masks shown in multiple artificial colors on the soldiers, is used to
apply depth to the various portions of the soldiers occurring in the
original image that lie in depth in front of the computer-generated
element, i.e., robot 7803. The dashed lines horizontally extending
from the mask areas 7902 and 7903 show where horizontal translation of
the foreground objects takes place and where imported metadata can
be utilized to accurately auto-correct over-painting of depth or
color on the masked objects when metadata exists for the other
elements of a movie. For example, when an alpha exists for the
objects that occur in front of the computer-generated elements, the
edges can be accurately determined. One type of file that can be
utilized to obtain mask edge data is a file with alpha and/or
mask data such as an RGBA file. (See FIG. 80). In addition, use of
generated data for missing areas of the background at these
horizontally translated mask areas 7902 and 7903 enables artifact
free two-dimensional to three-dimensional conversion.
FIG. 80 shows an imported alpha layer 8001 shown as a dark blue
overlay, which can also be utilized as a mask layer to limit the
operator-defined, and potentially less accurate, masks used for
applying depth to the edges of the three soldiers 7802, which are
designated as soldiers A, B and C. In addition, an optional
computer-generated element, such as dust can be inserted into the
scene along the line annotated as "DUST", to augment the reality of
the scene if desired. Any of the background, foreground or
computer-generated elements can be utilized to fill portions of the
final left and right image pairs as is required.
FIG. 81 shows the result of using the operator-defined masks
without adjustment when overlaying a motion element such as the
soldier on the computer-generated element such as the robot.
Without the use of metadata associated with the original image
objects, such as matte or alpha 8001, artifacts occur where
operator-defined masks do not exactly align with the edges of the
masked objects. In the topmost picture, the soldier's lips show a
light colored edge 8101 while the lower picture shows an artifact
free edge since the alpha of FIG. 80 is used to limit the edges of
any operator-defined masks. Through use of the alpha metadata of
FIG. 80 applied to the operator-defined mask edges of FIG. 79,
artifact free edges on the overlapping areas are thus enabled. As
one skilled in the art will appreciate, application of successively
nearer elements combined with their alphas is used to layer all of
the objects at their various depths from back to front to create a
final image pair for left eye and right eye viewing.
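A minimal sketch of limiting operator-defined masks with imported alpha metadata, as described for FIGS. 80 and 81, is shown below; the array layout, threshold and function name are illustrative assumptions.

    import numpy as np

    def limit_mask_to_alpha(operator_mask, imported_alpha, threshold=0.5):
        """Clip a rough, operator-defined mask against the imported alpha of
        the original element so depth or color is never applied beyond the
        true object edge (alpha may come, e.g., from an RGBA file)."""
        return operator_mask & (imported_alpha > threshold)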
Embodiments of the invention enable real-time editing of 3D images
without re-rendering for example to alter layers/colors/masks/depth
or add or improve missing background information for example in
gaps and/or remove artifacts in these elements and to minimize or
eliminate iterative workflow paths back through different
workgroups. Embodiments enable this functionality by locally
altering masks, for example based on Z depth alpha mask
manipulation, generating translation files, e.g., U Maps, that can
be utilized as portable pixel-wise editing files, detecting gaps in
the translation files for occluded areas and filling the gaps for
example to make clean plates, or to more realistically fill the gap
with blur and grain, or to add film grain based on luminance. In
addition, depth may be manipulated by locally manipulating the U
Maps or Z depth map, or by otherwise displacing regions in source
files, without re-rendering, e.g., without ray tracing and the huge
computational effort involved with tracing light through each pixel
of large format left and right eye images; local shifting and
manipulation of pixels is several orders of magnitude faster.
For example, a mask group takes source images and creates masks for
items, areas or human recognizable objects in each frame of a
sequence of images that make up a movie. The depth augmentation
group applies depths, and for example shapes, to the masks created
by the mask group. When rendering an image pair, left and right
viewpoint images and left and right translation files and/or Z
depth map may be generated by one or more embodiments of the
invention. The left and right viewpoint images allow 3D viewing of
the original 2D image. The translation files specify the pixel
offsets for each source pixel in the original 2D image, for example
in the form of UV or U maps. These files are generally related to
an alpha mask for each layer, for example a layer for an actress, a
layer for a door, a layer for a background, etc. These translation
files, or maps are passed from the depth augmentation group that
renders 3D images, to the quality assurance workgroup or composite
group. This allows the quality assurance workgroup (or other
workgroup such as the depth augmentation group) to perform
real-time editing of 3D images without re-rendering for example to
alter layers/colors/masks/depth or add or improve missing
background information for example in gaps and/or remove artifacts
such as masking errors without delays associated with processing
time/re-rendering and/or iterative workflow that requires such
re-rendering or sending the masks back to the mask group for
rework, wherein the mask group may be in a third world country with
unskilled labor on the other side of the globe. In addition, when
rendering the left and right images, i.e., 3D images, the Z depth
of regions within the image, such as actors for example, may also
be passed along with the alpha mask to the quality assurance group,
who may then adjust depth as well without re-rendering with the
original rendering software. This may be performed for example with
generated missing background data from any layer so as to allow
"downstream" real-time editing without re-rendering or ray-tracing
for example. Quality assurance may give feedback to the masking
group or depth augmentation group for individuals so that these
individuals may be instructed to produce work product as desired
for the given project, without waiting for, or requiring the
upstream groups to rework anything for the current project. This
allows for feedback yet eliminates iterative delays involved with
sending work product back for rework and the associated delay for
waiting for the reworked work product. Elimination of iterations
such as this provides huge savings in wall-time, or end-to-end
time that a conversion project takes, thereby increasing profits
and minimizing the workforce needed to implement the workflow.
FIG. 82 shows a source image to be depth enhanced and provided
along with left and right translation files (see FIGS. 85A-D and
86A-D for embodiments of translation files) and alpha masks (such
as shown in FIG. 79) to enable real-time editing of 3D images
without re-rendering or ray-tracing the entire image sequence in a
scene (e.g., by downstream workgroups) for example to alter
layers/colors/masks and/or remove artifacts and/or adjust depths or
otherwise change the 3D images without iterative workflow paths
back to the original workgroups (as per FIG. 96 versus FIG.
95).
FIG. 83 shows masks generated by the mask workgroup for the
application of depth by the depth augmentation group, wherein the
masks are associated with objects, such as for example human
recognizable objects in the source image of FIG. 82. Generally,
unskilled labor is utilized to mask human recognizable objects in
key frames within a scene or sequence of images. The unskilled
labor is cheap and generally located offshore. Hundreds of workers
may be hired at low prices to perform this tedious work associated
with masking. Any existing colorization masks may be utilized as a
starting point for 3D masks, which may be combined to form a 3D
mask outline that is broken into sub-masks that define differing
depths within a human recognizable object. Any other method of
obtaining masks for areas of an image is in keeping with the
spirit of the invention.
FIG. 84 shows areas where depth is applied generally as darker for
nearer objects and lighter for objects that are further away. This
view gives a quick overview of the relative depths of objects in a
frame.
FIG. 85A shows a left UV map containing translations or offsets in
the horizontal direction for each source pixel. When rendering a
scene with depths applied, translation maps that map the offsets of
horizontal movement of individual pixels in a graphical manner may
be utilized. FIG. 85B shows a right UV map containing translations
or offsets in the horizontal direction for each source pixel. Since
each of these images looks the same, it is easier to observe that
there are subtle differences in the two files by shifting the black
value of the color, so as to highlight the differences in a
particular area of FIGS. 85A and 85B. FIG. 85C shows a black value
shifted portion of the left UV map of FIG. 85A to show the subtle
contents therein. This area corresponds to the tree branches shown
in the upper right corner of FIGS. 82, 83 and 84 just above the
cement mixer truck and to the left of the light pole. FIG. 85D
shows a black value shifted portion of the right UV map of FIG. 85B
to show the subtle contents therein. The branches shown in the
slight variances of color signify that those pixels would be
shifted to the corresponding location in a pure UV map that maps
Red from darkest to lightest in the horizontal direction and maps
Green from darkest to lightest in the vertical direction. In other
words, the translation map in the UV embodiment is a graphical
depiction of the shifting that occurs when generating a left and
right viewpoint with respect to the original source image. UV maps
may be utilized, however, any other file type that contains
horizontal offsets from a source image on a pixel-by-pixel basis
(or finer grained) may be utilized, including compressed formats
that are not readily viewable as images. Some software packages for
editing come with pre-built UV widgets, and hence UV translation
files or maps can be utilized if desired. For example,
certain compositing programs have pre-built objects that enable UV
maps to be readily utilized and otherwise manipulated graphically
and hence for these implementations, graphically viewable files may
be utilized, but are not required.
Since creation of a left and right viewpoint from a 2D image uses
horizontal shifts, it is possible to use a single color for the
translation file. For example, since each row of the translation
file is already indexed in a vertical direction based on the
location in memory, it is possible to simply use one increasing
color, for example Red in the horizontal direction to signify an
original location of a pixel. Hence, any shift of pixels in the
translation map is shown as a shift of a given pixel value from one
horizontal offset to another, which makes for subtle color changes
when the shifts are small, for example in the background. FIG. 86A
shows a left U map containing translations or offsets in the
horizontal direction for each source pixel. FIG. 86B shows a right
U map containing translations or offsets in the horizontal
direction for each source pixel. FIG. 86C shows a black value
shifted portion of the left U map of FIG. 86A to show the subtle
contents therein. FIG. 86D shows a black value shifted portion of
the right U map of FIG. 86B to show the subtle contents therein.
Again there is no requirement that a humanly viewable file format
be utilized, and any format that stores horizontal offsets on a
pixel-by-pixel basis relative to a source image may be utilized.
Since memory and storage are so cheap, any format, whether compressed
or not, may be utilized without any significant increase in cost.
Generally, creation of a right eye image makes foreground
portions of the U map (or UV map) appear darker since they are
shifting left, and vice versa. This is easy to observe by looking at
something in the foreground with only the right eye open and then
moving slightly to the right (to observe that the foreground object
has indeed been shifted to the left). Since the U map (or UV map)
in the unaltered state is a simple ramp of color from dark to
light, it then follows that shifting something to the left, i.e.,
for the right viewpoint, maps it to a darker area of the U map (or
UV map). Hence the same tree branches in the same area of each U
map (or UV map) are darker for the right eye and brighter for the
left eye with respect to un-shifted pixels. Again, use of a
viewable map is not required, but shows the concept of shifting
that occurs for a given viewpoint.
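To illustrate the direction of the shifts just described, the sketch below builds a pair of per-source-pixel U maps from a depth map using a simple linear disparity model; the model, parameter names and depth convention are assumptions for illustration, not the renderer actually employed.

    import numpy as np

    def make_u_maps(depth, disparity_px, screen_depth):
        """Build left and right U maps whose entries are the destination
        column of each source pixel; pixels nearer than the screen plane
        shift right in the left view (brighter in the ramp) and left in the
        right view (darker)."""
        h, w = depth.shape
        ramp = np.tile(np.arange(w, dtype=np.float32), (h, 1))   # unaltered U ramp
        shift = disparity_px * (screen_depth - depth) / screen_depth / 2.0
        return ramp + shift, ramp - shift                        # (u_left, u_right)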
FIG. 87 shows known uses for UV maps, wherein a three-dimensional
model is unfolded so that an image in UV space can be painted onto
the 3D model using the UV map. This figure shows how UV maps have
traditionally been utilized to apply a texture map to a 3D shape.
For example, the texture, here a painting or flat set of captured
images of the Earth is mapped to a U and V coordinate system, that
is translated to an X, Y and Z coordinate on the 3D model.
Traditional animation has been performed in this manner in that
wire frame models are unraveled and flattened, which defines the U
and V coordinate system in which to apply a texture map.
Embodiments of the invention described herein utilize UV and U maps
in a new manner in that a pair of maps are utilized to define the
horizontal offsets for two images (left and right) that each source
pixel is translated to as opposed to a single map that is utilized
to define a coordinate onto which a texture map is placed on a 3D
model or wire frame. I.e., embodiments of the invention utilize UV
and U maps (or any other horizontal translation file format) to
allow for adjustments to the offset objects without re-rendering
the entire scene. Again, as opposed to the known use of a UV map,
for example that maps two orthogonal coordinates to a
three-dimensional object, embodiments of the invention enabled
herein utilize two maps, i.e., one for a left and one for a right
eye, that map horizontal translations for the left and right
viewpoints. In other words, since pixels translate only in the
horizontal direction (for left and right eyes), embodiments of the
invention map within one-dimension on a horizontal line-by-line
basis. I.e., the known art maps 2 dimensions to 3 dimensions, while
embodiments of the invention utilize 2 maps of translations within
1 dimension (hence visible embodiments of the translation map can
utilize one color). For example, if one line of a translation file
contains 0, 1, 2, 3 . . . 1918, 1919, and the 2nd and 3rd
pixels are translated right by 4 pixels, then the line of the file
would read 0, 5, 6, 3 . . . 1918, 1919. Other formats showing
relative offsets are not viewable as ramped color areas, but may
provide great compression levels, for example a line of the file
using relative offsets may read, 0, 0, 0, 0 . . . 0, 0, while a
right shift of 4 pixels in the 2nd and 3rd pixels would
make the file read 0, 4, 4, 0, . . . 0, 0. This type of file can be
compressed to a great extent if there are large portions of
background that have zero horizontal offsets in both the right and
left viewpoints. However, this file could be viewed as a standard U
file if it were ramped, i.e., made absolute as opposed to relative,
in order to view it as a color-coded translation file. Any other format capable
of storing offsets for horizontal shifts for left and right
viewpoints may be utilized in embodiments of the invention. UV
files similarly have a ramp function in the Y or vertical axis as
well; the values in such a file would be (0,0), (0,1), (0,2) . . .
(0, 1918), (0,1919) corresponding to each pixel, for example for
the bottom row of the image, and (1,0), (1,1), etc., for the second
horizontal line, or row. This type of offset file
allows for movement of pixels in non-horizontal rows; however,
embodiments of the invention simply shift data horizontally for
left and right viewpoints, and so do not need to keep track of
which vertical row a source pixel moves to since horizontal
movement is by definition within the same row.
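A small sketch of converting one row of a relative-offset translation file into the absolute, ramped form described above (so it can be viewed as a standard U file) is given below; it simply adds the column ramp back in, reproducing the example values used in this discussion.

    import numpy as np

    def relative_to_absolute_u(relative_row):
        """Add the column ramp 0, 1, 2, ... back onto a row of relative
        horizontal offsets to obtain the absolute (viewable) U row."""
        relative_row = np.asarray(relative_row)
        return np.arange(relative_row.shape[-1]) + relative_row

    print(relative_to_absolute_u([0, 4, 4, 0]))   # -> [0 5 6 3]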
FIG. 88 shows a disparity map showing the areas where the
difference between the left and right translation maps is the
largest. This shows that objects closest to the viewer have pixels
that are shifted the most between the two UV (or U) maps shown in
FIG. 85A-B (or 86A-B).
FIG. 89 shows a left eye rendering of the source image of FIG. 82.
FIG. 90 shows a right eye rendering of the source image of FIG. 82.
FIG. 91 shows an anaglyph of the images of FIG. 89 and FIG. 90 for
use with Red/Blue glasses.
FIG. 92 shows an image that has been masked and is in the process
of depth enhancement for the various layers, including the actress
layer, door layer, background layer (showing missing background
information that may be filled in through generation of missing
information--see FIGS. 34, 73 and 76 for example). I.e., the empty
portion of the background behind the actress in FIG. 92 can be
filled with generated image data, (see the outline of the actress's
head on the background wall). Through utilization of generated
image data for each layer, a compositing program for example may be
utilized as opposed to re-rendering or ray-tracing all images in a
scene for real-time editing. For example, if the hair mask of the
actress in FIG. 92 is altered to more correctly cover the hair,
then any pixels uncovered by the new mask are obtained from
the background and are nearly instantaneously available to view (as
opposed to standard re-rendering or ray-tracing that can take hours
of processing power to re-render all of the images in a scene when
anything in a scene is edited). This may include obtaining
generated data for any layer including the background for use in
artifact free 3D image generation.
FIG. 93 shows a UV map overlaid onto an alpha mask associated with
the actress shown in FIG. 92 which sets the translation offsets in
the resulting left and right UV maps based on the depth settings of
the various pixels in the alpha mask. This UV layer may be utilized
with other UV layers to provide a quality assurance workgroup (or
other workgroup) with the ability to real-time edit the 3D images,
for example to correct artifacts, or correct masking errors without
re-rendering an entire image. Iterative workflows however may
require sending the frame back to a third-world country for rework
of the masks, which are then sent back to a different workgroup for
example in the United States to re-render the image, which is then
viewed again by the quality assurance workgroup. This type of
iterative workflow is eliminated altogether for minor artifacts
since the quality assurance workgroup can simply reshape an alpha
mask and regenerate the pixel offsets from the original source
image to edit the 3D images in real-time and avoid involving other
workgroups for example. Setting the depth of the actress as per
FIGS. 42-70 for example, or by any other method, determines the
amount of shift that the unaltered UV map undergoes to generate two
UV maps, one for left-eye and one for right-eye image manipulation
as per FIG. 85A-D, (or U maps in FIGS. 86A-D). The maps may be
supplied for each layer along with an alpha mask for example to any
compositing program, wherein changes to a mask for example allow
the compositing program to simply obtain pixels from other layers
to "add up" an image in real-time. This may include using generated
image data for any layer (or gap fill data if no generated data
exists for a deeper layer). One skilled in the art will appreciate
that a set of layers with masks are combined in a compositing
program to form an output image by arbitrating or otherwise
determining which layers and corresponding images to lay on top of
one another to form an output image. Any method of combining a
source image pixel to form an output pixel using a pair of
horizontal translation maps without re-rendering or ray-tracing
again after adding depth is in keeping with the spirit of the
invention.
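The back-to-front, translation-map-driven combination of layers described above can be sketched as follows; the layer tuple layout is an assumption for illustration, and real compositing programs perform this far more efficiently.

    import numpy as np

    def composite_view(layers, width):
        """Form one output viewpoint by laying layers from furthest to
        nearest, shifting each layer's pixels per its U map instead of
        re-rendering. layers: list of (image (H,W,3) float, alpha (H,W) in
        [0,1], u_map (H,W)) tuples ordered back to front; u_map holds each
        source pixel's destination column."""
        h = layers[0][0].shape[0]
        out = np.zeros((h, width, 3), dtype=np.float32)
        for image, alpha, u_map in layers:                # back to front
            for y in range(h):
                for x in range(image.shape[1]):
                    a = alpha[y, x]
                    if a <= 0.0:
                        continue
                    dx = int(round(u_map[y, x]))          # horizontal shift only
                    if 0 <= dx < width:
                        out[y, dx] = a * image[y, x] + (1.0 - a) * out[y, dx]
        return out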
FIG. 94 shows a workspace generated for a second depth enhancement
program, based on the various layers shown in FIG. 92, i.e., left
and right UV translation maps for each of the alphas wherein the
workspace allows for quality assurance personnel (or other
workgroups) to adjust masks and hence alter the 3D image pair (or
anaglyph) in real-time without re-rendering or ray-tracing and/or
without iteratively sending fixes to any other workgroup. One or
more embodiments of the invention may loop through a source file
for the number of layers and create script that generates the
workspace as shown in FIG. 94. For example, once the mask workgroup
has created the masks for the various layers and generated mask
files, the rendering group may read in the mask files
programmatically and generate script code that includes generation
of a source icon, alpha copy icons for each layer, left and right
UV maps for each layer based on the rendering group's rendered
output, and other icons to combine the various layers into left and
right viewpoint images. This allows the quality assurance workgroup
to utilize tools that they are familiar with and which may be
faster and less complex than the rendering tools utilized by the
rendering workgroup. Any method for generation of a graphical user
interface for a worker to enable real-time editing of 3D images,
including a method that creates a source icon for each frame, connects
it to an alpha mask icon for each layer, generates translation maps
for left and right viewpoints that connect to one another, and loops
over each layer until combining the layers into an output viewpoint
for 3D viewing, is in keeping with the spirit of the
invention. Alternatively, any other method that enables real-time
editing of images without re-rendering through use of a pair of
translation maps is in keeping with the spirit of the invention
even if the translation maps are not viewable or not shown to the
user.
FIG. 95 shows a workflow for iterative corrective workflow. A mask
workgroup generates masks for objects, such as for example, human
recognizable objects or any other shapes in an image sequence at
9501. This may include generation of groups of sub-masks and the
generation of layers that define different depth regions. This step
is generally performed by unskilled and/or low wage labor,
generally in a country with very low labor costs. The masked
objects are viewed by higher skilled employees, generally artists,
who apply depth and/or color to the masked regions in a scene at
9502. The artists are generally located in an industrialized
country with higher labor costs. Another workgroup, generally a
quality assurance group then views the resulting images at 9503 and
determines if there are any artifacts or errors that need fixing
based on the requirements of the particular project. If so, the
masks with errors or locations in the image where errors are found
are sent back to the masking workgroup for rework, i.e., from 9504
to 9501. Once there are no more errors, the process completes at
9505. Even in smaller workgroups, errors may be corrected by
reworking masks and re-rendering or otherwise ray-tracing all of
the images in a scene which can take hours of processing time to
make a simple change for example. Errors in depth judgment
generally occur less often as the higher skilled laborers apply
depths based on a higher skill level, and hence kickbacks to the
rendering group occur less often in general, hence this loop is not
shown in the figure for brevity although this iterative path may
occur. Masking "kickback" may take a great amount of time to work
back through the system since the work product must be re-masked
and then re-rendered by other workgroups.
FIG. 96 shows an embodiment of the workflow enabled by one or more
embodiments of the system in that each workgroup can perform
real-time editing of 3D images without re-rendering for example to
alter layers/colors/masks and/or remove artifacts and otherwise
correct work product from another workgroup without iterative
delays associated with re-rendering/ray-tracing or sending work
product back through the workflow for corrections. The generation
of masks occurs as in FIG. 95 at 9501, depth is applied as occurs
in FIG. 95 at 9502. In addition, the rendering group generates
translation maps that accompany the rendered images to the quality
assurance group at 9601. The quality assurance group views the work
product at 9503 as in FIG. 95 and also checks for artifacts as in
FIG. 95 at 9504. However, since the quality assurance workgroup (or
other workgroup) has translation maps, and the accompanying layers
and alpha masks, they can edit 3D images in real-time or otherwise
locally correct images without re-rendering at 9602, for example
using commercially available compositing programs such as NUKE.RTM.
as one skilled in the art will appreciate. For instance as is shown
in FIG. 94, the quality assurance workgroup can open a graphics
program that they are familiar with (as opposed to a complex
rendering program used by the artists), and adjust an alpha mask
for example, wherein the offsets in each left and right translation map
are reshaped as desired by the quality assurance workgroup and the
output images are formed layer by layer (using any generated
missing background information as per FIGS. 34, 73 and 76 and any
computer generated element layers as per FIG. 79). As one skilled
in the art will recognize, generating two output images from
furthest back layer to foreground layer can be done without
ray-tracing, by only overlaying pixels from each layer onto the
final output images nearly instantaneously. This effectively allows
for local pixel-by-pixel image manipulation by the quality
assurance workgroup instead of 3D modeling and ray-tracing, etc.,
as utilized by the rendering workgroup. This can save multiple
hours of processing time and/or delays associated with waiting for
other workers to re-render a sequence of images that make up a
scene.
FIG. 97 shows an embodiment of the rapid workflow for local
modification of masks, gaps next to masks, depth maps, translation
values such as UV or U Maps or any combination thereof to remove
artifacts, create missing background information or otherwise
adjust, alter, improve or otherwise modify rendered images without
re-rendering or otherwise re-ray tracing the images. This saves a
large amount of computing time since rendering/ray tracing is
computationally expensive. In addition, this saves a tremendous
amount of wall time since the masking process and rendering
processes may be done by different sets of employees, for example
distally located. Source images, for example an image sequence or
frame of images in a scene are obtained at 9701, wherein masks
associated with various objects or items in the images are also
obtained. Depth maps, translation maps or values and optionally
colors for the masked regions for the images are obtained at 9702.
The rendered right and left images are obtained at 9703 and result
from ray tracing source images using the depths associated with the
masks with respect to two virtual cameras focused on a convergence
point, wherein the two virtual cameras are offset horizontally by an
intraocular or interaxial distance. The ray tracing process
generates two images for each input source image, wherein the two
images are intended for left and right eye viewing to produce
highly realistic stereoscopic images for three-dimensional viewing,
i.e., provide a stereoscopic 3D effect. Ray tracing enables optical
effects such as refraction, reflection, dispersion and scattering
for example but requires tracing each path of light through
potentially numerous reflections and through translucent matter for
each pixel in large format left and right eye images. Thus ray
tracing in one or more embodiments of the invention is done once,
wherein subsequent editing is done by local manipulation or
shifting of pixels that is several orders of magnitude faster as
per step 9706. The resulting left and right images may be combined
into one image that may be viewed with polarized lenses or red/cyan
lenses in the case of a polarized or an anaglyph image respectively
through superposition of two images that are coded through polarity
or color for example. Regardless of the output format of the right
and left viewpoints, the ray tracing is computationally expensive
and thus embodiments of the invention may entirely eliminate
iterative ray tracing when masks, missing background information,
gap fills, depths or any other values change in a way that would
otherwise require re-rendering, i.e., ray tracing another time. The rendered images,
for example created with ray tracing are displayed at 9704 and thus
viewed with appropriate three-dimensional viewing apparatus by a
quality assurance worker or compositing worker for example. If
artifacts are detected at 9705, then embodiments of the invention
at 9706 are utilized to locally modify masks, or depth maps, or
translation maps or values, or any combination thereof without
re-rendering or otherwise ray tracing the images/masks again. This
step is performed generally until the results are satisfactory, or
up to a quality assurance level that is acceptable or until
otherwise desired, for example until the artifacts are no longer
visible or objected to when the image sequence is viewed. If no
artifacts are detected, or the artifacts are not of a nature that
warrants modification, as decided at 9705, or if local modification is
complete at 9706, then processing completes at 9707.
In one or more embodiments of the invention gap analysis processing
may be utilized to display color coding of gaps or other artifacts
for example based on their magnitude or thickness to enable rapid
identification and local modification of areas where artifacts may
be visible or unacceptable. This is described below with further
detail with respect to FIG. 103.
In one or more embodiments of the invention, the local modification
of masks, depth maps, translation maps or masks or missing
background information as performed at 9706 may include processing
for Z Depth Alpha, UV Gap Detection, Gap Fill, Gap Blur Grain,
Grain Merge, UV Distort, Z Distort or Displacer processing as
enabled below.
Z Depth Alpha processing is shown in FIGS. 98A-D. Z Depth
processing enables rapid workflow for 2D to 3D image conversion,
and specifically enables dilation of edges of foreground masks to
generate clean edges on foreground objects without re-rendering
pairs of images. The depth mask shown in FIG. 98A includes masks
for a man on the left facing away and a portion of a woman on the
right facing towards the camera. As shown, the darker the pixel,
the closer the pixel is and the lighter the pixel, the further away
it is from the camera. This relationship may be inverted or encoded
in any other manner so long as the depth map includes distance away
from a camera, or cameras in the case of stereoscopic projection.
The depth map may be encoded and not visual in nature if desired.
By slicing the depth map, through for example accepting a cut off
value of depth, so that any values greater than a threshold
distance away from the camera are eliminated, the depth map of FIG.
98B is created. This eliminates the depths associated with any
objects further in Z depth than the man for example. By accepting a
fill amount, which controls the amount of expansion of the mask,
the mask is effectively reshaped to create a modified depth mask.
This may be utilized for example to refine the specific area where
depth is applied to the source image, which enables one or more
embodiments of the invention to update the left and right viewpoint
images without ray tracing the entire image again. Thus, by
enabling masks with three-dimensional information to be locally
altered and new left and right viewpoint images to be updated, vast
amounts of time may be saved when compared to sending the updated
masks to a different workgroup for re-rendering for example.
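A minimal sketch of the slice-and-dilate behaviour described for FIGS. 98A-D follows; the depth convention and the simple 4-neighbour dilation are illustrative assumptions rather than the exact processing of the invention.

    import numpy as np

    def slice_and_dilate_depth(depth_map, cutoff, fill_amount):
        """Discard depths beyond a cutoff (cf. FIG. 98B) and expand the kept
        region by fill_amount pixels to produce a modified depth mask
        (cf. FIGS. 98C-D). depth_map: (H, W), larger = further away."""
        kept = depth_map <= cutoff                  # nearer-than-cutoff pixels only
        dilated = kept.copy()
        for _ in range(fill_amount):                # one 4-neighbour dilation per pass
            d = dilated.copy()
            dilated[1:, :] |= d[:-1, :]
            dilated[:-1, :] |= d[1:, :]
            dilated[:, 1:] |= d[:, :-1]
            dilated[:, :-1] |= d[:, 1:]
        return np.where(kept, depth_map, 0.0), dilated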
UV Gap Detection processing is shown in FIGS. 99A-D. Specifically,
this process takes input UV or U maps and looks for values that are
the same horizontally within a threshold. Gaps are thus detected
when the values are above threshold, i.e., have shifted enough to
produce at least one pixel worth of missing background information.
An output alpha is produced after analyzing the pixel-by-pixel
horizontal offsets, or U values in the ramp function that are shifted
for example left and right. The initial pass through the U Map is
shown in FIG. 99A. By increasing the threshold or tolerance, the
narrower gaps are shown in FIG. 99B. The gaps may be blurred and
clamped to remove gaps under a particular width or threshold and
this is shown in FIG. 99C. The resulting gaps may be dilated to
widen them as is shown in FIG. 99D. This produces the more
important gaps in the form of an alpha mask in one or more
embodiments of the invention. Any generated missing background
information, or gap fill image data may be utilized in the gaps
shown at the white lines in the alpha as desired. The gap may be
displayed with different colors that represent the size of the gap
so that the artist can concentrate on more obvious gaps for
example, and as is shown in FIG. 103 which is described below.
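The gap-detection pass can be sketched as below, assuming a U map that stores each source pixel's destination column; the threshold handling is a simplification of the tolerance and clamping steps described above.

    import numpy as np

    def detect_gaps(u_map, tolerance=0.0):
        """Return a per-pixel gap width: wherever two horizontally adjacent
        source pixels land more than one pixel apart in the output, missing
        background information of that width opens between them
        (cf. FIGS. 99A-D)."""
        spacing = np.diff(u_map, axis=1)                  # destination spacing of neighbours
        gap = np.maximum(spacing - 1.0 - tolerance, 0.0)  # spacing beyond 1 px is a gap
        alpha = np.zeros_like(u_map, dtype=np.float32)
        alpha[:, :-1] = gap
        return alpha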
Gap Fill processing is shown in FIGS. 100A-E and is utilized to
extend the edge pixels of a masked area inward or outward. This
process may be utilized to generate missing background information,
expand depth masks, for example to extend edges of masked areas out
when they are lost due to motion blur, or to otherwise restore
motion blur or to account for other focus effects. In one or more
embodiments, the input source image is masked with an alpha mask of
a foreground object in the generation of missing background
information. For example, in one embodiment, the alpha of the
foreground object is utilized to mask out the foreground object,
i.e., the woman's face shown in FIG. 100A. The alpha of the woman's
face is utilized to mask the background as is shown in FIG. 100B.
When the foreground object, i.e., the woman, has depth applied, the
masked area of the image is shifted
left and right depending on how near or far away the masked area is
set to reside. In order to generate data to use when shifting
the foreground object, generated missing background information is
created for later depth use. See also FIGS. 71-76. The alpha is
then eroded into the missing background information area at FIG.
100C to generate data around the masked foreground object for use
when the foreground object is shifted left and right in applying
foreground depth. If there are primarily vertical lines in the
background near the masked foreground object as is shown in FIG.
100D, then the bias of the pixels, or stretching or extending of
colors across the gap, is vertical. If there are primarily
horizontal lines in the background near the masked foreground
object, then the bias of the pixels, or stretching or extending of
colors across the gap is horizontal. The missing background
information may be blurred and grain from the film near the mask
may also be added to the eroded area. Once the missing background
information is generated, the foreground object may be depth
adjusted to update previously rendered images without re-rendering
the images. Any technique that can determine whether there are
primarily horizontal or vertical or any other direction of lines in
a background, including histogramming or FFTs for example, may be
utilized to determine whether to apply a background of a particular
orientation as is shown in FIG. 100E.
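One way to decide the fill bias automatically, as suggested above, is sketched below using simple image gradients; histogramming or FFT-based detection would be equally valid, and the function is only illustrative.

    import numpy as np

    def fill_bias(background, gap_mask):
        """Return "vertical" if the background near the gap is dominated by
        vertical lines (strong horizontal gradients), otherwise
        "horizontal"; differences inside the gap itself are ignored."""
        gx = np.abs(np.diff(background, axis=1))         # responds to vertical lines
        gy = np.abs(np.diff(background, axis=0))         # responds to horizontal lines
        valid_x = ~gap_mask[:, 1:] & ~gap_mask[:, :-1]
        valid_y = ~gap_mask[1:, :] & ~gap_mask[:-1, :]
        return "vertical" if gx[valid_x].sum() > gy[valid_y].sum() else "horizontal"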
Gap Blur Grain processing is shown in FIGS. 101A-E. As shown, in
this example the gaps are detected in any manner as previously
described or in any other manner as is shown in FIG. 101A. Once the
gaps are identified, the gaps may be filled horizontally or
vertically and also may be blurred with added film grain from near
the gap. As is shown in FIG. 101B, if there is no apparent vertical
or horizontal background, then either bias may be utilized, for
example horizontal as is shown by extending the colors across the
gaps in a horizontal manner, optionally blurred. As is shown in
FIG. 101C, the film grain from an area outside of the gap for
example at background depth may be utilized to randomize or
otherwise make the horizontal or vertical gap fills more like the
film grain in the background. Close ups of the gaps filled without
film grain and with film grain are shown in FIGS. 101D and 101E
respectively. In one or more embodiments of the invention, a
pattern next to a gap may be automatically or manually sampled for
variance wherein the difference between each color and a desired
gap fill color is varied by the same pattern or amount. Any other
method of providing a pattern that simulates the film grain in an
image is in keeping with the spirit of the invention so long as the
processing is local and not ray traced a second time. In addition,
randomized grain in a known pattern may be based on, or a function
of the underlying luminance as per FIGS. 102A-B.
Grain Merge processing is shown in FIGS. 102A-B. The upper area of
each figure shows film grain as a function of the luminance, which
increases from left to right. The bottom input area may be utilized
to accept user inputs to change the mapping of film grain as
luminance varies. As shown in FIG. 102A, film grain input to the
gap fill processes may be set to not depend on the luminance. FIG.
102B however shows that more film grain is utilized in a low range
of luminance, while no film grain is utilized in a luminance range
just over half way across the luminance range. Thus, film grain or
any type of noise may be set to vary as a function of luminance in
any gap fill process or other process described herein.
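A small sketch of grain applied as a function of luminance, in the spirit of FIGS. 102A-B, follows; the example curve is an assumption standing in for the user-edited mapping.

    import numpy as np

    def grain_merge(image, grain, luminance_curve):
        """Add zero-mean grain whose per-pixel strength is a function of the
        underlying luminance, as set by a user-editable curve.
        image and grain are (H, W) float arrays in/around [0, 1]."""
        strength = luminance_curve(image)            # grain amount per pixel
        return np.clip(image + strength * grain, 0.0, 1.0)

    # Example curve in the spirit of FIG. 102B: more grain in the shadows,
    # none above roughly the middle of the luminance range.
    curve = lambda lum: np.where(lum < 0.55, 1.0 - lum, 0.0)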
UV Distort processing is performed by altering any of the UV or U
maps 85A-D or 86A-D and then updating the left and right viewpoint
images locally in order to avoid or eliminate ray tracing or
re-rendering. In one embodiment, the pixels are simply moved to the
new locations mapped by the UV or U maps to entirely avoid ray
tracing.
Z Distort or Displacer processes utilize a depth map, for example
as shown in FIG. 84 and alter the gray scale value, i.e., the depth
of a particular element, and shift pixels left and right by the new
amount to update the left and right viewpoint images without
computationally intensive ray tracing processing, potentially by
other workers using distal computers, which may be distal to any
computers used by a second set of masking workers for example. In
addition to the advantages of translation map alteration with UV or
U maps, Z Distort also enables stereo camera adjustments in
real-time and quick adjustments to depth in real-time, again
without re-rendering the entire image for example via ray tracing.
This enables depth occlusion by operating on nearer pixels last in
one or more embodiments in combination with any of the gap fill
techniques previously described. Displacer processing is a quick
translation tool that enables left and right translation of pixels
for modifying depth locally and in a quick manner that does not
perform occlusion processing and gap fill. This enables quick
adjustments to be viewed extremely quickly. In one or more
embodiments of the invention, the convergence distance, and
intraocular distance are obtained on a per image or per scene
basis, for example obtained via metadata written in each image if
desired. The source image and the depth map are altered by the user
and obtained to remap depth values on a pixel basis without ray
tracing. The pixels are shifted for left and right eyes based on
the new depths with priority to foreground pixels so that they
occlude background pixels. Wherever gaps are opened, generated
missing background information or gap fill is then utilized to fill
the gap. Compared with the pair of UV or U maps per image, Z maps
or depth maps are easier to compress, and utilize only one map per
source image. Less precision is generally required to store the
data and small changes to depth can easily be made by compositors,
for example locally in real-time without ray tracing the entire
frame. In addition, other advantages include use of client provided
depth maps and faster processing of operations on depth since only
one map per image is altered. This processing enables easy
correction of complex situations involving semi-transparent layers
locally in real-time without ray tracing. Hence, local updating of
image pairs is enabled through manipulation of local Z or depth
maps and/or translation maps or values without the need for
re-rendering through ray tracing.
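The depth-driven pixel shifting described above can be sketched as follows, writing nearer pixels last so they occlude farther ones and returning the opened gaps for a later gap-fill pass; the disparity model and ordering are illustrative assumptions rather than the exact processing of the invention.

    import numpy as np

    def displace(image, depth, disparity_px, screen_depth, direction=+1):
        """Shift pixels horizontally according to an (edited) depth map
        without ray tracing; direction = +1 for the left view, -1 for the
        right view. Returns the shifted view and a mask of opened gaps."""
        h, w = depth.shape
        out = np.zeros_like(image)
        written = np.zeros((h, w), dtype=bool)
        shift = direction * disparity_px * (screen_depth - depth) / screen_depth / 2.0
        order = np.argsort(depth, axis=None)[::-1]   # far pixels first, near pixels last
        for idx in order:
            y, x = divmod(idx, w)
            dx = int(round(x + shift[y, x]))
            if 0 <= dx < w:
                out[y, dx] = image[y, x]
                written[y, dx] = True
        return out, ~written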
In one or more embodiments of the invention gap analysis processing
may be utilized to display color coding of gaps 10301, 10302, 10303
or other artifacts for example based on their magnitude or
thickness to enable rapid identification and local modification of
areas where artifacts may be visible or unacceptable. In one or
more embodiments, the gap analysis processing draws Red areas 10301
for example in gaps that are thicker and will likely need more
attention than narrower gaps that may for example be drawn as green
areas 10302. Color coding artifacts or gaps also provides artists
with an easily identifiable view of areas of an image where
internal portions of characters/objects are broken, e.g., where
regions are not aligned with their neighbors correctly. In one or
more embodiments, a color lookup table may be utilized to provide a
color for a given gap width. In other embodiments the gap value may
be compared with a threshold, for example with three ranges, 0-4, 5-9,
10+, wherein Green, Yellow and Red are applied to the gap based on
the gap value.
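A minimal sketch of the color coding of gaps by width, using the example ranges above, is given below; the exact colors and thresholds are configurable and purely illustrative.

    import numpy as np

    def color_code_gaps(gap_width):
        """Map a per-pixel gap width to a color: 0-4 px green, 5-9 px yellow,
        10+ px red, so the widest gaps stand out for the artist
        (cf. FIG. 103). gap_width: (H, W) array, 0 = no gap."""
        h, w = gap_width.shape
        rgb = np.zeros((h, w, 3), dtype=np.uint8)
        rgb[(gap_width > 0) & (gap_width <= 4)] = (0, 255, 0)     # green
        rgb[(gap_width > 4) & (gap_width <= 9)] = (255, 255, 0)   # yellow
        rgb[gap_width > 9] = (255, 0, 0)                          # red
        return rgb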
While the invention herein disclosed has been described by means of
specific embodiments and applications thereof, numerous
modifications and variations could be made thereto by those skilled
in the art without departing from the scope of the invention set
forth in the claims.
* * * * *