U.S. patent application number 15/633328, for modification of post-viewing parameters for digital images using image region or feature information, was filed with the patent office on 2017-06-26 and published on 2018-01-11.
The applicant listed for this patent is FotoNation Limited. Invention is credited to Petronel BIGIOI, Peter CORCORAN, Yury PRILUTSKY, Eran STEINBERG.
Publication Number | 20180013950 |
Application Number | 15/633328 |
Family ID | 55404908 |
Filed Date | 2017-06-26 |
Publication Date | 2018-01-11 |
United States Patent Application | 20180013950 |
Kind Code | A1 |
STEINBERG; Eran; et al. | January 11, 2018 |
Modification of post-viewing parameters for digital images using image region or feature information
Abstract
A method of generating one or more new digital images using an
original digitally-acquired image including a selected image
feature includes identifying within a digital image acquisition
device one or more groups of pixels that correspond to the selected
image feature based on information from one or more preview images.
A portion of the original image is selected that includes the one
or more groups of pixels. The technique includes automatically
generating values of pixels of one or more new images based on the
selected portion in a manner which includes the selected image
feature within the one or more new images.
Inventors: | STEINBERG; Eran; (San Francisco, CA); BIGIOI; Petronel; (Galway, IE); PRILUTSKY; Yury; (San Mateo, CA); CORCORAN; Peter; (Claregalway, IE) |
Applicant: | FotoNation Limited | Galway | IE |
Family ID: | 55404908 |
Appl. No.: | 15/633328 |
Filed: | June 26, 2017 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number | Child Application |
14846390 | Sep 4, 2015 | 9692964 | 15633328 |
12140950 | Jun 17, 2008 | 9129381 | 14846390 |
10608784 | Jun 26, 2003 | 8948468 | 12140950 |
60945558 | Jun 21, 2007 | | |
Current U.S. Class: | 1/1 |
Current CPC Class: | H04N 5/23218 20180801; H04N 9/64 20130101; H04N 5/232945 20180801; H04N 9/045 20130101; G06K 9/00268 20130101; G06K 9/3233 20130101; G06T 2210/22 20130101; G06T 2207/30201 20130101; G06K 9/00228 20130101; H04N 5/232933 20180801; H04N 5/23229 20130101; G06T 11/60 20130101; H04N 5/232127 20180801; G06T 5/008 20130101; G06T 7/194 20170101 |
International Class: | H04N 5/232 20060101; H04N 9/04 20060101; G06K 9/32 20060101; G06K 9/00 20060101; G06T 5/00 20060101; H04N 9/64 20060101; G06T 11/60 20060101 |
Foreign Application Data
Date | Code | Application Number |
Jun 2, 2006 | US | PCT/US06/21393 |
Claims
1. A method of generating one or more new digital images using an
original digitally-acquired image including a selected image
feature, comprising: (a) identifying within a digital image
acquisition device one or more groups of pixels that correspond to
a selected image feature within an original digitally-acquired
image based on information from one or more preview images; (b)
selecting a portion of the original image that includes the one or
more groups of pixels; and (c) automatically generating values of
pixels of one or more new images based on the selected portion in a
manner which includes the selected image feature within the one or
more new images.
2. The method of claim 1, wherein said transformation is different
between the selected portion and remaining portions of the
image.
3. The method of claim 1, wherein said selected image feature
comprises one or more faces.
4. The method of claim 1, wherein said selected image feature
comprises a human subject.
5. The method of claim 1, wherein the selected image feature
comprises a foreground region or a background region.
6. The method of claim 5, wherein the foreground region is
determined by detection of a face.
7. The method of claim 5, wherein the foreground region is
determined by defining said selected image feature within an
original image to be local sharpness, relative exposure, local
color clustering, or local saturation, or combinations thereof.
8. The method of claim 5, wherein the determining of the foreground
region by defining said selected image feature within an original
image comprises determining a depth of focus.
9. The method of claim 5, further comprising visually separating
the foreground region and the background region within the one or
more new images.
10. The method of claim 9, further comprising calculating a depth
map of the background region.
11. The method of claim 5, further comprising independently
processing the foreground region or the background region, or
both.
12. A method as recited in claim 1, wherein said identifying one or
more groups of pixels within a digital acquisition device is
performed using non-image data.
13. A method as recited in claim 12, wherein said non-image data
comprises one or more acquisition parameters.
14. A method as recited in claim 1, wherein said generating values
of pixels of one or more new images is performed within a digital
acquisition device.
15. A method as recited in claim 1, wherein said generating values
of pixels of one or more new images is performed by an external
device to said digital acquisition device.
16. A method of generating one or more new digital images using an
original digitally-acquired image including a background region or
a foreground region, or both, comprising: (a) identifying within a
digital image acquisition device one or more groups of pixels that
correspond to a background region or a foreground region, or both,
within an original digitally-acquired image based on information
from one or more preview images; (b) selecting a portion of the
original image that includes the one or more groups of pixels; and
(c) automatically generating values of pixels of one or more new
images based on the selected portion in a manner which includes the
background region or the foreground region, or both, within each of
the one or more new images.
17. The method of claim 16, further comprising separating the
foreground region and the background region within the original
image or the one or more new images or combinations thereof.
18. The method of claim 17, further comprising calculating a depth
map of the background region or the foreground region or both.
19. The method of claim 16, further comprising independently
processing the foreground region or the background region, or
both.
20. The method of claim 19, wherein at least one of said new images
comprises an independently processed background region or
foreground region or both.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. provisional patent
application No. 60/945,558, filed Jun. 21, 2007, entitled Digital
Image Enhancement with Reference Images.
[0002] This application is also a CIP of United States patent
application no. PCT/US2006/021393, which is a CIP of U.S. patent
application Ser. No. 10/608,784, filed Jun. 26, 2003, which is one
of a series of contemporaneously-filed patent applications
including Atty docket 2100874-991210 (FN102-A) entitled, "Digital
Image Processing Using Face Detection Information", by inventors
Eran Steinberg, Yuri Prilutsky, Peter Corcoran, and Petronel
Bigioi; Atty docket 2100874-991220 (FN102-B) entitled, "Perfecting
of Digital Image Capture Parameters Within Acquisition Devices
Using Face Detection", by inventors Eran Steinberg, Yuri Prilutsky,
Peter Corcoran, and Petronel Bigioi; Atty docket 2100874-991230
(FN102-C) entitled, "Perfecting the Optics Within a Digital Image
Acquisition Device Using Face Detection", by inventors Eran
Steinberg, Yuri Prilutsky, Peter Corcoran, and Petronel Bigioi;
Atty docket 2100874-991240 (FN102-D) entitled, "Perfecting the
Effect of Flash Within an Image Acquisition Device Using Face
Detection", by inventors Eran Steinberg, Yuri Prilutsky, Peter
Corcoran, and Petronel Bigioi; Atty docket 2100874-991250 (FN102-E)
entitled, "A Method of Improving Orientation and Color Balance of
Digital Images Using Face Detection Information", by inventors Eran
Steinberg, Yuri Prilutsky, Peter Corcoran, and Petronel Bigioi;
Atty docket 2100874-991260 (FN102-F) entitled, "Modification of
Viewing Parameters for Digital Images Using Face Detection
Information", by inventors Eran Steinberg, Yuri Prilutsky, Peter
Corcoran, and Petronel Bigioi; Atty docket 2100874-991270 (FN102-G)
entitled, "Digital Image Processing Composition Using Face
Detection Information", by inventor Eran Steinberg; Atty docket
2100874-991280 (FN102-H) entitled, "Digital Image Adjustable
Compression and Resolution Using Face Detection Information" by
inventors Eran Steinberg, Yuri Prilutsky, Peter Corcoran, and
Petronel Bigioi; and Atty docket 2100874-991290 (FN102-I) entitled,
"Perfecting of Digital Image Rendering Parameters Within Rendering
Devices Using Face Detection" by inventors Eran Steinberg, Yuri
Prilutsky, Peter Corcoran, and Petronel Bigioi.
[0003] This application is related to U.S. patent application Ser.
No. 11/573,713, filed Feb. 14, 2007, which claims priority to U.S.
provisional patent application No. 60/773,714, filed Feb. 14, 2006,
and to PCT application no. PCT/EP2006/008229, filed Aug. 15, 2006
(FN-119).
[0004] This application also is related to Ser. No. 11/024,046,
filed Dec. 27, 2004, which is a CIP of U.S. patent application Ser.
No. 10/608,772, filed Jun. 26, 2003 (fn-102e-cip).
[0005] This application also is related to PCT/US2006/021393, filed
Jun. 2, 2006, which is a CIP of Ser. No. 10/608,784, filed Jun. 26,
2003 (fn-102f-cip-pct).
[0006] This application also is related to U.S. application Ser.
No. 10/985,657, filed Nov. 10, 2004 (FN-109A).
[0007] This application also is related to U.S. application Ser.
No. 11/462,035, filed Aug. 2, 2006, which is a CIP of U.S.
application Ser. No. 11/282,954, filed Nov. 18, 2005
(FN-121-CIP).
[0008] This application also is related to Ser. No. 11/233,513,
filed Sep. 21, 2005, which is a CIP of U.S. application Ser. No.
11/182,718, filed Jul. 15, 2005, which is a CIP of U.S. application
Ser. No. 11/123,971, filed May 6, 2005 and which is a CIP of U.S.
application Ser. No. 10/976,366, filed Oct. 28, 2004
(FN-106-CIP-2).
[0009] This application also is related to U.S. patent application
Ser. No. 11/460,218, filed Jul. 26, 2006, which claims priority to
U.S. provisional patent application Ser. No. 60/776,338, filed Feb.
24, 2006 (FN-149a).
[0010] This application also is related to U.S. patent application
Ser. No. 12/063,089, filed Feb. 6, 2008, which is a CIP of U.S.
Ser. No. 11/766,674, filed Jun. 21, 2007, which is a CIP of U.S.
Ser. No. 11/753,397, which is a CIP of U.S. Ser. No. 11/765,212,
filed Aug. 11, 2006, now U.S. Pat. No. 7,315,631
(FN-143-CIP-3).
[0011] This application also is related to U.S. patent application
Ser. No. 11/674,650, filed Feb. 13, 2007, which claims priority to
U.S. provisional patent application Ser. No. 60/773,714, filed
Feb. 14, 2006 (FN-144).
[0012] This application is related to U.S. Ser. No. 11/836,744,
filed Aug. 9, 2007, which claims priority to U.S. provisional
patent application Ser. No. 60/821,956, filed Aug. 9, 2006
(FN-178A).
[0013] This application is related to a family of applications
filed contemporaneously by the same inventors, including an
application entitled DIGITAL IMAGE ENHANCEMENT WITH REFERENCE
IMAGES (Docket FN-211A), and another entitled METHOD OF GATHERING
VISUAL META DATA USING A REFERENCE IMAGE (Docket: FN-211B), and
another entitled IMAGE CAPTURE DEVICE WITH CONTEMPORANEOUS
REFERENCE IMAGE CAPTURE MECHANISM (Docket: FN-211C), and another
entitled FOREGROUND/BACKGROUND SEPARATION USING REFERENCE IMAGES
(Docket: FN-211D), and another entitled REAL-TIME FACE TRACKING
WITH REFERENCE IMAGES (Docket: FN-211F), and another entitled METHOD
AND APPARATUS FOR RED-EYE DETECTION USING PREVIEW OR OTHER
REFERENCE IMAGES (Docket: FN-211G).
[0014] All of these priority and related applications, and all
references cited below, are hereby incorporated by reference.
BACKGROUND
1. Field of the Invention
[0015] The invention relates to digital image processing and
viewing, particularly automatic suggesting or processing of
enhancements of a digital image using information gained from
identifying and analyzing regions within an image or features
appearing within the image, particularly for creating post-acquisition
slide shows. The invention provides automated image
analysis and processing methods and tools for photographs taken
and/or images detected, acquired or captured in digital form or
converted to digital form, or rendered from digital form to a soft
or hard copy medium by using information about the regions or
features in the photographs and/or images.
2. Description of the Related Art
[0016] This invention relates to finding and defining regions of
interest (ROI) in an acquired image. In many cases the interest
relates to items in the foreground of an image. In addition, and
particularly for consumer photography, the ROI relates to human
subjects and in particular, faces.
[0017] Although well-known, the problem of face detection has not
received a great deal of attention from researchers. Most
conventional techniques concentrate on face recognition, assuming
that a region of an image containing a single face has already been
extracted and will be provided as an input. Such techniques are
unable to detect faces against complex backgrounds or when there
are multiple faces in an image. For all of the image
enhancement techniques introduced below and others as may be
described herein or understood by those skilled in the art, it is
desired to make use of the data obtained from face detection
processes for suggesting options for improving digital images or
for automatically improving or enhancing quality of digital
images.
[0018] Yang et al., IEEE Transactions on Pattern Analysis and
Machine Intelligence, Vol. 24, No. 1, pages 34-58, January 2002,
give a useful and comprehensive review of face detection techniques.
These authors discuss various methods of face detection which may
be divided into four main categories: (i) knowledge-based methods;
(ii) feature-invariant approaches, including the identification of
facial features, texture and skin color; (iii) template matching
methods, both fixed and deformable; and (iv) appearance-based
methods, including eigenface techniques, statistical distribution
based methods and neural network approaches. They also discuss a
number of the main applications for face detection technology. It
is recognized in the present invention that none of this prior art
describes or suggests using detection and knowledge of faces in
images to create and/or use tools for the enhancement or correction
of the images.
[0019] a. Faces as Subject Matter
[0020] Human faces may well be by far the most photographed subject
matter for the amateur and professional photographer. In addition,
the human visual system is very sensitive to faces in terms of skin
tone colors. Also, eye-tracking experiments show that when viewing
an image that includes a human being, subjects tend to focus first
and foremost on the face, and in particular the eyes, and only
later search the image around the figure. By default, when a picture
includes a human figure and in particular a face, the face becomes
the main object of the image. Thus, many artists and art teachers
emphasize the location of the human figure, and of the face in
particular, as an important part of a pleasing composition. For
example, some teach positioning faces around the "Golden Ratio"
lines, known in the Renaissance period as the "divine proportion",
or PHI (φ) lines. Some famous artists whose work repeatedly depicts
this composition are Leonardo da Vinci, Georges Seurat and Salvador
Dali.
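The φ-line composition rule can be made concrete. The following Python sketch is illustrative only and is not taken from the application; it assumes a face detector has already supplied the face centre, and the helper names `phi_lines` and `nearest_phi_point` are hypothetical. It computes the two vertical and two horizontal golden-section lines of a frame and picks the line intersection nearest to a face.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.618


def phi_lines(width, height):
    """Return the x positions of the two vertical and the y positions of the
    two horizontal golden-section (phi) lines of a width x height frame."""
    major = 1 / PHI      # about 0.618 of the frame dimension
    minor = 1 - major    # about 0.382 of the frame dimension
    xs = (width * minor, width * major)
    ys = (height * minor, height * major)
    return xs, ys


def nearest_phi_point(face_cx, face_cy, width, height):
    """Return the phi-line intersection closest to a face centre (hypothetical helper)."""
    xs, ys = phi_lines(width, height)
    points = [(x, y) for x in xs for y in ys]
    return min(points, key=lambda p: math.hypot(p[0] - face_cx, p[1] - face_cy))


# A face centred at (700, 200) in a 1000 x 600 frame.
print(nearest_phi_point(700, 200, 1000, 600))  # about (618.03, 229.18)
```

A composition or cropping tool could then shift or crop the frame so that the face centre moves toward the returned point.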
[0021] In addition, the faces themselves, not just the location of
the faces in an image, have similar "divine proportion"
characteristics. The head forms a golden rectangle with the eyes at
its midpoint; the mouth and nose are each placed at golden sections
of the distance between the eyes and the bottom of the chin, and so
on.
[0022] b. Color and Exposure of Faces
[0023] While the human visual system is tolerant to shifts in color
balance, the human skin tone is one area where the tolerance is
somewhat limited and is accepted primarily only around the
luminance axis, which is a main varying factor between skin tones
of faces of people of different races or ethnic backgrounds.
Knowledge of faces can therefore provide an important advantage in
methods of suggesting or automatically correcting the overall color
balance of an image, as well as in producing pleasing images after
correction.
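As an illustration of how face knowledge could drive color balance, the sketch below treats skin chromaticity as roughly constant across people (skin tones being said to vary mainly along the luminance axis) and derives per-channel gains from the pixels of a detected face. It is a hypothetical sketch, not the application's method: the target chromaticity `REF_SKIN_CHROMA` is an assumed value, the face pixels are assumed to come from a face detector, and luma uses the ITU-R BT.601 weights.

```python
# Hypothetical target (r, g, b) chromaticity for skin; the three values sum to 1.
REF_SKIN_CHROMA = (0.45, 0.32, 0.23)


def luma(rgb):
    """BT.601 luma of an (r, g, b) triple."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def skin_balance_gains(face_pixels, ref=REF_SKIN_CHROMA):
    """Per-channel gains that move the mean face chromaticity to `ref`
    while leaving the mean face luma unchanged."""
    n = len(face_pixels)
    mean = [sum(p[c] for p in face_pixels) / n for c in range(3)]
    total = sum(mean)
    chroma = [m / total for m in mean]
    gains = [ref[c] / chroma[c] for c in range(3)]
    corrected = [mean[c] * gains[c] for c in range(3)]
    scale = luma(mean) / luma(corrected)  # correct chroma only, not brightness
    return [g * scale for g in gains]


def apply_gains(image, gains):
    """Apply gains to a 2-D list of (r, g, b) pixels, clipping to 8 bits."""
    return [[tuple(min(255, round(px[c] * gains[c])) for c in range(3))
             for px in row] for row in image]


face = [(150, 110, 80)] * 4  # a slightly too-red face region
gains = skin_balance_gains(face)
print([round(g, 3) for g in gains])
```

Because only chromaticity is corrected, a darker or lighter skin tone keeps its luminance, which is the tolerance axis described above.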
[0024] c. Auto Focus
[0025] Auto focusing is a popular feature among professional and
amateur photographers alike. There are various ways to determine a
region of focus. Some cameras use a center-weighted approach, while
others allow the user to manually select the region. In most cases,
it is the intention of the photographer to focus on the faces
photographed, regardless of their location in the image. Other,
more sophisticated techniques attempt to guess the important
regions of the image by determining the exact location where the
photographer's eye is looking. It is desired to provide
advantageous auto focus techniques which can focus on what is
considered the important subject in the image.
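A minimal sketch of face-priority autofocus window selection follows. It assumes face boxes `(x, y, w, h)` are already available from a detector and picks the largest face as the intended subject, which is an assumption rather than a rule stated in the application; without faces it falls back to the center-weighted approach mentioned above. The function name and the `center_frac` parameter are hypothetical.

```python
def choose_focus_region(faces, width, height, center_frac=0.2):
    """Return an autofocus window (x, y, w, h).

    faces: list of (x, y, w, h) boxes from a face detector (assumed available).
    Assumption: the largest face is the main subject. With no faces, fall back
    to a centered window covering center_frac of each image dimension.
    """
    if faces:
        return max(faces, key=lambda f: f[2] * f[3])
    w, h = int(width * center_frac), int(height * center_frac)
    return ((width - w) // 2, (height - h) // 2, w, h)


print(choose_focus_region([(10, 10, 20, 20), (300, 120, 60, 60)], 640, 480))  # (300, 120, 60, 60)
print(choose_focus_region([], 640, 480))  # (256, 192, 128, 96)
```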
[0026] d. Fill-Flash
[0027] Another useful feature particularly for the amateur
photographer is fill-flash mode. In this mode, objects close to the
camera may receive a boost in their exposure using artificial light
such as a flash, while faraway objects which are not affected by
the flash are exposed using available light. It is desired to have
an advantageous technique which automatically provides image
enhancements or suggested options using fill flash to add light to
faces in the foreground which are in shadow or backlit.
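Digital fill-flash can be approximated after capture by brightening an underexposed face. The sketch below is a simplification under stated assumptions: the image is a 2-D list of 8-bit luminance values, the face box comes from a detector, and the target level (120) and gain cap (2.5) are arbitrary illustrative choices. The function names are hypothetical, and a practical implementation would also feather the region boundary and process all color channels.

```python
def mean_luma(img, box):
    """Mean value of a 2-D luminance image inside an (x, y, w, h) box."""
    x, y, w, h = box
    vals = [img[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    return sum(vals) / len(vals)


def digital_fill_flash(img, face_box, target=120.0, max_gain=2.5):
    """Brighten the face region of an 8-bit luminance image if it is underexposed."""
    current = mean_luma(img, face_box)
    if current >= target or current == 0:  # already bright enough, or nothing to scale
        return [row[:] for row in img]
    gain = min(target / current, max_gain)  # cap the boost to limit noise amplification
    x, y, w, h = face_box
    out = [row[:] for row in img]
    for r in range(y, y + h):
        for c in range(x, x + w):
            out[r][c] = min(255, round(img[r][c] * gain))
    return out


img = [[40] * 4 for _ in range(4)]
out = digital_fill_flash(img, (1, 1, 2, 2))
print(out[1][1], out[0][0])  # 100 40
```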
[0028] e. Orientation
[0029] The camera can be held horizontally or vertically when the
picture is taken, creating what is referred to as a landscape mode
or portrait mode, respectively. When viewing images, it is
preferable to know the orientation of the camera at acquisition
time, so that the image can be oriented automatically without a
separate rotation step. The system may try to determine
if the image was shot horizontally, which is also referred to as
landscape format, where the width is larger than the height of an
image, or vertically, also referred to as portrait mode, where the
height of the image is larger than the width. Several techniques may
be used to determine the orientation of an image. Primarily these
techniques include either recording the camera orientation at
acquisition time using an in-camera mechanical indicator or
attempting to analyze image content post-acquisition. In-camera
methods, although precise, use additional hardware, sometimes with
movable components, which can increase the price of the camera and
add a potential maintenance challenge. Post-acquisition analysis,
on the other hand, generally may not provide sufficient precision.
With knowledge of the location, size and orientation of faces in
a photograph, a computerized system can offer powerful automatic
tools to enhance and correct such images or to provide options for
enhancing and correcting them.
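One way face information could resolve landscape versus portrait orientation is to compare the eye and mouth positions of a detected face. The sketch assumes landmark coordinates are available from some facial-feature detector (not specified here) and that the people in the scene were upright; both function names are hypothetical. Coordinates use the usual image convention with y growing downward, and the angle is converted to the mathematical convention so that "counterclockwise" matches what a viewer sees on screen.

```python
import math


def face_up_angle(left_eye, right_eye, mouth):
    """Angle in degrees (counterclockwise from +x, y pointing up) of the face's 'up' vector,
    taken from the mouth toward the midpoint of the eyes."""
    ex = (left_eye[0] + right_eye[0]) / 2
    ey = (left_eye[1] + right_eye[1]) / 2
    ux, uy = ex - mouth[0], ey - mouth[1]
    return math.degrees(math.atan2(-uy, ux))  # negate y: image rows grow downward


def rotation_to_upright(left_eye, right_eye, mouth):
    """Counterclockwise rotation (0, 90, 180 or 270 degrees) that makes the face upright."""
    angle = face_up_angle(left_eye, right_eye, mouth)
    return int(round((90 - angle) / 90)) * 90 % 360


print(rotation_to_upright((40, 40), (60, 40), (50, 70)))  # 0  (face already upright)
print(rotation_to_upright((70, 40), (70, 60), (40, 50)))  # 90 (top of head points right)
```

With several faces, a simple majority vote over their individual results would make the decision more robust.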
[0030] f. Color Correction
[0031] Automatic color correction can involve adding or removing a
color cast to or from an image. Such a cast can arise for many
reasons, including the film or CCD being calibrated to one light
source, such as daylight, while the lighting at the time of image
detection is different, for example, cool-white fluorescent. In
this example, an image can tend to have a greenish cast that should
be removed. It is desired to have automatically generated or
suggested color correction techniques for use with digital image
enhancement processing.
[0032] g. Cropping
[0033] Automatic cropping may be performed on an image to create a
more pleasing composition of an image. It is desired to have
automatic image processing techniques for generating or suggesting
more balanced image compositions using cropping.
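Automatic cropping can combine face location with the golden-section composition discussed earlier. This sketch, with a hypothetical function and parameters, chooses a crop of the requested aspect ratio whose size is a fraction `zoom` of the largest such window that fits in the image. It places the face centre horizontally in the middle and on the upper golden-section line, then shifts the window to stay inside the image. It assumes the face box comes from a detector and does not check that a very large face fits inside the crop.

```python
PHI_MINOR = 1 - 2 / (1 + 5 ** 0.5)  # about 0.382: the upper golden-section line


def face_crop(img_w, img_h, face_box, aspect=4 / 3, zoom=0.6):
    """Return an (x, y, w, h) crop that puts the face centre on the upper
    golden-section line, clamped so the crop stays inside the image."""
    # Largest window with the requested aspect ratio that fits inside the image.
    if img_w / img_h > aspect:
        max_w, max_h = img_h * aspect, img_h
    else:
        max_w, max_h = img_w, img_w / aspect
    cw, ch = max_w * zoom, max_h * zoom
    fx = face_box[0] + face_box[2] / 2
    fy = face_box[1] + face_box[3] / 2
    x = fx - cw / 2              # face horizontally centred
    y = fy - ch * PHI_MINOR      # face on the upper phi line
    x = min(max(0, x), img_w - cw)
    y = min(max(0, y), img_h - ch)
    return round(x), round(y), round(cw), round(ch)


print(face_crop(1600, 1200, (700, 500, 200, 200), zoom=0.5))  # (400, 371, 800, 600)
```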
[0034] h. Rendering
[0035] When an image is being rendered for printing or display, it
undergoes operations such as color conversion, contrast enhancement,
cropping and/or resizing to accommodate the physical
characteristics of the rendering device. Such characteristics may
be a limited color gamut, a restricted aspect ratio, a restricted
display orientation, a fixed contrast ratio, etc. It is desired to
have automatic image processing techniques for improving the
rendering of images.
[0036] i. Compression and resolution
[0037] An image can be locally compressed in accordance with a
preferred embodiment herein, so that specific regions may have a
higher quality compression which involves a lower compression rate.
It is desired to have an advantageous technique for determining
and/or selecting regions of importance that may be maintained with
low compression or high resolution compared with regions determined
and/or selected to have less importance in the image.
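The local compression idea can be sketched as a per-block quality map in which blocks overlapping a face are kept at higher quality. This is an illustration only: the 8-pixel block size mirrors JPEG's 8x8 DCT blocks, the quality values 90 and 60 are arbitrary, face boxes are assumed to come from a detector, and an encoder that accepts a per-block quality map is assumed. Standard baseline JPEG does not vary its quantization tables from block to block.

```python
def block_quality_map(img_w, img_h, faces, block=8, q_face=90, q_background=60):
    """Per-block quality: blocks overlapping any (x, y, w, h) face box get q_face,
    all other blocks get q_background."""
    cols = (img_w + block - 1) // block
    rows = (img_h + block - 1) // block
    qmap = [[q_background] * cols for _ in range(rows)]
    for (x, y, w, h) in faces:
        c0, c1 = x // block, (x + w - 1) // block
        r0, r1 = y // block, (y + h - 1) // block
        for r in range(max(0, r0), min(rows, r1 + 1)):
            for c in range(max(0, c0), min(cols, c1 + 1)):
                qmap[r][c] = q_face
    return qmap


print(block_quality_map(32, 16, [(8, 0, 8, 8)]))  # [[60, 90, 60, 60], [60, 60, 60, 60]]
```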
SUMMARY OF THE INVENTION
[0038] A method of generating one or more new digital images, or
generating a progression or sequence of related images in a form of
a movie clip, using an original digitally-acquired image including
a selected image feature is provided. The method includes
identifying within a digital image acquisition device one or more
groups of pixels that correspond to a selected image feature, or
image region within an original digitally-acquired image based on
information from one or more preview or other reference images. A
portion of the original image is selected that includes the one or
more groups of pixels segmented spatially or by value. Values of
pixels of one or more new images are automatically generated based
on the selected portion in a manner which includes the selected
image feature within the one or more new images.
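The summarized steps (identify pixel groups using a preview, select a portion containing them, generate a new image from that portion) can be sketched as follows. The sketch assumes the preview and the full-resolution image share the same field of view, so boxes scale linearly between them; the margin value and all function names are hypothetical, and a plain crop stands in for the more general generation of new pixel values.

```python
def scale_box(box, preview_size, full_size):
    """Map an (x, y, w, h) box found in a preview frame to full-resolution coordinates."""
    sx = full_size[0] / preview_size[0]
    sy = full_size[1] / preview_size[1]
    x, y, w, h = box
    return (round(x * sx), round(y * sy), round(w * sx), round(h * sy))


def select_portion(boxes, full_size, margin=0.25):
    """Smallest rectangle containing all boxes, grown by `margin` of its size, clamped."""
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[0] + b[2] for b in boxes)
    y1 = max(b[1] + b[3] for b in boxes)
    mx, my = (x1 - x0) * margin, (y1 - y0) * margin
    x0, y0 = max(0, round(x0 - mx)), max(0, round(y0 - my))
    x1, y1 = min(full_size[0], round(x1 + mx)), min(full_size[1], round(y1 + my))
    return (x0, y0, x1 - x0, y1 - y0)


def crop(image, portion):
    """Generate a new image (list of rows) from the selected portion of the original."""
    x, y, w, h = portion
    return [row[x:x + w] for row in image[y:y + h]]


face = scale_box((40, 30, 20, 20), (320, 240), (3200, 2400))
print(face)                                    # (400, 300, 200, 200)
print(select_portion([face], (3200, 2400)))    # (350, 250, 300, 300)
```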
[0039] The selected image feature may include a segmentation of the
image to two portions, e.g., a foreground region and a background
region, and the method may include visually separating the
foreground region and the background region within the one or more
new images. The visual encoding of such separation may be done
gradually, thereby creating a movie-like effect.
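A gradual visual separation of foreground and background can be illustrated with a sequence of frames in which the background is progressively dimmed while the foreground is left untouched. Dimming corresponds loosely to the exposure change among the effects listed for independent processing; the foreground mask is assumed to come from a separate segmentation step, and the function name and parameters are hypothetical.

```python
def separation_frames(img, fg_mask, n_frames=5, min_bg_level=0.3):
    """Return frames that progressively dim the background while the foreground stays fixed.

    img: 2-D list of 8-bit luminance values; fg_mask: same shape, True for foreground.
    Frame 0 is the original; the last frame has its background scaled by min_bg_level.
    """
    frames = []
    for k in range(n_frames):
        t = k / (n_frames - 1) if n_frames > 1 else 1.0
        level = 1.0 - t * (1.0 - min_bg_level)  # linear ramp from 1.0 down to min_bg_level
        frame = [[v if fg else round(v * level) for v, fg in zip(row, mrow)]
                 for row, mrow in zip(img, fg_mask)]
        frames.append(frame)
    return frames


print(separation_frames([[100, 100]], [[True, False]], n_frames=3, min_bg_level=0.5))
# [[[100, 100]], [[100, 75]], [[100, 50]]]
```

Replacing the linear ramp with an ease-in/ease-out curve, or dimming with progressive blur, would give other variants of the same movie-like effect.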
[0040] The method may also include calculating a depth map of the
background region. The foreground and background regions may be
independently processed. One or more of the new images may include
an independently processed background region or foreground region
or both. The independent processing may include gradual or
continuous change between an original state and a final state using
one of or any combination of the following effects: focusing,
saturating, pixelating, sharpening, zooming, panning, tilting,
geometrically distorting, cropping, exposing or combinations
thereof. The method may also include determining a relevance or
importance, or both, of the foreground region or the background
region, or both.
[0041] The method may also include identifying one or more groups
of pixels that correspond to two or more selected image features
within the original digitally-acquired image. The automatic
generating of pixel values may be in a manner which includes at
least one of the two or more selected image features within the one
or more new images or a panning intermediate image between two of
the selected image features, or a combination thereof.
[0042] The method may also include automatically providing an
option for generating the values of pixels of one or more new
images based on the selected portion in a manner which includes the
selected image feature within each of the one or more new
images.
[0043] A method of generating one or more new digital images using
an original digitally-acquired image including separating
background and foreground regions is provided. The method includes
identifying within a digital image acquisition device one or more
groups of pixels that correspond to a background region or a
foreground region, or both, within an original digitally-acquired
image based on information from one or more preview or other
reference images. The foreground portion may be based on the
identification of well-known objects such as faces, human bodies,
animals and in particular pets. Alternatively, the foreground
portion may be determined based on a pixel analysis with
information such as chroma, overall exposure and local sharpness.
Segmentations based on local analysis of the content or the values
may be alternatively performed as understood by those skilled in
the art of image segmentation. A portion of the original image is
selected that includes the one or more groups of pixels. Values of
pixels of one or more new images are automatically generated based
on the selected portion in a manner which includes the background
region or the foreground region, or both. The method may also
include calculating a depth map of the background region. The
foreground and background regions may be independently processed
for generating new images.
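A foreground estimate based on local sharpness alone, one of the pixel-analysis cues mentioned above, can be sketched by measuring the mean absolute Laplacian in each block: in-focus foreground blocks tend to score high and defocused background blocks low. The block size, threshold and function names are illustrative assumptions; a practical segmentation would combine this cue with exposure, chroma and face information.

```python
def block_sharpness(img, block=8):
    """Mean absolute 4-neighbour Laplacian per block of a 2-D grayscale image."""
    h, w = len(img), len(img[0])
    rows, cols = h // block, w // block
    out = [[0.0] * cols for _ in range(rows)]
    for br in range(rows):
        for bc in range(cols):
            total, count = 0.0, 0
            for r in range(br * block, (br + 1) * block):
                for c in range(bc * block, (bc + 1) * block):
                    if 0 < r < h - 1 and 0 < c < w - 1:  # skip the image border
                        lap = (img[r - 1][c] + img[r + 1][c] + img[r][c - 1]
                               + img[r][c + 1] - 4 * img[r][c])
                        total += abs(lap)
                        count += 1
            out[br][bc] = total / count if count else 0.0
    return out


def foreground_blocks(img, block=8, threshold=50.0):
    """Mark blocks whose local sharpness exceeds a (hypothetical) threshold as foreground."""
    return [[s > threshold for s in row] for row in block_sharpness(img, block)]


# Left half: fine checkerboard detail (sharp); right half: flat gray (defocused).
img = [[(255 if (r + c) % 2 else 0) if c < 8 else 128 for c in range(16)] for r in range(8)]
print(foreground_blocks(img))  # [[True, False]]
```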
[0044] The present invention and/or preferred or alternative
embodiments thereof can be advantageously combined with features of
parent U.S. patent application Ser. No. 10/608,784, including a
method of generating one or more new digital images, as well as a
continuous sequence of images, using an original digitally-acquired
image including a face, and preferably based on one or more preview
or other reference images. A group of pixels that correspond to a
face within the original digitally-acquired image is identified. A
portion of the original image is selected to include the group of
pixels. Values of pixels of one or more new images based on the
selected portion are automatically generated, or an option to
generate them is provided, in a manner which always includes the
face within the one or more new images.
[0045] A transformation may be gradually displayed between the
original digitally-acquired image and one or more new images.
Parameters of said transformation may be adjusted between the
original digitally-acquired image and one or more new images.
Parameters of the transformation between the original
digitally-acquired image and one or more new images may be selected
from a set of at least one or more criteria including timing or
blending or a combination thereof. The blending may vary between
the various segmented regions of an image, and can include
dissolving, flying, swirling, appearing, flashing, or screening, or
combinations thereof.
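The timing and blending parameters can be illustrated with a cross-dissolve whose speed differs between segmented regions. This is a sketch only: the images are assumed to be same-sized 2-D lists of luminance values, `region_mask` is assumed to come from a segmentation step, and the `region_speed` parameter and function name are hypothetical. It implements only the dissolving option among the listed blends.

```python
def dissolve(src, dst, n_frames, region_mask=None, region_speed=2.0):
    """Cross-dissolve from src to dst over n_frames (n_frames >= 2).

    Where region_mask is True, pixels blend region_speed times faster, so the
    blending varies between segmented regions of the image.
    """
    frames = []
    for k in range(n_frames):
        t = k / (n_frames - 1)  # timing: linear progress from 0 to 1
        frame = []
        for r, (row_a, row_b) in enumerate(zip(src, dst)):
            out_row = []
            for c, (a, b) in enumerate(zip(row_a, row_b)):
                alpha = t
                if region_mask is not None and region_mask[r][c]:
                    alpha = min(1.0, t * region_speed)
                out_row.append(round((1 - alpha) * a + alpha * b))
            frame.append(out_row)
        frames.append(frame)
    return frames


print(dissolve([[0, 0]], [[100, 100]], 3, region_mask=[[True, False]]))
# [[[0, 0]], [[100, 50]], [[100, 100]]]
```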
[0046] Methods of generating slide shows that use an image
including a face are provided in accordance with the generation of
one or more new images. A group of pixels is identified that
corresponds to a face within a digitally-acquired image based on
information from one or more preview or other reference images. A
zoom portion of the image including the group of pixels may be
determined. The image may be automatically zoomed to generate a
zoomed image including the face enlarged by the zooming, or an
option to generate the zoomed image may be provided. A center point
of zooming in or out and an amount of zooming in or out may be
determined after which another image may be automatically generated
including a zoomed version of the face, or an option to generate
the image including the zoomed version of the face may be provided.
One or more new images may be generated, each including a new group
of pixels corresponding to the face, and automatic panning may be
provided using the one or more new images.
[0047] A method of generating one or more new digital images using
an original digitally-acquired image including a face is further
provided. One or more groups of pixels may be identified that
correspond to two or more faces within the original
digitally-acquired image based on information from one or more
preview or other reference images. A portion of the original image
may be selected to include the group of pixels. Values of pixels
may be automatically generated of one or more new images based on
the selected portion in a manner which always includes at least one
of the two or more faces within the one or more new images or a
panning intermediate image between two of the faces of said two or
more identified faces or a combination thereof.
[0048] Panning may be performed between the two or more identified
faces. The panning may be from a first face to a second face of the
two or more identified faces, and the second face may then be
zoomed. The first face may be de-zoomed prior to panning to the
second face. The second face may also be zoomed. The panning may
include identifying a panning direction parameter between two of
the identified faces. The panning may include sequencing along the
identified panning direction between the two identified faces
according to the identified panning direction parameter.
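Panning between two identified faces can be sketched as a sequence of crop windows interpolated between a window around the first face and a window around the second. This is a simplified, hypothetical sketch: it assumes face boxes `(x, y, w, h)` from a detector, uses a window twice the face size (an arbitrary choice), and performs a single linear pan rather than the separate de-zoom, pan and zoom phases described above. The panning direction parameter is the vector between the two face centres; because the window size is interpolated as well, the zoom level changes smoothly when the faces differ in size.

```python
def face_window(face, scale=2.0):
    """Crop window (cx, cy, w, h) centred on a face box and `scale` times its size."""
    x, y, w, h = face
    return (x + w / 2, y + h / 2, w * scale, h * scale)


def pan_between_faces(face_a, face_b, steps):
    """Return the panning direction and `steps` (>= 2) interpolated windows from face_a to face_b."""
    a, b = face_window(face_a), face_window(face_b)
    direction = (b[0] - a[0], b[1] - a[1])
    windows = []
    for k in range(steps):
        t = k / (steps - 1)
        windows.append(tuple(a[i] + t * (b[i] - a[i]) for i in range(4)))
    return direction, windows


direction, windows = pan_between_faces((0, 0, 10, 10), (100, 0, 10, 10), 3)
print(direction)    # (100.0, 0.0)
print(windows[1])   # (55.0, 5.0, 20.0, 20.0)
```

Each window would then be cropped from the original and resized to the display resolution to produce one frame of the slide show.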
[0049] A method of generating a simulated camera movement in a
still image using an original digitally-acquired image including a
face or other image feature is further provided. Simulated camera
movements such as panning, tilting and zooming may be determined
based on the orientation of the face or multiple faces or other
features in an image to simulate the direction of the face and in
particular the eyes. Such movement may then simulate the direction
the photographed subject is looking at. Such method may be extended
to two or more identified faces, or as indicated other image
features.
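A simulated camera movement that follows the subject's gaze can be sketched as a sequence of window centres moving from the face along the gaze direction. The sketch assumes a 2-D gaze direction is supplied by some head-pose or eye-direction estimator, which is not specified here; the function name and the `distance` and `steps` parameters are hypothetical.

```python
import math


def gaze_pan_path(face_center, gaze, distance, steps):
    """Window centres for a simulated pan that follows the subject's gaze.

    face_center: (x, y) in image coordinates; gaze: (dx, dy) direction from an
    assumed head-pose or eye-direction estimator; steps must be >= 2.
    """
    norm = math.hypot(*gaze)
    if norm == 0:  # no usable direction: hold the camera on the face
        return [face_center] * steps
    ux, uy = gaze[0] / norm, gaze[1] / norm
    return [(face_center[0] + ux * distance * k / (steps - 1),
             face_center[1] + uy * distance * k / (steps - 1)) for k in range(steps)]


path = gaze_pan_path((100, 100), (3, 4), 50, 3)
print(path)  # about [(100, 100), (115, 120), (130, 140)]
```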
[0050] Each of the methods provided is preferably implemented
within software and/or firmware either in the camera or with
external processing equipment. The software may also be downloaded
into the camera or image processing equipment. In this sense, one
or more processor readable storage devices having processor
readable code embodied thereon are provided. The processor readable
code programs one or more processors to perform any of the above or
below described methods.
BRIEF DESCRIPTION OF THE DRAWINGS
[0051] FIG. 1a illustrates a preferred embodiment of the main
workflow of correcting images based on finding faces in the
images.
[0052] FIG. 1b illustrates a generic workflow of utilizing face
information in an image to adjust image acquisition parameters in
accordance with a preferred embodiment.
[0053] FIG. 1c illustrates a generic workflow of utilizing face
information in a single or a plurality of images to adjust the
image rendering parameters prior to outputting the image in
accordance with a preferred embodiment.
[0054] FIGS. 2a-2e illustrate image orientation based on
orientation of faces in accordance with one or more preferred
embodiments.
[0055] FIGS. 3a-3d illustrate an automatic composition and cropping
of an image based on the location of the face in accordance with
one or more preferred embodiments.
[0056] FIGS. 4a-4g illustrate digital fill-flash in accordance with
one or more preferred embodiments.
[0057] FIG. 4h describes an illustrative system in accordance with
a preferred embodiment to determine in the camera as part of the
acquisition process, whether fill flash is needed, and if so,
activate such flash when acquiring the image based on the exposure
on the face.
[0058] FIG. 5 illustrates the use of face-detection for generating
dynamic slide shows, by applying automated and suggested zooming
and panning functionality where the decision as to the center of
the zoom is based on the detection of faces in the image.
[0059] FIG. 6 describes an illustrative simulation of a viewfinder
in a video camera or a digital camera with video capability, with
an automatic zooming and tracking of a face as part of the live
acquisition in a video camera, in accordance with a preferred
embodiment.
[0060] FIGS. 7a and 7b illustrate an automatic focusing capability
in the camera as part of the acquisition process based on the
detection of a face in accordance with one or more preferred
embodiments.
[0061] FIG. 8 illustrates an adjustable compression rate based on
the location of faces in the image in accordance with a preferred
embodiment.
INCORPORATION BY REFERENCE
[0062] What follows is a cite list of references each of which is,
in addition to that which is described as background, the invention
summary, the abstract, the brief description of the drawings and
the drawings themselves, hereby incorporated by reference into the
detailed description of the preferred embodiments below, as
disclosing alternative embodiments of elements or features of the
preferred embodiments not otherwise set forth in detail below. A
single one or a combination of two or more of these references may
be consulted to obtain a variation of the preferred embodiments
described in the detailed description herein:
[0063] U.S. Pat. Nos. RE33682, RE31370, 4,047,187, 4,317,991,
4,367,027, 4,638,364, 5,291,234, 5,432,863, 5,488,429, 5,638,136,
5,710,833, 5,724,456, 5,751,836, 5,781,650, 5,812,193, 5,818,975,
5,835,616, 5,870,138, 5,978,519, 5,991,456, 6,097,470, 6,101,271,
6,128,397, 6,134,339, 6,148,092, 6,151,073, 6,188,777, 6,192,149,
6,249,315, 6,263,113, 6,268,939, 6,278,491, 6,282,317, 6,301,370,
6,332,033, 6,393,148, 6,404,900, 6,407,777, 6,421,468, 6,438,264,
6,456,732, 6,459,436, 6,473,199, 6,501,857, 6,504,942, 6,504,951,
6,516,154, and 6,526,161;
[0064] United States published patent applications no.
2005/0041121, 2004/0114796, 2004/0240747, 2004/0184670,
2003/0071908, 2003/0052991, 2003/0044070, 2003/0025812,
2002/0172419, 2002/0136450, 2002/0114535, 2002/0105662, and
2001/0031142;
[0065] Published PCT applications no. WO 03/071484 and WO
02/045003;
[0066] European patent application no. EP 1 429 290 A;
[0067] Japanese patent application no. JP5260360A2;
[0068] British patent application no. GB0031423.7;
[0069] Yang et al., IEEE Transactions on Pattern Analysis and
Machine Intelligence, Vol. 24, no. 1, pp 34-58 (Jan. 2002);
[0070] Baluja & Rowley, "Neural Network-Based Face Detection,"
IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 20, No. 1, pages 23-28, January 1998; and
[0071] Joffe, S. Ed, Institute of Electrical and Electronics
Engineers, Red Eye Detection with Machine Learning, Proceedings
2003 International Conference of Image Processing. ICIP-2003.
Barcelona, Spain, Sep. 14-17, 2003, New York, N.Y.: IEEE, US, vol.
2 or 3, 14 September 2003, pages 871-874.
ILLUSTRATIVE DEFINITIONS
[0072] "Face Detection" involves the art of isolating and detecting
faces in a digital image; Face Detection includes a process of
determining whether a human face is present in an input image, and
may include or is preferably used in combination with determining a
position and/or other features, properties, parameters or values of
parameters of the face within the input image;
[0073] "Image-enhancement" or "image correction" involves the art
of modifying a digital image to improve its quality; such
modifications may be "global" when applied to the entire image, or
"selective" when applied differently to different portions of the
image. Some main categories non-exhaustively include:
(i) Contrast Normalization and Image Sharpening.
[0074] (ii) Image Crop, Zoom and Rotate.
[0075] (iii) Image Color Adjustment and Tone Scaling.
[0076] (iv) Exposure Adjustment and Digital Fill Flash applied to a
Digital Image.
[0077] (v) Brightness Adjustment with Color Space Matching; and
Auto-Gamma determination with Image Enhancement.
[0078] (vi) Input/Output device characterizations to determine
Automatic/Batch Image Enhancements.
[0079] (vii) In-Camera Image Enhancement.
[0080] (viii) Face Based Image Enhancement.
[0081] "Auto-focusing" involves the ability to automatically detect
and bring a photographed object into the focus field;
[0082] "Fill Flash" involves a method of combining available light,
such as sunlight, with another light source such as a camera flash
unit in such a manner that the objects close to the camera, which
may be in the shadow, will get additional exposure using the flash
unit.
[0083] A "pixel" is a picture element or a basic unit of the
composition of a digital image or any of the small discrete
elements that together constitute an image;
[0084] "Digitally-Captured Image" includes an image that is
digitally located and held in a detector;
[0085] "Digitally-Acquired Image" includes an image that is
digitally recorded in a permanent file and/or preserved in a more
or less permanent digital form; and
[0086] "Digitally-Detected Image": an image comprising digitally
detected electromagnetic waves.
[0087] "Digital Rendering Device": A digital device that renders
digitally encoded information such as pixels onto a different device.
The most common rendering techniques include the conversion of digital
data into hard copy using printers, in particular laser printers,
ink jet printers or thermal printers, or into soft copy on devices
such as monitors, televisions, liquid crystal displays, LEDs,
OLEDs, etc.
[0088] "Simulated camera movement" is defined as follows: given an
image of a certain dimension (e.g., M×N), a window, which is a
partial image of smaller dimension than the original image, is
created out of the original image. By moving this window around the
image, a simulated camera movement is generated. The movement can
be horizontal, also referred to as "panning"; vertical, also
referred to as "tilt"; or orthogonal to the image plane, also
referred to as "zooming"; or a combination thereof. The simulated
camera movement may also include the gradual selection of a
non-rectangular window, e.g., in the shape of a trapezoid, or
changing rectangular dimensions, which can simulate the changes in
perspective caused by physical movement of the camera, also
referred to as "dolly". Thus, simulated camera movement can include
any geometrical distortion and may create a foreshortening effect
based on the location of the foreground and the background relative
to the camera.
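The rectangular case of simulated camera movement (panning, tilt and zooming) can be sketched as a sequence of crop windows interpolated between a start window and an end window. This is a minimal Python sketch under stated assumptions: images are lists of pixel rows, windows are hypothetical (x, y, w, h) tuples, and linear interpolation is chosen only for illustration. The non-rectangular (trapezoid or "dolly") case is not covered.

```python
def simulated_camera_path(img_w, img_h, start, end, steps):
    """Interpolate crop windows (x, y, w, h) from `start` to `end`.

    Changing x or y simulates panning or tilt; shrinking w and h simulates
    zooming in. Each window is clamped so it lies inside the original image.
    """
    windows = []
    for i in range(steps):
        t = i / (steps - 1) if steps > 1 else 0.0
        x, y, w, h = (round(a + (b - a) * t) for a, b in zip(start, end))
        w, h = min(w, img_w), min(h, img_h)
        x = max(0, min(x, img_w - w))
        y = max(0, min(y, img_h - h))
        windows.append((x, y, w, h))
    return windows


def crop(image, window):
    """Return the partial image covered by `window` (image is a list of rows)."""
    x, y, w, h = window
    return [row[x:x + w] for row in image[y:y + h]]


# Zoom from the full 640x480 frame into its central quarter in three frames.
path = simulated_camera_path(640, 480, (0, 0, 640, 480), (160, 120, 320, 240), 3)
print(path)  # [(0, 0, 640, 480), (80, 60, 480, 360), (160, 120, 320, 240)]
```

Rendering each window in turn, resampled to the output display size, produces the simulated movement; non-linear easing could replace the linear `t` without changing the structure.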
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0089] Several embodiments are described herein that use
information obtained from reference images for processing a main
image. That is, the data that are used to process the main image
come not solely from the image itself, but instead or also
from one or more separate "reference" images.
Reference Image
[0090] Reference images provide supplemental meta data, and in
particular supplemental visual data to an acquired image, or main
image. The reference image can be a single instance, or in general,
a collection of one or more images varying from each other. The
so-defined reference image(s) provides additional information that
may not be available as part of the main image.
[0091] An example of a spatial collection may be multiple sensors all
located in different positions relative to each other. An example of
a temporal collection can be a video stream.
[0092] The reference image differs from the main captured image,
and the multiple reference images differ from each other in various
potential manners which can be based on one or combination of
permutations in time (temporal), position (spatial), optical
characteristics, resolution, and spectral response, among other
parameters.
[0093] One example is temporal disparity. In this case, the
reference image is captured before and/or after the main captured
image, and preferably just before and/or just after the main image.
Examples may include preview video, a pre-exposed image, and a
post-exposed image. In certain embodiments, such reference image
uses the same optical system as the acquired image, while in other
embodiments it uses a wholly different optical system, or an optical
system that uses one or more different optical components such as a
lens, an optical detector and/or a program component.
[0094] Alternatively, a reference image may differ in the location
of secondary sensor or sensors, thus providing spatial disparity.
The images may be taken simultaneously or proximate to or in
temporal overlap with a main image. In this case, the reference
image may be captured using a separate sensor located away from the
main image sensor. The system may use a separate optical system, or
some splitting of a single optical system into a plurality of
sensors or a plurality of sub-pixels of a same sensor. As digital
optical systems become smaller, dual or multi-sensor capture devices
will become more ubiquitous. Some added registration and/or
calibration is typically involved when two optical systems are
used.
[0095] Alternatively, one or more reference images may also be
captured using different spectral responses and/or exposure
settings. One example includes an infrared sensor to supplement a
normal sensor or a sensor that is calibrated to enhance specific
ranges of the spectral response such as skin tone, highlights or
shadows.
[0096] Alternatively, one or more reference images may also be
captured using different capture parameters such as exposure time,
dynamic range, contrast, sharpness, color balance, white balance or
combinations thereof based on any image parameters the camera can
manipulate.
[0097] Alternatively, one or more reference images may also be
captured using a secondary optical system with a differing focal
length, depth of field, depth of focus, exit pupil, entry pupil,
aperture, or lens coating, or combinations thereof based on any
optical parameters of a designed lens.
[0098] Alternatively, one or more reference images may also capture
a portion of the final image in conjunction with other
differentials. Such examples may include capturing a reference image
that includes only the center of the final image, or capturing only
the region of faces from the final image. This allows saving
capture time and space while keeping as reference important
information that may be useful at a later stage.
[0099] Reference images may also be captured using varying
attributes as defined herein of nominally the same scene recorded
onto different parts of a same physical sensor. As an example, one
optical subsystem focuses the scene image onto a small area of the
sensor, while a second optical subsystem focuses the scene image,
e.g., the main image, onto a much larger area of the sensor. This
has the advantage that it involves only one sensor and one
post-processing section, although the two independently acquired
scene images will be processed separately, i.e., by accessing the
different parts of the sensor array. This approach has another
advantage, which is that a preview optical system may be configured
so it can change its focal point slightly, and during a capture
process, a sequence of preview images may be captured by moving an
optical focus to different parts of the sensor. Thus, multiple
preview images may be captured while a single main image is
captured. An advantageous application of this embodiment would be
motion analysis.
[0100] Getting data from a reference image in a preview or postview
process is akin to obtaining meta data rather than the
image-processing that is performed using the meta data. That is,
the data used for processing a main image, e.g., to enhance its
quality, is gathered from one or more preview or postview images,
while the primary source of image data is contained within the main
image itself. This preview or postview information can be useful as
clues for capturing and/or processing the main image, whether it is
desired to perform red-eye detection and correction, face tracking,
motion blur processing, dust artefact correction, illumination or
resolution enhancement, image quality determination,
foreground/background segmentation, and/or another image
enhancement processing technique. The reference image or images may
be saved as part of the image header for post processing in the
capture device, or alternatively after the data is transferred on
to an external computation device. In some cases, the reference
image may only be used if the post processing software determines
that there is missing data, damaged data or a need to replace
portions of the data.
[0101] In order to maintain storage and computation efficiency, the
reference image may also be saved as a differential of the final
image. Examples may include a differential compression or removal of
all portions that are identical or that can be extracted from the
final image.
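A minimal sketch of storing a reference image as a differential of the final image, assuming both are the same size and already registered: only the pixels that differ from the final image are kept, and the reference is rebuilt from the final image plus that sparse difference. The function names and the dictionary representation are hypothetical choices for illustration; an actual device would more likely use a differential compression scheme.

```python
def encode_reference(reference, final):
    """Keep only the reference pixels that differ from the final image.

    Returns a dict mapping (row, col) -> reference pixel value.
    """
    diff = {}
    for y, (ref_row, fin_row) in enumerate(zip(reference, final)):
        for x, (r, f) in enumerate(zip(ref_row, fin_row)):
            if r != f:
                diff[(y, x)] = r
    return diff


def decode_reference(diff, final):
    """Rebuild the reference image from the final image and the stored differences."""
    reference = [list(row) for row in final]
    for (y, x), value in diff.items():
        reference[y][x] = value
    return reference


final = [[1, 2, 3], [4, 5, 6]]
reference = [[1, 2, 3], [4, 9, 6]]  # differs only at row 1, column 1
stored = encode_reference(reference, final)
print(stored)  # {(1, 1): 9}
```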
Correcting Eye Defects
[0102] In one example involving red-eye correction, a face
detection process may first find faces, find eyes in a face, and
check if the pupils are red, and if red pupils are found, then the
red color pupils are corrected, e.g., by changing their color to
black. Another red-eye process may involve first finding red in a
digital image, checking whether the red pixels are contained in a
face, and checking whether the red pixels are in the pupil of an
eye. Depending on the quality of face detection available, one or
the other of these may be preferred. Either of these may be
performed using one or more preview or postview images, or
otherwise using a reference image, rather than or in combination
with, checking the main image itself. A red-eye filter may be based
on use of acquired preview, postview or other reference image or
images, and can determine whether a region may have been red prior
to applying a flash.
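The first red-eye process above (find a face, find the eyes, test whether the pupils are red, then darken them) can be illustrated with a minimal Python sketch. It assumes that a face and eye locator, not shown, has already supplied a pupil bounding box. The redness test, its thresholds and the `red_in_preview` flag are hypothetical illustrations of how a reference image taken without flash could veto a correction.

```python
def is_red(pixel, ratio=1.8, min_red=100):
    """Heuristic redness test on an (R, G, B) pixel; thresholds are illustrative."""
    r, g, b = pixel
    return r >= min_red and r > ratio * max(g, b, 1)


def correct_red_pupil(image, pupil_box, red_in_preview=False, min_fraction=0.3):
    """Darken red pixels inside a pupil box (x0, y0, x1, y1) of an RGB image.

    `red_in_preview` means a reference image captured without flash already
    showed red here, so the region is treated as a genuinely red object.
    Returns True when a correction was applied.
    """
    if red_in_preview:
        return False
    x0, y0, x1, y1 = pupil_box
    coords = [(y, x) for y in range(y0, y1) for x in range(x0, x1)]
    red = [(y, x) for (y, x) in coords if is_red(image[y][x])]
    if not coords or len(red) / len(coords) < min_fraction:
        return False
    for y, x in red:
        _, g, b = image[y][x]
        dark = min(g, b)  # replace the red cast with a dark neutral tone
        image[y][x] = (dark, dark, dark)
    return True
```

The second process described above (find red first, then check face and pupil membership) would use the same redness test but reverse the order of the checks.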
[0103] Another known problem involves involuntary blinking. In this
case, the post processing may determine that the subject's eyes
were closed or semi closed. If there exists a reference image that
was captured time-wise either a fraction of a second before or
after such blinking, the region of the eyes from the reference
image can replace the blinking eye portion of the final image.
[0104] In some cases as defined above, the camera may store as the
reference image only high resolution data of the Region of Interest
(ROI) that includes the eye locations to offer such retouching.
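Replacing a blinking eye region with the corresponding region from a reference image, as described in the two preceding paragraphs, reduces to pasting a stored patch at the eye location. The sketch below assumes that the reference ROI has already been registered and resampled to the main image's scale, and that blink detection happens elsewhere; `paste_roi` is a hypothetical name.

```python
def paste_roi(final, roi, x0, y0):
    """Return a copy of `final` with `roi` (a list of pixel rows) pasted at (x0, y0).

    The ROI is assumed to be registered to the final image and to fit inside it.
    """
    out = [list(row) for row in final]
    for dy, roi_row in enumerate(roi):
        out[y0 + dy][x0:x0 + len(roi_row)] = list(roi_row)
    return out


final = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
eyes_open = [[7, 7]]  # high-resolution ROI kept from a reference image
print(paste_roi(final, eyes_open, 1, 1))  # [[0, 0, 0], [0, 7, 7], [0, 0, 0]]
```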
Face Tools
[0105] Multiple reference images may be used, for example, in a
face detection process, e.g., a selected group of preview images
may be used. By having multiple images to choose from, the process
is more likely to find a suitable reference image to operate
with. In addition, a face tracking process generally utilizes two
or more images anyway, beginning with the detection of a face in at
least one of the images. This provides an enhanced sense of
confidence that the process provides accurate face detection and
location results.
[0106] Moreover, a perfect image of a face may be captured in a
reference image, while a main image may include an occluded profile
or some other less than optimal feature. By using the reference
image, the person whose profile is occluded may be identified and
even have her head rotated and unblocked using reference image data
before or after taking the picture. This can involve upsampling and
aligning a portion of the reference image, or just using
information as to color, shape, luminance, etc., determined from
the reference image. A correct exposure on a region of interest or
ROI may be extrapolated using the reference image. The reference
image may include a lower resolution or even subsampled resolution
version of the main image or another image of substantially a same
scene as the main image.
[0107] Meta data that is extracted from one or more reference
images may be advantageously used in processes involving face
detection, face tracking, red-eye, dust or other unwanted image
artefact detection and/or correction, or other image quality
assessment and/or enhancement process. In this way, meta data,
e.g., coordinates and/or other characteristics of detected faces,
may be derived from one or more reference images and used for main
image quality enhancement without actually looking for faces in the
main image.
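Reusing face coordinates found in a low-resolution preview for the main image, without searching the main image for faces, can be as simple as rescaling the bounding box. This sketch assumes that the preview and the main image share the same field of view and that the subject has not moved noticeably between them; otherwise a registration step would be needed. `scale_box` is a hypothetical helper.

```python
def scale_box(box, preview_size, main_size):
    """Map a face box (x, y, w, h) from preview pixels to main-image pixels.

    preview_size and main_size are (width, height) tuples.
    """
    sx = main_size[0] / preview_size[0]
    sy = main_size[1] / preview_size[1]
    x, y, w, h = box
    return (round(x * sx), round(y * sy), round(w * sx), round(h * sy))


# A face found at (40, 30, 20, 20) in a 320x240 preview of a 3200x2400 main image.
print(scale_box((40, 30, 20, 20), (320, 240), (3200, 2400)))  # (400, 300, 200, 200)
```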
[0108] A reference image may also be used to include multiple
emotions of a single subject into a single object. Such emotions
may be used to create more comprehensive data of the person, such
as smile, frown, wink, and/or blink. Alternatively, such data may
also be used to post process editing where the various emotions can
be cut-and-pasted to replace between the captured and the reference
image. An example may include switching from a smile to a
sincere look based on the same image.
[0109] Finally, the reference image may be used for creating a
three-dimensional representation of the image which can allow
rotating subjects or the creation of three dimensional
representations of the scene such as holographic imaging or
lenticular imaging.
Motion Correction
[0110] A reference image may include an image that differs from a
main image in that it may have been captured at a different time
before or after the main image. The reference image may have
spatial differences such as movements of a subject or other object
in a scene, and/or there may be a global movement of the camera
itself. In many cases the reference image may preferably have
lower resolution than the main image, thus saving valuable
processing time, bytes, bitrate and/or memory; however, there may be
applications wherein a higher resolution reference image is
useful, and reference images may also have the same resolution as
the main image. The reference image may differ from the main image
in a planar sense, e.g., the reference image can be infrared or
grayscale, or use a two bit per color scheme, while the main image
may be a full color image. Other parameters may differ, such as
illumination, while generally the reference image, to be useful,
would typically have some common overlap with the main image, e.g.,
the reference image may be of at least a similar scene as the main
image, and/or may be captured at least somewhat closely in time
with the main image.
[0111] Some cameras (e.g., the Kodak V570, see
http://www.dcviews.com/_kodak/v570.htm) have a pair of CCDs, which
may have been designed to solve the problem of having a single zoom
lens. A reference image can be captured at one CCD while the main
image is being simultaneously captured with the second CCD, or two
portions of a same CCD may be used for this purpose. In this case,
the reference image is neither a preview nor a postview image, yet
the reference image is a different image than the main image, and
has some temporal or spatial overlap, connection or proximity with
the main image. A same or different optical system may be used,
e.g., lens, aperture, shutter, etc., while again this would
typically involve some additional calibration. Such dual mode
system may include an IR sensor, enhanced dynamic range, and/or
special filters that may assist in various algorithms or
processes.
[0112] In the context of blurring processes, i.e., either removing
camera motion blur or adding blur to background sections of images,
a blurred image may be used in combination with a non-blurred image
to produce a final image having a non-blurred foreground and a
blurred background. Both images may be deemed reference images
which are each partly used to form a main final image, or one may
be deemed a reference image having a portion combined into a main
image. If two sensors are used, one could save a blurred image at
the same time that the other takes a sharp image, while if only a
single sensor is used, then the same sensor could take a blurred
image followed by taking a sharp image, or vice-versa. A map of
systematic dust artefact regions may be acquired using one or more
reference images.
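Forming a final image with a sharp foreground and a blurred background from two such captures can be sketched as a per-pixel blend driven by a foreground mask. The sketch assumes grayscale images of equal size that are already registered, and a mask obtained separately (for example from a foreground/background segmentation). Fractional mask values are an illustrative way to soften the boundary, not something the description specifies.

```python
def composite(sharp, blurred, alpha):
    """Blend two registered grayscale images pixel by pixel.

    alpha[y][x] is 1.0 for foreground (taken from `sharp`), 0.0 for background
    (taken from `blurred`); intermediate values feather the transition.
    """
    return [
        [a * s + (1.0 - a) * b for s, b, a in zip(s_row, b_row, a_row)]
        for s_row, b_row, a_row in zip(sharp, blurred, alpha)
    ]


sharp = [[100, 100, 100]]
blurred = [[20, 20, 20]]
alpha = [[1.0, 0.5, 0.0]]
print(composite(sharp, blurred, alpha))  # [[100.0, 60.0, 20.0]]
```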
[0113] Reference images may also be used to disqualify or
supplement images which have unsatisfactory features such as
faces with blinks, occlusions, or frowns.
Foreground/Background Processing
[0114] A method is provided for distinguishing between foreground
and background regions of a digital image of a scene. The method
includes capturing first and second images of nominally the same
scene and storing the captured images in DCT-coded format. These
images may include a main image and a reference image, and/or
simply first and second images either of which images may comprise
the main image. The first image may be taken with the foreground
more in focus than the background, while the second image may be
taken with the background more in focus than the foreground.
Regions of the first image may be assigned as foreground or
background according to whether the sum of selected high order DCT
coefficients decreases or increases for equivalent regions of the
second image. In accordance with the assigning, one or more
processed images based on the first image or the second image, or
both, are rendered at a digital rendering device, display or
printer, or combinations thereof.
[0115] This method lends itself to efficient in-camera
implementation due to the relatively less-complex nature of
calculations utilized to perform the task.
[0116] In the present context, respective regions of two images of
nominally the same scene are said to be equivalent if, in the case
where the two images have the same resolution, the two regions
correspond to substantially the same part of the scene, or if, in
the case where one image has a greater resolution than the other
image, the part of the scene corresponding to the region of the
higher resolution image is substantially wholly contained within the
part of the scene corresponding to the region of the lower
resolution image. Preferably, the two images are brought to the same
resolution by sub-sampling the higher resolution image or
upsampling the lower resolution image, or a combination thereof.
The two images are preferably also aligned, sized or otherwise
processed so that they overlap with respect to whatever parameters
are relevant for matching.
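Bringing the two images to the same resolution by sub-sampling the higher resolution one can be sketched with block averaging. This assumes an integer resolution ratio between the images and grayscale data stored as lists of rows; any trailing rows or columns that do not fill a whole block are dropped.

```python
def block_average(image, factor):
    """Sub-sample a grayscale image by an integer factor by averaging
    factor x factor blocks; incomplete trailing blocks are discarded."""
    out_h = len(image) // factor
    out_w = len(image[0]) // factor
    out = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            total = sum(
                image[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            )
            row.append(total / (factor * factor))
        out.append(row)
    return out


high_res = [
    [1, 1, 3, 3],
    [1, 1, 3, 3],
    [5, 5, 7, 7],
    [5, 5, 7, 7],
]
print(block_average(high_res, 2))  # [[1.0, 3.0], [5.0, 7.0]]
```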
[0117] Even after subsampling, upsampling and/or alignment, the two
images may not be identical to each other due to slight camera
movement or movement of subjects and/or objects within the scene.
An additional stage of registering the two images may be
utilized.
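One simple way to perform the additional registration stage, shown here only as an illustration, is an exhaustive search over small integer translations that minimises the mean absolute difference over the overlapping area. The sketch assumes equal-size grayscale images and purely translational motion; rotation, scale changes and subject motion would need a more capable method.

```python
def estimate_shift(ref, img, max_shift=2):
    """Return (dx, dy) such that img[y + dy][x + dx] best matches ref[y][x].

    Both images are equal-size grayscale lists of rows. The score is the mean
    absolute difference over the pixels that overlap for each candidate shift.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total = count = 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    total += abs(ref[y][x] - img[y + dy][x + dx])
                    count += 1
            if count and (best is None or total / count < best[0]):
                best = (total / count, dx, dy)
    return best[1], best[2]


ref = [[10 * y + x + 1 for x in range(6)] for y in range(6)]
# Content moved 2 pixels right and 1 pixel down; uncovered pixels are 0.
moved = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(6)]
         for y in range(6)]
print(estimate_shift(ref, moved))  # (2, 1)
```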
[0118] Where the first and second images are captured by a digital
camera, the first image may be a relatively high resolution image,
and the second image may be a relatively low resolution pre- or
post-view version of the first image.
While the image is captured by a digital camera, the processing may
be done in the camera as post processing, or externally in a
separate device such as a personal computer or a server computer.
In such case, both images can be stored. In the former embodiment,
two DCT-coded images can be stored in volatile memory in the camera
for as long as they are being used for foreground/background
segmentation and final image production. In the latter embodiment,
both images may be preferably stored in non-volatile memory. In the
case of lower resolution pre- or post-view images, the lower
resolution image may be stored as part of the file header of the
higher resolution image.
[0119] In some cases only selected regions of the image are stored
as two separated regions. Such cases include foreground regions
that may surround faces in the picture. In one embodiment, if it is
known that the images contain a face, as determined, for example,
by a face detection algorithm, processing can be performed just on
the region including and surrounding the face to increase the
accuracy of delimiting the face from the background.
[0120] The inherent frequency information of DCT blocks is used by
taking the sum of the high order DCT coefficients of a DCT
block as an indicator of whether the block is in focus or not. Blocks
whose high order frequency coefficients drop when the main subject
moves out of focus are taken to be foreground with the remaining
blocks representing background or border areas. Since the image
acquisition and storage process in a digital camera typically codes
captured images in DCT format as an intermediate step of the
process, the method can be implemented in such cameras without
substantial additional processing.
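The foreground/background assignment described in the preceding paragraphs can be sketched in pure Python. In a camera the DCT coefficients would come from the JPEG coding pipeline; here, as an assumption for a self-contained example, an orthonormal 8x8 DCT-II is computed directly from pixel blocks. The choice of "high order" coefficients (those with u + v >= 4), the use of absolute values, and the treatment of ties as background are illustrative parameters, not values taken from the description. The first image is assumed to have the foreground in focus and the second the background in focus, both registered and with dimensions that are multiples of 8.

```python
import math

N = 8  # DCT block size, as in JPEG


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (u: vertical, v: horizontal)."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = c(u) * c(v) * s
    return coeffs


def high_order_sum(block, cutoff=4):
    """Sum of |coefficient| over the high-order DCT terms (u + v >= cutoff)."""
    coeffs = dct_2d(block)
    return sum(abs(coeffs[u][v]) for u in range(N) for v in range(N)
               if u + v >= cutoff)


def classify_blocks(first, second, cutoff=4):
    """Label each N x N block 'fg' or 'bg'.

    `first` has the foreground in focus, `second` the background in focus.
    A block whose high-order sum decreases from first to second lost detail
    when focus moved away from it, so it is taken to be foreground.
    """
    labels = []
    for by in range(0, len(first), N):
        row = []
        for bx in range(0, len(first[0]), N):
            b1 = [r[bx:bx + N] for r in first[by:by + N]]
            b2 = [r[bx:bx + N] for r in second[by:by + N]]
            e1, e2 = high_order_sum(b1, cutoff), high_order_sum(b2, cutoff)
            row.append("fg" if e2 < e1 else "bg")
        labels.append(row)
    return labels


def checker(x, y):
    """Hypothetical test pattern: a sharp 0/255 checkerboard."""
    return 255 if (x + y) % 2 else 0


# Left block: sharp in the first image, flat (defocused) in the second.
# Right block: flat in the first image, sharp in the second.
first = [[checker(x, y) if x < N else 128 for x in range(2 * N)] for y in range(N)]
second = [[128 if x < N else checker(x, y) for x in range(2 * N)] for y in range(N)]
print(classify_blocks(first, second))  # [['fg', 'bg']]
```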
[0121] This technique is useful in cases where differentiation
created by camera flash, as described in U.S. application Ser. No.
11/217,788, published as 2006/0039690, incorporated by reference
(see also U.S. Ser. No. 11/421,027) may not be sufficient. The two
techniques may also be advantageously combined to supplement one
another.
[0122] Methods are provided that lend themselves to efficient
in-camera implementation due to the computationally less rigorous
nature of calculations used in performing the task in accordance
with embodiments described herein.
[0123] A method is also provided for determining an orientation of
an image relative to a digital image acquisition device based on a
foreground/background analysis of two or more images of a
scene.
[0124] Further embodiments are described below including methods
and devices for providing or suggesting options for automatic
digital image enhancements based on information relating to the
location, position, focus, exposure or other parameter or values of
parameters of regions of interest, and in particular faces, in an
image. Such parameters or values of parameters may include a spatial
parameter.
[0125] A still image may be animated and used in a slide show by
simulated camera movement, e.g., zooming, panning and/or rotating
where the center point of an image is within a face or at least the
face is included in all or substantially all of the images in the
slide show.
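A slide-show zoom that keeps a detected face in every frame can be sketched by centring each window on the face and clamping it to the image. The face box, the linear zoom schedule and the `final_scale` parameter are assumptions for illustration; face detection itself is not shown.

```python
def zoom_on_face(img_w, img_h, face, steps, final_scale=0.4):
    """Windows (x, y, w, h) zooming from the full frame toward a face (x, y, w, h).

    Each window is centred on the face centre, never smaller than the face,
    and shifted as needed to stay inside the image, so the face remains
    included in every frame.
    """
    fx, fy, fw, fh = face
    cx, cy = fx + fw / 2, fy + fh / 2
    windows = []
    for i in range(steps):
        t = i / (steps - 1) if steps > 1 else 1.0
        scale = 1.0 + (final_scale - 1.0) * t
        w = min(img_w, max(fw, round(img_w * scale)))
        h = min(img_h, max(fh, round(img_h * scale)))
        x = min(max(0, round(cx - w / 2)), img_w - w)
        y = min(max(0, round(cy - h / 2)), img_h - h)
        windows.append((x, y, w, h))
    return windows


frames = zoom_on_face(1000, 800, (600, 300, 100, 100), 3)
print(frames[0], frames[-1])  # (0, 0, 1000, 800) (450, 190, 400, 320)
```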
[0126] A preferred embodiment includes an image processing
application whether implemented in software or in firmware, as part
of the image capture process, image rendering process, or as part
of post processing. This system receives images in digital form,
where the images can be translated into a grid representation
including multiple pixels. This application detects and isolates
the faces from the rest of the picture, and determines sizes and
locations of the faces relative to other portions of the image or
the entire image. Orientations of the faces may also be determined.
Based on information regarding detected faces, preferably separate
modules of the system collect facial data and perform image
enhancement operations based on the collected facial data. Such
enhancements or corrections include automatic orientation of the
image, color correction and enhancement, digital fill flash
simulation and dynamic compression.
[0127] Advantages of the preferred embodiments include the ability
to automatically perform or suggest or assist in performing complex
tasks that may otherwise call for manual intervention and/or
experimenting. Another advantage is that important regions, e.g.,
faces, of an image may be assigned, marked and/or mapped and then
processing may be automatically performed and/or suggested based on
this information relating to important regions of the images.
Automatic assistance may be provided to a photographer in the post
processing stage. Assistance may be provided to the photographer in
determining a focus and an exposure while taking a picture.
Meta-data may be generated in the camera that would allow an image
to be enhanced based on the face information.
[0128] Many advantageous techniques are provided in accordance with
preferred and alternative embodiments set forth herein. For
example, a method of processing a digital image using face
detection within said image to achieve one or more desired image
processing parameters is provided. A group of pixels is identified
that correspond to an image of a face within the digital image.
Default values are determined of one or more parameters of at least
some portion of said digital image. Values of the one or more
parameters are adjusted within the digitally-detected image based
upon an analysis of said digital image including said image of said
face and said default values.
[0129] The digital image may be digitally-acquired and/or may be
digitally-captured. Decisions for processing the digital image
based on said face detection, selecting one or more parameters
and/or for adjusting values of one or more parameters within the
digital image may be automatically, semi-automatically or manually
performed. Similarly, on the other end of the image processing
workflow, the digital image may be rendered from its binary display
onto a print or an electronic display.
[0130] One or more different degrees of simulated fill flash may be
created by manual, semi-automatic or automatic adjustment. The
analysis of the image of the face may include a comparison of an
overall exposure to an exposure around the identified face. The
exposure may be calculated based on a histogram. Digital
simulation of a fill flash may include optionally adjusting tone
reproduction and/or locally adjusting sharpness. One or more
objects estimated to be closer to the camera or of higher
importance may be operated on in the simulated fill-flash. These
objects determined to be closer to the camera or of higher
importance may include one or more identified faces. A fill flash
or an option for providing a suggested fill-flash may be
automatically provided. The method may be performed within a
digital acquisition device, a digital rendering device, or an
external device or a combination thereof.
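The comparison of overall exposure with the exposure around the face, followed by a simulated fill flash on the face region, can be sketched as follows. This is an illustration under assumptions: a single luminance channel with 8-bit values, exposure summarised as the mean of a 256-bin histogram, a uniform gain applied only inside a hypothetical face box, and `target_ratio` and `max_gain` as made-up tuning parameters. It does not model tone reproduction, sharpness or feathered region edges.

```python
def histogram_mean(values, bins=256):
    """Mean exposure computed from a histogram of 8-bit luminance values."""
    hist = [0] * bins
    for v in values:
        hist[min(bins - 1, max(0, int(v)))] += 1
    total = sum(hist)
    return sum(level * n for level, n in enumerate(hist)) / total if total else 0.0


def simulated_fill_flash(lum, face_box, target_ratio=1.0, max_gain=2.0):
    """Brighten an under-exposed face relative to the overall exposure.

    lum: list of rows of luminance values; face_box: (x0, y0, x1, y1).
    Returns (new_image, gain); gain == 1.0 means no change was needed.
    """
    x0, y0, x1, y1 = face_box
    overall = histogram_mean([v for row in lum for v in row])
    face = histogram_mean([lum[y][x] for y in range(y0, y1) for x in range(x0, x1)])
    out = [list(row) for row in lum]
    if face <= 0 or face >= target_ratio * overall:
        return out, 1.0
    gain = min(max_gain, target_ratio * overall / face)
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = min(255, round(out[y][x] * gain))
    return out, gain


# A dark 2x2 face (50) in a bright 4x4 scene (200).
scene = [[50, 50, 200, 200], [50, 50, 200, 200], [200] * 4, [200] * 4]
lit, gain = simulated_fill_flash(scene, (0, 0, 2, 2))
print(gain, lit[0][:3])  # 2.0 [100, 100, 200]
```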
[0131] The face pixels may be identified, a false indication of
another face within the image may be removed, and an indication of
a face may be added within the image, each manually by a user, or
semi-automatically or automatically using image processing
apparatus. Identifying the face pixels may be automatically
performed by an image processing apparatus, and a manual
verification of a correct detection of at least one face within the
image may be provided.
[0132] A method of digital image processing using face detection to
achieve a desired image parameter is further provided including
identifying a group of pixels that correspond to an image of a face
within a digitally-detected image. Initial values of one or more
parameters of at least some of the pixels are determined. An
initial parameter of the digitally-detected image is determined
based on the initial values. Values of the one or more parameters
of pixels within the digitally-detected image are automatically
adjusted based upon a comparison of the initial parameter with the
desired parameter or an option for adjusting the values is
automatically provided.
[0133] The digitally-detected image may include a
digitally-acquired, rendered and/or digitally-captured image. The
initial parameter of the digitally-detected image may include an
initial parameter of the face image. The one or more parameters may
include any of orientation, color, tone, size, luminance, and
focus. The method may be performed within a digital camera as part
of a pre-acquisition stage, within a camera as part of post
processing of the captured image or within external processing
equipment. The method may be performed within a digital rendering
device such as a printer, or as a preparation for sending an image
to an output device, such as in the print driver, which may be
located in the printer or on an external device such as the PC, as
part of a preparation stage prior to displaying or printing the
image. An option to manually remove a false indication of a face or
to add an indication of a face within the image may be included. An
option to manually override the automated suggestion of the
system, whether or not faces were detected, may be included.
[0134] The method may include identifying one or more sub-groups of
pixels that correspond to one or more facial features of the face.
Initial values of one or more parameters of pixels of the one or
more sub-groups of pixels may be determined. An initial spatial
parameter of the face within the digital image may be determined
based on the initial values. The initial spatial parameter may
include any of orientation, size and location.
[0135] When the spatial parameter is orientation, values of one or
more parameters of pixels may be adjusted for re-orienting the
image to an adjusted orientation. The one or more facial features
may include one or more of an eye, a mouth, two eyes, a nose, an
ear, neck, shoulders and/or other facial or personal features, or
other features associated with a person such as an article of
clothing, furniture, transportation, outdoor environment (e.g.,
horizon, trees, water, etc.) or indoor environment (doorways,
hallways, ceilings, floors, walls, etc.), wherein such features may
be indicative of an orientation. The one or more facial or other
features may include two or more features, and the initial
orientation may be determined based on relative positions of the
features that are determined based on the initial values. A shape
such as a triangle may be generated, for example, between the two
eyes and the center of the mouth, a golden rectangle as described
above, or, more generically, a polygon having points corresponding
to preferably three or more features as vertices or axes.
[0136] Initial values of one or more chromatic parameters, such as
color and tone, of pixels of the digital image may be determined.
The values of one or more parameters may be automatically adjusted
or an option to adjust the values to suggested values may be
provided.
[0137] The method may be performed within any digital image capture
device, such as, but not limited to, a digital still camera or
digital video camera. The one or more parameters may include
overall exposure, relative exposure, orientation, color balance,
white point, tone reproduction, size, or focus, or combinations
thereof. Identifying the face pixels may be automatically performed
by an image processing apparatus, and the method may include
manually removing one or more of the groups of pixels that
correspond to an image of a face. An automatically detected face
may be removed in response to false detection of regions as faces,
or in response to a determination to concentrate on fewer faces,
or on faces that were manually determined to be of higher
subjective significance than faces identified in the identifying
step. A face may be removed by increasing a sensitivity level of
said face identifying step. The face removal may be performed by an
interactive visual method, and may use a built-in display of the
image acquisition device.
[0138] Identifying the face pixels may be performed with an image
processing apparatus, and may include manually adding an indication
of another face within the image. The image processing apparatus
may receive a relative value as to a detection assurance or an
estimated importance of the detected regions. The relative value
may be manually modified as to the estimated importance of the
detected regions.
[0139] Within a digital camera, a method of digital image
processing using face detection for achieving a desired image
parameter is further provided. A group of pixels is identified that
correspond to a face within a digital image. First initial values
of a parameter of pixels of the group of pixels are determined, and
second initial values of a parameter of pixels other than pixels of
the group of pixels are also determined. The first and second
initial values are compared. Adjusted values of the parameter are
determined based on the comparing of the first and second initial
values and on a comparison of the parameter corresponding to at
least one of the first and second initial values and the desired
image parameter.
[0140] Initial values of luminance of pixels of the group of pixels
corresponding to the face may be determined. Other initial values
of luminance of pixels other than the pixels corresponding to the
face may also be determined. The values may then be compared, and
properties of aperture, shutter, sensitivity and a fill flash may
be determined for providing adjusted values corresponding to at
least some of the initial values for generating an adjusted digital
image. The pixels corresponding to the face may be determined
according to sub-groups corresponding to one or more facial
features.
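The comparison of first (face) and second (non-face) initial values can be sketched as below. This is a minimal illustration, not the application's method: it assumes linear luminance values in a 2-D list, a boolean face mask of the same size, and a hypothetical target face level of 118. Results are expressed in photographic stops, a unit that maps directly onto trade-offs between aperture, shutter time, and sensitivity; gamma-encoded 8-bit data would need to be linearized first.

```python
import math


def face_and_background_means(lum, face_mask):
    """Mean luminance of face pixels (first values) and of all other pixels (second values)."""
    face, rest = [], []
    for lum_row, mask_row in zip(lum, face_mask):
        for value, is_face in zip(lum_row, mask_row):
            (face if is_face else rest).append(value)
    if not face or not rest:
        raise ValueError("mask must contain both face and non-face pixels")
    return sum(face) / len(face), sum(rest) / len(rest)


def exposure_comparison(lum, face_mask, target_face=118.0):
    """Return (face-vs-background difference, shift needed to reach target_face), in stops.

    target_face is a hypothetical desired mean face luminance (linear, 0-255 range).
    """
    face_mean, rest_mean = face_and_background_means(lum, face_mask)
    face_mean, rest_mean = max(face_mean, 1.0), max(rest_mean, 1.0)  # avoid log of zero
    relative_ev = math.log2(face_mean / rest_mean)  # negative: face darker than surroundings
    shift_ev = math.log2(target_face / face_mean)   # positive: more exposure needed
    return relative_ev, shift_ev


# The face occupies the bottom row and is two stops darker than the background.
relative_ev, shift_ev = exposure_comparison([[200, 200], [50, 50]],
                                            [[False, False], [True, True]])
```

How the stop values are then split among aperture, shutter, sensitivity, and fill flash is left open here, as it is in the text.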
[0141] A method of generating one or more new digital images using
an original digitally-acquired image including a face is further
provided. A group of pixels that correspond to a face within the
original digitally-acquired image is identified. A portion of the
original image is selected to include the group of pixels. Values
of pixels of one or more new images based on the selected portion
are automatically generated, or an option to generate them is
provided, in a manner which always includes the face within the one
or more new images.
[0142] A transformation may be gradually displayed between the
original digitally-acquired image and one or more new images.
Parameters of said transformation may be adjusted between the
original digitally-acquired image and one or more new images.
Parameters of the transformation between the original
digitally-acquired image, e.g., including a face, and one or more
new images may be selected from a set of one or more
criteria, including timing or blending or a combination thereof. The
blending may include dissolving, flying, swirling, appearing,
flashing, or screening, or combinations thereof.
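A dissolve, one of the blending options listed above, can be sketched as a linear cross-fade in which the number of frames stands in for the timing parameter. This is a minimal sketch assuming equal-sized grayscale images stored as 2-D lists; `dissolve_frames` is a hypothetical name.

```python
def dissolve_frames(original, new_image, n_frames):
    """Return n_frames images that cross-fade linearly from original to new_image.

    Both images are equal-sized 2-D lists of numeric pixel values. The first frame
    equals original and the last equals new_image; n_frames sets the timing.
    """
    if n_frames < 2:
        raise ValueError("a dissolve needs at least two frames")
    frames = []
    for i in range(n_frames):
        alpha = i / (n_frames - 1)  # blend weight rises from 0.0 to 1.0
        frames.append([[(1 - alpha) * a + alpha * b for a, b in zip(row_a, row_b)]
                       for row_a, row_b in zip(original, new_image)])
    return frames


frames = dissolve_frames([[0, 100]], [[100, 0]], n_frames=3)
# frames[1] is the halfway blend [[50.0, 50.0]]
```

Other transitions such as flying or swirling would replace the per-pixel blend with a geometric warp, but the same frame-by-frame timing structure applies.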
[0143] Methods of generating slide shows that use an image
including a face are provided in accordance with the generation of
one or more new images. A group of pixels is identified that
correspond to a face within a digitally-acquired image. A zoom
portion of the image including the group of pixels may be
determined. The image may be automatically zoomed to generate a
zoomed image including the face enlarged by the zooming, or an
option to generate the zoomed image may be provided. A center point
of zooming in or out and an amount of zooming in or out may be
determined after which another image may be automatically generated
including a zoomed version of the face, or an option to generate
the image including the zoomed version of the face may be provided.
One or more new images may be generated, each including a new group
of pixels corresponding to the face, and automatic panning may be
provided using the one or more new images.
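The zoom step can be illustrated by a crop window centered on the detected face and enlarged by a margin, where the zoom amount is the ratio of the frame width to the window width. A minimal sketch, assuming the face is given as an (x, y, w, h) box in pixels; `face_zoom_window`, `zoom_steps`, and the 50% margin are hypothetical choices, not parameters from the application.

```python
def face_zoom_window(face_box, image_w, image_h, margin=0.5):
    """Crop window (x, y, w, h) around a face box, keeping the image aspect ratio.

    face_box is (x, y, w, h); margin (hypothetical) is the fraction of the face
    size added on each side. The window is clamped to the image bounds.
    """
    fx, fy, fw, fh = face_box
    cx, cy = fx + fw / 2, fy + fh / 2          # center point of zooming
    aspect = image_w / image_h
    w, h = fw * (1 + 2 * margin), fh * (1 + 2 * margin)
    if w / h < aspect:                          # grow one side to match the aspect ratio
        w = h * aspect
    else:
        h = w / aspect
    w, h = min(w, image_w), min(h, image_h)     # cannot zoom out past the full frame
    x = min(max(cx - w / 2, 0.0), image_w - w)  # shift the window back inside the image
    y = min(max(cy - h / 2, 0.0), image_h - h)
    return x, y, w, h


def zoom_steps(image_w, image_h, window, n_steps):
    """Windows interpolated from the full frame to `window` (a simple zoom-in)."""
    if n_steps < 2:
        raise ValueError("a zoom needs at least two steps")
    full = (0.0, 0.0, float(image_w), float(image_h))
    return [tuple(a + (b - a) * i / (n_steps - 1) for a, b in zip(full, window))
            for i in range(n_steps)]


x, y, w, h = face_zoom_window((400, 300, 100, 100), 1600, 1200)
zoom_factor = 1600 / w  # about 6x
steps = zoom_steps(1600, 1200, (x, y, w, h), 5)
```

Reversing the list of windows gives a zoom-out; each window would then be resampled to the output size to produce a new image.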
[0144] A method of generating one or more new digital images using
an original digitally-acquired image including a face is further
provided. One or more groups of pixels may be identified that
correspond to two or more faces within the original
digitally-acquired image. A portion of the original image may be
selected to include the one or more groups of pixels. Values of pixels
of one or more new images may be automatically generated based on the
selected portion in a manner which always includes at least one of
the two or more faces within the one or more new images, or a
panning intermediate image between two of said two or more
identified faces, or a combination thereof.
[0145] Panning may be performed between the two or more identified
faces. The panning may be from a first face to a second face of the
two or more identified faces, and the second face may then be
zoomed. The first face may be de-zoomed prior to panning to the
second face. The second face may also be zoomed. The panning may
include identifying a panning direction parameter between two of
the identified faces. The panning may include sequencing along the
identified panning direction between the two identified faces
according to the identified panning direction parameter.
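A pan from a first face to a second, with an optional de-zoom in between, can be sketched as a sequence of crop windows whose centers move along the line joining the two faces. This sketch assumes each face already has a crop window (x, y, w, h); `pan_sequence` and its `dezoom` parameter are hypothetical, and clamping the windows to the image bounds is omitted.

```python
import math


def pan_sequence(win_a, win_b, n_steps, dezoom=1.0):
    """Crop windows (x, y, w, h) that pan from face window win_a to face window win_b.

    The window center moves along the straight line between the two faces. With
    dezoom > 1 (hypothetical parameter) the window widens mid-pan, so the first face
    is de-zoomed before the second face is zoomed in.
    Returns (panning direction in degrees, list of windows).
    """
    if n_steps < 2:
        raise ValueError("a pan needs at least two steps")
    ax, ay = win_a[0] + win_a[2] / 2, win_a[1] + win_a[3] / 2
    bx, by = win_b[0] + win_b[2] / 2, win_b[1] + win_b[3] / 2
    direction = math.degrees(math.atan2(by - ay, bx - ax))  # panning direction parameter
    windows = []
    for i in range(n_steps):
        t = i / (n_steps - 1)
        cx, cy = ax + (bx - ax) * t, ay + (by - ay) * t      # sequence along the direction
        scale = 1 + (dezoom - 1) * math.sin(math.pi * t)      # 1 at both ends, dezoom mid-pan
        w = (win_a[2] + (win_b[2] - win_a[2]) * t) * scale
        h = (win_a[3] + (win_b[3] - win_a[3]) * t) * scale
        windows.append((cx - w / 2, cy - h / 2, w, h))
    return direction, windows


direction, windows = pan_sequence((0, 0, 10, 10), (100, 0, 10, 10), n_steps=3, dezoom=2.0)
```

The sine profile is just one smooth choice for the de-zoom; any curve that equals 1 at both ends would serve.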
[0146] A method of digital image processing using face detection
for achieving a desired spatial parameter is further provided
including identifying a group of pixels that correspond to a face
within a digital image, identifying one or more sub-groups of
pixels that correspond to one or more facial features of the face,
determining initial values of one or more parameters of pixels of
the one or more sub-groups of pixels, determining an initial
spatial parameter of the face within the digital image based on the
initial values, and determining adjusted values of pixels within
the digital image for adjusting the image based on a comparison of
the initial and desired spatial parameters.
[0147] The initial spatial parameter may include orientation. The
values of the pixels may be automatically adjusted within the
digital image to adjust the initial spatial parameter approximately
to the desired spatial parameter. An option may be automatically
provided for adjusting the values of the pixels within the digital
image to adjust the initial spatial parameter to the desired
spatial parameter.
[0148] A method of digital image processing using face detection is
also provided wherein a first group of pixels that correspond to a
face within a digital image is identified, and a second group of
pixels that correspond to another feature within the digital image
is identified. A re-compositioned image is determined including a
new group of pixels for at least one of the face and the other
feature. The other feature may include a second face. The
re-compositioned image may be automatically generated or an option
to generate the re-compositioned image may be provided. Values of
one or more parameters of the first and second groups of pixels,
and relative-adjusted values, may be determined for generating the
re-compositioned image.
[0149] Each of the methods provided is preferably implemented
within software and/or firmware either in the camera, in the rendering
device, such as a printer or display, or with external processing
equipment. The software may also be downloaded into the camera or
image processing equipment. In this sense, one or more processor
readable storage devices having processor readable code embodied
thereon are provided. The processor readable code programs one or
more processors to perform any of the above or below described
methods.
[0150] FIG. 1a illustrates a preferred embodiment. An image is
opened by the application in block 102. The software then
determines whether faces are in the picture as described in block
106. If no faces are detected, the software ceases to operate on
the image and exits, 110.
[0151] Alternatively, the software may also offer a manual mode,
where the user, in block 116, may inform the software of the
existence of faces and manually mark them in block 118. The
manual selection may be activated automatically if no faces are
found, 116, or it may even be optionally activated after the
automatic stage to let the user, via some user interface, either
add more faces to the automatic selection, 112, or, 114, remove
regions that are mistakenly, 110, identified by the automatic process,
118, as faces. Additionally, the user may manually select an option
that invokes the process as defined in 106. This option is useful
for cases where the user may manually decide that the image can be
enhanced or corrected based on the detection of the faces. Various
ways that the faces may be marked, whether automatically or
manually, whether in the camera or by the applications, and whether
the command to seek the faces in the image is done manually or
automatically, are all included in preferred embodiments
herein.
[0152] In an alternative embodiment, the face detection software
may be activated inside the camera as part of the acquisition
process, as described in Block 104. This embodiment is further
depicted in FIG. 1b. In this scenario, the face detection portion
106 may be implemented differently to support real time or near
real time operation. Such implementation may include sub-sampling
of the image, and weighted sampling to reduce the number of pixels
on which the computations are performed.
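The reduction in the number of pixels handed to the detector can be pictured as follows: plain decimation keeps every n-th pixel, while block averaging, used here as one simple stand-in for weighted sampling, combines each n x n block into one value. The application does not specify the sampling scheme; this sketch assumes a grayscale image stored as a 2-D list, and the function names are hypothetical.

```python
def subsample(image, factor):
    """Keep every `factor`-th pixel in each direction (plain decimation)."""
    return [row[::factor] for row in image[::factor]]


def block_average(image, factor):
    """Average each factor x factor block, so every source pixel contributes equally.

    Trailing rows/columns that do not fill a whole block are dropped.
    """
    h, w = len(image), len(image[0])
    out = []
    for by in range(0, h - h % factor, factor):
        out_row = []
        for bx in range(0, w - w % factor, factor):
            block = [image[y][x] for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


img = [[2 * (x + y) for x in range(4)] for y in range(4)]
small = block_average(img, 2)  # 16 pixels -> 4 pixels for the detector
```

A factor of n cuts the work by roughly n squared, at the cost of missing faces smaller than a few blocks across.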
[0153] In an alternative embodiment, the face detection software
may be activated inside the rendering device as part of the output
process, as described in Block 103. This embodiment is further
depicted in FIG. 1c. In this scenario, the face detection portion
106 may be implemented either within the rendering device, or
within an external driver to such a device.
[0154] After the faces are tagged, or marked, whether manually as
defined in 106, or automatically, 118, the software is ready to
operate on the image based on the information generated by the
face-detection stage. The tools can be implemented as part of the
acquisition, as part of the post-processing, or both.
[0155] Block 120 describes panning and zooming into the faces. This
tool can be part of the acquisition process to help track the faces
and create a pleasant composition, or as a post processing stage
for either cropping an image or creating a slide show with the
image, which includes movement. This tool is further described in
FIG. 6.
[0156] Block 130 depicts the automatic orientation of the image, a
tool that can be implemented either in the camera as part of the
acquisition post-processing, or in host software. This tool is
further described in FIGS. 2a-2e.
[0157] Block 140 describes the way to color-correct the image based
on the skin tones of the faces. This tool can be part of the
automatic color transformations that occur in the camera when
converting the image from the RAW sensor data form into a known
representation, e.g., RGB, or later in the host, as part of image
enhancement software. The various image enhancement operations may
be global, affecting the entire image, such as rotation, and/or may
be selective based on local criteria. For example, in a selective
color or exposure correction as defined in block 140, a preferred
embodiment includes corrections done to the entire image, or only
to the face regions in a spatially masked operation, or to specific
exposure ranges, in a luminance-masked operation. Note also that
such masks may have varying strength, which corresponds to
varying degrees of applying a correction. This allows a local
enhancement to better blend into the image.
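A mask of varying strength can be applied by blending each pixel with its corrected value in proportion to the mask. This is a minimal sketch assuming grayscale 2-D lists and mask values in [0, 1]; the `brighten` correction and the function names are hypothetical examples, not the application's correction.

```python
def apply_masked_correction(image, mask, correct):
    """Blend each pixel with its corrected value: out = (1 - m) * p + m * correct(p).

    mask values m lie in [0, 1]; 0 leaves a pixel untouched, 1 applies the full
    correction, and intermediate values give a partial, feathered effect.
    """
    return [[(1 - m) * p + m * correct(p) for p, m in zip(p_row, m_row)]
            for p_row, m_row in zip(image, mask)]


def luminance_mask(image, low, high):
    """Mask selecting pixels whose luminance lies in [low, high] (a luminance-masked operation)."""
    return [[1.0 if low <= p <= high else 0.0 for p in row] for row in image]


def brighten(p):
    """Hypothetical exposure correction: +50%, clipped to the 8-bit range."""
    return min(p * 1.5, 255.0)


face_mask = [[0.0, 0.5, 1.0]]  # spatial mask with a feathered middle pixel
out = apply_masked_correction([[100, 100, 100]], face_mask, brighten)
# out == [[100.0, 125.0, 150.0]]
```

A spatial mask and a luminance mask can be combined by multiplying them pixel by pixel before blending.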
[0158] Block 150 describes the proposed composition such as
cropping and zooming of an image to create a more pleasing
composition. This tool, 150, is different from the one described in
block 120 where the faces are anchors for either tracking the
subject or providing camera movement based on the face
location.
[0159] Block 160 describes the digital-fill-flash simulation which
can be done in the camera or as a post processing stage. This tool
is further described in FIGS. 4a-4e. As an alternative to the digital
fill flash, this tool may also be an actual flash sensor to
determine if a fill flash is needed in the overall exposure, as
described in Block 170. In this case, after determining the overall
exposure of the image, if the detected faces in the image are in
the shadow, a fill flash will automatically be used. Note that the
exact power of the fill flash, which need not be the
maximum power of the flash, may be calculated based on the exposure
difference between the overall image and the faces. Such
calculation is well known to those skilled in the art and is
based on a tradeoff between aperture, exposure time, gain and flash
power.
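The fill-flash decision and power calculation can be sketched with a deliberately simplified model: the flash is triggered only when the faces are at least a threshold number of stops darker than the overall image, and its power is chosen so that the light it adds makes up the difference. This sketch assumes linear luminance values and a hypothetical calibration constant, `full_power_lum`, into which aperture, gain, and subject distance are folded; it ignores the flash's effect on the background.

```python
import math


def fill_flash_power(face_lum, overall_lum, full_power_lum, threshold_ev=1.0):
    """Fraction of full flash power (0.0-1.0) for a fill flash, or 0.0 if none is needed.

    face_lum, overall_lum: mean linear luminance of the faces and of the whole image
        under ambient light.
    full_power_lum: hypothetical calibration value, the linear luminance a full-power
        flash adds to the faces at the current aperture, gain and subject distance.
    threshold_ev: how many stops darker the faces must be before fill flash is used.
    """
    deficit_ev = math.log2(overall_lum / face_lum)
    if deficit_ev < threshold_ev:
        return 0.0                          # faces are not noticeably in shadow
    needed = overall_lum - face_lum         # light the flash must add to the faces
    return min(needed / full_power_lum, 1.0)


print(fill_flash_power(20.0, 80.0, 120.0))  # faces two stops under: half power, 0.5
```

In practice the trade-off named above would also let the camera shift aperture, exposure time, or gain instead of, or in addition to, flash power.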
[0160] This tool is further described in FIG. 4e. Block 180
describes the ability of the camera to focus on the faces. This can
be used as a pre-acquisition focusing tool in the camera, as
further illustrated in FIG. 7.
[0161] FIG. 1b describes a process of using face detection to
improve in-camera acquisition parameters, as mentioned in FIG. 1a,
block 106. In this scenario, a camera is
activated, 1000, for example by means of half pressing the shutter,
turning on the camera, etc. The camera then goes through the normal
pre-acquisition stage to determine, 1004, the correct acquisition
parameters such as aperture, shutter speed, flash power, gain,
color balance, white point, or focus. In addition, a default set of
image attributes, particularly related to potential faces in the
image, are loaded, 1002. Such attributes can be the overall color
balance, exposure, contrast, orientation etc.
[0162] An image is then digitally captured onto the sensor, 1010.
Such action may be continuously updated, and may or may not include
saving such captured image into permanent storage.
[0163] An image-detection process, preferably a face detection
process, is applied to the captured image to seek faces in the
image, 1020. If no faces are found, the process terminates, 1032.
Alternatively, or in addition to the automatic detection of 1030,
the user can manually select, 1034, detected faces, using some
interactive user interface mechanism, by utilizing, for example, a
camera display. Alternatively, the process can be implemented
without a visual user interface by changing the sensitivity or
threshold of the detection process.
[0164] When faces are detected, 1040, they are marked and labeled.
Detection as defined in 1040 may be more than a binary process of
selecting whether a face is detected or not. It may also be
designed as part of a process where each of the faces is given a
weight based on size of the faces, location within the frame, other
parameters described herein, etc., which define the importance of
the face in relation to other faces detected.
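A continuous importance weight per face, based on size and location within the frame, can be sketched as follows. The scoring formula, the 0.6/0.4 weights, and the `emphasis` multipliers (standing in for faces the user emphasizes) are hypothetical choices for illustration; faces are assumed to be (x, y, w, h) boxes in pixels.

```python
import math


def face_weights(faces, image_w, image_h, emphasis=None,
                 size_weight=0.6, center_weight=0.4):
    """Continuous importance for each face box (x, y, w, h), scaled so the top face is 1.0.

    Larger faces and faces nearer the frame center score higher. `emphasis` holds
    optional per-face multipliers, e.g. for faces the user has emphasized.
    The two weights are hypothetical tuning constants.
    """
    if emphasis is None:
        emphasis = [1.0] * len(faces)
    largest = max(w * h for _, _, w, h in faces)
    half_diag = math.hypot(image_w / 2, image_h / 2)
    scores = []
    for (x, y, w, h), boost in zip(faces, emphasis):
        size_term = (w * h) / largest                      # 1.0 for the largest face
        dist = math.hypot(x + w / 2 - image_w / 2, y + h / 2 - image_h / 2)
        center_term = 1 - dist / half_diag                 # 1.0 at center, 0.0 at a corner
        scores.append((size_weight * size_term + center_weight * center_term) * boost)
    top = max(scores)
    return [s / top for s in scores]


faces = [(450, 350, 100, 100), (0, 0, 50, 50)]
weights = face_weights(faces, 1000, 800)  # large central face dominates
boosted = face_weights(faces, 1000, 800, emphasis=[1.0, 10.0])
```

Such weights can then scale each face's contribution when its attributes are compared with the default values.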
[0165] Alternatively, or in addition, the user can manually
deselect regions, 1044, that were falsely detected as faces.
Such deselection may be needed because a face was falsely
detected, or because the photographer may wish to concentrate on one of
the faces as the main subject matter and not on other faces.
Alternatively, 1046, the user may re-select, or emphasize, one or
more faces to indicate that these faces have a higher importance in
the calculation relative to other faces. This process as defined in
1046, further defines the preferred identification process to be a
continuous value one as opposed to a binary one. The process can be
done utilizing a visual user interface or by adjusting the
sensitivity of the detection process. After the faces are correctly
isolated, 1040, their attributes are compared, 1050, to default
values that were predefined in 1002. Such comparison will determine
a potential transformation between the two images, in order to
reach the same values. The transformation is then translated to the
camera capture parameters, 1070, and the image, 1090, is
acquired.
[0166] A practical example is that if the captured face is too
dark, the acquisition parameters may change to allow a longer
exposure or a wider aperture. Note that the image attributes are
not necessarily only related to the face regions but can also be in
relation to the overall exposure. As an example, if the
overall exposure is correct but the faces are underexposed, the
camera may shift into a fill-flash mode, as subsequently illustrated
in FIGS. 4a-4f.
[0167] FIG. 1c illustrates a process of using face detection to
improve output or rendering parameters, as aforementioned in FIG.
1a, block 103. In this scenario, a rendering device such as a
printer or a display, herein referred to as the Device, is activated,
1100. Such activation can be performed for example within a
printer, or alternatively within a device connected to the printer
such as a PC or a camera. The device then goes through the normal
pre-rendering stage to determine, 1104, the correct rendering
parameters such as tone reproduction, color transformation
profiles, gain, color balance, white point and resolution. In
addition, a default set of image attributes, particularly related
to potential faces in the image, are loaded, 1102. Such attributes
can be the overall color balance, exposure, contrast, orientation
etc.
[0168] An image is then digitally downloaded onto the device, 1110.
An image-detection process, preferably a face detection process, is
applied to the downloaded image to seek faces in the image, 1120.
If no faces are found, the process terminates, 1132, and the device
resumes its normal rendering process. Alternatively, or in addition
to the automatic detection of 1130, the user can manually select,
1134, detected faces, using some interactive user interface
mechanism, by utilizing, for example, a display on the device.
Alternatively, the process can be implemented without a visual user
interface by changing the sensitivity or threshold of the detection
process. When faces are detected, 1140, they are marked and
labeled. Detection as defined in 1140 may be more than a binary
process of selecting whether a face is detected or not. It may also
be designed as part of a process where each of the faces is given a
weight based on size of the faces, location within the frame, other
parameters described herein, etc., which define the importance of
the face in relation to other faces detected.
[0169] Alternatively, or in addition, the user can manually
deselect regions, 1144, that were falsely detected as faces.
Such deselection may be needed because a face was falsely
detected, or because the photographer may wish to concentrate on one of
the faces as the main subject matter and not on other faces.
Alternatively, 1146, the user may re-select, or emphasize, one or
more faces to indicate that these faces have a higher importance in
the calculation relative to other faces. This process as defined in
1146, further defines the preferred identification process to be a
continuous value one as opposed to a binary one. The process can be
done utilizing a visual user interface or by adjusting the
sensitivity of the detection process. After the faces are correctly
isolated, 1140, their attributes are compared, 1150, to default
values that were predefined in 1102. Such comparison will determine
a potential transformation between the two images, in order to
reach the same values. The transformation is then translated to the
device rendering parameters, 1170, and the image, 1190, is rendered.
The process may include a plurality of images. In this case, 1180,
the process repeats itself for each image prior to performing the
rendering process. A practical example is the creation of a
thumbnail view or contact sheet, which is a collection of low-resolution
images on a single display instance.
[0170] A practical example is that if the face was captured too
dark, the rendering parameters may change the tone reproduction
curve to lighten the face. Note that the image attributes are not
necessarily only related to the face regions but can also be in
relation to the overall tone reproduction.
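One way to change the tone reproduction curve so that a dark face is lightened is a power-law (gamma) curve chosen to map the mean face level onto a target level. This is a minimal sketch assuming values normalized to (0, 1) and a hypothetical target of 0.45; the application does not prescribe the curve shape, and `face_gamma` and `tone_curve_lut` are hypothetical names.

```python
import math


def face_gamma(face_mean, target=0.45):
    """Exponent g with face_mean ** g == target, for values normalized to (0, 1).

    g < 1 lightens (face darker than target); g > 1 darkens. target is hypothetical.
    """
    if not 0.0 < face_mean < 1.0:
        raise ValueError("face_mean must be strictly between 0 and 1")
    return math.log(target) / math.log(face_mean)


def tone_curve_lut(gamma):
    """256-entry lookup table for an 8-bit tone reproduction curve out = in ** gamma."""
    return [round(255 * (i / 255) ** gamma) for i in range(256)]


g = face_gamma(0.2)      # face captured too dark
lut = tone_curve_lut(g)  # lut[v] is the rendered value for input v
```

Because the lookup table is applied to the whole image, it also lifts shadows elsewhere; restricting it to the faces would require a mask like the one described for block 140.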
[0171] FIGS. 2a-2e describe the automatic rotation of the image
based on the location and orientation of faces, as highlighted in
FIG. 1, Block 130. An image of two faces is provided in FIG. 2a.
Note that the faces may not be
identically oriented, and that the faces may be occluding.
[0172] The software in the face detection stage, including the
functionality of FIG. 1a, blocks 108 and 118, will mark the two
faces, of the mother and son, as estimated ellipses 210 and
220, respectively. Using known mathematical means, such as the
covariance matrix of the ellipse, the software will determine the
main axes of the two faces, 212 and 222, respectively, as well as the
secondary axes 214 and 224. Even at this stage, by merely comparing
the sizes of the axes, the software may determine whether the image
was captured with the camera held in landscape mode, which is
horizontal, or in portrait mode, which is vertical, i.e., rotated
by +90 degrees (clockwise) or -90 degrees (counter-clockwise).
Alternatively, the application may also be utilized for
any arbitrary rotation value. However, this information may not
suffice to decide whether the image is rotated clockwise or
counter-clockwise.
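The covariance-based estimate of an ellipse's main and secondary axes can be sketched as follows, assuming the detected face region is available as a list of (x, y) pixel coordinates with y increasing downward. The eigenvalues of the 2x2 covariance matrix give the squared spreads along the two axes, and the main-axis angle follows from the standard closed form; `ellipse_axes` and `coarse_orientation` are hypothetical names. As the text notes, this distinguishes only upright from rotated by 90 degrees, not the direction of rotation.

```python
import math


def ellipse_axes(points):
    """Main-axis angle (degrees) and spreads along the main and secondary axes.

    points: (x, y) pixel coordinates of a face region, y increasing downward.
    The spreads are standard deviations, proportional to the ellipse axis lengths.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Eigenvalues of the covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(max(half_trace ** 2 - (sxx * syy - sxy ** 2), 0.0))
    major, minor = half_trace + disc, max(half_trace - disc, 0.0)
    # Angle of the eigenvector belonging to the larger eigenvalue, in [-90, 90].
    angle = 0.5 * math.degrees(math.atan2(2 * sxy, sxx - syy))
    return angle, math.sqrt(major), math.sqrt(minor)


def coarse_orientation(angle):
    """An upright face has a roughly vertical main axis; a horizontal one suggests a 90-degree turn."""
    return "upright" if abs(angle) > 45 else "rotated 90 degrees"


# A tall elliptical face region, 20 px half-width by 40 px half-height.
tall = [(x, y) for x in range(-20, 21) for y in range(-40, 41)
        if (x / 20) ** 2 + (y / 40) ** 2 <= 1]
angle, major, minor = ellipse_axes(tall)
print(coarse_orientation(angle))  # upright
```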
[0173] FIG. 2c describes the step of extracting the pertinent
features of a face, which are usually highly detectable. Such
objects may include the eyes, 214, 216 and 224, 226, and the lips,
218 and 228. The combination of the two eyes and the center of the
lips creates a triangle 230 which can be detected not only to
determine the orientation of the face but also the rotation of the
face relative to a facial shot. Note that there are other highly
detectable portions of the image which can be labeled and used for
orientation detection, such as the nostrils, the eyebrows, the hair
line, nose bridge and the neck as the physical extension of the
face, etc. In this figure, the eyes and lips are provided as an
example of such facial features. Based on the location of the eyes,
if found, and the mouth, the image may, e.g., need to be rotated in
a counter-clockwise direction.
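The triangle formed by the two eyes and the center of the mouth can be reduced to a single roll angle: the direction of the vector from the midpoint between the eyes to the mouth, measured against the image's "down" direction. This is a minimal sketch assuming (x, y) image coordinates with y increasing downward; `face_roll` and `snap_to_quarter_turn` are hypothetical names. The image would be rotated by the opposite of the measured roll to make the face upright; whether that is called clockwise or counter-clockwise depends on the display convention and is not asserted here.

```python
import math


def face_roll(left_eye, right_eye, mouth):
    """Angle (degrees) of the eye-midpoint-to-mouth vector relative to straight down.

    Points are (x, y) in image coordinates, y increasing downward. 0 means the
    face is upright; +90 means the mouth lies to the left of the eyes, -90 to the
    right, and +-180 means the face is upside down.
    """
    mid_x = (left_eye[0] + right_eye[0]) / 2
    mid_y = (left_eye[1] + right_eye[1]) / 2
    dx, dy = mouth[0] - mid_x, mouth[1] - mid_y
    return math.degrees(math.atan2(-dx, dy))


def snap_to_quarter_turn(roll):
    """Nearest of 0, 90, 180 or 270 degrees, for images captured in portrait or landscape."""
    return (round(roll / 90) * 90) % 360


# Eyes stacked vertically with the mouth to their left: the face lies on its side.
print(snap_to_quarter_turn(face_roll((50, 40), (50, 60), (20, 50))))  # 90
```

Keeping the unsnapped roll angle allows the arbitrary rotation values mentioned above.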
[0174] Note that it may not be enough to just locate the different
facial features, but it may be necessary to compare such features
to each other. For example, the color of the eyes may be compared
to ensure that the pair of eyes originates from the same person.
Another example is that in FIGS. 2c and 2d, if the software
combined the mouth of 218 with the eyes of 226, 224, the
orientation would have been determined as clockwise. In this case,
the software detects the correct orientation by comparing the
relative size of the mouth and the eyes. The above method describes
means of determining the orientation of the image based on the
relative location of the different facial objects. For example, it
may be desired that the two eyes should be horizontally situated,
the nose line perpendicular to the eyes, the mouth under the nose,
etc. Alternatively, orientation may be determined based on the
geometry of the facial components themselves. For example, it may
be desired that the eyes are elongated horizontally, which means
that when fitting an ellipse on the eye, such as described in blocks
214 and 216, it may be desired that the main axis should be
horizontal. Similarly, when the lips are fitted to an ellipse,
the main axis should be horizontal. Alternatively, the region
around the face may also be considered. In particular, the neck and
shoulders, which are the only contiguous skin-tone regions connected
to the head, can be an indication of the orientation and detection
of the face.
[0175] FIG. 2e illustrates the image as correctly oriented based
on the facial features as detected. In some cases, not all faces
will be oriented the same way. In such cases, the software may
decide on other criteria to determine the orientation of the
prominent face in the image. Such determination of prominence can
be based on the relative size of the faces, the exposure, or
occlusion.
[0176] If several criteria are tested, such as the relationship
between different facial components and/or the orientation of
individual components, not all results may agree on a
single orientation. This can be due to false detections,
miscalculations, occluding portions of faces, including the neck
and shoulders, or the variability between faces. In such cases, a
statistical decision may be implemented to address the different
results and to determine the most likely orientation. Such a
statistical process may select the most frequent result (simple
count), or use more sophisticated ordering statistics such as
correlation or principal component analysis, where the basis
function will be the orientation angle. Alternatively or in
addition, the user may manually select the prominent face or the
face to be oriented. The particular orientation of the selected or
calculated prominent face may itself be automatically determined,
programmed, or manually determined by a user.
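The simple-count decision among conflicting orientation estimates can be sketched as a vote; optional per-test weights are a hypothetical extension that reduces to the plain count when omitted. The correlation and principal component approaches mentioned above are not shown. `vote_orientation` is a hypothetical name.

```python
from collections import Counter


def vote_orientation(estimates, weights=None):
    """Most likely orientation from several, possibly conflicting, per-test estimates.

    estimates: orientations in degrees (0, 90, 180 or 270) from individual tests
        such as eye geometry, the eye-mouth triangle, or the neck and shoulders.
    weights: optional confidence per estimate; defaults to a simple count.
    """
    if weights is None:
        weights = [1.0] * len(estimates)
    tally = Counter()
    for orientation, weight in zip(estimates, weights):
        tally[orientation % 360] += weight  # -90 and 270 count as the same orientation
    best, _ = tally.most_common(1)[0]
    return best


print(vote_orientation([90, 90, 0, 270]))  # 90
```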
[0177] The process for determining the orientation of images can be
implemented in a preferred embodiment as part of a digital display
device. Alternatively, this process can be implemented as part of a
digital printing device, or within a digital acquisition
device.
[0178] The process can also be implemented as part of a display of
multiple images on the same page or screen such as in the display
of a contact-sheet or a thumbnail view of images. In this case, the
user may approve or reject the proposed orientation of the images
individually or by selecting multiple images at once. In the case
of a sequence of images, this invention may also determine the
orientation of images based on the information as approved by the
user regarding previous images.
[0179] FIGS. 3a-3f describe an illustrative process in which a
proposed composition is offered based on the location of the face.
As defined in FIG. 1a blocks 108 and 118, the face 320 is detected
as are one or more pertinent features, as illustrated in this case,
the eyes 322 and 324. The location of the eyes is then calculated
based on the horizontal, 330, and vertical, 340, locations. In this
case, the face is located at the center of the image horizontally
and at the top quarter vertically, as illustrated in FIG. 3d.
[0180] Based on common rules of composition and aesthetics, e.g., a
face in a close-up may be considered to be better positioned, as in
FIG. 3e, if the eyes are at the 2/3rd line as depicted in 350, and
1/3 to the left or 1/3 to the right as illustrated in 360. Other
similar rules may be the location of the entire face and the
location of various portions of the face such as the eyes and lips
based on aesthetic criteria such as applying the golden ratio
for faces and various parts of the face within an image.
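The rule-of-thirds placement can be turned into a crop by solving for the largest window, at the original aspect ratio, that puts the eye point on the upper horizontal third line (the 2/3 line measured from the bottom) and on a vertical third line. This sketch assumes the eye midpoint is the anchor and that the least aggressive crop is preferred; both choices, and the name `thirds_crop`, are assumptions rather than the application's procedure.

```python
def thirds_crop(image_w, image_h, eye_x, eye_y):
    """Largest crop (x, y, w, h) with the image's aspect ratio that places the eye
    point on the upper horizontal third line and on a vertical third line.

    Returns None when no such crop fits inside the image.
    """
    aspect = image_w / image_h
    best = None
    for tx in (1 / 3, 2 / 3):  # eye on the left or on the right vertical third line
        # Each bound keeps one crop edge inside the image; w / aspect is the height.
        w = min(eye_x / tx,                            # left edge x >= 0
                (image_w - eye_x) / (1 - tx),          # right edge <= image_w
                3 * eye_y * aspect,                    # top edge y >= 0
                1.5 * (image_h - eye_y) * aspect)      # bottom edge <= image_h
        if w > 0 and (best is None or w > best[2]):
            h = w / aspect
            best = (eye_x - tx * w, eye_y - h / 3, w, h)
    return best


# Face centered horizontally with the eyes in the top quarter, as in FIG. 3d.
crop = thirds_crop(1200, 900, eye_x=600, eye_y=225)
```

When both vertical third lines allow the same crop size, as in this example, either is acceptable; offering both would match the idea of presenting several proposed compositions to the user.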
[0181] FIG. 3c introduces another aspect of face detection which
may happen especially in non-restrictive photography. The faces may
not necessarily be frontally aligned with the focal plane of the
camera. In this figure, the subject is looking to the side, exposing
a partial frontal, or partial profile, view of the face. In such cases, the
software may elect to use the center of the face, which in this
case may align with the left eye of the subject. If the subject were
in a full frontal position, the software may determine the center of
the face to be around the nose bridge. The center of the face may
be determined to be at the center of a rectangle, ellipse or other
shape generally determined to outline the face or at the
intersection of cross-hairs or otherwise as may be understood by
those skilled in the art (see, e.g., ellipse 210 of FIGS. 2b-2e,
ellipse 320 of FIG. 3b, ellipse 330 of FIG. 3c, the cross-hairs
350, 360 of FIG. 3e).
[0182] Based on the knowledge of the face and its pertinent
features, such as eyes, lips, nose and ears, the software can crop
portions of the image to reach such a composition, either
automatically or via a user interface that recommends the next
action to the user. For this specific image, the software will eliminate
the bottom region 370 and the right portion 380. The process of
re-compositioning a picture is subjective. In such a case, this
invention acts as guidance or assistance to the user in
determining the most pleasing option out of potentially a few. In
such a case, a plurality of proposed compositions can be displayed
and offered to the user, and the user will select one of them.
[0183] In an alternative embodiment, the process of
re-compositioning the image can be performed within the image
acquisition device as part of the image taking process, whether as
a pre-capture, pre-acquisition or post acquisition stage. In this
scenario the acquisition device may display a proposed
re-compositioning of the image on its display. Such
re-compositioning may be displayed in the device viewfinder or
display similarly to FIG. 3f, or alternatively as guidelines of
cropping such as lines 352 and 354. Such a user interface will
enable the user to select from the original composed image or the
suggested one. Similar functionality can be offered as part of the
post-acquisition stage, otherwise referred to as the playback mode.
[0184] In additional embodiments, the actual lines of aesthetics,
for example, the 1/3rd lines 350 and 360, may also be
displayed to the user as assistance in determining the right
composition. Referring to FIGS. 4a-4f, the knowledge of the faces
may assist the user in creating an automatic effect that is
otherwise created by a fill-flash. Fill-flash is a flash used where
the main illumination is available light. In this case, the flash
assists in opening up shadows in the image. Particularly, fill
flash is used for images where the object in the foreground is in
the shadow. Such instances occur for example when the sun is in
front of the camera, thus casting a shadow on the object in the
foreground. In many cases the object includes people posing in
front of a background of landscape.
[0185] FIG. 4a illustrates such an image. The overall image is
bright due to the reflection of the sun in the water. The
individuals in the foreground are therefore in the shadow.
[0186] In one embodiment, the overall exposure can be calculated
using an exposure histogram. Those familiar with the art may decide
on other means of determining exposure, any of which may be used in
accordance with an alternative embodiment. The luminance histogram
of the image, shown in FIG. 4b, has three distinct areas of
exposure, which correspond to various areas of the image. The
histogram depicts the concentration of pixels, as defined by the
Y-axis 416, as a function of the different gray levels, as defined
by the X-axis 418. The higher the pixel count for a specific gray
level, the higher the value depicted on the Y-axis. Regions 410 are
in the shadows, which belong primarily to the mother. The midtones
in area 412 belong primarily to the shaded foreground water and the
baby. The highlights 414 are the water. However, not all shadows
may be in the foreground, and not all highlights may be in the
background, so a correction of the exposure based on the global
histogram alone may result in an unnatural correction.
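The global histogram analysis of FIG. 4b can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: the ITU-R BT.601 luma weights and the shadow/highlight thresholds (85 and 170) are assumptions chosen for the example. The sketch makes the limitation above concrete: the three counts describe tonal ranges only and say nothing about which pixels belong to the foreground.

```python
def luminance(r, g, b):
    """Luma of an 8-bit RGB pixel using ITU-R BT.601 weights (assumed)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def luminance_histogram(rgb_pixels, levels=256):
    """Pixel count per gray level: X-axis = gray level, Y-axis = count."""
    counts = [0] * levels
    for r, g, b in rgb_pixels:
        counts[luminance(r, g, b)] += 1
    return counts


def tonal_split(counts, shadow_max=85, highlight_min=170):
    """Sum the histogram into shadow, midtone and highlight populations.

    The thresholds are hypothetical; a real device would tune them.
    """
    shadows = sum(counts[: shadow_max + 1])
    midtones = sum(counts[shadow_max + 1 : highlight_min])
    highlights = sum(counts[highlight_min:])
    return shadows, midtones, highlights


if __name__ == "__main__":
    pixels = [(10, 10, 10), (20, 20, 20), (128, 128, 128), (250, 250, 250)]
    print(tonal_split(luminance_histogram(pixels)))  # (2, 1, 1)
```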
[0187] When face detection is applied, as depicted in FIG. 4c, the
histogram in FIG. 4d may be substantially clearer. In this
histogram, region 440 depicts the faces, which are in the shadow.
Note that the actual selection of the faces, as illustrated in FIG.
4c, need not be a binary mask but can be a gray-scale mask whose
boundaries are feathered or gradually changing. In addition,
although somewhat similar in shape, the face region 440 may not be
identical to the shadow region of the entire image, as defined,
e.g., in FIG. 4b at area 410. By applying exposure correction to
the face regions as illustrated in FIG. 4e, such as by passing the
image through the lookup table of FIG. 4f, the effect is similar to
that of a fill flash that illuminated the foreground but did not
affect the background. Because the mask around the face is
gradually feathered, such a correction will not be accentuated or
noticed. The correction of FIG. 4e can also be performed manually,
thus allowing the user to create a varying effect of simulated fill
flash. Alternatively, the software may present the user with a
selection of corrections, based on different tone reproduction
curves and different regions, for the user to choose from.
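A feathered gray-scale mask combined with a lookup table, as in FIGS. 4c-4f, can be sketched as follows. The linear feather, the rectangular face box and the constant-gain curve returned by the hypothetical `gain_lut` are all assumptions for illustration; a practical tone curve would more likely lift shadows more than midtones. Each output pixel is blended between its original value and the LUT value in proportion to the mask, so the correction fades out smoothly at the face boundary.

```python
def feathered_mask(width, height, box, feather):
    """Gray-scale face mask: 1.0 inside `box`, fading linearly to 0.0.

    box = (x0, y0, x1, y1) with exclusive x1/y1; `feather` (>= 1) is the
    width in pixels of the soft edge, measured as Chebyshev distance.
    """
    x0, y0, x1, y1 = box
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            dx = max(x0 - x, 0, x - (x1 - 1))  # horizontal distance outside box
            dy = max(y0 - y, 0, y - (y1 - 1))  # vertical distance outside box
            row.append(max(0.0, 1.0 - max(dx, dy) / feather))
        mask.append(row)
    return mask


def gain_lut(gain):
    """Hypothetical brightening curve standing in for the LUT of FIG. 4f."""
    return [min(255, round(v * gain)) for v in range(256)]


def simulate_fill_flash(gray, mask, lut):
    """Blend LUT output with the original, weighted by the feathered mask."""
    return [
        [round(v + m * (lut[v] - v)) for v, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(gray, mask)
    ]


if __name__ == "__main__":
    gray = [[40, 40, 40, 40, 40]]
    mask = feathered_mask(5, 1, (0, 0, 2, 1), feather=2)
    print(simulate_fill_flash(gray, mask, gain_lut(2.0)))  # [[80, 80, 60, 40, 40]]
```

Varying the gain, or offering several curves side by side, corresponds to the manual and multiple-choice variants described above.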
[0188] Although exposure, or tone reproduction, may be the most
preferred enhancement to simulate fill flash, other corrections may
apply, such as sharpening of the selected region, contrast
enhancement, or even color correction. Additional advantageous
corrections may be understood by those familiar with the effect of
physical strobes on photographed images.
[0189] Alternatively, as described by the flow chart of FIG. 4g, a
similar method may be utilized in the pre-acquisition stage to
determine whether a fill flash is needed. The concept of using a
fill flash is based on the assumption that two types of light
sources illuminate the image: an available external or ambient
light source, whose contribution is controlled by the gain, shutter
speed and aperture; and a flash, whose contribution is controlled
only by the flash power and affected by the aperture. By trading
the aperture against the shutter speed, the camera can either
enhance or decrease the effect of the flash while maintaining the
overall exposure.
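The aperture/shutter trade-off can be checked with a simplified relative-exposure model, sketched below. It assumes that the shutter speed is at or below the flash sync speed, so the whole flash pulse falls inside the exposure, and it leaves sensor gain out. Under those assumptions, ambient exposure scales as t/N² and flash exposure as P/N², where t is the shutter time, N the f-number and P the flash power.

```python
import math


def ambient_exposure(shutter_s, f_number):
    """Relative ambient exposure: grows with time, falls with aperture area (~1/N^2)."""
    return shutter_s / f_number ** 2


def flash_exposure(flash_power, f_number):
    """Relative flash exposure: the short pulse fits inside the open shutter,
    so shutter time drops out and only power and aperture matter."""
    return flash_power / f_number ** 2


def trade_stops(shutter_s, f_number, stops):
    """Open the aperture by `stops` and shorten the shutter by the same amount."""
    return shutter_s / 2 ** stops, f_number / math.sqrt(2) ** stops


if __name__ == "__main__":
    t, n = 1 / 60, 8.0
    t2, n2 = trade_stops(t, n, 1)
    # Ambient exposure is unchanged, while the flash contribution doubles.
    print(ambient_exposure(t, n), ambient_exposure(t2, n2))
    print(flash_exposure(1.0, n2) / flash_exposure(1.0, n))
```

Going the other way (closing the aperture while lengthening the shutter) weakens the flash relative to ambient light by the same factor.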
[0190] Referring now to FIG. 4g, a digital image is provided at
450. A determination is made at 460 whether faces were found in the
image. As will be seen below, this process can be applied to other
image features or regions within a digital image, e.g., a region
including a face and also its surroundings, or a portion of a face
less than the entire face, such as the eyes or the mouth or the
nose, or two of these, or a background or foreground region within
an image. If no faces (or other regions or features, hereinafter
only "faces" will be referred to, as an example) are found, the
process exits at 462. If one or more faces are found at 460, then
the faces are automatically marked at 464. There can be a manual
step here instead of or in addition to the automatic marking at
464. A determination of exposure in face regions occurs at 470.
Then, at 474 it is determined whether exposure of the face regions
is lower than an overall exposure. If the exposure of the face
regions is not lower than an overall exposure, then the image may
be left as is by moving the process to 478. If the exposure of the
face regions is lower than an overall exposure, then a fill flash
may be digitally simulated at 480.
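The branch logic of FIG. 4g (steps 460 through 480) can be sketched as below. Using mean gray level as the "exposure" of a region is an assumption made for the sketch; any exposure measure could be substituted. Face boxes are (x0, y0, x1, y1) rectangles with exclusive upper bounds and are assumed to be non-empty. The returned strings are hypothetical labels for the flow-chart outcomes.

```python
def mean_luminance(gray, boxes=None):
    """Mean gray level over the whole image, or over the given face boxes.

    Boxes are assumed non-empty; overlapping boxes are counted twice,
    which is acceptable for a sketch.
    """
    if boxes is None:
        boxes = [(0, 0, len(gray[0]), len(gray))]
    total = count = 0
    for x0, y0, x1, y1 in boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                total += gray[y][x]
                count += 1
    return total / count


def fig_4g_decision(gray, face_boxes):
    """Mirror steps 460-480 of FIG. 4g, using mean luminance as 'exposure'."""
    if not face_boxes:
        return "exit"  # 462: no faces (or other selected features) found
    face_exposure = mean_luminance(gray, face_boxes)  # 470
    overall_exposure = mean_luminance(gray)
    if face_exposure < overall_exposure:  # 474
        return "simulate_fill_flash"  # 480
    return "leave_as_is"  # 478


if __name__ == "__main__":
    img = [[20, 20, 200, 200],
           [20, 20, 200, 200]]
    print(fig_4g_decision(img, [(0, 0, 2, 2)]))  # simulate_fill_flash
```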
[0191] Referring still to FIG. 4g, an exemplary digital fill flash
simulation 480 includes creating masks to define one or more
selected regions at 482a. Exposure of the selected regions is
increased at 484a. Sharpening is applied to the selected regions at
486a. Tone reproduction is applied to the selected regions at
488a. Single or multiple results may be displayed to the user at
490a, and the user then selects a preferred result at 492a. An
image may be displayed with a parameter to modify at 494a, and the
user then adjusts the extent of modification at 496a. After 492a
and/or 496a, the correction is applied to the image at 498.
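The sharpening step 486a can be restricted to the selected regions by weighting an unsharp mask with the region mask. The sketch below is one possible reading, with assumptions labeled: a 3x3 box blur serves as the low-pass filter, and `amount=1.0` is an arbitrary default. Where the mask is 0 the pixel is returned unchanged. The exposure and tone steps (484a, 488a) can reuse the same mask-weighted blending with a lookup table.

```python
def box_blur3(gray):
    """3x3 mean filter; edge pixels average only their in-bounds neighbours."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [gray[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def masked_unsharp(gray, mask, amount=1.0):
    """Unsharp masking applied only where the region mask is non-zero (486a)."""
    blurred = box_blur3(gray)
    out = []
    for y, row in enumerate(gray):
        new_row = []
        for x, v in enumerate(row):
            sharpened = v + amount * (v - blurred[y][x])  # boost local detail
            blended = v + mask[y][x] * (sharpened - v)    # confine to region
            new_row.append(min(255, max(0, round(blended))))
        out.append(new_row)
    return out


if __name__ == "__main__":
    edge = [[0, 0, 100, 100]]
    print(masked_unsharp(edge, [[1, 1, 1, 1]]))  # [[0, 0, 133, 100]]
    print(masked_unsharp(edge, [[0, 0, 0, 0]]))  # [[0, 0, 100, 100]]
```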
[0192] Referring now to FIG. 4h, when the user activates the
camera, in block 104 (see also FIG. 1a), the camera calculates the
overall exposure, 482b. Such a calculation is known to one skilled
in the art and can be as sophisticated as needed. In block 108, the
camera searches for the existence of faces in the image. An
exposure is then calculated for the regions defined as belonging to
the faces, 486b. The disparity between the overall exposure as
determined in 484b and the exposure of the faces, 486b, is
calculated. If the face regions are substantially darker than the
overall exposure 486b, the camera will then activate the flash in a
fill mode, 490b, calculate the necessary flash power, aperture and
shutter speed, 492b, and acquire the image 494b with the fill flash.
The relationship between the flash power, the aperture and the
shutter speed is well formulated and known to one familiar with the
art of photography. Examples of such calculations can be found in
U.S. Pat. No. 6,151,073 to Steinberg et al., which is hereby
incorporated by reference.
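One standard way to relate flash power, aperture and distance in step 492b is the guide-number relation GN = distance × f-number (for a fixed ISO and distance unit). The sketch below uses that textbook relation; it is not necessarily the calculation described in U.S. Pat. No. 6,151,073. Because the guide number grows with the square root of flash power, the power fraction needed scales with the square of the required guide number.

```python
def flash_power_fraction(guide_number_full, subject_distance, f_number):
    """Fraction of full flash power needed to expose the subject correctly.

    Uses GN = distance * f_number (same ISO and distance unit as the
    flash's rated guide number). Power scales with GN squared. A result
    above 1.0 means the flash cannot reach the subject at this aperture.
    """
    required_gn = subject_distance * f_number
    return (required_gn / guide_number_full) ** 2


if __name__ == "__main__":
    # Hypothetical flash rated GN 24 (metres, ISO 100); subject at 3 m, f/4.
    print(flash_power_fraction(24, 3, 4))  # 0.25, i.e. quarter power
```

For fill flash, the result would typically be reduced further so the flash only opens up shadows rather than overpowering the ambient light; how much to reduce it is a photographic choice not specified here.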
[0193] Alternatively, in a different embodiment, 496b, this
algorithm may be used simply to determine the overall exposure
based on the knowledge and the exposure of the faces. The image
will then be taken, 488b, based on the best exposure for the faces,
as calculated in 496b. Many cameras have a matrix type of exposure
calculation where different regions receive different weights as to
their contribution to the total exposure. In such cases, the camera
can continue to implement the same exposure algorithm, with the
exception that regions with faces in them now receive a larger
weight in their importance towards such calculations.
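Face-weighted matrix metering can be sketched as a weighted mean over metering zones. The zone layout and the boost factor `face_weight=4.0` are hypothetical; setting the weight to 1.0 reproduces an unweighted average, so the same metering code serves both cases.

```python
def weighted_exposure(zone_means, face_zones, face_weight=4.0):
    """Matrix-metering estimate where zones containing faces count more.

    zone_means: mean luminance of each metering zone.
    face_zones: indices of zones that overlap a detected face.
    face_weight: hypothetical boost; 1.0 reproduces plain averaging.
    """
    num = den = 0.0
    for i, mean in enumerate(zone_means):
        w = face_weight if i in face_zones else 1.0
        num += w * mean
        den += w
    return num / den


if __name__ == "__main__":
    zones = [200, 200, 200, 40]  # bright background, one dark face zone
    print(weighted_exposure(zones, set()))  # 160.0
    print(weighted_exposure(zones, {3}))    # ~108.6: metering shifts toward the face
```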
[0194] FIG. 5 describes yet another valuable use of the knowledge
of faces in images. In this example, knowledge of the faces can
help improve the quality of image presentation. An image 510 is
inserted into slide show software. The face is then detected as
defined in FIG. 1 block 104, including the location of the
important features of the face such as the eyes and the mouth.
[0195] The user can then choose between a few options, such as
zooming into the face vs. zooming out of the face, and the level of
zoom: a tight close-up 520, a regular close-up 520 or a medium
close-up, as illustrated by the bounding box 540. The software will
then automatically calculate the necessary pan, tilt and zoom needed
to smoothly and gradually switch between the beginning and the end
state. In the case where more than one face is found, the software
can also create a pan and zoom combination that will begin at one
face and end at the other. In a more generic manner, the
application can offer from within a selection of effects such as
dissolve,
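The smooth, gradual transition between a beginning and an end state can be sketched as interpolation of a crop window. Representing the window as (cx, cy, width, height) and easing with a smoothstep curve are assumptions for illustration. Changing the centre pans and tilts, and changing the size zooms. The same function covers a move from one face to another by using the two face windows as start and end.

```python
def smoothstep(t):
    """Ease-in/ease-out curve so the virtual camera starts and stops gently."""
    return t * t * (3 - 2 * t)


def pan_zoom_path(start, end, frames):
    """Interpolate crop windows (cx, cy, width, height) from start to end.

    Moving cx/cy pans and tilts; changing width/height zooms.
    `frames` must be at least 2.
    """
    path = []
    for i in range(frames):
        s = smoothstep(i / (frames - 1))
        path.append(tuple(a + s * (b - a) for a, b in zip(start, end)))
    return path


if __name__ == "__main__":
    full_frame = (50, 50, 100, 100)   # hypothetical wide view
    face_close_up = (70, 30, 20, 20)  # hypothetical tight close-up
    for window in pan_zoom_path(full_frame, face_close_up, 3):
        print(window)
```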
[0196] FIG. 6 illustrates similar functionality, but inside the
device. A camera, whether still or video, as illustrated by the
viewfinder 610, when in auto track mode 600, can detect the faces
in the image and then propose a digital combination of zoom, pan
and tilt to move from the full wide image 630 to a zoomed-in image
640. Such an indication 632 may also be shown on the viewfinder
prior to zooming, so that the user can decide in real time whether
or not to activate the auto zooming. This functionality can also be
added to a tracking mode where the camera continuously tracks the
location of the face in the image. In addition, the camera can also
maintain the right exposure and focus based on the face detection.
[0197] FIG. 7a illustrates the ability to auto focus the camera
based on the location of the faces in the image. Block 710 is a
simulation of the image as seen in the camera viewfinder. When a
center-weighted style of auto focus 718 is implemented, the image
will focus on the grass, 17 feet away, as depicted by the cross 712.
However, as described in this invention, if the camera in the
pre-acquisition mode 104 detects the face 714 and focuses on the
face rather than arbitrarily on the center, the camera will then
indicate to the user where the focus is, 722, and the lens will be
adjusted to the distance to the face, which in this example, as
seen in 728, is 11 ft. vs. the original 17 ft.
[0198] This process can be extended by one skilled in the art to
support not only a single face but multiple faces, by applying a
weighted average. Such an average will depend on the disparity
between the faces in distance and size.
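A weighted average over several faces might be computed as below. Both choices here are assumptions and do not come from the text. First, each face is weighted by its area, so larger (usually nearer or more prominent) faces count more. Second, the average is taken in reciprocal distance rather than distance. The reciprocal form is a common choice because, in the usual thin-lens approximation, the near and far depth-of-field limits sit symmetrically around the focus point in 1/distance.

```python
def multi_face_focus(faces):
    """Combine per-face distances into one focus distance.

    faces: list of (distance, face_area) pairs, distances in any one unit.
    Weights are face areas (a hypothetical importance rule); the mean is
    taken over 1/distance and converted back to a distance.
    """
    total_weight = sum(area for _, area in faces)
    mean_inverse = sum(area / distance for distance, area in faces) / total_weight
    return 1.0 / mean_inverse


if __name__ == "__main__":
    print(multi_face_focus([(11, 500)]))             # ~11.0: single face
    print(multi_face_focus([(10, 100), (15, 100)]))  # ~12.0, between the two faces
```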
[0199] FIG. 7b presents the workflow of the process as illustrated
via the viewfinder in FIG. 7-a. When the face-auto-focus mode is
activated, 740, the camera continuously searches for faces, 750. This
operation inside the camera is performed in real time and needs to
be optimized as such. If no faces are detected 760, the camera will
switch to an alternative focusing mode, 762. If faces are detected,
the camera will mark the single or multiple faces. Alternatively,
the camera may display the location of the face 772, on the
viewfinder or LCD. The user may then take a picture, 790 where the
faces are in focus.
[0200] Alternatively, the camera may shift automatically, via user
request or through preference settings to a face-tracking mode 780.
In this mode, the camera keeps track of the location of the face,
and continuously adjusts the focus based on the location of the
face.
[0201] In an alternative embodiment, the camera can search for the
faces and mark them, similarly to the cross 722 in FIG. 7a. The
photographer can then lock the focus on the subject, for example by
half-pressing the shutter. Locking the focus on the subject differs
from locking the focus in that, if the subject then moves, the
camera can still maintain the correct focus by modifying the focus
on the selected object.
[0202] FIG. 8 describes the use of information about the location
and size of faces to determine the relevant compression ratio of
different segments of the image. An image 800 is segmented into
tiles using horizontal grid 830 and vertical grid 820. The tiles
which include or partially include face information are marked 850.
Upon compression, regions of 850 may be compressed differently than
the tiles of image 800 outside of this region. The degree of
compression may be predetermined, pre-adjusted by the user or
determined as an interactive process. In the case of multiple
detected faces in an image, the user may also assign different
quality values, or compression rates based on the importance of the
faces in the image. Such importance may be determined subjectively
using an interactive process, or objectively using parameters such
as the relative size of the face, exposure or location of the face
relative to other subjects in the image.
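Marking the tiles 850 that fully or partially contain a face, and assigning them a different quality, can be sketched as below. The tile size and the quality values 90 and 50 are hypothetical JPEG-style settings. The sketch only builds the per-tile quality map and leaves the encoder itself out. Face boxes are (x0, y0, x1, y1) with exclusive upper bounds.

```python
def face_tiles(width, height, tile, face_boxes):
    """Return (col, row) indices of tiles that fully or partially contain a face."""
    marked = set()
    for x0, y0, x1, y1 in face_boxes:
        # Clip each box to the image before mapping it onto the grid.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width, x1), min(height, y1)
        if x0 >= x1 or y0 >= y1:
            continue  # box lies outside the image
        for row in range(y0 // tile, (y1 - 1) // tile + 1):
            for col in range(x0 // tile, (x1 - 1) // tile + 1):
                marked.add((col, row))
    return marked


def tile_quality_map(width, height, tile, face_boxes, face_q=90, other_q=50):
    """Per-tile quality setting: face tiles keep higher quality (lower compression)."""
    marked = face_tiles(width, height, tile, face_boxes)
    cols = -(-width // tile)   # ceiling division
    rows = -(-height // tile)
    return [[face_q if (c, r) in marked else other_q for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    for row in tile_quality_map(32, 32, 8, [(10, 10, 20, 20)]):
        print(row)
```

Per-face quality values, for example chosen according to relative face size, would replace the single `face_q` with a lookup keyed by face.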
[0203] An alternative method of variable compression involves
variable resolution of the image. Based on this, the method
described with reference to FIG. 8 can also be utilized to create
variable resolution, where facial regions, which are usually the
important regions of the image, are preferably maintained with
higher overall resolution than other regions in the image.
According to this method, referring to FIG. 8, the regions of the
face as defined in block 850 will preferably be maintained with
higher resolution than regions in the image 800 which are not part
of 850.
[0204] An image can be locally compressed so that specific regions
will have a higher quality compression which equates to lower
compression rate. Alternatively and/or correspondingly, specific
regions of an image may have more or less information associated
with them. The information can be encoded in a frequency-based, or
temporal-based method such as JPEG or Wavelet encoding.
Alternatively, compression in the spatial domain may also involve a
change in the image resolution. Thus, local compression may also be
achieved by defining adjustable variable resolution of an image in
specific areas. By doing so, selected or determined regions of
importance may maintain low compression or high resolution compared
with regions determined to have less importance or non-selected
regions in the image.
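Local variable resolution can be sketched by block-averaging every cell that does not touch a region of importance. The block size of 2 and the binary keep mask are assumptions for illustration. An averaged cell carries the information of one pixel, so those areas are effectively stored at 1/block of the resolution along each axis, while cells overlapping the mask keep full resolution.

```python
def reduce_outside(gray, keep_mask, block=2):
    """Lower effective resolution outside the regions of importance.

    Each block x block cell that contains no kept pixel is replaced by
    its average value; cells touching the keep mask are left untouched.
    """
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cells = [(y, x) for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))]
            if any(keep_mask[y][x] for y, x in cells):
                continue  # touches a face region: keep full resolution
            avg = round(sum(gray[y][x] for y, x in cells) / len(cells))
            for y, x in cells:
                out[y][x] = avg
    return out


if __name__ == "__main__":
    img = [[10, 20, 30, 40],
           [50, 60, 70, 80]]
    face = [[0, 0, 1, 0],
            [0, 0, 0, 0]]
    print(reduce_outside(img, face))  # [[35, 35, 30, 40], [35, 35, 70, 80]]
```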
[0205] Face detection and face tracking technology, particularly
for digital image processing applications according to preferred
and alternative embodiments set forth herein, are further
advantageous in accordance with various modifications of the
systems and methods of the above description as may be understood
by those skilled in the art, as set forth in the references cited
and incorporated by reference herein and as may be otherwise
described below. For example, such technology may be used for
identification of faces in video sequences, particularly when the
detection is to be performed in real-time. Electronic component
circuitry and/or software or firmware may be included in accordance
with one embodiment for detecting flesh-tone regions in a video
signal, identifying human faces within the regions and utilizing
this information to control exposure, gain settings, auto-focus
and/or other parameters for a video camera (see, e.g., U.S. Pat.
Nos. 5,488,429 and 5,638,136 to Kojima et al., each hereby
incorporated by reference). In another embodiment, a luminance
signal and/or a color difference signal may be used to detect the
flesh tone region in a video image and/or to generate a detecting
signal to indicate the presence of a flesh tone region in the
image. In a further embodiment, electronics and/or software or
firmware may detect a face in a video signal and substitute a
"stored" facial image at the same location in the video signal,
which may be useful, e.g., in the implementation of a low-bandwidth
videophone (see, e.g., U.S. Pat. No. 5,870,138 to Smith et al.,
hereby incorporated by reference).
[0206] In accordance with another embodiment, a human face may be
located within an image using a technique suited to real-time
tracking of a human face in a video sequence (see, e.g., U.S. Pat.
Nos. 6,148,092 and 6,332,033 to Qian, hereby incorporated by
reference). An image including a plurality of pixels may be
provided, and a transformation and filtering of each pixel may be
performed to determine whether the pixel has a color associated
with human skin tone. A statistical distribution of skin tones in
two distinct directions may be computed, and the location of a face
within the image may be calculated from these two distributions.
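The idea of locating a face from skin-tone distributions along two directions can be sketched as follows. This is not the transformation and filtering of the cited Qian patents. Skin pixels are classified with a simple, commonly used RGB heuristic chosen here purely for illustration. The face is then estimated from the mean and standard deviation of the skin-pixel coordinates along x and along y.

```python
def is_skin(r, g, b):
    """Simple RGB skin-tone heuristic (illustrative; not Qian's transformation)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def locate_face(rgb):
    """Estimate face centre and spread from skin-pixel distributions along x and y.

    Returns (cx, cy, sx, sy): means and standard deviations of the skin-pixel
    coordinates in the two directions, or None if no skin pixels are found.
    """
    xs, ys = [], []
    for y, row in enumerate(rgb):
        for x, (r, g, b) in enumerate(row):
            if is_skin(r, g, b):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None

    def mean_std(values):
        m = sum(values) / len(values)
        return m, (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5

    (cx, sx), (cy, sy) = mean_std(xs), mean_std(ys)
    return cx, cy, sx, sy


if __name__ == "__main__":
    bg, skin = (0, 0, 255), (200, 120, 90)
    img = [[bg, bg, bg], [bg, skin, bg], [bg, bg, bg]]
    print(locate_face(img))  # (1.0, 1.0, 0.0, 0.0)
```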
[0207] In another embodiment, electronic and/or software or
firmware components may be provided to track a human face in an
image from a video sequence where there are multiple persons (see,
e.g., U.S. Pat. No. 6,404,900 also to Qian, hereby incorporated by
reference). A projection histogram of the filtered image may be
used for output of the location and/or size of tracked faces within
the filtered image. A face-like region in an image may also be
detected by applying information to an observer tracking display of
the auto-stereoscopic type (see, e.g., U.S. Pat. No. 6,504,942 to
Hong et al., incorporated by reference).
[0208] An apparatus according to another embodiment may be provided
for detection and recognition of specific features in an image
using an eigenvector approach to face detection (see, e.g., U.S.
Pat. No. 5,710,833 to Moghaddam et al., incorporated by reference).
Additional eigenvectors may be used in addition to or alternatively
to the principal eigenvector components, e.g., all eigenvectors may
be used. The use of all eigenvectors may be intended to increase
the accuracy of the apparatus to detect complex multi-featured
objects.
[0209] Another approach may be based on object identification and
recognition within a video image using model graphs and/or bunch
graphs that may be particularly advantageous in recognizing a human
face over a wide variety of pose angles (see, e.g., U.S. Pat. No.
6,301,370 to Steffens et al., incorporated by reference). A further
approach may be based on object identification, e.g., also using
eigenvector techniques (see, e.g., U.S. Pat. No. 6,501,857 to
Gotsman et al., incorporated by reference). This approach may use
smooth weak vectors to produce near-zero matches, or alternatively,
a system may employ strong vector thresholds to detect matches.
This technique may be advantageously applied to face detection and
recognition in complex backgrounds.
[0210] Another field of application for face detection and/or
tracking techniques, particularly for digital image processing in
accordance with preferred and alternative embodiments herein, is
the extraction of facial features to allow the collection of
biometric data and tracking of personnel, or the classification of
customers based on age, sex and other categories which can be
related to data determined from facial features. Knowledge-based
electronics and/or software or firmware may be used to provide
automatic feature detection and age classification of human faces
in digital images (see, e.g., U.S. Pat. No. 5,781,650 to Lobo &
Kwon, hereby incorporated by reference). Face detection and feature
extraction may be based on templates (see U.S. Pat. No. 5,835,616
also to Lobo & Kwon, incorporated by reference). A system
and/or method for biometrics-based facial feature extraction may be
employed using a combination of disparity mapping, edge detection
and filtering to determine co-ordinates for facial features in the
region of interest (see, e.g., U.S. Pat. No. 6,526,161 to Yan,
incorporated by reference). A method for the automatic detection
and tracking of personnel may utilize modules to track a user's head
or face (see, e.g., U.S. Pat. No. 6,188,777, incorporated by
reference). For example, a depth estimation module, a color
segmentation module and/or a pattern classification module may be
used. Data from each of these modules can be combined to assist in
the identification of a user and the system can track and respond
to a user's head or face in real-time.
[0211] The preferred and alternative embodiments may be applied in
the field of digital photography. For example, automatic
determination of main subjects in photographic images may be
performed (see, e.g., U.S. Pat. No. 6,282,317 to Luo et al.,
incorporated by reference). Regions of arbitrary shape and size may
be extracted from a digital image. These may be grouped into larger
segments corresponding to physically coherent objects. A
probabilistic reasoning engine may then estimate the region which
is most likely to be the main subject of the image.
[0212] Faces may be detected in complex visual scenes and/or in a
neural network based face detection system, particularly for
digital image processing in accordance with preferred or
alternative embodiments herein (see, e.g., U.S. Pat. No. 6,128,397
to Baluja & Rowley; and "Neural Network-Based Face Detection,"
IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 20, No. 1, pages 23-28, January 1998 by the same authors, each
reference being hereby incorporated by reference). In addition, an
image may be rotated prior to the application of the neural network
analysis in order to optimize the success rate of the
neural-network based detection (see, e.g., U.S. Pat. No. 6,128,397,
incorporated by reference). This technique is particularly
advantageous when faces are oriented vertically. Face detection in
accordance with preferred and alternative embodiments, which is
particularly advantageous when a complex background is involved,
may use one or more of skin color detection, spanning tree
minimization and/or heuristic elimination of false positives (see,
e.g., U.S. Pat. No. 6,263,113 to Abdel-Mottaleb et al., incorporated
by reference).
[0213] A broad range of techniques may be employed in image
manipulation and/or image enhancement in accordance with preferred
and alternative embodiments; these techniques may involve automatic,
semi-automatic and/or manual operations, and are applicable to
several fields of application. Some of the discussion that follows has been grouped
into subcategories for ease of discussion, including (i) Contrast
Normalization and Image Sharpening; (ii) Image Crop, Zoom and
Rotate; (iii) Image Color Adjustment and Tone Scaling; (iv)
Exposure Adjustment and Digital Fill Flash applied to a Digital
Image; (v) Brightness Adjustment with Color Space Matching; and
Auto-Gamma determination with Image Enhancement; (vi) Input/Output
device characterizations to determine Automatic/Batch Image
Enhancements; (vii) In-Camera Image Enhancement; and (viii) Face
Based Image Enhancement. Other alternative embodiments may employ
techniques provided at U.S. application Ser. No. 10/608,784, filed
Jun. 26, 2003, which is hereby incorporated by reference.
Slide Show Based on One or More Image Features or Regions of
Interest
[0214] Therefore in one embodiment, the creation of a slide show is
based on the automated detection of face regions. In other
embodiments, other image features, regions of interest (ROI) and/or
characteristics are detected and employed in combination with
detected face regions or independently to automatically construct a
sophisticated slide show which highlights key features within a
single image and/or multiple images, such as a sequence of
images.
[0215] Examples of image features or regions, in addition to faces,
are facial regions such as eyes, nose, mouth, teeth, cheeks, ears,
eyebrows, forehead, hair, and parts or combinations thereof, as
well as foreground and background regions of an image. Another
example of a region of an image is a region that includes one or
more faces and surrounding area of the image around the face or
faces.
Separation of Foreground and Background Regions
[0216] Foreground and background regions may be advantageously
separated in a preferred embodiment, which can include independent
or separate detection, processing, tracking, storing, outputting,
printing, cutting, pasting, copying, enhancing, upsampling,
downsampling, fill flash processing, transforming, or other digital
processing such as the exemplary processes provided in Tables I and
II below. Independent transformations may be made to the foreground
regions and the background regions. Such transformations are
illustrated in the tables below. Table I lists several exemplary
parameters that can be addressed regionally within an image, or
that can be addressed differently or adjusted by different amounts
in different regions within an image. With focus, selective
out-of-focus regions can be created while other regions remain in
focus. With saturation, selective reduction of color (gray scale)
can be created, or different regions within an image can have
different gray scales selected for them. With pixelation, the
number of pixels per region can be selectively reduced. Sharpening
can also be added region by region. With zooming, an image can be
cropped to smaller regions of interest. With panning and tilting,
it is possible to move horizontally and vertically, respectively,
within an image. With dolly, foreshortening or a change of
perspective is provided.
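As one instance of addressing a parameter regionally, the saturation case can be sketched as selective desaturation. Foreground pixels keep their colour, while background pixels are converted to gray. The binary foreground mask is assumed to come from a separate foreground/background separation step. The BT.601 luma weights are an assumption; other gray conversions would work equally well.

```python
def selective_desaturate(rgb, foreground_mask):
    """Keep colour in foreground pixels; convert background pixels to gray.

    foreground_mask holds truthy values for foreground pixels. Gray levels
    use ITU-R BT.601 luma weights (an assumption).
    """
    out = []
    for row, mask_row in zip(rgb, foreground_mask):
        new_row = []
        for (r, g, b), is_fg in zip(row, mask_row):
            if is_fg:
                new_row.append((r, g, b))
            else:
                y = round(0.299 * r + 0.587 * g + 0.114 * b)
                new_row.append((y, y, y))
        out.append(new_row)
    return out


if __name__ == "__main__":
    img = [[(255, 0, 0), (255, 0, 0)]]
    print(selective_desaturate(img, [[1, 0]]))  # [[(255, 0, 0), (76, 76, 76)]]
```

Swapping the per-pixel operation (a blur, a block average, a sharpening kernel) gives the other regional parameters in the same mask-driven way.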
[0217] Table II illustrates initial and final states for different
regions, e.g., foreground and background regions, within an image
having processing applied differently to each of them. As shown,
the initial states for each region are the same with regard to
parameters such as focus, exposure, sharpening and zoom, while
addressing the regions differently during processing provides
different final states for the regions. In one example, both the
foreground and background regions are initially out of focus, while
processing brings the foreground region into focus and leaves the
background region out of focus. In another example, both regions
are initially normal in focus, while processing takes the
background out of focus and leaves the foreground in focus. In
further examples, the regions are initially both normally exposed
or both under exposed, and processing results in the foreground
region being normally exposed and the background region being under
exposed or over exposed. In another example, both regions are
initially normally sharpened, and processing results in
over-sharpening of the foreground region and under-sharpening of
the background region. In a further example, a full initial image
with foreground and background is changed to a zoomed image to
include only the foreground region or to include a cropped
background region. In a further example, an initial image with
normal background and foreground regions is changed to a new image
with the foreground region zoomed in and the background region
zoomed out.
[0218] Transformations can be reversed. For example, zoom-in or
cropping may be reversed to begin with the cropped image and zoom
out, or blurring that is sharpened may be reversed into an initial
state of sharpening and final stages of blur, and so on with regard
to the examples provided, or any permutations and any combinations
of such transformations can be concatenated in various orders and
forms (e.g., zoom and blur, blur and zoom).
TABLE I
Parameter | Effect
Focus | Create selective out-of-focus regions
Saturate | Create selective reduction of color (gray scale)
Pixelate | Selectively reduce amount of pixels per region
Sharpen | Add sharpening to regions
Zoom in | Crop image to smaller region of interest
Pan | Move horizontally across the image
Tilt | Move vertically up/down
Dolly | Change perspective, foreshortening
Examples include:
TABLE II
Initial state: Foreground | Initial state: Background | Final state: Foreground | Final state: Background
Out of focus | Out of focus | In focus | Out of focus
Normal, in focus | Normal, in focus | Normal, in focus | Out of focus
Normal, good exposure | Normal, good exposure | Normal, good exposure | Under exposed
Under exposed | Under exposed | Normal, good exposure | Under exposed
Normal, good exposure | Normal, good exposure | Good exposure | Over exposed
Normal sharpening | Normal sharpening | Over sharpened | Under sharpened
Full image with background | Full image with foreground | Zoomed image to include only FG | Cropped background
Normal | Normal | Zoomed in | Zoomed out (foreshortening)
Alternatively, separated foreground/background regions may be
further analyzed to determine their importance/relevance. In
another embodiment, a significant background feature such as a
sunset or a mountain may be incorporated as part of a slide show
sequence. Foreground and background regions may be separated
automatically or semi-automatically, as described in U.S. patent
application Ser. No. 11/217,788, filed Aug. 30, 2005, which is
hereby incorporated by reference.
[0219] After separation of foreground and background regions it is
also possible to calculate a depth map of the background regions.
By calculating such a depth map at the time that an image is
acquired, it is possible to use additional depth map information to
enhance the automatic generation of a slide show.
[0220] In the embodiment which preferably uses faces, yet is
applicable to using other selected image features or regions, in
case there are multiple faces detected, interesting "camera
movement" can be simulated which includes panning/tilting from one
face to another or zooming in-out onto a selection of faces.
[0221] While exemplary drawings and specific embodiments of the
present invention have been described and illustrated, it is to be
understood that the scope of the present invention is not to
be limited to the particular embodiments discussed. Thus, the
embodiments shall be regarded as illustrative rather than
restrictive, and it should be understood that variations may be
made in those embodiments by workers skilled in the arts without
departing from the scope of the present invention as set forth in
the claims that follow and their structural and functional
equivalents.
[0222] In addition, in methods that may be performed according to
the claims below and/or preferred embodiments herein, the
operations have been described in selected typographical sequences.
However, the sequences have been selected and so ordered for
typographical convenience and are not intended to imply any
particular order for performing the operations, unless a particular
ordering is expressly provided or understood by those skilled in
the art as being necessary.
* * * * *
References