U.S. patent application number 17/377656 was filed with the patent office on 2021-07-16 and published on 2022-05-19 as publication number 20220156899, for an electronic device for estimating camera illuminant and method of the same. This patent application is currently assigned to SAMSUNG ELECTRONICS CO., LTD. The applicant listed for this patent is SAMSUNG ELECTRONICS CO., LTD. Invention is credited to Abdelrahman Abdelhamed, Michael Scott Brown, and Abhijith Punnappurath.
Application Number | 20220156899 17/377656
Document ID | /
Family ID | 1000005780665
Publication Date | 2022-05-19

United States Patent Application | 20220156899
Kind Code | A1
Abdelhamed; Abdelrahman; et al. | May 19, 2022
ELECTRONIC DEVICE FOR ESTIMATING CAMERA ILLUMINANT AND METHOD OF
THE SAME
Abstract
A method for processing image data may include: obtaining a
first image and a second image that capture a same scene in
different views, from a first camera and a second camera,
respectively; spatially aligning the first image with the second
image; obtaining a color transformation matrix that maps the first
image to the second image based on color values of the first image
and the second image; obtaining an estimated illuminant color from
an output of a neural network by inputting the color transformation
matrix to the neural network, wherein the neural network is trained
based on a pair of reference images of a same reference scene and a
color rendition chart that are captured by different cameras having
different spectral sensitivities; and performing a white balance
correction on the first image based on the estimated illuminant
color to output a corrected first image.
Inventors: Abdelhamed; Abdelrahman; (Toronto, CA); Punnappurath; Abhijith; (Toronto, CA); BROWN; Michael Scott; (Toronto, CA)

Applicant:
Name | City | State | Country | Type
SAMSUNG ELECTRONICS CO., LTD. | Suwon-si | | KR |

Assignee: SAMSUNG ELECTRONICS CO., LTD. (Suwon-si, KR)
Family ID: 1000005780665
Appl. No.: 17/377656
Filed: July 16, 2021
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
63186346 | May 10, 2021 |
63114079 | Nov 16, 2020 |
Current U.S. Class: 1/1
Current CPC Class: G06T 5/009 20130101; H04N 5/247 20130101; G06T 7/90 20170101; G06T 2207/20084 20130101; G06T 2207/10024 20130101; G06T 7/11 20170101; G06T 2207/20132 20130101; G06T 7/30 20170101; G06T 3/40 20130101; G06T 2207/20081 20130101
International Class: G06T 5/00 20060101 G06T005/00; G06T 7/11 20060101 G06T007/11; G06T 3/40 20060101 G06T003/40; G06T 7/30 20060101 G06T007/30; G06T 7/90 20060101 G06T007/90; H04N 5/247 20060101 H04N005/247
Claims
1. An apparatus for processing image data, the apparatus
comprising: a memory storing instructions; and a processor
configured to execute the instructions to: obtain a first image and
a second image that capture a same scene in different views, from a
first camera and a second camera, respectively; spatially align the
first image with the second image; obtain a color transformation
matrix that maps the first image to the second image based on color
values of the first image and the second image; obtain an estimated
illuminant color from an output of a neural network by inputting
the color transformation matrix to the neural network, wherein the
neural network is trained based on a pair of reference images of a
same reference scene and a color rendition chart that are captured
by different cameras having different spectral sensitivities; and
perform a white balance correction on the first image based on the
estimated illuminant color to output a corrected first image.
2. The apparatus of claim 1, wherein the neural network is trained
to minimize a loss between the estimated illuminant color and a
ground-truth illuminant color, and wherein the ground-truth
illuminant color is obtained from a color value of at least one
achromatic patch in the color rendition chart.
3. The apparatus of claim 1, wherein the second image shows a wider
view of the same scene than the first image, and wherein the
processor is further configured to execute the instructions to:
crop the second image to have a same view as the first image, to
spatially align the first image with the cropped second image.
4. The apparatus of claim 1, wherein the processor is further
configured to execute the instructions to: down-sample the first
image to obtain a down-sampled first image; down-sample the cropped
second image to obtain a down-sampled second image; and compute the
color transformation matrix that maps the down-sampled first image
to the down-sampled second image based on color values of the
down-sampled first image and the down-sampled second image.
5. The apparatus of claim 1, wherein the color transformation
matrix is a three-by-three matrix that maps RGB values of the first
image to RGB values of the second image.
6. The apparatus of claim 1, wherein the output of the neural
network represents a ratio of RGB values of the estimated
illuminant color.
7. The apparatus of claim 1, wherein the neural network is further
trained using augmented images, and wherein the augmented images
are obtained by re-illuminating a first reference image and a
second reference image of different scenes under different
illuminations that are captured by a same reference camera, based
on color transformations between first color chart values of the
first reference image and second color chart values of the second
reference image.
8. The apparatus of claim 1, wherein the neural network is further
trained using augmented images, and wherein the augmented images
are obtained by re-illuminating a first reference image and a
second reference image of different scenes under different
illuminations that are captured by a same reference camera, based
on color transformations between all color values of the first
reference image and all color values of the second reference
image.
9. The apparatus of claim 1, wherein the color transformation matrix is a first color transformation matrix, and wherein the processor is further configured to execute the instructions to: obtain, from a
third camera, a third image that captures the same scene in a view
different from the views of the first image and the second image;
spatially align the third image with the first image; spatially
align the third image with the second image; obtain a second color
transformation matrix that maps the first image to the third image
based on the color values of the first image and color values of
the third image; obtain a third color transformation matrix that
maps the second image to the third image based on the color values
of the second image and the color values of the third image;
concatenate the first, the second, and the third color
transformation matrices to obtain a concatenated matrix; obtain the
estimated illuminant color from the output of the neural network by
inputting the concatenated matrix to the neural network; and
perform the white balance correction on the first image based on
the estimated illuminant color to output the corrected first
image.
10. The apparatus of claim 1, wherein the apparatus is a user
device in which the first camera and the second camera are mounted,
and wherein the first camera and the second camera have different
fields of view and different spectral sensitivities.
11. The apparatus of claim 1, wherein the apparatus is a server comprising
a communication interface configured to communicate with a user
device comprising the first camera and the second camera, to
receive the first image and the second image from the user
device.
12. A method for processing image data, the method comprising:
obtaining a first image and a second image that capture a same
scene in different views, from a first camera and a second camera,
respectively; spatially aligning the first image with the second
image; obtaining a color transformation matrix that maps the first
image to the second image based on color values of the first image
and the second image; obtaining an estimated illuminant color from
an output of a neural network by inputting the color transformation
matrix to the neural network, wherein the neural network is trained
based on a pair of reference images of a same reference scene and a
color rendition chart that are captured by different cameras having
different spectral sensitivities; and performing a white balance
correction on the first image based on the estimated illuminant
color to output a corrected first image.
13. The method of claim 12, wherein the neural network is trained
to minimize a loss between the estimated illuminant color and a
ground-truth illuminant color, and wherein the ground-truth
illuminant color is obtained from a color value of at least one
achromatic patch in the color rendition chart.
14. The method of claim 12, wherein the second image shows a wider
view of the same scene than the first image, and wherein the method
further comprises: cropping the second image to have a same view as
the first image, to spatially align the first image with the
cropped second image.
15. The method of claim 12, further comprising: down-sampling the
first image to obtain a down-sampled first image; down-sampling the
cropped second image to obtain a down-sampled second image; and
computing the color transformation matrix that maps the
down-sampled first image to the down-sampled second image based on
color values of the down-sampled first image and the down-sampled
second image.
16. The method of claim 12, wherein the color transformation matrix
is a three-by-three matrix that maps RGB values of the first image
to RGB values of the second image.
17. The method of claim 12, wherein the output of the neural
network represents a ratio of RGB values of the estimated
illuminant color.
18. The method of claim 12, wherein the neural network is further
trained using augmented images, and wherein the augmented images
are obtained by re-illuminating a first reference image and a
second reference image of different scenes under different
illuminations that are captured by a same reference camera, based
on color transformations between first color chart values of the
first reference image and second color chart values of the second
reference image.
19. The method of claim 12, wherein the color transformation matrix
is a first color transformation matrix, and wherein the method
further comprises: obtaining, from a third camera, a third image
that captures the same scene in a view different from the views of
the first image and the second image; spatially aligning the third
image with the first image; spatially aligning the third image with
the second image; obtaining a second color transformation matrix
that maps the first image to the third image based on the color
values of the first image and color values of the third image;
obtaining a third color transformation matrix that maps the second
image to the third image based on the color values of the second
image and the color values of the third image; concatenating the
first, the second, and the third color transformation matrices to
obtain a concatenated matrix; obtaining the estimated illuminant
color from the output of the neural network by inputting the
concatenated matrix to the neural network; and performing the white
balance correction on the first image based on the estimated
illuminant color to output the corrected first image.
20. A non-transitory computer readable storage medium storing a program executable by at least one processor to perform a
method for processing image data, the method comprising: obtaining
a first image and a second image that capture a same scene in
different views, from a first camera and a second camera,
respectively; spatially aligning the first image with the second
image; obtaining a color transformation matrix that maps the first
image to the second image based on color values of the first image
and the second image; obtaining an estimated illuminant color from
an output of a neural network by inputting the color transformation
matrix to the neural network, wherein the neural network is trained
based on a pair of reference images of a same reference scene and a
color rendition chart that are captured by different cameras having
different spectral sensitivities; and performing a white balance
correction on the first image based on the estimated illuminant
color to output a corrected first image.
Description
CROSS-REFERENCE TO RELATED APPLICATION(S)
[0001] This application is based on and claims priority under 35 U.S.C. § 119 to U.S. Provisional Patent Application No. 63/114,079, filed on Nov. 16, 2020, and U.S. Provisional Patent Application No. 63/186,346, filed on May 10, 2021, in the U.S. Patent & Trademark Office, the disclosures of which are incorporated herein by reference in their entireties.
BACKGROUND
1. Field
[0002] The disclosure relates to a system and method for estimating
a scene illumination using a neural network configured to predict
the scene illumination based on two or more images of the same
scene that are simultaneously captured by two or more cameras
having different spectral sensitivities, and performing white
balance corrections on the captured images.
2. Description of Related Art
[0003] In processing camera captured images, illuminant estimation
is a critical step for computational color constancy. Color
constancy refers to the ability of the human visual system to
perceive scene colors as being the same even when observed under
different illuminations. Cameras do not innately possess this
illumination adaptation ability, and a raw-RGB image recorded by a
camera sensor has significant color cast due to the scene's
illumination. As a result, computational color constancy is applied
to the camera's raw-RGB sensor image as one of the first steps in
the in-camera imaging pipeline to remove this undesirable color
cast.
[0004] In the related art, color constancy is achieved using (1) a
statistics-based method or (2) a learning-based method.
[0005] Statistics-based methods operate using statistics from an
image's color distribution and spatial layout to estimate the scene
illuminant. These statistics-based methods are fast and easy to
implement. However, these statistics-based methods make very strong
assumptions about scene content and fail in cases where these
assumptions do not hold.
[0006] Learning-based methods use labelled training data where the
ground truth illumination corresponding to each input image is
known from physical color charts placed in the scene. In general,
learning-based approaches have been shown to be more accurate than statistics-based methods. However, learning-based methods in the related art usually include many more parameters than statistics-based ones. The number of parameters can reach tens of millions in some models, which results in relatively long training times.
SUMMARY
[0007] One or more example embodiments provide a system and method
for estimating a scene illumination using a neural network
configured to predict the scene illumination based on two or more
images of the same scene that are simultaneously captured by two or
more cameras having different spectral sensitivities. The
multiple-camera setup may provide a benefit of improving the
accuracy of illuminant estimation.
[0008] According to an aspect of an example embodiment, an
apparatus for processing image data, may include: a memory storing
instructions; and a processor configured to execute the
instructions to: obtain a first image and a second image that
capture a same scene in different views, from a first camera and a
second camera, respectively; spatially align the first image with
the second image; obtain a color transformation matrix that maps
the first image to the second image based on color values of the
first image and the second image; obtain an estimated illuminant
color from an output of a neural network by inputting the color
transformation matrix to the neural network, wherein the neural
network is trained based on a pair of reference images of a same
reference scene and a color rendition chart that are captured by
different cameras having different spectral sensitivities; and
perform a white balance correction on the first image based on the
estimated illuminant color to output a corrected first image.
[0009] The neural network may be trained to minimize a loss between
the estimated illuminant color and a ground-truth illuminant color,
and the ground-truth illuminant color may be obtained from a color
value of at least one achromatic patch in the color rendition
chart.
[0010] The second image may show a wider view of the same scene
than the first image, and the processor may be further configured
to execute the instructions to: crop the second image to have a
same view as the first image, to spatially align the first image
with the cropped second image.
[0011] The processor may be further configured to execute the
instructions to: down-sample the first image to obtain a
down-sampled first image; down-sample the cropped second image to
obtain a down-sampled second image; and compute the color
transformation matrix that maps the down-sampled first image to the
down-sampled second image based on color values of the down-sampled
first image and the down-sampled second image.
[0012] The color transformation matrix may be a three-by-three
matrix that maps RGB values of the first image to RGB values of the
second image.
[0013] The output of the neural network may represent a ratio of
RGB values of the estimated illuminant color.
[0014] The neural network may be further trained using augmented
images, and the augmented images may be obtained by re-illuminating
a first reference image and a second reference image of different
scenes under different illuminations that are captured by a same
reference camera, based on color transformations between first
color chart values of the first reference image and second color
chart values of the second reference image.
[0015] The neural network may be further trained using augmented
images, and the augmented images may be obtained by re-illuminating
a first reference image and a second reference image of different
scenes under different illuminations that are captured by a same
reference camera, based on color transformations between all color
values of the first reference image and all color values of the
second reference image.
[0016] The color transformation matrix may correspond to a first
color transformation matrix. The processor may be further
configured to execute the instructions to: obtain, from a third
camera, a third image that captures the same scene in a view
different from the views of the first image and the second image;
spatially align the third image with the first image; spatially
align the third image with the second image; obtain a second color
transformation matrix that maps the first image to the third image
based on the color values of the first image and color values of
the third image; obtain a third color transformation matrix that
maps the second image to the third image based on the color values
of the second image and the color values of the third image;
concatenate the first, the second, and the third color
transformation matrices to obtain a concatenated matrix; obtain the
estimated illuminant color from the output of the neural network by
inputting the concatenated matrix to the neural network; and
perform the white balance correction on the first image based on
the estimated illuminant color to output the corrected first
image.
[0017] The apparatus may be a user device in which the first camera
and the second camera are mounted, and the first camera and the
second camera may have different fields of view and different
spectral sensitivities.
[0018] The apparatus may be a server including a communication
interface configured to communicate with a user device including
the first camera and the second camera, to receive the first image
and the second image from the user device.
[0019] According to an aspect of an example embodiment, a method
for processing image data may include: obtaining a first image and
a second image that capture a same scene in different views, from a
first camera and a second camera, respectively; spatially aligning
the first image with the second image; obtaining a color
transformation matrix that maps the first image to the second image
based on color values of the first image and the second image;
obtaining an estimated illuminant color from an output of a neural
network by inputting the color transformation matrix to the neural
network, wherein the neural network is trained based on a pair of
reference images of a same reference scene and a color rendition
chart that are captured by different cameras having different
spectral sensitivities; and performing a white balance correction
on the first image based on the estimated illuminant color to
output a corrected first image.
[0020] The neural network may be trained to minimize a loss between
the estimated illuminant color and a ground-truth illuminant color,
and wherein the ground-truth illuminant color may be obtained from
a color value of at least one achromatic patch in the color
rendition chart.
[0021] The second image may show a wider view of the same scene
than the first image, and the method may further include: cropping
the second image to have a same view as the first image, to
spatially align the first image with the cropped second image.
[0022] The method may further include: down-sampling the first
image to obtain a down-sampled first image; down-sampling the
cropped second image to obtain a down-sampled second image; and
computing the color transformation matrix that maps the
down-sampled first image to the down-sampled second image based on
color values of the down-sampled first image and the down-sampled
second image.
[0023] The color transformation matrix may be a three-by-three
matrix that maps RGB values of the first image to RGB values of the
second image.
[0024] The output of the neural network may represent a ratio of
RGB values of the estimated illuminant color.
[0025] The neural network may be further trained using augmented
images, and the augmented images may be obtained by re-illuminating
a first reference image and a second reference image of different
scenes under different illuminations that are captured by a same
reference camera, based on color transformations between first
color chart values of the first reference image and second color
chart values of the second reference image.
[0026] The color transformation matrix may correspond to a first
color transformation matrix. The method may further include:
obtaining, from a third camera, a third image that captures the
same scene in a view different from the views of the first image
and the second image; spatially aligning the third image with the
first image; spatially aligning the third image with the second
image; obtaining a second color transformation matrix that maps the
first image to the third image based on the color values of the
first image and color values of the third image; obtaining a third
color transformation matrix that maps the second image to the third
image based on the color values of the second image and the color
values of the third image; concatenating the first, the second, and
the third color transformation matrices to obtain a concatenated
matrix; obtaining the estimated illuminant color from the output of
the neural network by inputting the concatenated matrix to the
neural network; and performing the white balance correction on the
first image based on the estimated illuminant color to output the
corrected first image.
[0027] According to an aspect of an example embodiment, a
non-transitory computer readable storage medium storing a program
executable by at least one processor to perform a method for
processing image data, including: obtaining a first image and a
second image that capture a same scene in different views, from a
first camera and a second camera, respectively; spatially aligning
the first image with the second image; obtaining a color
transformation matrix that maps the first image to the second image
based on color values of the first image and the second image;
obtaining an estimated illuminant color from an output of a neural
network by inputting the color transformation matrix to the neural
network, wherein the neural network is trained based on a pair of
reference images of a same reference scene and a color rendition
chart that are captured by different cameras having different
spectral sensitivities; and performing a white balance correction
on the first image based on the estimated illuminant color to
output a corrected first image.
[0028] Additional aspects will be set forth in part in the
description that follows and, in part, will be apparent from the
description, or may be learned by practice of the presented
embodiments of the disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
[0029] The above and other aspects, features, and advantages of embodiments of the disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
[0030] FIG. 1 is a diagram of a system for performing image
processing using a pair of cameras according to an embodiment;
[0031] FIG. 2 is a diagram of a user device and spectral
sensitivities of a pair of cameras mounted on the user device
according to an embodiment;
[0032] FIG. 3 illustrates a warp and crop operation according to an
embodiment;
[0033] FIG. 4 is a diagram of a neural network for estimating
illumination of a scene captured by a pair of cameras according to
an embodiment;
[0034] FIG. 5 is a diagram of devices of the system for performing
the image processing according to an embodiment;
[0035] FIG. 6 is a diagram of components of the devices of FIG. 5
according to an embodiment;
[0036] FIG. 7 is a diagram of a system for training the neural network of FIG. 4 according to an embodiment;
[0037] FIG. 8 illustrates a data augmentation process according to
an embodiment;
[0038] FIG. 9 illustrates a data augmentation process based on full matrix transformation between color rendition charts captured in images according to an embodiment;
[0039] FIG. 10 illustrates a data augmentation process based on diagonal transformation between illuminants according to an embodiment;
[0040] FIG. 11 illustrates a data augmentation process based on
full matrix transformation between images according to an
embodiment; and
[0041] FIG. 12 is a diagram of a system for performing image
processing using more than two cameras according to an
embodiment.
DETAILED DESCRIPTION
[0042] The following detailed description of example embodiments
refers to the accompanying drawings. The same reference numbers in
different drawings may identify the same or similar elements.
[0043] Example embodiments of the present disclosure are directed
to estimating a scene illumination in the RGB color space of camera
sensors, and applying a matrix computed from estimated scene
illumination parameters to perform a white-balance correction.
[0044] FIG. 1 is a diagram of a method for estimating illumination
of a physical scene using a neural network according to an
embodiment.
[0045] As shown in FIG. 1, image signal processing is performed
using a pair of images of the same physical scene that are
simultaneously captured by two different cameras, a first camera
111 and a second camera 112. According to embodiments of the
present disclosure, both the illuminant for the first camera 111
and the illuminant for the second camera 112 are predicted, but for
simplicity, the method shown in FIG. 1 focuses on estimating the
illuminant for the first camera 111.
[0046] Referring to FIG. 2, the two cameras 111 and 112 may have
different focal lengths and lens configurations to allow a user device (e.g., a smartphone) 110 to deliver DSLR-like optical capabilities, providing both a wide-angle view and a telephoto view. Also,
the two cameras 111 and 112 may have different spectral
sensitivities and therefore may provide different spectral
measurements of the physical scene.
[0047] Graphs (a) and (b) shown in FIG. 2 represent the spectral
sensitivities of the first camera 111 and the second camera 112 in
RGB channels, respectively.
[0048] For example, the pitch of photodiodes and the overall
resolutions of the two image sensors (e.g., charge-coupled device
(CCD) sensors) mounted in the first camera 111 and the second
camera 112 may be different from each other to accommodate the
different optics associated with each sensor. Also, different color
filter arrays (CFA) may be used in the first camera 111 and the
second camera 112 according to the different optics, which may
result in the different spectral sensitivities to incoming light as
shown in graphs (a) and (b) of FIG. 2.
[0049] The first camera 111 and the second camera 112 may
simultaneously capture a first (unprocessed) raw-RGB image and a
second (unprocessed) raw-RGB image of the same scene, respectively,
that provide different spectral measurements of the scene.
[0050] The first raw-RGB image and the second raw-RGB image may
have different views while capturing the same scene. The image
signal processing according to an embodiment of the present
disclosure may use the color values of the scene captured with the
different spectral sensitivities to estimate the scene illumination
since the color values are correlated with the scene
illumination.
[0051] Referring back to FIG. 1, the image signal processing may
include: image alignment operation S110 for spatially aligning a
pair of images, color transformation operation S120 for computing
color transformation between the images, illumination estimation
operation S130 for estimating the scene illumination using a neural
network, and white balance operation S140 for correcting scene
colors in the images based on the estimated scene illumination.
[0052] In image alignment operation S110, a global homography may
be used to align two different images of the same scene having
different fields of view, and then down-sampling is performed on
the aligned two images, prior to computing color transformation
between the two images.
[0053] Specifically, down-sampling S111 and S113 and warping and
cropping S112 are performed to register the pair of the first
raw-RGB image and the second raw-RGB image, which capture the same
scene but have different fields of view.
[0054] In a first processing pipeline, the first raw-RGB image is
downscaled by a preset factor (e.g., a factor of six) in operation
S111.
[0055] In a second processing pipeline, either or both of image
warping and image cropping S112 are performed on the second raw-RGB
image to align the second raw-RGB image with the first raw-RGB
image. For example, in the second processing pipeline, the second
raw-RGB image is cropped to have the same size of the field of view
as the first raw-RGB image. Additionally, any one or any
combination of transformation, rotation, and translation may be
applied to the second raw-RGB image so that the same objects in the
first raw-RGB image and the second raw-RGB image are located at the
same pixel coordinates.
[0056] FIG. 3 illustrates a warp and crop operation according to an
embodiment of the disclosure. A pre-calibrated perspective
transform H is calculated between the first and second cameras 111
and 112, and the perspective transform H is applied to the second
raw-RGB image to align the second raw-RGB image with the first
raw-RGB image.
[0057] As shown in FIG. 3, the first camera 111 and the second camera 112 may capture a preset pattern to obtain image 1 and image 2,
respectively.
[0058] At least four points $\mathbf{x}'_1$, $\mathbf{x}'_2$, $\mathbf{x}'_3$, and $\mathbf{x}'_4$ are selected from image 1 to compute the perspective transform H:

$$\mathbf{x}'_i = (x'_i, y'_i, 1)^T, \quad i = 1, \ldots, 4$$

[0059] The corresponding points $\mathbf{x}_1$, $\mathbf{x}_2$, $\mathbf{x}_3$, and $\mathbf{x}_4$ in image 2 are represented as follows:

$$\mathbf{x}_i = (x_i, y_i, 1)^T, \quad i = 1, \ldots, 4$$
[0060] The vector $\mathbf{h} = (h_1, h_2, \ldots, h_9)^T$ is obtained by solving:

$$\begin{bmatrix} \mathbf{0}^T & -\mathbf{x}_1^T & y'_1 \mathbf{x}_1^T \\ \mathbf{x}_1^T & \mathbf{0}^T & -x'_1 \mathbf{x}_1^T \\ \mathbf{0}^T & -\mathbf{x}_2^T & y'_2 \mathbf{x}_2^T \\ \mathbf{x}_2^T & \mathbf{0}^T & -x'_2 \mathbf{x}_2^T \\ \vdots & \vdots & \vdots \\ \mathbf{0}^T & -\mathbf{x}_4^T & y'_4 \mathbf{x}_4^T \\ \mathbf{x}_4^T & \mathbf{0}^T & -x'_4 \mathbf{x}_4^T \end{bmatrix}_{8 \times 9} \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_9 \end{bmatrix}_{9 \times 1} = \mathbf{0}_{8 \times 1}$$
[0061] Using $\mathbf{h}$, the perspective transform H is obtained as follows:

$$H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix}_{3 \times 3}$$
[0062] Once the perspective transform H is computed using the
calibration pattern, the warp and crop operation for a new scene is
performed by applying the perspective transform H to an image
captured by the second camera 112 (e.g., the second raw-RGB image).
In an example embodiment, the warp and crop operation may be
performed only once for the two cameras 111 and 112, rather than
being performed individually for new images captured by the cameras
111 and 112.
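For illustration, the DLT solution above can be sketched in a few lines of NumPy. This is a minimal sketch, not the implementation described in the embodiments; the function names are hypothetical, and the null-space vector h is taken from the SVD of the 8×9 matrix.

```python
import numpy as np

def estimate_homography(pts2, pts1):
    """Solve the 8x9 DLT system for h via SVD, given four correspondences:
    pts2 from image 2 and pts1 from image 1, so that H maps image 2 to image 1."""
    rows = []
    for (x, y), (xp, yp) in zip(pts2, pts1):
        xi = [x, y, 1.0]
        rows.append([0.0, 0.0, 0.0] + [-v for v in xi] + [yp * v for v in xi])
        rows.append(xi + [0.0, 0.0, 0.0] + [-xp * v for v in xi])
    _, _, vt = np.linalg.svd(np.asarray(rows))  # rows form the 8x9 matrix
    h = vt[-1]                                  # right singular vector spanning the null space
    return (h / h[-1]).reshape(3, 3)            # normalize so h9 = 1

def warp_to_first_view(img2, H, out_shape):
    """Nearest-neighbor warp of image 2 into image 1's frame (sketch only)."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ coords             # back-map output pixels into image 2
    src /= src[2]                               # dehomogenize
    sx = np.clip(np.round(src[0]).astype(int), 0, img2.shape[1] - 1)
    sy = np.clip(np.round(src[1]).astype(int), 0, img2.shape[0] - 1)
    return img2[sy, sx].reshape(h_out, w_out, -1)
```

Because the camera pair is fixed, estimate_homography would be run once on the calibration pattern and the resulting H reused for all subsequent captures, consistent with the warp and crop operation described above.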
[0063] Once the second raw-RGB image is aligned with the first
raw-RGB image, down-sampling S113 is performed on the aligned
second raw-RGB image.
[0064] The down-sampling S111 and the down-sampling S113 may use the same down-sampling factor to allow the down-sampled first raw-RGB image and the down-sampled second raw-RGB image to have substantially the same resolution.
[0065] However, the present embodiment is not limited thereto, and
different down-sampling factors may be used for the down-sampling
S111 and the down-sampling S113. Also, the first processing
pipeline including operation S111 and the second processing
pipeline including operations S112 and S113 may be executed in
parallel or in sequence.
[0066] The down-sampling S111 and the down-sampling S113, performed prior to computing the color transformation, may make the illumination estimation robust to small misalignments and slight parallax between the two views. Since the hardware arrangement of the two cameras
111 and 112 does not change for a given device (e.g., the user
device 110), the homography can be pre-computed and remains fixed
for all image pairs from the same device.
[0067] In color transformation operation S120, a color
transformation matrix is computed to map the down-sampled first
raw-RGB image from the first camera 111 to the corresponding
aligned and down-sampled second raw-RGB image from the second
camera 112. For a particular scene illuminant, the color
transformation between the two different images of the same scene
may have a unique signature that is related to the scene
illumination. Accordingly, the color transformation itself may be
used as the feature for illumination estimation.
[0068] Given the first raw-RGB image $I_1 \in \mathbb{R}^{n \times 3}$ and the second raw-RGB image $I_2 \in \mathbb{R}^{n \times 3}$, each with n pixels, of the same scene captured by the first camera 111 and the second camera 112 under the same illumination $L \in \mathbb{R}^3$, there exists a linear color transformation $T \in \mathbb{R}^{3 \times 3}$ between the color values of the first raw-RGB image $I_1$ and the second raw-RGB image $I_2$:

$$I_2 \approx I_1 T \qquad \text{Equation (1)}$$

[0069] such that T is unique to the scene illumination L.

[0070] T is computed using the pseudo-inverse, as follows:

$$T = (I_1^T I_1)^{-1} I_1^T I_2 \qquad \text{Equation (2)}$$
[0071] For example, the linear color transformation T may be represented as a 3×3 color transformation matrix:

$$T_{3 \times 3} = \begin{pmatrix} t_1 & t_2 & t_3 \\ t_4 & t_5 & t_6 \\ t_7 & t_8 & t_9 \end{pmatrix}$$
[0072] More specifically, given that A denotes the pixel values in the R, G, B color channels of the down-sampled first raw-RGB image and B denotes the pixel values in the R, G, B color channels of the aligned and down-sampled second raw-RGB image, the 3×3 color transformation matrix T between A and B satisfies:

$$AT = B, \quad A = \begin{bmatrix} a_{1R} & a_{1G} & a_{1B} \\ a_{2R} & a_{2G} & a_{2B} \\ \vdots & \vdots & \vdots \\ a_{NR} & a_{NG} & a_{NB} \end{bmatrix}, \quad T = \begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & t_{33} \end{bmatrix}, \quad B = \begin{bmatrix} b_{1R} & b_{1G} & b_{1B} \\ b_{2R} & b_{2G} & b_{2B} \\ \vdots & \vdots & \vdots \\ b_{NR} & b_{NG} & b_{NB} \end{bmatrix}$$

[0073] In the matrices A and B, the three columns correspond to the R, G, B color channels, and the rows correspond to the pixels of the down-sampled first raw-RGB image and the aligned and down-sampled second raw-RGB image, respectively.
[0074] Using the pseudo-inverse, the 3×3 color transformation matrix T is calculated as follows:

$$T = (A^T A)^{-1} A^T B$$
[0075] In this embodiment, the 3×3 color transformation matrix is used because it is linear, accurate, and computationally efficient. However, the size of the color transformation matrix is not limited thereto, and any 3×M color transformation matrix (where M≥3) may be used.
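As a minimal sketch of operation S120 (assuming the inputs are the aligned raw-RGB images; the function and variable names are illustrative, not from the embodiments), the pseudo-inverse of Equation (2) reduces to a least-squares solve:

```python
import numpy as np

def downsample(img, factor=6):
    """Naive strided down-sampling; the description mentions a preset factor (e.g., six)."""
    return img[::factor, ::factor]

def color_transform(img1, img2):
    """3x3 matrix T with img1 @ T ~= img2, i.e., T = (A^T A)^-1 A^T B (Equation (2))."""
    A = img1.reshape(-1, 3).astype(np.float64)  # N x 3 pixel matrix of image 1
    B = img2.reshape(-1, 3).astype(np.float64)  # N x 3 pixel matrix of image 2
    T, *_ = np.linalg.lstsq(A, B, rcond=None)   # numerically stable pseudo-inverse solve
    return T
```

np.linalg.lstsq solves the same normal equations as the explicit pseudo-inverse but avoids forming $(A^T A)^{-1}$ directly, which is numerically preferable.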
[0076] In illumination estimation operation S130, a neural network trained for estimating the illumination of the scene (e.g., the illuminant color) receives the color transformation as input and outputs a two-dimensional (2D) chromaticity value that corresponds to the illumination estimate of the scene. The 2D chromaticity value may be represented by ratios of R, G, and B values, i.e., [R/G, B/G]. For example, the estimated illumination $\hat{L}$ is expressed as:

$$\hat{L} = (\hat{r}, \hat{b})$$

which corresponds to the RGB illuminant $(\hat{r}, 1, \hat{b})$, with the green channel fixed to 1.
[0077] Referring to FIG. 4, the neural network may include an input layer having nine (9) nodes for receiving the nine (9) parameters of the 3×3 color transformation matrix, an output layer having two nodes for outputting the 2D chromaticity value, and a set of hidden layers between the input layer and the output layer. For example, each hidden layer may include nine (9) nodes.
[0078] The neural network according to an example embodiment needs to process only the nine parameters of the color transformation matrix; as a result, it is very lightweight compared with other image processing networks and can be run efficiently on-device in real time.
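In PyTorch, a network of the shape described above takes only a few lines. The following sketch fixes the nine-node input and two-node output from the description; the hidden depth and the ReLU activation are assumptions (the training description later mentions 2 to 16 dense layers of nine neurons each).

```python
import torch
import torch.nn as nn

class IlluminantNet(nn.Module):
    """MLP from the nine flattened entries of T to an [R/G, B/G] estimate."""
    def __init__(self, num_hidden_layers=2, hidden_width=9):
        super().__init__()
        layers, width = [], 9
        for _ in range(num_hidden_layers):
            layers += [nn.Linear(width, hidden_width), nn.ReLU()]
            width = hidden_width
        layers.append(nn.Linear(width, 2))  # outputs (r_hat, b_hat)
        self.mlp = nn.Sequential(*layers)

    def forward(self, t_flat):              # t_flat: (batch, 9), flattened T
        return self.mlp(t_flat)
```

With two hidden layers of nine neurons, this sketch has 2·(9·9 + 9) + (9·2 + 2) = 200 parameters, matching the 200-parameter figure given later for the 2-layer variant.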
[0079] A method and a system for training the neural network will
be described later with reference to FIG. 7.
[0080] Referring back to FIG. 1, in white balance operation S140, a
white balance gain of the first raw-RGB image is adjusted based on
the estimated illumination of the light source at the scene.
[0081] Parameters such as the R gain and the B gain (i.e., the gain
values for the red color channel and the blue color channel) for
white balance adjustment are calculated based upon a preset
algorithm.
[0082] In an embodiment, white balance correction factors (e.g., α, β, γ) are selected for the first raw-RGB image based on the estimated illumination, and each color component (e.g., R_WB, G_WB, B_WB) of the first raw-RGB image is multiplied by its respective correction factor to obtain white-balanced color components (e.g., αR_WB, βG_WB, γB_WB).
[0083] In an embodiment, an R/G correction factor and a B/G correction factor may be computed based on the estimated illumination to adjust the R/G gain and the B/G gain of the first raw-RGB image.
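A sketch of the correction step under a simple diagonal gain model, assuming the network output (r̂, b̂) is interpreted as the illuminant (r̂, 1, b̂); the function name is hypothetical:

```python
import numpy as np

def white_balance(img, r_hat, b_hat):
    """Divide each channel by the estimated illuminant color. The green gain
    is 1, so the remaining factors act as R/G and B/G corrections."""
    gains = 1.0 / np.array([r_hat, 1.0, b_hat])  # (alpha, beta, gamma)
    return img.astype(np.float64) * gains        # broadcast over H x W x 3
```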
[0084] FIG. 5 is a diagram of devices for performing the
illumination estimation according to an embodiment. FIG. 5 includes
a user device 110, a server 120, and a network 130. The user device
110 and the server 120 may interconnect via wired connections,
wireless connections, or a combination of wired and wireless
connections.
[0085] The user device 110 includes one or more devices configured
to generate an output image. For example, the user device 110 may
include a computing device (e.g., a desktop computer, a laptop
computer, a tablet computer, a handheld computer, a smart speaker,
a server, etc.), a mobile phone (e.g., a smart phone, a
radiotelephone, etc.), a camera device, a wearable device (e.g., a
pair of smart glasses or a smart watch), or a similar device.
[0086] The server 120 includes one or more devices configured to
train a neural network for predicting the scene illumination using
camera images to correct scene colors in the camera images. For
example, the server 120 may be a server, a computing device, or the
like. The server 120 may receive camera images from an external
device (e.g., the user device 110 or another external device),
train a neural network for predicting illumination parameters using
the camera images, and provide the trained neural network to the
user device 110 to permit the user device 110 to generate an output
image using the neural network.
[0087] The network 130 includes one or more wired and/or wireless
networks. For example, network 130 may include a cellular network
(e.g., a fifth generation (5G) network, a long-term evolution (LTE)
network, a third generation (3G) network, a code division multiple
access (CDMA) network, etc.), a public land mobile network (PLMN),
a local area network (LAN), a wide area network (WAN), a
metropolitan area network (MAN), a telephone network (e.g., the
Public Switched Telephone Network (PSTN)), a private network, an ad
hoc network, an intranet, the Internet, a fiber optic-based
network, or the like, and/or a combination of these or other types
of networks.
[0088] The number and arrangement of devices and networks shown in
FIG. 5 are provided as an example. In practice, there may be
additional devices and/or networks, fewer devices and/or networks,
different devices and/or networks, or differently arranged devices
and/or networks than those shown in FIG. 5. Furthermore, two or
more devices shown in FIG. 5 may be implemented within a single
device, or a single device shown in FIG. 5 may be implemented as
multiple, distributed devices. Additionally, or alternatively, a
set of devices (e.g., one or more devices) may perform one or more
functions described as being performed by another set of
devices.
[0089] FIG. 6 is a diagram of components of one or more devices of
FIG. 5 according to an embodiment. Device 200 may correspond to the
user device 110 and/or the server 120.
[0090] As shown in FIG. 6, the device 200 may include a bus 210, a
processor 220, a memory 230, a storage component 240, an input
component 250, an output component 260, and a communication
interface 270.
[0091] The bus 210 includes a component that permits communication
among the components of the device 200. The processor 220 is
implemented in hardware, firmware, or a combination of hardware and
software. The processor 220 is a central processing unit (CPU), a
graphics processing unit (GPU), an accelerated processing unit
(APU), a microprocessor, a microcontroller, a digital signal
processor (DSP), a field-programmable gate array (FPGA), an
application-specific integrated circuit (ASIC), or another type of
processing component. The processor 220 includes one or more processors capable of being programmed to perform a function.
[0092] The memory 230 includes a random access memory (RAM), a read
only memory (ROM), and/or another type of dynamic or static storage
device (e.g., a flash memory, a magnetic memory, and/or an optical
memory) that stores information and/or instructions for use by the
processor 220.
[0093] The storage component 240 stores information and/or software
related to the operation and use of the device 200. For example,
the storage component 240 may include a hard disk (e.g., a magnetic
disk, an optical disk, a magneto-optic disk, and/or a solid state
disk), a compact disc (CD), a digital versatile disc (DVD), a
floppy disk, a cartridge, a magnetic tape, and/or another type of
non-transitory computer-readable medium, along with a corresponding
drive.
[0094] The input component 250 includes a component that permits
the device 200 to receive information, such as via user input
(e.g., a touch screen display, a keyboard, a keypad, a mouse, a
button, a switch, and/or a microphone). The input component 250 may
include a sensor for sensing information (e.g., a global
positioning system (GPS) component, an accelerometer, a gyroscope,
and/or an actuator).
[0095] In particular, the input component 250 may include two or
more cameras, including the first camera 111 and the second camera
112 illustrated in FIG. 2. The first camera 111 and the second
camera 112 may be rear-facing cameras that have different spectral
sensitivities and have different fields of view from each
other.
[0096] The output component 260 includes a component that provides
output information from the device 200 (e.g., a display, a speaker,
and/or one or more light-emitting diodes (LEDs)).
[0097] The communication interface 270 includes a transceiver-like
component (e.g., a transceiver and/or a separate receiver and
transmitter) that enables the device 200 to communicate with other
devices, such as via a wired connection, a wireless connection, or
a combination of wired and wireless connections. The communication
interface 270 may permit device 200 to receive information from
another device and/or provide information to another device. For
example, the communication interface 270 may include an Ethernet
interface, an optical interface, a coaxial interface, an infrared
interface, a radio frequency (RF) interface, a universal serial bus
(USB) interface, a Wi-Fi interface, a cellular network interface,
or the like.
[0098] The device 200 may perform one or more processes described
herein. The device 200 may perform operations S110-S140 based on
the processor 220 executing software instructions stored by a
non-transitory computer-readable medium, such as the memory 230
and/or the storage component 240. A computer-readable medium is
defined herein as a non-transitory memory device. A memory device
includes memory space within a single physical storage device or
memory space spread across multiple physical storage devices.
[0099] Software instructions may be read into the memory 230 and/or
the storage component 240 from another computer-readable medium or
from another device via the communication interface 270. When
executed, software instructions stored in the memory 230 and/or
storage component 240 may cause the processor 220 to perform one or
more processes described herein.
[0100] Additionally, or alternatively, hardwired circuitry may be
used in place of or in combination with software instructions to
perform one or more processes described herein. Thus, embodiments
described herein are not limited to any specific combination of
hardware circuitry and software.
[0101] FIG. 7 is a diagram of a system for training a neural
network of FIG. 4 according to an embodiment. The training process
may be performed by the user device 110 or the server 120, using
the components illustrated in FIG. 6.
[0102] The neural network according to an embodiment is trained to
predict the illuminant for the first camera 111 and the illuminant
for the second camera 112 using the same color transforms, but for
simplicity, the description of the training process in the present
disclosure focuses on estimating the illuminant for the first
camera 111.
[0103] As shown in FIG. 7, a network training process is performed
using a pair of images of the same physical scene that are
simultaneously captured by two different cameras 111 and 112. The
two cameras 111 and 112 may have different spectral sensitivities
and therefore may provide different spectral measurements for the
same scene having the same light source.
[0104] The first camera 111 and the second camera 112 may
simultaneously capture a first raw-RGB image and a second raw-RGB
image of the same scene, respectively, that provide different
spectral measurements of the scene. The first raw-RGB image and the
second raw-RGB image may have different views while capturing the
same scene.
[0105] For the purposes of training the neural network, the first
camera 111 and the second camera 112 may capture a color rendition
chart as shown in FIG. 7. The color rendition chart may allow the
first raw-RGB image and the second raw-RGB image to provide a wide distribution of colors under the scene illumination. Also, the neutral patches
(also referred to as "achromatic patches" or "gray patches") of the
color rendition chart in the first raw-RGB image may provide a
ground truth illumination value (e.g., a ground-truth illuminant
color) for the first raw-RGB image. Likewise, the neutral patches
in the second raw-RGB image may provide a ground truth illumination
value for the second raw-RGB image.
[0106] Hereinafter, the first raw-RGB image and the second raw-RGB
image may be referred to as image 1 and image 2.
[0107] In operation S210, image 1 and image 2 are spatially aligned
with each other, for example, using a global homography. For
example, image 2 is cropped to have the same size of the field of
view as image 1, and any one or any combination of transformation,
rotation, and translation is applied to image 2 so that the same
objects (e.g., the slide) in image 1 and image 2 are located at the
same pixel coordinates.
[0108] In turn, the aligned image 1 and image 2 are down-sampled
prior to computing color transformation between image 1 and image
2. The down-sampling may make the illumination estimation robust to
any small misalignments and slight parallax in the two views of
images 1 and 2. Since the hardware arrangement of the two cameras
111 and 112 does not change for a given device, the homography can
be pre-computed and remains fixed for all image pairs from the same
device.
[0109] In operation S220, a color transformation matrix is computed
to map the down-sampled image 1 from the first camera 111 to the
corresponding aligned and down-sampled image 2 from the second camera
112. For example, the color transformation matrix may be computed
based on Equations (1) and (2).
[0110] In operation S230, a neural network for estimating the
illumination of the scene is constructed to have the structure
shown in FIG. 4. For example, the neural network may include an
input layer having nine (9) nodes for receiving the nine (9)
parameters of a 3×3 color transformation matrix, an output layer having two nodes for outputting the 2D chromaticity value, and a set of hidden layers between the input layer and the output layer. The neural network according to an example embodiment needs to process only the nine parameters of the color transformation matrix; as a result, it is very lightweight compared with other image processing networks and can be run efficiently on-device in real time.
[0111] In the training process, the neural network receives, as
input, the parameters of the color transformation matrix, and
outputs a two-dimensional (2D) chromaticity value that corresponds
to the illumination estimation of the scene. The 2D chromaticity
value may be represented as [R/G, B/G], indicating the ratio of the red color value to the green color value and the ratio of the blue color value to the green color value.
[0112] Given a dataset of M image pairs $\{(I_{11}, I_{21}), \ldots, (I_{1M}, I_{2M})\}$, the corresponding color transformations $T_1, \ldots, T_M$ between each pair of images are computed using Equation (2), yielding:

$$T = \{T_1, \ldots, T_M\}$$
[0113] $(I_{11}, I_{21})$ may denote image 1 and image 2, and $T_1$ may denote the color transformation between image 1 and image 2. The training process according to the embodiment is described using the pair of images 1 and 2, but a large number of paired images may be used for training the neural network. Augmented training images may be generated by applying mathematical transformation functions to camera-captured images. The data augmentation process is described later with reference to FIGS. 8-11.
[0114] In operation S240, the set of corresponding target ground truth illuminations L of the images $I_{1i}$ (i.e., as measured by the first camera 111) is obtained from each pair of images as follows:

$$L = \{L_1, \ldots, L_M\}$$

[0115] $L_1$ may denote the ground truth illumination of image 1. The ground truth illumination $L_1$ may be obtained by extracting the image area of the neutral patches from image 1 and measuring the pixel colors of those patches, since the neutral patches act as a good reflector of the scene illumination. For example, the average pixel color $L_1 = [R_{avg}, G_{avg}, B_{avg}]$ inside the neutral patches may be used as the ground truth illumination $L_1$ for image 1.
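A minimal sketch of that measurement, assuming a boolean mask of the neutral-patch pixels is already available (locating the chart is camera- and chart-specific and is omitted here):

```python
import numpy as np

def ground_truth_illuminant(img, neutral_mask):
    """Average raw-RGB inside the achromatic patches, reported as [R/G, B/G]
    chromaticity to match the network's output space."""
    r_avg, g_avg, b_avg = img[neutral_mask].mean(axis=0)
    return np.array([r_avg / g_avg, b_avg / g_avg])
```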
[0116] The neural network $f_\theta: T \rightarrow L$ is trained with parameters $\theta$ to model the mapping between the color transformations T and the scene illuminations L. The neural network $f_\theta$ may predict the scene illumination L for the first camera 111 given the color transformation T between image 1 and image 2, as follows:

$$\hat{L} = f_\theta(T) \qquad \text{Equation (3)}$$
[0117] In operation S250, the neural network $f_\theta$ is trained to minimize the loss between the predicted illuminations $\hat{L}_i$ and the ground truth illuminations $L_i$, as follows:

$$\min_\theta \frac{1}{M} \sum_{i=1}^{M} \left\lVert \hat{L}_i - L_i \right\rVert \qquad \text{Equation (4)}$$
[0118] The neural network according to an embodiment is lightweight, consisting of, for example, a small number (e.g., 2, 5, or 16) of dense layers, each having only nine neurons. The total number of parameters may range from 200 parameters for the 2-layer neural network up to 1,460 parameters for the 16-layer neural network. The input to the neural network is the flattened nine values of the color transformation T, and the output is two values corresponding to the illumination estimate in the 2D [R/G, B/G] chromaticity color space, where the green channel's value may be set to 1.
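A hedged sketch of the minimization in Equation (4), reusing the IlluminantNet sketch above; the optimizer, learning rate, and epoch count are assumptions, as the description does not specify them:

```python
import torch

def train(net, transforms, illuminants, epochs=500, lr=1e-3):
    """Minimize (1/M) * sum_i ||L_hat_i - L_i|| over M training pairs.
    transforms: (M, 9) flattened color transforms; illuminants: (M, 2)."""
    T = torch.as_tensor(transforms, dtype=torch.float32)
    L = torch.as_tensor(illuminants, dtype=torch.float32)
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.linalg.vector_norm(net(T) - L, dim=1).mean()  # Equation (4)
        loss.backward()
        opt.step()
    return net
```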
[0119] According to embodiments of the present disclosure, the user
device 110 or the server 120 may use the neural network that has
been trained by an external device without performing an additional
training process on the user device 110 or the server 120, or
alternatively may continue to train the neural network in real time
on the user device 110 or the server 120.
[0120] FIG. 8 illustrates a data augmentation process according to
an embodiment.
[0121] Due to the difficulty in obtaining large datasets of image
pairs captured with two cameras under the same illumination, a data
augmentation process may be performed to increase the number of
training samples and the generalizability of the model according to
an example embodiment.
[0122] As shown in FIG. 8, image $I_1$ is captured under a source illuminant $L_1 = [r_1, g_1, b_1]$ and includes a color rendition chart. Image $I_1$ is re-illuminated to obtain image $I'_1$, which appears to have been captured under the target illuminant $L_2 = [r_2, g_2, b_2]$. Image $I'_1$, as well as image $I_1$, may be used to train the neural network.
[0123] Various methods may be used to re-illuminate an image; these methods are described below with reference to FIGS. 9-11.
[0124] FIG. 9 illustrates a data augmentation process based on a
full matrix transformation between color rendition charts captured
in images according to an embodiment.
[0125] As shown in FIG. 9, a pair of captured images $I_1$ and $I_2$ is used to obtain a re-illuminated image $I'_1$ that includes the same image content as the captured image $I_1$ but has different color values from it. The captured images $I_1$ and $I_2$ are captured by the same camera (e.g., the first camera 111) under different light sources, illuminant $L_1$ and illuminant $L_2$, respectively. The captured images $I_1$ and $I_2$ both include a color rendition chart.
[0126] In order to re-illuminate the captured image I.sub.1 based
on the color values of the captured image I.sub.2, the color
rendition chart is extracted from each of the captured image
I.sub.1 and the captured image I.sub.2. A color transformation
matrix T is computed based on the color chart values of the
captured image I.sub.1 and the color chart values of the captured
image I.sub.2. The color transformation matrix T may convert the
color chart values of the captured image I.sub.1 to the color chart
values of the captured image I.sub.2.
[0127] The color transformation matrix T is applied to the captured
image I.sub.1 to transform approximately all the colors in the
captured image I.sub.1 and thereby to obtain the re-illuminated
image I.sub.1' which appears to be captured under illuminant
L.sub.2.
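A minimal sketch of this re-illumination step, assuming pixels are stored as rows so that the transformation applies as a right matrix product (the function name and the [0, 1] value range are illustrative assumptions):

    import numpy as np

    def apply_color_transform(image, T):
        # image: (H, W, 3) raw-RGB array; T: (3, 3) color transformation
        # mapping source colors to target colors.
        h, w, _ = image.shape
        out = image.reshape(-1, 3) @ T   # I1' = I1 T, applied per pixel
        # Clipping assumes colors are normalized to [0, 1].
        return out.reshape(h, w, 3).clip(0.0, 1.0)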
[0128] While FIG. 9 shows augmentation of an image pair from the
first camera 111 only, the corresponding pair of images from the
second camera 112 is augmented in the same way. Also, the captured
image I.sub.2 (as well as the captured image I.sub.1) is
re-illuminated in a similar manner, based on a color transformation
matrix that transforms the color chart values of the captured image
I.sub.2 to the color chart values of the captured image
I.sub.1.
[0129] In an example embodiment of the present disclosure, given a
small dataset of raw-RGB image pairs captured with two cameras and
including the color rendition charts, the color values of the color
chart patches (e.g., the 24 color chart patches shown in FIG. 9),
C.di-elect cons.R.sup.24.times.3, are extracted from each
image.
[0130] A color transformation T.sub.C.sup.1i.fwdarw.1j.di-elect
cons.R.sup.3.times.3 between each pair of images (I.sub.1i,
I.sub.1j) from the first camera 111 is obtained based only on the
color chart values from the two images, as follows:
$$T_C^{1i \rightarrow 1j} = \left( I_{1i}^{\top} I_{1i} \right)^{-1} I_{1i}^{\top} I_{1j}$$
[0131] Similarly, the color transformation T.sub.C.sup.2i.fwdarw.2j
for image pairs (I.sub.2i, I.sub.2j) is obtained from the second
camera 112 as follows:
$$T_C^{2i \rightarrow 2j} = \left( I_{2i}^{\top} I_{2i} \right)^{-1} I_{2i}^{\top} I_{2j}$$
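Both transformations are ordinary linear least-squares fits. As a sketch, assuming the 24 chart patch colors have been extracted as (24, 3) arrays (the helper name is illustrative; np.linalg.lstsq solves the same normal equations more stably than forming the explicit inverse):

    import numpy as np

    def chart_transform(C_src, C_dst):
        # C_src, C_dst: (24, 3) color chart patch values from the source
        # and target images. Solves min_T ||C_src T - C_dst||, which has
        # the closed form T = (C_src^T C_src)^{-1} C_src^T C_dst.
        T, *_ = np.linalg.lstsq(C_src, C_dst, rcond=None)
        return T  # (3, 3)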
[0132] This bank of color transformations is applied to augment
images by re-illuminating any given pair of images from the two
cameras (I.sub.1i,I.sub.2i) to match their colors to any target
pair of images I.sub.1j, I.sub.2j, as follows:
$$I_{1i \rightarrow j} = I_{1i} \, T_C^{1i \rightarrow 1j}$$
$$I_{2i \rightarrow j} = I_{2i} \, T_C^{2i \rightarrow 2j}$$
[0133] where i.fwdarw.j means re-illuminating image i to match the
colors of image j. Using this illuminant augmentation method, the
number of training image pairs may be increased from M to
M.sup.2.
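Under the same assumptions as the sketches above, the bank could be applied as follows to grow M captured pairs into M.sup.2 training pairs (all names are illustrative):

    # charts1[i], charts2[i]: (24, 3) chart values of pair i per camera;
    # images1[i], images2[i]: the corresponding raw-RGB images.
    def augment(images1, images2, charts1, charts2):
        M = len(images1)
        augmented = []
        for i in range(M):
            for j in range(M):
                T1 = chart_transform(charts1[i], charts1[j])  # camera 1, i->j
                T2 = chart_transform(charts2[i], charts2[j])  # camera 2, i->j
                augmented.append((apply_color_transform(images1[i], T1),
                                  apply_color_transform(images2[i], T2)))
        return augmented  # M * M re-illuminated training pairs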
[0134] According to the data augmentation process shown in FIG. 9,
approximately all colors may be transformed since the color
rendition charts included in the images provide a wide distribution
of colors.
[0135] However, the data augmentation process is not limited to the
method of using the color rendition charts as shown in FIG. 9, and
different data augmentation methods may be applied as shown in
FIGS. 10 and 11.
[0136] FIG. 10 illustrates a data augmentation process based on a
diagonal transformation between illuminants according to an
embodiment.
[0137] Referring to FIG. 10, a source illuminant L.sub.1[r.sub.1,
g.sub.1, b.sub.1] and a target illuminant L.sub.2[r.sub.2, g.sub.2,
b.sub.2] are identified from images I.sub.1 and I.sub.2 that are
captured by the same camera (e.g., the first camera 111). A color
transformation between the source illuminant L.sub.1[r.sub.1,
g.sub.1, b.sub.1] and the target illuminant L.sub.2[r.sub.2,
g.sub.2, b.sub.2] may be obtained as follows:
$$\begin{bmatrix} r_2 / r_1 & 0 & 0 \\ 0 & g_2 / g_1 & 0 \\ 0 & 0 & b_2 / b_1 \end{bmatrix}$$
[0138] The color transformation is applied to image I.sub.1 to
change neutral color values of image I.sub.1 and thereby to obtain
image I.sub.1' which appears to be captured under the target
illuminant L.sub.2[r.sub.2, g.sub.2, b.sub.2]. Image I.sub.1' as
well as image I.sub.1 may be used to train the neural network.
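A minimal sketch of this diagonal (per-channel) re-illumination, with illustrative names:

    import numpy as np

    def diagonal_reilluminate(image, L1, L2):
        # L1 = [r1, g1, b1] and L2 = [r2, g2, b2]: source and target
        # illuminants. D scales each channel by r2/r1, g2/g1, b2/b1.
        D = np.diag(np.asarray(L2, float) / np.asarray(L1, float))
        h, w, _ = image.shape
        return (image.reshape(-1, 3) @ D).reshape(h, w, 3)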
[0139] FIG. 11 illustrates a data augmentation process based on a
full matrix transformation between images according to an
embodiment.
[0140] In an embodiment shown in FIG. 11, a color transformation
matrix T is obtained using all image colors of image I.sub.1 and
all image colors of image I.sub.2, unlike the embodiment of FIG. 9
in which the color chart values extracted from images I.sub.1 and
I.sub.2 are used to calculate the color transformation matrix
T.
[0141] According to the embodiment shown in FIG. 11, a color
rendition chart may be omitted from images I.sub.1 and I.sub.2, and
instead, images I.sub.1 and I.sub.2 may be required to capture a
scene having a wide distribution of colors. Also, the color
transformation matrix T may be computed individually for each image
pair.
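A sketch of this chart-free variant, assuming the two images are already spatially aligned and of equal size (the helper name is illustrative; fitting on down-sampled copies keeps the solve cheap):

    import numpy as np

    def full_image_transform(I1, I2):
        # Fit the 3x3 transform on all pixel colors rather than on the
        # 24 chart patches; the scene should have a wide color distribution.
        T, *_ = np.linalg.lstsq(I1.reshape(-1, 3), I2.reshape(-1, 3),
                                rcond=None)
        return T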
[0142] FIG. 12 is a diagram of a system for performing image
processing using more than two cameras according to an
embodiment.
[0143] When there are N cameras (wherein N>2), $\binom{N}{2}$
3.times.3 color transformation matrices are constructed
independently using the process described with reference to FIG. 1.
The $\binom{N}{2}$ color transformation matrices are then
concatenated and fed as input to the neural network. In particular,
the feature vector that is input to the network has a size of
$\binom{N}{2} \times 9$.
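A sketch of constructing this concatenated input (pairwise_transform is a hypothetical helper standing in for the alignment, down-sampling, and least-squares steps described above):

    import numpy as np
    from itertools import combinations

    def pairwise_transform(img_a, img_b):
        # Hypothetical stand-in: only the least-squares fit is shown here;
        # the full pipeline aligns and down-samples the images first.
        T, *_ = np.linalg.lstsq(img_a.reshape(-1, 3), img_b.reshape(-1, 3),
                                rcond=None)
        return T

    def build_feature_vector(images):
        # One flattened 3x3 transform per camera pair, concatenated into a
        # vector of length C(N, 2) * 9 (27 values for N = 3 cameras).
        feats = [pairwise_transform(images[a], images[b]).ravel()
                 for a, b in combinations(range(len(images)), 2)]
        return np.concatenate(feats)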
[0144] In detail, referring to FIG. 12, raw-RGB image 1, raw-RGB
image 2, and raw-RGB image 3 are captured by camera 1, camera 2,
and camera 3, respectively.
[0145] The raw-RGB image 1 and the raw-RGB image 2 are aligned
with each other and down-sampled for calculation of a first color
transformation between the down-sampled raw-RGB image 1 and the
aligned and down-sampled raw-RGB image 2.
[0146] The raw-RGB image 1 and the raw-RGB image 3 are aligned with
each other and down-sampled for calculation of a second color
transformation between the down-sampled raw-RGB image 1 and the
aligned and down-sampled raw-RGB image 3.
[0147] The raw-RGB image 2 and the raw-RGB image 3 are aligned with
each other and down-sampled for calculation of a third color
transformation between the down-sampled raw-RGB image 2 and the
aligned and down-sampled raw-RGB image 3.
[0148] The first color transformation, the second color
transformation, and the third color transformation are concatenated
at a concatenation layer, and then are fed as input to a neural
network for estimating the scene illumination.
[0149] Each of the first color transformation, the second color
transformation, and the third color transformation may be a
3.times.3 matrix. The neural network may have an input layer having
27 nodes for receiving 27 parameters of the concatenated matrices,
an output layer having 2 nodes for outputting a 2D chromaticity
value for correcting color values of the raw-RGB image 1, and a set
of hidden layers located between the input layer and the output
layer.
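As a minimal sketch of this three-camera network (only the 27-node input and 2-node output follow the text; the number and width of the hidden layers are illustrative assumptions):

    import torch.nn as nn

    three_camera_net = nn.Sequential(
        nn.Linear(27, 9), nn.ReLU(),  # 27 inputs: three concatenated 3x3 matrices
        nn.Linear(9, 9), nn.ReLU(),   # hidden layers (width is an assumption)
        nn.Linear(9, 2),              # 2D [R/G, B/G] chromaticity output
    )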
[0150] The foregoing disclosure provides illustration and
description, but is not intended to be exhaustive or to limit the
implementations to the precise form disclosed. Modifications and
variations are possible in light of the above disclosure or may be
acquired from practice of the implementations.
[0151] As used herein, the term "component" is intended to be
broadly construed as hardware, firmware, or a combination of
hardware and software.
[0152] It will be apparent that systems and/or methods, described
herein, may be implemented in different forms of hardware,
firmware, or a combination of hardware and software. The actual
specialized control hardware or software code used to implement
these systems and/or methods is not limiting of the
implementations. Thus, the operation and behavior of the systems
and/or methods were described herein without reference to specific
software code--it being understood that software and hardware may
be designed to implement the systems and/or methods based on the
description herein.
[0153] Even though particular combinations of features are recited
in the claims and/or disclosed in the specification, these
combinations are not intended to limit the disclosure of possible
implementations. In fact, many of these features may be combined in
ways not specifically recited in the claims and/or disclosed in the
specification. Although each dependent claim listed below may
directly depend on only one claim, the disclosure of possible
implementations includes each dependent claim in combination with
every other claim in the claim set.
[0154] No element, act, or instruction used herein should be
construed as critical or essential unless explicitly described as
such. Also, as used herein, the articles "a" and "an" are intended
to include one or more items, and may be used interchangeably with
"one or more." Furthermore, as used herein, the term "set" is
intended to include one or more items (e.g., related items,
unrelated items, a combination of related and unrelated items,
etc.), and may be used interchangeably with "one or more." Where
only one item is intended, the term "one" or similar language is
used. Also, as used herein, the terms "has," "have," "having," or
the like are intended to be open-ended terms. Further, the phrase
"based on" is intended to mean "based, at least in part, on" unless
explicitly stated otherwise.
[0155] Expressions such as "at least one of," when preceding a list
of elements, modify the entire list of elements and do not modify
the individual elements of the list. For example, the expression,
"at least one of a, b, and c," should be understood as including
only a, only b, only c, both a and b, both a and c, both b and c,
all of a, b, and c, or any variations of the aforementioned
examples.
[0156] While such terms as "first," "second," etc., may be used to
describe various elements, such elements must not be limited to the
above terms. The above terms may be used only to distinguish one
element from another.
* * * * *