U.S. patent application number 14/001139, directed to systems, methods, and media for reconstructing a space-time volume from a coded image, was published by the patent office on 2014-07-10.
This patent application is currently assigned to Sony Corporation and The Trustees of Columbia University in the City of New York. The invention is credited to Jinwei Gu, Mohit Gupta, Yasunobu Hitomi, Tomoo Mitsunaga, and Shree K. Nayar.
Application Number | 14/001139 |
Publication Number | 20140192235 |
Family ID | 46721259 |
Publication Date | 2014-07-10 |
United States Patent Application | 20140192235 |
Kind Code | A1 |
Hitomi; Yasunobu; et al. | July 10, 2014 |

SYSTEMS, METHODS, AND MEDIA FOR RECONSTRUCTING A SPACE-TIME VOLUME FROM A CODED IMAGE
Abstract
Systems, methods, and media for reconstructing a space-time
volume from a coded image are provided. In accordance with some
embodiments, systems for reconstructing a space-time volume from a
coded image are provided, the systems comprising: an image sensor
that outputs image data; and at least one processor that: causes a
projection of the space-time volume to be captured in a single
image of the image data in accordance with a coded shutter
function; receives the image data; and performs a reconstruction
process on the image data to provide a space-time volume
corresponding to the image data.
Inventors | Hitomi; Yasunobu (New York, NY); Gu; Jinwei (Rochester, NY); Gupta; Mohit (New York, NY); Mitsunaga; Tomoo (Kawasaki, JP); Nayar; Shree K. (New York, NY) |
Applicant:

Name | City | State | Country
Hitomi; Yasunobu | New York | NY | US
Gu; Jinwei | Rochester | NY | US
Gupta; Mohit | New York | NY | US
Mitsunaga; Tomoo | Kawasaki | | JP
Nayar; Shree K. | New York | NY | US
Assignee | Sony Corporation (Minato-ku, Tokyo); The Trustees of Columbia University in the City of New York (New York, NY) |
Family ID | 46721259 |
Appl. No. | 14/001139 |
Filed | February 27, 2012 |
PCT Filed | February 27, 2012 |
PCT No. | PCT/US12/26816 |
371 Date | March 24, 2014 |
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
61446970 | Feb 25, 2011 |
Current U.S. Class | 348/241 |
Current CPC Class | G06T 7/50 20170101; G06T 5/002 20130101; G06T 2200/21 20130101; H04N 13/128 20180501; H04N 5/341 20130101; G06T 1/0007 20130101; H04N 13/296 20180501; H04N 5/2353 20130101; H04N 13/106 20180501; H04N 5/3535 20130101; H04N 13/388 20180501 |
Class at Publication | 348/241 |
International Class | G06T 5/00 20060101 G06T005/00 |
Government Interests
STATEMENT REGARDING GOVERNMENT FUNDED RESEARCH
[0002] This invention was made with government support under grant
NSF IIS-09-64429 awarded by the National Science Foundation. The
government has certain rights in the invention.
Claims
1. A system for reconstructing a space-time volume from a coded
image, comprising: an image sensor that outputs image data; and at
least one processor that: causes a projection of the space-time
volume to be captured in a single image of the image data in
accordance with a coded shutter function; receives the image data;
and performs a reconstruction process on the image data to provide
a space-time volume corresponding to the image data.
2. The system of claim 1, wherein the reconstruction process is
based on an over-complete dictionary.
3. The system of claim 2, wherein the over-complete dictionary is
based on rotated video samples.
4. The system of claim 1, wherein the coded shutter function has
random start times for pixel exposure bumps.
5. The system of claim 1, wherein the coded shutter function has
only a single exposure period per pixel per sensor integration
time.
6. The system of claim 1, wherein the coded shutter function has at
least one pixel in each pixel neighborhood exposed during each
interval of a sensor integration time.
7. The system of claim 1, further comprising a Liquid Crystal on
Silicon chip that modulates light onto the image sensor according
to the coded shutter function in response to signals from the at
least one processor.
8. The system of claim 1, wherein the image sensor has pixels that
are individually addressable.
9. The system of claim 1, wherein the reconstruction process
includes performing an orthogonal matching pursuit algorithm.
10. A method for reconstructing a space-time volume from a coded
image, comprising: causing a projection of the space-time volume to
be captured by an image sensor in a single image of image data in
accordance with a coded shutter function using a hardware
processor; receiving the image data using a hardware processor; and
performing a reconstruction process on the image data to provide a
space-time volume corresponding to the image data using a hardware
processor.
11. The method of claim 10, wherein the reconstruction process is
based on an over-complete dictionary.
12. The method of claim 11, wherein the over-complete dictionary is
based on rotated video samples.
13. The method of claim 10, wherein the coded shutter function has
random start times for pixel exposure bumps.
14. The method of claim 10, wherein the coded shutter function has
only a single exposure period per pixel per sensor integration
time.
15. The method of claim 10, wherein the coded shutter function has
at least one pixel in each pixel neighborhood exposed during each
interval of a sensor integration time.
16. The method of claim 10, further comprising modulating light
onto the image sensor using a Liquid Crystal on Silicon chip
according to the coded shutter function in response to signals from
the hardware processor.
17. The method of claim 10, wherein the image sensor has pixels
that are individually addressable.
18. The method of claim 10, wherein the reconstruction process
includes performing an orthogonal matching pursuit algorithm.
19. A non-transitory computer-readable medium containing
computer-executable instructions that, when executed by a
processor, cause the processor to perform a method for
reconstructing a space-time volume from a coded image, the method
comprising: causing a projection of the space-time volume to be
captured in a single image of image data in accordance with a coded
shutter function; receiving the image data; and performing a
reconstruction process on the image data to provide a space-time
volume corresponding to the image data.
20. The non-transitory computer-readable medium of claim 19,
wherein the reconstruction process is based on an over-complete
dictionary.
21. The non-transitory computer-readable medium of claim 20,
wherein the over-complete dictionary is based on rotated video
samples.
22. The non-transitory computer-readable medium of claim 19,
wherein the coded shutter function has random start times for pixel
exposure bumps.
23. The non-transitory computer-readable medium of claim 19,
wherein the coded shutter function has only a single exposure
period per pixel per sensor integration time.
24. The non-transitory computer-readable medium of claim 19,
wherein the coded shutter function has at least one pixel in each
pixel neighborhood exposed during each interval of a sensor
integration time.
25. The non-transitory computer-readable medium of claim 19,
wherein the method further comprises modulating light onto the
image sensor according to the coded shutter function.
26. The non-transitory computer-readable medium of claim 19,
wherein the method further comprises individually addressing pixels
of an image sensor.
27. The non-transitory computer-readable medium of claim 19,
wherein the reconstruction process includes performing an
orthogonal matching pursuit algorithm.
Description
CROSS REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit of U.S. Provisional
Patent Application No. 61/446,970, filed Feb. 25, 2011, which is
hereby incorporated by reference herein in its entirety.
TECHNICAL FIELD
[0003] Systems, methods, and media for reconstructing a space-time
volume from a coded image are provided.
BACKGROUND
[0004] Cameras face a fundamental trade-off between spatial
resolution and temporal resolution. For example, many digital still
cameras can capture images with high spatial resolution, while many
high-speed video cameras suffer from low spatial resolution. This
limitation is due in many instances to hardware factors such as
readout and analog-to-digital (A/D) conversion time of image
sensors. Although it is possible to increase the readout throughput
by introducing parallel A/D converters and frame buffers, doing so
often requires more transistors per pixel, which lowers the fill
factor and increases the cost of such image sensors. As a
compromise, many current camera manufacturers implement a
"thin-out" mode, which directly trades off spatial resolution
for higher temporal resolution, thereby degrading the image
quality.
[0005] Accordingly, new mechanisms for providing improved temporal
resolution without sacrificing spatial resolution are
desirable.
SUMMARY
[0006] Systems, methods, and media for reconstructing a space-time
volume from a coded image are provided. In accordance with some
embodiments, systems for reconstructing a space-time volume from a
coded image are provided, the systems comprising: an image sensor
that outputs image data; and at least one processor that: causes a
projection of the space-time volume to be captured in a single
image of the image data in accordance with a coded shutter
function; receives the image data; and performs a reconstruction
process on the image data to provide a space-time volume
corresponding to the image data.
[0007] In accordance with some embodiments, methods for
reconstructing a space-time volume from a coded image are provided,
the methods comprising: causing a projection of the space-time
volume to be captured by an image sensor in a single image of image
data in accordance with a coded shutter function using a hardware
processor; receiving the image data using a hardware processor; and
performing a reconstruction process on the image data to provide a
space-time volume corresponding to the image data using a hardware
processor.
[0008] In accordance with some embodiments, non-transitory
computer-readable media containing computer-executable instructions
that, when executed by a processor, cause the processor to perform
a method for reconstructing a space-time volume from a coded image
are provided, the method comprising: causing a projection of the
space-time volume to be captured in a single image of image data in
accordance with a coded shutter function; receiving the image data;
and performing a reconstruction process on the image data to
provide a space-time volume corresponding to the image data.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] FIG. 1 is a diagram of a process for producing a space-time
volume video from a single coded image in accordance with some
embodiments.
[0010] FIG. 2 is a diagram of a process for generating a coded
shutter function in accordance with some embodiments.
[0011] FIG. 3 is a diagram of hardware that can be used in
accordance with some embodiments.
[0012] FIG. 4 is a diagram of an image sensor that can be used in
accordance with some embodiments.
DETAILED DESCRIPTION
[0013] Systems, methods, and media for reconstructing a space-time
volume from a coded image are provided. In some embodiments, these
systems, methods, and media can provide improved temporal
resolution without sacrificing spatial resolution in a captured
video.
[0014] In accordance with some embodiments, a video can be produced
by reconstructing a space-time volume E from a single coded image I
captured using a per-pixel coded shutter function S which defines
how pixels of a camera sensor capture the coded image I.
[0015] In terms of the space-time volume E and the coded shutter
function S, the coded image I can be defined as shown in equation
(1):
I(x, y) = Σ_{t=1}^{N} S(x, y, t) E(x, y, t),    (1)

where x and y correspond to the two spatial dimensions of an
M×M pixel neighborhood of a camera sensor, t corresponds to the N
intervals of one integration time of the camera sensor, and the
resolution of this space-time volume E is M×M×N.
Although a neighborhood of a camera sensor is described herein as
being square (M×M) for simplicity and consistency, in some
embodiments, a neighborhood need not be square and can be any
suitable shape.
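By way of a non-limiting illustration, the following Python/NumPy sketch simulates the image formation of equation (1) for a single neighborhood; the sizes M = 7 and N = 36 and the array names are illustrative assumptions, not values mandated by this disclosure.

```python
import numpy as np

M, N = 7, 36  # illustrative neighborhood size and number of time intervals

# E: space-time volume for one M x M neighborhood over N intervals
E = np.random.rand(M, M, N)

# S: binary per-pixel shutter function of the same shape
S = np.random.randint(0, 2, size=(M, M, N))

# Equation (1): each coded-image pixel sums its shutter-masked
# time samples over the integration time.
I = np.sum(S * E, axis=2)  # shape (M, M)
```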
[0016] Equation (1) can also be written in matrix form as I = SE,
where I (the observations) and E (the unknowns) are vectors with
M×M and M×M×N elements, respectively, and S is a matrix
with M×M rows and M×M×N columns. Because the
number of observations (M×M) is significantly lower than the
number of unknowns (M×M×N), this is an under-determined
system. In some embodiments, this system can be solved and the
unknown signal E can be recovered if the signal E is sparse and the
sampling satisfies the restricted isometry property:

I = SE = SDα,    (2)

where D is a basis in which E is sparse, and α is the sparse
representation of E.
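Under the same illustrative assumptions, the sketch below flattens equation (1) into the matrix form I = SE, making the under-determined nature of the system explicit (M·M equations versus M·M·N unknowns); the pixel-major vectorization shown is one reasonable choice, not the required one.

```python
import numpy as np

M, N = 7, 36
shutter = np.random.randint(0, 2, size=(M, M, N))

# Vectorize E pixel by pixel (each pixel contributes N consecutive
# entries), so S becomes an (M*M) x (M*M*N) block-diagonal matrix in
# which row p holds pixel p's binary exposure pattern.
S = np.zeros((M * M, M * M * N))
rows = shutter.reshape(M * M, N)
for p in range(M * M):
    S[p, p * N:(p + 1) * N] = rows[p]

E = np.random.rand(M * M * N)  # unknown vectorized space-time volume
I = S @ E                      # only M*M observations for M*M*N unknowns
```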
[0017] Turning to FIG. 1, an example of a process 100 for
reconstructing a space-time volume from a captured image is shown.
As illustrated, after process 100 begins at 102, the space-time
volume can be sampled into a coded image using a coded shutter
function at 104.
[0018] Any suitable coded shutter function can be used to capture
an image at 104, and the used shutter function can have any
suitable attributes. For example, in some embodiments, the shutter
function can have the attribute of being a binary shutter function
(i.e., S(x, y, t) ∈ {0, 1}) wherein, at every time
interval t, the shutter is either integrating light (on) or not
(off). As another example, in some embodiments, the shutter
function can have the attribute of having only one continuous
exposure period (or "bump") for each pixel during a camera sensor's
integration time. As yet another example, in some embodiments, the
shutter function can have the attribute of having one or more bump
lengths (i.e., durations of exposure) measured in intervals t. As
still another example, in some embodiments, the shutter function
can have the attribute of having bumps that start at periodic or
random times. As a further example, in some embodiments, the
shutter function can have the attribute of having groups of pixels
having the same start time based on location (e.g., in the same
row) in a camera sensor. As a still further example, in some
embodiments, the shutter function can have the attribute that at
least one pixel of each M×M pixel neighborhood of a camera
sensor is sampled at each interval during the camera sensor's
integration time.
[0019] In some embodiments, a coded shutter function can include a
combination of such attributes. For example, in some embodiments, a
coded shutter function can be a binary shutter function, can have
only one continuous exposure period (or "bump") for each pixel
during a camera sensor's integration time, can have only one bump
length, can have bumps that start at random times, and can have the
attribute that at least one pixel of each M×M pixel
neighborhood of a camera sensor is sampled at each interval during
the camera sensor's integration time.
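A minimal sketch of generating a shutter function with this combination of attributes is given below, assuming a simple redraw-until-covered strategy; the name make_shutter, the seed, and the sizes are illustrative assumptions, and the process of FIG. 2 (described next) presents a more complete selection procedure.

```python
import numpy as np

def make_shutter(M, N, bump_len, seed=0):
    """Binary shutter: one bump of fixed length per pixel, random start
    times, redrawn until every interval t is covered by at least one
    pixel in the M x M neighborhood."""
    rng = np.random.default_rng(seed)
    while True:
        shutter = np.zeros((M, M, N), dtype=np.uint8)
        starts = rng.integers(0, N - bump_len + 1, size=(M, M))
        for i in range(M):
            for j in range(M):
                shutter[i, j, starts[i, j]:starts[i, j] + bump_len] = 1
        # coverage check: at least one pixel exposed at every interval
        if shutter.reshape(-1, N).any(axis=0).all():
            return shutter

shutter = make_shutter(M=7, N=36, bump_len=4)
```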
[0020] A process 200 for generating such a coded shutter function
in accordance with some embodiments is illustrated in FIG. 2. This
process can be performed at any suitable point or points in time
and can be performed only once in some embodiments.
[0021] As shown, after process 200 begins at 202, the process can
set a first bump length at 204. Any suitable bump length can be set
as the first bump length. For example, in some embodiments, the
first bump length can be set to one interval t.
[0022] Next, at 206, the process can select the first camera sensor
pixel. Any suitable pixel can be selected as the first camera
sensor pixel. For example, the camera sensor pixel with the lowest
set of coordinate values can be set as the first camera sensor
pixel.
[0023] Then, at 208, process 200 can randomly select (or
pseudo-randomly select) a start time during the integration time of
the camera's sensor for the selected pixel and assign the bump
length and start time to the pixel. At 210, it can be determined if
the selected pixel is the last pixel. If not, then process 200 can
select the next pixel (using any suitable technique) at 212 and
loop back to 208.
[0024] Otherwise, process 200 can next select a first M×M
pixel neighborhood at 214. This neighborhood can be selected in any
suitable manner. For example, a first M×M pixel neighborhood
can be selected as the M×M pixel neighborhood with the lowest
set of coordinates.
[0025] At 216, the process can then determine if at least one pixel
in the selected neighborhood was sampled at each time t. This
determination can be made in any suitable manner. For example, in
some embodiments, the process can loop through each time t and
determine if a pixel in the neighborhood has a bump that occurs
during that time t. If no pixel in the neighborhood is determined
to have a bump during the time t, then the neighborhood can be
determined as not having at least one pixel being sampled at each
time t and process 200 can loop back to 206.
[0026] Otherwise, the process can determine if the current
neighborhood is the last neighborhood at 218. This determination
can be made in any suitable manner. For example, in some
embodiments, the current neighborhood can be determined as being
the last neighborhood if it has the highest coordinate pair of all
of the neighborhoods. If it is determined that the current
neighborhood is not the last neighborhood, then process 200 can
select the next neighborhood at 220 and loop back to 216.
[0027] Otherwise, at 222, process 200 can next simulate image
capture using the bump length and start time assigned to each
pixel. Image capture can be simulated in any suitable manner. For
example, in some embodiments, image capture can be simulated using
real high-speed video data. Next, at 224, reconstruction of the
M×M×N sub-volumes and averaging of the sub-volumes to
provide a single volume can be performed as described in connection
with 106 and 108 of FIG. 1 below. Then, at 226, the peak signal to
noise ratio (PSNR) for the single volume produced at 222 and 224
can be determined. This PSNR can be determined in any suitable
manner, such as by comparing the single volume to real high-speed
video used for the simulated image capture.
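The PSNR comparison at 226 might be computed as in the following sketch, assuming both volumes share a common peak value (here normalized to 1.0); the function name is an illustrative assumption.

```python
import numpy as np

def psnr(reconstructed, ground_truth, peak=1.0):
    """Peak signal-to-noise ratio, in dB, between two arrays."""
    mse = np.mean((reconstructed - ground_truth) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```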
[0028] At 228, process 200 can determine if the current bump length
is the last bump length to be checked. This can be determined in
any suitable manner. For example, when the bump length is equal to
the camera sensor's integration time, the bump length can be
determined to be the last bump length. If the bump length is
determined to not be the last bump length, then process 200 can
select the next bump length at 230 and loop back to 206. The next
bump length can be selected in any suitable manner. For example,
the next bump length can be set to be the previous bump length plus
one interval t in some embodiments.
[0029] Otherwise, the bump length and starting time assignments
with the best PSNR can be selected as the coded shutter function at
232. The best PSNR can be selected on any suitable basis. For
example, in some embodiments, the best PSNR can be selected as the
highest PSNR value determined in the presence of noise similar to
anticipated camera noise.
[0030] Finally, once the bump length and starting time assignments
with the best PSNR are selected as the coded shutter function,
process 200 can terminate at 234.
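Collecting the steps of process 200, one hypothetical realization of the bump-length search is sketched below; make_shutter, reconstruct, and the inline PSNR scoring stand in for steps 206-220, 224, and 226, respectively, and are assumptions for illustration rather than the disclosed implementation.

```python
import numpy as np

def select_shutter(video, make_shutter, reconstruct, M, N):
    """Sweep bump lengths (204, 228-230), score each candidate shutter by
    PSNR against ground-truth high-speed video (222-226), and keep the
    assignments with the best PSNR (232)."""
    best_shutter, best_psnr = None, -np.inf
    for bump_len in range(1, N + 1):
        shutter = make_shutter(M, N, bump_len)     # random starts + coverage
        coded = np.sum(shutter * video, axis=2)    # simulate image capture
        recon = reconstruct(coded, shutter)        # reconstruct sub-volumes
        mse = np.mean((recon - video) ** 2)
        score = 10.0 * np.log10(1.0 / mse)         # PSNR, unit peak assumed
        if score > best_psnr:
            best_shutter, best_psnr = shutter, score
    return best_shutter
```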
[0031] Referring back to FIG. 1, after sampling the space-time
volume into one coded image at 104, a reconstruction process can be
performed on patches of size M×M for every spatial location
in the captured image to produce volume patches of size
M×M×N at 106. This reconstruction process can be
performed in any suitable manner. For example, in some embodiments,
this reconstruction process can be performed by solving the
following sparse approximation problem to find α̂:

α̂ = argmin_α ‖α‖₀ subject to ‖SDα − I‖₂² < ε,    (3)
where:

[0032] α is a sparse representation of E;

[0033] S is a matrix of the shutter function;

[0034] D is an over-complete dictionary;

[0035] I is a vector of the captured coded image; and

[0036] ε is the error between the reconstructed
space-time volume and the ground truth. Any suitable mechanism can
be used to solve this approximation problem. For example, in
accordance with some embodiments, the orthogonal matching pursuit
(OMP) algorithm can be used to solve this approximation
problem.
[0037] Once α̂ has been found, the space-time volume can be
computed as E = Dα̂.
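A hedged sketch of this per-patch reconstruction using scikit-learn's orthogonal matching pursuit is given below; the effective sensing matrix is the product SD, and the function name, sparsity level, and argument layout are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def reconstruct_patch(coded_patch, S, D, M, N, sparsity=10):
    """Approximately solve equation (3): find a sparse alpha minimizing
    ||S D alpha - I||_2, then return E = D alpha as an M x M x N volume."""
    I = coded_patch.ravel()                # M*M observations
    A = S @ D                              # effective dictionary seen by OMP
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=sparsity,
                                    fit_intercept=False)
    omp.fit(A, I)
    alpha = omp.coef_                      # sparse representation of E
    return (D @ alpha).reshape(M, M, N)    # E = D alpha
```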
[0038] Any suitable over-complete dictionary D can be used in some
embodiments, and such a dictionary can be formed in any suitable
manner. For example, in accordance with some embodiments, an
over-complete dictionary for sparsely expressing target video
volumes can be built from a large collection of natural video data.
Such an over-complete dictionary can be trained from patches of
natural scenes in a training data set using the K-SVD algorithm, as
described in Aharon et al., "K-SVD: An Algorithm for Designing
Overcomplete Dictionaries for Sparse Representation," IEEE
Transactions on Signal Processing, vol. 54, no. 11, November 2006,
which is hereby incorporated by reference herein in its entirety.
Such training can occur any suitable number of times (such as only
once) and can occur at any suitable point(s) in time.
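For concreteness, a compact illustration of the K-SVD structure (sparse coding with OMP followed by per-atom rank-1 SVD updates) is given below; it is a simplified sketch of the cited algorithm's general form, not a tuned implementation, and the function name and parameters are assumptions.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(X, n_atoms, sparsity, n_iter=10, seed=0):
    """Learn an over-complete dictionary D from X, a (dim, n_patches)
    matrix whose columns are vectorized training patches."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        A = orthogonal_mp(D, X, n_nonzero_coefs=sparsity)  # sparse codes
        for k in range(n_atoms):
            used = np.nonzero(A[k])[0]       # patches that use atom k
            if used.size == 0:
                continue
            # residual of those patches with atom k's contribution removed
            R = X[:, used] - D @ A[:, used] + np.outer(D[:, k], A[k, used])
            U, s, Vt = np.linalg.svd(R, full_matrices=False)
            D[:, k] = U[:, 0]                # best rank-1 refit of atom k
            A[k, used] = s[0] * Vt[0]
    return D
```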
[0039] Any suitable number of videos of any suitable type can be
used to train the dictionary in some embodiments. For example, in
some embodiments, a random selection of 20 video sequences with
frame rates close to a target frame rate (e.g., 300 fps) can be
used. To add variability to the data set, spatial rotations can be
performed on the sequences, and the sequences can be used for
training in both their forward (i.e., normal playback) and backward
(i.e., reverse playback) directions, in some embodiments. Any
suitable rotations can be performed in some embodiments. For
example, in some embodiments, rotations of 0, 45, 90, 135, 180,
225, 270, and 315 degrees can be performed, as sketched below. Any
suitable number of basis elements (e.g., 5000) can be extracted
from each sequence in some embodiments. As a result, the learned
dictionary can capture various features such as shifting edges in
various orientations in some embodiments.
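One possible way to generate such an augmented training set is sketched here, assuming scipy.ndimage.rotate for the non-right-angle spatial rotations; the function name and clip layout are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import rotate

def augment(video):
    """Yield spatially rotated and time-reversed variants of an
    (H, W, T) video clip for dictionary training."""
    for angle in (0, 45, 90, 135, 180, 225, 270, 315):
        rotated = rotate(video, angle, axes=(0, 1), reshape=False, order=1)
        yield rotated               # forward (normal) playback
        yield rotated[:, :, ::-1]   # backward (reverse) playback
```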
[0040] After the reconstruction process has been performed for
all positions, the overlapping reconstructed patches can be
averaged and the full space-time volume obtained at 108, and
process 100 can terminate at 110.
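The averaging at 108 might be implemented as in the following sketch, assuming one reconstructed M×M×N patch per spatial position (stride 1) stored in a dictionary keyed by its top-left corner; these assumptions are for illustration only.

```python
import numpy as np

def average_patches(patches, H, W, M, N):
    """Accumulate overlapping M x M x N reconstructions into an
    (H, W, N) volume and divide by the per-voxel overlap count."""
    volume = np.zeros((H, W, N))
    count = np.zeros((H, W, 1))
    for (y, x), patch in patches.items():
        volume[y:y + M, x:x + M] += patch
        count[y:y + M, x:x + M] += 1
    return volume / np.maximum(count, 1)
```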
[0041] The resulting space-time volume video can then be used in
any suitable manner. For example, this video can be presented on a
display, can be stored, can be analyzed, etc.
[0042] Turning to FIG. 3, an example of hardware 300 that can be
used in some embodiments is shown. As illustrated, the hardware can include an
objective lens 302, relay lenses 306, 310, and 314, a polarizing
beam splitter 308, a Liquid Crystal on Silicon (LCoS) chip 312, an
image sensor 316, and a computer 318.
[0043] The scene can be imaged onto a virtual image plane 304 using
objective lens 302. Objective lens 302 can be any suitable lens,
such as an objective lens with a focal length equal to 25 mm, for
example. The virtual image can then be re-focused onto an image
plane of LCoS chip 312 via relay lenses 306 and 310 and polarizing
beam splitter 308. LCoS chip 312 can be any suitable LCoS chip,
such as an LCoS chip with part number SXGA-3DM from Forth Dimension
Displays Ltd. of Birmingham, UK. Relay lenses 306 and 310 can be
any suitable lenses, such as relay lenses with focal lengths equal
to 100 mm, for example. Polarizing beam splitter 308 can be any
suitable polarizing beam splitter.
[0044] The image formed on the image plane of LCoS chip 312 can be
polarized according to the shutter function and reflected back to
polarizing beam splitter 308, which can reflect the image through
relay lens 314, which can focus the image on image sensor 316. Relay
lens 314 can be any suitable relay lens, such as a relay lens with
a focal length equal to 100 mm, for example. Image sensor 316 can
be any suitable image sensor, such as a Point Grey Grasshopper
sensor from Point Grey Research Inc. of Richmond, BC, Canada.
[0045] As stated above, the virtual image can be focused on both
the image plane of the LCoS chip and the image sensor, thereby
enabling per-pixel alignment between the pixels of the LCoS chip
and the pixels of the image sensor. A trigger signal from the LCoS
chip into computer 318 can be used to temporally synchronize the
LCoS chip and the image sensor. The LCoS chip can be run at any
suitable frequency. For example, in some embodiments, the LCoS chip
can be run at 9-18 times the frame-rate of the image sensor.
[0046] As an alternative to using an LCoS chip to perform the shutter
function, the shutter function can be performed by pixel-wise
control of reset and reading of the pixels in an image sensor 416
as shown in FIG. 4 in some embodiments. As illustrated, image
sensor 416 can allow pixel-wise access by providing both row and
column select lines for the pixel array. Image sensor 416 can be a
CMOS image sensor in some embodiments. In such an embodiment
including an image sensor 416, the LCoS chip, the beam splitter,
and some of the lenses can be omitted.
[0047] Computer 318 can be used to perform functions described
above and any additional or alternative function(s). For example,
computer 318 can be used to perform the functions described above
in connection with FIGS. 1 and 2. Computer 318 can be any of a
general purpose device such as a computer or a special purpose
device such as a client, a server, etc. Any of these general or
special purpose devices can include any suitable components such as
a hardware processor (which can be a microprocessor, digital signal
processor, a controller, etc.), memory, communication interfaces,
display controllers, input devices, etc. In some embodiments,
computer 318 can be part of another device (such as a camera, a
mobile phone, computing device, a gaming device, etc.) or can be a
stand-alone device.
[0048] In some embodiments, any suitable computer readable media
can be used for storing instructions for performing the processes
described herein. For example, in some embodiments, computer
readable media can be transitory or non-transitory. For example,
non-transitory computer readable media can include media such as
magnetic media (such as hard disks, floppy disks, etc.), optical
media (such as compact discs, digital video discs, Blu-ray discs,
etc.), semiconductor media (such as flash memory, electrically
programmable read only memory (EPROM), electrically erasable
programmable read only memory (EEPROM), etc.), any suitable media
that is not fleeting or devoid of any semblance of permanence
during transmission, and/or any suitable tangible media. As another
example, transitory computer readable media can include signals on
networks, in wires, conductors, optical fibers, circuits, any
suitable media that is fleeting and devoid of any semblance of
permanence during transmission, and/or any suitable intangible
media.
[0049] Although the invention has been described and illustrated in
the foregoing illustrative embodiments, it is understood that the
present disclosure has been made only by way of example, and that
numerous changes in the details of implementation of the invention
can be made without departing from the spirit and scope of the
invention, which is only limited by the claims which follow.
Features of the disclosed embodiments can be combined and
rearranged in various ways.
* * * * *