U.S. patent application number 15/057210 was filed with the patent office on 2016-09-01 for methods and apparatus for making environmental measurements and/or using such measurements in 3d image rendering.
The applicant listed for this patent is NextVR Inc. The invention is credited to David Cole and Alan McKay Moss.
United States Patent Application 20160253839
Kind Code: A1
Cole; David; et al.
September 1, 2016
Application Number: 20160253839 (Appl. No. 15/057210)
Family ID: 56798339
Filed Date: 2016-09-01
METHODS AND APPARATUS FOR MAKING ENVIRONMENTAL MEASUREMENTS AND/OR
USING SUCH MEASUREMENTS IN 3D IMAGE RENDERING
Abstract
Methods and apparatus for making and using environmental
measurements are described. Environmental information captured
using a variety of devices is processed and combined to generate an
environmental model which is communicated to customer playback
devices. A UV map which is used for applying, e.g., wrapping,
images onto the environmental model is also provided to the
playback devices. A playback device uses the environmental model
and UV map to render images which are then displayed to a viewer as
part of providing a 3D viewing experience. In some embodiments
an updated environmental model is generated based on more recent
environmental measurements, e.g., measurements performed during the
event. The updated environmental model and/or difference information
for updating the existing model, optionally along with updated UV
map(s), is communicated to the playback devices for use in
rendering and playback of subsequently received image content. By
communicating updated environmental information, improved 3D
simulations are achieved.
Inventors: Cole; David (Laguna Beach, CA); Moss; Alan McKay (Laguna Beach, CA)
Applicant: NextVR Inc., Laguna Beach, CA, US
Family ID: 56798339
Appl. No.: 15/057210
Filed: March 1, 2016
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
62127215 | Mar 2, 2015 |
62126709 | Mar 1, 2015 |
62126701 | Mar 1, 2015 |
Current U.S. Class: 345/420
Current CPC Class: G09G 5/001 20130101; H04N 19/44 20141101; H04N 21/4345 20130101; H04N 13/189 20180501; G06T 17/20 20130101; H04N 13/344 20180501; G09G 3/003 20130101; H04N 13/111 20180501; G06F 3/011 20130101; G06F 3/04815 20130101; G06T 2207/30244 20130101; G09G 2360/02 20130101; G09G 2370/02 20130101; G06T 17/05 20130101; H04N 13/161 20180501; H04N 13/139 20180501; H04N 13/239 20180501; H04N 13/194 20180501; G09G 2360/125 20130101; H04N 13/204 20180501; H04N 13/271 20180501; H04N 13/232 20180501; G06T 19/20 20130101; G09G 2352/00 20130101; G06F 3/012 20130101; H04N 13/282 20180501; H04N 13/279 20180501; G06K 9/00362 20130101; G09G 5/006 20130101; G09G 5/003 20130101
International Class: G06T 17/20 20060101 G06T017/20; G06T 19/20 20060101 G06T019/20; G06T 17/05 20060101 G06T017/05
Claims
1. A method of operating a playback device, the method comprising:
receiving information communicating a first mesh model of a 3D
environment generated based on measurements of a portion of said
environment made using a light field camera at a first time;
receiving image content; and rendering, using said first mesh model,
at least some of the received image content.
2. The method of claim 1, further comprising: receiving updated
mesh model information, said updated mesh model information
including at least some updated mesh model information generated
based on measurements of said portion of said environment using
said light field camera at a second time.
3. The method of claim 2, further comprising: receiving additional
image content; and rendering, using said updated mesh model
information, at least some of the received additional image
content.
4. The method of claim 3, wherein said information communicating a
first mesh model of the 3D environment includes information
defining a complete mesh model.
5. The method of claim 4, wherein said updated mesh model
information communicates a complete updated mesh model.
6. The method of claim 5, wherein said updated mesh model
information provides new mesh information for portions of said 3D
environment which have changed between said first and second time
periods.
7. The method of claim 6, wherein said updated mesh model
information is difference information indicating a difference
between said first mesh model and an updated mesh model.
8. The method of claim 7, wherein said first mesh model information
includes a first set of coordinate triples, each coordinate triple
indicating a coordinate in X, Y, Z space of a node in the first
mesh model.
9. The method of claim 8, wherein said updated mesh model
information includes at least one of: i) new sets of mesh
coordinates for at least some nodes in said first mesh model
information, said new coordinates being intended to replace
coordinates of corresponding nodes in said first mesh model; or ii)
a new set of coordinate triples to be used for at least a portion
of said first mesh model in place of a previous set of coordinate
triples, said new set of coordinate triples including the same or a
different number of coordinate triples than the previous set of
coordinate triples to be replaced.
10. The method of claim 9, further comprising: receiving a first
map mapping a 2D image space to said first mesh model; and wherein
rendering, using said first mesh model, at least some of the
received image content, includes using said first map to determine
how to wrap an image included in said received image content onto
said first mesh model.
11. The method of claim 10, further comprising: receiving updated
map information corresponding to said updated mesh model
information; and wherein rendering, using said updated mesh model
information, at least some of the received additional image content,
includes using said updated map information to determine how to
wrap an additional image included in said received additional image
content onto said updated mesh model.
12. The method of claim 11, wherein the updated map information
includes map difference information, the method further comprising:
generating an updated map by applying said map difference
information to said first map to generate an updated map; and
wherein rendering, using said updated mesh model information, at
least some of the received additional image content, includes using
said updated map to determine how to wrap an additional image
included in said received additional image content onto said
updated mesh model.
13. A computer readable medium including computer executable
instructions which, when executed by a computer, control the
computer to: receive information communicating a first mesh model
of a 3D environment generated based on measurements of a portion of
said environment made using a light field camera at a first time;
receive image content; and render, using said first mesh model, at
least some of the received image content.
14. A playback apparatus, comprising: a processor configured to
control said playback apparatus to: receive information
communicating a first mesh model of a 3D environment generated
based on measurements of a portion of said environment made using a
light field camera at a first time; receive image content; and
render, using said first mesh model, at least some of the received
image content.
15. The playback apparatus of claim 14, wherein the processor is
further configured to control the playback apparatus to: receive
updated mesh model information, said updated mesh model information
including at least some updated mesh model information generated
based on measurements of the portion of said environment using said
light field camera at a second time.
16. The playback apparatus of claim 15, wherein the processor is
further configured to control the playback apparatus to: receive
additional image content; and render, using said updated mesh model
information, at least some of the received additional image
content.
17. The playback apparatus of claim 14, wherein the processor is
further configured to control the playback apparatus to: receive a
first map mapping a 2D image space to said first mesh model; and
use said first map to determine how to wrap an image included in
said received image content onto said first mesh model as part of
being configured to render, using said first mesh model, at least
some of the received image content.
18. The playback apparatus of claim 17, wherein the processor is
further configured to control the playback apparatus to: receive
updated map information corresponding to said updated mesh model
information; and use said updated map information to determine how
to wrap an additional image included in said received additional
image content onto said updated mesh model as part of being
configured to render, using said updated mesh model information, at
least some of the received additional image content.
19. The playback apparatus of claim 18, wherein the updated map
information includes map difference information; and wherein the
processor is further configured to control the playback apparatus
to: generate an updated map by applying said map difference
information to said first map to generate an updated map; and use
said updated map to determine how to wrap an additional image
included in said received additional image content onto said
updated mesh model as part of rendering, using said updated mesh
model information, at least some of the received additional image
content.
20. The playback apparatus of claim 16, wherein said information
communicating a first mesh model of the 3D environment includes
information defining a complete mesh model.
Description
RELATED APPLICATIONS
[0001] The present application claims the benefit of U.S.
Provisional Application Ser. No. 62/126,701 filed Mar. 1, 2015,
U.S. Provisional Application Ser. No. 62/126,709 filed Mar. 1,
2015, and U.S. Provisional Application Ser. No. 62/127,215 filed
Mar. 2, 2015, each of which is hereby expressly incorporated by
reference in its entirety.
FIELD
[0002] The present invention relates to methods and apparatus for
capturing and using environmental information, e.g., measurements
and images, to support various applications including the
generation and/or display of stereoscopic images which can be used
as part of providing a 3D viewing experience.
BACKGROUND
[0003] Accurate representation of a 3D environment often requires
reliable models of the environment. Such models, when available,
can be used during image playback so that objects captured in
images of a scene appear to the viewer to be the correct size.
Environmental maps can also be used in stitching together different
pieces of an image and to facilitate alignment of images captured
by different cameras.
[0004] While environment maps, when available, can facilitate
much more realistic stereoscopic displays than when a simple
spherical model of an environment is assumed, there are numerous
difficulties associated with obtaining accurate environmental
information during an event which may be filmed for later
stereoscopic playback. For example, while LIDAR may be used to make
environmental measurements of distances relative to a camera position
prior to deployment of a stereoscopic camera to capture an event,
the laser(s) used for LIDAR measurements may be a distraction or
unsuitable for use during an actual event while people are trying to
view a concert, game or other activity. In addition, the placement
of the camera rig used to capture an event may preclude a LIDAR
device being placed at the same location during the event.
[0005] Thus it should be appreciated that while LIDAR may be used
to make accurate measurements of a stadium or other event location
prior to an event, because of the use of laser light as well as the
time associated with making LIDAR measurements of an area, LIDAR is
not well suited for making measurements of an environment from a
camera position during an ongoing event which is to be captured by
one or more cameras placed and operated from that camera position.
[0006] While LIDAR can be used to make highly accurate distance
measurements, for the above discussed reasons it is normally used
when a stadium or other event area does not have an ongoing event.
As a result, the LIDAR distance measurements normally reflect an
empty stadium or event area without people present. In addition,
since the LIDAR measurements are normally made before any
modifications or display setups for a particular event, the static
environmental map provided by a LIDAR or other measurement system,
while in many cases highly accurate with regard to the environment
at the time of measurement, often does not accurately reflect the
state and shape of the environment during an event such as a sports
game, concert or fashion show.
[0007] In view of the above discussion it should be appreciated
that there is a need for new and improved methods of making
environmental measurements and, in particular, measuring the shape
of an environment during an event and using the environmental
information in simulating the 3D environment. While not necessary
for all embodiments, it would be desirable if an environment could
be accurately measured during an event with regard to a camera
position from which stereoscopic or other images are captured for
later playback as part of simulating the 3D environment of the
event.
SUMMARY
[0008] Methods and apparatus for making and using environmental
measurements are described. Environmental information captured
using a variety of devices is processed and combined. In some
embodiments different devices are used to capture environmental
information at different times, rates and/or resolutions. At least
some of the environmental information used to map the environment
is captured during an event. Such information is combined, in some
but not necessarily all embodiments, with environmental information
that was captured prior to the event. Depending on the
embodiment, a single environmental measurement technique may be
used, but in many embodiments multiple environmental measurement
techniques are used, with the environmental information, e.g., depth
information relative to a camera position, being combined to
generate a more reliable and timely environmental map than might be
possible if a single source of environmental information were used
to generate a depth map.
[0009] In various embodiments environmental information is obtained
from one or more sources. In some embodiments, a static
environmental map or model, such as one produced from LIDAR
measurements before an event, is used. LIDAR is a detection system
that works on the principle of radar, but uses light from a laser
for distance measurement. A static map of an environment relative to
a camera position is generated from LIDAR measurements made from the
location to be used as a camera position during the actual event, or
from a model of the environment made from another location combined
with information about the location of the camera position. The static
map provides accurate distance information for the environment in
many cases, assuming the environment is unoccupied or has not
otherwise changed from the time the measurements used to make the
static map were made. Since the static map normally corresponds to
an empty environment, the distances indicated in the static depth
map are often maximum distances, since objects such as persons,
signs, props, etc., are often added to an environment for an event
and it is rare that a structure shown in the static map is removed
for an event. Thus, the static map can be, and sometimes is, used to
provide maximum distance information and to provide information on
the overall scale/size of the environment.
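The maximum-distance role of the static map described above can be sketched in code. This is a minimal illustration only; the function name, the NaN convention for missing live measurements, and the sample values are assumptions for illustration, not part of the application.

```python
import numpy as np

def apply_static_max(static_depths, live_depths):
    """Combine a static (pre-event) depth map with in-event measurements.

    static_depths: per-point maximum distances from the empty-venue scan.
    live_depths:   same-shaped array of in-event estimates; NaN where no
                   live measurement is available.
    """
    # Fall back to the static maximum where no live estimate exists.
    combined = np.where(np.isnan(live_depths), static_depths, live_depths)
    # A live estimate should never exceed the empty-venue maximum, since
    # objects added for an event only bring surfaces closer to the camera.
    return np.minimum(combined, static_depths)

static = np.array([10.0, 10.0, 12.0])       # metres, hypothetical
live = np.array([np.nan, 7.5, 13.0])        # 13.0 is noise beyond the wall
combined = apply_static_max(static, live)   # 10.0, 7.5, 12.0
```

The clamp in the last step encodes the observation that structures shown in the static map are rarely removed for an event, so the static distances act as an upper bound.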
[0010] In addition to static model information, in some embodiments
environmental measurements are made using information captured
during an event. The capture of the environmental information
during the event involves, in some embodiments, the use of one or
more light field cameras which capture images from which depth
information can be obtained using known techniques. In some
embodiments, light field cameras which provide both images and
depth maps generated from the images captured by the light field
camera are used. The cameras may be, and sometimes are, mounted on
or incorporated into a camera rig which also includes one or more
pairs of stereoscopic cameras. Methods for generating depth
information from light field cameras are used in some embodiments.
For example, image data corresponding to an area or a point in the
environment captured by sensor portions corresponding to different
lenses of the light field camera's micro lens array can be processed
to provide information on the distance to that point or area.
[0011] The light field camera has the advantage of being able to
passively collect images during an event which can be used to
provide distance information. A drawback of the use of a light
field camera is that it normally has lower resolution than that of
a regular camera due to the use of the lens array over the sensor
which effectively lowers the resolution of the individual captured
images.
[0012] In addition to the images of the light field camera or
cameras, the images captured by other cameras including, e.g.,
stereoscopic camera pairs, can be processed and used to provide
depth information. This is possible since the cameras of a
stereoscopic pair are spaced apart by a known distance, and this
information, along with the captured images, can be, and in some
embodiments is, used to determine the distance from the camera to a
point in the environment captured by the cameras in the
stereoscopic camera pair. The depth information, in terms of the
number of environmental points or locations for which depth can be
estimated, may be as high or almost as high as the number of pixels
of the image captured by the individual cameras of the stereoscopic
pairs, since these cameras do not use a micro lens array over the
sensor of the camera.
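The known-baseline geometry described above is standard stereo triangulation: for a rectified pair, depth Z = f * B / d, where f is the focal length in pixels, B the inter-camera baseline, and d the disparity. A minimal sketch; the focal length and disparity values below are hypothetical, chosen only to illustrate the relation.

```python
def stereo_depth(focal_length_px, baseline_m, disparity_px):
    """Triangulate distance to a point seen by both cameras of a
    rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical values: 1400 px focal length, 63 mm baseline, 20 px disparity.
z = stereo_depth(1400.0, 0.063, 20.0)  # about 4.41 metres
```

Because disparity is measured per pixel, the number of depth estimates can approach the pixel count of the captured images, as the paragraph above notes.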
[0013] While the output of the stereoscopic cameras can be, and in
some embodiments is, processed to generate depth information, that
depth information may in many cases be less reliable than the depth
information obtained from the output of the light field cameras.
[0014] In some embodiments, the static model of the environment
provides maximum distance information, while the depth information
from the light field cameras provides more up-to-date depth
information which normally indicates depths equal to or less than
those indicated by the static model but which is more timely and
may vary during an event as environmental conditions change.
Similarly, the depth information from the images captured by the
stereo camera pair or pairs tends to be timely and available from
images captured during an event.
[0015] In various embodiments the depth information from the
different sources, e.g., static model which may be based on LIDAR
measurements prior to an event, depth information from the one or
more light field cameras and depth information generated from the
stereoscopic images are combined, e.g., reconciled. The
reconciliation process may involve a variety of techniques or
information weighting operations taking into consideration the
advantages of different depth information sources and the
availability of such information. For example, in one exemplary
reconciliation process, LIDAR based depth information obtained from
measurements of the environment prior to an event is used to
determine maximum depths, e.g., distances, from a camera position
and is used, in the absence of additional depth information, to
model the environment. When depth information is available from a
light field camera or array of light field cameras, that depth
information is used to refine the environmental depth map so that
it can reflect changes in the environment during an ongoing event.
In some embodiments reconciling depth map information obtained from
images captured by a light field camera includes refining the LIDAR
based depth map to include shorter depths reflecting the presence
of objects in the environment during an event. In some cases
reconciling an environmental depth map that is based on light field
depth measurements alone, or in combination with information from a
static or LIDAR depth map, includes using depth information to
further clarify the change in depths between points where the depth
information is known from the output of the light field camera. In
this way, the greater number of points of information available
from the light field and/or stereoscopic images can be used to
refine the depth map based on the output of the light field camera
or camera array.
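One way the reconciliation and weighting described above might look in code. The weighting scheme, the function name, and the treatment of the light field source as most reliable are illustrative assumptions in the spirit of the paragraph, not the application's actual algorithm.

```python
import numpy as np

def reconcile_depths(static_map, lf_depths, stereo_depths, lf_weight=0.8):
    """Reconcile depth estimates from three sources into one map.

    static_map:    dense pre-event (e.g. LIDAR-based) maxima, always present.
    lf_depths:     light field estimates, NaN where unavailable; treated
                   here as the most reliable in-event source.
    stereo_depths: dense but noisier stereo-pair estimates, NaN allowed.
    """
    # Start from the stereo estimate, falling back to the static maximum.
    depth = np.where(np.isnan(stereo_depths), static_map, stereo_depths)
    # Where a light field measurement exists, blend it in with high weight.
    have_lf = ~np.isnan(lf_depths)
    depth[have_lf] = (lf_weight * lf_depths[have_lf]
                      + (1.0 - lf_weight) * depth[have_lf])
    # No reconciled depth may exceed the empty-venue maximum.
    return np.minimum(depth, static_map)
```

For example, with a static maximum of 10 m, a stereo estimate of 9 m, and a light field estimate of 8 m at one point, the default weighting yields 0.8 * 8 + 0.2 * 9 = 8.2 m for that point.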
[0016] Based on the depth information and/or depth map, a 3D model
of the environment, sometimes referred to as the environmental mesh
model, is generated in some embodiments. The 3D environmental model
may be in the form of a grid map of the environment onto which
images can be applied. In some embodiments the environmental model
is generated based on environmental measurements, e.g., depth
measurements, of the environment of interest performed using a
light field camera, e.g., with the images captured by the light
field cameras being used to obtain depth information. In some
embodiments an environmental model is generated based on measurements
of at least a portion of the environment made using a light field
camera at a first time, e.g., prior to and/or at the start of an
event. The environmental model is communicated to one or more
customer devices, e.g., rendering and playback devices, for use in
rendering and playback of image content. In some embodiments a UV
map which is used to apply, e.g., wrap, images onto the 3D
environmental model is also provided to the customer devices.
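As one illustration of how a depth map might be turned into mesh node coordinate triples, in the X, Y, Z node representation recited in the claims, consider this hypothetical sketch. The mapping of grid indices to viewing angles and the field-of-view values are assumptions for illustration.

```python
import math

def depth_grid_to_nodes(depths, h_fov=math.pi, v_fov=math.pi / 2):
    """Convert a rows x cols grid of depths (distances from the camera
    position, indexed by viewing direction) into mesh node coordinate
    triples (x, y, z)."""
    rows, cols = len(depths), len(depths[0])
    nodes = []
    for r in range(rows):
        for c in range(cols):
            # Map grid indices to azimuth/elevation viewing angles.
            az = -h_fov / 2 + h_fov * c / (cols - 1)
            el = -v_fov / 2 + v_fov * r / (rows - 1)
            d = depths[r][c]
            # Spherical-to-Cartesian conversion at the measured distance.
            x = d * math.cos(el) * math.sin(az)
            y = d * math.sin(el)
            z = d * math.cos(el) * math.cos(az)
            nodes.append((x, y, z))
    return nodes
```

Adjacent nodes in the grid would then be connected into the triangles of the mesh model; each node's distance from the origin equals the measured depth along its viewing direction.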
[0017] The application of images to such a map is sometimes called
wrapping since the application has the effect of applying the
image, e.g., a 2D image, as if it were being wrapped onto the 3D
environmental model. The customer playback devices use the
environmental model and UV map to render image content which is
then displayed to a viewer as part of providing the viewer a 3D
viewing experience.
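The UV lookup underlying such wrapping can be sketched as a per-node texel fetch. This minimal version is illustrative only: real renderers interpolate UV coordinates across triangles rather than sampling one texel per node, and the function name is a hypothetical.

```python
def sample_texture(uv_map, frame):
    """Look up, for each mesh node, the texel of a 2D frame that the UV
    map assigns to it.  uv_map is a list of (u, v) pairs in [0, 1], one
    per node; frame is a rows x cols grid of pixel values.
    """
    rows, cols = len(frame), len(frame[0])
    colors = []
    for u, v in uv_map:
        # Convert normalised UV coordinates to integer pixel indices,
        # clamping to the frame edge at u == 1 or v == 1.
        px = min(int(u * cols), cols - 1)
        py = min(int(v * rows), rows - 1)
        colors.append(frame[py][px])
    return colors
```

Because the UV map is fixed relative to the mesh, the playback device can reuse it for every received frame until updated map information arrives.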
[0018] Since the environment is dynamic and changes may occur while
the event is ongoing as discussed above, in some embodiments
updated environmental information is generated to accurately model
the environmental changes during the event and provided to the
customer devices. In some embodiments the updated environmental
information is generated based on measurements of the portion of
the environment made using the light field camera at a second time,
e.g., after the first time period and during the event. In some
embodiments the updated model information communicates a complete
updated mesh model. In some embodiments the updated mesh model
information includes information indicating changes to be made to
the original environmental model to generate an updated model with
the updated environmental model information providing new
information for portions of the 3D environment which have changed
between the first and second time periods.
[0019] The updated environmental model and/or difference
information for updating the existing model, optionally along with
updated UV map(s), is communicated to the playback devices for use
in rendering and playback of subsequently received image content.
By communicating updated environmental information, improved 3D
simulations are achieved.
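The difference-information form of the update described above amounts to replacing the coordinate triples of the nodes that changed between the first and second measurement times. A hypothetical sketch; the index-to-triple mapping used to encode the difference information is an assumption for illustration.

```python
def apply_mesh_update(nodes, difference_info):
    """Apply updated mesh model information to an existing model.

    nodes:           list of (x, y, z) coordinate triples (first model).
    difference_info: mapping from node index to a replacement (x, y, z)
                     triple, covering only nodes that changed between
                     the first and second measurement times.
    """
    updated = list(nodes)  # leave the received first model intact
    for index, triple in difference_info.items():
        updated[index] = triple
    return updated

first_model = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0)]
# Only node 1 moved closer to the camera during the event.
updated_model = apply_mesh_update(first_model, {1: (1.0, 0.0, 4.2)})
```

Transmitting only the changed triples keeps the update small relative to resending a complete mesh model, which is the motivation for the difference-information option.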
[0020] By using the depth map generation techniques described
herein, relatively accurate depth maps of a dynamic environment
such as an ongoing concert, sporting event, play, etc. in which
items in the environment may move or be changed during the event
can be generated. By communicating the updated depth information,
e.g., in the form of a 3D model of the environment or updates to an
environmental model, improved 3D simulations can be achieved which
can in turn be used for enhanced 3D playback and/or viewing
experience. The improvements in 3D environmental simulation can be
achieved over systems which use static depth maps, since the
environmental model onto which images of the environment to be
simulated are applied will more accurately reflect the actual
environment than a static environmental model would.
[0021] It should be appreciated that as changes to the environment
in which images are captured by the stereoscopic and/or other
camera occur, such changes can be readily and timely reflected in
the model of the environment used by a playback device to display
the captured images.
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] FIG. 1 illustrates a camera rig implemented in accordance
with one embodiment, along with a calibration target which may be
used for calibrating the camera rig.
[0023] FIG. 2 illustrates the camera rig with three pairs of
cameras, e.g., 3 pairs of cameras capturing stereoscopic image
data, mounted in the camera rig.
[0024] FIG. 3 illustrates an exemplary camera rig with an exemplary
protective cover implemented in accordance with some exemplary
embodiments.
[0025] FIG. 4 illustrates another exemplary camera rig implemented
in accordance with an exemplary embodiment with various elements of
the camera rig being shown for clarity in partially disassembled
form.
[0026] FIG. 5 shows the camera rig of FIG. 4 with the cameras
mounted thereon along with an audio capture device including ear
shaped devices including microphones used for capturing stereo
audio.
[0027] FIGS. 6-8 illustrate various views of an exemplary camera
rig implemented in accordance with some exemplary embodiments.
[0028] FIG. 9 illustrates yet another exemplary camera rig
implemented in accordance with some exemplary embodiments.
[0029] FIG. 10 illustrates a front view of an exemplary arrangement
of an array of cameras that can be used in the exemplary camera
rigs of the present invention such as camera rigs shown in FIGS.
1-9, in accordance with some embodiments.
[0030] FIG. 11 illustrates a front view of yet another exemplary
arrangement of an array of cameras that can be used in any of the
camera rigs of the present invention.
[0031] FIG. 12 illustrates an exemplary system implemented in
accordance with some embodiments of the invention.
[0032] FIG. 13A is a first part of FIG. 13 which illustrates a
flowchart of an exemplary method of operating an imaging system in
accordance with some embodiments.
[0033] FIG. 13B is a second part of FIG. 13 which illustrates a
flowchart of an exemplary method of operating the imaging
system.
[0034] FIG. 14A is a first part of FIG. 14 which illustrates a
flowchart of an exemplary method of generating and updating 3D mesh
models and UV maps in accordance with an exemplary embodiment that
is well suited for use with the method shown in FIGS. 13A and
13B.
[0035] FIG. 14B is a second part of FIG. 14 which illustrates a
flowchart of generating and updating 3D mesh models and UV maps in
accordance with an exemplary embodiment.
[0036] FIG. 15 illustrates an exemplary light field camera which
can be used in the camera rig shown in FIGS. 1-9.
[0037] FIG. 16 illustrates an exemplary processing system
implemented in accordance with an exemplary embodiment.
[0038] FIG. 17 illustrates a flowchart of an exemplary method of
operating an exemplary rendering and playback device in accordance
with an exemplary embodiment.
[0039] FIG. 18 illustrates an exemplary rendering and playback
device implemented in accordance with an exemplary embodiment.
[0040] FIG. 19 illustrates an exemplary 3D environmental mesh model
that may be used in various embodiments with a plurality of nodes
illustrated as the point of intersection of lines used to divide
the 3D model into segments.
[0041] FIG. 20 illustrates an exemplary UV map that can be used for
mapping portions of a 2D frame, providing a texture, to the mesh
model of FIG. 19.
DETAILED DESCRIPTION
[0042] Various features relate to the field of panoramic
stereoscopic imagery and more particularly, to an apparatus
suitable for capturing high-definition, high dynamic range, high
frame rate stereoscopic, 360-degree panoramic video using a minimal
number of cameras in an apparatus of small size and at reasonable
cost while satisfying weight, and power requirements for a wide
range of applications.
[0043] Stereoscopic, 360-degree panoramic video content is
increasingly in demand for use in virtual reality displays. In
order to produce stereoscopic, 360-degree panoramic video content
with 4K or greater resolution, which is important for final
image clarity, high dynamic range, which is important for recording
low-light content, and high frame rates, which are important for
recording detail in fast moving content (such as sports), an array
of professional grade, large-sensor, cinematic cameras or other
cameras of suitable quality is often needed.
[0044] In order for the camera array to be useful for capturing
360-degree, stereoscopic content for viewing in a stereoscopic
virtual reality display, the camera array should acquire the
content such that the results approximate what the viewer would
have seen if his head were co-located with the camera.
Specifically, the pairs of stereoscopic cameras should be
configured such that their inter-axial separation is within an
acceptable delta from the accepted human-model average of 63 mm.
Additionally, the distance from the panoramic array's center point
to the entrance pupil of a camera lens (aka nodal offset) should be
configured such that it is within an acceptable delta from the
accepted human-model average of 101 mm.
[0045] In order for the camera array to be used to capture events
and spectator sports, where it should be compact and non-obtrusive,
it should be constructed with a relatively small physical footprint,
allowing it to be deployed in a wide variety of locations and
shipped in a reasonably sized container when shipping is
required.
[0046] The camera array should also be designed such that the
minimum imaging distance of the array is small, e.g., as small
as possible, which minimizes the "dead zone" where scene elements
are not captured because they fall outside of the field of view of
adjacent cameras.
[0047] It would be advantageous if the camera array could be
calibrated for optical alignment by positioning calibration targets
where the highest optical distortion is prone to occur (where lens
angles of view intersect and the maximum distortion of the lenses
occurs). To facilitate the most efficacious calibration target
positioning, target locations should be, and in some embodiments are,
determined formulaically from the rig design.
[0048] FIG. 1 shows an exemplary camera configuration 100 used in
some embodiments. The support structure shown in FIGS. 4 and 5 is
not shown in FIG. 1 to allow for better appreciation of the camera
pair arrangement used in some embodiments.
[0049] While in some embodiments three camera pairs are used, such
as in the FIG. 1 example, in some but not all embodiments a camera
array, e.g., the camera positions of the rig, is populated with
only 2 of the 6 total cameras which may be used to support
simultaneous 360-degree stereoscopic video. When the camera rig or
assembly is configured with fewer than all 6 cameras which can be
mounted in the rig, the rig is still capable of capturing the
high-value, foreground 180-degree scene elements in real time while
static images of the lower-value, background 180-degree scene
elements are captured manually, e.g., by rotating the rig when the
foreground images are not being captured. For example, in some
embodiments when a 2-camera array is used to capture a football
game with the field of play at the 0-degree position relative to
the cameras, the array is manually rotated around the nodal point
into the 120-degree and 240-degree positions. This allows the
action on the field of a sports game or match, e.g., foreground, to
be captured in real time and the sidelines and bleachers, e.g.,
background areas, to be captured as stereoscopic static images to
be used to generate a hybridized panorama including real time
stereo video for the front portion and static images for the left
and right rear portions. In this manner, the rig can be used to
capture a 360 degree view, with some portions of the 360 degree
view being captured at different points in time, the camera rig
being rotated around its nodal axis, e.g., vertical center point,
between the points in time at which the different views of the 360
degree scene area are captured. Alternatively, single cameras may be
mounted in the second and third camera pair mounting positions and
mono (non-stereoscopic) image content captured for those areas.
[0050] In other cases where camera cost is not an issue, more than
two cameras can be mounted at each position in the rig, with the rig
holding up to 6 cameras as in the FIG. 1 example. In this manner,
cost effective camera deployment can be achieved depending on the
performance to be captured and the user's need or ability to
transport a large number of cameras, e.g., 6 cameras, or fewer
than 6 cameras, e.g., 2 cameras. In some
embodiments an environmental depth map is generated from the images
captured by the cameras in the camera rig 100.
[0051] FIG. 1 depicts a six (6) camera assembly 100 also sometimes
referred to as a rig or camera array, along with a calibration
target 115. The camera rig 100 illustrated in FIG. 1 includes a
support structure (shown in FIGS. 4 and 5) which holds the cameras
in the indicated positions, 3 pairs 102, 104, 106 of stereoscopic
cameras (101, 103), (105, 107), (109, 111) for a total of 6
cameras. The support structure includes a base 720 also referred to
herein as a mounting plate (see element 720 shown in FIG. 4) which
supports the cameras and to which plates on which the cameras are
mounted can be secured. The support structure may be made of
plastic, metal or a composite material such as graphite or
fiberglass, and is represented by the lines forming the triangle,
which are also used to show the spacing and relationship between the
cameras. The center point at which the dotted lines intersect
represents the center nodal point around which the camera pairs
102, 104, 106 can be rotated in some but not necessarily all
embodiments. The center nodal point corresponds in some embodiments
to a steel rod or threaded center mount, e.g., of a tripod base,
around which a camera support frame, represented by the triangular
lines, can be rotated. The support frame may be a plastic housing in
which the cameras are mounted or a tripod structure as shown in
FIGS. 4 and 5.
[0052] In FIG. 1, each pair of cameras 102, 104, 106 corresponds to
a different camera pair position. The first camera pair 102
corresponds to a 0 degree forward or front facing position and is
normally meant to cover the foreground where the main action
occurs. This position normally corresponds to the main area of
interest, e.g., a field upon which a sports game is being played, a
stage, or some other area where the main action/performance is
likely to occur. The second camera pair 104 corresponds to a 120
degree camera position (approximately 120 degrees from the front
facing position) and is used to capture a right rear viewing area.
The third camera pair 106 corresponds to a 240 degree viewing
position (approximately 240 degrees from the front facing position)
and a left rear viewing area. Note that the three camera positions
are 120 degrees apart.
[0053] Each camera viewing position includes one camera pair in the
FIG. 1 embodiment, with each camera pair including a left camera
and a right camera which are used to capture images. The left
camera captures what are sometimes referred to as left eye images
and the right camera captures what are sometimes referred to as
right eye images. The images may be part of a video sequence or
still images captured at one or more times. Normally at least the front
camera position corresponding to camera pair 102 will be populated
with high quality video cameras. The other camera positions may be
populated with high quality video cameras, lower quality video
cameras or a single camera used to capture still or mono images. In
some embodiments the second and third camera positions are left
unpopulated and the support plate on which the cameras are mounted
is rotated, allowing the first camera pair 102 to capture images
corresponding to all three camera positions but at different times.
In some such embodiments left and right rear images are captured
and stored and then video of the forward camera position is
captured during an event. The captured images may be encoded and
streamed in real time, e.g. while an event is still ongoing, to one
or more playback devices.
[0054] The first camera pair 102 shown in FIG. 1 includes a left
camera 101 and a right camera 103. The left camera has a first lens
assembly 120 secured to the first camera and the right camera 103
has a second lens assembly secured to the right camera 103. The
lens assemblies 120, 120' include lenses which allow for a wide
angle field of view to be captured. In some embodiments each lens
assembly 120, 120' includes a fish eye lens. Thus each of the
cameras 101, 103 can capture a 180 degree field of view or
approximately 180 degrees. In some embodiments less than 180
degrees is captured but there is still at least some overlap in the
images captured from adjacent camera pairs in some embodiments. In
the FIG. 1 embodiment a camera pair is located at each of the first
(0 degree), second (120 degree), and third (240 degree) camera
mounting positions, with each pair capturing 120 degrees or
more of the environment but in many cases with each camera pair
capturing 180 degrees or approximately 180 degrees of the
environment.
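The overlap between adjacent camera pairs follows from simple arithmetic: with mounting positions 120 degrees apart, lenses capturing roughly 180 degrees each leave about 60 degrees of overlap between adjacent pairs, while lenses capturing exactly 120 degrees leave none. A minimal sketch (the function name is illustrative only):

```python
def adjacent_overlap_deg(fov_deg, separation_deg):
    """Angular overlap between the fields of view of cameras at two
    adjacent mounting positions (0 if the fields do not overlap)."""
    return max(0.0, fov_deg - separation_deg)

# Three mounting positions 120 degrees apart:
print(adjacent_overlap_deg(180.0, 120.0))  # 180-degree lenses -> 60.0
print(adjacent_overlap_deg(120.0, 120.0))  # exactly 120 degrees -> 0.0
```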
[0055] Second and third camera pairs 104, 106 are the same or
similar to the first camera pair 102 but located at 120 and 240
degree camera mounting positions with respect to the front 0 degree
position. The second camera pair 104 includes a left camera 105 and
left lens assembly 122 and a right camera 107 and right camera lens
assembly 122'. The third camera pair 106 includes a left camera 109
and left lens assembly 124 and a right camera 111 and right camera
lens assembly 124'.
[0056] In FIG. 1, D represents the inter-axial distance of the
first stereoscopic pair 102 of cameras 101, 103. In the FIG. 1
example D is 117 mm which is the same or similar to the distance
between pupils of the left and right eyes of an average human
being. Dashed line 150 in FIG. 1 depicts the distance from the
panoramic array's center point to the entrance pupil of the right
camera lens 120' (aka nodal offset). In one embodiment
corresponding to the FIG. 1 example, the distance indicated by
reference number 150 is 315 mm, but other distances are
possible.
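The two distances just described fix the geometry of each camera pair: the pair's entrance pupils sit at the nodal offset from the rig center, separated by the inter-axial distance. The following sketch computes approximate pupil locations in a top-down plane using the 117 mm and 315 mm example values; the function name and coordinate convention are assumptions for illustration, not taken from the patent.

```python
import math

def pupil_positions(pair_angle_deg, nodal_offset_mm=315.0, interaxial_mm=117.0):
    """Approximate entrance-pupil locations of a camera pair's left and
    right lenses, relative to the rig's center nodal point, in a
    top-down (x, y) plane with 0 degrees pointing along +y."""
    rad = math.radians(pair_angle_deg)
    forward = (math.sin(rad), math.cos(rad))   # pair facing direction
    right = (math.cos(rad), -math.sin(rad))    # perpendicular, to the pair's right
    cx = forward[0] * nodal_offset_mm          # pair center, nodal offset out
    cy = forward[1] * nodal_offset_mm
    half = interaxial_mm / 2.0
    left_cam = (cx - right[0] * half, cy - right[1] * half)
    right_cam = (cx + right[0] * half, cy + right[1] * half)
    return left_cam, right_cam

left, right = pupil_positions(0.0)  # front facing (0 degree) pair
print(left, right)
```

For the front pair this places the lenses 117 mm apart, centered 315 mm from the rig's center nodal point.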
[0057] In one particular embodiment the footprint of the camera rig
100 is relatively small. Such a small size allows the camera rig to
be placed in an audience, e.g., at a seating position where a fan
or attendee might normally be located or positioned. Thus in some
embodiments the camera rig is placed in an audience area allowing a
viewer to have a sense of being a member of the audience where such
an effect is desired. The footprint in some embodiments corresponds
to the size of the base to which the support structure, including
in some embodiments a center support rod, is mounted or on which a
support tower is located. As should be appreciated the camera rigs in some
embodiments can rotate around the center point of the base which
corresponds to the center point between the 3 pairs of cameras. In
other embodiments the cameras are fixed and do not rotate around
the center of the camera array.
[0058] The camera rig 100 is capable of capturing relatively close
as well as distant objects. In one particular embodiment the
minimum imaging distance of the camera array is 649 mm but other
distances are possible and this distance is in no way critical.
[0059] The distance from the center of the camera assembly to the
intersection point 151 of the views of the first and third camera
pairs represents an exemplary calibration distance which can be
used for calibrating images captured by the first and second camera
pairs. In one particular exemplary embodiment, an optimal
calibration distance, where lens angles of view intersect and the
maximum distortion of the lenses occurs, is 743 mm. Note that target
115 may be placed at a known distance from the camera pairs,
located at or slightly beyond the area of maximum distortion. The
calibration target includes a known fixed calibration pattern. The
calibration target can be and is used for calibrating the size of
images captured by cameras of the camera pairs. Such calibration is
possible since the size and position of the calibration target is
known relative to the cameras capturing the image of the
calibration target 115.
[0060] FIG. 2 is a diagram 200 of the camera array 100 shown in
FIG. 1 in greater detail. While the camera rig 100 is again shown
with 6 cameras, in some embodiments the camera rig 100 is populated
with only two cameras, e.g., camera pair 102 including cameras 101
and 103. As shown there is a 120 degree separation between each of
the camera pair mounting positions. Consider for example that the
center between each camera pair corresponds to the direction of the
camera mounting position. In such a case the first camera mounting
position corresponds to 0 degrees, the second camera mounting
position corresponds to 120 degrees and the third camera mounting
position corresponds to 240 degrees. Thus each camera mounting
position is separated by 120 degrees. This can be seen if a
center line extending out through the center of each camera pair
102, 104, 106 were extended and the angles between the lines
measured.
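The centerline construction just described can be sketched numerically: compute a unit direction vector for each mounting position and measure the angle between adjacent centerlines. This is an illustrative sketch; the function names are not from the patent.

```python
import math

def mounting_directions(num_positions=3):
    """Unit vectors for the centerline of each camera mounting position,
    with position 0 facing forward (0 degrees) in a top-down plane."""
    step = 360.0 / num_positions  # 120 degrees for three positions
    dirs = []
    for i in range(num_positions):
        rad = math.radians(i * step)
        dirs.append((math.sin(rad), math.cos(rad)))
    return dirs

def angle_between_deg(u, v):
    """Angle between two unit vectors, in degrees."""
    dot = u[0] * v[0] + u[1] * v[1]
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

d0, d1, d2 = mounting_directions()
print(angle_between_deg(d0, d1))  # 120.0 degrees between adjacent centerlines
```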
[0061] In the FIG. 2 example, the pairs 102, 104, 106 of cameras
can, and in some embodiments do, rotate around the center point of
the camera rig allowing for different views to be captured at
different times without having to alter the position of the camera
rig base. That is, the cameras can be rotated around the center
support of the rig and allowed to capture different scenes at
different times allowing for a 360 degree scene capture using the
rig shown in FIG. 2 while it is populated with only two cameras.
Such a configuration is particularly desirable from a cost
perspective given the cost of stereoscopic cameras and is well
suited for many applications where it may be desirable to show a
background captured from the same point of view but at a different
time than the time at which the front scene including the main
action during a sporting event or other event may occur. Consider
for example that during the event, objects that it would be
preferable not to show during the main event may be placed behind
the camera. In such a scenario the rear images may be, and sometimes
are, captured prior to the main event and made available along with
the real time captured images of the main event to provide a 360
degree set of image data.
[0062] Various features also relate to the fact that the camera
support structure and camera configuration can, and in various
embodiments does, maintain a nodal offset distance in a range from
75 mm to 350 mm. In one particular embodiment, a nodal offset
distance of 315 mm is maintained. The support structure also
maintains, in some embodiments, an overall area (aka footprint) in a
range from 400 mm² to 700 mm². In one particular
embodiment, an overall area (aka footprint) of 640 mm² is
maintained. The support structure also maintains a minimal imaging
distance in a range from 400 mm to 700 mm. In one particular
embodiment, a minimal imaging distance of 649 mm is maintained. In
one particular embodiment the optimal calibration distance of the
array is where lens angles of view intersect AND the maximum
distortion of the lenses occurs. In one particular exemplary
embodiment this distance is 743 mm.
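The stated ranges and exemplary values can be collected into a simple consistency check, useful when verifying a particular rig design against the dimensions given above. The dictionary names and the checking function are illustrative only; the numeric values are the ones stated in the text.

```python
# Parameter ranges stated in the text (units noted per entry).
RIG_PARAMETER_RANGES = {
    "nodal_offset": (75.0, 350.0),               # mm
    "footprint_area": (400.0, 700.0),            # mm^2
    "minimum_imaging_distance": (400.0, 700.0),  # mm
}

# Exemplary values from the particular embodiment described above.
EXEMPLARY_RIG = {
    "nodal_offset": 315.0,
    "footprint_area": 640.0,
    "minimum_imaging_distance": 649.0,
}

def check_rig(params):
    """Return the names of any parameters falling outside the stated ranges."""
    out_of_range = []
    for name, (lo, hi) in RIG_PARAMETER_RANGES.items():
        if not (lo <= params[name] <= hi):
            out_of_range.append(name)
    return out_of_range

print(check_rig(EXEMPLARY_RIG))  # [] -> the exemplary values fit all ranges
```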
[0063] As discussed above, in various embodiments the camera array,
e.g., rig, is populated with only 2 of the 6 total cameras which
would normally be required for simultaneous 360-degree stereoscopic
video for the purpose of capturing the high-value, foreground
180-degree scene elements in real-time while manually capturing
static images of the lower-value, background 180-degree scene
elements.
[0064] FIG. 3 shows an exemplary camera rig 300 which is the same
or similar to the rig of FIGS. 1 and 2 but without a support tripod
and with a plastic cover 350 placed over the camera pairs. The
plastic cover 350 includes handles 310, 312, 314 which can be used
to lift or rotate, e.g., when placed on a tripod, the camera rig
300. The camera rig 300 is shown with three pairs of cameras, a
first camera pair 302 including cameras 301, 303 with lens
assemblies 320, 320', a second camera pair 304 including cameras
with lens assemblies 322, 322', and a third camera pair 306
including cameras with lens assemblies 324, 324'. The plastic cover
350 is secured to the mounting platform 316, which may be
implemented as a flat plate with one or more slots and screw holes
as shown in FIG. 4. The plastic cover 350 is secured to the base
with nuts or screws 330, 331 which can be removed or tightened by
hand to allow for easy removal or attachment of the cover 350 and
easy access to the cameras of the camera pairs. While six cameras
are included in the rig 300 shown in FIG. 3, a single camera pair
may be included and/or a single camera pair with one or more
individual cameras located at the other camera mounting positions
where the camera pairs are not mounted may be used.
[0065] FIG. 4 is a detailed diagram of a camera rig assembly 400
shown in partially disassembled form to allow better view of how
the components are assembled. The camera rig 400 is implemented in
accordance with one exemplary embodiment and may have the camera
configuration shown in FIGS. 1 and 2. In the example shown in FIG.
4 various elements of the camera rig 400 are shown in disassembled
form for clarity and detail. As can be appreciated from FIG. 4, the
camera rig 400 includes 3 pairs of cameras 702, 704 and 706, e.g.,
stereoscopic cameras, which can be mounted on a support structure
720 of the camera rig 400. The first pair of cameras 702 includes
cameras 750 and 750'. The second pair of cameras 704 includes
cameras 752, 752' and the third pair of cameras 706 includes
cameras 754, 754'. The lenses 701, 701' of the cameras 750, 750'
can be seen in FIG. 4. While elements 701 and 701' are described as
lenses, in some embodiments they are lens assemblies which are
secured to the cameras 750, 750', with each lens assembly including
multiple lenses positioned in a lens barrel which is secured to the
cameras 750, 750' via a friction fit or twist lock connection.
[0066] In some embodiments the three pairs (six cameras) of cameras
702, 704 and 706 are mounted on the support structure 720 via the
respective camera pair mounting plates 710, 712 and 714. The
support structure 720 may be in the form of a slotted mounting
plate 720. Slot 738 is exemplary of some of the slots in the plate
720. The slots reduce weight but also allow for adjustment of the
position of the camera mounting plates 710, 712, 714 used to
support camera pairs or in some cases a single camera.
[0067] The support structure 720 includes three different mounting
positions for mounting the stereoscopic camera pairs 702, 704, 706,
with each mounting position corresponding to a different direction
offset 120 degrees from the direction of the adjacent mounting
position. In the illustrated embodiment of FIG. 4, the first pair
of stereoscopic cameras 702 is mounted in a first one of the three
mounting positions, e.g., front facing position, and corresponds to
a front viewing area. The second pair 704 of stereoscopic cameras
704 is mounted in a second one of the three mounting positions,
e.g., background right position rotating 120 degrees clockwise with
respect the front position, and corresponds to a different right
rear viewing area. The third pair 706 of stereoscopic cameras is
mounted in a third one of the three mounting positions, e.g.,
background left position rotating 240 degrees clockwise with
respect the front position, and corresponds to a left rear viewing
area. The cameras in each camera position capture at least a 120
viewing area but may capture in many case at least a 180 degree
viewing area resulting in overlap in the captured images which can
facilities combining of the images into a 360 degree view with some
of the overlapping portions being cut off in some embodiments.
[0068] The first camera pair mounting plate 710 includes threaded
screw holes 741, 741', 741'' and 741''' through which screws 740,
740', 740'' and 740''' can be inserted, respectively, through slots
738 and 738' to secure the plate 710 to the support structure 720.
The slots allow for adjustment of the position of the support plate
710.
[0069] The cameras 750, 750' of the first camera pair are secured
to individual corresponding camera mounting plates 703, 703' using
screws that pass through the bottom of the plates 703, 703' and
extend into threaded holes on the bottom of the cameras 750, 750'.
Once secured to the individual mounting plates 703, 703' the
cameras 750, 750' and mounting plates 703, 703' can be secured to
the camera pair mounting plate 710 using screws. Screws 725, 725',
725'' (which is not fully visible) and 725''' pass through
corresponding slots 724 into threaded holes 745, 745', 745'' and
745''' of the camera pair mounting plate 710 to secure the camera
plate 703 and camera 750 to the camera pair mounting plate 710.
Similarly, screws 727, 727' (which is not fully visible), 727'' and
727''' pass through corresponding slots 726, 726', 726'' and 726'''
into threaded holes 746, 746', 746'' and 746''' of the camera pair
mounting plate 710 to secure the camera plate 703' and camera 750'
to the camera pair mounting plate 710.
[0070] The support structure 720 has standoff rollers 732, 732'
mounted to reduce the risk that an object moving past the support
structure will get caught on the support structure as it moves
nearby. This reduces the risk of damage to the support structure
720. Furthermore, by having a hollow area behind the roller, an
impact to the roller is less likely to be transferred to the
main portion of the support structure. That is, the void behind the
rollers 732, 732' allows for some deformation of the bar portion of
the support structure on which the standoff roller 732' is mounted
without damage to the main portion of the support structure
including the slots used to secure the camera mounting plates.
[0071] In various embodiments the camera rig 400 includes a base
722 to which the support structure 720 is rotatably mounted, e.g., by
a shaft or threaded rod extending through the center of the base
into the support plate 720. Thus in various embodiments the camera
assembly on the support structure 720 can be rotated 360 degrees
around an axis that passes through the center of the base 722. In
some embodiments the base 722 may be part of a tripod or another
mounting device. The tripod includes legs formed by pairs of tubes
(742, 742'), (742'', 742''') as well as an additional leg which is
not visible in FIG. 4 due to the viewing angle. The legs are
secured by a hinge to the base 722 and can be folded for transport.
The support structure may be made of plastic, metal or a composite
material such as graphite or fiberglass or some combination
thereof. The camera pairs can be rotated around a central point,
sometimes referred to as center nodal point, in some
embodiments.
[0072] The assembly 400 shown in FIG. 4 allows for the position of
individual cameras to be adjusted from the top by loosening the
screws securing the individual camera mounting plates to the camera
pair mounting plate and then adjusting the camera position before
retightening the screws. The position of a camera pair can be
adjusted by loosening the screws accessible from the bottom side of
the support structure 720, moving the camera pair mounting plate,
and then retightening the screws. Accordingly, while the general
position and direction of the camera pairs are defined by the slots
in the support plate 720, the position and direction can be finely
adjusted as part of the camera calibration process to achieve the
desired camera alignment while the cameras are secured to the
support structure 720 in the field where the camera rig is to be
used.
[0073] In FIG. 5 reference numbers which are the same as those used
in FIG. 4 refer to the same elements. FIG. 5 illustrates a drawing
500 showing the exemplary camera rig 400 in assembled form with
additional stabilization plates 502, 502', 504, 504', 506 and
stabilization plate joining bars 503, 505, 507, 509, 511, 513 added
to the tops of the camera pairs to increase the rigidity and
stability of the cameras pairs after they have been adjusted to the
desired positions.
[0074] In the drawing 500 the camera pairs 702, 704, 706 can be
seen mounted on the support structure 720, with at least one of the
camera pair mounting plates, plate 710, being visible in the
illustrated drawing. In addition to the elements of camera rig 400 already
discussed above with regard to FIG. 4, in drawing 500 two simulated
ears 730, 732 mounted on the camera rig can also be seen. These
simulated ears 730, 732 imitate human ears and in some embodiments
are made from silicone or plastic molded in the shape of a human
ear. Simulated ears 730, 732 include microphones with the two ears
being separated from each other by a distance equal to, or
approximately equal to, the separation between human ears of an
average human. The microphones mounted in the simulated ears 730,
732 are mounted on the front facing camera pair 702 but could
alternatively be mounted on the support structure, e.g., platform,
720. The simulated ears 730, 732 are positioned perpendicular to
the front surface of the camera pair 702 in a similar manner as
human ears are positioned perpendicular to the front surface of
eyes on a human head. Holes in the side of the simulated ears 730,
732 act as an audio/sound entry point to the simulated ears with
the simulated ears and hole operating in combination to direct
audio towards a microphone mounted in each one of the simulated
ears much as a human ear directs audio sounds into the eardrum
included in a human ear. The microphones in the left and right
simulated ears 730, 732 provide for stereo sound capture similar to
what a human at the location of the camera rig 500 would perceive
via the human's left and right ears if located at the position of
the camera rig. The audio input of the microphones mounted in the
simulated ears is perpendicular to the face of the outer lens of the
front facing cameras 750, 750', in the same manner that the sensor
portion of a human ear would be somewhat perpendicular to the
human being's face. The simulated ears direct sound toward the
microphones just as a human ear would direct sound waves towards a
human ear drum.
[0075] The simulated ears 730, 732 are mounted on a support bar 510
which includes the microphones for capturing sound. The audio
capture system 730, 732, 810 is supported by a movable arm 514
which can be moved via handle 515.
[0076] While FIGS. 4-5 illustrate one configuration of an exemplary
camera rig with three stereoscopic camera pairs, it should be
appreciated that other variations are possible. For example, in one
implementation the camera rig 400 includes a single pair of
stereoscopic cameras which can rotate around the center point of
the camera rig allowing for different 120 degree views to be
captured at different times. Thus a single camera pair can be
mounted on the support structure and rotated around the center
support of the rig and allowed to capture different scenes at
different times allowing for a 360 degree scene capture.
[0077] In other embodiments the camera rig 400 includes a single
stereoscopic camera pair 702 and one camera mounted in each of the
second and third positions normally used for a pair of stereoscopic
cameras. In such an embodiment a single camera is mounted to the
rig in place of the second camera pair 704 and another single
camera is mounted to the camera rig in place of the camera pair
706. Thus, in such an embodiment, the second camera pair 704 may be
thought of as being representative of a single camera and the
camera pair 706 may be thought of as being illustrative of the
additional single camera.
[0078] FIGS. 6-9 illustrate various views of other exemplary camera
rigs implemented in accordance with some exemplary embodiments.
[0079] FIG. 6 illustrates a drawing 800 showing one view of an
exemplary camera rig 801 implemented in accordance with some
exemplary embodiments. An array of cameras is included in the
camera rig 801, some of which are stereoscopic cameras. In the
illustrated view of the camera rig 801 in drawing 800, only a
portion of the camera rig 801 is visible, while a similar
arrangement of cameras exists on the other sides (also referred to
as different faces) of the camera rig 801 which cannot be fully
seen in the drawing 800. In some but not all embodiments, the
camera rig 801 includes 13 cameras secured by a top plastic body or
cover 805 and a bottom base cover 842. In some embodiments 8 of
these 13 cameras are stereoscopic cameras arranged in pairs, such as
the cameras 804, 806, 812 and 814, while other cameras are light
field cameras, such as cameras 802 and 810 which are visible in the
drawing 800 and cameras 815 and 820 which are only partially
visible in drawing 800. Various other combinations of the
cameras are possible. In some embodiments a camera 825 is also
mounted on the top portion of the camera rig 801, e.g., top face
840 of camera rig 801, to capture images of a top hemisphere of an
environment of interest. The plastic body/cover 805 includes
handles 811, 813, 817 which can be used to lift or rotate the
camera rig 801.
[0080] In some embodiments the camera rig 801 includes one light
field camera (e.g., camera 802) and two other cameras (e.g.,
cameras 804, 806) forming a stereoscopic camera pair on each longer
side of the camera rig 801. In some such embodiments there are four
such longer sides (also referred to as the four side faces 830,
832, 834 and 836) with each longer side having one light field
camera and one stereoscopic camera pair; e.g., light field camera
802 and stereoscopic camera pair 804, 806 on one longer side 836
to the left, and another light field camera 810 and stereoscopic
camera pair 812, 814 on the other longer side 830 to the right, can
be seen in drawing 800. While the other two side faces are not
fully shown in drawing 800, they are shown in more detail in FIG.
8. In some embodiments at least some of the cameras, e.g.,
stereoscopic cameras and the light field cameras, in the camera rig
801 use a fish eye lens. In various embodiments each of the cameras
in the camera rig 801 is protected by a corresponding lens/camera
guard to protect the camera and/or lens against a physical impact
and/or damage that may be caused by an object. For example cameras
802, 804 and 806 are protected by guards 845, 847 and 849
respectively. Similarly cameras 810, 812 and 814 are protected by
guards 850, 852 and 854 respectively.
[0081] In addition to the stereoscopic camera pair and the light
field camera on each of the four side faces 830, 832, 834 and 836,
in some embodiments the camera rig 801 further includes a camera
825 facing in the upward vertical direction, e.g., towards the sky
or another top ceiling surface in the case of a closed environment,
on the top face 840 of the camera rig 801. In some such embodiments
the camera 825 on the top face of the camera rig 801 is a light
field camera. While not shown in drawing 800, in some other
embodiments the top face 840 of the camera rig 801 also includes,
in addition to the camera 825, another stereoscopic camera pair for
capturing left and right eye images. While in normal circumstances
the top hemisphere (also referred to as the sky portion) of a 360
degree environment, e.g., stadium, theater, concert hall etc.,
captured by the camera 825 may not include action and/or may remain
static, in some cases it may be important or desirable to capture
the sky portion at the same rate as other environmental portions
are being captured by other cameras on the rig 801.
[0082] While one exemplary camera array arrangement is shown and
discussed above with regard to camera rig 801, in some other
implementations instead of just a single light field camera (e.g.,
such as cameras 802 and 810) arranged on top of a pair of
stereoscopic cameras (e.g., cameras 804, 806 and 812, 814) on four
faces 830, 832, 834, 836 of the camera rig 801, the camera rig 801
includes an array of light field cameras arranged with a
stereoscopic camera pair. For example in some embodiments there are
3 light field cameras arranged on top of a stereoscopic camera pair
on each of the longer sides of the camera rig 801. In another
embodiment there are 6 light field cameras arranged on top of a
stereoscopic camera pair on each of the longer sides of the camera
rig 801, e.g., with two rows of 3 light field cameras arranged on top of the
stereoscopic camera pair. Some of such variations are discussed
with regard to FIGS. 12-13. Moreover in another variation a camera
rig of the type shown in drawing 800 may also be implemented such
that instead of four faces 830, 832, 834, 836 with the cameras
pointed in the horizontal direction as shown in FIG. 8, there are 3
faces of the camera rig with cameras pointing in the horizontal
direction.
[0083] In some embodiments the camera rig 801 may be mounted on a
support structure such that it can be rotated around a vertical
axis. In various embodiments the camera rig 801 may be deployed in
an environment of interest, e.g., such as a stadium, auditorium, or
another place where an event to be captured is taking place. In
some embodiments the light field cameras of the camera rig 801 are
used to capture images of the environment of interest, e.g., a 360
degree scene area of interest, and generate depth maps which can be
used in simulating a 3D environment and displaying stereoscopic
imaging content.
[0084] FIG. 7 illustrates a drawing 900 showing the exemplary
camera rig 801 with some elements of the camera rig 801 being shown
in a disassembled form for more clarity and detail. Various
additional elements of the camera rig 801 which were not visible in
the illustration shown in drawing 800 are shown in FIG. 7. In FIG.
7, same reference numbers have been used to identify the elements
of the camera rig 801 which were shown and identified in FIG. 6. In
drawing 900 at least the two side faces 830 and 836 as well as the
top face 840 and bottom face 842 of the camera rig 801 are
visible.
[0085] In drawing 900 various components of the cameras on two out
of four side faces 830, 832, 834, 836 of the camera rig 801 are
shown. The lens assemblies 902, 904 and 906 correspond to cameras
802, 804 and 806 respectively of side face 836 of the camera rig
801. Lens assemblies 910, 912 and 914 correspond to cameras 810,
812 and 814 respectively of side face 830 while lens assembly 925
corresponds to camera 825 on the top face of the camera rig 801.
Also shown in drawing 900 are three side support plates 808, 808',
and 808''' which support the top and bottom cover plates 805
and 842 of the camera rig 801. The side support plates 808, 808',
and 808''' are secured to the top cover 805 and bottom base cover
842 via the corresponding pairs of screws shown in the Figure. For
example the side support plate 808 is secured to the top and bottom
cover plates 805, 842 via the screw pairs 951 and 956, the side
support plate 808' is secured to the top and bottom cover plates
805, 842 via the screw pairs 952 and 954, and the side support
plate 808''' is secured to the top and bottom cover plates 805, 842
via the screw pairs 950 and 958. The camera rig 801 in some
embodiments includes a base support 960 secured to the bottom cover
plate 842 via a plurality of screws 960. In some embodiments via
the base support 960 the camera rig may be mounted on a support
structure such that it can be rotated around a vertical axis, e.g.,
axis going through the center of base 960. The external support
structure may be a tripod or another platform.
[0086] FIG. 8 illustrates a drawing 1000 showing a top view of the
exemplary camera rig 801 with more elements of the camera rig 801
being shown in greater detail. In the top view of the camera rig
801 the other two side faces 832 and 834 which were not fully
visible in drawings 800-900 are more clearly shown. The lens
assemblies 915, 916 and 918 correspond to camera 815 and the
stereoscopic camera pair on the side face 832 of the camera rig
801. Lens assemblies 920, 922 and 924 correspond to camera 820 and
the stereoscopic camera pair on the side face 834 of the camera rig
801.
[0087] As can be seen in drawing 1000, the assemblies of cameras on
each of the four side faces 830, 832, 834, 836 (small arrows
pointing towards the faces) and the top face 840 of the camera rig
801 face in different directions. The cameras on the side faces
830, 832, 834, 836 of the camera rig 801 are pointed in the
horizontal direction (e.g., perpendicular to the corresponding
face) while the camera(s) on the top face 840 are pointed in the
upward vertical direction. For example, as shown in FIG. 8, the
cameras on the face
836 of the camera rig 801 (cameras corresponding to lens assemblies
902, 904, 906) are facing in a first direction shown by arrow 1002.
The arrow 1004 shows a second direction in which the cameras on the
face 830 of the camera rig 801 (cameras corresponding to lens
assemblies 910, 912, 914) are facing, arrow 1006 shows a third
direction in which the cameras on the face 832 of the camera rig
801 (cameras corresponding to lens assemblies 915, 916, 918) are
facing, arrow 1008 shows a fourth direction in which the cameras on
the face 834 of the camera rig 801 (cameras corresponding to lens
assemblies 920, 922, 924) are facing, and arrow 1010 shows a fifth
(vertical) direction in which the camera on the top face 840 of the
camera rig 801 (camera 825 corresponding to lens assembly 925) is
facing. In various embodiments the first, second, third and fourth
directions are generally horizontal directions while the fifth
direction is a vertical direction. In some embodiments the cameras
on the different side faces 830, 832, 834 and 836 are uniformly
spaced. In some embodiments the angle between the first, second,
third and fourth directions is the same. In some embodiments the
first, second, third and fourth directions are different and 90
degrees apart. In some other embodiments the camera rig is
implemented such that instead of four side faces the camera rig has
three side faces with the same or similar camera assemblies as shown
in drawings 800-1000. In such embodiments the cameras on the side
faces of the camera rig 801 point in three different directions,
e.g., a first, second and third direction, with the first, second
and third directions being 120 degrees apart.
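The uniform angular spacing of the side-face camera directions described above (90 degrees apart for four faces, 120 degrees apart for three) can be illustrated with a short sketch. This is illustrative only; the function name and the 2D unit-vector representation of the facing directions are assumptions, not part of the application:

```python
import math

def side_face_directions(num_faces):
    """Return unit direction vectors (x, y) for the horizontally facing
    cameras of a rig with num_faces uniformly spaced side faces."""
    step = 2 * math.pi / num_faces  # 90 degrees for 4 faces, 120 for 3
    return [(math.cos(i * step), math.sin(i * step)) for i in range(num_faces)]
```

For a four-face rig, consecutive directions are perpendicular; for a three-face rig they are 120 degrees apart, matching the spacing described in the text.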
[0088] FIG. 9 illustrates a drawing 1100 showing a view of yet
another exemplary camera rig 1101 implemented in accordance with
some exemplary embodiments. The exemplary camera rig 1101 is
similar to the camera rig 801 in many respects and includes
the same or similar configuration of cameras as discussed with
regard to camera rig 801 above. The camera rig 1101 includes four
side faces 1130, 1132, 1134, 1136 and a top face 1140 similar to
camera rig 801. Each of the four side faces 1130, 1132, 1134, 1136
of the camera rig 1101 includes an array of cameras including a
light field camera and a stereoscopic camera pair, while the
top face 1140 of the camera rig includes at least one camera device
1125 similar to what has been shown and discussed with regard to
camera rig 801. However, the camera rig 1101 further includes, in
addition to the camera arrays on each of the five faces 1130, 1132,
1134, 1136 and 1140, a sixth bottom face 1142 including at least
one camera 1126 facing vertically downward, e.g., towards the
ground. In some such embodiments the bottom surface camera 1126
facing vertically downwards and the top face camera 1125 facing
vertically upwards are light field cameras. In some embodiments
each of the cameras 1125 and 1126 are part of a corresponding
stereoscopic camera pair on the top and bottom faces 1140, 1142 of
the camera rig 1101.
[0089] While the stereoscopic cameras of the camera rigs 801 and
1101 are used to capture stereoscopic imaging content, e.g., during
an event, the use of light field cameras allows for scanning the
scene area of interest and generating depth maps of various portions
of the scene area captured by the light field cameras (e.g., from
the captured images corresponding to these portions of the scene of
interest). In some embodiments the depth maps of various portions
of the scene area may be combined to generate a composite depth map
of the scene area. Such depth maps and/or composite depth map may,
and in some embodiments are, provided to a playback device for use
in displaying stereoscopic imaging content and simulating a 3D
environment which can be experienced by the viewers.
[0090] FIG. 10 illustrates a front view of an exemplary arrangement
1200 of an array of cameras that can be used in an exemplary camera
rig implemented in accordance with the invention such as camera rig
300, camera rig 400 and/or camera rigs 801 and 1101 in accordance
with some embodiments. In comparison to the arrangement shown in
drawing 800 with a single light field camera arranged on top of a
pair of stereoscopic cameras on each of the faces of the camera rig
801, the exemplary arrangement 1200 uses an array of light field
cameras 1202, 1204 and 1206 arranged with a stereoscopic camera
pair 1208, 1210. The exemplary arrangement 1200 may be, and in some
embodiments is, used in a camera rig (such as camera rig 801)
implemented in accordance with the invention. In such embodiments
each face of the camera rig uses the exemplary arrangement 1200
with three light field cameras (e.g., 1202, 1204 and 1206) arranged
with a single pair of stereoscopic cameras (e.g., 1208, 1210). It
should be appreciated that many variations in arrangement are
possible and are within the scope of the invention.
[0091] FIG. 11 illustrates a front view of yet another exemplary
arrangement 1300 of an array of cameras that can be used in an
exemplary camera rig such as camera rig 801 or any of the other
camera rigs discussed earlier, in accordance with some embodiments.
In comparison to the arrangement shown in drawing 800 with a single
light field camera arranged on top of a pair of stereoscopic
cameras, the exemplary arrangement 1300 uses an array of six light
field cameras 1302, 1304, 1306, 1308, 1310 and 1312 arranged with a
stereoscopic camera pair 1320, 1322. The light field cameras are
stacked in two rows of three, arranged one on top of the other as
shown. The exemplary arrangement 1300 may be, and in
some embodiments is, used in a camera rig (such as camera rig 801)
implemented in accordance with the invention with each face of the
camera rig using the arrangement 1300.
[0092] While the stereoscopic cameras of the camera rigs discussed
above are used to capture stereoscopic imaging content, e.g.,
during an event, the use of light field cameras allows for scanning
the scene area of interest and generating depth maps of various
portions of the scene area captured by the light field cameras
(from the captured images corresponding to these portions of the
scene of interest). In some embodiments the depth maps of various
portions of the scene area may be combined to generate a composite
depth map of the scene area. Such depth maps and/or composite depth
map may, and in some embodiments are, provided to a playback device
for use in displaying stereoscopic imaging content and simulating a
3D environment which can be experienced by the viewers.
[0093] The use of light field cameras in combination with the
stereoscopic cameras allows for environmental measurements and
generation of environmental depth maps in real time, e.g., during
an event being shot, thus obviating the need for environmental
measurements to be performed offline ahead of time, prior to the
start of an event, e.g., a football game.
[0094] While the depth map generated from each image corresponds to
a portion of the environment to be mapped, in some embodiments the
depth maps generated from individual images are processed, e.g.,
stitched together, to form a composite map of the complete
environment scanned using the light field cameras. Thus by using
the light field cameras a relatively complete environmental map can
be, and in some embodiments is, generated.
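The stitching of per-camera depth maps into a composite environmental map described above might be sketched as follows. This is a minimal illustration; the function name and the representation of a depth map as a dictionary keyed by direction bins are assumptions, not part of the application:

```python
def stitch_depth_maps(partial_maps):
    """Combine per-camera depth maps (each a dict mapping a direction key,
    e.g. an (azimuth_bin, elevation_bin) tuple, to a depth in meters) into
    one composite map; overlapping samples are averaged."""
    sums, counts = {}, {}
    for depth_map in partial_maps:
        for direction, depth in depth_map.items():
            sums[direction] = sums.get(direction, 0.0) + depth
            counts[direction] = counts.get(direction, 0) + 1
    # Average where two or more cameras measured the same direction.
    return {d: sums[d] / counts[d] for d in sums}
```

Averaging overlapping samples is one simple merge policy; a real system might instead weight samples by measurement confidence.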
[0095] In the case of light field cameras, an array of micro-lenses
captures enough information that one can refocus images after
acquisition. It is also possible to shift, after image capture,
one's viewpoint within the sub-apertures of the main lens,
effectively obtaining multiple views. In the case of a light field
camera, depth cues from both defocus and correspondence are
available simultaneously in a single capture. This can be useful
when attempting to fill in occluded information/scene portions not
captured by the stereoscopic cameras.
[0096] The depth maps generated from the light field camera outputs
will be current and are likely to accurately reflect changes in a
stadium or other environment of interest for a particular event,
e.g., a concert or game to be captured by a stereoscopic camera. In
addition, by measuring the environment from, or near, the location
at which the stereoscopic cameras are mounted, the
environmental map, at least in some embodiments, accurately
reflects the environment as it is likely to be perceived from the
perspective of the stereoscopic cameras that are used to capture
the event.
[0097] In some embodiments images captured by the light field
cameras can be processed and used to fill in for portions of the
environment which are not captured by a stereoscopic camera pair,
e.g., because the position and/or field of view of the stereoscopic
camera pair may be slightly different from that of the light field
camera and/or due to an obstruction of view from the stereoscopic
cameras. For example, when the light field camera is facing
rearward relative to the position of the stereoscopic pair it may
capture a rear facing view not visible to a forward facing
stereoscopic camera pair. In some embodiments output of the light
field camera is provided to a playback device separately or along
with image data captured by the stereoscopic camera pairs. The
playback device can use all or portions of the images captured by
the light field camera when a scene area not sufficiently captured
by the stereoscopic camera pairs is to be displayed. In addition, a
portion of an image captured by the light field camera may be used
to fill in a portion of a stereoscopic image that was occluded from
view from the position of the stereoscopic camera pair but which a
user expects to be able to see when he or she shifts his or her
head to the left or right relative to the default viewing position
corresponding to the location of the stereoscopic camera pair. For
example, if a user leans to the left or right in an attempt to peer
around a column obstructing his/her view, in some embodiments
content from one or more images captured by the light field camera
will be used to provide the image content which was not visible to
the stereoscopic camera pair but which is expected to be visible to
the user from the shifted head position the user achieves during
playback by leaning left or right.
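The occlusion-filling step described above can be sketched as a per-pixel substitution. This is a simplified illustration only; the function name, the list-of-lists image representation, and the boolean occlusion mask (which in practice would be derived from the depth maps and the viewer's head position) are assumptions:

```python
def fill_occlusions(stereo_image, light_field_image, occluded_mask):
    """Replace pixels flagged as occluded in the stereoscopic image with
    samples from a (registered) light field image. Images are 2D lists of
    pixel values; mask entries of True mark occluded pixels."""
    filled = [row[:] for row in stereo_image]  # copy, leave input intact
    for y, row in enumerate(occluded_mask):
        for x, occluded in enumerate(row):
            if occluded:
                filled[y][x] = light_field_image[y][x]
    return filled
```

A real implementation would first warp the light field image into the stereoscopic camera's viewpoint before substituting pixels.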
[0098] Various exemplary camera rigs illustrated in FIGS. 1-9 may
be equipped with a variety of different cameras, e.g., normal
cameras, stereoscopic camera pairs, light field cameras etc. The
exemplary camera rigs are used in various embodiments to capture,
e.g., using the equipped cameras, environmental information, e.g.,
measurements and images, to support various applications in
accordance with the features of the present invention.
[0099] FIG. 12 illustrates an exemplary system 1400 implemented in
accordance with some embodiments of the invention. The system 1400
supports environmental information measurement and capture
including image capture, processing and delivery, e.g., imaging
content, environmental model and/or texture map delivery, to one or
more customer devices, e.g., playback devices/content players,
located at customer premises. The system 1400 includes an exemplary
imaging apparatus 1404, a stereoscopic imaging system 1406, a
processing system 1408, a communications network 1450, and a
plurality of customer premises 1410, . . . , 1412. The imaging
apparatus 1404 includes one or more light field cameras while
stereoscopic imaging system 1406 includes one or more stereoscopic
cameras. In some embodiments the imaging apparatus 1404 and the
stereoscopic imaging system 1406 are included in an exemplary
camera rig 1402 which may be any of the camera rigs discussed
earlier with regard to FIGS. 1-9. The camera rig 1402 may include
additional imaging and/or environmental measurement devices in
addition to the light field camera apparatus and the stereoscopic
imaging system 1406. The imaging apparatus 1404 captures and
processes imaging content in accordance with the features of the
invention. The communications network 1450 may be, e.g., a hybrid
fiber-coaxial (HFC) network, satellite network, and/or the
Internet.
[0100] The processing system 1408 is configured to process imaging
data received from the one or more light field cameras 1404 and one
or more stereoscopic cameras included in the stereoscopic imaging
system 1406, in accordance with the invention. The processing
performed by the processing system 1408 includes generating a depth
map of the environment of interest, generating 3D mesh models and
UV maps and communicating them to one or more playback devices in
accordance with some features of the invention. The processing
performed by the processing system 1408 further includes processing
and encoding stereoscopic image data received from the stereoscopic
imaging system 1406 and delivering it to one or more playback
devices for use in rendering/playback of stereoscopic content
generated from stereoscopic cameras.
[0101] In some embodiments the processing system 1408 may include a
server that responds to requests for content, e.g., a depth map
corresponding to the environment of interest and/or a 3D mesh
model and/or imaging content. The playback devices may, and in some
embodiments do, use such information to simulate a 3D environment
and render 3D image content. In some but not all embodiments the
imaging data, e.g., depth map corresponding to environment of
interest and/or imaging content generated from images captured by
the light field camera device of the imaging apparatus 1404, is
communicated directly from the imaging apparatus 1404 to the
customer playback devices over the communications network 1450.
[0102] The processing system 1408 is configured to stream, e.g.,
transmit, imaging data and/or information to one or more customer
devices, e.g., over the communications network 1450. Via the
network 1450, the processing system 1408 can send and/or exchange
information with the devices located at the customer premises 1410,
1412 as represented in the figure by the link 1409 traversing the
communications network 1450. The imaging data and/or information
may be encoded prior to delivery to one or more playback
devices.
[0103] Each customer premise 1410, 1412 may include a plurality of
devices/players, which are used to decode and playback/display the
imaging content, e.g., captured by stereoscopic cameras 1406 and/or
other cameras deployed in the system 1400. The imaging content is
normally processed and communicated to the devices by the
processing system 1408. The customer premise 1 1410 includes a
decoding apparatus/playback device 1422 coupled to a display device
1420 while customer premise N 1412 includes a decoding
apparatus/playback device 1426 coupled to a display device 1424. In
some embodiments the display devices 1420, 1424 are head mounted
stereoscopic display devices. In some embodiments the playback
devices 1422, 1426 receive and use the depth map of the environment
of interest and/or 3D mesh model and UV map received from the
processing system 1408 in displaying stereoscopic imaging content
generated from stereoscopic content captured by the stereoscopic
cameras.
[0104] In various embodiments playback devices 1422, 1426 present
the imaging content on the corresponding display devices 1420,
1424. The playback devices 1422, 1426 may be devices which are
capable of decoding stereoscopic imaging content captured by a
stereoscopic camera, generating imaging content using the decoded
content and rendering the imaging content, e.g., 3D image content,
on the display devices 1420, 1424. In various embodiments the
playback devices 1422, 1426 receive the image data and depth maps
and/or 3D mesh models from the processing system 1408 and use them
to display 3D image content.
[0105] FIG. 13, which comprises a combination of FIGS. 13A and 13B,
illustrates a flowchart 1500 of an exemplary method of operating an
imaging system in accordance with some embodiments. The method of
flowchart 1500 is implemented in some embodiments using the imaging
system including image capturing devices and a processing system.
The image capturing devices, e.g., light field cameras and/or
stereoscopic cameras, in the system may be included in and/or
mounted on the various camera rigs shown in the drawings and
discussed in detail above.
[0106] The method starts in step 1502, e.g., with the imaging
system being powered on and initialized. The method proceeds from
start step 1502 to steps 1504, 1506, 1508 which may be performed in
parallel by different elements of the imaging system, e.g., one or
more cameras and a processing system.
[0107] In step 1504 the processing system acquires a static
environmental depth map corresponding to an environment of
interest, e.g., by downloading it to the system and/or loading it
onto the processing system from a storage medium including the
environmental depth map. The environment of interest may be, e.g.,
a stadium, an auditorium, a field etc. where an event of interest
takes place. In various embodiments the event is captured, e.g.,
recorded, by one or more camera devices including stereoscopic
cameras and light field cameras. The static environmental depth map
includes environmental measurements of the environment of interest
that have been previously made, e.g., prior to the event, and thus
are called static. Static environmental depth maps for various
well-known environments of interest, e.g., known stadiums,
auditoriums etc., where events occur are readily available;
however, such environmental depth maps do not take into
consideration dynamic changes to the environment that may occur
during an event and/or other changes that may have occurred since
the time when the environmental measurements were made. The static
depth map of the environment of interest may be generated using
various measurement techniques, e.g., using LIDAR and/or other
methods. Operation proceeds from step 1504 to step 1510. While in
various embodiments the processing system acquires the static depth
map when available, in the case when the static depth map is not
available operation simply proceeds to the next step 1510.
[0108] In step 1510 it is checked if the static depth map is
available, e.g., to the processing system. If the static depth map
is available the operation proceeds from step 1510 to step 1512
otherwise the operation proceeds to step 1518. In step 1512 the
processing system sets the current depth map (e.g., base
environmental depth map to be used) to be the static depth map. In
some embodiments when the system is initialized and depth maps from
other sources are not available then the processing system
initially sets the current depth map to be the static depth map.
Operation proceeds from step 1512 to step 1518.
[0109] Referring to steps along the path corresponding to step
1506. In step 1506 stereoscopic image pairs of portions of the
environment of interest, e.g., left and right eye images, are
captured using one or more stereoscopic camera pair(s). In some
embodiments the stereoscopic camera pair(s) capturing the images
are mounted on the camera rigs implemented in accordance with
various embodiments discussed above. Operation proceeds from step
1506 to step 1514. In step 1514 the captured stereoscopic image
pairs are received at the processing system. Operation proceeds
from step 1514 to step 1516. In step 1516 an environmental depth
map (e.g., composite depth map of the environment of interest) is
generated from the one or more stereoscopic image pairs. Operation
proceeds from step 1516 to step 1518.
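One standard way to obtain depth from a stereoscopic image pair, as in step 1516, is the pinhole stereo relation depth = f * B / d, where f is the focal length in pixels, B the baseline between the two cameras, and d the per-pixel disparity. The sketch below illustrates this; the function names and the assumption that a disparity map has already been computed by matching the left and right eye images are illustrative, not part of the application:

```python
def depth_from_disparity(disparity, focal_length_px, baseline_m):
    """Classic pinhole stereo relation: depth = f * B / d.
    Returns None where disparity is zero or negative (no valid match)."""
    if disparity <= 0:
        return None
    return focal_length_px * baseline_m / disparity

def depth_map_from_pairs(disparity_map, focal_length_px, baseline_m):
    """Convert a 2D disparity map (list of lists) into a depth map."""
    return [[depth_from_disparity(d, focal_length_px, baseline_m) for d in row]
            for row in disparity_map]
```

For example, with a 1000-pixel focal length and a 0.1 m baseline, a 10-pixel disparity corresponds to a depth of 10 m.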
[0110] Returning to step 1518. In step 1518 the processing system
determines if the environmental depth map generated from the one or
more stereoscopic image pairs is available (for example in some
cases when the stereoscopic camera pair(s) have not started
capturing stereoscopic images and/or the environmental depth map
has not yet been generated, the environmental depth map based on
the stereoscopic images may not be available to the processing
system). If in step 1518 it is determined that environmental depth
map generated from the one or more stereoscopic image pairs is
available the operation proceeds from step 1518 to step 1520
otherwise the operation proceeds to step 1530.
[0111] In step 1520 it is determined if a current depth map has
already been set. If it is determined that the current depth map
has not been set, the operation proceeds to step 1522 where the
processing system sets the current depth map to be the
environmental depth map generated from the one or more stereoscopic
image pairs. Operation proceeds from step 1522 to step 1530. If in
step 1520 it is determined that the current depth map has already
been set, (e.g., for example the static depth map may have been set
as the current depth map) the operation proceeds to step 1524 where
the processing system reconciles the environmental depth map
generated from the one or more stereoscopic image pairs with the
current depth map. After the reconciling operation completes, the
reconciled environmental depth map is set as the current depth map.
In various embodiments the reconciled depth map has more complete
and enhanced depth information compared to either of the two
individual depth maps used for reconciliation. Operation proceeds
from step 1524 to step 1530.
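The reconciliation of step 1524 might be sketched as a per-direction merge of two depth maps. This is one possible policy only (keep whichever map has data, average where both do); the function name and the dictionary representation, with None marking missing measurements, are assumptions:

```python
def reconcile_depth_maps(map_a, map_b):
    """Merge two depth maps (dicts of direction -> depth, None = no data).
    Where only one map has a measurement it is kept; where both do, the
    values are averaged, yielding more complete depth information than
    either input alone."""
    merged = {}
    for key in set(map_a) | set(map_b):
        a, b = map_a.get(key), map_b.get(key)
        if a is None:
            merged[key] = b
        elif b is None:
            merged[key] = a
        else:
            merged[key] = (a + b) / 2.0
    return merged
```

Weighting by source reliability (e.g., trusting light-field-derived depths over a static map) would be a natural refinement of this policy.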
[0112] Referring to steps along the path corresponding to step
1508. In step 1508 images of portions of the environment of
interest are captured using one or more light field cameras. In
some embodiments the one or more light field cameras capturing the
images are mounted on the camera rigs implemented in accordance
with various embodiments discussed above. Operation proceeds from
step 1508 to step 1526. In step 1526 the images captured by the
light field cameras are received at the processing system
optionally along with depth maps of the portions of the environment
of interest. Thus in some embodiments the one or more light field
cameras generate depth maps of portions of the environment from the
captured images and provide them to the processing system. In some
other embodiments the images captured by the light field cameras
are provided and the processing system generates depth maps of
portions of the environment of interest. Operation proceeds from
step 1526 to step 1528. In step 1528 an environmental depth map
(e.g., composite depth map of the environment of interest) is
generated from the one or more received images captured by the
light field cameras and/or from the depth maps of portions of the
environment of interest. Operation proceeds from step 1528 to step
1530.
[0113] Returning to step 1530. In step 1530 the processing system
determines if the environmental depth map, generated from the image
captured by the light field cameras or from the depth maps of one
or more portions of the environment of interest, is available to
the processing system. If in step 1530 it is determined that
environmental depth map is available the operation proceeds from
step 1530 to step 1532 otherwise the operation proceeds to step
1542 via connecting node B 1540.
[0114] In step 1532 it is determined if a current depth map has
already been set. If it is determined that the current depth map
has not been set, the operation proceeds from step 1532 to step
1534 where the processing system sets the current depth map to be
the environmental depth map generated from the one or more received
images captured by the light field cameras and/or from the depth
maps of portions of the environment of interest. Operation proceeds
from step 1534 to step 1546 via connecting node A 1538. If in step
1532 it is determined that the current depth map has already been
set, (e.g., for example the static depth and/or environmental depth
map generated from stereoscopic images and/or reconciled depth map
may have been set as the current depth map) the operation proceeds
to step 1536 where the processing system reconciles the
environmental depth map generated in step 1528 from the one or more
received images captured by the light field cameras with the
current depth map. After the reconciling operation completes, the
reconciled environmental depth map is set as the current depth map.
Operation proceeds from step 1536 to step 1546 via connecting node
A 1538.
[0115] If in step 1530 it is determined that environmental depth
map is not available the operation proceeds from step 1530 to step
1542 via connecting node B 1540. In step 1542 it is determined if a
current depth map has already been set. If it is determined that
the current depth map has not been set, the operation proceeds from
step 1542 to step 1544 where the processing system sets the current
depth map to a default depth map corresponding to a sphere since no
other environmental depth map is available to the processing
system. Operation proceeds from step 1544 to step 1546.
[0116] If in step 1542 it is determined that a current depth map
has already been set (e.g., set to one of the generated/reconciled
environmental depth maps, the static depth map, or the default
sphere environmental depth map), the operation proceeds from step
1542 to step 1546.
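The overall selection logic of flowchart 1500 (steps 1510 through 1544) amounts to a priority cascade over the available depth sources. The sketch below summarizes it; the function signature and the idea of passing the reconciliation step in as a callable are assumptions made for illustration:

```python
def select_current_depth_map(static_map, stereo_map, light_field_map,
                             default_sphere_map, reconcile):
    """Priority cascade per flowchart 1500: start from the static map when
    available, reconcile in the stereo-derived and light-field-derived
    maps as each becomes available, and fall back to a default sphere
    model when no measurement-based map exists. `reconcile` is a callable
    merging two depth maps; None marks an unavailable source."""
    current = None
    for candidate in (static_map, stereo_map, light_field_map):
        if candidate is None:
            continue
        current = candidate if current is None else reconcile(current, candidate)
    return current if current is not None else default_sphere_map
```

Each event-time update simply re-runs the cascade with whatever maps are available at that moment.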
[0117] Returning to step 1546. In step 1546 the processing system
outputs the current depth map. The current environmental depth map
may be, and in various embodiments is, provided to one or more
customer rendering and playback devices, e.g., for use in
displaying 3D imaging content. The environmental depth map may be
generated multiple times during an event, e.g., a game and/or other
performance, as things may change dynamically during the event in
ways which impact the environment of interest; thus updating the
environmental depth map to keep it current is useful if the system
is to provide information and imaging content which can be used
to provide a real life 3D experience to the viewers. It should be
appreciated that the method discussed with regard to flowchart 1500
allows for generating an enhanced and improved environmental depth
map based on depth information from multiple sources, e.g., static
depth maps, depth maps generated using images captured by one or
more stereoscopic camera pairs and/or depth maps generated using
images captured by one or more light field cameras.
[0118] FIGS. 14A and 14B in combination illustrate a method of
generating and updating 3D mesh models and UV maps in accordance
with an exemplary embodiment that is well suited for use with the
method shown in FIGS. 13A and 13B.
[0119] FIGS. 15C and 15D in combination, illustrate a flowchart
1550 of a method of generating and updating 3D mesh models and UV
maps in accordance with an exemplary embodiment that is well suited
for use with the method shown in FIGS. 15A and 15B. In accordance
with one aspect of some embodiments, the generation, transmission
and updating of the 3D mesh model and UV map may be triggered by
detection of significant changes to environmental depth information
obtained from one or more depth measurement sources, e.g., the
light field camera outputs and/or stereoscopic camera pair output.
In some embodiments various steps of the method of flowchart 1550
are performed by the processing system 1408 of system 1400. The
method starts in step 1552 and proceeds to 1554. In step 1554 a
current environmental depth map, e.g., a first environmental depth
map, is received (e.g., selected from the environmental depth maps
generated by the processing system using input from one or more
depth measurement sources).
[0120] Operation proceeds from step 1554 to 1556. In step 1556 a
first 3D mesh model is generated from the current environmental
depth map. Operation proceeds from step 1556 to 1558. In step 1558
a first UV map to be used for wrapping frames (e.g., frames of
images) onto the first 3D mesh model is generated. Operation
proceeds from step 1558 to 1560 wherein the first 3D mesh model and
the first UV map is communicated, e.g., transmitted, to a playback
device.
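Steps 1556 and 1558 above, generating a 3D mesh model from a depth map together with a UV map for wrapping frames onto it, can be sketched as follows. The function name, the equirectangular layout of the depth samples (rows = inclination, columns = azimuth), and the particular UV convention are assumptions for illustration, not the application's specified method:

```python
import math

def mesh_and_uv_from_depth(depth_grid):
    """Build 3D mesh vertices from a grid of depth samples laid out over
    the sphere, plus a UV map giving each vertex a texture coordinate in
    [0, 1] x [0, 1]. depth_grid is a 2D list of depths in meters."""
    rows, cols = len(depth_grid), len(depth_grid[0])
    vertices, uv_map = [], []
    for r in range(rows):
        for c in range(cols):
            theta = 2 * math.pi * c / (cols - 1)  # azimuth angle
            phi = math.pi * r / (rows - 1)        # inclination angle
            d = depth_grid[r][c]                  # measured depth in this direction
            # Spherical-to-Cartesian conversion scaled by measured depth.
            vertices.append((d * math.sin(phi) * math.cos(theta),
                             d * math.sin(phi) * math.sin(theta),
                             d * math.cos(phi)))
            # Column maps to u, row maps to v (top row of the frame -> v = 1).
            uv_map.append((c / (cols - 1), 1.0 - r / (rows - 1)))
    return vertices, uv_map
```

With a constant depth grid this reduces to the default sphere model mentioned in step 1544; measured depths deform the sphere toward the true environment geometry.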
[0121] Operation proceeds from step 1560 to step 1562. In step 1562
the processing system initializes a current 3D mesh model and UV
map to the first 3D mesh model and the first UV map respectively,
e.g., by setting the current 3D mesh model as the first 3D mesh
model and current UV map as the first UV map. Operation proceeds
from step 1562 to step 1564. In step 1564 the processing system
receives current environmental depth map, e.g., a new environmental
depth map.
[0122] Operation proceeds from step 1564 to step 1566 where it is
determined whether the current environmental depth map reflects a
significant environmental change from the environmental depth map
used to generate the current 3D mesh model. In some embodiments,
the system processing the depth information monitors the depth
information to detect a significant change in the depth
information, e.g., a change in depth information over a
predetermined amount. In some embodiments detection of such a
significant change triggers updating of the current mesh model
and/or UV map. Thus if in step 1566 it is determined that a
significant environmental change is detected between the current
environmental depth map and the environmental depth map used to
generate the current 3D mesh model, the operation proceeds to step
1568 otherwise the operation proceeds back to step 1564.
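The significant-change test of step 1566 might be sketched as follows. The application only specifies "a change in depth information over a predetermined amount"; the particular criterion below (a per-direction threshold in meters plus a minimum fraction of changed directions), along with the function name and default values, is an assumption:

```python
def significant_change(current_map, new_map, threshold_m=0.5, fraction=0.05):
    """Trigger mesh/UV regeneration when the depth at more than `fraction`
    of the commonly sampled directions changes by more than `threshold_m`
    meters. Maps are dicts of direction key -> depth."""
    keys = set(current_map) & set(new_map)
    if not keys:
        return False
    changed = sum(1 for k in keys
                  if abs(current_map[k] - new_map[k]) > threshold_m)
    return changed / len(keys) > fraction
```

Gating mesh regeneration this way avoids retransmitting models for minor measurement noise while still reacting to real changes in the environment during an event.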
[0123] Following the determination that a significant environmental
change is detected, in step 1568 the processing system generates an
updated 3D mesh model from the new current environmental depth map.
Operation proceeds from step 1568 to step 1570. In step 1570 an
updated UV map to be used for wrapping frames onto the updated 3D
mesh model is generated.
[0124] Operation proceeds from step 1570 to step 1574 via
connecting node M 1572. In step 1574 3D mesh model difference
information is generated. In various embodiments the 3D mesh model
difference information includes information reflecting the
difference between the new updated 3D mesh model and the currently
used 3D mesh model, e.g., the first 3D mesh model. In some cases
communicating the difference information to a playback device is
more efficient than communicating the entire updated 3D mesh
model. In such cases, by using the received difference information
the playback device can, and in various embodiments does, update
its current 3D mesh model to generate an updated mesh model. While
the 3D mesh model difference information is generated in some
embodiments, e.g., where it is determined that it is more
convenient and/or efficient to send difference information rather
than the entire updated mesh model, step 1574 is optional and not
necessarily performed in all embodiments. Operation proceeds from
step 1574 to step 1576. In step 1576, which is optional too, UV map
difference information is generated, where the UV map difference
information reflects the difference between the new updated UV map
and the currently used UV map, e.g., first UV map.
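The difference-information generation of steps 1574 and 1576 can be sketched as follows. This is a minimal illustration which assumes a mesh represented as a dictionary mapping node ids to (X, Y, Z) coordinate triples; the function name and data layout are illustrative assumptions, not the format of any particular embodiment:

```python
# Illustrative sketch of steps 1574/1576: compute difference information
# between the currently used mesh and the updated mesh so that only the
# changes need be communicated to a playback device.

def generate_mesh_difference(current_mesh, updated_mesh):
    """Each mesh is a dict mapping node id -> (x, y, z) coordinate triple.
    Returns the nodes that were added or moved, and the ids of removed nodes."""
    diff = {"changed": {}, "removed": []}
    for node_id, coord in updated_mesh.items():
        if current_mesh.get(node_id) != coord:
            diff["changed"][node_id] = coord      # node added or moved
    for node_id in current_mesh:
        if node_id not in updated_mesh:
            diff["removed"].append(node_id)       # node no longer in model
    return diff

current = {0: (0.0, 0.0, 5.0), 1: (1.0, 0.0, 5.0), 2: (0.0, 1.0, 5.0)}
updated = {0: (0.0, 0.0, 5.0), 1: (1.0, 0.0, 4.0), 3: (1.0, 1.0, 5.0)}
diff = generate_mesh_difference(current, updated)
# diff describes one moved node (1), one new node (3) and one removed node (2).
```

When only a few nodes move, the difference structure is much smaller than the full updated model, which is the efficiency argument made above.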
[0125] Operation proceeds from step 1576 to step 1578. In step 1578
the processing system communicates updated 3D mesh model
information, e.g., the generated updated 3D mesh model or the mesh
model difference information, to a playback device. Operation
proceeds from step 1578 to step 1580. In step 1580 the processing
system communicates updated UV map information, e.g., the generated
updated UV map or the UV map difference information, to a playback
device.
[0126] Operation proceeds from step 1580 to step 1582. In step 1582
the processing system sets the current 3D mesh model to be the
updated 3D mesh model. Operation proceeds from step 1582 to step
1584. In step 1584 the processing system sets the current UV map to
be the updated UV map. It should be appreciated that the updated
mesh model and UV map are based on current depth measurements,
making the new mesh model and/or UV map more accurate than older
mesh models and/or maps based on depth measurements taken at a
different time. Operation proceeds from step 1584 back to 1564 via connecting
node N 1585 and the process continues in the manner as discussed
above.
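The overall monitoring loop of steps 1564 through 1584 can be summarized as follows; the callables below are hypothetical stand-ins for the depth-map source, the change test, the model/UV generators and the communication step described above:

```python
# Illustrative sketch of the loop of steps 1564-1584: receive depth maps,
# regenerate the 3D mesh model and UV map on a significant change, and
# communicate the results. All callables are hypothetical stand-ins.

def run_update_loop(depth_maps, changed, build_mesh, build_uv, send):
    current_mesh = current_uv = None
    reference_depth = None                        # depth map behind current model
    for depth_map in depth_maps:                  # step 1564: receive depth map
        if reference_depth is not None and not changed(reference_depth, depth_map):
            continue                              # step 1566: no significant change
        current_mesh = build_mesh(depth_map)      # step 1568: updated mesh model
        current_uv = build_uv(current_mesh)       # step 1570: updated UV map
        send(current_mesh, current_uv)            # steps 1578/1580: communicate
        reference_depth = depth_map               # steps 1582/1584: set current
    return current_mesh, current_uv

# Toy usage: depths are single numbers; a change of more than 1 is "significant".
sent = []
run_update_loop([10.0, 10.5, 12.0],
                changed=lambda old, new: abs(old - new) > 1,
                build_mesh=lambda d: ("mesh", d),
                build_uv=lambda m: ("uv", m[1]),
                send=lambda m, uv: sent.append((m, uv)))
# sent holds two updates: for 10.0 (initial) and 12.0; 10.5 was not significant.
```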
[0127] FIG. 16 illustrates an exemplary light field camera 1600
implemented in accordance with one exemplary embodiment of the
present invention which can be used in any of the camera rigs
discussed above and shown in the preceding figures. The exemplary
camera device 1600 includes a display device 1602, an input device
1604, an I/O interface 1606, a processor 1608, memory 1610, and a
bus 1609 which are mounted in a housing represented by the
rectangular box touched by the line leading to reference number
1600. The camera device 1600 further includes an optical chain 1612
and a network interface 1614. The various components are coupled
together via bus 1609 which allows for signals and information to
be communicated between the components of the camera 1600.
[0128] The display device 1602 may be, and in some embodiments is,
a touch screen, used to display images, video, information
regarding the configuration of the camera device, and/or status of
data processing being performed on the camera device. In the case
where the display device 1602 is a touch screen, the display device
1602 serves as an additional input device and/or as an alternative
to the separate input device, e.g., buttons, 1604. The input device
1604 may be, and in some embodiments is, e.g., a keypad, touch
screen, or similar device that may be used for inputting
information, data and/or instructions.
[0129] Via the I/O interface 1606 the camera device 1600 may be
coupled to external devices and exchange information and signaling
with such external devices. In some embodiments, via the I/O
interface 1606, the camera 1600 may, and in some embodiments does,
interface with the processing system 1408. In some such
embodiments the processing system 1408 can be used to configure
and/or control the camera 1600.
[0130] The network interface 1614 allows the camera device 1600 to
receive and/or communicate information to an external device over
a communications network. In some embodiments, via the network
interface 1614, the camera 1600 communicates captured images
and/or generated depth maps to other devices and/or systems over a
communications network, e.g., the Internet and/or another network.
[0131] The optical chain 1612 includes a micro lens array 1624 and
an image sensor 1626. The camera 1600 uses the micro lens array
1624 to capture light information of a scene of interest coming
from more than one direction when an image capture operation is
performed by the camera 1600.
[0132] The memory 1610 includes various modules and routines
which, when executed by the processor 1608, control the operation
of the camera 1600 in accordance with the invention. The memory
1610 includes control routines 1620 and data/information 1622. The
processor 1608, e.g., a CPU, executes the control routines and
uses data/information 1622 to control the camera 1600 to operate
in accordance with the invention and implement one or more steps
of the method of flowchart 1500. In some embodiments the processor
1608 includes an on-chip depth map generation circuit 1607 which
generates depth maps of various portions of the environment of
interest from captured images corresponding to these portions of
the environment of interest which are captured during the
operation of the camera 1600. In some other embodiments the camera
1600 provides captured images 1628 to the processing system 1408
which generates depth maps using the images captured by the light
field camera 1600. The depth maps of various portions of the
environment of interest generated by the camera 1600 are stored in
the memory 1610 as depth maps 1630 while images corresponding to
one or more portions of the environment of interest are stored as
captured image(s) 1628. The captured images and depth maps are
stored in memory 1610 for future use, e.g., additional processing
and/or transmission to another device. In various embodiments the
depth maps 1630 generated by the camera 1600 and one or more
captured images 1628 of portions of the environment of interest
captured by the camera 1600 are provided to the processing system
1408, e.g., via interface 1606 and/or 1614, for further processing
and actions in accordance with the features of the invention. In
some embodiments the depth maps and/or captured images are
provided, e.g., communicated by the camera 1600, to one or more
customer devices.
[0133] FIG. 17 illustrates an exemplary processing system 1700 in
accordance with the features of the invention. The processing
system 1700 can be used to implement one or more steps of the
method of flowchart 1500. The processing system 1700 includes
multi-rate encoding capability that can be used to encode and
stream stereoscopic imaging content. The exemplary processing
system 1700 may be used as the processing system 1408 of system
1400.
[0134] The processing system 1700 may be, and in some embodiments
is, used to perform composite environmental depth map generation,
multi-rate encoding, storage, and transmission and/or content
output operations in accordance with the features of the
invention. The processing system 1700 may also include the ability
to decode and display processed and/or encoded image data, e.g., to
an operator.
[0135] The system 1700 includes a display 1702, input device 1704,
input/output (I/O) interface 1706, a processor 1708, network
interface 1710 and a memory 1712. The various components of the
system 1700 are coupled together via bus 1709 which allows for data
to be communicated between the components of the system 1700.
[0136] The memory 1712 includes various routines and modules which
when executed by the processor 1708 control the system 1700 to
implement the composite environmental depth map generation,
environmental depth map reconciling, encoding, storage, and
streaming/transmission and/or output operations in accordance with
the invention.
[0137] The display device 1702 may be, and in some embodiments is,
a touch screen, used to display images, video, information
regarding the configuration of the processing system 1700, and/or
to indicate the status of the processing being performed on the
processing device. In the case where the display device 1702 is a
touch screen, the display device 1702 serves as an additional
input device and/or as an alternative to the separate input
device, e.g., buttons, 1704. The input device 1704 may be, and in
some embodiments is, e.g., a keypad, touch screen, or similar
device that may be used for inputting information, data and/or
instructions.
[0138] Via the I/O interface 1706 the processing system 1700 may be
coupled to external devices and exchange information and signaling
with such external devices, e.g., such as the camera rig 801 and/or
other camera rigs shown in the figures and/or light field camera
1600. The I/O interface 1706 includes a transmitter and a receiver.
In some embodiments via the I/O interface 1706 the processing
system 1700 receives images captured by various cameras, e.g.,
stereoscopic camera pairs and/or light field cameras (e.g., camera
1600), which may be part of a camera rig such as camera rig
801.
[0139] The network interface 1710 allows the processing system
1700 to receive and/or communicate information to an external
device over a communications network, e.g., such as communications
network 105. The network interface 1710 includes a multiport
broadcast transmitter 1740 and a receiver 1742. The multiport
broadcast transmitter 1740 allows the processing system 1700 to
broadcast multiple encoded stereoscopic data streams each
supporting different bit rates to various customer devices. In some
embodiments the processing system 1700 transmits different portions
of a scene, e.g., 180 degree front portion, left rear portion,
right rear portion etc., to customer devices via the multiport
broadcast transmitter 1740. Furthermore, in some embodiments via
the multiport broadcast transmitter 1740 the processing system 1700
also broadcasts a current environmental depth map to the one or
more customer devices. While the multiport broadcast transmitter
1740 is used in the network interface 1710 in some embodiments, in
other embodiments the processing system transmits, e.g., unicasts,
the environmental depth map, 3D mesh model, UV map, and/or
stereoscopic imaging content to individual customer
devices.
[0140] The memory 1712 includes various modules and routines, which
when executed by the processor 1708 control the operation of the
system 1700 in accordance with the invention. The processor 1708,
e.g., a CPU, executes control routines and uses data/information
stored in memory 1712 to control the system 1700 to operate in
accordance with the invention and implement one or more steps of
the methods of the flowcharts of FIGS. 13 and 14. The memory 1712
includes control routines 1714, image encoder(s) 1716, a depth map
availability determination module 1717, a composite depth map
generation module 1718, a current depth map determination module
1719, streaming controller 1720, an image generation module 1721, a
depth map reconciliation module 1722, a 3D mesh model generation
and update module 1740, a UV map generation and update module 1742,
received images 1723 of environment of interest captured by one or
more light field cameras, optional received depth maps of the
environment of interest 1725, received stereoscopic image data
1724, encoded stereoscopic image data 1728, acquired static depth
map 1730, environmental depth map generated from stereoscopic image
pairs 1732, environmental depth map generated from images captured
by one or more light field cameras 1734, reconciled environmental
depth map(s) 1736, a default depth map corresponding to a sphere
1738, generated 3D mesh model(s) 1744, generated UV map(s) 1746,
current 3D mesh model 1748, current UV map 1750.
[0141] In some embodiments the modules are implemented as software
modules. In other embodiments the modules are implemented outside
the memory 1712 in hardware, e.g., as individual circuits with each
module being implemented as a circuit for performing the function
to which the module corresponds. In still other embodiments the
modules are implemented using a combination of software and
hardware. In the embodiments where one or more modules are
implemented as software modules or routines, the modules and/or
routines are executed by the processor 1708 to control the system
1700 to operate in accordance with the invention and implement one
or more operations discussed with regard to flowcharts 1500 and/or
1550.
[0142] The control routines 1714 include device control routines
and communications routines to control the operation of the
processing system 1700. The encoder(s) 1716 may, and in some
embodiments do, include a plurality of encoders configured to
encode received image content, stereoscopic images of a scene
and/or one or more scene portions in accordance with the features
of the invention. In some embodiments encoder(s) include multiple
encoders with each encoder being configured to encode a
stereoscopic scene and/or partitioned scene portions to support a
given bit rate stream. Thus in some embodiments each scene portion
can be encoded using multiple encoders to support multiple
different bit rate streams for each scene. An output of the
encoder(s) 1716 is the encoded stereoscopic image data 1728 stored
in the memory for streaming to customer devices, e.g., playback
devices. The encoded content can be streamed to one or multiple
different devices via the network interface 1710.
[0143] The composite depth map generation module 1718 is
configured to generate a composite environmental depth map of the
environment of interest from the images captured by various
cameras, e.g., stereoscopic camera pairs and one or more light
field cameras. Thus the composite depth map generation module 1718
generates the environmental depth map 1732 from stereoscopic image
pairs and the environmental depth map 1734 from images captured by
one or more light field cameras.
[0144] The depth map availability determination module 1717 is
configured to determine whether a given depth map is available at a
given time, e.g., whether a static depth map is available and/or
whether an environmental depth map generated from images captured
by light field cameras is available and/or whether an environmental
depth map generated from images captured by stereoscopic camera
pairs is available, at given times.
[0145] The current depth map determination module 1719 is
configured to determine if a current depth map has been set. In
various embodiments the current depth map determination module 1719
is further configured to set one of the environmental depth map or
a reconciled depth map as the current depth map in accordance with
the features of the invention. For example when a reconciled
environmental depth map is available, e.g., having been generated
by reconciling environmental depth maps generated from two or more
sources, the current depth map determination module 1719 sets the
reconciled environmental depth map as the current depth map.
[0146] The streaming controller 1720 is configured to control
streaming of encoded content for delivering the encoded image
content (e.g., at least a portion of encoded stereoscopic image
data 1728) to one or more customer playback devices, e.g., over the
communications network 105. In various embodiments the streaming
controller 1720 is further configured to communicate, e.g.,
transmit, an environmental depth map that has been set as the
current depth map to one or more customer playback devices, e.g.,
via the network interface 1710.
[0147] The image generation module 1721 is configured to generate a
first image from at least one image captured by the light field
camera, e.g., received images 1723, the generated first image
including a portion of the environment of interest which is not
included in at least some of the stereoscopic images (e.g.,
stereoscopic image content 1724) captured by the stereoscopic
cameras. In some embodiments the streaming controller 1720 is
further configured to transmit at least a portion of the generated
first image to one or more customer playback devices, e.g., via the
network interface 1710.
[0148] The depth map reconciliation module 1722 is configured to
perform depth map reconciling operations in accordance with the
invention, e.g., by implementing the functions corresponding to
steps 1526 and 1536 of flowchart 1500. The 3D mesh model generation
and update module 1740 is configured to generate a 3D mesh model
from a current environmental depth map (e.g., reconciled depth map
or environmental depth map that has been set as the current
environmental depth map). The module 1740 is further configured to
update the 3D mesh model when significant environmental changes
have been detected in a current environmental depth map compared to
the environmental depth map used to generate the current 3D mesh
model. In some embodiments the generated 3D mesh model(s) 1744 may
include one or more 3D mesh models generated by module 1740 and the
most recently updated 3D mesh model in the 3D mesh model(s) 1744 is
set as the current 3D mesh model 1748. The UV map generation and
update module 1742 is configured to generate a UV map to be used in
wrapping frames onto the generated 3D mesh model. The module 1742
is further configured to update the UV map. The generated UV map(s)
1746 may include one or more UV maps generated by module 1742 and
the most recently updated UV map in the generated UV map(s) 1746 is
set as the current UV map 1750. In some embodiments the modules are
configured to perform the functions corresponding to various steps
discussed in FIGS. 14A and 14B.
[0149] Received stereoscopic image data 1724 includes stereoscopic
image pairs received from one or more stereoscopic cameras, e.g.,
such as those included in the rig 801. Encoded
stereoscopic image data 1728 includes a plurality of sets of
stereoscopic image data which have been encoded by the encoder(s)
1716 to support multiple different bit rate streams.
[0150] The static depth map 1730 is the acquired, e.g., downloaded,
depth map of the environment of interest. The environmental depth
map generated from images captured by stereoscopic camera pairs
1732 and the environmental depth map generated from images captured
by one or more light field cameras 1734 are outputs of the
composite environmental depth map generation module 1718. The
reconciled environmental depth map(s) 1736 includes one or more
environmental depth maps generated by the reconciliation module
1722 in accordance with the invention. The default depth map
corresponding to a sphere 1738 is also stored in memory 1712 for
use in the event when an environmental depth map is not available
from other sources, e.g., when none of the static depth map 1730,
environmental depth map 1732 and environmental depth map 1734 is
available for use. In some embodiments the reconciled
environmental depth map(s) 1736 is set as the current environmental
depth map and used in generating 3D mesh models.
[0151] In some embodiments generation, transmission and updating of
the 3D mesh model and UV map may be triggered by detection of
significant changes to environmental depth information obtained
from one or more depth measurement sources, e.g., the light field
camera outputs and/or stereoscopic camera pair output. See for
example FIGS. 14A and 14B which in combination show a 3D model
updating process. In some embodiments, the system processing the
depth information monitors the depth information to detect a
significant change, e.g., a change in depth of more than a
predetermined amount, such as more than 20% of the originally
measured distance to the perimeter of the environment, over an
area corresponding to a portion of the environment exceeding a
predetermined threshold size, e.g., 5%, 10%, 20% or some other
amount. In response to detecting such a change, a new model and/or
UV map is generated and transmitted to the playback devices. The
new map is based on current depth measurements, making the new
mesh model and/or map more accurate than the old mesh model and/or
map based on depth measurements taken at a different time. Since, in
some embodiments the depth measurements are made during an event on
an ongoing basis and/or are based on environmental measurement made
from images (light field and/or stereoscopic image pairs) captured
during an event, 3D models can be generated in response to changes
in the environment, e.g., changes representing a significant change
in distance from the camera position from which images used as
textures are captured to an object or edge of the environment,
e.g., a wall, roof, curtain, etc. or changes in overall volume,
e.g., due to a roof retracting, a wall moving, etc.
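The change-detection rule described above, e.g., a depth change of more than 20% of the originally measured distance over more than a threshold fraction of the measured area, can be sketched as follows; the array layout, function name and default thresholds are illustrative assumptions:

```python
import numpy as np

# Illustrative change test: flag a significant environmental change when
# more than `area_threshold` of the depth samples differ from the reference
# by more than `depth_threshold` of the originally measured distance.
# The defaults (20% depth change, 5% area) are example values only.

def significant_change(old_depths, new_depths,
                       depth_threshold=0.20, area_threshold=0.05):
    """old_depths/new_depths: same-shape arrays of per-sample distances."""
    relative_change = np.abs(new_depths - old_depths) / old_depths
    changed_fraction = np.mean(relative_change > depth_threshold)
    return bool(changed_fraction > area_threshold)

old = np.full((100, 100), 30.0)   # e.g., 30 m to the environment perimeter
new = old.copy()
new[:20, :] = 60.0                # e.g., a roof retracting over 20% of the area
# significant_change(old, new) is True; a uniform 5% drift would not trigger.
```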
[0152] A complete new 3D model or model difference information may
be, and in some embodiments is, transmitted to the playback device
as updated model information. In addition to the generation and
transmission of updated 3D model information, updated UV map
information may be, and in some embodiments is, generated and
transmitted to the playback device to be used when rendering images
using the updated 3D model information. Mesh model and/or UV map
updates are normally timed to coincide with scene changes and/or to
align with group of picture (GOP) boundaries in a transmitted image
stream. In this way, application of the new model and/or map will
normally begin being applied in the playback device at a point
where decoding of a current frame does not depend on a frame or
image which was to be rendered using the older model or map since
each GOP boundary normally coincides with the sending of
intra-frame coded image data. Since the environmental changes will
frequently coincide with scene changes such as the closing of a
curtain, moving of a wall, etc. the scene change point is a
convenient point to implement the new model and in many cases will
coincide with the event that triggered the generation and
transmission of the updated model information and/or updated UV
map.
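Timing a model and/or UV map switch-over to the next group of pictures (GOP) boundary, as described above, amounts to a simple frame-index computation; the fixed GOP size and frame numbering below are illustrative assumptions:

```python
# Illustrative computation of the frame at which a new mesh model / UV map
# should take effect: the next group-of-pictures (GOP) boundary, so the
# switch lands on intra-coded data that does not depend on earlier frames.

def activation_frame(update_ready_frame, gop_size):
    """First GOP-boundary frame index at or after update_ready_frame."""
    remainder = update_ready_frame % gop_size
    if remainder == 0:
        return update_ready_frame                 # already on a boundary
    return update_ready_frame + (gop_size - remainder)

# With 30-frame GOPs, an update ready at frame 37 is applied at frame 60.
```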
[0153] FIG. 18 illustrates the steps of a method 1800 of operating
a playback device in one exemplary embodiment. In some embodiments
the playback and rendering system 1900 is used to implement the
steps of the method of flowchart 1800. In the FIG. 18 exemplary
embodiment the playback device receives information, e.g., 3D model
information and a UV map and then at a later time, e.g., while an
event is on-going in the case of live streaming, receives updated
model and/or UV map information reflecting changes to the 3D
environment being modeled. For example a stage change and/or
intermission event may have environment changes associated with it
which may be reflected in the new model information and/or UV map.
In some embodiments the updated model information is communicated
to, and received by, the playback device as difference
information, with the playback device using the received
information indicating changes from the original model in
combination with the original model information, e.g., the
original set of node coordinates in X, Y, Z space defining the
mesh, to produce the updated mesh model, e.g., by replacing some
coordinates in the set of coordinates defining the first mesh
model with coordinates in the updated mesh model
information to create an updated mesh model. While model difference
information is received and used to create the updated mesh model
in some embodiments, in other embodiments or in cases where there
are changes to the majority of a previously supplied model, a
complete new mesh model may be, and sometimes is, received as part
of the updated mesh model information by the playback device. The
mesh model update information may be based on depth measurements,
e.g., environmental distance measurements based on light field
camera and/or stereoscopic image data captured during the
event.
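The playback-side application of received difference information, replacing some coordinates in the set defining the first mesh model, can be sketched as follows; the node-id-keyed mesh representation and the layout of the difference information are illustrative assumptions:

```python
# Illustrative playback-side update: apply received difference information
# to the stored first mesh model to produce the updated mesh model.

def apply_mesh_difference(first_mesh, difference):
    """first_mesh: dict node id -> (x, y, z). difference: replacement
    coordinates under "changed", deleted node ids under "removed"."""
    updated = dict(first_mesh)                    # start from the first model
    updated.update(difference.get("changed", {})) # replace / add coordinates
    for node_id in difference.get("removed", []):
        updated.pop(node_id, None)                # drop deleted nodes
    return updated

first = {0: (0.0, 0.0, 5.0), 1: (1.0, 0.0, 5.0), 2: (0.0, 1.0, 5.0)}
difference = {"changed": {1: (1.0, 0.0, 4.0), 3: (1.0, 1.0, 5.0)},
              "removed": [2]}
updated = apply_mesh_difference(first, difference)
# updated now holds nodes 0, 1 (moved) and 3; node 2 is gone.
```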
[0154] In addition to receiving an updated mesh model, in many cases
the playback device receives a corresponding UV map to be used to
map images, e.g., frames, to the 3D space, e.g., onto a 3D mesh
model defining the 3D environmental space. The frames may be, and
sometimes are, generated from image data captured by one or more
stereoscopic camera pairs mounted on a camera rig which also
includes one or more light field cameras, e.g., Lytro cameras, used
to capture depth information useful in updating a 3D map. While new
or updated UV map information is often received when updated mesh
model information is received, if the number of nodes in the 3D
mesh model remains the same before and after an update, the UV map
may not be updated at the same time as the 3D mesh model. UV map
information may be transmitted and received as a complete new map
or as difference information. Thus, in some embodiments UV map
difference information is received and processed to generate an
updated UV map. The updated UV map may be, and sometimes is,
generated by applying the differences indicated in the updated UV
map information to the previous UV map.
[0155] The method of flowchart 1800 begins in start step 1802 with
a playback device such as a game console and display or head
mounted display assembly being powered on and set to begin
receiving, storing and processing 3D related image data, e.g.,
frames representing texture information produced from captured
images, model information and/or UV maps to be used in rendering
images. Operation proceeds from start step 1802 to step 1804 in
which information communicating a first mesh model of a 3D
environment, e.g., a stadium, theater, etc., generated based on
measurements of at least a portion of the environment made using a
light field camera at a first time is received and stored, e.g., in
memory. The model may be, and sometimes is, in the form of a set of
3D coordinates (X, Y, Z) indicating distances to nodes from an
origin corresponding to a user viewing position. The node
coordinates define a mesh model. Thus in some embodiments the first
mesh model information includes a first set of coordinate triples,
each triple indicating a coordinate in X, Y, Z space of a node in
the first mesh model.
[0156] The mesh model includes segments formed by the
interconnection of the node points in an indicated or
predetermined manner. For example, each node in all or a portion of
the mesh may be coupled to the nearest 3 adjacent nodes for
portions of the mesh model where 3 sided segments are used. In
portions of the mesh model where four sided segments are used, each
node may be known to interconnect with its four nearest neighbors.
In addition to node location information, the model may, and in
some embodiments does, include information about how nodes in the
model are to be interconnected. In some embodiments information
communicating the first mesh model of the 3D environment includes
information defining a complete mesh model.
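A mesh model of the kind described above, node coordinate triples plus interconnection information, might be represented as follows; the concrete layout is an illustrative assumption, not the claimed format:

```python
# Illustrative mesh model layout: a set of node coordinate triples plus
# interconnection information naming the nodes of each segment.

# Four nodes (X, Y, Z distances from an origin at the viewer position).
nodes = [
    (0.0, 0.0, 5.0),   # node 0
    (1.0, 0.0, 5.0),   # node 1
    (1.0, 1.0, 5.0),   # node 2
    (0.0, 1.0, 5.0),   # node 3
]

# One four-sided segment formed by interconnecting the four nodes.
quad_segments = [(0, 1, 2, 3)]

# The same region expressed as two three-sided segments sharing a diagonal.
tri_segments = [(0, 1, 2), (0, 2, 3)]
```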
[0157] Operation proceeds from step 1804 to step 1806 in which a
first map, e.g., a first UV map, indicating how a 2D image, e.g.,
received frame, is to be wrapped onto the first 3D model is
received. The first UV map usually includes one segment for each
segment of the 3D model, with there being a one-to-one indicated
or otherwise known correspondence of the first UV map segments
to the first 3D model segments. The first UV map can be, and is,
used as part of the image rendering process to apply, e.g., wrap, the
content of 2D frames which correspond to what is sometimes referred
to as UV space to the segments of the first 3D model. This mapping
of the received textures in the form of frames corresponding to
captured image data to the 3D environment represented by the
segments of the 3D model allows received left and right eye frames
corresponding to stereoscopic image pairs to be rendered into
images which are to be viewed by the user's left and right eyes,
respectively.
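The wrapping of a received 2D frame onto a 3D segment via its corresponding UV segment can be sketched as a texture lookup; the nearest-texel sampling and the data layout below are illustrative simplifications of what a real renderer would do:

```python
# Illustrative texture lookup: each 3D segment has a matching UV-space
# segment; UV coordinates in [0, 1] select texels of the received 2D frame.
# Nearest-texel sampling is a simplification (real renderers interpolate).

def sample_frame(frame, u, v):
    """Return the texel of `frame` (rows x cols) at UV coordinates (u, v)."""
    rows, cols = len(frame), len(frame[0])
    col = min(int(u * cols), cols - 1)
    row = min(int(v * rows), rows - 1)
    return frame[row][col]

# A 3D triangle and its corresponding UV triangle (one-to-one mapping).
triangle_3d = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0)]
triangle_uv = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]

frame = [[10, 20], [30, 40]]      # a tiny 2x2 "frame" of texel values
corner_texels = [sample_frame(frame, u, v) for (u, v) in triangle_uv]
# corner_texels == [10, 20, 30]
```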
[0158] The receipt of the first 3D model and first rendering map,
e.g., a first UV map, can occur together or in any order and is
shown as sequential operations in FIG. 18 for purposes of providing
a simple to understand example. Operation proceeds from step 1806
to step 1808 in which image content, e.g., one or more frames, is
received. The image content may be, and in some embodiments is, in
the form of stereoscopic image data where pairs of left and right
eye images are received with the content for each eye sometimes
being represented as a single frame of a stereoscopic frame pair.
The image content received in step 1808 will normally be a sequence
of frame pairs, e.g., a video sequence, corresponding to a portion
of an event.
[0159] Operation proceeds from step 1808 to step 1810 in which at
least one image is rendered using the first mesh model. As part of
the image rendering performed in step 1810, the first UV map is
used to determine how to wrap an image included in the received
image content on to the first mesh model to generate an image which
can be displayed and viewed by a user. Each of the left and right
eye images of a stereoscopic pair will be, in some embodiments,
rendered individually and may be displayed on different portions of
a display so that different images are viewed by the left and right
eyes allowing for images to be perceived by the user as having a 3D
effect. The rendered images are normally displayed to the user
after rendering, e.g., via a display device which in some
embodiments is a cell phone display mounted in a helmet which can
be worn on a person's head, e.g., as a head mounted display
device.
[0160] While multiple images may be rendered and displayed over
time as part of step 1810, at some point during the event being
captured and streamed for playback, a change in the environment may
occur such as a curtain being lowered, a wall of a stage being
moved, a dome on a stadium being opened or closed. Such events may,
and in various embodiments will, be detected by environmental
measurements being performed. In response to detecting a change in
the environment, a new 3D mesh model and UV map may be generated by
the system processing the captured images and/or environmental
measurements.
[0161] In step 1814, updated mesh model information is received.
The updated mesh model information, in some cases, includes new
node points generated based on measurement of a portion of the
environment. The measurements may correspond to
of a portion of the environment. The measurements may correspond to
the same portion of the environment to which the earlier
measurements for the first mesh model correspond and/or the new
measurements may at include measurements of the portion of the
environment. Such measurements maybe, and sometimes are, based on
environmental depth measurements relative to the camera rig
position obtained using a light field camera, e.g., such as the
ones illustrated in the preceding figures. In some embodiments
updated mesh model information including at least some updated mesh
model information generated based on measurements of at least the
portion said environment using said light field camera at a second
time, e.g., a time period after the first time period.
[0162] The updated mesh model information received in step 1814 may
be in the form of a complete updated mesh model or in the form of
difference information indicating changes to be made to the first
mesh model to form the updated mesh model. Thus in some embodiments
updated mesh model information is difference information indicating
a difference between said first mesh model and an updated mesh
model. In optional step 1815 which is performed when model
difference information is received, the playback device generates
the updated mesh model from the first mesh model and the received
difference information. For example, in step 1815 nodes not
included in the updated mesh model may be deleted from the set of
information representing the first mesh model and replaced with new
nodes indicated by the mesh model update information that was
received to thereby create the updated mesh model. Thus in some
embodiments the updated mesh model information includes information
indicating changes to be made to the first mesh model to generate
an updated mesh model. In some embodiments the updated mesh model
information provides new mesh information for portions of the 3D
environment which have changed between the first and second time
periods. In some embodiments the updated mesh model information
includes at least one of: i) new sets of mesh coordinates for at
least some nodes in the first mesh model information, the new
coordinates being intended to replace coordinates of corresponding
nodes in the first mesh model; or ii) a new set of coordinate
triples to be used for at least a portion of the mesh model in
place of a previous set of coordinate triples, the new set of
coordinate triples including the same or a different number of
coordinate triples than the previous set of coordinate triples to
be replaced.
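The node-replacement update described above can be sketched as follows. This is an illustrative Python sketch rather than the patented implementation; in particular, representing a mesh model as a dictionary mapping node IDs to [X, Y, Z] coordinate triples, and the 'deleted_nodes'/'new_nodes' field names, are assumptions made for this example.

```python
# Illustrative sketch of applying mesh difference information to a first
# mesh model to produce an updated mesh model. The dict-based mesh
# representation and the difference-info field names are assumptions.

def apply_mesh_difference(first_mesh, difference_info):
    """Return an updated mesh built from the first mesh plus difference info.

    difference_info is assumed to contain:
      - 'deleted_nodes': IDs of nodes absent from the updated mesh
      - 'new_nodes': node ID -> [X, Y, Z] entries to add or replace
    """
    updated = dict(first_mesh)  # start from a copy of the first mesh model
    for node_id in difference_info.get("deleted_nodes", []):
        updated.pop(node_id, None)  # delete nodes not in the updated model
    for node_id, xyz in difference_info.get("new_nodes", {}).items():
        updated[node_id] = xyz  # add or replace with updated coordinates
    return updated

first_mesh = {0: [0.0, 1.0, 0.0], 1: [1.0, 0.0, 0.0], 2: [0.0, 0.0, 1.0]}
diff = {"deleted_nodes": [2],
        "new_nodes": {1: [1.0, 0.5, 0.0], 3: [0.5, 0.5, 0.5]}}
updated_mesh = apply_mesh_difference(first_mesh, diff)
print(updated_mesh)  # node 2 removed, node 1 moved, node 3 added
```

Working from a copy leaves the first mesh model intact, which matters when the playback device must keep rendering earlier content against the first model while the update is prepared.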
[0163] In addition to receiving updated mesh model information the
playback device may receive updated map information. This is shown
in step 1816. The updated map information may be in the form of a
complete new UV map to be used to map images to the updated mesh
model or in the form of difference information which can be used in
combination with the first map to generate an updated map. While an
updated UV map need not be supplied with each 3D model update, UV
map updates will normally occur at the same time as model updates,
and will occur whenever a change in the number of nodes results in
a different number of segments in the 3D mesh model. Updated map
information need not be provided if the number of segments and
nodes in the 3D model remains unchanged, but it will in many cases
be provided even when the number of model segments is unchanged,
given that a change in the environmental shape may merit a change
in how captured images are mapped to the 3D mesh model being
used.
[0164] If difference information is received rather than a complete
UV map, the operation proceeds from step 1816 to step 1818. In step
1818, which is used in the case where map difference information is
received in step 1816, an updated map is generated by applying the
map difference information included in the received updated map
information to the first map. In the case where a complete updated
UV map is received in step 1816 there is no need to generate the
updated map from difference information since the full updated map
is received.
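Step 1818 can be sketched in the same style as the mesh update. This is a Python sketch under the assumption, not specified by the application, that a UV map is stored as a dictionary from node IDs to (U, V) pairs and that the difference information carries 'deleted_nodes' and 'new_entries' fields.

```python
# Illustrative sketch of step 1818: generating an updated UV map by
# applying map difference information to the first map. The dict-based
# UV map layout and field names are assumptions made for this example.

def apply_map_difference(first_map, map_difference):
    """Return an updated UV map from the first map plus difference info."""
    updated = dict(first_map)
    for node_id in map_difference.get("deleted_nodes", []):
        updated.pop(node_id, None)  # drop entries for removed nodes
    for node_id, uv in map_difference.get("new_entries", {}).items():
        updated[node_id] = uv  # new or replacement 2D texture coordinates
    return updated

first_map = {0: (0.5, 1.0), 1: (1.0, 0.0), 2: (0.0, 0.0)}
diff = {"new_entries": {1: (0.9, 0.1)}}
updated_map = apply_map_difference(first_map, diff)
print(updated_map)  # only node 1's texture coordinates change
```

Sending only the changed (U, V) entries, as here, is what makes the difference form of the update cheaper to transmit than a complete new UV map.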
[0165] In parallel with or after the receipt and/or generation of
the updated 3D mesh model and/or updated UV map, additional image
content is received in step 1820. The additional image content may,
and sometimes does, correspond to, for example, a second portion of
an event which follows a first event segment to which the first 3D
model corresponded. Operation proceeds from step 1820 to step 1822.
In step 1822 the additional image content is rendered. As part of
the image rendering performed in step 1822, the updated 3D model is
used to render at least some of the received additional image
content as indicated in step 1824. The updated UV map will also be
used as indicated by step 1826 when it is available. When no
updated UV map has been received or generated, the image rendering
in step 1822 will use the old, e.g., first UV map as part of the
rendering process. Images rendered in step 1822 are output for
display.
[0166] The updating of the 3D model and/or UV map may occur
repeatedly during a presentation in response to environmental
changes. This ongoing potential for repeated model and UV map
updates is represented by arrow 1827, which returns processing to
step 1814 where additional updated mesh model information may be
received. With each return to step 1814, the current mesh model and
UV map are treated as the first mesh model and first map for purposes of
generating a new updated mesh model and/or UV map in the case where
the update includes difference information.
[0167] The processing described with regard to FIG. 17 is performed
under control of a playback device processor. Accordingly, in some
embodiments the playback device includes a processor configured to
control the playback device to implement the steps shown in FIG.
17. The transmission and receiving steps are performed via the
interfaces (which include transmitters and receivers) of the
playback devices.
[0168] In some embodiments the playback device includes
instructions which, when executed by a processor of the playback
device, control the playback device to implement the steps shown
in FIG. 17. Separate processor executable code can be and sometimes
is included for each of the steps shown in FIG. 17. In other
embodiments a circuit is included in the playback device for each
of the individual steps shown in FIG. 17.
[0169] FIG. 18 illustrates an exemplary playback device, e.g.,
system, 1900 that can be used to receive, decode and display the
content streamed by one or more sub-systems of the system 1400 of
FIG. 12, e.g., such as the processing system 1408/1700. The
exemplary rendering and playback system 1900 may be used as any of
the rendering and playback devices shown in FIG. 12. In various
embodiments the playback system 1900 is used to perform the various
steps illustrated in flowchart 1800 of FIG. 17.
[0170] The rendering and playback system 1900 in some embodiments
includes and/or is coupled to a 3D head mounted display 1905. The system
1900 includes the ability to decode the received encoded image data
and generate 3D image content for display to the customer. The
playback system 1900 in some embodiments is located at a customer
premise location such as a home or office but may be located at an
image capture site as well. The playback system 1900 can perform
signal reception, decoding, 3D mesh model updating, rendering,
display and/or other operations in accordance with the
invention.
[0171] The playback system 1900 includes a display 1902, a display
device interface 1903, a user input interface device 1904,
input/output (I/O) interface 1906, a processor 1908, network
interface 1910 and a memory 1912. The various components of the
playback system 1900 are coupled together via bus 1909 which allows
for data to be communicated between the components of the system
1900.
[0172] While in some embodiments display 1902 is included as an
optional element as illustrated using the dashed box, in some
embodiments an external display device 1905, e.g., a head mounted
stereoscopic display device, can be coupled to the playback system
1900 via the display device interface 1903. The head mounted
display 1905 may be implemented using the OCULUS RIFT.TM. VR
(virtual reality) headset. Other head mounted displays may also be
used.
image content is presented on the display device of system 1900,
e.g., with left and right eyes of a user being presented with
different images in the case of stereoscopic content. By displaying
different images to the left and right eyes on a single screen,
e.g., on different portions of the single screen to different eyes,
a single display can be used to display left and right eye images
which will be perceived separately by the viewer's left and right
eyes. While various embodiments contemplate a head mounted display
to be used in system 1900, the methods and system can also be used
with non-head mounted displays which can support 3D image display.
[0173] The operator of the playback system 1900 may control one or
more parameters and/or provide input via user input device 1904.
The input device 1904 may be, and in some embodiments is, e.g., a
keypad, touch screen, or similar device that may be used for
inputting information, data and/or instructions.
[0174] Via the I/O interface 1906 the playback system 1900 may be
coupled to external devices and exchange information and signaling
with such external devices. In some embodiments, via the I/O
interface 1906, the playback system 1900 receives images captured by
various cameras, e.g., stereoscopic camera pairs and/or light field
cameras, and receives 3D mesh models and UV maps.
[0175] The memory 1912 includes various modules, e.g., routines,
which when executed by the processor 1908 control the playback
system 1900 to perform operations in accordance with the invention.
The memory 1912 includes control routines 1914, a user input
processing module 1916, a head position and/or viewing angle
determination module 1918, a decoder module 1920, a stereoscopic
image rendering module 1922 also referred to as a 3D image
generation module, a 3D mesh model update module 1924, a UV map
update module 1926, received 3D mesh model 1928, received UV map
1930, and data/information including received encoded image content
1932, decoded image content 1934, updated 3D mesh model information
1936, updated UV map information 1938, updated 3D mesh model 1940,
updated UV map 1942 and generated stereoscopic content 1944.
[0176] The processor 1908, e.g., a CPU, executes routines 1914 and
uses the various modules to control the system 1900 to operate in
accordance with the invention. The processor 1908 is responsible
for controlling the overall general operation of the system 1900.
In various embodiments the processor 1908 is configured to perform
functions that have been discussed as being performed by the
rendering and playback system 1900.
[0177] The network interface 1910 includes a transmitter 1911 and a
receiver 1913 which allows the playback system 1900 to be able to
receive and/or communicate information to an external device over a
communications network, e.g., such as communications network 1450.
In some embodiments the playback system 1900 receives, e.g., via
the interface 1910, image content 1932, 3D mesh model 1928, UV map
1930, updated mesh model information 1936, updated UV map
information 1938 from the processing system 1700 over the
communications network 1450. Thus in some embodiments the playback
system 1900 receives, via the interface 1910, information
communicating a first mesh model, e.g., the 3D mesh model 1928, of
a 3D environment generated based on measurements of at least a
portion of the environment made using a light field camera at a
first time. The playback system 1900 in some embodiments further
receives via the interface 1910, image content, e.g., frames of
left and right eye image pairs.
[0178] The control routines 1914 include device control routines
and communications routines to control the operation of the system
1900. The request generation module 1916 is configured to generate
requests for content, e.g., upon user selection of an item for
playback. The received information processing module 1917 is
configured to process information, e.g., image content, audio data,
environmental models, UV maps etc., received by the system 1900,
e.g., via the receiver of interface 1906 and/or 1910, to recover
communicated information that can be used by the system 1900, e.g.,
for rendering and playback. The head position and/or viewing angle
determination module 1918 is configured to determine a current
viewing angle and/or a current head position, e.g., orientation, of
the user, e.g., orientation of the head mounted display, and in
some embodiments report the determined position and/or viewing angle
information to the processing system 1700.
[0179] The decoder module 1920 is configured to decode encoded
image content 1932 received from the processing system 1700 or the
camera rig 1402 to produce decoded image data 1934. The decoded
image data 1934 may include decoded stereoscopic scene and/or
decoded scene portions.
[0180] The 3D image renderer 1922 uses decoded image data to
generate 3D image content in accordance with the features of the
invention for display to the user on the display 1902 and/or the
display device 1905. In some embodiments the 3D image renderer 1922
is configured to render, using a first 3D mesh model, at least some
of the received image content. In some embodiments the 3D image
renderer 1922 is further configured to use a first UV map to
determine how to wrap an image included in received image content
onto the first 3D mesh model.
[0181] The 3D mesh model update module 1924 is configured to update
a received first 3D mesh model 1928 (e.g., initially received mesh
model) using received updated mesh model information 1936 to
generate an updated mesh model 1940. In some embodiments the
received updated mesh model information 1936 includes mesh model
difference information reflecting the changes with respect to a
previous version of the 3D mesh model received by the playback
device 1900. In some other embodiments the received updated mesh
model information 1936 includes complete information for generating
a complete 3D mesh model which is then output as the updated
mesh model 1940.
[0182] The UV map update module 1926 is configured to update a
received first UV map 1930 (e.g., initially received UV map) using
received updated UV map information 1938 to generate an updated UV
map 1942. In some embodiments the received updated UV map
information 1938 includes difference information reflecting the
changes with respect to a previous version of the UV map received
by the playback device 1900. In some other embodiments the received
updated UV map information 1938 includes information for generating
a complete UV map which is then output as the updated UV map
1942.
[0183] In various embodiments when the 3D mesh model and/or UV map
is updated in accordance with the invention, 3D image rendering
module 1922 is further configured to render, using an updated mesh
model, at least some of the image content, e.g., additional image
content. In some such embodiments the 3D image rendering module
1922 is further configured to use the updated UV map to determine how
to wrap an image included in the image content to be rendered onto
the updated 3D mesh model. The generated stereoscopic image content
1944 is the output of the 3D image rendering module 1922.
[0184] In some embodiments some of the modules are implemented,
e.g., as circuits, within the processor 1908 with other modules
being implemented, e.g., as circuits, external to and coupled to
the processor. Alternatively, rather than being implemented as
circuits, all or some of the modules may be implemented in software
and stored in the memory of the playback device 1900 with the
modules controlling operation of the playback device 1900 to
implement the functions corresponding to the modules when the
modules are executed by a processor, e.g., processor 1908. In still
other embodiments, various modules are implemented as a combination
of hardware and software, e.g., with a circuit external to the
processor 1908 providing input to the processor 1908 which then
under software control operates to perform a portion of a module's
function.
[0185] While shown in the FIG. 18 example as being included in the
memory 1912, the modules shown in the memory 1912 can, and in
some embodiments are, implemented fully in hardware within the
processor 1908, e.g., as individual circuits. In other embodiments
some of the elements are implemented, e.g., as circuits, within the
processor 1908 with other elements being implemented, e.g., as
circuits, external to and coupled to the processor 1908. As should
be appreciated the level of integration of modules on the processor
and/or with some modules being external to the processor may be one
of design choice.
[0186] While shown in the FIG. 18 embodiment as a single processor
1908, e.g., computer, within device 1900, it should be appreciated
that processor 1908 may be implemented as one or more processors,
e.g., computers. When implemented in software, the modules include
code, which when executed by the processor 1908, configure the
processor, e.g., computer, to implement the function corresponding
to the module. In some embodiments, processor 1908 is configured to
implement each of the modules shown in memory 1912 in the FIG. 18
example. In embodiments where the modules are stored in memory
1912, the memory 1912 is a computer program product, the computer
program product comprising a computer readable medium, e.g., a
non-transitory computer readable medium, comprising code, e.g.,
individual code for each module, for causing at least one computer,
e.g., processor 1908, to implement the functions to which the
modules correspond.
[0187] As should be appreciated, the modules illustrated in FIG. 18
control and/or configure the system 1900 or elements therein
respectively such as the processor 1908 to perform the functions of
corresponding steps of the methods of the present invention, e.g.,
such as those illustrated and/or described in the flowchart
1800.
[0188] In one exemplary embodiment the processor 1908 is configured
to control the playback device 1900 to: receive, e.g., via
interface 1910, information communicating a first mesh model of a
3D environment generated based on measurements of at least a
portion of said environment made using a light field camera at a
first time; receive, e.g., via the interface 1910, image content;
and render, using said first mesh model at least some of the
received image content.
[0189] In some embodiments the processor is further configured to
control the playback device to receive, e.g., via the interface
1910, updated mesh model information, said updated mesh model
information including at least some updated mesh model information
generated based on measurements of at least the portion of said
environment using said light field camera at a second time. In some
embodiments the updated mesh model information communicates a
complete updated mesh model.
[0190] In some embodiments the processor is further configured to
control the playback device to: receive additional image content;
and render, using said updated mesh model information, at least
some of the received additional image content.
[0191] In some embodiments the processor is further configured to
control the playback device to: receive (e.g., via the interface
1910 or 1906), a first map mapping a 2D image space to said first
mesh model; and use said first map to determine how to wrap an
image included in said received image content onto said first mesh
model as part of being configured to render, using said first mesh
model, at least some of the received image content.
[0192] In some embodiments the processor is further configured to
control the playback device to: receive (e.g., via the interface
1910 or 1906) updated map information corresponding to said updated
mesh model information; and use said updated map information to
determine how to wrap an additional image included in said received
additional image content onto said updated mesh model as part of
being configured to render, using said updated mesh model
information, at least some of the received additional image
content.
[0193] In some embodiments the updated map information includes map
difference information. In some such embodiments the processor is
further configured to control the playback device to: generate an
updated map by applying said map difference information to said
first map to generate an updated map; and use said updated map to
determine how to wrap an additional image included in said received
additional image content onto said updated mesh model as part of
rendering, using said updated mesh model information, at least some
of the received additional image content.
[0194] While steps are shown in an exemplary order it should be
appreciated that in many cases the order of the steps may be
altered without adversely affecting operation. Accordingly, unless
the exemplary order of steps is required for proper operation, the
order of steps is to be considered exemplary and not limiting.
[0195] While various embodiments have been discussed, it should be
appreciated that not necessarily all embodiments include the same
features and some of the described features are not necessary but
can be desirable in some embodiments.
[0196] While various ranges and exemplary values are described the
ranges and values are exemplary. In some embodiments the ranges of
values are 20% larger than the ranges discussed above. In other
embodiments the ranges are 20% smaller than the exemplary ranges
discussed above. Similarly, particular values may be, and sometimes
are, up to 20% larger than the values specified above while in
other embodiments the values are up to 20% smaller than the values
specified above. In still other embodiments other values are
used.
[0197] FIG. 19 illustrates an exemplary 3D mesh model 2000 that may
be used in various embodiments with a plurality of nodes
illustrated as the points of intersection of lines used to divide
the 3D model into segments. Note that the model of FIG. 19 is shown
in 3D space and can be expressed as a set of [X,Y,Z] coordinates
defining the location of the nodes in the mesh in 3D space assuming
the shape of the segments is known or the rules for interconnecting
the nodes are known or defined in the 3D model. In some embodiments
the segments are predetermined to have the same number of sides
with each node connecting to a predetermined number of adjacent
nodes by straight lines. In the FIG. 19 example the top portion of
the model 2000 is a set of triangular segments while the side
portions are formed by a plurality of four sided segments. Such a
configuration, e.g., the top portion being formed of 3-sided
segments and a side portion being formed of 4-sided segments, may
be included in the information forming part of the 3D model or may
be predetermined. Such
information is provided to the customer rendering and playback
devices along with or as part of the mesh model information.
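The representation described above, a set of [X, Y, Z] node coordinates plus predetermined connectivity rules, can be sketched as follows. This is an illustrative Python sketch; the particular node positions and segment index lists are made up for the example and do not come from the application.

```python
# Illustrative sketch of the mesh representation of FIG. 19: node
# coordinates in 3D space plus segment connectivity given as node
# indices. The specific coordinates here are invented for the example.

# Node positions in 3D space (X, Y, Z).
nodes = [
    (0.0, 1.0, 0.0),    # 0: node in the top portion of the model
    (-0.5, 0.5, 0.5),   # 1
    (0.5, 0.5, 0.5),    # 2
    (-0.5, -0.5, 0.5),  # 3
    (0.5, -0.5, 0.5),   # 4
]

# Segment connectivity: a 3-sided (triangular) segment for the top
# portion and a 4-sided segment for a side portion.
triangle_segments = [(0, 1, 2)]
quad_segments = [(1, 2, 4, 3)]

# With connectivity predetermined, an environment change can be conveyed
# by sending new node coordinates alone; the segment lists stay the same.
def segment_vertices(segment, node_list):
    """Look up the 3D coordinates of a segment's corner nodes."""
    return [node_list[i] for i in segment]

print(segment_vertices(triangle_segments[0], nodes))
```

Keeping connectivity out of the per-update payload is one reason the difference-based model updates discussed earlier can be compact: when only the environmental shape changes, only node coordinates need to be communicated.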
[0198] FIG. 20 shows an exemplary UV map 2002 which may be used in
mapping a frame in what is sometimes referred to as 2D UV space to
the 3D model 2000 shown in FIG. 19. Note that the UV map 2002
includes the same number of nodes and segments as in the 3D model
2000 with a one to one mapping relationship. Frames which provide
what is sometimes referred to as texture, but which normally
include content of images captured from the vantage point of a
camera rig in a real environment, at a location corresponding to
the position [0, 0, 0] within the 3D model 2000 of the simulated
environment, may be applied, e.g., wrapped, on to the 3D model 2000
in accordance with the map 2002 as part of an image rendering
operation.
[0199] In FIGS. 19 and 20, exemplary node P which is shown as a dot
for emphasis, like each of the other mesh nodes, appears in both
the UV map 2002 and the 3D model 2000. Note that the node P[X, Y,
Z] corresponds to the node P[U,V], where X, Y, Z specify the
position of node P in X, Y, Z space and U,V specify the location of
the corresponding node P in the two dimensional space. Each U,V
pair represents the X, Y of a single pixel of the 2D image texture,
e.g., a frame. Surrounding pixels are mapped from the 2D frame to
the 3D mesh during the rendering process by interpolating between
nearby U,V pairs.
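The per-pixel mapping described above can be sketched as a texture lookup at a fractional U,V location. This is an illustrative Python sketch; bilinear interpolation between the four nearest texels is one common choice, and the application does not mandate a particular interpolation filter. The grayscale list-of-rows texture layout is an assumption for the example.

```python
# Illustrative sketch of sampling a 2D frame (texture) at a U,V location
# during rendering, using bilinear interpolation between nearby texels.
# The grayscale list-of-rows texture layout is assumed for the example.

def sample_texture(texture, u, v):
    """Bilinearly sample a texture given as a list of rows of values.

    u, v are in [0, 1]; (0, 0) is taken here to be the top-left texel.
    """
    h, w = len(texture), len(texture[0])
    x = u * (w - 1)  # map U to a fractional pixel column
    y = v * (h - 1)  # map V to a fractional pixel row
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

frame = [[0.0, 1.0],
         [1.0, 0.0]]
print(sample_texture(frame, 0.5, 0.5))  # -> 0.5, midpoint of a 2x2 frame
```

During rendering, a point inside a segment gets a U,V value interpolated from the segment's corner nodes, and a lookup like the one above then fetches the corresponding texture content from the frame.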
[0200] The techniques of various embodiments may be implemented
using software, hardware and/or a combination of software and
hardware. Various embodiments are directed to apparatus, e.g.,
image data capture and processing systems. Various embodiments are
also directed to methods, e.g., a method of image capture and/or
processing image data. Various embodiments are also directed to a
non-transitory machine, e.g., computer, readable medium, e.g., ROM,
RAM, CDs, hard discs, etc., which include machine readable
instructions for controlling a machine to implement one or more
steps of a method.
[0201] Various features of the present invention are implemented
using modules. Such modules may, and in some embodiments are,
implemented as software modules. In other embodiments the modules
are implemented in hardware. In still other embodiments the modules
are implemented using a combination of software and hardware. In
some embodiments the modules are implemented as individual circuits
with each module being implemented as a circuit for performing the
function to which the module corresponds. A wide variety of
embodiments are contemplated including some embodiments where
different modules are implemented differently, e.g., some in
hardware, some in software, and some using a combination of
hardware and software. It should also be noted that routines and/or
subroutines, or some of the steps performed by such routines, may
be implemented in dedicated hardware as opposed to software
executed on a general purpose processor. Such embodiments remain
within the scope of the present invention. Many of the above
described methods or method steps can be implemented using machine
executable instructions, such as software, included in a machine
readable medium such as a memory device, e.g., RAM, floppy disk,
etc. to control a machine, e.g., general purpose computer with or
without additional hardware, to implement all or portions of the
above described methods. Accordingly, among other things, the
present invention is directed to a machine-readable medium
including machine executable instructions for causing a machine,
e.g., processor and associated hardware, to perform one or more of
the steps of the above-described method(s).
[0202] Some embodiments are directed to a non-transitory computer
readable medium embodying a set of software instructions, e.g.,
computer executable instructions, for controlling a computer or
other device to encode and compress stereoscopic video. Other
embodiments are directed to a computer readable medium embodying a
set of software instructions, e.g., computer executable
instructions, for controlling a computer or other device to decode
and decompress video on the player end. While encoding and
compression are mentioned as possible separate operations, it
should be appreciated that encoding may be used to perform
compression and thus encoding may, in some cases, include compression.
Similarly, decoding may involve decompression.
[0203] In various embodiments a processor of a processing system is
configured to control the processing system to perform the method
steps performed by the exemplary described processing system. In
various embodiments a processor of a playback device is configured
to control the playback device to implement the steps, performed by
a playback device, of one or more of the methods described in the
present application.
[0204] Numerous additional variations on the methods and apparatus
of the various embodiments described above will be apparent to
those skilled in the art in view of the above description. Such
variations are to be considered within the scope of the invention.
* * * * *