U.S. patent application number 15/573682 was published on 2018-05-03 for generation, transmission and rendering of virtual reality multimedia.
The applicant listed for this patent is PCP VR INC. Invention is credited to Erik PETERSON, Aria SHAHINGOHAR.
Application Number | 20180122129 15/573682 |
Family ID | 54479071 |
Publication Date | 2018-05-03 |
United States Patent Application | 20180122129 |
Kind Code | A1 |
PETERSON; Erik; et al. | May 3, 2018 |
GENERATION, TRANSMISSION AND RENDERING OF VIRTUAL REALITY
MULTIMEDIA
Abstract
A method of generating virtual reality data includes: obtaining
point cloud data, the point cloud data including colour and
three-dimensional position data for each of a plurality of points
corresponding to locations in a capture volume; generating primary
image data containing (i) a first projection of a first subset of
the points into a two-dimensional frame of reference, and (ii) for
each point of the first subset, depth data derived from the
corresponding position data; generating secondary image data
containing (i) a second projection of a second subset of the points
into the two-dimensional frame of reference, the second projection
overlapping with at least part of the first projection in the
two-dimensional frame of reference, and (ii) for each point of the
second subset, depth data derived from the corresponding position
data; and storing the primary image data and the secondary image
data in a memory.
Inventors: | PETERSON; Erik (Toronto, CA); SHAHINGOHAR; Aria (Toronto, CA) |
Applicant: |
Name | City | State | Country | Type |
PCP VR INC. | Toronto | | CA | |
Family ID: | 54479071 |
Appl. No.: | 15/573682 |
Filed: | November 19, 2015 |
PCT Filed: | November 19, 2015 |
PCT NO: | PCT/IB2015/058987 |
371 Date: | November 13, 2017 |
Current U.S. Class: | 1/1 |
Current CPC Class: | H04N 2213/003 20130101;
H04N 13/366 20180501; G06T 1/60 20130101; H04N 13/279 20180501;
H04N 13/204 20180501; G06T 19/006 20130101; G06T 2210/56 20130101;
H04N 13/161 20180501; G06T 15/205 20130101; G06T 15/00 20130101;
G06T 2215/16 20130101; H04N 21/854 20130101; H04N 13/344 20180501;
G06T 2207/10028 20130101; G06T 19/00 20130101 |
International Class: | G06T 15/20 20060101 G06T015/20 |
Foreign Application Data
Date | Code | Application Number |
May 13, 2015 | CA | PCT/CA2015/000306 |
Claims
1. A method of generating virtual reality multimedia data,
comprising: obtaining point cloud data at a processor of a
generation computing device, the point cloud data including colour
and three-dimensional position data for each of a plurality of
points corresponding to locations in a capture volume; generating,
at the processor, primary image data containing (i) a first
projection of a first subset of the points into a two-dimensional
frame of reference, and (ii) for each point of the first subset,
depth data derived from the corresponding position data;
generating, at the processor, secondary image data containing (i) a
second projection of a second subset of the points into the
two-dimensional frame of reference, the second projection
overlapping with at least part of the first projection in the
two-dimensional frame of reference, and (ii) for each point of the
second subset, depth data derived from the corresponding position
data; and storing the primary image data and the secondary image
data in a memory connected to the processor.
2. The method of claim 1, wherein obtaining the point cloud data
includes retrieving the point cloud data from a memory.
3. The method of claim 1, wherein obtaining the point cloud data
includes: receiving raw point cloud data from a capture apparatus;
and generating the point cloud data by registering the raw point
cloud data to a common three-dimensional frame of reference.
4. The method of claim 1, the primary image data including: a first
image dimensioned according to the two-dimensional frame of
reference and containing colour data for each point of the first
subset; and a second image dimensioned according to the
two-dimensional frame of reference and containing depth data for
each point of the first subset.
5. The method of claim 4, wherein the first image and the second
image are cube map projections.
6. The method of claim 4, wherein the first image and the second
image are YUV images having luminance and chrominance channels, and
wherein the depth data includes a depth value stored in the
luminance channel.
7. The method of claim 1, the secondary image data including: a
first image dimensioned according to the two-dimensional frame of
reference and containing colour data for each point of the second
subset; and a second image dimensioned according to the
two-dimensional frame of reference and containing depth data for
each point of the second subset.
8. The method of claim 1, wherein the first image and the second
image are cube map projections.
9. The method of claim 7, further comprising: detecting that a
plurality of colliding ones of the second subset of points have a
common position in the two-dimensional frame of reference; storing
colour data and depth data for one of the colliding points in the
first image and the second image according to the common position;
storing colour data and depth data for another of the colliding
points in the first image and the second image at a two-dimensional
offset from the common position.
10. The method of claim 9, further comprising: storing the
two-dimensional offset in the second image.
11. The method of claim 1, wherein generating the primary image
data includes: setting a viewpoint position corresponding to a
location in the capture volume; and selecting the first subset of
the points by: for each of a plurality of paths extending from the
viewpoint position, selecting the first point of the point cloud
data encountered by the path.
12. The method of claim 1, wherein generating the primary image
data includes: setting a viewpoint position corresponding to a
location in the capture volume; and selecting the first subset of
the points by: determining a distance from the viewpoint to each of
the plurality of points; comparing the distance to a threshold; and
selecting the points having a smaller distance than the threshold
from the viewpoint.
13. The method of claim 1, further comprising: transmitting the
primary image data and the secondary image data for receipt by a
client device.
14. A generation computing device, comprising: a memory; a network
interface; and a processor interconnected with the memory and the
network interface, the processor configured to: obtain point cloud
data at a processor of a generation computing device, the point
cloud data including colour and three-dimensional position data for
each of a plurality of points corresponding to locations in a
capture volume; generate primary image data containing (i) a first
projection of a first subset of the points into a two-dimensional
frame of reference, and (ii) for each point of the first subset,
depth data derived from the corresponding position data; generate
secondary image data containing (i) a second projection of a second
subset of the points into the two-dimensional frame of reference,
the second projection overlapping with at least part of the first
projection in the two-dimensional frame of reference, and (ii) for
each point of the second subset, depth data derived from the
corresponding position data; and store the primary image data and
the secondary image data in the memory.
15. A method of rendering virtual reality multimedia data,
comprising: obtaining primary image data containing (i) a first
projection of a first subset of points in a three-dimensional point
cloud into a two-dimensional frame of reference, and (ii) for each
point of the first subset, depth data derived from corresponding
position data of the points; obtaining secondary image data
containing (i) a second projection of a second subset of the points
into the two-dimensional frame of reference, the second projection
overlapping with at least part of the first projection in the
two-dimensional frame of reference, and (ii) for each point of the
second subset, depth data derived from the corresponding position
data; receiving a viewpoint position from a virtual reality
display; selecting at least a portion of the primary image data and
the secondary image data based on the viewpoint position; and
rendering the selected primary and secondary image data on the
virtual reality display.
16. A client computing device, comprising: a memory; a network
interface; and a processor interconnected with the memory and the
network interface, the processor configured to: obtain primary
image data containing (i) a first projection of a first subset of
points in a three-dimensional point cloud into a two-dimensional
frame of reference, and (ii) for each point of the first subset,
depth data derived from corresponding position data of the points;
obtain secondary image data containing (i) a second projection of a
second subset of the points into the two-dimensional frame of
reference, the second projection overlapping with at least part of
the first projection in the two-dimensional frame of reference, and
(ii) for each point of the second subset, depth data derived from
the corresponding position data; receive a viewpoint position from
a virtual reality display; select at least a portion of the primary
image data and the secondary image data based on the viewpoint
position; and render the selected primary and secondary image data
on the virtual reality display.
Description
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims priority from PCT patent application
no. PCT/CA2015/000306, filed May 13, 2015 and entitled "Method,
System And Apparatus For Generation And Playback Of Virtual Reality
Multimedia", which is incorporated herein by reference.
FIELD
[0002] The specification relates generally to processing techniques
for multimedia data, and specifically to the generation,
transmission and rendering of virtual reality multimedia.
BACKGROUND
[0003] Virtual reality display devices, such as the GearVR and the
Oculus Rift, enable viewing of content such as video, games and the
like in a virtual reality environment, in which the display adapts
to the user's movements. Various challenges confront
implementations of virtual reality display. For example,
particularly in the case of captured video, capturing video from a
sufficient variety of viewpoints to account for potential movements
of the operator of the display can be difficult, particularly for
large or complex scenes. In addition, the resulting volume of
captured data can be large enough to render storing, transmitting
and processing the data prohibitively costly in terms of
computational resources.
SUMMARY
[0004] According to an aspect of the specification, a method of
generating virtual reality multimedia data is provided, comprising:
obtaining point cloud data at a processor of a generation computing
device, the point cloud data including colour and three-dimensional
position data for each of a plurality of points corresponding to
locations in a capture volume; generating, at the processor,
primary image data containing (i) a first projection of a first
subset of the points into a two-dimensional frame of reference, and
(ii) for each point of the first subset, depth data derived from
the corresponding position data; generating, at the processor,
secondary image data containing (i) a second projection of a second
subset of the points into the two-dimensional frame of reference,
the second projection overlapping with at least part of the first
projection in the two-dimensional frame of reference, and (ii) for
each point of the second subset, depth data derived from the
corresponding position data; and storing the primary image data and
the secondary image data in a memory connected to the
processor.
BRIEF DESCRIPTIONS OF THE DRAWINGS
[0005] Embodiments are described with reference to the following
figures, in which:
[0006] FIG. 1 depicts a system for generating, transmitting and
rendering virtual reality multimedia data, according to a
non-limiting embodiment;
[0007] FIG. 2 depicts a method of generating, transmitting and
rendering virtual reality multimedia data, according to a
non-limiting embodiment;
[0008] FIGS. 3A and 3B depict a capture volume and point cloud data
generated by the method of FIG. 2, according to a non-limiting
embodiment;
[0009] FIG. 4 depicts example capture apparatuses of the system of
FIG. 1, according to a non-limiting embodiment;
[0010] FIG. 5 depicts a method of obtaining point cloud data,
according to a non-limiting embodiment;
[0011] FIG. 6 depicts a method of generating primary and secondary
image data, according to a non-limiting embodiment;
[0012] FIGS. 7A and 7B depict an implementation of cube mapping in
the method of FIG. 2, according to a non-limiting embodiment;
[0013] FIG. 8 depicts primary image data generated by the method of
FIG. 2, according to a non-limiting embodiment;
[0014] FIGS. 9A and 9B depict secondary image data generated by the
method of FIG. 2, according to a non-limiting embodiment;
[0015] FIGS. 10A and 10B depict secondary image data generated by
the method of FIG. 2, according to another non-limiting
embodiment;
[0016] FIG. 11 depicts an example data structure for the secondary
image data, according to a non-limiting embodiment;
[0017] FIG. 12 depicts a method of generating index data at a
generation device, according to a non-limiting embodiment;
[0018] FIGS. 13A and 13B depict an example performance of blocks
1210 and 1215 of the method of FIG. 12, according to a non-limiting
embodiment; and
[0019] FIG. 14 depicts a rendering index generated at the
generation device of FIG. 1, according to a non-limiting
embodiment.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0020] FIG. 1 depicts a system 100 for generation, transmission and
rendering of virtual reality multimedia data. In the examples
discussed herein, the multimedia data includes image data, and
preferably video data (i.e. sequences of images). The video data
can be accompanied by audio data, but the generation and subsequent
processing of audio data is not of particular relevance to the
present disclosure, and is therefore not discussed in further
detail. As will become apparent throughout the discussions below,
the virtual reality multimedia data herein is distinguished from
conventional two-dimensional image or video data in that the
virtual reality multimedia data simulates the physical presence of
a viewer within the volume (also referred to as a scene) depicted
by the multimedia data. Thus, for example, movement of the viewer's
head can be tracked and used to update the appearance of the
multimedia data to simulate three-dimensional movement of the
viewer within the depicted volume.
[0021] System 100 includes a generation computing device 104, also
referred to herein as generation device 104. Generation device 104,
as will be discussed in detail below, is configured to generate the
above-mentioned virtual reality multimedia data for transmission
to, and rendering at, a client computing device 108, also referred
to herein as client device 108. Client device 108 is configured to
receive the virtual reality multimedia data generated by generation
device 104, and to render (that is, play back) the virtual reality
multimedia data. The virtual reality multimedia data can be
transferred between generation device 104 and client device 108 in
a variety of ways. For example, the multimedia data can be
transmitted to client device 108 via a network 112. Network 112 can
include any suitable combination of wired and wireless networks,
including but not limited to a Wide Area Network (WAN) such as the
Internet, a Local Area Network (LAN) such as a corporate data
network, cell phone networks, WiFi networks, WiMax networks and the
like.
[0022] Transmission of the multimedia data to client device 108 via
network 112 need not occur directly from generation device 104. For
example, the multimedia data can be transmitted from generation
device 104 to an intermediate device via network 112, and
subsequently to client device 108. In other embodiments, the
multimedia data can be sent from generation device 104 to a
portable storage medium (e.g. optical discs, flash storage and the
like), and the storage medium can be physically transported to
client device 108.
[0023] Generation device 104 can be based on any suitable computing
environment, such as a server or personal computer. In the present
example, generation device 104 is a desktop computer housing one or
more processors, referred to generically as a processor 116. The
nature of processor 116 is not particularly limited. For example,
processor 116 can include one or more general purpose central
processing units (CPUs), and can also include one or more graphics
processing units (GPUs). The performance of the various processing
tasks discussed herein can be shared between such CPUs and GPUs, as
will be apparent to a person skilled in the art.
[0024] Processor 116 is interconnected with a non-transitory
computer readable storage medium such as a memory 120. Memory 120
can be any suitable combination of volatile (e.g. Random Access
Memory ("RAM")) and non-volatile (e.g. read only memory ("ROM"),
Electrically Erasable Programmable Read Only Memory ("EEPROM"),
flash memory, magnetic computer storage device, or optical disc)
memory. In the present example, memory 120 includes both a volatile
memory and a non-volatile memory. Processor 116 and memory 120 are
generally comprised of one or more integrated circuits (ICs), and
can have a wide variety of structures, as will now be apparent to
those skilled in the art.
[0025] Generation device 104 can also include one or more input
devices 124 interconnected with processor 116. Input device 124 can
include any suitable combination of a keyboard, a mouse, a
microphone, and the like. Such input devices are configured to
receive input and provide data representative of such input to
processor 116. For example, a keyboard can receive input from a
user in the form of the depression of one or more keys, and provide
data identifying the depressed key or keys to processor 116.
[0026] Generation device 104 can also include one or more output
devices interconnected with processor 116, such as a display 128
(e.g. a Liquid Crystal Display (LCD), a plasma display, an Organic
Light Emitting Diode (OLED) display, a Cathode Ray Tube (CRT)
display). Other output devices, such as speakers (not shown), can
also be present. Processor 116 is configured to control display 128
to present images to an operator of generation device 104.
Generation device 104 also includes one or more network interfaces
interconnected with processor 116, such as a network interface 132,
which allows generation device 104 to connect to other computing
devices (e.g. client device 108) via network 112. Network interface
132 thus includes the necessary hardware (e.g. radios, network
interface controllers, and the like) to communicate over network
112.
[0027] As noted above, generation device 104 is configured to
generate the multimedia data to be provided to client device 108.
To that end, generation device 104 is connected to, or houses, or both,
one or more sources of data to be employed in the generation of
virtual reality multimedia data. The sources of such raw data can
include a multimedia capture apparatus 134. In general, capture
apparatus 134 captures video (with or without accompanying audio)
of an environment or scene and provides the captured data to
generation device 104. Capture apparatus 134 will be described
below in greater detail. The sources of raw data can also include,
in some embodiments, an animation application 135 (e.g. a
three-dimensional animation application) stored in memory 120 and
executable by processor 116 to create the raw data. In other words,
the virtual reality multimedia data can be generated from raw data
depicting a virtual scene (via application 135) or from raw data
depicting a real scene (via capture apparatus 134).
[0028] Client device 108 can be based on any suitable computing
environment, such as a personal computer (e.g. a desktop or laptop
computer), a mobile device such as a smartphone, a tablet computer,
and the like. Client device 108 includes a processor 136
interconnected with a memory 140. Client device 108 can also
include an input device 144, a display 148 and a network interface
152. Processor 136, memory 140, input device 144, display 148 and
network interface 152 can be substantially as described above in
connection with the corresponding components of generation device
104. As will be discussed in greater detail below, in some
embodiments the components of client device 108, although
functionally similar to those of generation device 104, may have
limited computational resources relative to generation device 104.
For example, processor 136 can include a CPU and a GPU that, due to
power, thermal envelope or physical size constraints (or a
combination thereof), are able to process a smaller volume of data
in a given time period than the corresponding components of
generation device 104. As noted in connection with generation
device 104, the CPU and GPU of client device 108 (collectively
referred to as processor 136) can share computational tasks between
them, as will be apparent to those skilled in the art. In certain
situations, however, as will be described below, specific
computational tasks are assigned specifically to one or the other
of the CPU and the GPU.
[0029] In addition, system 100 includes a virtual reality display
156 connected to processor 136 of client device 108 via any
suitable interface. Virtual reality display 156 includes any
suitable device comprising at least one display and a mechanism to
track movements of an operator. For example, virtual reality
display 156 can be a head-mounted display device with head
tracking, such as the Oculus Rift from Oculus VR, Inc. or the Gear
VR from Samsung. Virtual reality display 156 can include a
processor, memory, communication interfaces, displays and the like
beyond those of client device 108, in some embodiments. In other
embodiments, certain components of client device 108 can act as
corresponding components for virtual reality display 156. For
example, the above-mentioned Gear VR device mounts a mobile device
such as a smart phone, and employs the display (e.g. display 148)
and processor (e.g. processor 136) of the smart phone. In any
event, client device 108 is configured to control virtual reality
display 156 to render the virtual reality multimedia received from
generation device 104.
[0030] In general, generation device 104 is configured, via the
execution by processor 116 of a virtual reality data generation
application 160 consisting of computer readable instructions
maintained in memory 120, to receive source data (also referred to
as raw data) from capture apparatus 134 or application 135 (or a
combination thereof), and to process the source data to generate
virtual reality multimedia data packaged for transmission to client
device 108. Client device 108, in turn, is configured via the
execution by processor 136 of a virtual reality playback
application 164 consisting of computer readable instructions
maintained in memory 140, to receive the virtual reality multimedia
data generated by generation device 104, and process the virtual
reality multimedia data to render a virtual reality scene via
virtual reality display 156. Those skilled in the art will
appreciate that in some embodiments, the functionality of the
above-described applications (e.g. applications 135, 160 and 164)
may be implemented using pre-programmed hardware or firmware
elements (e.g., application specific integrated circuits (ASICs),
electrically erasable programmable read-only memories (EEPROMs),
etc.), or other related components.
[0031] Turning now to FIG. 2, the generation, transmission and
rendering of multimedia data mentioned above will be described in
further detail in connection with a method 200. Method 200 will be
described in conjunction with its performance in system 100;
specifically, certain blocks of method 200 are performed by
generation device 104, while other blocks of method 200 are
performed by client device 108, as illustrated. It is contemplated
that method 200 can also be performed by other suitable
systems.
[0032] Beginning at block 205, generation device 104 is configured
to obtain point cloud data. The point cloud data includes colour
and three-dimensional position data for each of a plurality of
points corresponding to locations in a capture volume. An example
illustration of point cloud data obtained at block 205 is shown in
FIGS. 3A and 3B. FIG. 3A depicts an object 300 in a capture volume
304 to be represented in virtual reality multimedia data. FIG. 3B
depicts point cloud data 306 representing capture volume 304. That
is, the outer boundaries of point cloud data 306 reflect the outer
boundaries of capture volume 304, such that point cloud data 306
can represent any object within capture volume 304. As seen in FIG.
3B, object 300 is represented in point cloud data 306 by a
plurality of points 308 (two of which are labelled). More
generally, any object visible within capture volume 304 is depicted
in point cloud data 306; the only points visible in point cloud
data 306 are those defining object 300, because for illustrative
purposes it has been assumed that object 300 is the only object
present within capture volume 304.
[0033] As noted above, each point 308 in point cloud data 306
includes colour data and three-dimensional position data. The
colour data indicates the colour of the point 308 in any suitable
representation of any suitable colour model (e.g. RGB, CMYK, HSV,
HSL, YUV and the like). The position data indicates the position of
the point 308 within point cloud 306, and thus corresponds to a
certain location in capture volume 304. The nature of the position
data is also not particularly limited. For example, the position
data can be in the form of a set of Cartesian coordinates (e.g.
distances along x, y, and z axes that intersect at the center of
point cloud data 306). In another example, the position data can be
in the form of spherical coordinates (e.g. a radial distance, a
polar angle and an azimuthal angle, all relative to a center of
point cloud data 306).
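By way of non-limiting illustration, the two forms of position data
mentioned above can be converted into one another. The sketch below
assumes a particular convention that the specification does not fix:
the polar angle is measured from the positive z axis, the azimuthal
angle is measured in the x-y plane from the positive x axis, and both
are in radians. The function names are hypothetical.

```python
import math


def spherical_to_cartesian(radius, polar, azimuth):
    """Convert (radial distance, polar angle, azimuthal angle) to (x, y, z).

    Assumed convention: polar angle from the +z axis, azimuth from the
    +x axis in the x-y plane, angles in radians.
    """
    x = radius * math.sin(polar) * math.cos(azimuth)
    y = radius * math.sin(polar) * math.sin(azimuth)
    z = radius * math.cos(polar)
    return (x, y, z)


def cartesian_to_spherical(x, y, z):
    """Convert (x, y, z) to (radial distance, polar angle, azimuthal angle)."""
    radius = math.sqrt(x * x + y * y + z * z)
    if radius == 0.0:
        # The centre of the point cloud has no defined direction.
        return (0.0, 0.0, 0.0)
    # Clamp guards against rounding pushing the ratio just outside [-1, 1].
    polar = math.acos(max(-1.0, min(1.0, z / radius)))
    azimuth = math.atan2(y, x)
    return (radius, polar, azimuth)
```

Either representation identifies the same location relative to the
center of point cloud data 306; the choice affects only how the
position data is stored.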
[0034] The point cloud data obtained at block 205 can be stored in
any of a variety of data structures, including, for example, a
table containing a plurality of records, each corresponding to one
point 308 and containing the colour data and position data for that
point. A variety of other data structures will also occur to those
skilled in the art.
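As a minimal sketch of the table-of-records structure described
above, each record below holds the colour data and position data for
one point 308. The type name, field names, and the choice of RGB
colour with Cartesian coordinates are assumptions made for
illustration only; any of the colour models and coordinate forms
mentioned earlier could be substituted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PointRecord:
    """One record of the table: a single point of the point cloud."""
    # Position relative to the centre of the point cloud (assumed Cartesian).
    position: Tuple[float, float, float]
    # Colour of the point (assumed 8-bit RGB).
    colour: Tuple[int, int, int]


# The point cloud data as a table containing a plurality of records.
point_cloud: List[PointRecord] = [
    PointRecord(position=(0.5, 0.0, 1.2), colour=(200, 30, 30)),
    PointRecord(position=(-0.1, 0.4, 0.9), colour=(25, 180, 60)),
]
```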
[0035] The manner in which point cloud data 306 is obtained at
block 205 is not particularly limited. As noted earlier, point
cloud data can be generated by generation device 104 via the
execution of animation application 135, in which case obtaining
point cloud data 306 can include retrieving the point cloud data
from memory 120. In other embodiments, in which capture volume 304
is a volume of real space (rather than a virtual volume generated
via application 135), obtaining point cloud data at block 205
includes receiving and processing data from capture apparatus 134.
A description of capture apparatus 134 itself follows, with
reference to FIG. 4.
[0036] Capture apparatus 134 includes a plurality of capture nodes
arranged in or around capture volume 304. Each node, placed in a
distinct position from the other nodes, generates colour and depth
data for a plurality of points in its field of view. In the present
example, the field of view for each node is about three hundred and
sixty degrees by about three hundred and sixty degrees (that is, each
node captures data in a full sphere). However, in other embodiments
nodes may have reduced fields of view. The nature of the nodes is
not particularly limited. For example, each node can include a
camera and a depth sensor (e.g. a lidar sensor). In some
embodiments, each node may include a plurality of cameras and depth
sensors to achieve the above-mentioned field of view. An example of
a device that may be employed for each node is the Bublcam by Bubl
Technology Inc. of Toronto, Canada.
[0037] A wide variety of node arrangements may be employed to
capture the raw data to be processed by generation device 104 in
order to obtain point cloud data 306 at block 205. In general,
greater numbers of nodes allow for a greater level of detail to be
captured, particularly in complex scenes. Examples of presently
preferred configurations of nodes for capture apparatus 134 are
discussed below.
[0038] FIG. 4 illustrates three non-limiting examples of multi-node
capture apparatuses 134, indicated as 134a, 134b and 134c. Apparatus
134a has a tetrahedral shape, apparatus 134b has the shape of a
triangular prism, and apparatus 134c has an octahedral shape. The
capture volume 304 is also illustrated as a dashed-line sphere
around each arrangement (although the actual size of capture volume
304 may be larger or smaller than shown in relation to apparatuses
134a, 134b, 134c). Each arrangement includes a plurality of capture
nodes including a central node x and peripheral nodes a, b, c, d,
as well as (for apparatuses 134b and 134c) e and f.
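For illustration only, node positions for a tetrahedral arrangement
such as apparatus 134a can be sketched as below. The sketch assumes a
regular tetrahedron whose peripheral nodes lie at a common distance
from central node x, placed at the origin; the specification does not
require a regular shape, and the function name and the radius
parameter are hypothetical.

```python
import math


def tetrahedral_nodes(radius=1.0):
    """Return assumed node positions: central node x plus peripheral a-d.

    The peripheral nodes sit on the vertices of a regular tetrahedron
    inscribed in a sphere of the given radius, centred on node x.
    """
    s = radius / math.sqrt(3.0)
    return {
        "x": (0.0, 0.0, 0.0),
        "a": (s, s, s),
        "b": (s, -s, -s),
        "c": (-s, s, -s),
        "d": (-s, -s, s),
    }
```

With these positions the peripheral nodes are equidistant from node x
and from one another, so each region near the centre is seen from
several directions.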
[0039] The arrangements of capture apparatus 134 illustrated in
FIG. 4 create safe movement zones within capture volume 304. A safe
movement zone describes a volume around the center of capture
volume 304 (i.e. the location of nodes x in FIG. 4) within which
the resulting point cloud data 306 maintains continuity with
capture volume 304. In other words, virtual reality display 156
will be able to simulate movement of the operator within this safe
zone with substantially all rotations and positions in the volume
supported. Conversely, outside of the safe movement zone, the
likelihood of objects in capture volume 304 being incompletely
captured in point cloud data 306 (because the objects are visible
by too few nodes) increases.
[0040] Returning briefly to FIG. 2, at block 205 the process of
obtaining point cloud data can therefore include receiving the
point cloud data from capture apparatus 134. When (as shown in FIG.
4) capture apparatus 134 includes a plurality of nodes, generation
device 104 can be configured to receive point cloud data 306 in a
form that requires no further processing. In other embodiments,
generation device 104 can receive raw data in the form of a
plurality of point clouds from capture apparatus 134, and process
the raw data to generate point cloud data 306, as discussed below
in connection with FIG. 5.
[0041] FIG. 5 depicts a method 500 of generating point cloud data
(e.g. as part of the performance of block 205 of method 200). At
block 505, generation device 104 is configured to receive raw point
cloud data from each node in capture apparatus 134. As will be
apparent to those skilled in the art from FIG. 4, each node in any
given capture setup can generate point cloud data for at least a
portion of capture volume 304.
[0042] At block 510, generation device 104 is configured to
register the raw point cloud data received at block 505 to a common
frame of reference (i.e. the same coordinate space). For example,
each node of capture apparatus 134 can be configured to generate
point cloud data in which each point has coordinates (either
Cartesian or spherical, as mentioned earlier) centered on the node
itself. With the relative locations of the nodes being known, the
point cloud data from any given node can be transformed via
conventional techniques to a frame of reference centered on the
center of capture volume 304.
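The registration at block 510 can be sketched as a rigid transform
applied to each node-centred point. The sketch assumes that each
node's pose is known as a translation (the node's position relative
to the center of capture volume 304) and, optionally, a 3x3 rotation
matrix; the specification refers only to "conventional techniques",
so the function name, parameters and the rigid-transform model are
illustrative assumptions.

```python
def register_points(points, node_position, node_rotation=None):
    """Map node-centred (x, y, z) points into the common frame.

    Computes p_common = R * p_node + t, where t is the assumed node
    position in the common frame and R is the assumed node orientation
    (identity when the node axes already match the common axes).
    """
    if node_rotation is None:
        node_rotation = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    registered = []
    for px, py, pz in points:
        # Rotate the point into the orientation of the common frame.
        rotated = tuple(
            row[0] * px + row[1] * py + row[2] * pz for row in node_rotation
        )
        # Translate by the node's position in the common frame.
        registered.append(tuple(r + t for r, t in zip(rotated, node_position)))
    return registered
```

Applying this function to the raw point cloud data of every node, each
with its own position and rotation, yields co-registered point cloud
data in a single coordinate space.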
[0043] It will now be apparent that when the sets of raw point
cloud data are registered to a common frame of reference, a number
of locations within capture volume 304 may be represented multiple
times within the co-registered point cloud data. That is, more than
one node may capture the same location in capture volume 304.
Generation device 104 is therefore configured to collapse fully or
partially any overlapping points in the co-registered point cloud
data to a smaller number of points, as discussed below.
[0044] At block 515 generation device 104 is configured to
determine, for each point in the co-registered point cloud data,
whether the point overlaps (either exactly or partially) with other
points in the common frame of reference. When the determination is
negative, generation device 104 proceeds to block 520, at which the
co-registered point cloud data is updated with no change being made
to the non-overlapping points (in other words, the update may be a
null update). When the determination at block 515 is affirmative
for any points, however, generation device 104 can be configured to
perform block 525. At block 525, generation device 104 is
configured to determine whether the difference in colour between
the overlapping points identified at block 515 is greater than a
predetermined threshold. That is, if different nodes record
significantly different appearances for the same location in
capture volume 304, that is an indication that the capture volume
includes surfaces that are highly reflective, specular or the
like.
[0045] When the determination at block 525 is negative (e.g. the
differences in colour for overlapping points are non-existent or
below the above-mentioned threshold), generation device 104
proceeds to block 520 and updates the co-registered point cloud by
replacing the overlapping points with a single point. The single
point can have a colour value equivalent to an average of the
colour values of the original overlapping points, for example.
[0046] When the determination at block 525 is affirmative, however,
generation device 104 can be configured to create a palette image
containing a subset, or all, of the colour values from the
overlapping points. A palette image stores a plurality of possible
colours for a single point in the co-registered point cloud. The
palette image preferably stores possible colours in a
two-dimensional array. The colour at the center of the palette
image corresponds to the colour of the point when viewed from the
center of the point cloud, and colours spaced apart from the center
of the palette image in varying directions and at varying distances
correspond to the colour of the point when viewed from
corresponding directions and distances from the center of the point
cloud. In some embodiments, rather than full colour values, the
palette image can store only luminance or intensity values, while
chrominance or other colour values can be stored in the point
itself (along with a reference to the palette image).
[0047] At block 520, generation device 104 is then configured to
update the co-registered point cloud with an index value pointing
to the palette image (which can be stored separately from point
cloud data 306), in place of a colour value. In some embodiments,
the performance of blocks 525 and 530 can be omitted.
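The handling of overlapping points at blocks 515 to 525 may be illustrated by the following Python sketch. Several details are assumptions made for illustration only: points are treated as overlapping only when their positions are identical, the colour difference is measured as the largest per-channel spread, and the palette is held as a simple list of colours rather than the two-dimensional palette image described above.

```python
from typing import Dict, List, Tuple, Union

Colour = Tuple[int, int, int]
Position = Tuple[int, int, int]
PaletteRef = Tuple[str, int]  # ("palette", index into the palette list)


def collapse_overlaps(
    points: List[Tuple[Position, Colour]],
    threshold: int,
    palettes: List[List[Colour]],
) -> Dict[Position, Union[Colour, PaletteRef]]:
    """Collapse co-registered points that share a position."""
    grouped: Dict[Position, List[Colour]] = {}
    for position, colour in points:
        grouped.setdefault(position, []).append(colour)

    merged: Dict[Position, Union[Colour, PaletteRef]] = {}
    for position, colours in grouped.items():
        if len(colours) == 1:
            merged[position] = colours[0]  # no overlap: null update
            continue
        # Hypothetical difference measure: largest per-channel spread.
        spread = max(
            max(c[i] for c in colours) - min(c[i] for c in colours)
            for i in range(3)
        )
        if spread > threshold:
            palettes.append(colours)  # keep every observed colour
            merged[position] = ("palette", len(palettes) - 1)
        else:
            n = len(colours)
            merged[position] = tuple(sum(c[i] for c in colours) // n for i in range(3))
    return merged


palettes: List[List[Colour]] = []
merged = collapse_overlaps(
    [
        ((0, 0, 0), (100, 100, 100)),
        ((0, 0, 0), (110, 100, 100)),
        ((1, 0, 0), (0, 0, 0)),
        ((1, 0, 0), (255, 255, 255)),
        ((2, 0, 0), (5, 5, 5)),
    ],
    threshold=30,
    palettes=palettes,
)
```

Here the first location is replaced with an averaged colour, the second location receives an index into the palette list, and the third is left unchanged.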
[0048] Returning to FIG. 2, once generation device 104 has obtained
point cloud data 306, generation device 104 is configured to
perform block 210 of method 200. At block 210, generation device
104 is configured to generate primary image data. The primary image
data includes two components: (i) a projection of a first subset of
the points defined in point cloud data 306 into a two-dimensional
frame of reference (from the three-dimensional frame of reference
of point cloud data 306); and (ii) for each point of the
above-mentioned subset, depth data derived from the corresponding
position data (in point cloud data 306) for that point.
[0049] In brief, the primary image data generated at block 210
depicts the portions of point cloud data 306 that are visible from
a predicted viewpoint of virtual reality display 156. In other
words, the primary image data depicts the portions of point cloud
data 306 that are expected to be initially visible to the operator
of virtual reality display 156. As will now be apparent, from any
given viewpoint within point cloud data 306, any object may be
occluded by other objects or by the object itself (e.g. the rear
surface of an object may be occluded from view by the remainder of
that same object). The above-mentioned subset of points in the
primary image data corresponds to the portions of point cloud data
306 that are visible from the initial viewpoint. Other points in
point cloud data 306 that are not visible from the initial
viewpoint are not included in the subset.
[0050] In general, generation device 104 is configured to generate
the primary image data by selecting the above-mentioned subset of
points, and for each of the subset of points, determining a
projected location in a two-dimensional frame of reference for that
point, along with accompanying depth data. An example
implementation of block 210 will be discussed below, following the
discussion of block 215.
[0051] At block 215, generation device 104 is configured to
generate secondary image data. The secondary image data includes a
projection (distinct from the projection mentioned above in
connection with the primary image data) of a second subset of the
points in point cloud data 306 into the two-dimensional frame of
reference mentioned above. The second subset of points is distinct
from the subset of points represented by the primary image data.
More specifically, the second subset of points, when projected into
the two-dimensional frame of reference, overlaps with at least part
of the projection in the primary image data. That is, each of the
second subset of points, when projected, occupies a location in the
two-dimensional frame of reference that matches the location (in
that same frame of reference) of a point in the first subset. The
secondary image data also includes, for each point of the second
subset, depth data derived from the corresponding position data of
that point in point cloud data 306.
[0052] In contrast to the primary image data, the secondary image
data depicts the portions of point cloud data 306 that are not
visible from a predicted initial viewpoint established by virtual
reality display 156. Instead, the secondary image data depicts
portions of point cloud data 306 that are initially occluded by the
primary image data, but may become visible due to movement of the
viewpoint through manipulation of virtual reality display 156 by an
operator.
[0053] As with the primary image data, generation device 104 is
configured to generate the secondary image data by selecting the
above-mentioned second subset of points, and for each of the second
subset of points, determining a projected location in a
two-dimensional frame of reference for that point, along with
accompanying depth data. In the present example, generation device
104 is configured to perform blocks 210 and 215 in parallel (that
is, substantially simultaneously) according to the process depicted
in FIG. 6.
[0054] Referring now to FIG. 6, a method 600 of generating primary
and secondary image data at generation device 104 (i.e. a method of
performing blocks 210 and 215 of method 200) is depicted. Beginning
at block 605, generation device 104 is configured to select a
viewpoint within the volume depicted by point cloud data 306 (in
other words, within capture volume 304). The selected viewpoint
is the predicted starting location of the viewer, as
detected by the virtual reality display 156. For example, the
centre of point cloud data 306 may be selected as the
viewpoint.
[0055] At block 610, generation device 104 is configured to select
a vector (also referred to as a path) for processing. In the
example above, in which point cloud data 306 defines a spherical
volume (i.e. defined by spherical coordinates), the selection of a
vector at block 610 comprises selecting azimuthal and polar angles.
In general, at block 610 generation device 104 selects a path
extending from the viewpoint selected at block 605, but does not
select a depth (e.g. a radial distance when using spherical
coordinates) corresponding to that path.
[0056] At block 615, generation device 104 is configured to
identify the first point in point cloud data 306 that is visible to
the selected viewpoint along the selected path or vector. That is,
travelling along the selected path from the selected viewpoint, the
first point in point cloud data 306 that the selected path
intersects is identified, and projected into a two-dimensional
frame of reference. Projection at block 615 includes determining
two-dimensional coordinates, such as an x and a y coordinate,
corresponding to the first visible point in a previously selected
two-dimensional frame of reference. The projection can also include
determining a depth for the first visible point, which defines the
distance (generally in scalar form) from the viewpoint to the first
visible point.
[0057] A wide variety of two-dimensional frames of reference may be
employed at block 615. In the present example, the two-dimensional
frame of reference is a cube map. Various features of cube maps,
and various techniques for projecting points in three-dimensional
space onto two-dimensional faces of cube maps, will be familiar to
those skilled in the art. To illustrate the application of cube
mapping to the present disclosure, reference is now made to FIG.
7A.
[0058] FIG. 7A depicts a viewpoint 700 centered within a cube 704
having six faces: an upper face 708, a lower face 712, a right face
716, a left face 720, a front face 724 and a rear face 728. In
general, determining two-dimensional coordinates corresponding to a
point 732 in three-dimensional space involves projecting point 732
towards viewpoint 700 until the path of projection intersects with
one of the faces of cube 704 (in the example shown in FIG. 7A, the
projection path intersects with right face 716). The location
(defined by a horizontal coordinate and a vertical coordinate, e.g.
x and y) of the intersection is the two-dimensional projection of
point 732. Thus, any number of points in three-dimensional space
can be represented on a two-dimensional plane, in the form of one
of the faces of cube 704.
[0059] Referring now to FIG. 7B, an example of cube map projection
as applied to point cloud data 306 is illustrated. In particular, a
path 736 selected at block 610 is shown extending from a viewpoint
(selected at block 605) centered on a cube. The first point in
point cloud data 306 that is intersected by path 736 is point 308a.
The two-dimensional coordinates of the projection of point 308a are
the coordinates on the face of the cube intersected by path 736
during its travel from the viewpoint to point 308a. Generation
device 104 is also configured to determine the length of path 736
between the viewpoint and point 308a, based on the positions in
three dimensional space of the viewpoint and point 308a. The length
of path 736 represents a depth value corresponding to the
two-dimensional projection of point 308a.
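The cube map projection and depth determination described above can be sketched in Python as follows. The axis convention (positive x towards the right face, positive y towards the upper face, positive z towards the front face), the face names as strings, and the normalisation of face coordinates to the range -1 to 1 are assumptions of this sketch and are not specified by the disclosure.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]


def project_to_cube(point: Point, viewpoint: Point = (0.0, 0.0, 0.0)) -> Tuple[str, float, float, float]:
    """Return (face, u, v, depth) for a point seen from the viewpoint.

    The face is chosen by the dominant axis of the direction from the
    viewpoint to the point; u and v are the coordinates of the
    intersection on that face, and depth is the length of the path.
    """
    dx, dy, dz = (p - v for p, v in zip(point, viewpoint))
    depth = math.sqrt(dx * dx + dy * dy + dz * dz)
    if depth == 0.0:
        raise ValueError("point coincides with the viewpoint")
    ax, ay, az = abs(dx), abs(dy), abs(dz)
    if ax >= ay and ax >= az:
        face = "right" if dx > 0 else "left"
        u, v = dz / ax, dy / ax
    elif ay >= az:
        face = "upper" if dy > 0 else "lower"
        u, v = dx / ay, dz / ay
    else:
        face = "front" if dz > 0 else "rear"
        u, v = dx / az, dy / az
    return face, u, v, depth


face, u, v, depth = project_to_cube((4.0, 0.0, 3.0))
```

In this example the path from the viewpoint to the point crosses the right face, and the depth is the straight-line distance between the viewpoint and the point.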
[0060] Returning to FIG. 6, at block 620 generation device 104 is
configured to determine whether any additional points intersect
with the path selected at block 610 (that is, after the point
projected at block 615). Referring again to FIG. 7B, path 736
intersects only one point (point 308a). However, another path 740
intersects a point 308b on one face of object 300 before traversing
object 300 and intersecting another point 308c on another face of
object 300. Thus, for path 736 the determination at block 620 would
be negative, but for path 740 the determination would be
affirmative. Such additional points are also referred to as "fold"
points, and generally represent locations in point cloud data 306
that require movement of the initial viewpoint in order to become
visible.
[0061] When the determination at block 620 is affirmative,
generation device 104 is configured to determine two-dimensional
coordinates and depth data for any additional points along the
selected path at block 625. As will now be apparent, the
two-dimensional coordinates for the additional points are identical
to those of the first visible point, and thus there is no need to
repeat projection calculations at block 625. Instead, only the
depth of such additional points needs to be determined.
[0062] At block 630, generation device 104 is configured to
determine whether any additional paths remain to be processed.
Generation device 104 is configured to process a plurality of paths
to generate the primary and secondary image data. The number of
paths to be processed is set based on the desired resolution of the
primary and secondary image data--a greater number of paths (i.e. a
higher-resolution sampling of point cloud data 306) leads to higher
resolution image data. When paths remain to be processed, the
performance of method 600 returns to block 610 to select the next
path. When all paths have been processed, the performance of method
600 instead proceeds to block 635.
[0063] At block 635, generation device 104 is configured to store
the first visible point projections and corresponding depth values
as primary image data, and at block 640, generation device 104 is
configured to store the additional point projections and
corresponding depth values as secondary image data. Blocks 635 and
640 need not be performed separately after a negative determination
at block 630. Instead, the storage operations at blocks 635 and 640
can be integrated with blocks 615 and 625, respectively.
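The division of points between primary and secondary image data in method 600 may be illustrated by the following Python sketch. It assumes, for illustration only, that each point has already been associated with a path identified by its two-dimensional pixel coordinates, and that each point carries a depth and an arbitrary payload (a label here, colour data in practice). The function and type names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
Sample = Tuple[float, str]  # (depth, payload)


def split_primary_secondary(
    samples: List[Tuple[Pixel, float, str]],
) -> Tuple[Dict[Pixel, Sample], Dict[Pixel, List[Sample]]]:
    """Assign the nearest point on each path to the primary data and any
    further points on the same path (fold points) to the secondary data."""
    by_path: Dict[Pixel, List[Sample]] = defaultdict(list)
    for pixel, depth, payload in samples:
        by_path[pixel].append((depth, payload))

    primary: Dict[Pixel, Sample] = {}
    secondary: Dict[Pixel, List[Sample]] = {}
    for pixel, hits in by_path.items():
        hits.sort(key=lambda hit: hit[0])  # nearest to the viewpoint first
        primary[pixel] = hits[0]
        if len(hits) > 1:
            secondary[pixel] = hits[1:]  # same 2D coordinates, new depths only
    return primary, secondary


primary, secondary = split_primary_secondary(
    [
        ((5, 5), 3.0, "308a"),  # path 736: a single point
        ((7, 5), 6.0, "308c"),  # path 740: far face of the object
        ((7, 5), 4.0, "308b"),  # path 740: near face of the object
    ]
)
```

Because fold points share the two-dimensional coordinates of the first visible point on their path, only their depths differ, consistent with the observation that projection calculations need not be repeated at block 625.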
[0064] The storage operations of blocks 635 and 640 will be
described in greater detail in conjunction with FIGS. 8 and 9A-9B.
Referring now to FIG. 8, an example of primary image data is shown
in the form of two packages of data 800 and 804, such as image
files. Although the term "file" is used herein to discuss image
data stored in packages 800 and 804 as well as other packages to be
introduced, the files discussed herein can be stored in other types
of packages, including streams of data, blocks of memory in a CPU
or GPU, and the like. File 800 contains the colour data for the
primary image data, and file 804 contains the depth data for the
primary image data. Each file is divided into regions corresponding
to faces of cube 704, described above. The regions of files 800 and
804 are arranged to represent the unfolding of cube 704 into a
cross shape, and the relocation of rear face 728 and lower face 712
to transform the cross shape into a rectangular shape that
corresponds more closely with conventional image formats. Any
arrangement of the faces of cube 704 can be employed in files 800
and 804, however. In general, files 800 and 804 use the same
arrangement of faces; in some embodiments, however, different
arrangements may be employed for each of files 800 and 804, with the
added requirement of storing data (for example, in an index or
other metadata to be discussed below) defining the concordance
between files 800 and 804.
[0065] Each of files 800 and 804 consists of a two-dimensional
array. In the case of file 800, the two-dimensional array is an
array of pixels, each storing colour data in any suitable format
(e.g. HSV). Thus, as illustrated in FIG. 8, the regions
corresponding to face 716 contain colour data and depth data for a
subset of the points representing object 300 (specifically, the
subset visible from viewpoint 700), while the remaining faces are
blank (e.g. contain null values), as no other objects exist in the
simplified example capture volume 304 discussed herein. Although
the regions of face 716 are shown populated with an image for illustrative
purposes, files 800 and 804 are generally implemented with arrays
of numeric values (e.g. triplets of values in each pixel of file
800, and a single depth value in each pixel of file 804). It is
contemplated that the pixels of files 800 and 804 do not contain
any positional data. Rather, such positional data is implicit in
the position of the pixels in the above-mentioned two-dimensional
array.
[0066] Turning to FIG. 9A, in some embodiments the secondary image
data (that is, the fold points, or folds) can be stored in two
files 900 and 904. File 900 contains colour data for each of the
additional points detected and projected at blocks 620 and 625,
while file 904 contains depth data for each of the additional
points. In addition, files 900 and 904 preferably have the same
dimensions as files 800 and 804 described above. As with files 800
and 804, files 900 and 904 include a plurality of pixels in a
predefined two-dimensional array, with each pixel containing either
a null value, or colour values (for file 900) or a depth value (for
file 904). Files 900 and 904 are divided into the same regions as
files 800 and 804 (corresponding to the faces of cube 704). This
technique of storing secondary image data may also be referred to
as regional fold data.
[0067] As seen in FIG. 9A, the region of files 900 and 904
corresponding to face 716 of cube 704 depicts a different portion
of object 300 than files 800 and 804. Specifically, files 900 and
904 depict the "top" of object 300, which includes point 308c,
labelled in FIG. 7B. As discussed earlier, and as shown in FIG. 8,
the top of object 300 is not visible in the primary image data.
Instead, the top of object 300 is "behind" the portion of object
300 shown in files 800 and 804, from the perspective of viewpoint
700.
[0068] It will now be apparent that the "back" of object 300 is
also not visible in the primary image data. In some examples, the
back of object 300 would therefore also be depicted in files 900
and 904. In the present example, however, for illustrative purposes,
the back of object 300 has been omitted from files 900 and 904.
More specifically, generation device 104 can be configured, at
block 625, to determine whether a fold point is within a predicted
range of motion of the viewpoint selected at block 605 before
projecting the fold point. That is, generation device 104 can store
a predicted maximum travel distance for viewpoint 700, and omit
fold points entirely if such points would only become visible if
the viewpoint moved beyond the maximum travel distance. In the
presently preferred embodiment, however, such determinations are
omitted from the generation of secondary image data, and instead
addressed at the rendering stage, to be discussed further
below.
[0069] In the example shown in FIGS. 7B and 9A, only one additional
layer of points is present in point cloud data 306 behind the
points of the primary image data. All fold points can therefore be
readily represented in files 900 and 904 in their "true" locations.
However, in more complex point cloud data, further fold points may
be detected at blocks 620 and 625 behind a first layer of
fold points (in much the same way as the fold points shown in FIG.
9A lie "behind", or overlap, the primary points shown in FIG. 8).
In such embodiments, generation device 104 can create a further
pair of files 900 and 904 for each deeper layer of fold points.
Preferably, however, only a single pair of files 900 and 904 is
employed by generation device 104 to store all fold data.
Therefore, generation device 104 is configured to detect collisions
between fold points--instances in which multiple fold points have
the same two-dimensional projections. Generation device 104 is
configured to store such colliding fold points in a manner
described below in connection with FIGS. 10A and 10B.
[0070] FIG. 10A depicts viewpoint 700 and cube 704, with a path
extending from right face 716 and intersecting with three points A,
B and C in point cloud data 306. According to the process shown in
FIG. 6, point A is stored as primary image data, and points B and C
are stored as secondary image data. However, the two-dimensional
projections of points B and C have the same position, and thus the points
collide in files 900 and 904. Generation device 104 is therefore
configured, as shown in FIG. 10B, to store point B in a file 900',
in the actual position of the two-dimensional projection of all
three points. Point C, meanwhile, is offset from point A in the
two-dimensional array. In other words, generation device 104 is
configured to detect collisions between fold points, and when a
collision is detected, generation device 104 is configured to
search the vicinity of the collision location for an unused pixel,
and to store the colliding point in the unused pixel. Generation
device 104 is also configured to store an offset for the colliding
point (point C, in the present example). The offset can be stored
in files 900 or 904 themselves (e.g. in a header or other metadata
segment of the files, or in the UV--chrominance--portion of the
relevant depth file, as depth is a single value and can thus be
stored in the Y--luminance--portion of the file), or in a separate
index file (e.g. containing the position of point C in file 900',
and an offset vector or coordinate pair indicating the true
position of point C). Substantial portions of files 900 and 904 may
be empty, depending on the complexity of the scene, and thus the
above process may make more efficient use of storage space than
generating a plurality of layers of files 900 and 904.
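The collision handling described above can be sketched in Python as follows. The sketch is illustrative only: the array is modelled as a dictionary keyed by pixel coordinates without bounds checking, the search visits square rings of increasing size around the collision location, and the offset is stored in a separate dictionary standing in for the index file. None of these choices is prescribed by the disclosure.

```python
from typing import Dict, Optional, Tuple

Pixel = Tuple[int, int]


def store_fold_point(
    grid: Dict[Pixel, str],
    offsets: Dict[Pixel, Pixel],
    true_pixel: Pixel,
    value: str,
    radius: int = 2,
) -> Optional[Pixel]:
    """Store a fold point, relocating it to a nearby unused pixel on collision.

    Returns the pixel actually used, or None if no unused pixel was found
    within the (hypothetical) search radius.
    """
    if true_pixel not in grid:
        grid[true_pixel] = value
        return true_pixel
    x0, y0 = true_pixel
    for r in range(1, radius + 1):
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                candidate = (x0 + dx, y0 + dy)
                if candidate not in grid:
                    grid[candidate] = value
                    # Vector from the stored position back to the true position.
                    offsets[candidate] = (-dx, -dy)
                    return candidate
    return None


grid: Dict[Pixel, str] = {}
offsets: Dict[Pixel, Pixel] = {}
pos_b = store_fold_point(grid, offsets, (10, 10), "B")
pos_c = store_fold_point(grid, offsets, (10, 10), "C")
```

In this example point B occupies its true position, while point C is stored in an adjacent unused pixel together with an offset from which its true position can be recovered.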
[0071] Returning to FIG. 9B, in other embodiments the secondary
image data can be stored in files 908 and 912 rather than in files
900 and 904. Files 908 and 912 have different data structures than
files 900 and 904, as will be discussed below. In the present
embodiment, files 908 and 912 have a predetermined height "H", but
a variable length "L" (in contrast with files 800 and 804, as well
as 900 and 904, which have predetermined height and length). In
general, the length L is determined by the volume of data to be
stored in files 908 and 912, which varies based on the number of
additional points identified and projected at blocks 620 and 625.
Neither H nor L need equal the height or length of files 800 and
804. File 908 contains colour data for each of the additional
points detected and projected at blocks 620 and 625, while file 912
contains depth data for each of the additional points.
[0072] While files 900 and 904 do not contain explicit position
data in the pixels thereof--since such position data is implicit in
the pixel array--files 908 and 912 do contain such position data,
indicating the position of the colour and depth values of files 908
and 912 within the array of files 800 and 804. This is because the
dimensions of files 908 and 912 generally do not match those of
files 800 and 804, and thus the position of a data point within
file 908 or 912 may not imply a specific position in the array of
files 800 and 804. Generation device 104 is configured to perform
various processing activities to reduce the volume of position data
stored in files 908 and 912.
[0073] In general, generation device 104 is configured to identify
portions of the two-dimensional frame of reference (that is, the
two-dimensional array according to which files 800, 804, 900 and
904 are formatted) that are occupied at least to a threshold
fraction by fold points. For any such portions that are identified,
generation device 104 is configured to select geometric parameters
identifying the portion, and store the geometric parameters along
with the colour or depth data for the fold points within the
portion (absent individual positional data for each fold point) in
files 908 and 912. In other words, generation device 104 is
configured to group the fold points into portions such that the
locations of those fold points within the two-dimensional array can
be represented with a volume of data that does not exceed--and is
preferably smaller than--the volume of data required to store the
individual coordinates of each fold point.
[0074] Generation device 104 can be configured to identify portions
of a variety of different types. For example, generation device 104
can be configured to identify any one of, or any combination of,
straight lines, curved lines, polygons, circles and the like.
Generation device 104 is configured to select a portion, determine
the total number of available positions in the two-dimensional
array that are contained by that portion, and determine whether at
least a threshold fraction of the positions within the portion
contain fold data. The threshold fraction can be preconfigured at
generation device 104, or can be determined dynamically based on the
selected portion. When the determination is negative (i.e. too few
fold points are present within the portion), generation device 104
is configured to select and evaluate a different portion according
to the above process. When the determination is affirmative,
however, generation device 104 is configured to store geometric
parameters corresponding to the portion, as well as colour data or
depth data for each fold point within the portion, in files 908 and
912. Having stored the geometric parameters and corresponding
colour and depth data, generation device 104 is configured to
repeat the above process on the remaining fold points (that is,
those not yet assigned to portions) until all fold point data has
been stored.
[0075] Turning to FIG. 11, an example of the above process is
illustrated. An array 1100 is shown having the dimensions of files
800 and 804, and divided into regions corresponding to the faces of
cube 704. Face 716 contains fold points defining a polygon
representing the top of object 300, as discussed above, while face
720 contains two fold points (whose size is exaggerated for
illustrative purposes). The points in face 720 are not present in
capture volume 304 or point cloud data 306, but rather are provided
in this particular example to illustrate the process of storing
secondary image data.
[0076] Generation device 104 may determine that a portion of face
720 in the shape of a line extending between the two points in face
720 encompasses both of those points. However, storing secondary
image data for such a line, as indicated at 1100, requires storing
geometric parameters such as a start location and an end location
(e.g. the locations of the two fold points themselves), as well as
colour (or depth) data for the entire line. As indicated by the
darker-coloured points, only two colour or depth values in the line
represent fold points. The remaining, light-coloured, points simply
contain null values. Thus, the total storage requirements for the
line are greater than simply storing the two points individually
with location data for each point. In other words, generation
device 104 may determine that the number of fold points on the line
is below a threshold at which the volume of data required to store
the line is lower than the volume of data required to store the
individual points. Generation device 104 therefore does not store
the line, and may instead select a different portion of face 720 to
test.
[0077] Referring to face 716, on the other hand, generation device
104 may determine that a polygon having corners at the corners of
the top of object 300 is entirely filled with fold point data.
Thus, face 716 can be represented in files 908 and 912 by
coordinates for the four corners, and a set of colour or depth data
without explicitly specified position data. As will now be
apparent, this requires less storage space than storing each
individual point in the polygon along with its individual
coordinates within the array.
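The comparison underlying the two examples above (the sparsely populated line in face 720 and the fully populated polygon in face 716) may be sketched in Python as follows. The cost model is an assumption made for illustration: a coordinate pair is taken to cost two storage units and a colour or depth value one unit, and a portion is taken to store one value slot for every position it contains, including null values.

```python
def portion_is_worthwhile(
    positions_in_portion: int,
    fold_points_in_portion: int,
    parameter_points: int,
    coord_cost: int = 2,
    value_cost: int = 1,
) -> bool:
    """Return True if storing the portion is cheaper than storing its fold
    points individually under the hypothetical cost model above.

    parameter_points is the number of coordinate pairs needed to describe
    the portion (e.g. 2 for a line's start and end, 4 for a quadrilateral).
    """
    portion_cost = parameter_points * coord_cost + positions_in_portion * value_cost
    individual_cost = fold_points_in_portion * (coord_cost + value_cost)
    return portion_cost < individual_cost


# A line of 20 positions in which only the two end points hold fold data.
line_ok = portion_is_worthwhile(positions_in_portion=20, fold_points_in_portion=2, parameter_points=2)

# A four-cornered polygon of 100 positions, every one holding fold data.
polygon_ok = portion_is_worthwhile(positions_in_portion=100, fold_points_in_portion=100, parameter_points=4)
```

Under these assumed costs the line is rejected (24 units against 6 for the individual points), while the polygon is accepted (108 units against 300).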
[0078] As noted earlier, a variety of types of geometric structures
are contemplated for storing fold points. These include x-folds,
indicating horizontal lines extending across the entirety of either
a face of the array or the entire array; y-folds, indicating
vertical lines extending across the entirety of either a face of
the array or the entire array; and partial x- or y-folds, indicating
horizontal and vertical lines, respectively, that extend only
partially across a face or array and thus are represented by start
and end points rather than a single x or y index value. The types of
geometric structures also include curved lines (e.g. defined by
start points, end points and radii), polygons (e.g. defined by
coordinates of the corners of the polygons), and angled lines (e.g.
defined by start and end points). Any points that cannot be
assigned to portions more efficiently than storing the points
individually (that is, any points for which the determinations
above remain negative after all other fold data has been stored)
can be stored as individual points, also referred to as
m-folds.
[0079] Returning to FIG. 9B, therefore, files 908 and 912 may have
various sections, distinguished from each other by header data or
tags indicating the start or end of each section, with each section
containing a certain type of fold. Thus, in the present example,
files 908 and 912 each contain a single section including header
data 916 and 920. Header data 916 corresponds to colour data 924,
and header data 920 corresponds to depth data 928. The header data
includes geometric parameters, such as the corners of the polygon
shown in FIG. 9A, and may also contain an identifier of the type of
fold that follows (e.g. a polygon rather than an x-fold or a
y-fold).
[0080] Returning to FIG. 6, the storage of primary and secondary
image data at blocks 635 and 640 may also include generating an
index file. The index file can contain the offset values mentioned
above in connection with FIG. 10B. The index file can also contain
data identifying the relative positions of the primary and
secondary image data. For example, when the secondary image data is
stored in accordance with the structure shown in FIG. 9B, the index
can contain one or more pairs indicating which locations in the
frame of reference of files 800 and 804 correspond to which
locations in files 908 and 912.
[0081] When the primary and secondary image data have been stored,
generation device 104 proceeds to the next frame at block 645. As
will now be apparent, the above process generates primary and
secondary image data for a single set of point cloud data 306,
which depicts a single frame (i.e. a still image or a frame of a
video). Method 600 can therefore be repeated for a plurality of
other frames when the virtual reality multimedia data includes
video data.
[0082] Variations to the processes described above for storing
primary and secondary image data are contemplated. For example,
rather than selecting the first visible point (i.e. the
"shallowest" point) at block 615, generation device 104 can be
configured to select the deepest point at block 615, and to detect
additional points as those points that are in between the viewpoint
and the primary point rather than behind the primary point. In
further embodiments, other divisions of image data between the
primary image data and the secondary image data can be implemented.
For example, instead of dividing point cloud data 306 based on
visibility to viewpoint 700 as described above, the primary and
secondary image data can be selected based on a predetermined depth
threshold. That is, points located at a depth (from viewpoint 700)
greater than the threshold can be included in one of the primary
and secondary image data, while points located at a depth smaller
than the threshold are included in the other of the primary and
secondary image data. When this implementation is used, both
primary and secondary image data can be stored in structures
similar to that shown in FIGS. 9A and 10B. In other words, both
colour and depth files for each of the primary and secondary image
data can include offsets to manage point collisions. In further
embodiments, the division of data between the primary and secondary
image data can be based on any other suitable factor or combination
of factors, such as surface flatness rather than depth
(i.e. based on variations of depth in the area of the points).
[0083] Various other data structures are also contemplated for
storing the primary and secondary image data. For example, files
800 and 804 can be subdivided into a plurality of files or other
package types, each file corresponding to a single face of cube
704. In further embodiments, individual files may be generated by
generation device 104 for each face of cube 704, but each file can
contain both colour and depth data rather than colour and depth
data being separated into distinct files. In such embodiments, the
above-mentioned index can also include data defining the relative
position of the face-specific files.
[0084] Further variations to the generation process are
contemplated. For example, generation device 104 can be configured
to employ depth files such as files 804 and 904 as intermediate
files, not sent to client device 108 but rather employed to
generate an index file. More specifically, generation device 104
can be configured to perform a method 1200, depicted in FIG. 12,
for generating the index file mentioned above. It is contemplated
that method 1200 can be performed for each of the primary and
secondary image data. Beginning at block 1210, generation device
104 is configured to identify a portion of the primary or secondary
image data (specifically, the depth data in files 804 or 904) to be
discarded. For example, generation device 104 can compare each
depth value to a predetermined threshold, to determine whether each
depth value is above (i.e. further from viewpoint 700) or below
(i.e. closer to viewpoint 700) the threshold. Other processes may
also be employed for selecting a portion of the depth data. The
depth threshold can be predetermined as an absolute value, or as a
fraction (e.g. 80%) of the maximum depth present in the primary and
secondary image data.
[0085] Having identified the above-mentioned portion, at block 1215
generation device 104 is configured to discard the identified depth
values. The corresponding colour data for the identified points is
retained, however. Thus, for certain points in files 800, 900 or
908, the corresponding depth values in files 804, 904 or 912,
respectively, are discarded. FIGS. 13A and 13B illustrate the
effect of the performance of block 1215. FIG. 13A depicts a
viewpoint 1300, assumed to be facing towards three cylinders 1302,
1304 and 1308 represented by primary and secondary image data
generated at generation device 104. The depth data corresponding to
cylinders 1304 and 1308 is beyond a depth threshold applied at
block 1210, and thus generation device 104 is configured to discard
the depth data associated with the points representing cylinders
1304 and 1308. As a result, the primary and secondary image data
retained for further processing following the performance of block
1215 is illustrated in FIG. 13B, in which cylinders 1304 and 1308
are represented simply as flattened cylinders (e.g. rectangles)
1312 and 1316 on a background plane 1320, which may for example be
at infinite depth.
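
Blocks 1210 and 1215 can be sketched as follows, as an
illustration under stated assumptions rather than a definitive
implementation. The sketch assumes the depth data is held as a
two-dimensional list of numbers, marks discarded values with None,
and computes a fractional threshold from the maximum depth in the
single array passed in. The function name discard_deep_values and
its parameters are hypothetical.

```python
def discard_deep_values(depth_rows, fraction=0.8, absolute=None):
    """Return a copy of depth_rows with values beyond the threshold removed.

    depth_rows -- list of rows, each a list of depth values or None
    fraction   -- threshold as a fraction of the maximum depth present
    absolute   -- if given, used as the threshold instead of fraction
    """
    present = [d for row in depth_rows for d in row if d is not None]
    if absolute is not None:
        threshold = absolute
    elif present:
        threshold = fraction * max(present)
    else:
        threshold = 0.0
    # Keep values at or below the threshold; discard (None) the rest.
    # The colour data is stored elsewhere and is not touched here.
    return [
        [d if d is not None and d <= threshold else None for d in row]
        for row in depth_rows
    ]
```

Points whose depth values are discarded keep their colour data, and
so could be drawn on a background plane such as plane 1320.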
[0086] Returning to FIG. 12, at block 1220, generation device 104
is configured to select an index of the retained depth data (i.e.
depth data not discarded at block 1215). More specifically, turning
to FIG. 14, generation device 104 is configured to generate an
index file 1400 to replace depth files 804 or 904. Index file 1400
is illustrated in FIG. 14 as having the same dimensions as file 800
(also shown in FIG. 14 for illustrative purposes), although in
practice file 1400 can have any suitable dimensions. Index file
1400 includes a plurality of subregions 1404, each corresponding to
an equivalent subregion of file 800. In each subregion 1404,
generation device 104 is configured to store a list of remaining
depth values, in conjunction with a subregion index indicating the
position of the depth values in a corresponding subregion of file
800. In other words, a plurality of triplets (d, x, y) are stored
in subregions 1404. The size--and therefore the number--of
subregions 1404 can be selected, for example, based on the bit
depth of the indices x and y used to indicate locations within each
subregion. For example, an 8-bit index permits the identification
of a 256×256 grid, and thus subregions 1404 should not exceed
256 pixels in height or width. As will be seen below, client device
108 also employs the same subregions.
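
The construction of the triplets (d, x, y) can be sketched as
follows. The sketch assumes that discarded depth values are marked
with None, that subregions are square, and that the index is held
as a dictionary keyed by subregion column and row; none of these
details are specified in the description, and the function name
build_index is hypothetical.

```python
def build_index(depth_rows, subregion_size=256):
    """Group retained depth values into per-subregion (d, x, y) triplets.

    depth_rows     -- list of rows of depth values, None where discarded
    subregion_size -- width and height of each square subregion
    """
    # With 8-bit x and y indices, a subregion can span at most 256 pixels.
    if not 1 <= subregion_size <= 256:
        raise ValueError("subregion_size must be between 1 and 256")
    index = {}
    for row_number, row in enumerate(depth_rows):
        for column_number, d in enumerate(row):
            if d is None:
                continue  # discarded depth value: nothing to index
            key = (column_number // subregion_size,
                   row_number // subregion_size)
            # x and y are positions relative to the subregion's origin.
            triplet = (d,
                       column_number % subregion_size,
                       row_number % subregion_size)
            index.setdefault(key, []).append(triplet)
    return index
```

Because x and y are stored relative to the subregion, each fits in
the chosen bit depth regardless of the overall image dimensions.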
[0087] Thus, through method 1200, generation device 104 replaces
depth files (such as file 804) with an index of a subset of the
depth values in the original depth files. The index can
additionally identify individual points as well as geometric
parameters encompassing a plurality of points. That is, each
subregion 1404 can include a plurality of index lists, each list
containing depth and position data for a different type of geometry
(e.g. different point sizes including both small, or single-pixel
points, and large, or multi-pixel points, background polygons such
as rectangles, other polygons such as triangles, and the like). For
example, each subregion 1404 of index 1400 can include a background
plane corresponding to the equivalent portion of plane 1320 shown
in FIG. 13B.
[0088] Returning to FIG. 2, having generated primary and secondary
image data at blocks 210 and 215, generation device 104 is
configured to transmit the primary and secondary image data to
client device 108, for example via network 112, at block 220. Prior
to the transmission of the image data, generation device 104 can
perform various preprocessing techniques to prepare the data for
transmission. For example, conventional compression algorithms for
two-dimensional images and video streams can be applied to reduce
the volume of data to be transmitted. In the present example, the
primary and secondary image data are formatted into streams of data
and compressed according to any suitable standard, such as H.264,
VP8/VP9 and the like. The streams of data (or any other suitable
packaging for the data) can be formatted according to any suitable
container format, including that specified by the Moving Picture
Experts Group (MPEG)-4 standard.
[0089] In addition, generation device 104 can be configured to
create additional versions of the primary and secondary image data
having lower resolutions than the original versions. For example,
generation device 104 can receive an indication from client device
108 of the location and direction of viewpoint 700, and transmit
virtual reality multimedia data that either includes down-sampled
versions of, or omits entirely, the portions of the primary and
secondary image data that are not currently visible from the
viewpoint location and direction. For example, image data for one
face of cube 704 may be transmitted at an original resolution,
while the other faces may be transmitted at a lower resolution, or
simply omitted. Combinations of the above are also
contemplated.
[0090] At block 225, client device 108 is configured to receive the
data transmitted by generation device 104 (or an intermediary, as
noted earlier), and decode the prepared data, based on the standard
according to which the data was encoded for transmission at block
220 (e.g. MPEG4). In other words, at block 225 client device 108 is
configured to retrieve, from the data received from generation
device 104, the primary and secondary image data described above,
in the form of files 800 and 804, as well as files 900 and 904 or
files 908 and 912 (or variants thereof). Alternatively, client
device 108 can receive the above-mentioned index files as discussed
in connection with FIG. 14, rather than depth files 804, 904 or
912.
[0091] At block 230, client device 108 is configured to receive a
viewpoint position and direction from virtual reality display 156.
The position and direction received at block 230 need not match the
position of viewpoint 700 discussed above. Viewpoint 700 was
employed for projecting point cloud data 306 into two dimensions,
and reprojecting the primary and secondary image data into three
dimensions to recreate point cloud data 306. The position and
direction received at block 230, on the other hand, corresponds to
the position and direction of the viewpoint within point cloud data
306 as detected by virtual reality display 156 under the command of
an operator. The position and direction of the viewpoint may be
detected by way of accelerometers, pupil detection cameras, or any
other suitable sensors included in virtual reality display 156.
[0092] Upon receipt of the viewpoint position and direction, at
block 235 client device 108 is configured to select and render at
least a portion of the primary and secondary image data received at
block 225, based on the viewpoint position and direction received
at block 230. In general, the selection and rendering process
includes selecting data from the primary and secondary image data
at the CPU of client device 108, and issuing one or more draw calls
to a GPU, for causing the GPU to regenerate point cloud data based
on the selected image data and control virtual reality display 156.
As will be discussed below, client device 108 is configured to
implement various processing techniques to reduce the volume of
point cloud data to be regenerated and processed to control virtual
reality display 156 (i.e. to reduce the computational load on the
GPU).
[0093] Client device 108 is configured to select a subset of the
primary and secondary image data received at block 225, based on
the viewpoint position and direction (including a definition of the
field of view of the viewpoint, also referred to as the frustum of
the viewpoint) received at block 230. For example, client device
108 can be configured to determine which face, or combination of
faces of cube 704 are visible based on the viewpoint position.
Generally, three or fewer faces will be visible from the viewpoint.
Client device 108 can therefore be configured to omit from further
processing any primary and secondary image data located on the
faces that are not visible to the viewpoint. For example, if the
viewpoint has the same location as shown in FIG. 7B, and is
directed towards face 716, then client device 108 may determine
that face 716 is the only face that is currently visible to the
viewpoint. The other five faces may therefore be omitted from
further processing. In other words, the primary image data of files
800 and 804 corresponding to the faces other than face 716, and the
secondary image data of files 900 and 904 or files 908 and 912
corresponding to the faces other than face 716, may be omitted.
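
A coarse version of the face selection can be sketched as follows.
The sketch is a simplification: it assumes the viewpoint lies
inside an axis-aligned cube, ignores the width of the field of
view, and keeps only those faces whose outward normal has a
positive component along the viewing direction, which yields three
or fewer faces. A fuller implementation would test the frustum
against each face. The face labels and the function name
candidate_faces are hypothetical.

```python
# Outward normals of the six faces of an axis-aligned cube.
FACE_NORMALS = {
    "+x": (1, 0, 0),
    "-x": (-1, 0, 0),
    "+y": (0, 1, 0),
    "-y": (0, -1, 0),
    "+z": (0, 0, 1),
    "-z": (0, 0, -1),
}


def candidate_faces(direction, tolerance=1e-9):
    """Return labels of faces the viewing direction points towards.

    direction -- (x, y, z) viewing direction, not necessarily unit length
    tolerance -- dot products at or below this value are treated as zero
    """
    selected = []
    for label, normal in FACE_NORMALS.items():
        dot = sum(a * b for a, b in zip(direction, normal))
        if dot > tolerance:
            selected.append(label)
    return selected
```

Image data for faces not returned by such a test could then be
omitted from further processing.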
[0094] In further embodiments, referring to FIG. 13, client device
108 can be configured to subdivide each face of the two-dimensional
array into a plurality of subregions corresponding to those
discussed in connection with FIG. 14. Client device 108 can
therefore be configured to select a subset of subregions 1404 that
are impinged by the field of view and omit all other
subregions.
[0095] Following the selection of primary and secondary image data
for rendering, client device 108 is configured to transmit the
selected colour and depth data for rendering. For example, the CPU
of client device 108 can be configured to generate a plurality of
draw calls and transmit the draw calls to the GPU of client device
108, or to a GPU or other processor integrated with virtual reality
display 156. In response to receiving such data, the GPU (or any other
suitable processor connected to virtual reality display 156) is
configured to regenerate point cloud data from the selected colour
and depth data, and present the regenerated point cloud data at
virtual reality display 156.
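
The regeneration of a point from colour and depth data can be
sketched as follows for a single cube face. The geometry is an
assumption made for illustration: the face is taken to lie in the
plane z = 1 of a cube spanning -1 to 1 around the projection
viewpoint, and the stored depth is taken to be the Euclidean
distance from that viewpoint. In practice this work would be done
on the GPU; the function name reproject is hypothetical.

```python
from math import sqrt


def reproject(u, v, depth, face_size, viewpoint=(0.0, 0.0, 0.0)):
    """Recover a 3D position from pixel (u, v) on the +z face and a depth.

    u, v      -- pixel column and row on the face, starting at 0
    depth     -- distance of the point from the projection viewpoint
    face_size -- width and height of the face in pixels
    viewpoint -- position from which the projection was made
    """
    # Map the pixel centre to the range [-1, 1] on the plane z = 1.
    fx = 2.0 * (u + 0.5) / face_size - 1.0
    fy = 2.0 * (v + 0.5) / face_size - 1.0
    # Normalise the ray through the pixel, then scale it by the depth.
    length = sqrt(fx * fx + fy * fy + 1.0)
    return (
        viewpoint[0] + depth * fx / length,
        viewpoint[1] + depth * fy / length,
        viewpoint[2] + depth / length,
    )
```

The colour stored at the same pixel would then be attached to the
recovered position to give one point of the regenerated point
cloud.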
[0096] The data transmitted to the GPU or any other suitable
processing hardware at block 235 can include one or more indices
of points or geometries, according to the format of data received
from generation device 104. For example, when indices such as those
described in connection with FIG. 14 are received from generation
device 104, client device 108 is configured to submit different
types of draw calls based on the type of geometry listed in the
index. For example, as noted earlier the index received from
generation device 104 can include different point sizes. The data
sent to the GPU can therefore instruct the GPU to draw a large
point, which causes the GPU to render a larger point to fill in the
space between the non-adjacent points provided in the indices. In
addition, the colour of the large point can be selected based on
colour data for a plurality of points surrounding the center of the
large point.
[0097] It is contemplated that blocks 230 and 235 are generally
performed twice in parallel. Virtual reality display 156 generally
includes two distinct displays (corresponding to the eyes of the
operator), and thus block 230 includes receiving two viewpoint
positions, and block 235 includes selecting and rendering two
distinct sets of image data. Having rendered image data at block
235, client device 108 is configured to return to block 230 to
receive updated viewpoint positions. In some embodiments, the
viewpoint positions can be transmitted to generation device 104,
which can perform at least some of the selection activities
referred to above, and send the resulting image data to client
device 108.
[0098] Variations to the above systems and processes are
contemplated. For example, rather than capturing, processing and
rendering a scene (e.g. capture volume 304) in order to simulate
movement of the operator of virtual reality display 156 within the
scene, system 100 can also be configured to capture, process and
render an object in order to simulate movement of the operator of
virtual reality display 156 around the object. Capturing the object
to generate point cloud data can be accomplished substantially as
described above, however central nodes (e.g. node "x" in FIG. 4)
are generally omitted, as the object to be captured is generally in
that location.
[0099] In further variations, rendering computational performance
(e.g. at block 235) may be improved by reducing the resolution of
the rendered image data based on the proximity of the image data to
the center of the viewpoint frustum. For example, image data
determined by client device 108 to be near the outer edge of the
viewpoint frustum can be reduced in resolution. In an example
embodiment, the reduction in resolution can be achieved by
replacing a number of small points with a small number of large
points, prior to transmission of image data and geometric
parameters to the GPU for rendering. In implementations employing
the subdivisions shown in FIG. 13, client device 108 can determine
whether a subregion 1300 is a peripheral subregion (i.e. located at
the periphery of the viewpoint frustum), and for peripheral
subregions can reduce the resolution prior to data transmission to
the GPU.
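
The replacement of a number of small points with a smaller number
of large points can be sketched as follows. The sketch groups
points by square cells within a subregion and emits one large point
per occupied cell, with averaged colour and depth. Averaging the
depth, the tuple layout and the function name merge_points are
assumptions made for illustration only.

```python
def merge_points(points, block=2):
    """Merge small points that share a block x block cell into large points.

    points -- iterable of (x, y, depth, (r, g, b)) with integer x and y
    block  -- side length, in pixels, of each merged cell
    Returns a list of (centre_x, centre_y, depth, (r, g, b), size).
    """
    cells = {}
    for x, y, depth, colour in points:
        cells.setdefault((x // block, y // block), []).append((depth, colour))
    merged = []
    for (cell_x, cell_y), members in sorted(cells.items()):
        count = len(members)
        mean_depth = sum(d for d, _ in members) / count
        # Average each colour channel over the points being replaced.
        mean_colour = tuple(
            sum(c[i] for _, c in members) // count for i in range(3)
        )
        centre_x = cell_x * block + block / 2
        centre_y = cell_y * block + block / 2
        merged.append((centre_x, centre_y, mean_depth, mean_colour, block))
    return merged
```

Applied only to peripheral subregions, such a step would reduce the
number of points sent to the GPU while leaving the centre of the
viewpoint frustum at full resolution.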
[0100] It is also contemplated that the source of the image data
described herein can be supplemented or replaced with light field
capture data (e.g. obtained from one or more light field cameras in
capture apparatus 134). Light field data represents a collection of
light rays passing through a volume. Such data can indicate not
only position and colour data for a plurality of points, but also
properties such as the incident direction of light on the points
and the appearance of each point from a plurality of different
directions. In some embodiments, the light field data can omit
depth data. However, the depth data can be determined from the
light field data.
[0101] The scope of the claims should not be limited by the
embodiments set forth in the above examples, but should be given
the broadest interpretation consistent with the description as a
whole.
* * * * *