U.S. patent application number 14/072953 was filed with the patent office on 2013-11-06 and published on 2014-05-15 for a system and method of real time image playback.
This patent application is currently assigned to Sony Computer Entertainment Europe Limited. The applicant listed for this patent is Sony Computer Entertainment Europe Limited. The invention is credited to Simon Mark Benson, Ian Henry Bickerstaff and Sharwin Winesh Raghoebardayal.
Application Number | 14/072953
Publication Number | 20140132715
Document ID | /
Family ID | 47470365
Filed Date | 2013-11-06
Publication Date | 2014-05-15
United States Patent Application | 20140132715
Kind Code | A1
Raghoebardayal; Sharwin Winesh; et al.
May 15, 2014
SYSTEM AND METHOD OF REAL TIME IMAGE PLAYBACK
Abstract
A method of real-time video playback is provided. The method
includes receiving video image data, receiving supplementary data
relating to at least one step of a process of rendering a 3D model
of a scene depicted in a current frame of the video image data, and
obtaining texture information from the video image data. The method
also includes selecting at least a first viewpoint for rendering
the 3D model of the scene, and rendering the 3D model of the scene
depicted in the current frame of the video image data at the first
selected viewpoint using the obtained textures.
Inventors: Raghoebardayal; Sharwin Winesh (London, GB); Benson; Simon Mark (London, GB); Bickerstaff; Ian Henry (London, GB)
Applicant: Sony Computer Entertainment Europe Limited, London, GB
Assignee: Sony Computer Entertainment Europe Limited, London, GB
Family ID: 47470365
Appl. No.: 14/072953
Filed: November 6, 2013
Current U.S. Class: 348/43
Current CPC Class: A63F 13/40 (20140902); H04N 13/128 (20180501); H04N 2013/0081 (20130101); H04N 13/279 (20180501); G06T 17/00 (20130101); A63F 2300/695 (20130101); A63F 13/10 (20130101); G06T 19/006 (20130101); A63F 13/212 (20140902); A63F 2300/69 (20130101); G06T 15/04 (20130101); G06T 17/20 (20130101); H04N 13/271 (20180501); G06F 3/012 (20130101)
Class at Publication: 348/43
International Class: H04N 13/02 (20060101) H04N013/02
Foreign Application Data
Date | Code | Application Number
Nov 9, 2012 | GB | 1220219.8
Feb 25, 2013 | GB | 1303301.4
Claims
1. A method of real-time video playback, comprising the steps of:
receiving video image data; receiving supplementary data relating
to at least one step of a process of rendering a 3D model of a
scene depicted in a current frame of the video image data;
obtaining texture information from the video image data; selecting,
by one or more processors, at least a first viewpoint for rendering
the 3D model of the scene; and rendering, by the one or more
processors, the 3D model of the scene depicted in the current frame
of the video image data at the first selected viewpoint using the
obtained textures.
2. A method according to claim 1, in which the supplementary data
comprises stereoscopic depth information at a selection of points
in the current frame of the video image data, from which a mesh of
a 3D model can be constructed.
3. A method according to claim 1, in which the supplementary data
comprises 3D mesh information forming some or all of the 3D model
of the scene depicted in the current frame of the video image.
4. A method according to claim 1, in which the supplementary data
comprises information identifying edges in the 3D mesh to which a
transparency gradient may be applied.
5. A method according to claim 4, in which the supplementary data
comprises parametric approximations of the identified edges.
6. A method according to claim 1, in which the received video image
data comprises a stereoscopic image pair.
7. A method according to claim 1, in which the received video image
data comprises textures corresponding to a current frame of a
source video.
8. A method according to claim 1, in which the step of selecting at
least a first viewpoint comprises the steps of: tracking a position
of at least a first user's head with respect to a display; and
calculating the first viewpoint for rendering responsive to a
deviation of the user's head from a default viewpoint.
9. A non-transitory computer program product comprising computer
readable instructions that when implemented by a computer cause it
to perform a method comprising the steps of: receiving video image
data; receiving supplementary data relating to at least one step of
a process of rendering a 3D model of a scene depicted in a current
frame of the video image data; obtaining texture information from
the video image data; selecting at least a first viewpoint for
rendering the 3D model of the scene; and rendering the 3D model of
the scene depicted in the current frame of the video image data at
the first selected viewpoint using the obtained textures.
10. A playback device for real-time video playback, comprising:
first input means for receiving video image data; second input
means arranged for receiving supplementary data relating to at
least one step of a process of rendering a 3D model of a scene
depicted in a current frame of the video image data; texture
obtaining means arranged for obtaining texture information from the
video image data; viewpoint selection means arranged for selecting
at least a first viewpoint for rendering the 3D model of the scene;
rendering means arranged for rendering the 3D model of the scene
depicted in the current frame of the video image data at the first
selected viewpoint using the obtained textures; and output means
arranged for outputting each render for display.
11. A playback device according to claim 10, in which the first and
second input means are one or more selected from the list
consisting of: i. an internet connection for receiving streamed
data; and ii. a reading means for reading data from a
non-transitory physical recording medium.
12. A playback system comprising: the playback device of claim 10;
and a non-transitory physical recording medium, comprising in turn:
video image data comprising image information for a sequential
plurality of video frames; and supplementary data relating to at
least one step of a process of rendering respective 3D models of
respective scenes depicted in respective frames of the sequential
plurality of video frames.
13. A method of authoring a video for real-time video playback,
comprising: receiving stereoscopic video image data comprising a
sequential plurality of stereoscopic video frames; implementing at
least one step of a process of rendering respective 3D models of
respective scenes depicted in respective frames of the sequential
plurality of stereoscopic video frames, using one selected from the
list consisting of: i. generating a disparity map of a current
stereoscopic video frame, and selecting sample points from one or
both of the images in the stereoscopic video frame and associating
with those points depth information derived from the generated
disparity map; and ii. generating a mesh for a 3D model using
selected sample points and associated depth information; and
outputting video image data comprising image information for a
sequential plurality of video frames and supplementary data
generated by the at least one implemented step of a process of
rendering respective 3D models of respective scenes depicted in
respective frames of the sequential plurality of video frames, in
which the outputted supplementary data comprises the selected
sample points and associated depth information, or the mesh for the
3D model, according to the at least one implemented step of the
process of rendering a respective 3D model.
Description
BACKGROUND OF THE INVENTION
[0001] 1. Field of the Invention
[0002] The present invention relates to a system and method of
real-time image playback.
[0003] 2. Description of the Prior Art
[0004] The "background" description provided herein is for the
purpose of generally presenting the context of the disclosure. Work
of the presently named inventors, to the extent it is described in
this background section, as well as aspects of the description
which may not otherwise qualify as prior art at the time of filing,
are neither expressly nor impliedly admitted as prior art against
the present invention.
[0005] Many videogame consoles and media players are capable of
processing stereoscopic images for use with 3D televisions. These
images may variously come from video games or from data
representing a stereoscopic still or video image.
[0006] Preferably, the additional information inherent in a
stereoscopic image can be used to enhance the viewing experience of
a user of such consoles and players when using 3D televisions or
even when using a conventional 2D television to display a version
of the image.
[0007] The present invention seeks to provide such an
enhancement.
SUMMARY OF THE INVENTION
[0008] In a first aspect, a method of real-time video playback is
provided in accordance with claim 1.
[0009] In another aspect, a playback device for real-time video
playback is provided in accordance with claim 10.
[0010] In another aspect, a playback system is provided in
accordance with claim 12.
[0011] In another aspect, a method of authoring a video for
real-time video playback is provided in accordance with claim
13.
[0012] Further respective aspects and features of the invention are
defined in the appended claims.
[0013] It is to be understood that both the foregoing general
description of the invention and the following detailed description
are exemplary, but are not restrictive, of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] A more complete appreciation of the disclosure and many of
the attendant advantages thereof will be readily obtained as the
same becomes better understood by reference to the following
detailed description when considered in connection with the
accompanying drawings, wherein:
[0015] FIG. 1 is a schematic diagram of a stereoscopic pair of
images.
[0016] FIG. 2 is a schematic plan view of a portion of a mesh
generated from the stereoscopic pair of images.
[0017] FIGS. 3A to 3C are schematic plan views of a sequence of
meshes generated from the stereoscopic pair of images in accordance
with an embodiment of the present invention.
[0018] FIGS. 4A to 4C are schematic plan views of a sequence of
meshes generated from the stereoscopic pair of images in accordance
with an embodiment of the present invention.
[0019] FIG. 5A is a schematic diagram of a stereoscopic pair of
images, indicating colour samples.
[0020] FIG. 5B is a schematic diagram of a texture to be
interpolated in accordance with an embodiment of the present
invention.
[0021] FIG. 6 is a schematic diagram of an extrapolation of
surfaces in a model generated from the stereoscopic pair of images,
in accordance with an embodiment of the present invention.
[0022] FIG. 7 is a schematic diagram of an entertainment device in
accordance with an embodiment of the present invention.
[0023] FIG. 8 is a schematic diagram of a polygon mesh at the edge
of an object, in accordance with an embodiment of the present
invention.
[0024] FIG. 9 is a flow diagram of a method of real time image
playback in accordance with an embodiment of the present
invention.
[0025] FIG. 10 is a flow diagram of a method of authoring images
for real time playback in accordance with an embodiment of the
present invention.
DESCRIPTION OF THE EMBODIMENTS
[0026] A system and method of real-time image playback are
disclosed. In the following description, a number of specific
details are presented in order to provide a thorough understanding
of the embodiments of the present invention. It will be apparent,
however, to a person skilled in the art that these specific details
need not be employed to practice the present invention. Conversely,
specific details known to the person skilled in the art are omitted
for the purposes of clarity where appropriate.
[0027] Referring now to FIG. 1, this shows an example stereoscopic
pair of images such as may be captured by a 3D video camera
attached to a console. In the left and right images (denoted L and R respectively in the figure), a child is performing actions in their living
room in response to a video game, and in each image a different
viewpoint on the scene is captured.
[0028] However, it will be appreciated that a small area of the
room behind the child is not seen in either image, and similarly
there are sections of the room behind the chair that are obscured.
In order to potentially digitally recreate the room (for example to
insert monsters to battle, or to rotate the room on screen to
reveal treasure, or to apparently bounce virtual objects on walls
and/or furniture in the room, or to appropriately calculate the
effects of a virtual light source on the captured video), it would
be desirable to fill in the missing areas within a digital model of
the room.
[0029] Hence in an embodiment of the present invention, as a
preparatory step the left and right images can be rectified to line
up vertically.
[0030] Next, a disparity map is generated, using one of several
known techniques. A disparity map indicates the horizontal
disparity between corresponding pixels in each image. Most
techniques rely on some form of localised cross-correlation between
regions of the two images, but any suitable technique may be
used.
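By way of a non-limiting illustration, such a block-matching disparity map might be generated with OpenCV in Python as follows; the file names and matcher parameters are assumptions for illustration only, not details taken from the application:

    import cv2

    # Rectified left and right views in greyscale; block matching assumes
    # the pair already lines up vertically, per the preparatory step above.
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # StereoBM performs the localised cross-correlation between image
    # regions that most techniques rely on. Parameter values are
    # illustrative: numDisparities must be a multiple of 16 and
    # blockSize must be odd.
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)

    # compute() returns fixed-point disparities scaled by 16.
    disparity = matcher.compute(left, right).astype(float) / 16.0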
[0031] The disparity map is an indirect indicator of distance
between the 3D video camera and a surface depicted in the image.
For a pair of parallel aligned video cameras in a 3D video camera,
it will be appreciated that the parallel lines converge at infinity
and so at that distance there would be no disparity. Meanwhile an
object very close to the cameras would show significant horizontal
disparity. Hence the degree of disparity varies inversely with the distance of the pixel from the camera.
[0032] Meanwhile, a small object very close to the cameras may in
fact not properly appear in both images, and so the stereo
disparity also effectively imposes an operational near-distance
limit on the stereoscopic effect.
[0033] However, for objects within the operational region of the
device, the disparity between these objects in the two images can
be related to their relative depth from the camera.
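Although the application does not give the relation explicitly, for a parallel-aligned camera pair the usual pinhole-camera formula links the two quantities: depth = focal length × baseline / disparity. A minimal sketch, in which focal_px and baseline_m are assumed camera properties:

    import numpy as np

    def disparity_to_depth(disparity, focal_px, baseline_m):
        # depth = f * B / d for parallel cameras; zero disparity
        # (points at infinity) maps to an infinite depth.
        with np.errstate(divide="ignore"):
            return np.where(disparity > 0,
                            focal_px * baseline_m / disparity,
                            np.inf)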
[0034] To generate a digital model of the scene one may calculate
the depth information or `z` value at each x, y point in the
disparity map to create a notional point-cloud of (x,y) positions
with associated `z` value data, and then define a mesh describing
the room by, for example, Delaunay triangulation of the calculated
(x,y) points or a subsample thereof. This mesh can then be
projected into 3D by adding the associated `z` value to the
mesh.
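A minimal sketch of this point-cloud-to-mesh step, using SciPy's Delaunay triangulation; the random samples here are placeholders standing in for real points taken from the disparity map:

    import numpy as np
    from scipy.spatial import Delaunay

    # (x, y) sample points with an associated z (depth) value per point.
    xy = np.random.rand(500, 2) * [640.0, 480.0]
    z = np.random.rand(500)

    # Triangulate in 2D, then project the mesh into 3D by attaching the
    # associated z value to each vertex, as described above.
    tri = Delaunay(xy)
    vertices_3d = np.column_stack([xy, z])
    faces = tri.simplices  # (n_triangles, 3) indices into vertices_3d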
[0035] Optionally, the disparity map can be pre-processed to
improve the fidelity of the mesh. Firstly, disparity data for
successive video frames can be stored, and disparity values that
are inconsistent between frames can be replaced. For example, if a
patch of a wall appears to have a different disparity in only one
frame due to an autocorrelation error (for example because a shadow
on the wall resembles a different feature of the other image in the
stereo pair) then this can be identified and corrected using
disparity values from one or more previous maps.
[0036] Similarly optionally, inconsistencies in disparity may be
isolated by using different block sizes (e.g. windows for
autocorrelation detection) to derive disparity maps and identifying
inconsistencies between these versions of the map to produce a map
with higher confidence disparity values.
[0037] Similarly optionally, an edge detection algorithm can be
used to cross-validate where disparity values should be expected to
change in the images.
[0038] Similarly optionally, a point-by-point disparity check can
be implemented, for example by using a 3×3 pixel test window on the disparity map, and calculating whether the central pixel disparity differs by more than a predetermined amount; if so, that pixel disparity is replaced, for example with the average of the disparity of the other eight pixels in the test window (or some other local disparity value).
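One possible reading of this point-by-point check, sketched in Python; the test window and eight-neighbour average follow the description, while the threshold value is left to the caller:

    import numpy as np

    def filter_disparity(disp, threshold):
        # Replace any pixel whose disparity differs from the average of
        # its eight neighbours in a 3x3 window by more than `threshold`.
        out = disp.copy()
        for y in range(1, disp.shape[0] - 1):
            for x in range(1, disp.shape[1] - 1):
                window = disp[y - 1:y + 2, x - 1:x + 2]
                neighbour_avg = (window.sum() - disp[y, x]) / 8.0
                if abs(disp[y, x] - neighbour_avg) > threshold:
                    out[y, x] = neighbour_avg
        return out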
[0039] Other optional refinements to the process of making the
initial mesh relate to the selection of (x,y) points in the image
to use for triangulation.
[0040] To provide a sampling of points of reasonable density in the
image, optionally at least one point is sampled in each P×Q block of pixels, where P and Q are predetermined dimensions (for example, an 8×8 block of pixels), and is stored with an associated disparity value. The point may be selected from either image of the stereo pair or alternatively from a processed version of one image (or from both if combined). Optionally, more points are
sampled within a block where there is an apparent edge in either
colour or disparity values, in order to make the resulting mesh
more faithfully track the structural elements of the scene likely
to correspond with such colour and/or disparity edges. The edge
itself may be determined first to satisfy a consistency criterion,
for example having a predetermined minimum length, and/or gradient
of change in colour or disparity.
[0041] Thus optional filters have been provided to remove
inconsistencies in the disparity map, and to select salient (x,y)
points for triangulation (for example Delaunay triangulation) to
create a 2D mesh with associated disparity or depth values.
[0042] This 2D mesh can then be easily projected into 3D by giving
the vertices of the 2D mesh the depth values associated with the
points.
[0043] It will be appreciated that in principle the (x,y) points
and z values can be used to generate a 3D mesh in one step.
However, by optionally having a 2D mesh and exactly corresponding
3D mesh, it is simple to cross reference the 3D mesh with the 2D
mesh to calculate the distance between pixels in the image space
that the 3D model will replicate.
[0044] As will be noted below, polygons comprising small
differences in (x,y) distances but proportionately large
differences in z distance are indicative of meshing errors and can
be removed, as explained later herein.
[0045] Returning now to FIG. 1 and referring also to FIG. 2, using the line A-A in FIG. 1 as an example, FIG. 2 illustrates a plan view of a slice through a mesh at a corresponding line in the digital model. FIG. 2 is shown aligned with one of the images
from FIG. 1 for ease of understanding. It can be seen that the
depth of the mesh on the left side is effectively infinite (or at a
maximum depth), corresponding to the doorway out of the room. The
mesh then generally maps along the wall. However, there is a clear
error where the images show the child. As noted above, the problem
is that a simple triangulation of the points in the disparity map
can create a mesh that incorrectly treats isolated near-field
objects as solid projections from the background. Hence in FIG. 2,
the (x,y,z) points corresponding to the child's head are
interpreted as a projection forwards from the adjacent (x,y,z)
points corresponding to the wall of the room. This is clearly
wrong.
[0046] To address this, in an embodiment of the present invention,
the generation of such a mesh is performed in a plurality of N
stages or layers. These layers are defined as follows.
[0047] The minimum disparity in the image, corresponding to the
furthest distance, is denoted dispMin.
[0048] The maximum disparity (or the maximum valid disparity, if a
cut-off is being applied) is denoted dispMax.
[0049] Then, dispPerLayer=(dispMax-dispMin)/N.
[0050] dispPerLayer defines a disparity range for successive
analysis layers of the disparity map. Hence a first layer encompasses a start point dispMin to an end point (dispMin+dispPerLayer)-1, and a second layer encompasses a start point (dispMin+dispPerLayer) to an end point (dispMin+(2×dispPerLayer))-1, and so on. In this embodiment,
the layers simply adjoin and do not overlap, or only overlap in the
sense of starting or terminating at the same depth as the adjacent
layer. Both interpretations are treated as `non-overlapping`
herein.
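The layer boundaries just defined can be expressed compactly; a sketch, with the example values chosen here purely for illustration:

    def layer_ranges(disp_min, disp_max, n_layers):
        # Split [dispMin, dispMax] into N adjoining, non-overlapping
        # ranges, each dispPerLayer wide, as defined above.
        per_layer = (disp_max - disp_min) / n_layers
        return [(disp_min + i * per_layer,
                 disp_min + (i + 1) * per_layer - 1)
                for i in range(n_layers)]

    # Example: dispMin=0, dispMax=64, N=4 gives
    # [(0.0, 15.0), (16.0, 31.0), (32.0, 47.0), (48.0, 63.0)]
    print(layer_ranges(0, 64, 4))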
[0051] It will be appreciated that since there is typically a
non-linear relationship between disparity and physical distance,
then similarly the calculated distance may be divided equally by N,
and the corresponding disparity ranges identified for each of the
resulting N layers.
[0052] In either case however, it will be understood that each
successive layer represents a slice of the disparity map in the
z-axis having thickness dispPerLayer, progressing from the most
distant elements in the map forwards.
[0053] Referring now to FIGS. 3A to 3C, these illustrate a mesh
generation process with (as a non-limiting example) 4 such layers,
labelled 0 to 3 in FIG. 3A.
[0054] Starting with layer 0, only the disparity or depth values
within the range of this layer are considered. For processing
efficiency, this may be achieved by copying only the points of the
disparity map within this range to a temporary disparity map, which
is then subject to a 2D/3D meshing process such as the Delaunay
triangulation process referred to above. In this case the remaining
points in the temporary disparity map are treated as invalid or
empty points as appropriate. It will be appreciated that any of the
optional filtering processes previously described can be applied to
the points of the image as a whole, or on a layer-by-layer basis,
as appropriate.
[0055] Hence in layer 0, only the depth information corresponding
to the doorway in the scene of FIG. 1 is present. A mesh based on
these actual (x,y,z) points (shown with a solid line in FIG. 3B) is
created for layer 0.
[0056] Next, for layer 1, a mesh based on the actual (x,y,z) points
is shown with a solid line in FIG. 3B. Notably, due to the layering
process described above, the mesh for this layer is generated as if
the child was not in the room at all. Consequently the region of
(x,y,z) points missing due to their occlusion by the child in the
captured stereo image are interpolated in a manner consistent with
the actual (x,y,z) points in this layer, and may be treated
automatically by a Delaunay algorithm as a region of the point
cloud with sparse samples. The interpolated section of the mesh is
shown in FIG. 3B with dotted lines.
[0057] In this example layer 2 does not encompass any disparity
values.
[0058] For layer 3, again a mesh based on the actual (x,y,z) points
in this layer is generated. In this case, these correspond to the
foreground object, which is the child.
[0059] The resulting meshes are then merged to form a single
composite digital model of the scene.
[0060] Several optional rules may be implemented at this point to
provide a good overall result, including one or more selected from
the list consisting of:
[0061] Firstly, where the meshes of two layers terminate but have
terminal x, y and z positions within a threshold distance of each
other, then these meshes may be joined. Optionally for layers 0 and
1 the restriction on the z position may be relaxed, since layer 0
may reach to infinity. Hence for example the mesh of layer 0 may
still be joined to the mesh of layer 1, as shown by the dotted line
in FIG. 3C, because they have adjoining x, y values.
[0062] Secondly, where two meshes overlap, duplicate polygons at
the same positions (or within a predetermined tolerance) are
deleted.
[0063] Thirdly, as noted above, where a polygon in a mesh covers a small distance in the x,y plane but a large distance along the z axis (as defined by predetermined absolute or relative thresholds), that polygon may be deleted (see the sketch following this list). Put another way, polygons in a layer mesh lying within a predetermined angle of the normal to the image plane, or similarly, close to parallel to the line of sight of the camera, may be removed.
[0064] Fourthly, where the meshes of two layers occupy similar x, y
positions but not similar z positions as defined by a predetermined
threshold, then it can be assumed that the meshes represent
discrete objects, as in the child of layer 3 and the wall of layer
1 in the present example. In this case, the foreground mesh may
optionally be closed (represented by the dotted line on the mesh
corresponding to the child in FIG. 3C).
[0065] In a similar manner to the optional point selection
described previously, optionally other discriminators may be used
to improve foreground and background segmentation of this kind,
including but not limited to colour segmentation. For example, if a
first colour is associated with the background polygons, but not
with foreground polygons (and/or vice versa), then for (x,y)
positions close to the edge of the foreground object, the
associated colours can be used to refine the meshes to more closely
segregate the foreground object.
[0066] Finally, during creation of the mesh at each layer,
optionally a rule may be implemented to suppress interpolation of
the mesh for points more than a predetermined distance apart, where
the distance is a function of the layer number. Optionally this
rule may only be enacted after a predetermined proportion of layers
have been meshed, such as 50% or 75%. The purpose of this rule is
to prevent or reduce erroneous interpolation of a mesh between two
people standing in the same foreground layer.
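By way of illustration of the third rule above, a compact geometric test might look as follows; the mesh layout and the max_ratio threshold are assumptions made for the sketch:

    import numpy as np

    def drop_steep_polygons(vertices, faces, max_ratio=10.0):
        # Discard triangles whose z extent is large relative to their
        # x,y extent; such polygons typically join a foreground object
        # to the background and indicate meshing errors.
        kept = []
        for face in faces:
            tri = vertices[face]  # (3, 3) array: one x, y, z row per vertex
            xy_span = np.linalg.norm(tri[:, :2].max(axis=0)
                                     - tri[:, :2].min(axis=0))
            z_span = tri[:, 2].max() - tri[:, 2].min()
            if z_span <= max_ratio * max(xy_span, 1e-9):
                kept.append(face)
        return np.array(kept)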
[0067] It will be appreciated that typically the object(s) causing
the most relevant occlusions will be the one or more people
interacting with the console. Consequently for example the console
may use face recognition to identify a plurality of users in the
images and their corresponding depth positions in the disparity
map, and select N or modify the layer ranges to ensure that they
are meshed in a separate layer from the background and preferably
also from each other. More generally, the console may select a
value of N responsive to the maximum distance or minimum disparity
value so that each layer is of a thickness (or has a point
population) sufficient to build a reasonable mesh. In general, the
higher the value of N (i.e. the more layers used), the better the
end result.
[0068] Where two people are in the same layer, recognition that
they are people can also be used to constrain mesh generation,
treating them as a special case and possibly using different mesh
generation rules based upon for example skeletal modelling.
Interpolation between identified people can also therefore be
suppressed in this way.
[0069] It will be appreciated that the layers in the embodiment
described above are non-overlapping. However, referring now to
FIGS. 4A to 4C, in an alternative embodiment the layers are defined as follows: the first layer 0' encompasses start point dispMin to end point (dispMin+dispPerLayer), the second layer 1' encompasses start point dispMin to end point (dispMin+(2×dispPerLayer)), and the third layer 2' encompasses start point dispMin to end point (dispMin+(3×dispPerLayer)), and so on. That is to say, the
layers overlap, and starting at the furthest distance they get
progressively deeper to encompass more of the disparity map each
time. In the example above where N=4, then the final layer 3'
encompassing start point dispMin to end point (dispMin+(4×dispPerLayer)) includes all the points in the
disparity map, like the conventional mesh described previously and
illustrated in FIG. 2. The individual meshes can follow similar
rules to those described in the previous embodiment, such as
suppressing interpolation for high disparity points, refining
meshes using colour information, and/or limiting interpolation (or
using different meshing techniques) for identified people in the
images. They can also use the above described optional filters and foreground separation strategies.
[0070] FIG. 4A illustrates the mesh generated for layer 1'. FIG. 4B
illustrates the mesh generated for layer 3'.
[0071] As in the previous embodiment, the meshes are merged
successively. Hence the mesh of layer 1' is merged with the mesh of
layer 0' to generate a first merged mesh. Then the mesh of layer 2'
is merged with the first merged mesh to generate a second merged
mesh. Then the mesh of layer 3' is merged with the second merged
mesh to generate a third merged mesh. This process can be
implemented as new layer meshes are generated, or once all layer
meshes have been generated.
[0072] Again, during the merging process duplicate polygons from
different meshes that substantially overlap are deleted, preferably
preserving the polygon generated in the mesh of the thinner
(earlier) layer. Again, where a polygon in a mesh covers a small
distance in the x,y plane, but a large distance on the z-axis (as
defined by predetermined thresholds) then that polygon is deleted,
in other words where the polygon is, within a predetermined
tolerance, on the z-plane, or parallel to the line of sight of the
cameras, or substantially normal to the image plane, then it is
deleted. This latter step for example effectively removes the
connection between foreground objects and background objects in the
meshes of the thicker layers.
[0073] FIG. 4C illustrates the merged meshes in the present
example. Here, the left-most section of the mesh corresponds to the
mesh generated for layer 0', which was overlapped by each
successive mesh and so the duplicate polygons were deleted. The
section of the mesh corresponding to the wall was generated for
layer 1', with the interpolated section of the mesh for the wall
shown as a dotted line. The duplicate polygons for the wall also
generated for layers 2' and 3' would have been deleted. Finally,
the mesh for the child was generated for layer 3'. It will be
appreciated that, as noted previously, the mesh for the child does
not overlap that of the wall; whilst it has similar x,y
co-ordinates to a section of the wall, it has different z
co-ordinates and hence does not overlap in 3 dimensions. Meanwhile
the polygons that were nearly normal to the image plane (having a
small x-y distance and a large z distance) have been deleted,
separating the child from the wall. As in the previous embodiment,
optionally the mesh corresponding to the child has been closed,
denoted by the dotted line on the part of the mesh corresponding to
the child.
[0074] Hence the present invention may operate using a series of
either overlapping or non-overlapping layers, successively moving
forward along the z axis. The overall resulting 3D model is similar
using either embodiment. For non-overlapping layers, logic relating
to linking meshes for surfaces that pass through the layer
interfaces may have more significance, whilst for overlapping
layers, logic relating to identifying and deleting duplicate
polygons may have more significance.
[0075] For the meshes from either embodiment, finally an optional
mesh filter may be employed as follows. In a first step the
entertainment device compares neighbouring polygons to determine if
they are substantially on the same plane. For example if 3 polygons
sharing a vertex point lie within a predetermined angle of each
other (for example ±1, 2, 4 or 6 degrees, depending on designer
choice) then these polygons can be modified to lie on a plane
derived from the average of each of the polygon's individual
planes. Optionally several passes through the mesh may be performed
in this manner to homogenise the planar orientation of polygons
that are initially only roughly co-planar.
[0076] The purpose of this filtration is to make the surface
smoother and also to make the local normals on the surface more
consistent and closer to that expected by the user, so that light
and/or virtual objects can be made to bounce off that surface in a
more realistic and expected manner.
[0077] Alternatively or in addition, patch based plane detection
(or RANSAC or another plane detection algorithm) is applied to a
relatively large set of vertices (for example, vertices
corresponding to a region of colour in the corresponding image) and
calculates the overall plane. These vertices are then updated to
lie on the plane, thereby removing any bumps in the majority of
that surface.
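As an illustrative alternative to RANSAC when outliers are few, a least-squares plane can be fitted directly; this sketch assumes the vertices are held in an (n, 3) NumPy array:

    import numpy as np

    def fit_plane(points):
        # The singular vector with the smallest singular value is the
        # direction of least variance, i.e. the plane normal.
        centroid = points.mean(axis=0)
        _, _, vt = np.linalg.svd(points - centroid)
        return centroid, vt[-1]

    def snap_to_plane(points, centroid, normal):
        # Project each vertex onto the fitted plane, removing bumps in
        # the majority of the surface as described above.
        offsets = (points - centroid) @ normal
        return points - np.outer(offsets, normal)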
[0078] Turning now to FIGS. 5A and 5B, in addition to the
generation of the mesh for the digital model of the scene, in
embodiments of the present invention it is also desirable to
generate textures to apply to the mesh.
[0079] It will be appreciated that for regions of the mesh
corresponding to visible elements of one or both of the stereo
images, the texture can be derived from one or both images.
[0080] An efficient way to do this involves treating the generated 3D model as being flat (i.e. ignoring depth values) such that it functions as a mosaic joining together the selected points from the image. The textures of the image for each visible polygon then
correspond to the pixels within the corresponding piece of
mosaic.
[0081] However, it is also desirable to generate textures for those
parts of the mesh occluded from view in the original images, so
that these parts of the model are visible if the viewpoint is
modified by the user.
[0082] Referring to FIG. 5A, by way of example, the circled points
in the figure show different sections of a carpet or rug. In a
colour rendition of the image, the point 1001L is a salmon pink,
whilst 1001R is a beige and green mix. However, the interface
between these two sections of the rug is obscured by the child in
both images.
[0083] Consequently, texture interpolation between two points 1001L
and 1001R may be optionally performed for the corresponding section
of the mesh model as follows.
[0084] In FIG. 5B, the two pixel positions 1001L and 1001R have
colour values labelled `A` and `1` respectively, denoting the
arbitrary colour values at those positions in the current images.
In the texture to be applied to the mesh, three intervening pixels
1002, 1003, 1004 are undefined.
[0085] To interpolate the colour values of these pixels, in an
embodiment of the present invention the colour values `A` and `1`
corresponding to positions 1001L and 1001R are not used.
[0086] Instead, colour values of neighbouring pixels positioned
away from the undefined pixels are used.
[0087] This is because in the image, the missing pixels are
obscured by an unrelated foreground object (the child) and for the
pixels immediately adjacent to this object in the images there is a
significant risk that the pixel colour at positions 1001L and 1001R
is in fact already a combination of the colour of the foreground
and background objects, due to the per-pixel colour sampling in the
CCDs of the video camera source. Rather than propagate this tainted
colour across the undefined pixels, it is assumed that neighbouring
pixels further from the foreground object may be more
representative of the true background colour.
[0088] Hence in an embodiment of the present invention, the three interpolated pixels may therefore take the following values:
[0089] 1002: 75% `B`, 25% `2`
[0090] 1003: 50% `B`, 50% `2`
[0091] 1004: 25% `B`, 75% `2`.
[0092] This provides a uniform transition between the colours `B`
and `2` sampled one pixel adjacent to positions 1001L and R.
[0093] Alternatively, successively distant neighbouring pixels may
be used. The purpose of this is to preserve the existing
variability of the texture as well as to blend the colours. In a
transient image, this will make the interpolation less obvious as
the spatial frequencies in the interpolated section will now be
similar to those in the surrounding texture.
[0094] Hence in this embodiment, the three interpolated pixels may take the following values:
[0095] 1002: 75% `B`, 25% `4`
[0096] 1003: 50% `C`, 50% `3`
[0097] 1004: 25% `D`, 75% `2`.
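The first of these two schemes reduces to a linear blend between the colours one pixel further out from the occluder; a minimal sketch, with colours as RGB tuples and the gap length assumed:

    def fill_gap(colour_b, colour_2, gap_len=3):
        # Blend across an occluded run using the neighbours one pixel
        # further out ('B' and '2' above) rather than the possibly
        # colour-contaminated edge pixels 'A' and '1'.
        filled = []
        for i in range(1, gap_len + 1):
            w = i / (gap_len + 1)  # 0.25, 0.5, 0.75 for a 3-pixel gap
            filled.append(tuple((1 - w) * b + w * t
                                for b, t in zip(colour_b, colour_2)))
        return filled

    # Reproduces the 75%/25%, 50%/50%, 25%/75% mix given above.
    print(fill_gap((200, 120, 110), (190, 180, 120)))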
[0098] The polygon mesh and the texture(s) may then be rendered and
displayed on screen. For the same viewpoint as the original camera,
the resulting render is likely to look nearly identical to the
original image, as only mesh based on actual (x,y,z) points and
texture from visible image data will be used. However, as the
virtual viewpoint is moved, for example as part of a game play
mechanic, or in response to head tracking of a viewer, then
elements of the scene that have been interpolated become
visible.
[0099] Thus more generally, such texture gaps are filled in with
local texture data on a scan-line basis, with the texture on either
side of the gap being mirrored into the gap.
[0100] Optionally where the still image or video was taken using a
camera equipped with a suitable accelerometer and/or gyroscope or
set of accelerometers and/or gyroscopes, then the angle of the
photo with respect to horizontal can be obtained, and this can be
used to adjust the effective scan line used in the gap filling
process. Hence for example if the gap to be filled was 50 pixels
long, and accelerometer data suggested that the camera was at an
angle of 3 degrees to the horizontal, then the gap filling
algorithm may approximate a scan line spanning approximately
50.times.Sin(3) lines of pixels in the captured image. If not
accelerometer data exists, then optionally an effective scanline
may be chosen to run parallel to a nearby edge in the image that is
close to horizontal.
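As a worked instance of the arithmetic above (the 50-pixel gap and 3-degree tilt are the example figures from the text):

    import math

    gap_px = 50
    tilt_deg = 3.0
    # The effective scan line spans roughly 50 x sin(3 deg) ~ 2.6 rows
    # of pixels in the captured image.
    rows_spanned = gap_px * math.sin(math.radians(tilt_deg))
    print(rows_spanned)  # ~2.62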
[0101] In an embodiment of the present invention, the mesh and
textures (i.e. the 3D model) generated as described above or by
another suitable method is persistent, and retained from frame to
frame of the captured video.
[0102] In this 3D model, background objects can be measured or
assumed to be stationary; for example an object that (for its
distance) is a threshold amount P larger than a person, where P is a predetermined proportion such as 1.5 or 2, and/or which has a flat
surface, and/or does not move over Q successive video frames, where
Q is a predetermined number such as 30 or 90, can be assumed to be
part of the background and assumed to be stationary.
[0103] It will be appreciated that if a background object is
partially occluded by a person, then when that person moves, the
portion of the background object that is revealed can be added to
the model, both in terms of confirmed mesh geometry and confirmed
texture.
[0104] Confirmed mesh and texture values can then be used to
improve the interpolation of the model behind where the user currently stands as they move around.
[0105] Where foreground objects are static (for example a desk) and
obscure a background object (for example a wall or carpet) then the
model can extrapolate the wall/floor surfaces and associated
textures.
[0106] Notably, whilst (assuming a fixed camera position) these
extrapolated surfaces may never be seen directly they can affect
the result of placing a virtual light source in the model, or may
be used to constrain or interact with virtual objects such as pets,
or bouncing balls.
[0107] Referring to FIG. 6, for example, the chair in the room
permanently obscures parts of the wall, the doorway, the floor and
the rug. The meshes defining these surfaces behind the chair can be
extrapolated until they meet, and the colour components of the
surfaces can be similarly extrapolated, either with uniform colours
(1012, 1014) or using colour combinations or repeated textures
(1016), for example in a similar manner to that described with
reference to FIG. 5B. In this case, the wall is blue, the floor is
a khaki colour and the rug is a mix of beige and terracotta.
[0108] It will be appreciated therefore that if a virtual white
light source was positioned in the 3D model between the chair and
the wall, whilst the light source itself would be obscured by the
chair, the reflected light would (in this example) have a
blue/green tint. This light would affect the colour of the other
objects in the 3D model if the model was rendered.
[0109] The model of the chair may also cast a shadow from the
virtual light that plays over part of the model of the user.
[0110] Consequently, the model of the scene can be realistically
lit using virtual light sources.
[0111] In a similar way, a ball whose trajectory took it behind the
chair would bounce off the unseen floor and/or wall in a realistic
manner and re-emerge in a direction intuitively expected by the
user.
[0112] In an embodiment of the present invention, the rendered
model is displayed instead of augmenting the original stereo video
or stereo photo. This is particularly the case when the user
changes the desired viewpoint of the image from that of the
original image.
[0113] However, in an embodiment of the present invention, the
original stereo video or stereo photo is augmented using the
rendered model as follows. When an augmentation of the original
image comprises the addition of a virtual light source, this light
source is added or applied to the 3D model as described above. The
model is then rendered (but not displayed) with this light source
at the same viewpoint as the video camera, to calculate how the
light source and its reflections, shadows etc. modify the rendered
textures. These modifications to the rendered textures (i.e. the
colour difference with and without the light source) thereby
generate a red/green/blue colour change map of the effect of the
virtual light source on the scene.
[0114] These red/green/blue colour changes can then be applied to
the original captured video. In this way, the effects of the
virtual light on the virtual model of the scene can be applied to
the real video of the scene for the corresponding video frame, thus
seeming to apply a virtual light source to the original video. For
3D video, the rendering, colour change mapping and augmentation can
be done for each of the left and right viewpoints.
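A minimal sketch of this colour change mapping, assuming the two renders and the captured frame are float RGB arrays in [0, 1]:

    import numpy as np

    def relight_frame(original, render_unlit, render_lit):
        # Per-pixel RGB difference between the model rendered with and
        # without the virtual light, applied to the real footage.
        change_map = render_lit - render_unlit
        return np.clip(original + change_map, 0.0, 1.0)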
[0115] It will be appreciated therefore that as appropriate the
above described techniques enable a variety of applications.
[0116] In an embodiment of the present invention, a virtual light source (or a virtual object comprising a light source) may be made
to apparently move within a stereoscopic photo or video, and cast
plausible shadows of objects in the scene onto other objects. The
colour of the light source can be seen to affect the scene, and
colours in the scene may affect how reflected light affects other
elements of the scene.
[0117] This may be implemented on a render of the model of the
scene, or the effects of the virtual light on the model may be
transposed to the original photo or video frame to augment it.
[0118] Alternatively or in addition, virtual objects can interact
with the model of the scene. This may take the form of the model
acting as a bounding box for virtual objects and characters, and/or
the surfaces of the model providing surfaces for physics-based interactions, such as bouncing a ball against a wall, or dropping a
ball onto a table and having it bounce off and onto the floor.
Where an element of the scene is mobile (i.e. the user) then motion
data can be accumulated and used in such physics based
interactions, for example giving or adding a new velocity to a ball
(i.e. hitting it in a new direction).
[0119] Again, such interactions may be implemented on a render of
the model of the scene, or the virtual objects, as computed to
interact with the model of the scene, may be rendered appropriately
and then used to augment the original photo or video frame.
[0120] Alternatively or in addition, head tracking of a user may be
employed to detect their current viewpoint with respect to the
displayed image. If this viewpoint is different from that of the camera that captured the image (or differs by a threshold amount),
then the rendered model of the image is displayed from the user's
detected viewpoint. The subjective effect is therefore that the
user can move their head left, right, up or down and apparently see
the picture be recomposed as if it were a real 3D object on the
other side of the display screen.
[0121] Suitable devices for carrying out the techniques and variants described herein under suitable software instruction include, but are not limited to, the Sony® PlayStation 3® and PS Vita®. Other examples include set-top television boxes for terrestrial, satellite and/or cable broadcast TV, set-top boxes for IPTV, PCs and other media consumption devices with suitable processing power, and Blu-Ray® players.
[0122] By way of example, FIG. 7 schematically illustrates the overall system architecture of the Sony® Playstation 3® entertainment device. A system unit 10 is provided, with various peripheral devices connectable to the system unit.
[0123] The system unit 10 comprises: a Cell processor 100; a Rambus® dynamic random access memory (XDRAM) unit 500; a Reality Synthesiser graphics unit 200 with a dedicated video random access memory (VRAM) unit 250; and an I/O bridge 700.
[0124] The system unit 10 also comprises a Blu Ray® Disk BD-ROM® optical disk reader 430 for reading from a disk 440 and a removable slot-in hard disk drive (HDD) 400, accessible through the I/O bridge 700. Optionally the system unit also comprises a memory card reader 450 for reading compact flash memory cards, Memory Stick® memory cards and the like, which is similarly accessible through the I/O bridge 700.
[0125] The I/O bridge 700 also connects to four Universal Serial
Bus (USB) 2.0 ports 710; a gigabit Ethernet port 720; an IEEE
802.11b/g wireless network (Wi-Fi) port 730; and a Bluetooth® wireless link port 740 capable of supporting up to seven Bluetooth
connections.
[0126] In operation the I/O bridge 700 handles all wireless, USB
and Ethernet data, including data from one or more game controllers
751. For example when a user is playing a game, the I/O bridge 700
receives data from the game controller 751 via a Bluetooth link and
directs it to the Cell processor 100, which updates the current
state of the game accordingly.
[0127] The wireless, USB and Ethernet ports also provide
connectivity for other peripheral devices in addition to game
controllers 751, such as: a remote control 752; a keyboard 753; a
mouse 754; a portable entertainment device 755 such as a Sony Playstation Portable® entertainment device; a video camera such as a stereoscopic version of the PlayStation Eye® video camera 756; and a microphone headset 757. Such peripheral devices may
therefore in principle be connected to the system unit 10
wirelessly; for example the portable entertainment device 755 may
communicate via a Wi-Fi ad-hoc connection, whilst the microphone
headset 757 may communicate via a Bluetooth link.
[0128] The provision of these interfaces means that the Playstation
3 device is also potentially compatible with other peripheral
devices such as digital video recorders (DVRs), set-top boxes,
digital cameras, portable media players, Voice over IP telephones,
mobile telephones, printers and scanners.
[0129] In addition, a legacy memory card reader 410 may be
connected to the system unit via a USB port 710, enabling the
reading of memory cards 420 of the kind used by the Playstation® or Playstation 2® devices.
[0130] The game controller 751 is operable to communicate
wirelessly with the system unit 10 via the Bluetooth link. However,
the game controller 751 can instead be connected to a USB port,
thereby also providing power by which to charge the battery of the
game controller 751. In addition to one or more analog joysticks
and conventional control buttons, the game controller is sensitive
to motion in 6 degrees of freedom, corresponding to translation and
rotation in each axis. Consequently gestures and movements by the
user of the game controller may be translated as inputs to a game
in addition to or instead of conventional button or joystick
commands. Optionally, other wirelessly enabled peripheral devices
such as the portable entertainment device 755 or the Playstation Move® 758 may be used as a controller. In the case of the
portable entertainment device, additional game or control
information (for example, control instructions or number of lives)
may be provided on the screen of the device. In the case of the
Playstation Move, control information may be provided both by
internal motion sensors and by video monitoring of the light on the
Playstation Move device. Other alternative or supplementary control
devices may also be used, such as a dance mat (not shown), a light
gun (not shown), a steering wheel and pedals (not shown) or bespoke
controllers, such as a single or several large buttons for a
rapid-response quiz game (also not shown).
[0131] The remote control 752 is also operable to communicate
wirelessly with the system unit 10 via a Bluetooth link. The remote
control 752 comprises controls suitable for the operation of the
Blu Ray Disk BD-ROM reader 430 and for the navigation of disk
content.
[0132] The Blu Ray Disk BD-ROM reader 430 is operable to read
CD-ROMs compatible with the Playstation and PlayStation 2 devices,
in addition to conventional pre-recorded and recordable CDs, and
so-called Super Audio CDs. The reader 430 is also operable to read
DVD-ROMs compatible with the Playstation 2 and PlayStation 3
devices, in addition to conventional pre-recorded and recordable
DVDs. The reader 430 is further operable to read BD-ROMs compatible
with the Playstation 3 device, as well as conventional pre-recorded
and recordable Blu-Ray Disks.
[0133] The system unit 10 is operable to supply audio and video,
either generated or decoded by the Playstation 3 device via the
Reality Synthesiser graphics unit 200, through audio and video
connectors to a display and sound output device 300 such as a
monitor or television set having a display 305 and one or more
loudspeakers 310. The audio connectors 210 may include conventional
analogue and digital outputs whilst the video connectors 220 may
variously include component video, S-video, composite video and one
or more High Definition Multimedia Interface (HDMI) outputs.
Consequently, video output may be in formats such as PAL or NTSC,
or in 720p, 1080i or 1080p high definition.
[0134] Audio processing (generation, decoding and so on) is
performed by the Cell processor 100. The Playstation 3 device's
operating system supports Dolby® 5.1 surround sound, Dolby® Theatre Surround (DTS), and the decoding of 7.1 surround sound from Blu-Ray® disks.
[0135] In the present embodiment, the stereoscopic video camera 756
comprises a pair of charge coupled devices (CCDs) with respective
optics, an LED indicator, and hardware-based real-time data
compression and encoding apparatus so that compressed video data
may be transmitted in an appropriate format such as an intra-image based MPEG (Moving Picture Experts Group) standard for decoding by
the system unit 10. The camera LED indicator is arranged to
illuminate in response to appropriate control data from the system
unit 10, for example to signify adverse lighting conditions.
Embodiments of the stereoscopic video camera 756 may variously
connect to the system unit 10 via a USB, Bluetooth or Wi-Fi
communication port. Embodiments of the video camera may include one
or more associated microphones and are also capable of transmitting
audio data. In embodiments of the video camera, the CCDs may have a
resolution suitable for high-definition video capture. In use,
images captured by the video camera may for example be incorporated
within a game or interpreted as game control inputs.
[0136] In general, in order for successful data communication to
occur with a peripheral device such as a stereoscopic video camera
or remote control via one of the communication ports of the system
unit 10, an appropriate piece of software such as a device driver
should be provided. Device driver technology is well-known and will
not be described in detail here, except to say that the skilled man
will be aware that a device driver or similar software interface
may be required in the present embodiment described.
[0137] In an embodiment of the present invention, the camera 756 is
not necessarily used to capture the stereo image (or may have
captured it previously) and hence may not itself be a stereoscopic
camera, or not currently operating in a stereoscopic mode (as
applicable), but is used to obtain an image of the user(s) for head
tracking. As noted previously, head tracking may be used to
generate a respective viewpoint of the 3D model so that a user can
look around within the scene. Where two or more users are viewing
the scene, then optionally two or more views may be rendered for
respective display to each user (for example using active shutter
glasses).
[0138] In an embodiment of the present invention, augmenting a
first stereoscopic image comprising a pair of images involves
generating a disparity map from the pair of images of the first
stereoscopic image, the disparity map being indicative of distances
in the first stereoscopic image; generating a virtual
three-dimensional model responsive to the distances indicated by
the disparity map, thereby creating an approximate 3D model of the
scene captured in the first stereoscopic image; modelling an
interaction of a virtual object with that three dimensional model;
and outputting for display an image corresponding to the first
stereoscopic image that comprises a visible effect of the
interaction of the virtual object with the three dimensional
model.
[0139] Optionally, the step of generating a three-dimensional model
in turn comprises a sub-step of defining a series of value ranges
corresponding to disparity values of the disparity map, each value
range in the series having an end point corresponding to a greater
disparity than an end point of preceding value ranges in the
series; a sub-step of selecting points in the disparity map falling
within the respective value range; a sub-step of generating a
respective mesh responsive to those selected points; and a sub-step
of merging the resulting series of generated meshes to form the 3D
model of the scene.
[0140] Optionally, the virtual object has one or more physical
attributes associated with it, and the interaction of the virtual
object with the three dimensional model is responsive to the or
each physical attribute.
[0141] Consequently, if the displayed image is an augmented version
of at least one of the pair of images of the first stereoscopic
image the method may comprise the step of augmenting the or each
image of the first stereoscopic image with the virtual object at a
position responsive to its interaction with the three dimensional
model.
[0142] Optionally, the method further comprises a step of
generating at least a first texture from one or both of the pair of
images of the stereoscopic image; a step of applying the texture to
at least a respective part of the three dimensional model; and a
step of rendering (at least in an internal memory, and not
necessarily for display) the textured three dimensional model
together with the virtual object.
[0143] Consequently, if the virtual object has one or more physical
attributes associated with it, and the interaction of the virtual
object with the three dimensional model is responsive to the or
each physical attribute, then the displayed image may comprise the
rendered textured three dimensional model with the virtual object
at a position responsive to its interaction with the three
dimensional model.
[0144] Similarly consequently, the virtual object may comprise a
light source, and the rendered textured three dimensional model may
be illuminated responsive to that light source.
[0145] In this case, optionally if the displayed image is an
augmented version of at least one of the pair of images of the
first stereoscopic image, the method may comprise a step of
calculating a difference map indicating the differences in rendered
pixel values between rendering the textured three dimensional model
with and without the light source of the virtual object; and a step
of applying that difference map to the at least one of the pair of
images of the first stereoscopic image to generate the displayed
image.
[0146] Similarly in this case, if the displayed image comprises the
rendered textured three dimensional model, this may be illuminated
responsive to the light source of the virtual object.
[0147] Again dependent upon the generating and applying a texture
to the model, the rendering of the textured three dimensional model
with the virtual object may be performed for one or more viewpoints
other than those of the pair of images of the first stereoscopic
image, so as to generate a new view of the scene depicted in the
first stereoscopic image.
[0148] In this case, the selection of the viewpoint(s) may be based
upon a step of tracking the position of a user's head with respect
to a display; and a step of calculating the or each viewpoint for
rendering, responsive to the deviation of the user's head from a
default viewpoint (i.e. the viewpoint of the original stereo
image). The effect of this tracking and rendering process is that
as the user moves their head, the image is recomposed for the new
viewpoints (including where necessary filling in occluded pixels as
described previously), so that it looks as though there is a 'real'
3D space behind the display screen that can be looked around.
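As a simple illustrative sketch of this head-tracked viewpoint calculation (the scale factor and the flat camera model are assumptions for the example only):

    def viewpoint_from_head(head_pos, default_head_pos, default_camera,
                            parallax_scale=0.5):
        # Deviation of the user's head from the default viewing position
        # directly in front of the display (all positions in metres).
        dx = head_pos[0] - default_head_pos[0]
        dy = head_pos[1] - default_head_pos[1]
        # Offset the rendering viewpoint in proportion to that deviation,
        # so the scene appears to be a real 3D space behind the screen.
        cx, cy, cz = default_camera
        return (cx + parallax_scale * dx, cy + parallax_scale * dy, cz)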
[0149] It will be appreciated that in this case it is not necessary
to include the steps of modelling an interaction of a virtual
object with the three dimensional model or displaying a visible
effect of such an interaction, if only the ability to look at
different viewpoints is desired.
[0150] Meanwhile, in an embodiment of the present invention, an
entertainment device 10 (such as the Sony PS3 or PS Vita) for
augmenting a first stereoscopic image (for example an image
captured from a stereoscopic camera 756 in communication with the
entertainment device, or from a still or video file stored on the
hard disk 400 or BD Rom 440) comprising a pair of images, itself
comprises input means (such as WiFi 730, Bluetooth 740, and USB
710) operable to receive the first stereoscopic image data;
disparity processing means (such as the Cell processor 100 and/or
RSX 200) operable to generate a disparity map from the pair of
images of the first stereoscopic image, the disparity map being
indicative of distances in the first stereoscopic image; virtual
modelling means (such as the Cell processor 100 and/or RSX 200)
operable to generate a virtual three-dimensional model responsive
to the distances indicated by the disparity map; interaction
modelling means (such as the Cell processor 100 and/or RSX 200)
operable to model an interaction of a virtual object with that
three dimensional model; and output means (such as the RSX 200)
operable to output for display an image corresponding to the first
stereoscopic image that comprises a visible effect of the
interaction of the virtual object with the three dimensional
model.
[0151] Optionally, the entertainment device also comprises texture
generation means (such as the Cell processor 100 and/or RSX 200)
operable to generate at least a first texture from one or both of
the pair of images of the stereoscopic image; texturing means (such
as the RSX 200) operable to apply the texture to at least a
respective part of the three dimensional model; and rendering means
(such as the RSX 200) operable to render the textured three
dimensional model together with the virtual object.
[0152] Hence in summary, the above techniques and apparatus enable
the analysis of a stereo image (such as a photo or video frame),
the generation of a 3D model comprising a mesh of polygons
representative of the physical layout of the scene in the stereo
image, and then the output of a stereo image responsive to that 3D
model, either in the form of an augmented version of the original
stereo image (for example with additional virtual lighting or
objects), or a render of the 3D model with textures derived from
the original stereo image, either rendered from the same
viewpoint(s) as the original image, or from another viewpoint that
in turn may be responsive to tracking the head or eye position of a
person viewing the image.
[0153] An issue that may arise, however, particularly in the final case listed above of a render from a different viewpoint, is that
the meshes of the 3D model may not perfectly track the edges of
objects in the scene, if only because the vertex points of the
polygons in the mesh are a subsample of the actual points in the
image. As a result, when generating textures for these meshes from
the image, the textures may undesirably include colour information
from adjacent features of the scene as viewed from the original
viewpoint, because some pixels in the original image(s)
corresponding to positions of polygons in the mesh may in fact
belong to background features.
[0154] This is illustrated in FIG. 8, where a line 1022 represents
the actual edge of an object (for example the rumpled surface of
the child's coat), whilst the triangles form a mesh representing
the child and approximating this edge (in the triangles, the solid
lines represent sides of the triangles on the edge of the coat,
whilst the dashed lines represent sides that are further from the
edge). Because the textures for the mesh are derived from the
original image, the regions 1026 outside the actual edge of the
coat but within the polygons of the mesh representing the coat will
include colour information from the background behind the coat in
the original image.
[0155] Hence for example with reference to the captured stereo
image of FIG. 1, which shows a child standing partially in front of
a brown clock and a blue wall, it is possible that the texture
applied to the mesh representing the child (located in layer 3 in
the above example of mesh generation) will comprise some brown
and/or blue from the clock and the wall behind.
[0156] When the 3D model with these textures is rendered from the
original viewpoint, these errors are unlikely to be noticeable
because the errors in the foreground textures are exactly aligned
in front of the background that the errors are derived from.
[0157] However, if the image is viewed from a different angle, the
errors in the foreground textures are likely to become visible as
they no longer exactly match the background. For example, the
rendered viewpoint of the image may be moved so that the child
appears to be wholly in front of the clock, but he may retain a
partial blue edging originally derived from the wall. Conversely, a
brown edge derived from the clock may remain on part of the child
even as the viewpoint moves the child away from the clock.
[0158] These texture errors spoil the illusion that the user is
genuinely looking around a real scene in miniature on their
display.
[0159] In order to mitigate this problem, in an embodiment of the
present invention the rendered scene is modified as described
below.
[0160] In an embodiment of the present invention, an edge softening
process (implemented for example by the Cell processor under
suitable software instruction) works on a layer-by-layer and structure-by-structure basis, or more generally on a foreground-to-background basis, when rendering the 3D model.
[0161] Preferably starting with the closest polygons of the mesh
and working back, if the Cell processor finds a polygon that is
connected to other polygons but has at least one side unconnected
(i.e. not shared between two polygons), then this is identified as
an edge polygon. The Cell processor then traverses this edge to
identify all the unconnected sides along it. Referring back to FIG.
8, the solid lines of the triangles in this Figure are thus
unconnected sides in a single edge.
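A compact Python sketch of this edge identification, for a triangle mesh given as vertex-index triples (a sketch only; a fuller implementation would also order the unconnected sides into continuous edges for traversal):

    from collections import defaultdict

    def find_unconnected_sides(triangles):
        # Count how many triangles share each undirected side; a side
        # belonging to exactly one triangle is unconnected, i.e. it lies
        # on a free edge of the mesh shell.
        side_count = defaultdict(int)
        for i, j, k in triangles:
            for a, b in ((i, j), (j, k), (k, i)):
                side_count[(min(a, b), max(a, b))] += 1
        return [side for side, n in side_count.items() if n == 1]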
[0162] To mitigate for the fact that there may be texture errors
near these edges as described above, the transparency of the
rendered object can be modified as a function of its distance from
the edge to form a transparency gradient. Hence the object can be
fully transparent at the edge itself, and become fully opaque at a
predefined distance from that edge (as a non-limiting example,
between 1 and 10 pixels from the edge; the value is likely to be
proportional to the resolution of the source image).
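This transparency gradient may be sketched as follows, assuming a per-pixel distance-to-edge value is already available (for example from scipy.ndimage.distance_transform_edt over the object's mask); the 5-pixel default is illustrative only:

    import numpy as np

    def edge_alpha(dist_to_edge, opaque_distance=5.0):
        # Fully transparent (0.0) on the edge itself, rising linearly to
        # fully opaque (1.0) at `opaque_distance` pixels inside the object.
        return np.clip(dist_to_edge / opaque_distance, 0.0, 1.0)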
[0163] Optionally, rather than using the edge itself as a baseline
for the transparency gradient, a spline may be drawn through the
edge. The spline will substantially follow the edge but not have
sharp points. The transparency may then be modified as a function
of the signed distance from the spline, being transparent at any
point on the edge side of the spline and optionally on the spline
itself, and becoming fully opaque at a predefined distance from
that spline on the object side of the spline (again as a
non-limiting example, between 1 and 10 pixels from the spline).
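The spline baseline may be sketched as below, smoothing an ordered edge polyline with SciPy's parametric spline fit; the smoothing factor is an assumption to be tuned per image. The resulting polyline can then be used in place of the raw edge when computing the signed distances for the gradient described above.

    import numpy as np
    from scipy.interpolate import splprep, splev

    def smooth_edge_spline(edge_points, smoothing=50.0, samples=200):
        # Fit a smoothing spline through the ordered edge points so the
        # baseline substantially follows the edge without sharp points.
        tck, _ = splprep([edge_points[:, 0], edge_points[:, 1]], s=smoothing)
        sx, sy = splev(np.linspace(0.0, 1.0, samples), tck)
        return np.column_stack([sx, sy])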
[0164] In either case, the transparency gradient advantageously
removes or de-emphasises texture errors at or near the edge of the
object. Additionally in the case of the spline embodiment it also
gives the impression that the object is less sharp and angular,
resulting in a more natural representation of the object with
respect to the original image.
[0165] As noted previously herein, the mesh (for example in FIG.
3B) is typically formed of shells (i.e. curved or flat open
surfaces) rather than closed surfaces, and so the polygons with
unconnected sides forming the edges of objects in the original
image typically correspond to the edges of such shells. Where,
optionally, a shell is subsequently closed using additional
polygons to form a closed surface (as in the mesh in layer 3 in
FIG. 3C), then optionally the additional polygons can be ignored
for the purposes of the edge softening process, or if the edge
softening process is performed after a z-cull or similar polygon
pruning process during rendering, they may be automatically
discounted in consequence.
[0166] In the above process, the edge is identified using
unconnected sides of polygons, but similarly it may be detected
using the connectedness of polygon vertices. For example, a polygon
vertex unconnected to any other polygon may represent the end of an
edge. The edge may then be tracked along unconnected sides of the
polygons or similarly along successive vertices that are not
enclosed by polygons sharing the same vertex point.
[0167] In either case, the edge softening process mitigates the
texture errors that can arise from generating textures from an
original image to be applied to a mesh derived from that image in
order to render that mesh from a different viewpoint to that of the
original image.
[0168] Hence, in an embodiment of the present invention the
rendering an image based upon a first stereoscopic image (itself
comprising a pair of images) may comprise generating a virtual
three-dimensional model of the scene depicted in the first
stereoscopic image responsive to distances derived from the first
stereoscopic image, for example using the mesh generation
techniques described herein; detecting one or more free edges in
the three dimensional model; generating one or more textures for
the virtual three-dimensional model from at least one of the pair
of images of the first stereoscopic image; applying the or each
texture to a respective part of the three dimensional model; and
rendering the virtual three dimensional model from a different
viewpoint to that of the first stereoscopic image, where the step
of rendering the virtual three dimensional model in turn comprises
modifying the transparency of rendered pixels of an applied texture
as a function of the pixel's distance from that free edge.
[0169] It will be appreciated that the above steps may be carried
out in a different order and/or at least partially in parallel. For
example, texture application may be part of the rendering process,
and similarly the modification of the transparency of the rendered
pixels will typically comprise setting an alpha (transparency)
value for the pixels during the rendering process.
[0170] Optionally, the step of detecting one or more free edges in
the virtual three dimensional model comprises detecting at least a
first polygon with a side that is not shared with another polygon.
Alternatively, in an instance of the summary embodiment, optionally
the step of detecting one or more free edges in the virtual three
dimensional model comprises detecting at least a first polygon with
a vertex that is not shared with another polygon.
[0171] Optionally, the step of modifying the transparency comprises
generating a gradient of pixel transparency values over a
predetermined distance from a free edge of the virtual three
dimensional model such that pixels are more transparent at the edge
of the virtual three dimensional model. As noted previously, it
will be understood that the gradient proceeds inwards from the edge
of the object towards the body of the object.
[0172] Alternatively or in addition, optionally the step of
modifying the transparency comprises generating a spline fit to a
free edge of the virtual three dimensional model, and generating a
gradient of pixel transparency values over a predetermined distance
from that spline, such that pixels are more transparent at the
spline. As noted previously, the spline will approximate but not
exactly fit the edge defined by the polygon. Consequently rendered
pixels of an applied texture lying between the spline and the free
edge will be transparent, and the gradient will apply at the spline
and progress onwards towards the body of the object.
[0173] Optionally the step of generating a virtual
three-dimensional model of the scene depicted in the first
stereoscopic image comprises in turn the steps of generating a
disparity map from the pair of images of the first stereoscopic
image (the disparity map being indicative of distances in the first
stereoscopic image), defining a series of value ranges
corresponding to disparity values of the disparity map where each
value range in the series having an end point corresponding to a
greater disparity than an end point of preceding value ranges in
the series, selecting points in the disparity map falling within
the respective value range, generating a respective mesh responsive
to those selected points, and merging the resulting series of
generated meshes to form the 3D model of the scene.
[0174] Optionally, the method comprises the step of modelling an
interaction of a virtual object with the virtual three dimensional
model, and the step of rendering the virtual three dimensional
model comprises rendering a visible effect of the interaction of
the virtual object with the three dimensional model.
[0175] Optionally, the step of rendering the virtual three
dimensional model from a different viewpoint comprises the steps of
tracking the position of a user's head with respect to a display,
and calculating the or each viewpoint for rendering responsive to
the deviation of the user's head from a default viewpoint.
[0176] Meanwhile as noted above an entertainment device such as the
Sony PS3 (10) for rendering an image based upon a first
stereoscopic image (comprising a pair of images) may comprise
virtual modelling means (e.g. Cell processor 100) operable to
generate a virtual three-dimensional model of the scene depicted in
the first stereoscopic image, responsive to distances derived from
the first stereoscopic image, model edge detection means (e.g. Cell
processor 100 and/or RSX 200) operable to detect one or more free
edges in the three dimensional model, texture generation means
(e.g. Cell processor 100 and/or RSX 200) operable to generate one
or more textures for the virtual three-dimensional model from at
least one of the pair of images of the first stereoscopic image,
texture application means (e.g. RSX 200 and/or Cell processor 100)
operable to apply the or each texture to a respective part of the
three dimensional model; and rendering means (e.g. RSX 200
optionally in conjunction with the Cell processor 100) operable to
render the virtual three dimensional model from a different
viewpoint to that of the first stereoscopic image, in which the
rendering means is operable to modify the transparency of rendered
pixels of an applied texture as a function of the pixel's distance
from a free edge.
[0177] Optionally, the rendering means is operable to generate a
gradient of pixel transparency values over a predetermined distance
from a free edge of the virtual three dimensional model such that
pixels are more transparent at the edge of the virtual three
dimensional model.
[0178] Alternatively or in addition, optionally the rendering means
is operable to generate a spline fit to a free edge of the virtual
three dimensional model, and to generate a gradient of pixel
transparency values over a predetermined distance from that spline,
such that pixels are more transparent at the spline.
[0179] Optionally, the virtual modelling means comprises disparity
map generating means (e.g. the Cell processor 100) operable to
generate a disparity map from the pair of images of the first
stereoscopic image, the disparity map being indicative of distances
in the first stereoscopic image, range setting means (e.g. the Cell
processor 100) operable to define a series of value ranges
corresponding to disparity values of the disparity map with each
value range in the series having an end point corresponding to a
greater disparity than an end point of preceding value ranges in
the series, selection means (e.g. the Cell processor 100) operable
to select points in the disparity map falling within the respective
value range, mesh generating means (e.g. the Cell processor 100
and/or RSX 200) operable to generate a respective mesh responsive
to those selected points, and mesh merging means (e.g. the Cell
processor 100 and/or RSX 200) operable to merge the resulting
series of generated meshes to form the 3D model of the scene.
[0180] Optionally, the entertainment device comprises a virtual
object interaction modelling means (e.g. the Cell processor 100)
operable to model an interaction of a virtual object with the
virtual three dimensional model, and the rendering means is
operable to render a visible effect of the interaction of the
virtual object with the three dimensional model.
[0181] Similarly optionally, the entertainment device comprises
input means (e.g. USB port 710, Bluetooth port 740, or WiFi port
730) for head tracking information for a user's head (for example
via a video camera or an accelerometer/gyroscope motion sensor
similar to that in the controller 751 and worn by the user, for
example in a pair of active shutter glasses), and calculating means
(e.g. the Cell processor 100) operable to calculate the or each
viewpoint for rendering responsive to the deviation of the user's
head from a default viewpoint, and in which the or each viewpoint
from which the virtual three dimensional model is rendered is based
upon the or each calculated viewpoint.
[0182] It will be appreciated that the computational loads of the
various techniques described herein may be high, particularly for
high definition video images (for example 1920×1080
resolution images, or so-called 4K ultra-high definition
images).
[0183] Steps that impose a notable computational load on the device
rendering the image include: [0184] i. generating a disparity map
of the stereoscopic image as described previously; [0185] ii.
selecting sample points from one or both of the images in the
stereoscopic image, responsive to features such as changes in image
colour or corresponding disparity map values (i.e. feature edges)
as described previously and associating with those points depth
information derived from the generated disparity map; [0186] iii.
generating a mesh for each of N layers of varying depth (which may
or may not overlap as described herein) using these sample points
as described previously; [0187] iv. merging the meshes to form a 3D
model as described previously; [0188] v. optionally filtering the
points and/or meshes, for example to generate flatter surfaces, as
described previously; [0189] vi. optionally identifying mesh
polygons (or parts thereof) corresponding to edges of objects;
[0190] vii. generating textures from one or both of the images in
the stereoscopic image, including optionally interpolating textures
for obscured elements of the scene as described previously; [0191]
viii. optionally applying a transparency gradient to the edges of objects (or to a spline tracking such an edge) as described
previously; [0192] ix. optionally modelling interactions with
virtual objects, such as virtual light sources, as described
previously; and [0193] x. rendering the textured 3D model (or an
augmented version of the original image based upon information
derived from the model), optionally from a different viewpoint (for
2D TVs) or pair of viewpoints (for 3D TVs) to that of the original
image, optionally responsive to tracking of a user's head position
with respect to a display, as described previously.
[0194] In principle, an entertainment device or other media player
needs to perform these steps for every frame of the video, and
hence may be expected to do so 25 or 30 times per second,
necessitating considerable processing power (for example a vector
co-processor for mesh generation) that may add significant cost to
a media player, as well as result in higher electricity consumption
and the need for greater heat dissipation, which in turn may
unfavourably affect other design factors of the device, and potentially also its noise levels.
[0195] Consequently it is desirable to reduce the computational load, if possible to a level the device can already manage, or at least so that the additional processing power required has a reduced impact of the kind described above.
[0196] Hence in a first embodiment of the present invention, a
video stream (such as a stereoscopic movie or television program)
has associated with it supplementary data as a file or data stream,
which includes information relating to the implementation of some
or all of steps i. to vi. above.
[0197] In other words, much of the processing of the images can be
performed offline in advance by an author or publisher of the video
stream, and provided in association with the video stream at the
point of delivery (for example as a supplementary data file on a
Blu-Ray® disk, or as a supplementary data file or stream in an
on-line streaming service).
[0198] It will be appreciated that the size of the supplementary
data will vary depending on which of the steps it provides, and
hence may represent a trade-off between the size of the
supplementary data and the remaining computational overhead for the
playback device.
[0199] Optionally therefore, different steps may have respective
supplementary data files or streams, and a playback device can
request those it needs in order to perform at the desired frame
rate. For media with fixed capacity (such as Blu-Ray disks), then
depending on the available space for the supplementary data there
may therefore be a minimum processor requirement for playback using
the techniques described herein.
[0200] In particular, it will be appreciated that supplementary
data comprising selected sample points (including depth information
at those points) may remove the need for the playback device to
generate a disparity map and analyse one or both images of the
stereoscopic image to select the points itself, thus providing
steps i. and ii. above.
[0201] The data size for points relating to a single image will be
much smaller than the data size of the image itself. The
information may be stored, for example on an intra-frame basis,
using a Compuserve GIF style compression (in this case
approximating depth to one of 255 values), or by any other suitable
compression technique, including having successive sets of points
subdividing the image so that the spatial density of points
increases with each set. In this case, the number of points could
thus be varied according to instantaneous bandwidth by determining
the number of sets to transmit (or store). This data may then be
further compressed on an inter-frame basis using any suitable
method if desired, for example based upon inter-frame position
differentials or point motion vectors.
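Purely as an illustration of this intra-frame point encoding (the simple index interleave stands in for a true spatial subdivision of the image, and the uint16 record layout is an assumption of the sketch):

    import numpy as np

    def pack_point_sets(points_xy, depths, num_sets=3):
        # Quantise depth to one of 255 values, GIF-palette style.
        d_min, d_max = float(depths.min()), float(depths.max())
        scale = max(d_max - d_min, 1e-9)
        q = np.round((depths - d_min) / scale * 254.0).astype(np.uint16)
        # Split the points into successive sets so that spatial density
        # increases with each set; a stream can then transmit (or store)
        # only as many sets as instantaneous bandwidth allows.
        sets = []
        for s in range(num_sets):
            idx = np.arange(s, len(points_xy), num_sets)
            records = np.column_stack([points_xy[idx].astype(np.uint16),
                                       q[idx]])
            sets.append(records.tobytes())
        return sets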
[0202] Similarly, it will be appreciated that supplementary data
comprising a mesh generated from the selected points may remove the
need for any of steps i. to v. to be performed locally by the
playback device, thus removing a considerable proportion of the
computational load.
[0203] The data size for the mesh of the 3D model is likely to be
smaller than that of the image itself, but larger than that of just
the selected points. Various trade-offs between file size and
computational load may thus be envisaged, including for example
providing mesh data for more distant layers in the mesh generation
scheme described previously, and selected point information for
closer layers, so that the local playback device performs some mesh
generation but not all of it, whilst needing to receive less
supplementary data as a result.
[0204] The mesh data itself may be compressed using any suitable
technique on an intra-frame basis, such as the OpenCTM (Open
Compressed Triangle Mesh) format. This data may then be further
compressed on an inter-frame basis using any suitable method if
desired, for example based upon inter-frame position differentials
or motion vectors.
[0205] Likewise, it will be understood that further supplementary data may be provided identifying which polygons (or which polygon sides) of the mesh included in the above supplementary data form edges to which the edge softening process described herein may be applied. This
supplementary data may optionally also provide pre-computed splines
for this process.
[0206] In an embodiment of the present invention, steps vii. to x. as
applicable are implemented locally by the playback device.
[0207] For example, as noted previously herein, the textures for
the mesh are derived from one or both images of the stereoscopic
image, and so it would be inefficient to duplicate this information
in the form of pre-prepared textures.
[0208] Notably, if the textures were derived from one image of the
stereoscopic image, then bandwidth could be saved by only storing
or transferring one of the images of the stereoscopic pair (since
the depth information is already present in the selected data
points or the mesh in the supplementary data and so subsequent
stereoscopic rendering is still possible).
[0209] Similarly, it is likely that the virtual object interacting
with the scene is at least partially controlled or influenced by
actions of the user and hence modelling of its interactions cannot
be easily done in advance and included in the supplementary
data.
[0210] Finally, rendering of the 3D model is likely to be
responsive to the position of the viewer, as described previously,
and so will also be performed by the playback device.
[0211] Hence in the above first embodiment of the present
invention, a video file or stream is associated with one or more
supplementary data files or streams which provide one or more of
the selected points, some or all of the mesh, and/or the identity
of edges in the mesh amenable to the edge softening process (and
associated splines as applicable).
[0212] The supplementary data files or streams may be organised in
a similar manner to the video itself; for example, the data may be
organised on a frame-by-frame basis so that a video frame and
associated supplementary data for that video frame can be accessed
substantially in parallel. Similarly, where any of the
supplementary data is compressed with an inter-frame scheme, this
scheme may be limited to groups of pictures corresponding to groups
of pictures in any inter-frame compression scheme being used with
the video data (such as the group of pictures scheme in MPEG
2).
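A possible in-memory layout for such frame-aligned supplementary data is sketched below; the field names and the grouping scheme are assumptions for illustration, not a defined format:

    from dataclasses import dataclass, field

    @dataclass
    class SupplementaryFrame:
        # Per-frame record, fetched substantially in parallel with the
        # corresponding video frame; unused fields are simply left empty.
        frame_index: int
        sample_points: bytes = b""  # selected points with depth (steps i.-ii.)
        mesh: bytes = b""           # mesh data, e.g. OpenCTM (steps iii.-v.)
        edge_info: bytes = b""      # edge/spline identification (step vi.)

    @dataclass
    class SupplementaryStream:
        # Group size matches the video's group-of-pictures structure, so
        # any inter-frame compression of supplementary data never spans
        # a video GOP boundary.
        gop_size: int = 12
        frames: dict = field(default_factory=dict)  # frame_index -> record

        def group_of(self, frame_index):
            return frame_index // self.gop_size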
[0213] It was noted above that it would be inefficient to duplicate
the video image(s) as texture data of the 3D model.
[0214] Consequently in a second embodiment of the present
invention, the video stream itself is not supplied at all. Instead,
a textured version of the 3D model is provided. That is to say, a
supplementary data file or stream comprising the mesh and the
texture data (optionally with edge softening pre-applied to
generate transparency gradients as applicable) is provided instead
of the video stream itself. This supplementary data file or stream
may be in two parts, with the textures separate and acting in place
of the video image data, so that the remaining supplementary data
can be used in the same way in either embodiment.
[0215] It will be appreciated that such a supplementary data file
would only be accessed by a device capable of receiving this data
format. Hence the playback device may either signal to a server
which form of delivery it requires (i.e. according to the first or
second embodiments above), or the playback device may select the
appropriate data from a recording medium.
[0216] Again, the rendering of the 3D model is likely to be
responsive to the position of the viewer as described previously,
and so will still be performed by the playback device.
[0217] Hence in either case, the final rendering step will be
performed by the playback device, typically in response to the
position of the user's head (if head tracking is not available or a
user elects to disable the feature, then the render can be
performed from a default viewpoint coincident with the original
viewpoint(s) of the video image).
[0218] As noted previously it will be appreciated that the rendered
image may be stereoscopic (i.e. rendered at two viewpoints to form
a stereoscopic output for a 3D TV), but equally it may be
monoscopic (i.e. a single rendered image, for viewing on a 2D TV).
In either case a sense of depth is created by the fact that the
user can look around the displayed scene in a natural manner by
moving their head, as if the scene was really behind the glass of
the display.
[0219] In addition to conventional 2D and 3D televisions, there are
also so-called dual-view televisions. These are typically 3D TVs
with an alternative mode in which, instead of showing left and
right images for stereoscopic viewing, they show two separate and
potentially unrelated images so that several people can watch
different content on the same screen. This is usually achieved
using a different synchronisation pattern for active shutter
glasses used with the TV (for 3D TV, all glasses use a left/right
winking pattern, whereas for dual view, respective glasses blink in
sequence so both eyes of a user see one of several displayed
images).
[0220] In the case of a dual view TV, the playback device can track
the head positions of several users and hence render and output for
display a respective monoscopic viewpoint to them.
[0221] Finally in principle if a dual view TV has a sufficiently
fast refresh rate and brightness, and the playback device has
sufficient rendering power, it can instead display respective
stereoscopic views for multiple users by generating four or more
rendered images.
[0222] Hence, and referring now to FIG. 9, in a summary embodiment
of the present invention a method of real-time video playback (i.e.
using rendering techniques herein at conventional video playback
frame rates) comprises in a first step s10 receiving video image
data, and in a second step s20, receiving supplementary data
relating to at least one step of a process of rendering a 3D model
of a scene depicted in a current frame of the video image data. In
a third step s30, texture information is obtained from the video
image data, and in a fourth step s40, at least a first viewpoint is
selected for rendering the 3D model of the scene. Then a fifth step
s50 comprises rendering the 3D model of the scene depicted in the
current frame of the video image data at the or each selected
viewpoint using the obtained textures.
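A skeletal Python rendering of steps s10 to s50, with the viewpoint selection and rendering supplied as caller-provided callables; everything here is an illustrative stand-in for the corresponding step rather than a definitive implementation:

    def playback_step(video_frame, supplementary, select_viewpoints, render):
        # s10/s20: video image data and supplementary data received.
        mesh = supplementary["mesh"]
        # s30: texture information obtained from the video image data.
        textures = video_frame
        # s40: select at least a first viewpoint (e.g. from head tracking).
        viewpoints = select_viewpoints()
        # s50: render the 3D model at the or each selected viewpoint.
        return [render(mesh, textures, vp) for vp in viewpoints]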
[0223] In an instance of the summary embodiment, the supplementary
data comprises stereoscopic depth information at a selection of
points in the current frame of the video image data as described
previously herein, from which a mesh of a 3D model can be
constructed.
[0224] In an instance of the summary embodiment, the supplementary
data comprises 3D mesh information forming some or all of the 3D
model of the scene depicted in the current frame of the video
image.
[0225] As described previously herein, in principle a combination
of mesh and point data may be provided.
[0226] In an instance of the summary embodiment, the supplementary
data comprises information identifying edges in the 3D mesh to
which a transparency gradient may be applied. As described
previously herein, this may take the form of parametric
approximations of the identified edges, such as in the form of a
spline.
[0227] In an instance of the summary embodiment, the received video
image data comprises a stereoscopic image pair. It will be
appreciated that the eventual stereoscopic output for display will
be a render of the 3D model. Consequently, where the textures for the
3D model can be derived from just one image of a stereoscopic image
pair (for example with suitable texture interpolation as described
previously herein) then the second image of the stereoscopic image
pair becomes superfluous and can be omitted to reduce bandwidth for
online streaming or to reduce storage for physical media. However,
for backwards compatibility with devices that do not have the
rendering capabilities described herein, or for improved fidelity
if the rendering function is turned off by a user, then optionally
the video data can retain both images of the stereoscopic pair to enable conventional output of the stereo image.
[0228] In an instance of the summary embodiment, the received video
image data comprises textures corresponding to a current frame of a
source video held by the publisher of the video image data. This
removes the need for the step of deriving the textures from the
video image at playback, by providing the video image in texture
form (e.g. properly fragmented or tessellated and associated with
polygons of the mesh).
[0229] In an instance of the summary embodiment, the step of
selecting at least a first viewpoint comprises the steps of
tracking the position of at least a first user's head with respect
to a display as described previously herein, and calculating the or
each viewpoint for rendering responsive to the deviation of the
user's head from a default viewpoint (such as a viewing position along a line projecting perpendicularly from the centre of the display, or more generally at a viewing position horizontally
aligned with the centre of the display).
[0230] Meanwhile, and referring now to FIG. 7, in the summary
embodiment of the present invention a non-transitory physical
recording medium, such as an optical disc like a Blu-Ray® disc
440, or a solid state memory such as a flash drive or SSD drive, or
a hard drive 400 or a ROM 420, comprises video image data
comprising image information for a sequential plurality of video
frames (i.e. typically a film or TV programme), and supplementary
data relating to at least one step of a process of rendering
respective 3D models of respective scenes depicted in respective
frames of the sequential plurality of video frames.
[0231] Hence in combination with the playback device (10), the
recording medium and playback device form a playback system that in
operation outputs one or more images based upon the video data on
the recording medium processed in accordance with techniques as
described herein.
[0232] Similarly, and referring again to FIG. 7, in the summary
embodiment of the present invention a playback device, such as the
PS3 (10) or any suitable media player for real-time video playback
(i.e. using rendering techniques herein at conventional video
playback frame rates), comprises input means (710, 720, 730, 740,
400, 430, 450) arranged to receive video image data, input means
(710, 720, 730, 740, 400, 430, 450) arranged to receive
supplementary data relating to at least one step of a process of
rendering a 3D model of a scene depicted in a current frame of the
video image data, texture obtaining means (e.g. Cell processor 100
and/or RSX 200 under suitable software instruction) arranged to
obtain texture information from the video image data, viewpoint
selection means (e.g. Cell processor 100 under suitable software
instruction) arranged to select at least a first viewpoint for
rendering the 3D model of the scene, rendering means (e.g. RSX 200
and optionally Cell processor 100 under suitable software
instruction) arranged to render the 3D model of the scene depicted
in the current frame of the video image data at the or each
selected viewpoint using the obtained textures, and output means
(220) arranged to output the or each render for display.
[0233] In an instance of the summary embodiment, the input means
for the video image data and/or the supplementary data is one or
more selected from the list consisting of an internet connection
for receiving streamed data, and a reading means for reading data
from a non-transitory physical recording medium. Hence for example
both video and supplementary data may be streamed over the internet
and input via a network connection such as Ethernet 720 or USB 710
or WiFi 730 ports, or both video and supplementary data may be
stored on a non-transitory physical recording medium such as a
Blu-Ray disc, and input via a BD-Rom reader 430. Alternatively, a
conventional Blu-Ray disc may comprise the video image data (for
example a stereoscopic film), and the supplementary data may be
received over the internet, thereby providing access to the present
techniques for discs in an existing back-catalogue of products, as
well as avoiding the storage issues for supplementary data on the
disc and mitigating the bandwidth needed for download. With this
combination, the supplementary data for the present techniques could be provided as a premium feature accessible through an online purchase system, or simply accessible using data unique to the
existing Blu-Ray disk or content (for example submitting Blu-Ray
disc identification or title information to a central server). The
author or publisher of the content can then release the
supplementary data.
[0234] Similarly, and referring now to FIG. 10, in the
summary embodiment of the present invention a method of authoring a
video for real-time video playback comprises in a first step s110
receiving stereoscopic video image data comprising a sequential
plurality of stereoscopic video frames (e.g. a master recording of
a film), in a second step s120 implementing at least one step of a
process of rendering respective 3D models of respective scenes
depicted in respective frames of the sequential plurality of
stereoscopic video frames, and in a third step s130 outputting
video image data comprising image information for a sequential
plurality of video frames and supplementary data generated by the
at least one implemented step of a process of rendering respective
3D models of respective scenes depicted in respective frames of the
sequential plurality of video frames.
[0235] In the summary embodiment, the implementing step in turn
comprises the steps of generating a disparity map of a current
stereoscopic video frame, and selecting sample points from one or
both of the images in the stereoscopic video frame and associating
with those points depth information derived from the generated
disparity map; in this case the outputted supplementary data then
comprises the selected sample points and associated depth
information.
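For the authoring side, a minimal sketch of these two steps using OpenCV's block matcher is given below; the matcher parameters and the edge-based point selection are illustrative assumptions, and any disparity estimator could be substituted:

    import cv2
    import numpy as np

    def author_supplementary(left_bgr, right_bgr, max_points=5000):
        left = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
        right = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)
        # Generate a disparity map of the current stereoscopic frame.
        matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
        disparity = matcher.compute(left, right).astype(np.float32) / 16.0
        # Select sample points where the disparity changes (feature edges)
        # and associate each with its depth from the disparity map.
        gx = cv2.Sobel(disparity, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(disparity, cv2.CV_32F, 0, 1)
        strength = np.abs(gx) + np.abs(gy)
        flat = np.argsort(strength, axis=None)[-max_points:]
        ys, xs = np.unravel_index(flat, strength.shape)
        return np.column_stack([xs, ys, disparity[ys, xs]])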
[0236] Alternatively, in the summary embodiment the implementing
step in turn comprises the step of generating a mesh for a 3D model
using the selected sample points and associated depth information;
in this case the outputted supplementary data then comprises the
mesh for the 3D model.
[0237] It will be appreciated that the methods disclosed herein may
be carried out on conventional hardware suitably adapted as
applicable by software instruction or by the inclusion or
substitution of dedicated hardware, such as the PS3® described
above.
[0238] Thus the required adaptation to existing parts of a
conventional equivalent device may be implemented in the form of a
non-transitory computer program product or similar object of
manufacture comprising processor implementable instructions stored
on a data carrier such as a floppy disk, optical disk, hard disk,
PROM, RAM, flash memory or any combination of these or other
storage media, or realised in hardware as an ASIC (application
specific integrated circuit) or an FPGA (field programmable gate
array) or other configurable circuit suitable to use in adapting
the conventional equivalent device. Separately, if applicable the
computer program may take the form of a transmission via data
signals on a network such as an Ethernet, a wireless network, the
Internet, or any combination of these or other networks.
[0239] The foregoing discussion discloses and describes merely
exemplary embodiments of the present invention. As will be
understood by those skilled in the art, the present invention may
be embodied in other specific forms without departing from the
spirit or essential characteristics thereof. Accordingly, the
disclosure of the present invention is intended to be illustrative,
but not limiting of the scope of the invention, as well as other
claims. The disclosure, including any readily discernible variants
of the teachings herein, defines, in part, the scope of the
foregoing claim terminology such that no inventive subject matter
is dedicated to the public.
* * * * *