U.S. patent application number 13/822,064 was published by the patent office on 2013-06-27 as publication number 20130162774 for compression methods and apparatus for occlusion data.
The applicants listed for this patent are Wang Lin Lai and Dong Tian. Invention is credited to Wang Lin Lai and Dong Tian.

Application Number: 13/822,064
Publication Number: 20130162774
Family ID: 44583508
Publication Date: 2013-06-27

United States Patent Application 20130162774
Kind Code: A1
Tian; Dong; et al.
June 27, 2013
COMPRESSION METHODS AND APPARATUS FOR OCCLUSION DATA
Abstract
Methods and apparatuses for coding occlusion layers, such as
occlusion video data and occlusion depth data in 3D video, are
disclosed. A decoding method comprises the steps of: extracting an
indicator representative of an original format for received
occlusion data, the original format selected from one of a sparse
occlusion data format and a filled occlusion data format; arranging
2D data, which is associated with said occlusion data, at a location
after temporal and inter-view pictures in a reference picture list;
identifying at least one of an occlusion area macroblock and a
non-occlusion area macroblock for the occlusion data; decoding said
occlusion data to produce decoded occlusion data, wherein said
decoding includes: for each non-occlusion macroblock, when said
indicator indicates the filled occlusion data format, replacing the
occlusion data in said non-occlusion macroblock with a
corresponding macroblock of associated 2D data to produce decoded
occlusion data, and when said indicator indicates the sparse
occlusion data format, filling said non-occlusion macroblock with
data indicative of a defined characteristic to produce decoded
occlusion data; and otherwise, for each occlusion macroblock,
decoding said occlusion macroblock to produce decoded occlusion
data; and outputting the decoded occlusion data.
Inventors: Tian; Dong (Boxborough, MA); Lai; Wang Lin (Richardson, TX)

Applicant:
    Name             City          State    Country
    Tian; Dong       Boxborough    MA       US
    Lai; Wang Lin    Richardson    TX       US
Family ID: 44583508
Appl. No.: 13/822,064
Filed: August 31, 2011
PCT Filed: August 31, 2011
PCT No.: PCT/US11/49886
371 Date: March 11, 2013
Related U.S. Patent Documents

    Application Number    Filing Date     Patent Number
    61/403,345            Sep 14, 2010
Current U.S. Class: 348/43
Current CPC Class: H04N 19/17 20141101; H04N 19/70 20141101; H04N 19/30 20141101; H04N 19/597 20141101; H04N 13/161 20180501
Class at Publication: 348/43
International Class: H04N 13/00 20060101 H04N013/00
Claims
1. A method for processing occlusion data in a sequence of video
data frames, said method comprising: determining a format for said
occlusion data, said format selected from one of a sparse
occlusion data format and a filled occlusion data format; when said
format for said occlusion data is determined to be said filled
occlusion data format, converting said occlusion data into a sparse
occlusion data format before encoding; arranging 2D data, which is
associated with said occlusion data, at a location after temporal and
inter-view pictures in a reference picture list; detecting one or
more depth boundaries in a reconstructed depth map associated with
said occlusion data, said depth map reconstructed from at least
depth data included in said video data frame associated with said
occlusion data; and classifying each macroblock as an occlusion
area macroblock when said macroblock is located at a distance from
said one or more depth boundaries, said distance being less than or
equal to a defined pixel threshold; encoding said occlusion data to
produce encoded occlusion data, wherein for each non-occlusion
macroblock said encoding is realized by skip mode encoding, and
wherein for each occlusion macroblock said encoding is realized by
selecting an encoding mode based on rate distortion cost; and
outputting said encoded occlusion data together with an indicator
representative of said format determined for said occlusion
data.
2. The method as defined in claim 1, wherein said occlusion data
includes one of occlusion video data and occlusion depth data, and
wherein said 2D data includes 2D video data when said occlusion
data is occlusion video data and wherein said 2D data includes
depth data when said occlusion data is occlusion depth data.
3. The method as defined in claim 1, wherein said encoding is
performed in accordance with a video coding standard including one
of H.264/AVC, MVC, and MPEG-2.
4. An apparatus for processing occlusion data in a sequence of
video data frames, said apparatus comprising an encoder for:
determining a format for said occlusion data, said format selected
from one of a sparse occlusion data format and a filled occlusion
data format; when said format for said occlusion data is determined
to be said filled occlusion data format, converting said occlusion
data into a sparse occlusion data format before encoding; arranging
2D data, which is associated with said occlusion data, at a location
after temporal and inter-view pictures in a reference picture list;
detecting one or more depth boundaries in a reconstructed depth map
associated with said occlusion data, said depth map reconstructed
from at least depth data included in said video data frame
associated with said occlusion data; and classifying each
macroblock as an occlusion area macroblock when said macroblock is
located at a distance from said one or more depth boundaries, said
distance being less than or equal to a defined pixel threshold;
encoding said occlusion data to produce encoded occlusion data,
wherein for each non-occlusion macroblock said encoding is realized
by skip mode encoding, and wherein for each occlusion macroblock
said encoding is realized by selecting an encoding mode based on
rate distortion cost; and outputting said encoded occlusion data
together with an indicator representative of said format determined
for said occlusion data.
5. The apparatus as defined in claim 4, wherein said occlusion data
includes one of occlusion video data and occlusion depth data, and
wherein said 2D data includes 2D video data when said occlusion
data is occlusion video data and wherein said 2D data includes
depth data when said occlusion data is occlusion depth data.
6. The apparatus as defined in claim 4, wherein said encoding is
performed in accordance with a video coding standard including one
of H.264/AVC, MVC, and MPEG-2.
7. A method for processing occlusion data in a sequence of video
data frames, said method comprising: extracting an indicator
representative of an original format for received occlusion data,
said original format selected from one of a sparse occlusion data
format and a filled occlusion data format; arranging 2D data, which
is associated with said occlusion data, at a location after temporal
and inter-view pictures in a reference picture list; identifying at
least one of an occlusion area macroblock and a non-occlusion area
macroblock for said occlusion data; decoding said occlusion data to
produce decoded occlusion data, wherein said decoding includes: for
each non-occlusion macroblock, when said indicator indicates said
filled occlusion data format, replacing said occlusion data in said
non-occlusion macroblock with a corresponding macroblock of
associated 2D data to produce decoded occlusion data; and when
said indicator indicates said sparse occlusion data format, filling
said non-occlusion macroblock with data indicative of a defined
characteristic to produce decoded occlusion data; and otherwise for
each occlusion macroblock, decoding said occlusion macroblock to
produce decoded occlusion data; and outputting said decoded
occlusion data.
8. The method defined in claim 7, said method further including:
detecting one or more depth boundaries in a decoded depth map
associated with said occlusion data, said depth map decoded from at
least depth data included in said video data frame associated with
said occlusion data; and classifying each macroblock as an occlusion
area macroblock when said macroblock is located at a distance from
said one or more depth boundaries, said distance being less than or
equal to a defined pixel threshold.
9. The method as defined in claim 7, wherein said occlusion data
includes one of occlusion video data and occlusion depth data, and
wherein said 2D data includes 2D video data when said occlusion
data is occlusion video data and wherein said 2D data includes
depth data when said occlusion data is occlusion depth data.
10. The method as defined in claim 7, wherein said decoding is
performed in accordance with a video coding standard including one
of H.264/AVC, MVC, and MPEG-2.
11. An apparatus for processing occlusion data in a sequence of
video data frames, said apparatus comprising a decoder for:
extracting an indicator representative of an original format for
received occlusion data, said original format selected from one
of a sparse occlusion data format and a filled occlusion data
format; arranging 2D data, which is associated with said occlusion
data, at a location after temporal and inter-view pictures in a
reference picture list; identifying at least one of an occlusion
area macroblock and a non-occlusion area macroblock for said
occlusion data; decoding said occlusion data to produce decoded
occlusion data, wherein said decoding includes: for each
non-occlusion macroblock, when said indicator indicates said filled
occlusion data format, replacing said occlusion data in said
non-occlusion macroblock with a corresponding macroblock of
associated 2D data to produce decoded occlusion data; and when
said indicator indicates said sparse occlusion data format, filling
said non-occlusion macroblock with data indicative of a defined
characteristic to produce decoded occlusion data; and otherwise for
each occlusion macroblock, decoding said occlusion macroblock to
produce decoded occlusion data; and outputting said decoded
occlusion data.
12. The apparatus defined in claim 11, said decoder detecting one
or more depth boundaries in a decoded depth map associated with
said occlusion data, said depth map decoded from at least depth
data included in said video data frame associated with said
occlusion data; and classifying each macroblock as an occlusion area
macroblock when said macroblock is located at a distance from said
one or more depth boundaries, said distance being less than or
equal to a defined pixel threshold.
13. The apparatus as defined in claim 11, wherein said occlusion
data includes one of occlusion video data and occlusion depth data,
and wherein said 2D data includes 2D video data when said occlusion
data is occlusion video data and wherein said 2D data includes
depth data when said occlusion data is occlusion depth data.
14. The apparatus as defined in claim 11, wherein said decoding is
performed in accordance with a video coding standard including one
of H.264/AVC, MVC, and MPEG-2.
Description
[0001] The present application claims the benefit of priority from
the following co-pending, commonly owned U.S. Provisional Patent
Application: Ser. No. 61/403,345 entitled "Compression Methods For
Occlusion Data" filed on Sep. 14, 2010 (Attorney Docket No.
PU100192).
[0002] The present application is related to the following
co-pending, commonly owned patent applications: PCT Application No.
PCT/US2010/001286 entitled "3D Video Coding Formats", having an
international filing date of Apr. 30, 2010 (Attorney Docket No.
PU090045); PCT Application No. PCT/US2010/001291 entitled
"Reference Picture Lists for 3DV," having an international filing
date of Apr. 30, 2010 (Attorney Docket No. PU090049); and PCT
Application No. PCT/US2010/001292 entitled "Inter-Layer Dependency
Information for 3DV", having an international filing date of Apr.
30, 2010 (Attorney Docket No. PU100026).
[0003] The present invention relates to video coding systems and,
more particularly, to three dimensional (3D) image coding and
decoding systems.
[0004] Television programming is becoming more widely available in
3D. Sporting events and concerts have been broadcast for home
consumption. As 3D component sales ramp up and as the demand for 3D
grows, it is expected that 3D programming will be offered widely on
most of the popular TV channels in the near future.
[0005] In order to facilitate new video applications such as 3D
television and free-viewpoint video (FVV), 3D video data formats
consisting of both conventional 2D video and depth (generally
referred to as "2D data") can be utilized such that additional
views can be rendered for the end user or viewer. There are a
number of different 3D video formats including, for example: 2D
plus depth (2D+Z), Layered Depth Video (LDV), Multiview plus Depth
(MVD), Disparity Enhanced Stereo (DES), and Layer Depth Video plus
Right View (LDV+R), to name a few. The 2D plus depth (2D+Z) format
consists of a 2D video element and its corresponding depth map. The
Layered Depth Video (LDV) format includes the 2D+Z format elements
and occlusion video together with occlusion depth. The Multiview
plus Depth (MVD) format consists of a set of multiple 2D+Z
formatted elements, each 2D+Z formatted element related to a
different viewpoint. The Disparity Enhanced Stereo (DES) format is
composed of two LDV formatted elements, wherein each LDV formatted
element is related to one of two different viewpoints. The Layer
Depth Video plus Right View (LDV+R) format is composed of one LDV
formatted element from a left view and the 2D video element from a
right view.
[0006] Coding has been used to protect the data in these various
formats as well as to gain possible transmission or even processing
efficiencies. Coding, as the term is contemplated for use herein,
should be understood to encompass encoding and decoding operations.
It is typically a challenging task to code 3D content, which usually
involves multiple views and possibly corresponding depth maps as
well. Each frame of 3D content may require the system to handle a
huge amount of data. Although the coding of such formatted data
remains a subject of ongoing research, at least one framework for
encoding and decoding much of the 3D video content in these formats
is known to have been presented in PCT Application No.
PCT/US2010/001286, which has been identified above. Nonetheless, it
appears that most coding efforts are directed primarily toward the
actual video or textural information as opposed to supplemental
data such as depth and occlusion data.
[0007] Occlusion data, either occlusion video or occlusion depth,
is not directly viewed by, or presented to, an end user viewing a
TV display. Instead, it is used for virtual view rendering purposes
by a receiver. Occlusion data exhibits different characteristics
from normal video or depth information. It typically contains pixel
values (i.e., for occlusion video data) or depth values (i.e., for
occlusion depth data) that are invisible from a TV viewer's
observation point. No techniques are presently known for
efficiently handling and coding occlusion data in spite of the fact
that occlusion data had surfaced in the LDV format within the MPEG 3DV
Ad Hoc group at least as early as 2008.
[0008] Some coding experiments on the LDV format were performed
using multi-view video coding (MVC), in which the occlusion data
are treated as a normal 2D view. However, this approach is not an
efficient way to handle the occlusion video data and the occlusion
depth data.
[0009] Limitations in transmission bandwidth, storage capacity, and
processing capacity, for example, in the face of growing demand for
affordable 3D content will continue to underscore the need for
greater efficiency throughout the 3D system. Yet, none of the
techniques known in the art are suitable for coding occlusion data
efficiently. Hence, a more efficient coding technique for occlusion
data, including both occlusion video data and occlusion depth data,
appears to be needed in order to provide greater system
efficiencies in the processing, storage, and transmission of 3D
content.
[0010] The coding treatment for occlusion data so far appears to
ignore the fact that occlusion data is referenced infrequently in
the rendering process, if at all, and that only small areas in a frame
of occlusion data are typically used at any single point in the
rendering process. Typically, the occlusion video is referenced
when holes are observed after a view has been warped to a virtual
position. Even then, reference is only made to one or more small
areas of the occlusion video corresponding to the position of the
holes in the warped view. A similar rationale applies to use of
occlusion depth. These observations are then useful in developing
an efficient coding strategy for the occlusion data.
[0011] In accordance with the principles of the present invention,
coding methods for occlusion layers, such as occlusion video data
and occlusion depth data in 3D video, are directed to improving the
transmission and processing efficiency in systems handling this
data. These coding methods for occlusion data include: indication
of occlusion format; conversion of all occlusion data into a sparse
data format; filling non-occlusion areas or macroblocks with a
defined characteristic, such as a single color; rearranging the
placement of the 2D data within the reference picture list; the use
of proximity to depth boundaries to detect occlusion and
non-occlusion areas or macroblocks; the use of skip mode coding for
non-occlusion areas or macroblocks; the use of rate distortion cost
for coding occlusion area macroblocks; and the coding of a single
occlusion frame while skipping the next n-1 occlusion frames. Each
of these techniques, whether applied separately or in combination,
affords improved and even significantly enhanced coding and
transmission gains for the overall bitstreams of 3D data.
[0012] According to an aspect of the present principles, there is
provided a method for processing occlusion data in a sequence of
video data frames, the method includes the steps of: determining a
format for the occlusion data, the format selected from one of a
sparse occlusion data format and a filled occlusion data format;
when the format for the occlusion data is determined to be the
filled occlusion data format, converting the occlusion data into a
sparse occlusion data format before encoding; arranging 2D data,
which is associated with the occlusion data, at a location after
temporal and inter-view pictures in a reference picture list;
detecting one or more depth boundaries in a reconstructed depth map
associated with the occlusion data, the depth map reconstructed
from at least depth data included in the video data frame
associated with the occlusion data; and classifying each macroblock
as an occlusion area macroblock when the macroblock is located at a
distance from the one or more depth boundaries, the distance being
less than or equal to a defined pixel threshold; encoding the
occlusion data to produce encoded occlusion data, wherein for each
non-occlusion macroblock the encoding is realized by skip mode
encoding, and wherein for each occlusion macroblock the encoding is
realized by selecting an encoding mode based on rate distortion
cost; and outputting the encoded occlusion data together with an
indicator representative of the format determined for the occlusion
data.
[0013] According to an aspect of the present principles, there is
provided an apparatus for processing occlusion data in a sequence
of video data frames, the apparatus includes an encoder for:
determining a format for the occlusion data, the format selected
from one of a sparse occlusion data format and a filled occlusion
data format; when the format for the occlusion data is determined
to be the filled occlusion data format, converting the occlusion
data into a sparse occlusion data format before encoding; arranging
2D data, which is associated with the occlusion data, at a location
after temporal and inter-view pictures in a reference picture list;
detecting one or more depth boundaries in a reconstructed depth map
associated with the occlusion data, the depth map reconstructed
from at least depth data included in the video data frame
associated with the occlusion data; and classifying each macroblock
as an occlusion area macroblock when the macroblock is located at a
distance from the one or more depth boundaries, the distance being
less than or equal to a defined pixel threshold; encoding the
occlusion data to produce encoded occlusion data, wherein for each
non-occlusion macroblock the encoding is realized by skip mode
encoding, and wherein for each occlusion macroblock the encoding is
realized by selecting an encoding mode based on rate distortion
cost; and outputting the encoded occlusion data together with an
indicator representative of the format determined for the occlusion
data.
[0014] According to an aspect of the present principles, there is
provided a method for processing occlusion data in a sequence of
video data frames, the method includes the steps of: extracting an
indicator representative of an original format for received
occlusion data, the original format selected from one of a sparse
occlusion data format and a filled occlusion data format; arranging
2D data, which is associated with the occlusion data, at a location
after temporal and inter-view pictures in a reference picture list;
identifying at least one of an occlusion area macroblock and a
non-occlusion area macroblock for the occlusion data; decoding the
occlusion data to produce decoded occlusion data, wherein the
decoding includes: for each non-occlusion macroblock, when the
indicator indicates the filled occlusion data format, replacing the
occlusion data in the non-occlusion macroblock with a corresponding
macroblock of associated 2D data to produce decoded occlusion
data; and when the indicator indicates the sparse occlusion data
format, filling the non-occlusion macroblock with data indicative
of a defined characteristic to produce decoded occlusion data; and
otherwise for each occlusion macroblock, decoding the occlusion
macroblock to produce decoded occlusion data; and outputting the
decoded occlusion data.
[0015] According to an aspect of the present principles, there is
provided an apparatus for processing occlusion data in a sequence
of video data frames, the apparatus includes a decoder for:
extracting an indicator representative of an original format for
received occlusion data, the original format selected from one of
a sparse occlusion data format and a filled occlusion data format;
arranging 2D data, which is associated with the occlusion data, at
a location after temporal and inter-view pictures in a reference
picture list; identifying at least one of an occlusion area
macroblock and a non-occlusion area macroblock for the occlusion
data; decoding the occlusion data to produce decoded occlusion
data, wherein the decoding includes: for each non-occlusion
macroblock, when the indicator indicates the filled occlusion data
format, replacing the occlusion data in the non-occlusion
macroblock with a corresponding macroblock of associated 2D data to
produce decoded occlusion data; and when the indicator indicates
the sparse occlusion data format, filling the non-occlusion
macroblock with data indicative of a defined characteristic to
produce decoded occlusion data; and otherwise for each occlusion
macroblock, decoding the occlusion macroblock to produce decoded
occlusion data; and outputting the decoded occlusion data.
[0016] The details of one or more implementations are set forth in
the accompanying drawings and the description below. Even if
described in one particular manner, it should be clear that
implementations may be configured or embodied in various manners.
For example, an implementation may be performed as a method, or
embodied as an apparatus configured to perform a set of operations,
or embodied as an apparatus storing instructions for performing a
set of operations. Other aspects and features will become apparent
from the following detailed description considered in conjunction
with the accompanying drawings and the claims.
[0017] The above-mentioned and other features and advantages of
this invention, and the manner of attaining them, will become more
apparent and the invention will be better understood by reference
to the following description of embodiments of the invention taken
in conjunction with the accompanying drawings, wherein:
[0018] FIG. 1 is a block diagram for an exemplary 3D video (3DV)
encoder;
[0019] FIG. 2 is a block diagram for an exemplary 3D video (3DV)
decoder;
[0020] FIG. 3 is a block diagram for an exemplary 3D video (3DV)
layer encoder;
[0021] FIG. 4 is a block diagram for an exemplary 3D video (3DV)
layer decoder;
[0022] FIG. 5 shows the components of the LDV format in (a)-(f),
where (c) and (d) represent filled occlusion data and where (e) and
(f) represent sparse occlusion data, which can be employed in place
of (c) and (d), respectively;
[0023] FIGS. 6 and 7 show flowcharts for one embodiment of encoding
and decoding of occlusion data involving an indication of sparse
and filled occlusion data together with a keying technique for the
occlusion data, realized in accordance with the principles of the
present invention;
[0024] FIGS. 8 and 9 show flowcharts for a second embodiment of
encoding and decoding of occlusion data involving the use of skip
mode for occlusion data, realized in accordance with the principles
of the present invention;
[0025] FIGS. 10 and 11 show flowcharts for a third embodiment of
encoding and decoding of occlusion data involving the use of depth
skip mode for occlusion data, realized in accordance with the
principles of the present invention;
[0026] FIGS. 12 and 13 show flowcharts for a fourth embodiment of
encoding and decoding of occlusion data involving the use of
updates for occlusion data, realized in accordance with the
principles of the present invention.
[0027] The exemplary embodiments set out herein illustrate
preferred embodiments of the invention, and such exemplary
embodiments are not to be construed as limiting the scope of the
invention in any manner.
[0028] Coding methods for occlusion layers, such as occlusion video
data and occlusion depth data, are described herein directed to
improving the transmission and processing efficiency in systems
handling this data. Several improved coding techniques are
disclosed. Additionally, the description also includes information
about syntaxes for inclusion in frame headers or overhead messages
to communicate details about the actual type of occlusion data and
other information useful in the practice of the present
invention.
[0029] It is intended that the encoding and decoding techniques
described herein are applicable to occlusion data, in general,
whether that data is occlusion depth data or occlusion video data,
unless one specific kind of occlusion data is expressly specified.
Moreover, it is also intended that the encoding and decoding
techniques described herein are applicable to any format of the
occlusion data, in general, whether that data format is sparse or
filled, unless one specific type of occlusion data format is
expressly specified.
[0030] It is important to describe certain terms so that they are
properly understood in the context of this application. Certain
useful terms are defined below as follows:
[0031] "2D data" includes one or both of the 2D video data and
depth data, wherein the term "data" can be used interchangeably
with the term "layer".
[0032] A "2D video" layer is generally used herein to refer to the
traditional video signal.
[0033] A "depth" layer is generally used herein to refer to data
that indicates distance information for the scene objects.
[0034] A "depth map" is a typical example of a depth layer.
[0035] An "occlusion video" layer is generally used herein to refer
to video information that is occluded from a certain viewpoint. The
occlusion video layer typically includes background information for
the 2D video layer.
[0036] An "occlusion depth" layer is generally used herein to refer
to depth information that is occluded from a certain viewpoint. The
occlusion depth layer typically includes background information for
the depth layer.
[0037] A "transparency" layer is generally used herein to refer to
a picture that indicates depth discontinuities or depth boundaries.
A typical transparency layer has binary information, with one of
the two values indicating positions for which the depth has a
discontinuity, with respect to neighboring depth values, greater
than a particular threshold.
[0038] A "3DV view" is defined herein as a data set from one view
position, which is different from the "view" used in MVC. For
example, a 3DV view may include more data than the view in MVC. For
the 2D+Z format, a 3DV view may include two layers: 2D video plus
its depth map. For the LDV format, a 3DV view may include four
layers: 2D video, depth map, occlusion video, and occlusion depth
map. In addition, a transparency map can be another layer data type
within a 3DV view, among others.
[0039] A "3DV layer" is defined as one of the layers of a 3DV view.
Examples of 3DV layers are, for example, 2D view or video, depth,
occlusion video, occlusion depth, and transparency map. Layers
other than 2D view or video are also defined as "3DV supplemental
layers". In one or more embodiments, a 3DV decoder can be
configured to identify a layer and distinguish that layer from
others using a 3dv_layer_id. In one implementation, 3dv_layer_id is
defined as in Table 1. However, it should be noted that the
layers may be defined and identified in other ways, as understood
by those of ordinary skill in the art in view of the teachings
provided herein.
TABLE 1: 3DV layers

    Value of 3dv_layer_id    Description
    0                        2D video
    1                        Depth
    2                        Occlusion video
    3                        Occlusion depth
    4                        Transparency map
    >= 5                     Reserved
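For illustration only, the layer identifiers of Table 1 could be captured as a simple enumeration. The names below are hypothetical and written for this description; they are not part of any published codec API.

    // Hypothetical mapping of the 3dv_layer_id values from Table 1.
    // Values >= 5 are reserved and intentionally have no enumerator.
    enum class LayerId : unsigned {
        TwoDVideo      = 0,  // 2D video
        Depth          = 1,  // depth map
        OcclusionVideo = 2,  // occlusion video
        OcclusionDepth = 3,  // occlusion depth
        Transparency   = 4   // transparency map
    };

    // Per the definition above, every layer other than 2D video is a
    // "3DV supplemental layer".
    inline bool isSupplementalLayer(LayerId id) {
        return id != LayerId::TwoDVideo;
    }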
[0040] In a generic 3DV coder/decoder (codec) framework such as the
one described in PCT Application No. PCT/US2010/001286, as
identified above, occlusion video and occlusion depth are treated
in a specific 3DV layer making it possible to design new or
additional coding modes. In the present description, the 3DV codec
framework from FIGS. 3-6 in PCT Application No. PCT/US2010/001286
is included herein as FIGS. 1-4, respectively. For further details
about this framework, it is recommended that reference be made to
PCT Application No. PCT/US2010/001286.
[0041] FIGS. 1 and 2 illustrate a high-level generic 3DV encoder
300 and decoder 400, respectively. The encoder 300/decoder 400 is
composed of layer encoders/decoders and a 3DV reference buffer. For
example, a 3DV content signal 302, which may include, for example,
2D view, depth, occlusion view, occlusion depth, and transparency
map layers, is input to the various layer encoders as shown in FIG.
1. Specifically, the encoder system/apparatus 300 includes a 2D
layer encoder 304 configured to encode 2D layers, which may be AVC
compatible, an enhanced 2D layer encoder 306 configured to encode
enhanced 2D layers, a depth layer encoder 308 configured to encode
depth layers, an occlusion view layer encoder 310 configured to
encode occlusion view layers, an occlusion depth layer encoder 312
configured to encode occlusion depth layers, and a transparency
layer encoder 314 configured to encode transparency layers. Thus,
each layer can be encoded using a different encoder and/or encoding
technique.
[0042] An enhanced 2D layer is generally used herein to distinguish
such a layer from a layer that is compatible with AVC, MVC, SVC, or
some other underlying standard. For example, enhanced 2D layers are
typically not compatible with MVC because such layers allow new
coding tools, such as, for example, using inter-layer references.
Such layers are, therefore, generally not backward compatible with
MVC.
[0043] Note that the term "enhanced 2D layer" (or "supplemental
layer") may also be used to refer to layers that could be coded with
MVC, but which would not be expected to be displayed and so are not
typically described as being coded with MVC. For example, a series
of depth layers could be treated by MVC as a series of pictures and
could be coded by MVC. However, it is not typical to display depth
layers, so it is often desirable to have a different way of
identifying and coding such layers, other than by using MVC.
[0044] Each layer can also use a different reference. The reference
may be from a different layer than the picture/block being encoded
(decoded). The references from different layers may be obtained
from a 3DV Reference Buffer 316 (3DV Reference/Output Buffer 414).
As shown in FIG. 1, each layer encoder is in signal communication
with the 3DV reference buffer 316 to permit various modes of
encoding of the input signal 302 to generate an output signal
318.
[0045] By utilizing the 3DV Reference Buffer 316, each layer of the
3DV format can be encoded using references from its own layer, such
as, for example, temporal references and/or inter-view references
within the same layer with motion and/or disparity compensation,
and/or using inter-layer prediction between the various layers. For
example, an inter-layer prediction may reuse motion information,
such as, for example, motion vector, reference index, etc., from
another layer to encode the current layer, also referred to as
motion skip mode. In this way, the output signal 318 may be
interleaved with various layer information for one or more 3DV
views. The inter-layer prediction may be of any kind of technique
that is based on the access of the other layers.
[0046] With regard to the decoder system/apparatus 400, system 400
includes various layer decoders to which signal 318 may be input as
shown in FIG. 2. In particular, the decoder system/apparatus 400
includes a 2D layer decoder 402, which may be AVC compatible,
configured to decode 2D layers, an enhanced 2D layer decoder 404
configured to decode enhanced 2D layers, a depth layer decoder 406
configured to decode depth layers, an occlusion view layer decoder
408 configured to decode occlusion view layers, an occlusion depth
layer decoder 410 configured to decode occlusion depth layers,
and/or a transparency layer decoder 412 configured to decode
transparency layers.
[0047] As illustrated in FIG. 2, each layer decoder is in signal
communication with a 3DV reference/output buffer 414, which can be
configured to parse decoded layer information received from the
layer decoders and to determine how the layers included in the
input signal fit into a structure that supports 3D processing. Such
3D processing may include, for example, coding of 3D layers as
described herein or rendering (synthesizing) of additional pictures
at a receiver or display unit. Rendering may use, for example,
depth pictures to warp a 2D video and/or occlusion pictures to fill
in holes of a rendered picture with background information.
[0048] In addition, the 3DV reference/output buffer 414 can be
configured to generate an output signal 416 in a 3DV compatible
format for presentation to a user. The formatted 3DV content signal
416 may, of course, include, for example, 2D view, depth, occlusion
view, occlusion depth, and transparency map layers. The output
buffer may be implemented together with the reference buffer, as
shown in FIG. 2, or, alternatively in other embodiments, the
reference and output buffers may be separated.
[0049] Other implementations of the encoder 300 and the decoder 400
may use more or fewer layers. Additionally, different layers than
those shown may be used. It should be clear that the term "buffer",
as used in the 3DV Reference Buffer 316 and in the 3DV
Reference/Output Buffer 414, is an intelligent buffer. Such buffers
may be used, for example, to store pictures, to provide references
(or portions of references), and to reorder pictures for output.
Additionally, such buffers may be used, for example, to perform
various other processing operations such as, for example,
hypothetical reference decoder testing, processing of marking
commands (for example, memory management control operations in
AVC), and decoded picture buffer management.
[0050] FIGS. 3 and 4 depict high level block/flow
diagrams of a general 3DV layer encoder 500 and decoder 600,
respectively, which can be used to implement any one or more of
layer encoders 304-314 and any one or more of layer decoders
402-412, respectively. It is noted that each of the layer encoders
304-314 can be designed in the same general manner with respect to
their corresponding layers, as, for example, depicted in FIG. 3, to
favor particular purposes. Conversely, the layer encoders may be
configured differently to better utilize their unique
characteristics, as understood in view of the teachings provided
herein. Similarly, decoders 402-412 can be designed in the same
general manner with respect to their corresponding layers, as, for
example, depicted in FIG. 4. Conversely, the layer decoders may be
configured differently to better utilize their unique
characteristics.
[0051] It should be noted that with regard to an MVC encoder, the
input is composed of multiple views. Each view is a traditional 2D
video. Thus, compared to an AVC encoder, the typical MVC encoder
includes additional blocks such as a disparity estimation block, a
disparity compensation block, and an inter-view reference buffer.
Analogously, FIGS. 3 and 4 include blocks for 3DV references and
inter-layer prediction. With a 3DV encoder, the input is composed
of multiple 3D views. As stated above, each 3D view can comprise
several layers. Accordingly, the encoding method for each layer can
be designed differently to utilize their unique features.
Consequently, a 3DV encoder can be divided into layer encoders, as
shown in FIG. 1. However, the layer encoders may also be closely
coupled. The techniques used in the layer encoders may be tailored
as desired for a given system. Since each layer appears as a video
signal, the layers can have a similar structure at a high level as
shown in FIG. 3. It should be noted that the layer encoders can be
designed differently at lower, more specific levels. Of course, one
embodiment may also use a single encoder configured to encode all
layers.
[0052] With regard to the high level diagram illustrated in FIG. 3,
3DV layer encoder 500 may include a layer partitioner 504
configured to receive and partition 3DV view layers from each other
for a 3DV view i within input signal 502. The partitioner 504 is in
signal communication with an adder or combiner 506, with a
displacement (motion/disparity) compensation module 508, and with a
displacement (motion/disparity) estimation module 510, each of
which receives a set of partitioned layers from partitioner 504.
Another input to the adder 506 is one of a variety of possible
reference picture information received through switch 512.
[0053] For example, if a mode decision module 536 in signal
communication with the switch 512 determines that the encoding mode
should be intra-prediction with reference to the same block or
slice currently being encoded, then the adder receives its input
from intra-prediction module 530. Alternatively, if the mode
decision module 536 determines that the encoding mode should be
displacement compensation and estimation with reference to a block
or slice, of the same frame or 3DV view or 3DV layer currently
being processed or of another previously processed frame or 3DV
view or 3DV layer, that is different from the block or slice
currently being encoded, then the adder receives its input from
displacement compensation module 508, as shown in FIG. 3. Further,
if the mode decision module 536 determines that the encoding mode
should be 3DV inter-layer prediction with reference to a 3DV layer,
of the same frame or 3DV view currently being processed or another
previously processed frame or 3DV view, that is different from the
layer currently being processed, then the adder receives its input
from the 3DV inter-layer prediction module 534, which is in signal
communication with 3DV Reference Buffer 532.
[0054] The adder 506 provides a signal including 3DV layer(s) and
prediction, compensation, and/or estimation information to the
transform module 514, which is configured to transform its input
signal and provide the transformed signal to quantization module
516. The quantization module 516 is configured to perform
quantization on its received signal and output the quantized
information to an entropy encoder 518. The entropy encoder 518 is
configured to perform entropy encoding on its input signal to
generate bitstream 520. The inverse quantization module 522 is
configured to receive the quantized signal from quantization module
516 and perform inverse quantization on the quantized signal. In
turn, the inverse transform module 524 is configured to receive the
inverse quantized signal from module 522 and perform an inverse
transform on its received signal. Modules 522 and 524 recreate or
reconstruct the signal output from adder 506.
[0055] The adder or combiner 526 adds (combines) signals received
from the inverse transform module 524 and the switch 512 and
outputs the resulting signals to intra prediction module 530 and
deblocking filter 528. Further, the intra prediction module 530
performs intra-prediction, as discussed above, using its received
signals. Similarly, the deblocking filter 528 filters the signals
received from adder 526 and provides filtered signals to 3DV
reference buffer 532.
[0056] The 3DV reference buffer 532, in turn, parses its received
signal. The 3DV reference buffer 532 aids in inter-layer and
displacement compensation/estimation encoding, as discussed above,
by elements 534, 508, and 510. The 3DV reference buffer 532
provides, for example, all or part of various 3DV layers.
[0057] With reference again to FIG. 4, the 3DV layer decoder 600
can be configured to receive bitstream 318 using bitstream receiver
602, which in turn is in signal communication with bitstream parser
604 and provides the bitstream to parser 604. The bitstream parser
604 can be configured to transmit a residue bitstream 605 to
entropy decoder 606, transmit control syntax elements 607 to mode
selection module 622, transmit displacement (motion/disparity)
vector information 609 to displacement compensation
(motion/disparity) module 618 and transmit coding information 611
from 3DV layers other than the 3DV layer currently decoded to 3DV
inter-layer prediction module 620. The inverse quantization module
608 can be configured to perform inverse quantization on an entropy
decoded signal received from the entropy decoder 606. In addition,
the inverse transform module 610 can be configured to perform an
inverse transform on an inverse quantized signal received from
inverse quantization module 608 and to output the inverse
transformed signal to adder or combiner 612.
[0058] Adder 612 can receive one of a variety of other signals
depending on the decoding mode employed. For example, the mode
decision module 622 can determine whether 3DV inter-layer
prediction, displacement compensation, or intra prediction encoding
was performed on the currently processed block by the encoder 500
by parsing and analyzing the control syntax elements 607. Depending
on the determined mode, the mode selection control module 622 can
access and control switch 623, based on the control syntax elements
607, so that the adder 612 can receive signals from the 3DV
inter-layer prediction module 620, the displacement compensation
module 618 or the intra prediction module 614.
[0059] Here, the intra prediction module 614 can be configured to,
for example, perform intra prediction to decode a block or slice
using references to the same block or slice currently being
decoded. In turn, the displacement compensation module 618 can be
configured to, for example, perform displacement compensation to
decode a block or a slice using references to a block or slice, of
the same frame or 3DV view or 3DV layer currently being processed
or of another previously processed frame or 3DV View or 3DV layer,
that is different from the block or slice currently being decoded.
Further, the 3DV inter-layer prediction module 620 can be
configured to, for example, perform 3DV inter-layer prediction to
decode a block or slice using references to a 3DV layer, of the
same frame or 3DV view currently processed or of another previously
processed frame or 3DV view, that is different from the layer
currently being processed.
[0060] After receiving prediction or compensation information
signals, the adder 612 can add the prediction or compensation
information signals with the inverse transformed signal for
transmission to a deblocking filter 602. The deblocking filter 602
can be configured to filter its input signal and output decoded
pictures. The adder 612 can also output the added signal to the
intra prediction module 614 for use in intra prediction. Further,
the deblocking filter 602 can transmit the filtered signal to the
3DV reference buffer 616. The 3DV reference buffer 616 can be
configured to parse its received signal to permit and aid in
inter-layer and displacement compensation decoding, as discussed
above, by elements 618 and 620, to each of which the 3DV reference
buffer 616 provides parsed signals. Such parsed signals may be, for
example, all or part of various 3DV layers.
[0061] It should be understood that systems/apparatuses 300, 400,
500, and 600 can be configured differently and can include
different elements as understood by those of ordinary skill in the
art in view of the teachings disclosed herein.
[0062] Occlusion data plays a key role in Layered Depth Video (LDV)
format. FIG. 5 shows the components of the LDV format in (a)-(f).
There are four components in the LDV video format: color video
(FIG. 5(a)), depth (FIG. 5(b)), occlusion video (FIG. 5(c) or 5(e)),
and occlusion depth (FIG. 5(d) or 5(f)). FIG. 5(c) shows the
occlusion video with the non-occlusion area filled by the
corresponding pixels from the color video. FIG. 5(d) depicts the
occlusion depth with the non-occlusion area filled by the
corresponding depth samples from the depth. FIGS. 5(c) and (d)
represent filled occlusion video and depth data, respectively. In an
alternate configuration, FIGS. 5(e) and (f) represent sparse
occlusion video and depth data, respectively. The sparse occlusion
data can be used in place of the filled occlusion data or vice
versa.
[0063] In FIGS. 5(e) and 5(f), the non-occlusion area is shown as
black for occlusion video and white for occlusion depth. Normally,
occlusion data will be represented as shown in FIGS. 5(c) and (d),
which is known as filled occlusion data herein. When the occlusion
data has the non-occlusion area(s) filled by a certain uniform
color as shown in FIGS. 5(e) and (f), such as black or white, this
representation is known as sparse occlusion data herein.
[0064] For the purpose of rendering video for a viewer, it should
be understood that the sparse occlusion data is considered to be
equivalent to the counterpart filled occlusion data because the
non-occlusion area is generally not referred to in 3D warping and
hole filling operations at all. So it is possible to encode either
the filled occlusion data or the sparse occlusion data in the LDV
format without any confusion or loss of generality.
[0065] Sparse and filled occlusion data are equivalent to each
other and interchangeable in terms of rendering. However, a
rendering process may need to know if a pixel belongs to an
occlusion area or a non-occlusion area, such as when performing a
hole filling process in rendering. In such a case, when a hole pixel resides in
an occlusion area, the occlusion data can be used to fill the hole
pixel. Otherwise, neighboring background pixels can be used to fill
the hole pixel.
[0066] As noted above, the indication of the occlusion format is
useful at least in assisting the determination of occlusion area or
non-occlusion area. An indication of occlusion data format can be
included in a high level syntax for the 3D video signal. As used
herein, "high level syntax" refers to syntax present in the
bitstream that resides hierarchically above the macroblock layer.
For example, high level syntax, as used herein, may refer, but is
not limited, to syntax at the slice header level, Supplemental
Enhancement Information (SEI) level, Picture Parameter Set (PPS)
level, Sequence Parameter Set (SPS) level, View Parameter Set
(VPS), and Network Abstraction Layer (NAL) unit header level. Table
2 presents an example of modified SPS to include such an indicator
flag, where the extended SPS for 3DV sequences is employed as an
example.
TABLE 2: Modified SPS

    seq_parameter_set_3dv_extension( ) {                          C    Descriptor
        num_3dv_layer_minus1                                           ue(v)
        for( i = 0; i <= num_3dv_layer_minus1; i++ )
            3dv_layer_id[ i ]                                          ue(v)
        for( i = 1; i <= num_3dv_layer_minus1; i++ ) {
            num_3dv_layer_refs_l0[ i ]                                 ue(v)
            for( j = 0; j < num_3dv_layer_refs_l0[ i ]; j++ )
                3dv_layer_ref_l0[ i ][ j ]                             ue(v)
            num_3dv_layer_refs_l1[ i ]                                 ue(v)
            for( j = 0; j < num_3dv_layer_refs_l1[ i ]; j++ )
                3dv_layer_ref_l1[ i ][ j ]                             ue(v)
        }
        occlusion_data_format                                          u(2)
    }
The semantics for all of the previously defined entries in Table 2 above have been
completely described in the commonly owned and co-pending PCT
Application No. PCT/US2010/001286 (Attorney Docket No. PU090045) on
at least pages 50-55 with respect to Table 13 therein. The
semantics of the remaining entry, occlusion_data_format, are as
follows:
[0067] a value of 0 indicates the coded occlusion video/depth is filled occlusion data;
[0068] a value of 1 indicates the coded occlusion video/depth is sparse occlusion data; and
[0069] values larger than 1 are reserved at this time.
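To illustrate how a decoder might consume this field, the following sketch reads occlusion_data_format as a two-bit fixed-length code, u(2), with a minimal MSB-first bit reader. The BitReader class and the function name are assumptions made for this description, not the reference parser.

    #include <cstddef>
    #include <cstdint>
    #include <stdexcept>
    #include <vector>

    // Minimal MSB-first bit reader over an RBSP payload (illustrative only).
    class BitReader {
    public:
        explicit BitReader(const std::vector<uint8_t>& data) : data_(data) {}
        // Read n bits (n <= 32), most significant bit first, as for u(n).
        uint32_t u(unsigned n) {
            uint32_t v = 0;
            for (unsigned i = 0; i < n; ++i) {
                if (pos_ >= data_.size() * 8)
                    throw std::runtime_error("bitstream underrun");
                std::size_t byte = pos_ / 8, bit = 7 - (pos_ % 8);
                v = (v << 1) | ((data_[byte] >> bit) & 1u);
                ++pos_;
            }
            return v;
        }
    private:
        const std::vector<uint8_t>& data_;
        std::size_t pos_ = 0;
    };

    enum class OcclusionFormat { Filled, Sparse, Reserved };

    // Interpret the 2-bit occlusion_data_format field per the semantics above.
    OcclusionFormat readOcclusionDataFormat(BitReader& br) {
        uint32_t code = br.u(2);          // occlusion_data_format, u(2)
        if (code == 0) return OcclusionFormat::Filled;
        if (code == 1) return OcclusionFormat::Sparse;
        return OcclusionFormat::Reserved; // values larger than 1 are reserved
    }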
[0070] FIGS. 6 and 7 show flowcharts for one embodiment of encoding
and decoding of occlusion data involving an indication of sparse
and filled occlusion data together with a keying technique for the
occlusion data. The steps of these processes will be described in
more detail immediately below.
[0071] The encoding method in FIG. 6 starts at step S601. Control
is passed directly to step S602. At step S602, a determination is
made about the input occlusion data format originally received by
the encoder. Although other techniques may be employed for this
determination, one exemplary straightforward technique analyzes an
indicator or an indication of the occlusion data format that is
associated with the received video frame. One embodiment of the
indicator is shown above as the occlusion_data_format entry in the
high level syntax. The indicator characterizes the associated
occlusion data as being in a "filled" format or in a "sparse"
format. In some cases, this indicator is also referred to as a
flag. When the indicator indicates that sparse occlusion data is
received by the encoder, control is transferred to step S603. When
the indicator indicates that filled occlusion data is received by
the encoder, control is transferred to step S604.
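The branching of steps S602 through S605 might be organized as in the sketch below. The Frame type and the helper functions convertFilledToSparse and encodeSparse are placeholders for the operations described in the surrounding paragraphs; all names here are assumptions for illustration, not an actual codec interface.

    #include <cstdint>
    #include <vector>

    enum class OcclusionFormat : uint8_t { Filled = 0, Sparse = 1 };

    struct Frame { int width = 0, height = 0; std::vector<uint8_t> samples; };

    // Placeholder bodies; the real operations are sketched later in the text.
    Frame convertFilledToSparse(const Frame& f) { return f; }       // step S604
    std::vector<uint8_t> encodeSparse(const Frame&) { return {}; }  // step S603

    // Steps S602-S605: the encoder always encodes sparse occlusion data and
    // emits the indicator of the originally received format with the payload.
    std::vector<uint8_t> encodeOcclusion(const Frame& occ, OcclusionFormat fmt) {
        Frame sparse = (fmt == OcclusionFormat::Filled)
                           ? convertFilledToSparse(occ)  // S604: filled -> sparse
                           : occ;                        // S602: already sparse
        std::vector<uint8_t> bits = encodeSparse(sparse);      // S603
        bits.insert(bits.begin(), static_cast<uint8_t>(fmt));  // S605
        return bits;
    }

In practice the indicator would be carried in high level syntax, such as the SPS extension of Table 2, rather than prepended to the payload; the prepend above merely stands in for "output together with an indicator".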
[0072] In step S603, the sparse occlusion data is encoded using a
standard video encoding technique to produce encoded occlusion
data. Standard video encoding techniques include, but are not
limited to, Multiview Video Coding (MVC), H.264/Advanced Video
Coding (AVC), and MPEG coding including at least MPEG-2. These
coding techniques are standardized and are understood to be well
known to persons of ordinary skill in this technical field. No
further description of these techniques will be presented herein.
Control is transferred to step S605.
[0073] In step S605, the bitstream is prepared for transmission.
The bitstream includes the encoded occlusion data together with the
indicator of occlusion data format (i.e., the indication of sparse
or filled) for the originally received occlusion data. Control is
transferred to step S606, where the encoding method ends.
[0074] In step S604, the received occlusion data is processed to
change the occlusion data format from a filled format to a sparse
format. In the sparse format, each non-occlusion area is represented
by a defined characteristic, such as a defined color or data value. This is
accomplished by replacing data samples in the non-occlusion area by
a defined characteristic such as a defined color or a defined depth
level such that sparse occlusion data format results. The process
is similar to color keying techniques wherein a color in one image
is used to reveal another image behind. The change in
representation to a sparse occlusion data format is preferable
to the converse (i.e., sparse format changed to filled format)
because of efficiencies that arise from the standard coding
techniques.
[0075] Efficiencies are obtained through conventional encoding
because most of the non-occlusion area, represented with a
certain uniform color, can be coded in skip mode. In skip mode
encoding, a macroblock is coded as a skipped macroblock thereby
reducing the amount of data in the encoded occlusion data output by
the encoder. When skip mode coding is used, the decoder decodes the
macroblock by referring to motion vectors of the surrounding
macroblocks and/or partitions within surrounding macroblocks. Skip
mode coding is understood to be well known to persons of ordinary
skill in this technical field. No further description of this
coding technique will be presented herein. Control is then
transferred to step S603.
[0076] In step S604, it is necessary to identify at least one
occlusion area and at least one non-occlusion area for the
occlusion data. These areas will be mutually exclusive of
each other. Identification allows the non-occlusion areas to be
filled with a defined characteristic, such as the defined
color.
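As a concrete illustration of this keying step, the sketch below overwrites every sample flagged as belonging to a non-occlusion area with a single defined value, such as black for occlusion video or a defined level for occlusion depth. The single 8-bit plane layout, the function name, and the mask input are assumptions; the mask itself would come from one of the identification techniques described next.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Turn filled occlusion data into sparse occlusion data (step S604) by
    // replacing all non-occlusion samples with one defined key value.
    // mask[i] == true marks sample i as part of a non-occlusion area.
    void keyNonOcclusionAreas(std::vector<uint8_t>& occlusionPlane,
                              const std::vector<bool>& mask,
                              uint8_t keyValue) {
        for (std::size_t i = 0; i < occlusionPlane.size() && i < mask.size(); ++i) {
            if (mask[i]) occlusionPlane[i] = keyValue;  // key out non-occlusion
        }
    }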
[0077] One exemplary technique for performing such an
identification of occlusion or non-occlusion areas includes the use
of the depth data, which is from the same frame as the occlusion
data, for detecting one or more depth discontinuities in the video
data frame associated with the occlusion data. The area along each
detected depth discontinuity is then classified as an occlusion
area in the occlusion data. Other techniques may be utilized to
perform the detection and/or classification described herein.
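A minimal sketch of such a classification follows, assuming an 8-bit depth plane, 16x16 macroblocks, and a depth discontinuity declared wherever the horizontal jump between neighboring depth samples exceeds a threshold. Both thresholds and the distance test are illustrative parameters, not values taken from this disclosure.

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    // Mark a macroblock as an occlusion-area MB when any of its samples lies
    // within pixelThreshold of a detected depth discontinuity.
    std::vector<bool> classifyOcclusionMBs(const std::vector<uint8_t>& depth,
                                           int width, int height,
                                           int depthJump, int pixelThreshold) {
        const int mbW = (width + 15) / 16, mbH = (height + 15) / 16;
        std::vector<bool> isOcclusionMB(static_cast<std::size_t>(mbW) * mbH, false);
        for (int y = 0; y < height; ++y) {
            for (int x = 1; x < width; ++x) {
                int jump = std::abs(static_cast<int>(depth[y * width + x]) -
                                    static_cast<int>(depth[y * width + x - 1]));
                if (jump <= depthJump) continue;  // not a depth boundary
                // Mark every MB intersecting the threshold neighborhood of (x, y).
                int x0 = std::max(0, x - pixelThreshold) / 16;
                int x1 = std::min(width - 1, x + pixelThreshold) / 16;
                int y0 = std::max(0, y - pixelThreshold) / 16;
                int y1 = std::min(height - 1, y + pixelThreshold) / 16;
                for (int my = y0; my <= y1; ++my)
                    for (int mx = x0; mx <= x1; ++mx)
                        isOcclusionMB[static_cast<std::size_t>(my) * mbW + mx] = true;
            }
        }
        return isOcclusionMB;
    }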
[0078] In another exemplary technique, the video data is input
together with the filled occlusion data. Non-occlusion areas are
exposed by calculating the difference frame between the video frame
and the filled occlusion video frame. Samples in a non-occlusion
area will have a value of zero or close to zero within the
difference frame.
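This difference-frame technique might be sketched as follows, assuming co-sited 8-bit planes for the video frame and the filled occlusion frame; the tolerance parameter models the "zero or close to zero" criterion and is an illustrative choice.

    #include <cstddef>
    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    // Expose non-occlusion areas by differencing the 2D video frame against
    // the filled occlusion frame; (near-)identical samples are non-occlusion.
    std::vector<bool> detectNonOcclusion(const std::vector<uint8_t>& video,
                                         const std::vector<uint8_t>& filledOcc,
                                         int tolerance) {
        std::vector<bool> nonOcclusion(video.size(), false);
        for (std::size_t i = 0; i < video.size() && i < filledOcc.size(); ++i) {
            int diff = std::abs(static_cast<int>(video[i]) -
                                static_cast<int>(filledOcc[i]));
            nonOcclusion[i] = (diff <= tolerance);
        }
        return nonOcclusion;
    }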
[0079] The decoding method in FIG. 7 starts at step S701. Control
is transferred directly to step S702. In step S702, the indicator
or flag representing the occlusion data format for the occlusion
data originally received at the encoder is extracted. This flag or
indicator identifies the occlusion data format as either the sparse
occlusion data format or the filled occlusion data format. It
should be recalled that the encoder actually outputs encoded
occlusion data in the sparse data format as described above in
reference to the encoding method in FIG. 6. Control is then
transferred to step S703.
[0080] In step S703, the sparse occlusion data is decoded using a
standard video decoding technique to produce decoded occlusion
data. Standard video decoding techniques include, but are not
limited to, Multiview Video Coding (MVC), H.264/Advanced Video
Coding (AVC), and MPEG coding including at least MPEG-2. Control is
transferred to step S704.
[0081] In step S704, a determination is made concerning the
occlusion data format for the occlusion data originally received at
the encoder. This determination is based at least in part on the
flag or indicator extracted in step S702. When the indicator
indicates that sparse occlusion data was originally received by the
encoder (FIG. 6), control is transferred to step S705. When the
indicator indicates that filled occlusion data was originally
received by the encoder (FIG. 6), control is transferred to step
S706.
[0082] In step S705, the decoded occlusion data is output in either
a sparse occlusion data format (from step S704) or a filled
occlusion data format (from step S706). The method ends at step
S707.
[0083] Step S706 is entered because it had been determined in step
S704 that the occlusion data originally received by the encoder was
in a filled occlusion data format as identified by the received
flag or indicator extracted in step S702. As mentioned above, step
S704 outputs decoded occlusion data in the sparse data format. In
order to convert the sparse occlusion data format to the originally
received filled occlusion data format, it is necessary to fill the
non-occlusion area, identified by the defined characteristic such
as the defined color, for example, with the collocated data sample
in the corresponding video or depth component of the frame. When
the occlusion data is the occlusion video, then the corresponding
video component from the same frame is used for filling the
non-occlusion area data samples in the decoded occlusion data.
Similarly, when the occlusion data is the occlusion depth
component, then the corresponding depth component from the same
frame is used for filling the non-occlusion area data samples in
the decoded occlusion data. When the decoded occlusion data is
converted back into the proper originally received format, control
is transferred to step S705.
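[0083a] A minimal sketch of this fill operation follows (Python with
NumPy; the function name fill_non_occlusion and the key value of 128
used to mark the defined characteristic are assumptions of this
example only):

    import numpy as np

    def fill_non_occlusion(decoded_occlusion, collocated_2d, key_value=128):
        # Samples equal to the defined characteristic identify the
        # non-occlusion area; each such sample is replaced with the
        # collocated sample from the corresponding video or depth
        # component of the same frame.
        mask = decoded_occlusion == key_value
        return np.where(mask, collocated_2d, decoded_occlusion)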
[0084] In another embodiment of the present invention, the location
of the occlusion data, which can be either occlusion video or
occlusion depth, is changed in the reference picture list.
Construction of the reference picture list typically appends the
inter-layer reference pictures after the temporal pictures and the
inter-view reference pictures in the reference picture list.
Examples of various reference picture lists are described in PCT
Application No. PCT/US2010/001291, which has been identified above.
In this regard, see also commonly owned U.S. Patent Application
Serial No. 2010/0118933 for Pandit et al. In the present invention,
when encoding occlusion data, the reference picture from the video
layer is positioned at location 0 in the reference picture list. In
other words, when encoding occlusion data, the 2D data having the
same timestamp (i.e., the same video frame) is placed at location 0
in the reference picture list.
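[0084a] The reordering may be illustrated with the following sketch
(a simplified Python list stands in for the actual codec reference
picture list data structure; the function name is an assumption of
this example):

    def build_reference_list(temporal_refs, inter_view_refs, same_time_2d):
        # When encoding occlusion data, the 2D picture having the same
        # timestamp is placed at location 0 of the reference picture
        # list, ahead of the temporal and inter-view reference pictures.
        return [same_time_2d] + list(temporal_refs) + list(inter_view_refs)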
[0085] When occlusion data is encoded using this reordered
reference picture list, it is possible to obtain some coding
efficiency in dealing with the blocks in the non-occlusion area. It
should be noted that the encoding described herein can be applied
to either the occlusion video data or the occlusion depth data and
that data can be in either a sparse occlusion data format or a
filled occlusion data format. The coding efficiency is gained
because skip mode encoding can be applied during encoding of the
non-occlusion areas so that the depth or video data corresponding
to the non-occlusion area(s) is directly copied without any further
modification to the data in the non-occlusion area. This efficiency
is made possible by having the non-occlusion area information
immediately available from the occlusion video or depth data at
location 0 in the reference picture list.
[0086] Identification of the non-occlusion areas is achieved
through any of the techniques discussed above in reference to step
S604 in FIG. 6. Any well known techniques for determining and
identifying non-occlusion areas, and even occlusion areas, are
contemplated for the same use herein. When a block (i.e.,
macroblock) of video data is identified as being in, or associated
with, a non-occlusion area, the encoder selects skip mode encoding
for that block. When a block of video data is identified as being
in or associated with an occlusion area, the encoder selects a
coding mode for that block based on the rate distortion cost (i.e.,
RD cost). The RD cost of an encoding solution typically accounts both
for the distortion in the encoded macroblock and for the actual
number of bits that would be generated by that encoding solution. The computation
and use of RD cost in video encoding is believed to be a well known
process and is not described in any further detail herein.
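[0086a] For illustration only, the mode decision may be sketched as
follows (Python; the trial_encode interface returning a distortion
and a bit count, the Lagrangian weight lam, and the function names
are assumptions of this example; the cost is the usual Lagrangian
form J = D + lambda * R):

    def choose_mode(block, candidate_modes, lam):
        # Evaluate each candidate coding mode and keep the one with
        # the lowest rate-distortion cost J = D + lambda * R.
        def cost(mode):
            distortion, bits = mode.trial_encode(block)  # hypothetical
            return distortion + lam * bits
        return min(candidate_modes, key=cost)

    def encode_macroblock(block, in_non_occlusion_area,
                          candidate_modes, lam):
        if in_non_occlusion_area:
            # Skip mode: the block is copied from the reference at
            # location 0 of the reference picture list.
            return "SKIP"
        return choose_mode(block, candidate_modes, lam)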
[0087] For the decoder realized in accordance with this aspect of
the present invention, data from the video reference frame is
copied to the non-occlusion block. If a sparse occlusion data
format is desired at the decoder, the copy process in the decoder
is skipped and the decoder simply fills the block by the defined
characteristic, such as the defined color described above.
[0088] FIGS. 8 and 9 show flowcharts for an embodiment of encoding
and decoding for occlusion data based on a reordering of the
reference picture list and the use of skip mode for
encoding/decoding certain occlusion data, as discussed above.
[0089] The encoding method in FIG. 8 commences at step S801.
Control is immediately transferred to step S802.
[0090] In step S802, the reference picture list is arranged by
placing the 2D data having the same timestamp at location 0. The
term "2D data" is understood to include one or both of the 2D video
data and the depth data. Control is then transferred to step
S803.
[0091] It is to be understood that the preferred embodiment of the
present invention is realized by processing the received occlusion
data to change the occlusion data format from a filled format to a
sparse format. This has been described above with respect to FIG.
6. When the received occlusion data is represented in the sparse
format, each non-occlusion area is represented as a defined
characteristic, such as a defined color or data value. This is
accomplished by replacing the data samples in the non-occlusion area
with a defined characteristic, such as a defined color or a defined
depth level, so that the sparse occlusion data format results. The
process is similar to color keying techniques, wherein a color in one
image is used to reveal another image behind it. The change in
representation to a sparse occlusion data format is preferable to the
converse (i.e., changing a sparse format to a filled format)
because of efficiencies that arise from the standard coding
techniques.
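[0091a] A sketch of this filled-to-sparse conversion, analogous to
color keying, is given below (Python with NumPy; the function name
and the key value of 128 are assumptions of this example; the
non-occlusion mask may be obtained by any of the identification
techniques described above):

    import numpy as np

    def to_sparse_format(filled_occlusion, non_occlusion_mask,
                         key_value=128):
        # Replace every sample in a non-occlusion area with the
        # defined characteristic (a defined color or depth level), in
        # the manner of color keying; occlusion samples are untouched.
        return np.where(non_occlusion_mask, key_value, filled_occlusion)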
[0092] In step S803, encoding of the data is performed. When the
block of data being encoded is identified as being in a
non-occlusion area, encoding is performed using skip mode encoding
for that block. Otherwise, for a block of data identified as not
being in a non-occlusion area (i.e., being in an occlusion area),
the coding mode is selected on the conventional basis of rate
distortion cost (RD cost). Control is then transferred to step
S804.
[0093] In step S804, the bitstream is prepared for output
transmission. The bitstream includes the encoded occlusion data
together with the indicator or flag identifying the occlusion data
format (i.e., the indication of sparse or filled) for the originally received
occlusion data. This indicator has been described in detail above
with respect to FIG. 6, for example. Control is transferred to step
S805, where the encoding method ends.
[0094] The decoding method in FIG. 9 commences at step S901.
Control is immediately transferred to step S902.
[0095] In step S902, the reference picture list is again arranged
by placing the 2D data having the same timestamp at location 0. As
noted above, the term "2D data" is understood to include one or
both of the 2D video data and the depth data. Control is then
transferred to step S903.
[0096] In step S903, all macroblocks in the slice or picture are
decoded in the conventional video decoding manner. Control is then
transferred to step S904.
[0097] In step S904, on the basis of the indicator or flag received
with the video data, one of two possible techniques is used for
the occlusion data. When the indicator identifies the occlusion
data format as sparse for the originally received occlusion data,
the non-occlusion areas are filled with the defined characteristic,
such as the defined color or defined depth value. When the
indicator identifies the occlusion data format as filled for the
originally received occlusion data, the non-occlusion areas are
filled with the data samples from the corresponding portions of the
2D video. Control is then transferred to step S905, where the
decoding method ends for this data.
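[0097a] The two output paths of step S904 may be summarized in the
following sketch (Python with NumPy; the string labels "sparse" and
"filled" for the two flag values, the function name, and the key
value are assumptions of this example):

    import numpy as np

    def finalize_occlusion(decoded, collocated_2d, non_occlusion_mask,
                           original_format, key_value=128):
        if original_format == "sparse":
            # Fill non-occlusion areas with the defined characteristic.
            return np.where(non_occlusion_mask, key_value, decoded)
        # Otherwise ("filled"): fill non-occlusion areas with the data
        # samples from the corresponding portions of the 2D video.
        return np.where(non_occlusion_mask, collocated_2d, decoded)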
[0098] It is recognized herein that, for the revised reference
picture list construction described in the embodiment above, the
reference picture index is not necessarily optimized for coding the
occlusion blocks. This issue regarding optimization arises because
blocks in occlusion areas are likely to use a temporal reference
picture for best matching instead of an inter-layer reference
picture. On the other hand, it is not necessarily good for the blocks
in non-occlusion areas to put the inter-layer reference picture at
the end of the reference picture list, as is shown in PCT Application
No. PCT/US2010/001291, identified above. Thus, the rearrangement of the
reference picture list may not alone provide a completely suitable
and effective solution for encoding/decoding blocks, both blocks
associated with occlusion areas and blocks associated with
non-occlusion areas.
[0099] Another embodiment of an encoder and decoder method for
occlusion data involves the use of depth and the detection of depth
boundaries. This embodiment is depicted in FIGS. 10 and 11. FIGS.
10 and 11 show flowcharts for encoding and decoding of occlusion
data involving the use of depth skip mode encoding for certain
occlusion data. As explained above, the techniques herein are
applicable to both occlusion video data and occlusion depth data,
interchangeably.
[0100] In order to favor the coding of both the occlusion area
blocks and the non-occlusion area blocks, for this embodiment of
the present invention, the reference picture list is arranged by
appending inter-layer reference pictures at the end of the
reference picture list. Examples of such a reference picture list
are described in PCT Application No. PCT/US2010/001291.
[0101] During the encoding process, boundary detection is performed
on the reconstructed depth samples to determine the proximity of
the current macroblock to a detected depth boundary, usually
measured in pixels. The reconstructed depth samples are usually
available at the output of deblocking filter 528 in the encoder of
FIG. 3. The reconstructed depth samples are used in the encoder
because the encoder and decoder must use substantially the same
information for boundary detection, and because the reconstructed
depth samples (map) are the only samples available in the decoder.
The decoder does not have the original depth data present in the
encoder. So it would not be proper for the encoder to utilize the
original depth samples for boundary detection, if one maintains the
constraint that the encoder and decoder must use substantially the
same depth information.
[0102] If it is determined that a macroblock is within l pixels of
a detected depth boundary, then this macroblock is marked as an
occlusion area macroblock, and the encoding mode is selected using
rate distortion (RD) cost as explained above. On the other hand, if
it is determined that a macroblock is not within l pixels of a
detected depth boundary, then the inter-layer skip encoding mode
will be used to encode this macroblock.
[0103] In decoding, the blocks encoded via skip mode encoding
utilize the depth data in the following way. Distance between the
macroblock and the depth boundary is determined. For any macroblock
that was skipped in the encoding process, when the distance from the
macroblock to the nearest detected depth boundary is at or within
(i.e., less than or equal to) the threshold of l pixels, that
macroblock is identified as a temporally skipped block. Otherwise, when the
distance from the skipped macroblock to the nearest detected depth
boundary is greater than (i.e., beyond) the threshold of l pixels,
that macroblock is identified as non-occlusion area macroblock, and
it is further deemed to be an inter-layer skipped macroblock.
[0104] Detection of the depth boundary is important to the
operation of the codec embodiment. It is noted that the depth
boundary should preferably be detected in the decoder using the same
algorithm as was used in the encoder. Because the reconstructed depth
samples are identical at the encoder and at the decoder, this ensures
that the detected boundaries are identical as well. Depth boundary detection may be
accomplished by any number of well known techniques. These well
known techniques will not be described further herein.
[0105] The encoding method in FIG. 10 commences at step S1001.
Control is immediately transferred to step S1002. At step S1002,
the reference picture list is arranged by placing the 2D data
having the same timestamp after both the temporal and inter-view
reference pictures in the reference picture list. Control is then
transferred to step S1003.
[0106] In step S1003, one or more depth boundaries are detected
from a reconstructed depth map. The distance from each macroblock
to the closest depth boundary is measured. When the distance from a
macroblock to its closest depth boundary is less than or equal to l
pixels, the macroblock is marked as an occlusion area macroblock.
Otherwise, the macroblock is a non-occlusion area macroblock. Since
the mark or flag identifies the macroblock as being an occlusion
area macroblock, the absence of the mark or flag automatically
identifies the associated macroblock as being a non-occlusion area
macroblock. It should be noted that a two-state flag will suffice
to identify each macroblock properly as either a non-occlusion area
macroblock (e.g., flag=0) or an occlusion area macroblock (e.g.,
flag=1). Control is then transferred to step S1004.
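[0106a] By way of illustration only, the boundary detection and
macroblock classification of step S1003 may be sketched as follows
(Python; the gradient-magnitude boundary detector, the SciPy distance
transform, the macroblock size of 16, and the parameter defaults are
assumptions of this example; any well known boundary detector could
be substituted):

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def classify_macroblocks(recon_depth, l=4, mb=16, grad_thresh=10.0):
        # Detect depth boundaries in the reconstructed depth map as
        # locations of a large local depth gradient.
        gy, gx = np.gradient(recon_depth.astype(np.float64))
        boundary = np.hypot(gx, gy) > grad_thresh
        h, w = recon_depth.shape
        flags = np.zeros((h // mb, w // mb), dtype=np.uint8)
        if not boundary.any():
            # No boundary: every macroblock is a non-occlusion area
            # macroblock (flag=0).
            return flags
        # Distance (in pixels) from each sample to the nearest boundary.
        dist = distance_transform_edt(~boundary)
        for i in range(h // mb):
            for j in range(w // mb):
                block = dist[i * mb:(i + 1) * mb, j * mb:(j + 1) * mb]
                # flag=1: occlusion area macroblock (within l pixels
                # of a boundary); flag=0: non-occlusion area macroblock.
                flags[i, j] = 1 if block.min() <= l else 0
        return flags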
[0107] In step S1004, the flag or mark for the macroblock is read.
When the mark indicates that the macroblock is a non-occlusion area
macroblock, the conventional skip mode encoding is used to encode
the macroblock. When the mark indicates that the macroblock is an
occlusion area macroblock, an encoding mode is selected and used
based on conventional rate distortion cost (RD cost). Control is
then transferred to step S1005.
[0108] In step S1005, the bitstream is prepared for output
transmission. The bitstream includes the encoded occlusion data
together with the indicator or flag identifying the occlusion data format (i.e.,
the indication of sparse or filled) for the originally received
occlusion data. This indicator has been described in detail above
with respect to FIG. 6, for example. Control is transferred to step
S1006, where the encoding method ends.
[0109] The decoding method in FIG. 11 commences at step S1101.
Control is immediately transferred to step S1102.
[0110] At step S1102, the reference picture list is arranged by
placing the 2D data having the same timestamp after both the
temporal and inter-view reference pictures in the reference picture
list. Control is then transferred to step S1103.
[0111] In step S1103, just as in step S1003 for the encoding
method, one or more depth boundaries are detected from a
reconstructed depth map. The distance from each macroblock to the
closest depth boundary is measured. When the distance from a
macroblock to its closest depth boundary is less than or equal to l
pixels, the macroblock is marked as an occlusion area macroblock.
Otherwise, the macroblock is a non-occlusion area macroblock. Since
the mark or flag identifies the macroblock as being an occlusion
area macroblock, the absence of the mark or flag automatically
identifies the associated macroblock as being a non-occlusion area
macroblock. As described above with respect to FIG. 10, a two-state
flag will suffice to identify each macroblock properly as either a
non-occlusion area macroblock (e.g., flag=0) or an occlusion area
macroblock (e.g., flag=1). Control is then transferred to step
S1104.
[0112] Macroblock decoding is then performed in step S1104.
Decoding is performed initially on the basis of the indicators or
flags received with the video data: one flag or mark indicating the
macroblock as being a non-occlusion/occlusion area macroblock and
the other indicator or flag identifying the occlusion data format
as sparse or filled. First, all macroblocks in the slice or picture
are decoded in the conventional video decoding manner, similar to
the step S903 shown in FIG. 9.
[0113] When a skipped macroblock is identified by one flag that
indicates a non-occlusion area macroblock and the other indicator
that identifies the occlusion data format as sparse for the
originally received occlusion data, the non-occlusion areas are
filled with the defined characteristic, such as the defined color
or defined depth value. When a skipped macroblock is identified by
one flag that indicates a non-occlusion area macroblock and the
other indicator that identifies the occlusion data format as filled
for the originally received occlusion data, the non-occlusion areas
are filled with the data samples from the corresponding portions of
the 2D video. For all other macroblocks, conventional decoding is
used, as noted above. Control is then transferred to step S1105,
where the decoding method ends for this data.
[0114] FIGS. 12 and 13 show flowcharts for another embodiment of
encoding and decoding of occlusion data, realized in accordance with
the principles of the present invention, involving the use of an
update mechanism for the occlusion data.
[0115] In this embodiment, it is expected that the occlusion frames
are substantially identical or constant from one frame to the next
over a defined period of time (or frames). On the encoder side, the
occlusion data may be obtained by using one representative
occlusion data frame. Alternatively, a number of consecutive
occlusion data frames from a video scene may be merged in a
combinatorial manner to realize the representative occlusion data
frame. For both encoding and decoding, the representative occlusion
data frame is then valid for a defined number of frames (i.e.,
period of time) until it is replaced by a new representative
occlusion data frame. This method can be applied on either the
occlusion video data or occlusion depth data.
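[0115a] One possible way to merge consecutive occlusion data frames
into a representative frame is sketched below (Python with NumPy; the
per-sample first-non-key merging rule, the function name, and the key
value are assumptions of this example, not the only contemplated
combination):

    import numpy as np

    def merge_representative(frames, key_value=128):
        # Merge n consecutive sparse occlusion frames into a single
        # representative frame: at each sample position, keep the
        # first sample that carries occlusion data (i.e., that
        # differs from the defined characteristic).
        rep = np.full_like(frames[0], key_value)
        filled = np.zeros(frames[0].shape, dtype=bool)
        for frame in frames:
            take = ~filled & (frame != key_value)
            rep[take] = frame[take]
            filled |= take
        return rep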
[0116] In order to realize this technique, it is necessary to
determine the number of frames n over which the representative
occlusion data frame is valid until the next update. Additionally,
it is necessary to include that number of frames n over which the
representative occlusion data frame is valid until the next update
in a syntax transmitted via a message from the encoder to the
decoder so that the decoder can operate properly. While the frames
over which the representative occlusion data frame is valid are
generally intended to be consecutive, it is contemplated that the
frames may even be non-consecutive under certain circumstances. For
example, when two scenes are switched frequently, the occlusion
data for one scene can be used for the frames related to that scene
in the alternating scene sequence. Since those frames are
alternated with frames from a second scene, the number n for the
period actually covers non-consecutive frames.
[0117] FIG. 12 shows a flowchart for realizing encoding occlusion
data employing the use of an update mechanism. The method starts at
step S1201, for which control is passed to step S1202.
[0118] In step S1202, the time period n is determined. This time
period is generally expressed as an integer number of frames. It
represents the period over which a single representative occlusion
data frame (video or depth) is valid. Control is passed to step
S1203.
[0119] In step S1203, the representative occlusion data frame is
encoded. No encoding or transmission is performed on the next n-1
consecutive occlusion data frames. They are effectively skipped.
The representative occlusion data frame may be one occlusion data
frame selected from the n consecutive occlusion data frames in the
time period n over which it is valid. As noted above, the
representative occlusion data frame may alternatively be a
combination of the characteristics of two or more occlusion data
frames selected from those n consecutive frames. Control is passed to
step S1204.
[0120] In step S1204, the encoded representative occlusion data
frame is transmitted along with a syntax message indicating the
period, n. Control is passed to step S1205.
[0121] In decision step S1205, it is determined whether the period
n has expired so that a new representative occlusion data frame can
be encoded to update and replace the current representative
occlusion data frame. If the time period has expired and there is
another representative occlusion data frame ready for encoding,
then control is passed back to step S1202. If there are no more
occlusion data frames ready for encoding, then control is passed to
step S1206 where the process ends.
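[0121a] The encoder side of the update mechanism may be sketched as
follows (Python; the encoder and stream objects, their encode, write,
and write_syntax_message interfaces, and the fixed period n are
hypothetical assumptions of this example, not an actual codec API):

    def encode_with_updates(occlusion_frames, n, encoder, stream):
        # One representative frame is encoded per period of n frames;
        # the remaining n-1 occlusion frames of each period are
        # skipped (neither encoded nor transmitted).
        for i in range(0, len(occlusion_frames), n):
            representative = occlusion_frames[i]  # or a merged frame
            stream.write_syntax_message({"period": n})  # hypothetical
            stream.write(encoder.encode(representative))  # hypothetical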
[0122] In this embodiment, a decoded occlusion frame will remain
valid for its associated frame and all n-1 subsequent consecutive
frames in decoding order until another representative occlusion
frame is decoded to update and replace the prior representative
occlusion frame.
[0123] The decoding method in FIG. 13 starts at step S1301, where control is
passed to step S1302. In step S1302, the syntax message is decoded
to determine the period n. Control is passed to step S1303.
[0124] In step S1303, the representative occlusion data frame is
decoded. That representative occlusion data frame is then
maintained as valid for period n, that is, for the next n-1
consecutive frames. Control is passed to step S1304.
[0125] In decision step S1304, it is determined whether the period
n has expired so that a new representative occlusion data frame can
be decoded to update and replace the current representative
occlusion data frame. If the time period n has expired and there is
another representative occlusion data frame ready for decoding,
then control is passed back to step S1302. If there are no more
occlusion data frames ready for decoding, then control is passed to
step S1305 where the process ends.
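[0125a] The decoder side may be sketched correspondingly (Python; the
stream and decoder objects and their read, read_syntax_message, and
decode interfaces are hypothetical assumptions of this example):

    def decode_with_updates(stream, decoder, total_frames):
        # Each decoded representative frame remains valid for its own
        # frame and the n-1 subsequent frames in decoding order, until
        # the next representative frame replaces it.
        frames = []
        while len(frames) < total_frames:
            n = stream.read_syntax_message()["period"]   # hypothetical
            representative = decoder.decode(stream.read())  # hypothetical
            frames.extend([representative] * n)
        return frames[:total_frames]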
[0126] The methods described herein are contemplated for use in
computer processor based implementations, or on computer readable
storage media, or in other apparatus such as the coding/decoding
apparatus depicted in FIGS. 1-4 herein.
[0127] The above descriptions and illustrations of the coding and
decoding occlusion data are exemplary of the various embodiments of
the present invention. Certain modifications and variations such as
the use of different types of occlusion data, different orders of
performing certain encoding or decoding steps, or even omitting one
or more steps in a method, may also be used to practice the present
invention.
[0128] All examples and conditional language recited herein are
intended for pedagogical purposes to aid the reader in
understanding the present principles and the concepts contributed
by the inventor to furthering the art, and are to be construed as
being without limitation to such specifically recited examples and
conditions.
[0129] Moreover, all statements herein reciting principles,
aspects, and embodiments of the present invention, as well as
specific examples thereof, are intended to encompass both
structural and functional equivalents thereof. Additionally, it is
intended that such equivalents include both currently known
equivalents as well as equivalents developed in the future,
including any elements developed at any time that perform the same
function, regardless of structure.
[0130] A number of implementations have been described herein.
Nevertheless, it will be understood that various modifications may
be made. For example, elements of different implementations may be
combined, supplemented, modified, or removed to produce other
implementations. Additionally, one of ordinary skill will
understand that other structures and processes may be substituted
for those disclosed and the resulting implementations will perform
at least substantially the same function(s), in at least
substantially the same way(s), to achieve at least substantially
the same result(s) as the implementations disclosed. In particular,
although illustrative embodiments have been described herein with
reference to the accompanying drawings, it is to be understood that
the present principles are not limited to those precise embodiments,
and that various changes and modifications may be effected therein
by one of ordinary skill in the pertinent art without departing
from the scope or spirit of the present principles. Accordingly,
these and other implementations are contemplated by this
application and are within the scope of the following claims.
* * * * *