U.S. patent application number 12/951903 was filed with the patent office on 2010-11-22 and published on 2011-03-17 for encoding a transparency (alpha) channel in a video bitstream.
This patent application is currently assigned to Apple Inc. Invention is credited to Barin Geoffry Haskell and David William Singer.
Application Number | 12/951903
Publication Number | 20110064142
Family ID | 43303129
Publication Date | 2011-03-17
United States Patent Application | 20110064142
Kind Code | A1
Haskell; Barin Geoffry; et al. | March 17, 2011
Encoding a Transparency (ALPHA) Channel in a Video Bitstream
Abstract
Disclosed herein is a technique for delimiting the alpha channel
at the NAL layer in codecs like H.264 to facilitate the optional
nature of the alpha channel. In coded video sequences that include
alpha, there is one alpha picture for every primary coded (e.g.,
luma-chroma) picture, and the coded alpha picture is contained in
the same access unit as its corresponding primary coded picture.
The alpha coded slice NAL units of each access unit are sent after
the NAL units of the primary coded picture and redundant coded
pictures, if any. The presence or absence of the alpha NAL units
does not affect the decoding of the remaining NAL units in any
way.
Inventors: | Haskell; Barin Geoffry (Mountain View, CA); Singer; David William (San Francisco, CA)
Assignee: | Apple Inc., Cupertino, CA
Family ID: | 43303129
Appl. No.: | 12/951903
Filed: | November 22, 2010
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
11095013 | Mar 31, 2005 | 7852353
12951903 | |
Current U.S. Class: | 375/240.25; 375/240.01; 375/E7.027
Current CPC Class: | H04N 19/21 20141101; G09G 5/377 20130101; G09G 2340/02 20130101; H04N 19/188 20141101; H04N 19/44 20141101; H04N 19/103 20141101; G09G 2340/125 20130101
Class at Publication: | 375/240.25; 375/240.01; 375/E07.027
International Class: | H04N 7/12 20060101 H04N007/12
Claims
1. A method to decode an image data stream, the method comprising:
receiving, by a decoder, an image data stream having one or more
syntax structures corresponding to an encoded foreground image and
one or more syntax structures corresponding to an encoded
transparency mask, wherein the one or more syntax structures
corresponding to the encoded transparency mask are delimited from
and located after the one or more syntax structures corresponding
to the foreground image in the image data stream; decoding, by the
decoder, the one or more syntax structures corresponding to the
foreground image to produce a decoded foreground image; decoding,
by the decoder, the one or more syntax structures corresponding to
the transparency mask to produce a decoded transparency mask; and
developing a display image as a function of the decoded foreground
image and the decoded transparency mask.
2. The method of claim 1, wherein the act of receiving comprises
receiving, by the decoder, an image data stream encoded in
accordance with ITU Recommendation H.264.
3. The method of claim 2, wherein the image data stream comprises a
NAL unit format.
4. The method of claim 2, wherein the image data stream comprises a
byte stream format.
5. The method of claim 3, further comprising receiving, by the
decoder, one or more syntax structures corresponding to an access
unit delimiter NAL, wherein the one or more syntax structures
corresponding to the access unit delimiter NAL is located before
the one or more syntax structures corresponding to the encoded
foreground image in the image data stream.
6. The method of claim 1, wherein the act of developing further
comprises developing a display image as a function of the decoded
foreground image, the decoded transparency mask and a background
image.
7. The method of claim 6, wherein the background image is received,
by the decoder, separately from the encoded foreground image and
the encoded transparency mask.
8. The method of claim 6, wherein the background image is created
by the decoder and is not received as part of the image data
stream.
9. The method of claim 1, wherein there is one and only one
transparency mask for each foreground image.
10. A method to decode an image data stream, the method comprising:
extracting, by a decoder, a syntax structure corresponding to an
encoded foreground image from an image data stream; extracting, by
the decoder, a syntax structure corresponding to an encoded
transparency mask from the image data stream, wherein the syntax
structure corresponding to the transparency mask is delimited from
and located after the syntax structure corresponding to the
foreground image in the image data stream; decoding, by the
decoder, the syntax structure corresponding to the encoded
foreground image to generate a decoded foreground image; decoding,
by the decoder, the syntax structure corresponding to the encoded
transparency mask to generate a decoded transparency mask; and
developing, by the decoder, a display image as a function of the
decoded foreground image and the decoded transparency mask.
11. A method to encode images, the method comprising: encoding, by
an encoder, a foreground image into one or more syntax structures;
encoding, by the encoder, a transparency mask into one or more
syntax structures; combining, by the encoder, the one or more
syntax structures corresponding to the encoded foreground image and
the one or more syntax structures corresponding to the encoded
transparency mask into an image data stream, wherein the one or
more syntax structures corresponding to the encoded transparency
mask are delimited from and located after the one or more syntax
structures corresponding to the encoded foreground image in the
image data stream.
12. The method of claim 11, wherein the image data stream is
encoded, by the encoder, in accordance with ITU Recommendation
H.264.
13. The method of claim 12, wherein the image data stream is
formatted, by the encoder, in a NAL unit stream format.
14. The method of claim 12, wherein the image data stream is
formatted, by the encoder, in a byte stream format.
15. The method of claim 13, further comprising: generating, by the
encoder, a SEI NAL unit; and placing, by the encoder, the SEI NAL
unit into the image data stream before the one or more syntax
structures corresponding to the encoded foreground image.
16. The method of claim 11, further comprising: encoding, by the
encoder, a background image into one or more syntax structures; and
placing, by the encoder, the one or more syntax structures
corresponding to the encoded background image into the image data
stream after the one or more syntax structures corresponding to the
encoded foreground image and before the one or more syntax
structures corresponding to the transparency mask.
17. The method of claim 11, wherein the encoder encodes one and
only one transparency mask for each foreground image.
18. A method to decode an image comprising: receiving, by a
decoder, an image data stream encoded in accordance with ITU
Recommendation H.264, the image data stream including a primary
encoded picture, a delimiter following the primary encoded picture
and an encoded transparency picture following the delimiter;
decoding, by the decoder, the encoded primary picture to produce a
decoded primary picture; decoding, by the decoder, the encoded
transparency picture to produce a decoded transparency picture; and
developing, by the decoder, a display image as a function of the
decoded primary picture and the decoded transparency picture.
19. The method of claim 18, wherein the act of receiving further
comprises receiving, by the decoder, an encoded redundant picture
following the encoded primary picture and before the encoded
transparency picture.
20. The method of claim 19, further comprising decoding, by the
decoder, the redundant picture.
21. The method of claim 20, wherein the act of developing further
comprises developing, by the decoder, the display image as a
function of the redundant picture.
22. The method of claim 18, wherein the act of developing
comprises: generating, by the decoder, a background picture; and
developing, by the decoder, the display image as a function of the
decoded primary picture and the background picture and the decoded
transparency picture.
23. The method of claim 18, wherein the act of receiving further
comprises receiving, by the decoder, a unit delimiter NAL before
the encoded primary picture.
Description
[0001] This application is a continuation of, and claims priority
to, U.S. patent application Ser. No. 11/095,013, entitled "Encoding
a Transparency (Alpha) Channel in a Video Bitstream," filed Mar.
31, 2005 and which is hereby incorporated by reference.
BACKGROUND
[0002] Within the last several years, digitization of video images
has become increasingly important. In addition to their use in
global communication (e.g., videoconferencing), digital video
recording (DVDs, SVCDs, PVRs, etc.) has also become increasingly
popular. In each of these applications, video (and accompanying
audio) information is transmitted across communication links or
stored in electronic form.
[0003] Efficient transmission, reception, and storage of video data
typically requires encoding and compression of video (and
accompanying audio) data. Video compression coding is a method of
encoding digital video data such that less memory is required to
store the video data and less transmission bandwidth is required to
transmit the video data. Various compression/decompression (CODEC)
schemes are frequently used to compress video frames to reduce
required transmission bit rates.
[0004] Several approaches and standards to encoding and compressing
source video signals exist. Historically, video compression
standards have been designed for a particular application, such as
ITU-T standards H.261 and H.263, which are used extensively in
video conferencing applications, and the various standards
promulgated by the Moving Picture Experts' Group (e.g., MPEG-1 and
MPEG-2), which are typically used in consumer electronics
applications. With the proliferation of various devices requiring
some form of video compression, harmonization between these two
groups of standards has been sought. To some extent, such
standardization has been achieved by the ITU-T H.264 standard,
which shares various common elements with the Moving Picture
Experts' Group MPEG-4 standard, colloquially known as Advanced
Video Coding or AVC. Each of these standards is incorporated by
reference in its entirety.
[0005] In some cases, it is desirable to construct an image (or
sequence of images) as a composite or an overlay combination of two
different images. One example would be the weatherman on the
nightly news standing in front of the computer-generated weather
map. In this example, the video of the weatherman is recorded in
front of a solid color background, e.g., a blue or green screen. In
the resulting digital video images, the blue or green pixels
(corresponding to the background) are set to have an alpha value
corresponding to complete transparency, while the remaining pixels
(which make up the image of the weatherman himself) have an alpha
value corresponding to complete opacity. This image is then
overlaid onto the computer-generated weather map images. As a
result, the pixels having a fully transparent alpha value (the
background) allow the underlying weather map to show, while the
pixels having a fully opaque alpha value (the weatherman) prevent
the background from showing, and instead show the image of the
weatherman. The result is the familiar image of a weatherman who
appears to be standing in front of a full screen weather map
image.
[0006] There are a variety of other applications of transparency in
image and video processing. A common element of all such
applications, known to those skilled in the art, is that each pixel
has a transparency, or "alpha" value associated with it. The alpha
values are preferably arranged to create a "mask" image, which is
preferably included in a separate channel. As originally drafted,
the AVC standard did not include the capabilities for an alpha
channel. However, to accommodate a wider variety of applications,
it was agreed to add an alpha channel to the standard. However,
accommodating the additional information required extensions to the
standard, and it was desired to do so in a way that provided the
most convenient and efficient processing of video having an alpha
channel while simultaneously not unduly complicating the standard
in the case of video that does not include an alpha channel.
[0007] Therefore, what is needed in the art is an extension to
various video coding standards that allows for the convenient and
efficient transmission of an alpha channel, while still preserving
the optional nature of such a channel and not creating undue
overhead in the video codec.
SUMMARY
[0008] Disclosed herein is a technique for delimiting the alpha
channel at the NAL layer in codecs like H.264 to facilitate the
optional nature of the alpha channel. In coded video sequences that
include alpha, there is one alpha picture for every primary coded
(e.g., luma-chroma) picture, and the coded alpha picture is
contained in the same access unit as its corresponding primary
coded picture. The alpha coded slice NAL units of each access unit
are sent after the NAL units of the primary coded picture and
redundant coded pictures, if any. The presence or absence of the
alpha NAL units does not affect the decoding of the remaining NAL
units in any way.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] FIG. 1 illustrates the sequence of syntax structures for one
coded image sequence according to the teachings of the present
disclosure.
DETAILED DESCRIPTION
[0010] A technique for incorporating an alpha channel in a video
codec is described herein. The following embodiments of the
invention, described in terms of the ITU-T H.264/AVC standard, are
illustrative only and should not be considered limiting in any
respect.
[0011] As noted above, a decision has been made by the relevant
standards making bodies to add an optional alpha channel to the
AVC/H.264 standards. Particularly, this optional channel is part of
the AVC/H.264 Fidelity and Range Extensions ("FRExt"). The alpha
image or channel (i.e., transparency mask) is encoded using the
standard luma coding algorithm, which is generally known to those
skilled in the art. Disclosed herein is a technique for
incorporating the encoded alpha channel within the video bitstream.
In short, the coded alpha channel is transmitted with the video to
keep the two together for ease of management, transport,
processing, etc.
[0012] An AVC bitstream containing encoded video can be in one of
two formats: the NAL (Network Abstraction Layer) unit stream
format, or the byte stream format. The NAL unit stream format
consists of a sequence of syntax structures called NAL units, each
of which contains an indication of the type of data to follow and
bytes containing that data. The sequence of NAL units is ordered in
processing order for the decoder. Various constraints are imposed
on the decoding order and on the contents of the NAL units by the
standard.
[0013] The byte stream format can be constructed from the NAL unit
stream format by ordering the NAL units in decoding order and
prefixing each NAL unit with a start code prefix and, optionally,
one or more zero-valued bytes to form a stream of bytes.
Conversely, the NAL unit stream format can be extracted from the
byte stream format by searching for the location of the unique
start code prefix pattern within the stream of bytes. In either
case, the start code prefix is a unique sequence of three bytes
having the value 0x000001. As noted, the location of a start code
prefix can be used by a decoder to identify the beginning of a new
NAL unit and, therefore, the end of a previous NAL unit. To ensure
that a coincidental string of bytes within the NAL unit having a
value equal to 0x000001 is not misinterpreted as a start code,
emulation prevention bytes, which are bytes having a value 0x03,
may be included within the NAL unit. The byte stream format is
specified in Annex B of H.264.
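By way of illustration only, the extraction of NAL units from a byte stream described above might be sketched as follows. The function names split_byte_stream and strip_emulation_prevention are hypothetical and do not come from the standard; the sketch assumes the whole stream is held in memory and that zero-valued bytes immediately preceding a start code are padding rather than payload.

```python
START_CODE = b"\x00\x00\x01"  # three-byte start code prefix


def split_byte_stream(stream):
    """Return the NAL units found between start code prefixes (hypothetical helper)."""
    starts = []
    pos = stream.find(START_CODE)
    while pos >= 0:
        starts.append(pos)
        pos = stream.find(START_CODE, pos + 3)

    units = []
    for n, start in enumerate(starts):
        begin = start + 3
        end = starts[n + 1] if n + 1 < len(starts) else len(stream)
        # Assumption: zero bytes before the next start code are padding.
        units.append(stream[begin:end].rstrip(b"\x00"))
    return units


def strip_emulation_prevention(nal):
    """Remove each 0x03 byte that follows two consecutive zero bytes."""
    out = bytearray()
    zeros = 0
    for byte in nal:
        if zeros >= 2 and byte == 0x03:
            zeros = 0
            continue  # drop the emulation prevention byte
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)
```

A decoder would typically apply strip_emulation_prevention to each extracted NAL unit before parsing its payload.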
[0014] The video source represented by the bitstream is a sequence
of frames (for non-interlaced video), fields (for interlaced
video), or both. Collectively, these images are referred to as
pictures. Each picture is comprised of one or more sample arrays.
The sample arrays may be either monochrome (luma only) or color
(luma and chroma, RGB, or other color sampling). The picture may
also include an optional auxiliary array, which is a monochrome
array used for various features, such as alpha blending, as in the
context of the present disclosure. Details of the coding scheme for
the various array types are generally known to those skilled in the
art and are specified in the various video coding standards, and
are therefore not reproduced here in detail.
[0015] In coded video sequences that include alpha (transparency),
there is one alpha picture for every primary coded (e.g.,
luma-chroma) picture. Preferably, the coded alpha picture is
contained in the same access unit as its corresponding primary
coded picture. Particularly, the alpha channel is delimited at the
NAL layer to facilitate its optional nature. The alpha coded slice
NAL units of each access unit are preferably sent after the NAL
units of the primary coded picture and redundant coded pictures, if
any. (A slice is a group of one or more macroblocks that may be
some fraction of a picture or an entire picture.) The presence or
absence of the alpha NAL units does not affect the decoding of the
remaining NAL units in any way.
[0016] When an alpha coded picture is present, it follows the
primary coded picture, i.e., foreground image and all redundant
coded pictures (if present) of the same access unit. If any access
unit of the sequence contains an alpha coded picture, then each
access unit of the sequence shall contain exactly one alpha coded
picture. NAL units of an alpha coded picture shall be considered
non-VCL NAL units. When an end of sequence NAL unit is present, it
shall follow the primary coded picture and all redundant coded
pictures (if any) and the alpha coded picture (if any) of the same
access unit.
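The all-or-none rule stated above can be illustrated with a short sketch. It assumes each access unit has already been reduced to a list of nal_unit_type values and that alpha slices carry nal_unit_type 19; the function name alpha_presence_consistent is hypothetical. The sketch checks only whether alpha NAL units are present, not that they form exactly one complete picture.

```python
ALPHA_NAL_UNIT_TYPE = 19  # auxiliary (alpha) coded slice in H.264


def alpha_presence_consistent(access_units):
    """True if every access unit carries alpha NAL units, or none does."""
    has_alpha = [
        any(t == ALPHA_NAL_UNIT_TYPE for t in unit_types)
        for unit_types in access_units
    ]
    return all(has_alpha) or not any(has_alpha)
```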
[0017] An example sequence of NAL units is illustrated in FIG. 1.
When an access unit delimiter NAL unit 101 is present, it is the
first NAL unit. There is at most one access unit delimiter in any
access unit. If any SEI (supplemental enhancement information,
defined in Annex D of H.264) NAL units 102 are present, they
precede the primary coded picture 103. The primary coded picture
103 precedes the corresponding redundant coded picture(s) 104.
Following the redundant coded picture(s) 104 are auxiliary coded
picture(s) 105, including an alpha picture. When an end of sequence
NAL unit 106 is present, it follows the primary coded picture, all
redundant coded pictures (if any) and all coded slices of an
auxiliary coded picture (e.g., alpha picture), if any, without
partitioning NAL units. Finally, an end of stream NAL unit 107
follows the end of sequence NAL unit 106.
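The ordering of FIG. 1 might be checked with a sketch such as the following, offered under simplifying assumptions. The nal_unit_type values are those assigned by H.264 (9 access unit delimiter, 6 SEI, 1 and 5 coded slices, 19 auxiliary coded slice, 10 end of sequence, 11 end of stream). Redundant coded pictures use the same slice types as the primary coded picture and cannot be told apart without parsing slice headers, so this sketch groups them together. The function name access_unit_order_ok is hypothetical.

```python
AUD, SEI, SLICE, IDR_SLICE = 9, 6, 1, 5
AUX_SLICE, END_SEQ, END_STREAM = 19, 10, 11

# Position of each NAL unit type in the order shown in FIG. 1.
RANK = {
    AUD: 0,
    SEI: 1,
    SLICE: 2,
    IDR_SLICE: 2,
    AUX_SLICE: 3,
    END_SEQ: 4,
    END_STREAM: 5,
}


def access_unit_order_ok(nal_types):
    """Check one access unit, given as a list of nal_unit_type values."""
    if nal_types.count(AUD) > 1:
        return False  # at most one access unit delimiter
    # Types outside the table (e.g., parameter sets) are ignored here.
    ranks = [RANK[t] for t in nal_types if t in RANK]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))
```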
[0018] Some restrictions are placed on how the alpha pictures are
encoded. For example, if the primary coded picture is an
instantaneous decoding refresh (IDR) picture, then the alpha
picture is inferred to be an alpha IDR picture with the same
identifier ("idr_pic_id" in H.264). An IDR picture is a picture
that is intra coded, i.e. coded only with reference to itself and
for which all subsequent pictures can be decoded without reference
to a picture prior to the IDR picture.
[0019] An additional constraint on alpha coded pictures is that an
alpha picture shall contain all macroblocks of a complete coded
picture. Thus, there is either one complete alpha picture or no
alpha picture for each primary coded picture, or, as an alternative
view, there must be no alpha picture or a complete alpha picture
for each primary coded picture. Furthermore, decoded slices within
the same alpha picture shall cover the entire picture area and
shall not overlap.
[0020] Still another constraint on alpha coded pictures is that the
"redundant_pic_id" variable is equal to 0 in an alpha slice. This
indicates that redundant alpha pictures are never sent.
[0021] Another constraint on the alpha picture is that the chroma
(i.e., color) format of monochrome is inferred for an alpha slice.
As is readily understood, the alpha picture specifies a level of
transparency, and thus there is no color-dependent information in
the alpha picture.
[0022] Yet another constraint on alpha coded pictures is that the
bit depth of the luma component of an alpha slice is specified by
the sequence parameter set extension rather than the sequence
parameter set. This enables the bit depth of the alpha picture to
differ from the bit depth of the luma component of the primary
coded picture.
[0023] In all other ways, the coding of alpha pictures follows the
same constraints specified for redundant coded pictures.
Furthermore, the decoding of alpha pictures is optional. NAL units
having the nal_unit_type corresponding to alpha pictures (i.e., a
nal_unit_type value of 19 for H.264) may be discarded without
affecting the decoding of remaining NAL units.
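As a minimal sketch of this optional behavior, a decoder that does not support alpha could drop the alpha NAL units before decoding. The sketch assumes each NAL unit is a bytes object whose first byte is the NAL unit header, in which the low five bits carry nal_unit_type; the function names are hypothetical.

```python
ALPHA_NAL_UNIT_TYPE = 19


def nal_unit_type(nal):
    """Extract nal_unit_type from the low five bits of the header byte."""
    return nal[0] & 0x1F


def drop_alpha(nal_units):
    """Return the NAL units with all alpha coded slices removed."""
    return [n for n in nal_units if nal_unit_type(n) != ALPHA_NAL_UNIT_TYPE]
```

The remaining NAL units are passed to the decoder unchanged.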
[0024] Once the encoded pictures are received at the decoder, alpha
composition is normally performed with a background picture B
(which may already be present at the decoder or may be separately
transmitted), a foreground picture F, and an alpha picture A, all
of the same size. For purposes of the following discussion, it is
assumed that the background and foreground pictures are in
luma-chroma format and that the chroma arrays of B and F have been
upsampled to the same resolution as luma. For purposes of the
foregoing, individual samples of B, F, and A are denoted by b, f
and a, respectively. Luma and chroma samples are denoted by
subscripts Y (luma) and Cb, Cr (chroma).
[0025] To reconstruct the image, the variables BitDepthAlpha and
MaxAlpha are defined as follows:
BitDepthAlpha=bit_depth_alpha_minus8+8
MaxAlpha=(1<<BitDepthAlpha)-1
Thus, as is known to those skilled in the art, samples d of the
displayed picture D may be calculated as:
d_Y = (a*f_Y + (MaxAlpha - a)*b_Y + MaxAlpha/2) / MaxAlpha
d_Cb = (a*f_Cb + (MaxAlpha - a)*b_Cb + MaxAlpha/2) / MaxAlpha
d_Cr = (a*f_Cr + (MaxAlpha - a)*b_Cr + MaxAlpha/2) / MaxAlpha
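The composition above may be sketched per sample as follows. The sketch assumes that "/" denotes integer division, so that the MaxAlpha/2 term acts as a rounding offset; the function name compose_sample is hypothetical. The same function applies to each of the Y, Cb and Cr components.

```python
def compose_sample(a, f, b, bit_depth_alpha_minus8=0):
    """Blend foreground sample f over background sample b using alpha a."""
    bit_depth_alpha = bit_depth_alpha_minus8 + 8
    max_alpha = (1 << bit_depth_alpha) - 1
    # Assumption: integer division, with max_alpha // 2 as rounding offset.
    return (a * f + (max_alpha - a) * b + max_alpha // 2) // max_alpha
```

For 8-bit alpha, a fully opaque sample (a = 255) returns the foreground value and a fully transparent sample (a = 0) returns the background value.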
[0026] Alternatively, the samples of pictures D, F and B could also
be composed of red, green and blue component values, in which case,
reconstruction of the image would be slightly different, but still
understood by one skilled in the art. Additionally, for purposes of
the foregoing, it has been assumed that each component has the same
bit depth in each of the pictures D, F and B. However, different
components, e.g. Y and Cb, need not have the same bit depth.
[0027] A picture format that is useful for editing or direct
viewing, and that is commonly used, is called pre-multiplied black
video. If the foreground picture was F, then the pre-multiplied
black video S is given by the following:
s_Y = (a*f_Y) / MaxAlpha
s_Cb = (a*f_Cb) / MaxAlpha
s_Cr = (a*f_Cr) / MaxAlpha
[0028] Pre-multiplied black video has the characteristic that the
picture S will appear correct if displayed against a black
background. For a non-black background B, the composition of the
displayed picture D may be calculated as:
d_Y = s_Y + ((MaxAlpha - a)*b_Y + MaxAlpha/2) / MaxAlpha
d_Cb = s_Cb + ((MaxAlpha - a)*b_Cb + MaxAlpha/2) / MaxAlpha
d_Cr = s_Cr + ((MaxAlpha - a)*b_Cr + MaxAlpha/2) / MaxAlpha
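A minimal sketch of the pre-multiplied black computation and the subsequent composition against a non-black background is given below. It assumes 8-bit alpha (MaxAlpha of 255) and integer division; the function names premultiply and compose_premultiplied are hypothetical.

```python
MAX_ALPHA = 255  # assumes BitDepthAlpha of 8


def premultiply(a, f):
    """Pre-multiplied black sample s for foreground sample f and alpha a."""
    return (a * f) // MAX_ALPHA


def compose_premultiplied(a, s, b):
    """Compose pre-multiplied sample s over background sample b."""
    return s + ((MAX_ALPHA - a) * b + MAX_ALPHA // 2) // MAX_ALPHA
```

Against a black background (b = 0) the second term vanishes and the displayed sample equals s, which is why the picture S appears correct without further processing.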
[0029] Note that if the pre-multiplied black video S is further
processed, for example by filtering or enhancement operations, to
obtain a processed pre-multiplied black video S', then the
possibility of composition overflow must be taken into account. In
particular, the luma of S' should not increase to the point where
the composed luma would overflow, and the chroma of S' should not
increase to the point where the corresponding composed red, green
or blue components would overflow.
[0030] For red, green, blue composition a simple restriction may be
used. Let MaxPel be the maximum value of the red, green, blue
components (also usually equal to the maximum value of luma). Then
any possible overflow of the composition is avoided by enforcing
the following restriction for every sample of each red, green, blue
component:
s'<(a*MaxPel)/MaxAlpha
Another alternative is to wait until the actual composition, and
clip the composed image. All such techniques, as well as others not
specifically mentioned here, are generally understood by one
skilled in the art. It is not intended that the
invention be limited to only the image reconstruction techniques
described herein.
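The two alternatives mentioned above, enforcing the restriction on the processed samples or clipping after composition, might be sketched as follows. The sketch assumes 8-bit samples and alpha by default, and treats the division in the restriction as exact by cross-multiplying rather than dividing. The function names within_restriction and clip_composed are hypothetical.

```python
def within_restriction(s_proc, a, max_pel=255, max_alpha=255):
    """Test s' < (a * MaxPel) / MaxAlpha without performing the division."""
    return s_proc * max_alpha < a * max_pel


def clip_composed(d, max_pel=255):
    """Clip a composed sample to the valid range [0, max_pel]."""
    return max(0, min(d, max_pel))
```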
[0031] Furthermore, while the invention has been disclosed with
respect to a limited number of embodiments, numerous modifications
and variations will be appreciated by those skilled in the art. For
example, the invention is not limited to any particular codec,
device, combination of hardware and/or software, nor should it be
considered restricted to either a multi purpose or single purpose
device. It is intended that all such variations and modifications
fall within the scope of the following claims.