U.S. patent application number 13/550,384, "Signaling Picture Size in Video Coding," was published by the patent office on 2013-01-17.
This patent application is currently assigned to QUALCOMM INCORPORATED. The applicants listed for this patent are Ying Chen, Marta Karczewicz, and Ye-Kui Wang. The invention is credited to Ying Chen, Marta Karczewicz, and Ye-Kui Wang.
Application Number: 13/550,384
Publication Number: 20130016769
Family ID: 47518908
Publication Date: 2013-01-17
United States Patent Application 20130016769
Kind Code: A1
Chen; Ying; et al.
January 17, 2013
SIGNALING PICTURE SIZE IN VIDEO CODING
Abstract
A video encoder is configured to determine a picture size for
one or more pictures included in a video sequence. The picture size
associated with the video sequence may be a multiple of an aligned
coding unit size for the video sequence. In one example, the
aligned coding unit size for the video sequence may comprise a
minimum coding unit size where the minimum coding unit size is
selected from a plurality of smallest coding unit sizes
corresponding to different pictures in the video sequence. A video
decoder is configured to obtain syntax elements to determine the
picture size and the aligned coding unit size for the video
sequence. The video decoder decodes the pictures included in the
video sequence with the picture size, and stores the decoded
pictures in a decoded picture buffer.
Inventors: Chen; Ying (San Diego, CA); Karczewicz; Marta (San Diego, CA); Wang; Ye-Kui (San Diego, CA)
Applicants: Chen; Ying (San Diego, CA, US); Karczewicz; Marta (San Diego, CA, US); Wang; Ye-Kui (San Diego, CA, US)
Assignee: QUALCOMM INCORPORATED, San Diego, CA
Family ID: 47518908
Appl. No.: 13/550,384
Filed: July 16, 2012
Related U.S. Patent Documents:
Application No. 61/508,659, filed Jul. 17, 2011
Application No. 61/530,819, filed Sep. 2, 2011
Application No. 61/549,480, filed Oct. 20, 2011
Current U.S. Class: 375/240.02; 375/E7.132
Current CPC Class: H04N 19/157; H04N 19/172; H04N 19/177; H04N 19/46; H04N 19/122; H04N 19/96 (all 2014-11-01)
Class at Publication: 375/240.02; 375/E07.132
International Class: H04N 7/26 (2006-01-01)
Claims
1. A method of encoding video data comprising: determining a
smallest coding unit size for each of a plurality of pictures
defining a video sequence, wherein a smallest coding unit size is
selected from a plurality of possible coding unit sizes including a
maximum possible coding unit size; determining an aligned coding
unit size for the video sequence based on the plurality of possible
coding unit sizes; determining a picture size associated with the
video sequence, wherein the picture size associated with the video
sequence is a multiple of the aligned coding unit size; and
signaling the aligned coding unit size value in sequence level
syntax information.
2. The method of claim 1, wherein the aligned coding unit size is
the maximum possible coding unit size.
3. The method of claim 2, wherein the plurality of possible coding unit sizes includes a maximum coding unit size of 64×64 pixels.
4. The method of claim 1, wherein determining a smallest coding unit size for each of the plurality of pictures includes determining a first smallest coding unit size of 4×4 pixels for a first picture and determining a second smallest coding unit size of 8×8 pixels for a second picture; and wherein the aligned coding unit size for the video sequence is 4×4 pixels.
5. The method of claim 1, wherein determining a smallest coding unit size for each of the plurality of pictures includes determining a first smallest coding unit size of 4×4 pixels for a first picture and determining a second smallest coding unit size of 8×8 pixels for a second picture; and wherein the aligned coding unit size for the video sequence is 8×8 pixels.
6. The method of claim 1, wherein the picture size specifies a
picture size of a decoded picture stored in a decoded picture
buffer.
7. A device configured to encode video data comprising: means for
determining a smallest coding unit size for each of a plurality of
pictures defining a video sequence, wherein a smallest coding unit
size is selected from a plurality of possible coding unit sizes
including a maximum possible coding unit size; means for
determining an aligned coding unit size for the video sequence
based on the plurality of possible coding unit sizes; means for
determining a picture size associated with the video sequence,
wherein the picture size associated with the video sequence is a
multiple of the aligned coding unit size; and means for signaling
the aligned coding unit size value in sequence level syntax
information.
8. The device of claim 7, wherein the aligned coding unit size is
the maximum possible coding unit size.
9. The device of claim 7, wherein the plurality of possible coding unit sizes includes a maximum coding unit size of 64×64 pixels.
10. The device of claim 7, wherein determining a smallest coding unit size for each of a plurality of pictures includes determining a first smallest coding unit size of 4×4 pixels for a first picture and determining a second smallest coding unit size of 8×8 pixels for a second picture; and wherein the aligned coding unit size for the video sequence is 4×4 pixels.
11. The device of claim 7, wherein determining a smallest coding unit size for each of a plurality of pictures includes determining a first smallest coding unit size of 4×4 pixels for a first picture and determining a second smallest coding unit size of 8×8 pixels for a second picture; and wherein the aligned coding unit size for the video sequence is 8×8 pixels.
12. The device of claim 7, wherein the picture size specifies a
picture size of a decoded picture stored in a decoded picture
buffer.
13. A device comprising a video encoder configured to: determine a
smallest coding unit size for each of a plurality of pictures
defining a video sequence, wherein a smallest coding unit size is
selected from a plurality of possible coding unit sizes including a
maximum possible coding unit size; determine an aligned coding unit
size for the video sequence based on the plurality of possible
coding unit sizes; determine a picture size associated with the
video sequence, wherein the picture size associated with the video
sequence is a multiple of the aligned coding unit size; and signal
the aligned coding unit size value in sequence level syntax
information.
14. The device of claim 13, wherein the aligned coding unit size is
the maximum possible coding unit size.
15. The device of claim 14, wherein the plurality of possible coding unit sizes includes a maximum coding unit size of 64×64 pixels.
16. The device of claim 13, wherein determining a smallest coding unit size for each of a plurality of pictures includes determining a first smallest coding unit size of 4×4 pixels for a first picture and determining a second smallest coding unit size of 8×8 pixels for a second picture; and wherein the aligned coding unit size for the video sequence is 4×4 pixels.
17. The device of claim 13, wherein determining a smallest coding unit size for each of a plurality of pictures includes determining a first smallest coding unit size of 4×4 pixels for a first picture and determining a second smallest coding unit size of 8×8 pixels for a second picture; and wherein the aligned coding unit size for the video sequence is 8×8 pixels.
18. The device of claim 13, wherein the picture size specifies a
picture size of a decoded picture stored in a decoded picture
buffer.
19. A computer readable medium comprising instructions stored
thereon that when executed cause a processor to: determine a
smallest coding unit size for each of a plurality of pictures
defining a video sequence, wherein a smallest coding unit size is
selected from a plurality of possible coding unit sizes including a
maximum possible coding unit size; determine an aligned coding unit
size for the video sequence based on the plurality of possible
coding unit sizes; determine a picture size associated with the
video sequence, wherein the picture size associated with the video
sequence is a multiple of the aligned coding unit size; and signal
the aligned coding unit size value in sequence level syntax
information.
20. The computer readable medium of claim 19, wherein the aligned
coding unit size is the maximum possible coding unit size.
21. The computer readable medium of claim 20, wherein the plurality of possible coding unit sizes includes a maximum coding unit size of 64×64 pixels.
22. The computer readable medium of claim 19, wherein determining a smallest coding unit size for each of a plurality of pictures includes determining a first smallest coding unit size of 4×4 pixels for a first picture and determining a second smallest coding unit size of 8×8 pixels for a second picture; and wherein the aligned coding unit size for the video sequence is 4×4 pixels.
23. The computer readable medium of claim 19, wherein determining a smallest coding unit size for each of a plurality of pictures includes determining a first smallest coding unit size of 4×4 pixels for a first picture and determining a second smallest coding unit size of 8×8 pixels for a second picture; and wherein the aligned coding unit size for the video sequence is 8×8 pixels.
24. The computer readable medium of claim 19, wherein the picture size specifies a
picture size of a decoded picture stored in a decoded picture
buffer.
25. A method of decoding video data comprising: obtaining a coded
video sequence including a first picture coded using a first
smallest coding unit size and a second picture coded using a second
smallest coding unit size; obtaining a picture size of a decoded
picture to be stored in a decoded picture buffer wherein the
picture size is a multiple of one of the first coding unit size,
the second coding unit size, or a maximum coding unit size; and
storing the decoded picture in a decoded picture buffer.
26. The method of claim 25, wherein the first smallest coding unit size is 4×4 pixels, the second coding unit size is 8×8 pixels, and the picture size is a multiple of the first coding unit size.
27. The method of claim 25, wherein the first smallest coding unit size is 4×4 pixels, the second coding unit size is 8×8 pixels, and the picture size is a multiple of the second coding unit size.
28. The method of claim 25, wherein the maximum coding unit size is 64×64 pixels and the picture size is a multiple of the maximum coding unit size.
29. A device configured to decode video data comprising: means for
obtaining a coded video sequence including a first picture coded
using a first smallest coding unit size and a second picture coded
using a second smallest coding unit size; means for obtaining a
picture size of a decoded picture to be stored in a decoded picture
buffer wherein the picture size is a multiple of one of the first
coding unit size, the second coding unit size, or a maximum coding
unit size; and means for storing the decoded picture in a decoded
picture buffer.
30. The device of claim 29, wherein the first smallest coding unit size is 4×4 pixels, the second coding unit size is 8×8 pixels, and the picture size is a multiple of the first coding unit size.
31. The device of claim 29, wherein the first smallest coding unit size is 4×4 pixels, the second coding unit size is 8×8 pixels, and the picture size is a multiple of the second coding unit size.
32. The device of claim 29, wherein the maximum coding unit size is 64×64 pixels and the picture size is a multiple of the maximum coding unit size.
33. A device comprising a video decoder configured to: obtain a
coded video sequence including a first picture coded using a first
smallest coding unit size and a second picture coded using a second
smallest coding unit size; obtain a picture size of a decoded
picture to be stored in a decoded picture buffer wherein the
picture size is a multiple of one of the first coding unit size,
the second coding unit size, or a maximum coding unit size; and
store the decoded picture in a decoded picture buffer.
34. The device of claim 33, wherein the first smallest coding unit size is 4×4 pixels, the second coding unit size is 8×8 pixels, and the picture size is a multiple of the first coding unit size.
35. The device of claim 33, wherein the first smallest coding unit size is 4×4 pixels, the second coding unit size is 8×8 pixels, and the picture size is a multiple of the second coding unit size.
36. The device of claim 33, wherein the maximum coding unit size is 64×64 pixels and the picture size is a multiple of the maximum coding unit size.
37. A computer readable medium comprising instructions stored
thereon that when executed cause a processor to: obtain a coded
video sequence including a first picture coded using a first
smallest coding unit size and a second picture coded using a second
smallest coding unit size; obtain a picture size of a decoded
picture to be stored in a decoded picture buffer wherein the
picture size is a multiple of one of the first coding unit size,
the second coding unit size, or a maximum coding unit size; and
store the decoded picture in a decoded picture buffer.
38. The computer readable medium of claim 37, wherein the first smallest coding unit size is 4×4 pixels, the second coding unit size is 8×8 pixels, and the picture size is a multiple of the first coding unit size.
39. The computer readable medium of claim 37, wherein the first smallest coding unit size is 4×4 pixels, the second coding unit size is 8×8 pixels, and the picture size is a multiple of the second coding unit size.
40. The computer readable medium of claim 37, wherein the maximum coding unit size is 64×64 pixels and the picture size is a multiple of the maximum coding unit size.
Description
[0001] This application claims the benefit of:
[0002] U.S. Provisional Application No. 61/508,659, filed Jul. 17,
2011;
[0003] U.S. Provisional Application No. 61/530,819, filed Sep. 2,
2011; and
[0004] U.S. Provisional Application No. 61/549,480, filed Oct. 20,
2011, each of which is hereby incorporated by reference in its
entirety.
TECHNICAL FIELD
[0005] This disclosure relates to the field of video coding.
BACKGROUND
[0006] Digital video capabilities can be incorporated into a wide
range of devices, including digital televisions, digital direct
broadcast systems, wireless broadcast systems, personal digital
assistants (PDAs), laptop or desktop computers, tablet computers,
e-book readers, digital cameras, digital recording devices, digital
media players, video gaming devices, video game consoles, cellular
or satellite radio telephones, so-called "smart phones," video
teleconferencing devices, video streaming devices, and the like.
Digital video devices implement video compression techniques, such
as those described in the standards defined by MPEG-2, MPEG-4,
ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding
(AVC), the High Efficiency Video Coding (HEVC) standard presently
under development, and extensions of such standards. Video devices
may transmit, receive, encode, decode, and/or store digital video
information more efficiently by implementing such video compression
techniques.
[0007] Video compression techniques perform spatial (intra-picture)
prediction and/or temporal (inter-picture) prediction to reduce or
remove redundancy inherent in video sequences. For block-based
video coding, a video slice (i.e., a video frame or a portion of a
video frame) may be partitioned into video blocks, which may also
be referred to as treeblocks, coding units (CUs) and/or coding
nodes. Video blocks in an intra-coded (I) slice of a picture are
encoded using spatial prediction with respect to reference samples
in neighboring blocks in the same picture. Video blocks in an
inter-coded (P or B) slice of a picture may use spatial prediction
with respect to reference samples in neighboring blocks in the same
picture or temporal prediction with respect to reference samples in
other reference pictures. Pictures may be referred to as frames,
and reference pictures may be referred to as reference frames.
[0008] Spatial or temporal prediction results in a predictive block
for a block to be coded. Residual data represents pixel differences
between the original block to be coded and the predictive block. An
inter-coded block is encoded according to a motion vector that
points to a block of reference samples forming the predictive
block, and the residual data indicating the difference between the
coded block and the predictive block. An intra-coded block is
encoded according to an intra-coding mode and the residual data.
For further compression, the residual data may be transformed from
the pixel domain to a transform domain, resulting in residual
transform coefficients, which then may be quantized. The quantized
transform coefficients, initially arranged in a two-dimensional
array, may be scanned in order to produce a one-dimensional vector
of transform coefficients, and entropy coding may be applied to
achieve even more compression.
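The residual, quantization, and scanning steps described above can be sketched as follows. This is a minimal illustrative sketch, not code from the disclosure; the scan order shown is the classic JPEG-style zigzag, and real codecs define their own transform and scan tables.

```python
def residual(original, prediction):
    """Pixel-wise difference between the original block and its predictive block."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]

def quantize(coeffs, step):
    """Limit coefficient precision by integer division with a quantization step
    (rounding toward zero); this is where lossy coding discards bits."""
    return [[c // step if c >= 0 else -((-c) // step) for c in row]
            for row in coeffs]

def zigzag_scan(block):
    """Scan a square 2-D coefficient array into a 1-D vector along
    anti-diagonals, alternating direction (a JPEG-style zigzag)."""
    n = len(block)
    order = sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],  # which anti-diagonal
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))  # direction
    return [block[r][c] for r, c in order]
```

A transform step (e.g., a DCT) would sit between `residual` and `quantize`; it is omitted here to keep the sketch short.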
SUMMARY
[0009] In general, this disclosure describes techniques for coding
video data included in pictures or frames of a video sequence. In
particular, this disclosure describes techniques where a picture
size for a group of pictures in the video sequence may be coded
based on an aligned coding unit size for the video sequence. The
aligned coding unit size for the video sequence may be selected
from several possible coding unit sizes supported by the video
coding scheme. The techniques of this disclosure include signaling
an aligned coding unit size for one or more of the pictures in the
video sequence, and coding a size for the one or more pictures as a
multiple of the smallest coding unit size.
[0010] In one example of the disclosure, a method for encoding
video data comprises determining a smallest coding unit size for
each of a plurality of pictures defining a video sequence, wherein
a smallest coding unit size is selected from a plurality of
possible coding unit sizes including a maximum possible coding unit
size; determining an aligned coding unit size for the video
sequence based on the plurality of possible coding unit sizes;
determining a picture size associated with the video sequence,
wherein the picture size associated with the video sequence is a
multiple of the aligned coding unit size; and signaling the aligned
coding unit size value in sequence level syntax information.
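As a sketch of the encoding method above, sequence-level syntax carrying the aligned coding unit size might be assembled as follows. The element names are hypothetical, only loosely modeled on HEVC-style sequence parameter sets, and are not taken from the disclosure.

```python
def encode_sequence_header(aligned_cu_size, max_cu_size, pic_width, pic_height):
    """Assemble sequence-level syntax for picture-size signaling. CU sizes are
    powers of two, so they can be signaled compactly as log2 values; the
    picture size must be a multiple of the aligned coding unit size."""
    assert aligned_cu_size & (aligned_cu_size - 1) == 0, "CU sizes are powers of two"
    assert pic_width % aligned_cu_size == 0 and pic_height % aligned_cu_size == 0, \
        "picture size must be a multiple of the aligned coding unit size"
    return {
        "log2_aligned_cu_size": aligned_cu_size.bit_length() - 1,
        "log2_max_cu_size": max_cu_size.bit_length() - 1,
        "pic_width_in_luma_samples": pic_width,
        "pic_height_in_luma_samples": pic_height,
    }
```

An actual bitstream would entropy-code these values; the dictionary here stands in for the serialized syntax structure.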
[0011] In another example, a method of decoding video data
comprises obtaining a coded video sequence including a first
picture coded using a first smallest coding unit size and a second
picture coded using a second smallest coding unit size; obtaining a
picture size of a decoded picture to be stored in a decoded picture
buffer wherein the picture size is a multiple of one of the first
coding unit size, the second coding unit size, or a maximum coding
unit size; and storing the decoded picture in a decoded picture
buffer.
[0012] In another example, an apparatus for encoding video data
comprises a video encoding device configured to determine a
smallest coding unit size for each of a plurality of pictures
defining a video sequence, wherein a smallest coding unit size is
selected from a plurality of possible coding unit sizes including a
maximum possible coding unit size; determine an aligned coding unit
size for the video sequence based on the plurality of possible
coding unit sizes; determine a picture size associated with the
video sequence, wherein the picture size associated with the video
sequence is a multiple of the aligned coding unit size; and signal
the aligned coding unit size value in sequence level syntax
information.
[0013] In another example, an apparatus for decoding video data
comprises a video decoding device configured to obtain a coded
video sequence including a first picture coded using a first
smallest coding unit size and a second picture coded using a second
smallest coding unit size; obtain a picture size of a decoded
picture to be stored in a decoded picture buffer wherein the
picture size is a multiple of one of the first coding unit size,
the second coding unit size, or a maximum coding unit size; and
store the decoded picture in a decoded picture buffer.
[0014] In another example, a device for encoding video data
comprises means for determining a smallest coding unit size for
each of a plurality of pictures defining a video sequence, wherein
a smallest coding unit size is selected from a plurality of
possible coding unit sizes including a maximum possible coding unit
size; means for determining an aligned coding unit size for the
video sequence based on the plurality of possible coding unit
sizes; means for determining a picture size associated with the
video sequence, wherein the picture size associated with the video
sequence is a multiple of the aligned coding unit size; and means
for signaling the aligned coding unit size value in sequence level
syntax information.
[0015] In another example, a device for decoding video data
comprises means for obtaining a coded video sequence including a
first picture coded using a first smallest coding unit size and a
second picture coded using a second smallest coding unit size; means
for obtaining a picture size of a decoded picture to be stored in a
decoded picture buffer wherein the picture size is a multiple of
one of the first coding unit size, the second coding unit size, or
a maximum coding unit size; and means for storing the decoded picture
in a decoded picture buffer.
[0016] In another example, a computer-readable storage medium
comprises instructions stored thereon that, when executed, cause a
processor of a device for encoding video data to determine a
smallest coding unit size for each of a plurality of pictures
defining a video sequence, wherein a smallest coding unit size is
selected from a plurality of possible coding unit sizes including a
maximum possible coding unit size; determine an aligned coding unit
size for the video sequence based on the plurality of possible
coding unit sizes; determine a picture size associated with the
video sequence, wherein the picture size associated with the video
sequence is a multiple of the aligned coding unit size; and signal
the aligned coding unit size value in sequence level syntax
information.
[0017] In another example, a computer-readable storage medium
comprises instructions stored thereon that, when executed, cause a
processor of a device for decoding video data to obtain a coded
video sequence including a first picture coded using a first
smallest coding unit size and a second picture coded using a second
smallest coding unit size; obtain a picture size of a decoded
picture to be stored in a decoded picture buffer wherein the
picture size is a multiple of one of the first coding unit size,
the second coding unit size, or a maximum coding unit size; and
store the decoded picture in a decoded picture buffer.
[0018] The details of one or more examples are set forth in the
accompanying drawings and the description below. Other features,
objects, and advantages will be apparent from the description and
drawings, and from the claims.
BRIEF DESCRIPTION OF DRAWINGS
[0019] FIG. 1 is a block diagram illustrating an example video
encoding and decoding system.
[0020] FIG. 2 is a block diagram illustrating an example video
encoder that may implement the techniques described in this
disclosure.
[0021] FIG. 3 is a flowchart illustrating an example technique for
encoding video data according to the techniques of this
disclosure.
[0022] FIG. 4 is a block diagram illustrating an example video
decoder that may implement the techniques described in this
disclosure.
[0023] FIG. 5 is a flowchart illustrating an example technique for
decoding video data according to the techniques of this
disclosure.
DETAILED DESCRIPTION
[0024] A video sequence may include a group of pictures. Each
picture in the group of pictures may have a smallest coding unit
size. In one example, the smallest coding unit size may be a
rectangle or square with a pixel or sample dimension of four, eight,
16, 32, or 64 pixels. To increase coding efficiency of the video
sequence, it may be useful to determine the smallest coding unit
size for the video sequence and to specify a picture size for the
group of pictures, where the picture size is a multiple of the
minimum of the smallest coding unit sizes for the video
sequence.
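A minimal sketch of this alignment, assuming the aligned coding unit size is taken as the minimum of the per-picture smallest coding unit sizes (the disclosure also contemplates other choices, such as the maximum coding unit size):

```python
def aligned_cu_size(smallest_cu_sizes):
    """Aligned coding unit size for a sequence, chosen here as the minimum of
    the per-picture smallest coding unit sizes."""
    return min(smallest_cu_sizes)

def aligned_picture_size(width, height, cu_size):
    """Round each picture dimension up to the next multiple of the aligned CU
    size, so every picture divides evenly into coding units."""
    def round_up(d):
        return ((d + cu_size - 1) // cu_size) * cu_size
    return round_up(width), round_up(height)
```

For example, a nominal 1920×1080 picture aligned to 16×16 units would be stored as 1920×1088, with the extra rows cropped for display.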
[0025] FIG. 1 is a block diagram illustrating one example of a
video encoding and decoding system 10 that may implement techniques
of this disclosure. As shown in FIG. 1, system 10 includes a source
device 12 that transmits encoded video to a destination device 16
via a communication channel 15. Source device 12 and destination
device 16 may comprise any of a wide range of devices. In some
cases, source device 12 and destination device 16 may comprise
wireless communication device handsets, such as so-called cellular
or satellite radiotelephones. The techniques of this disclosure,
however, apply generally to video encoding and decoding and may be
applied to non-wireless devices that include video encoding and/or
decoding capabilities. Source device 12 and destination device 16
are merely examples of coding devices that can support the
techniques described herein.
[0026] In the example of FIG. 1, source device 12 may include a
video source 20, a video encoder 22, a modulator/demodulator
(modem) 23 and a transmitter 24. Destination device 16 may include
a receiver 26, a modem 27, a video decoder 28, and a display device
30.
[0027] Video source 20 may comprise a video capture device, such as
a video camera, a video archive containing previously captured
video, a video feed from a video content provider, or another source
of video. As a further alternative, video source 20 may generate
computer graphics-based data as the source video, or a combination
of live video, archived video, and computer-generated video. In
some cases, if video source 20 is a video camera, source device 12
and destination device 16 may form so-called camera phones or video
phones. In each case, the captured, pre-captured or
computer-generated video may be encoded by video encoder 22.
[0028] In some examples (but not all cases), once the video data is
encoded by video encoder 22, the encoded video information may then
be modulated by modem 23 according to a communication standard,
e.g., such as code division multiple access (CDMA), orthogonal
frequency division multiplexing (OFDM) or any other communication
standard or technique. The encoded and modulated data can then be
transmitted to destination device 16 via transmitter 24. Modem 23
may include various mixers, filters, amplifiers or other components
designed for signal modulation. Transmitter 24 may include circuits
designed for transmitting data, including amplifiers, filters, and
one or more antennas. Receiver 26 of destination device 16 receives
information over channel 15, and modem 27 demodulates the
information. The video decoding process performed by video decoder
28 may include reciprocal techniques to the encoding techniques
performed by video encoder 22.
[0029] Communication channel 15 may comprise any wireless or wired
communication medium, such as a radio frequency (RF) spectrum or
one or more physical transmission lines, or any combination of
wireless and wired media. Communication channel 15 may form part of
a packet-based network, such as a local area network, a wide-area
network, or a global network such as the Internet. Communication
channel 15 generally represents any suitable communication medium,
or collection of different communication media, for transmitting
video data from source device 12 to destination device 16. Again,
FIG. 1 is merely one example, and the techniques of this disclosure
may apply to video coding settings (e.g., video encoding or video
decoding) that do not necessarily include any data communication
between the encoding and decoding devices. In other examples, data
could be retrieved from a local memory, streamed over a network, or
the like. An encoding device may encode and store data to memory,
and/or a decoding device may retrieve and decode data from memory.
In many cases, the encoding and decoding is performed by unrelated
devices that do not communicate with one another, but simply encode
data to memory and/or retrieve and decode data from memory. For
example, after video data has been encoded, the video data may be
packetized for transmission or storage. The video data may be
assembled into a video file conforming to any of a variety of
standards, such as the International Organization for
Standardization (ISO) base media file format and extensions
thereof, such as AVC.
[0030] In some cases, video encoder 22 and video decoder 28 may
operate substantially according to a video compression standard
such as the emerging HEVC standard. However, the techniques of this
disclosure may also be applied in the context of a variety of other
video coding standards, including some old standards, or new or
emerging standards. Although not shown in FIG. 1, in some cases,
video encoder 22 and video decoder 28 may each be integrated with
an audio encoder and decoder, and may include appropriate MUX-DEMUX
units, or other hardware and software, to handle encoding of both
audio and video in a common data stream or separate data streams.
If applicable, MUX-DEMUX units may conform to the ITU H.223
multiplexer protocol, or other protocols such as the user datagram
protocol (UDP).
[0031] Video encoder 22 and video decoder 28 each may be
implemented as one or more microprocessors, digital signal
processors (DSPs), application specific integrated circuits
(ASICs), field programmable gate arrays (FPGAs), discrete logic,
software, hardware, firmware or combinations thereof. Each of video
encoder 22 and video decoder 28 may be included in one or more
encoders or decoders, either of which may be integrated as part of
a combined encoder/decoder (CODEC) in a respective mobile device,
subscriber device, broadcast device, server, or the like. In this
disclosure, the term coder refers to an encoder, a decoder, or
CODEC, and the terms coder, encoder, decoder and CODEC all refer to
specific machines designed for the coding (encoding and/or
decoding) of video data consistent with this disclosure. In this
disclosure, the term "coding" may refer to either or both of
encoding and/or decoding.
[0032] In some cases, source device 12 and destination device 16
may operate in a substantially symmetrical manner. For example,
each of source device 12 and destination device 16 may include
video encoding and decoding components. Hence, system 10 may
support one-way or two-way video transmission between source device
12 and destination device 16, e.g., for video streaming, video
playback, video broadcasting, or video telephony.
[0033] Video encoder 22 and video decoder 28 may perform predictive
coding in which a video block being coded is compared to one or
more predictive candidates in order to identify a predictive block.
Video blocks may exist within individual video frames or pictures
(or other independently defined units of video, such as slices).
Frames, slices, portions of frames, groups of pictures, or other
data structures may be defined as units of video information that
include video blocks. The process of predictive coding may be intra
(in which case the predictive data is generated based on
neighboring intra data within the same video frame or slice) or
inter (in which case the predictive data is generated based on
video data in previous or subsequent frames or slices). Video
encoder 22 and video decoder 28 may support several different
predictive coding modes. Video encoder 22 may select a desirable
video coding mode. In predictive coding, after a predictive block
is identified, the differences between the current video block
being coded and the predictive block are coded as a residual block,
and prediction syntax (such as a motion vector in the case of inter
coding, or a predictive mode in the case of intra coding) is used
to identify the predictive block. In some cases, the residual block
may be transformed and quantized. Transform techniques may comprise
a DCT process or conceptually similar process, integer transforms,
wavelet transforms, or other types of transforms. In a DCT process,
as an example, the transform process converts a set of pixel values
(e.g., residual pixel values) into transform coefficients, which
may represent the energy of the pixel values in the frequency
domain. Video encoder 22 and video decoder 28 may apply
quantization to the transform coefficients. Quantization generally
involves a process that limits the number of bits associated with
any given transform coefficient.
[0034] Following transform and quantization, video encoder 22 and
video decoder 28 may perform entropy coding on the quantized and
transformed residual video blocks. Video encoder 22 may generate
syntax elements as part of the encoding process to be used by video
decoder 28 in the decoding process. Video encoder 22 may also
entropy encode syntax elements and include syntax elements in the
encoded bitstream. In general, entropy coding comprises one or more
processes that collectively compress a sequence of quantized
transform coefficients and/or other syntax information. Video
encoder 22 and video decoder 28 may perform scanning techniques on
the quantized transform coefficients in order to define one or more
serialized one-dimensional vectors of coefficients from
two-dimensional video blocks. The scanned coefficients may then be
entropy coded along with any syntax information, e.g., via content
adaptive variable length coding (CAVLC), context adaptive binary
arithmetic coding (CABAC), or another entropy coding process.
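The serialization of a two-dimensional block into a one-dimensional coefficient vector can be illustrated with a small Python sketch of a zig-zag scan. The scan order and function name here are illustrative assumptions, not the specific scan mandated by any standard.

```python
# Illustrative zig-zag scan: serialize a square block of quantized
# transform coefficients into a one-dimensional vector by walking
# anti-diagonals, alternating direction on each diagonal.

def zigzag_scan(block):
    """Visit anti-diagonals of a square block, alternating direction."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):
        coords = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            coords.reverse()  # even diagonals run bottom-left to top-right
        out.extend(block[i][j] for i, j in coords)
    return out

block = [[9, 8, 6],
         [8, 7, 4],
         [5, 3, 0]]
print(zigzag_scan(block))  # [9, 8, 8, 5, 7, 6, 4, 3, 0]
```

Placing the larger (typically low-frequency) coefficients first tends to group trailing zeros together, which helps the subsequent entropy coding stage.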
[0035] In some examples, as part of the encoding process, video
encoder 22 may decode encoded video blocks in order to generate the
video data that is used for subsequent prediction-based coding of
subsequent video blocks. This is often referred to as a decoding
loop of the encoding process, and generally mimics the decoding
that is performed by a decoder device. In the decoding loop of an
encoder or a decoder, filtering techniques may be used to improve
video quality, e.g., to smooth pixel boundaries and possibly
remove artifacts from decoded video. This filtering may be in-loop
or post-loop. With in-loop filtering, the filtering of
reconstructed video data occurs in the coding loop, which means
that the filtered data is stored by an encoder or a decoder for
subsequent use in the prediction of subsequent image data. In
contrast, with post-loop filtering the filtering of reconstructed
video data occurs out of the coding loop, which means that
unfiltered versions of the data are stored by an encoder or a
decoder for subsequent use in the prediction of subsequent image
data. The loop filtering often follows a separate deblock filtering
process, which typically applies filtering to pixels that are on or
near boundaries of adjacent video blocks in order to remove
blockiness artifacts that manifest at video block boundaries.
[0036] Efforts are currently in progress to develop a new video
coding standard, currently referred to as High Efficiency Video
Coding (HEVC). The upcoming standard is also referred to as H.265.
A recent draft of the HEVC standard, referred to as "HEVC Working
Draft 3" or "WD3," is described in document JCTVC-E603, Wiegand et
al., "High efficiency video coding (HEVC) text specification draft
3," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16
WP3 and ISO/IEC JTC1/SC29/WG11, 5th Meeting: Geneva, CH, 16-23
March 2011, which is hereby incorporated by reference in its
entirety. The standardization efforts are based on a model of a
video coding device referred to as the HEVC Test Model (HM). The HM
presumes several capabilities of video coding devices over devices
configured to code video data according to ITU-T H.264/AVC. For
example, whereas H.264 provides nine intra-prediction encoding
modes, HM provides as many as thirty-four intra-prediction encoding
modes. Video encoder 22 may operate on blocks of video data
consistent with the HEVC standard and HEVC Test Model.
[0037] The HEVC standard includes specific terms and block sizes
for blocks of video data. In particular, HEVC includes the terms
largest coding unit (LCU), coding unit (CU), prediction unit (PU),
and transform unit (TU). LCUs, CUs, PUs, and TUs are all video
blocks within the meaning of this disclosure. This disclosure also
uses the term block to refer to any of a LCU, CU, PU, or TU. In
HEVC, syntax elements may be defined at the LCU level, the CU
level, the PU level and the TU level. In HEVC, an LCU refers to the
largest coding unit supported in a given situation, in terms of
number of pixels. In general, in
HEVC a CU has a similar purpose to a macroblock of H.264, except
that a CU does not have a size distinction. Thus, a CU may be split
into sub-CUs and an LCU may be partitioned into smaller CUs.
Further, the CUs may be partitioned into prediction units (PUs) for
purposes of prediction. A PU may represent all or a portion of the
corresponding CU, and may include data for retrieving a reference
sample for the PU. PUs may have square or rectangular shapes. TUs
represent a set of pixel difference values or pixel residuals that
may be transformed to produce transform coefficients, which may be
quantized. Transforms are not fixed in the HEVC standard, but are
defined according to transform unit (TU) sizes, which may be the
same size as a given CU, or possibly smaller.
[0038] In HEVC, an LCU may be associated with a quadtree data
structure. Further, in some examples residual samples corresponding
to a CU may be subdivided into smaller units using a quadtree
partitioning scheme which includes a quadtree structure known as
"residual quad tree" (RQT). In general, a quadtree data structure
includes one node per CU, where a root node may correspond to the
LCU. For example, CU.sub.0 may refer to the LCU, and CU.sub.1
through CU.sub.4 may comprise sub-CUs of the LCU. If a CU is split
into four sub-CUs, the node corresponding to the CU includes four
leaf nodes, each of which corresponds to one of the sub-CUs. Each
node of the quadtree data structure may provide syntax data for the
corresponding CU. For example, a node in the quadtree may include a
split flag in the CU-level syntax to indicate whether the CU
corresponding to the node is split into sub-CUs. Syntax elements
for a CU may be defined recursively, and may depend on whether the
CU is split into sub-CUs. If a CU is not split further, it is
referred as a leaf-CU. In this disclosure, four sub-CUs of a
leaf-CU may also be referred to as leaf-CUs although there is no
explicit splitting of the original leaf-CU. For example, if a CU at
16.times.16 size is not split further, the four 8.times.8 sub-CUs
will also be referred to as leaf-CUs although the 16.times.16 CU
was never split.
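The split-flag mechanism described above can be sketched as a small recursive parser. This is a simplified illustration under stated assumptions, hypothetical flag ordering (depth-first, raster order of sub-CUs) and hypothetical function names, not the actual HEVC parsing process.

```python
# Minimal sketch of quadtree CU partitioning driven by split flags:
# each node consumes one flag (unless the CU is already at the minimum
# size); a set flag splits the CU into four equal sub-CUs.

def parse_cu(flags, size, min_size, pos=(0, 0), leaves=None):
    """Recursively consume split flags; collect (x, y, size) of leaf-CUs."""
    if leaves is None:
        leaves = []
    split = size > min_size and flags.pop(0) == 1
    if not split:
        leaves.append((pos[0], pos[1], size))
        return leaves
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            parse_cu(flags, half, min_size, (pos[0] + dx, pos[1] + dy), leaves)
    return leaves

# A 64x64 LCU split once; its first 32x32 sub-CU split again into four
# 16x16 leaf-CUs, and the other three 32x32 sub-CUs left as leaves.
flags = [1, 1, 0, 0, 0, 0, 0, 0, 0]
print(parse_cu(flags, 64, 8))  # 7 leaves: four 16x16, then three 32x32
```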
[0039] The leaf nodes or leaf-CUs of the RQT may correspond to TUs.
That is, a leaf-CU may include a quadtree indicating how the
leaf-CU is partitioned into TUs. A leaf-CU may include one or more
transform units (TUs). This disclosure may refer to the quadtree
indicating how an LCU is partitioned as a CU quadtree and the
quadtree indicating how a leaf-CU is partitioned into TUs as a TU
quadtree. The root node of a TU quadtree generally corresponds to a
leaf-CU, while the root node of a CU quadtree generally corresponds
to an LCU. TUs of the TU quadtree that are not split are referred
to as leaf-TUs. A split flag may indicate whether a leaf-CU is
split into four transform units. Then, each transform unit may be
split further into four sub TUs. When a TU is not split further, it
may be referred to as a leaf-TU.
[0040] Further, the leaf nodes or leaf-CUs may include one or more
prediction units (PUs). For example, when the PU is inter-mode
encoded, the PU may include data defining a motion vector for the
PU. The data defining the motion vector may describe, for example,
a horizontal component of the motion vector, a vertical component
of the motion vector, a resolution for the motion vector (e.g.,
one-quarter pixel precision or one-eighth pixel precision), a
reference frame to which the motion vector points, and/or a
reference list (e.g., list 0 or list 1) for the motion vector. Data
for the leaf-CU defining the PU(s) may also describe, for example,
partitioning of the CU into one or more PUs. Partitioning modes may
differ depending on whether the CU is uncoded, intra-prediction
mode encoded, or inter-prediction mode encoded. For intra coding, a
PU may be treated the same as a leaf transform unit described
below.
[0041] Generally, for intra coding in HEVC, all the leaf-TUs
belonging to a leaf-CU share the same intra prediction mode. That
is, the same intra-prediction mode is generally applied to
calculate predicted values for all TUs of a leaf-CU. For intra
coding, video encoder 22 may calculate a residual value for each
leaf-TU using the intra prediction mode, as a difference between
the portion of the predictive values corresponding to the TU and
the original block. The residual value may be transformed,
quantized, and scanned. For inter coding in HEVC, video encoder 22
may perform prediction at the PU level and may calculate a residual
for each PU. The residual values corresponding to a leaf-CU may be
transformed, quantized, and scanned. For inter coding, a leaf-TU
may be larger or smaller than a PU. For intra coding, a PU may be
collocated with a corresponding leaf-TU. In some examples, the
maximum size of a leaf-TU may be the size of the corresponding
leaf-CU.
[0042] As described above, the HEVC standard allows for
transformations according to transformation units (TUs), which may
be different for different CUs. The TUs are typically sized based
on the size of PUs within a given CU defined for a partitioned LCU,
although this may not always be the case. The TUs are typically the
same size or smaller than the PUs. Pixel difference values
associated with the TUs may be transformed to produce transform
coefficients, which may be quantized. Further, quantization may be
applied according to a quantization parameter (QP) defined at the
LCU level. Accordingly, the same level of quantization may be
applied to all transform coefficients in the TUs associated with
different PUs of CUs within an LCU. However, rather than signal the
QP itself, a change or difference (i.e., a delta) in the QP may be
signaled with the LCU to indicate the change in QP relative to that
of a previous LCU.
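The delta-QP signaling just described can be sketched as follows; the function name and values are illustrative only.

```python
# Sketch of delta-QP signaling: rather than sending each LCU's QP
# directly, the coder sends the difference relative to the previous
# LCU's QP, and the decoder accumulates the deltas.

def reconstruct_qps(initial_qp, deltas):
    """Recover per-LCU QP values from an initial QP and signaled deltas."""
    qps = []
    qp = initial_qp
    for delta in deltas:
        qp += delta
        qps.append(qp)
    return qps

print(reconstruct_qps(26, [0, 2, -1, 0]))  # [26, 28, 27, 27]
```

Signaling small deltas instead of absolute QP values usually costs fewer bits, since QP tends to change slowly from one LCU to the next.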
[0043] Video encoder 22 may perform video encoding of pictures,
frames, slices, portions of frames, groups of pictures, or other
video data by using LCUs, CUs, PUs and TUs defined according to the
HEVC standard as units of video coding information.
[0044] For example, video encoder 22 may encode one or more
pictures of video data comprising largest coding units (LCUs),
wherein the LCUs are partitioned into a set of block-sized coded
units (CUs) according to a quadtree partitioning scheme. Video
encoder 22 and video decoder 28 may use CUs that have varying sizes
consistent with the HEVC standard. For example, video encoder 22
may use possible CU sizes of 64.times.64, 32.times.32, 16.times.16,
8.times.8 and 4.times.4 pixels. For a given video sequence, video
encoder 22 may use a maximum CU size of 64.times.64 pixels for all
pictures in the video sequence while some pictures in the video
sequence may be encoded using a smallest possible CU size of
4.times.4 pixels while other pictures in the video sequence may be
encoded using a smallest CU size of 8.times.8 pixels.
[0045] As described above, references in this disclosure to a CU
may refer to a largest coding unit of a picture or a sub-CU of an
LCU. Video encoder 22 may split an LCU into sub-CUs, and each
sub-CU may be further split into sub-CUs. Video encoder 22 may
include syntax data for a bitstream defined to indicate a maximum
number of times an LCU is split. The number of times an LCU is
split may be referred to as the CU depth.
[0046] Further, video encoder 22 may also define a smallest coding
unit (SCU) for each picture in a video sequence. An SCU may refer
to the smallest coding unit size used to code a picture when
several possible CU sizes are available. For example, video encoder
22 may be configured to use one of possible CU sizes 64.times.64,
32.times.32, 16.times.16, 8.times.8 and 4.times.4 pixels to encode
pictures in a video sequence. In one example, all pictures in the
video sequence may be encoded using the same SCU size, e.g.,
4.times.4 pixels or 8.times.8 pixels. In other examples, some
pictures in the video sequence may be encoded using a SCU size of
4.times.4 pixels while other pictures in the video sequence may be
encoded using a SCU size of 8.times.8 pixels. Thus, in this
example, pictures in the video sequence may have respective SCUs of
4.times.4 pixels and 8.times.8 pixels, i.e., the SCU size may
change among frames. Video encoder 22 may determine a minimum SCU
or a maximum SCU for a video sequence. In this example, the minimum
SCU would be 4.times.4, while the maximum SCU would be
8.times.8.
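The minimum and maximum SCU determination described above amounts to taking the extremes over the per-picture SCU sizes; a minimal sketch, with hypothetical names:

```python
# Sketch of deriving the minimum and maximum SCU for a video sequence
# from per-picture smallest-CU sizes, as in the 4x4 / 8x8 example above.

def sequence_scu_bounds(per_picture_scu_sizes):
    """Return (minimum SCU, maximum SCU) across all pictures in a sequence."""
    return min(per_picture_scu_sizes), max(per_picture_scu_sizes)

# Some pictures coded with 4x4 SCUs, others with 8x8 SCUs.
print(sequence_scu_bounds([4, 8, 8, 4]))  # (4, 8)
```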
[0047] Video encoder 22 may include various levels of syntax data
within a bitstream that defines sizes of LCUs, CUs, PUs, TUs, and
SCUs. For example, video encoder 22 may signal the size of LCU
using sequence level syntax.
[0048] In addition to signaling the size of CUs used to encode a
picture in a video sequence, video encoder 22 may use various
techniques to signal the size of a picture in the video sequence.
The size of a picture associated with a video sequence may be equal
to a picture size of a decoded picture stored in a decoded picture
buffer (DPB). Pictures may have a unit size, such as a block of a
selected height and width. The picture size may be any picture size
supported by HEVC or another video standard, e.g., 320.times.240,
1920.times.1080, or 7680.times.4320.
Further, video encoder 22 may signal syntax elements for coding
texture view components in a slice header. Thus, video encoder 22
may signal the size of a picture associated with a video sequence
and/or a minimum smallest coding unit size associated with the
video sequence using various syntax elements. Likewise, video
decoder 28 may obtain various syntax elements indicating the size
of a picture associated with a coded video sequence and/or a
minimum smallest coding unit size associated with the coded video
sequence and use such syntax elements in decoding the coded video
sequence. In one example, video encoder 22 may signal the minimum
smallest coding unit size and the size of a picture associated with
a video sequence in sequence level syntax information wherein the
picture size is a multiple of the minimum smallest coding unit
size. In one example, video decoder 28 may obtain a coded video
sequence including one or more coded pictures and a minimum
smallest coding unit size for the video sequence in sequence level
syntax information. Video decoder 28 may decode the coded pictures
in the coded video sequence and store the decoded pictures in a
decoded picture buffer with a picture size equal to a multiple of
the minimum smallest coding unit size.
[0049] In some video compression techniques that utilize fixed
sized macroblocks (e.g., 16.times.16), the size of a picture may be
signaled in units of macroblocks. When the width or height is not a
multiple of the fixed macroblock size, a cropping window may be
used. For example, a 1920.times.1080 picture can be coded as
1920.times.1088 in the bitstream, but the cropping window signals
the actual display window so that the picture is displayed as
1920.times.1080. In other techniques, the size of a picture may be
signaled in units of pixels. One example of signaling the size of a
picture in units of pixels is provided by the HEVC
standard.
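The fixed-macroblock case above can be sketched as follows: round each dimension up to the next multiple of 16 for the coded size, and derive the cropping amounts from the remainder. The function and field names are illustrative.

```python
# Sketch of macroblock-aligned picture sizes with a cropping window:
# a 1920x1080 picture is carried as 1920x1088 (the next multiple of 16),
# and the cropping window restores the 1920x1080 display size.

MB = 16  # fixed macroblock size

def coded_size_and_crop(display_w, display_h):
    """Return ((coded width, coded height), (crop right, crop bottom))."""
    coded_w = (display_w + MB - 1) // MB * MB
    coded_h = (display_h + MB - 1) // MB * MB
    return (coded_w, coded_h), (coded_w - display_w, coded_h - display_h)

print(coded_size_and_crop(1920, 1080))  # ((1920, 1088), (0, 8))
```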
[0050] In one example, video encoder 22 and video decoder 28 may
code video data where a size of a coded picture in a picture
sequence is defined in terms of a particular type of coded unit
(CU). The particular types of coded units may be an LCU, an SCU, a
minimum smallest CU, or a maximum smallest CU of each picture in
the sequence of pictures, as described above. More specifically,
video encoder 22 may indicate a unit used to signal a size of a
picture relative to a size of a coding unit (CU) of the picture. In
one example, the unit may be equal to a size of the smallest CU
size that is allowed in the coded video sequence. In some cases,
the smallest CU size is the same for all pictures in the video
sequence. In other cases, the smallest CU size of each picture in
the video sequence may be different. In that case, the smallest CU
size for each picture in a video sequence may not be smaller than
the smallest possible CU size for the video sequence. In another
example, the unit indicated by video encoder 22 may be equal to a
size of the largest coding unit (LCU) for a group of pictures. In
some examples, a cropping window may be applied to the picture by
video encoder 22 or video decoder 28 to reduce the size of the
picture. The cropping window may crop at least one of a right side
or a bottom side of a picture, for example.
[0051] In another example, video encoder 22 may signal a picture
size relative to an aligned CU (ACU) size. An aligned CU size may
be a CU size that is used to specify a picture size of a decoded
picture stored in a decoded picture buffer (DPB). Such a picture
size may have a width and height that are each multiples of the
width and height of the aligned CU size. That is, the picture width
may be a multiple of the aligned CU width, and the picture height
may be a multiple of the aligned CU height. The size (width and
height) of the aligned CU can be signaled in the same way as in the
other alternatives. For example, video encoder 22 may signal an
aligned CU size at various levels of syntax.
[0052] An ACU size may be defined according to the following examples: if
all pictures in a video sequence have the same SCU size, the ACU
may be defined as the SCU size. If, on the other hand, pictures in
the video sequence have different SCU sizes, the ACU may be defined
as the maximum or minimum SCU size among all pictures. Regardless
of how the ACU is defined, the ACU size may be explicitly signaled
by video encoder 22 in a sequence parameter set (SPS) or picture
parameter set (PPS) associated with the video sequence. In some
cases, the ACU size may be restricted such that it is equal to or
less than the LCU size for a video sequence and equal to or larger
than a SCU size for a video sequence.
[0053] Further, in some examples, video encoder 22 may signal a
picture size in a unit of LCU or a unit of SCU. In some examples,
the unit used to signal a size of a coded picture may be signaled
in a SPS. This unit may be equal to the size of a smallest CU size
that is allowed for the coded video sequence. In the PPS, the
relative size of the smallest CU size for pictures referring to
this PPS may be signaled by video encoder 22. In the case where all
pictures in a video sequence have the same SCU size, additional
signaling of the relative size of the smallest CU may not be
necessary in the PPS. In the case where the smallest CU size varies
between pictures in a video sequence, a relative smallest CU size
for a portion of the pictures in the video sequence may be signaled
in the PPS where the relative smallest CU size is larger than the
minimum smallest CU for the video sequence. The relative smallest
CU size may be signaled in the PPS as a difference between the
relative smallest CU size for the portion of the pictures and the
minimum smallest CU size for the video sequence.
[0054] Alternatively, the picture size can be signaled by video
encoder 22 with a unit of LCU in the SPS. Further, because the
cropping window can also be signaled by video encoder 22, the
cropping window may help video decoder 28 identify the picture
size, as long as the ACU size is known.
[0055] Alternatively, when the SCU size varies for the pictures in
the video sequence, the unit may be equal to the size of a maximum
smallest CU size allowed in pictures in the coded video sequence.
In an example where the maximum CU size is 64.times.64 pixels, and
some pictures have a smallest CU size of 4.times.4 pixels while
others have a smallest CU size of 8.times.8 pixels, the unit of the
picture size may be 8.times.8 pixels. In this example, if a picture
has a size of 64.times.65 pixels, the picture size would be
signaled by video encoder 22 as 8 units by 9 units, i.e.,
64.times.72 pixels. Pixels in the stored picture that exceed the
64.times.65 pixel size may be cropped using frame cropping syntax
elements.
[0056] In some examples, a maximum CU size is 64.times.64 pixels
and some pictures have a smallest possible CU size of 4.times.4
pixels while others have a smallest CU size of 8.times.8 pixels.
For this example, if the particular type of CU is the minimum
possible smallest CU, the unit for the picture size is 4.times.4
pixels. Continuing with the example, if the particular type of CU
is the maximum possible smallest CU, the unit for the picture size
is 8.times.8 pixels.
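The unit-based signaling in the two examples above can be sketched as a ceiling division: the picture dimensions are expressed as a count of units, rounding up. Function and parameter names are illustrative.

```python
# Sketch of signaling a picture size in units of a chosen CU size.
# Per the 64x65 example above: with an 8x8 unit the size is signaled
# as 8 units by 9 units (a 64x72 stored picture, cropped to 64x65);
# with a 4x4 unit it is 16 units by 17 units.

def picture_size_in_units(width, height, unit):
    """Round each dimension up to a whole number of units."""
    return (width + unit - 1) // unit, (height + unit - 1) // unit

print(picture_size_in_units(64, 65, 8))  # (8, 9)
print(picture_size_in_units(64, 65, 4))  # (16, 17)
```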
[0057] Tables 1-7 below provide example syntax that may be
implemented by video encoder 22 and video decoder 28 to perform
techniques described herein. Example syntax may be implemented by
video encoder 22 and video decoder 28 using hardware, software,
firmware, or any combination thereof.
[0058] As described above, video encoder 22 may signal the unit
used to signal the size of the coded picture in the SPS (Sequence
Parameter set). In one example, this unit may be equal to the size
of the smallest CU size that is allowed in the coded video
sequence. In this example, if the smallest CU size varies in a
coded bitstream between pictures in a group of pictures, the
smallest CU size shall not be smaller than the size of this unit.
Table 1 below provides an example of SPS raw byte sequence payload
(RBSP) syntax used to signal the minimum smallest CU size for the
coded picture in the video sequence. In the picture parameter set
(PPS), the relative size of the smallest CU size for pictures
referring to this PPS may be signaled.
TABLE 1 Sequence parameter set RBSP syntax

seq_parameter_set_rbsp( ) {                                Descriptor
  profile_idc                                              u(8)
  reserved_zero_8bits /* equal to 0 */                     u(8)
  level_idc                                                u(8)
  seq_parameter_set_id                                     ue(v)
  max_temporal_layers_minus1                               u(3)
  bit_depth_luma_minus8                                    ue(v)
  bit_depth_chroma_minus8                                  ue(v)
  pcm_bit_depth_luma_minus1                                u(4)
  pcm_bit_depth_chroma_minus1                              u(4)
  log2_max_frame_num_minus4                                ue(v)
  pic_order_cnt_type                                       ue(v)
  if( pic_order_cnt_type = = 0 )
    log2_max_pic_order_cnt_lsb_minus4                      ue(v)
  else if( pic_order_cnt_type = = 1 ) {
    delta_pic_order_always_zero_flag                       u(1)
    offset_for_non_ref_pic                                 se(v)
    num_ref_frames_in_pic_order_cnt_cycle                  ue(v)
    for( i = 0; i < num_ref_frames_in_pic_order_cnt_cycle; i++ )
      offset_for_ref_frame[ i ]                            se(v)
  }
  max_num_ref_frames                                       ue(v)
  gaps_in_frame_num_value_allowed_flag                     u(1)
  log2_max_coding_block_size_minus3                        ue(v)
  log2_diff_max_pic_alligned_min_coding_block_size         ue(v)
  log2_min_transform_block_size_minus2                     ue(v)
  log2_diff_max_min_transform_block_size                   ue(v)
  log2_min_pcm_coding_block_size_minus3                    ue(v)
  max_transform_hierarchy_depth_inter                      ue(v)
  max_transform_hierarchy_depth_intra                      ue(v)
  chroma_pred_from_luma_enabled_flag                       u(1)
  loop_filter_across_slice_flag                            u(1)
  sample_adaptive_offset_enabled_flag                      u(1)
  adaptive_loop_filter_enabled_flag                        u(1)
  pcm_loop_filter_disable_flag                             u(1)
  cu_qp_delta_enabled_flag                                 u(1)
  temporal_id_nesting_flag                                 u(1)
  rbsp_trailing_bits( )
}
[0059] In Table 1, syntax element log2_max_coding_block_size_minus3
may specify the maximum size of a coding block. A variable
Log2MaxCUSize may be set equal to:
log2_max_coding_block_size_minus3+3.
[0060] In Table 1, syntax element
log2_diff_max_pic_alligned_min_coding_block_size may specify the
difference between the minimum size of a coding block in the whole
coded video sequence and the maximum size of a coding block. In
some cases, no picture in the coded video sequence shall have a
smallest CU size smaller than the minimum size implied by this
difference.
[0061] A variable Log2SeqMinCUSize may be set equal to
log2_max_coding_block_size_minus3+3-log2_diff_max_pic_alligned_min_coding_block_size.
[0062] This value may range from 0 to
log2_max_coding_block_size_minus3. Variables Log2MaxCUSize and
Log2SeqMinCUSize may be used by video encoder 22 and video decoder
28 in the video coding process.
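The derivations in paragraphs [0059]-[0061] can be sketched directly from the Table 1 syntax elements; the element values below are illustrative (a 64.times.64 LCU, log2 = 6, and an 8.times.8 sequence minimum, log2 = 3).

```python
# Sketch of the variable derivations from Table 1:
#   Log2MaxCUSize    = log2_max_coding_block_size_minus3 + 3
#   Log2SeqMinCUSize = Log2MaxCUSize
#                      - log2_diff_max_pic_alligned_min_coding_block_size

def derive_cu_size_vars(log2_max_coding_block_size_minus3,
                        log2_diff_max_pic_alligned_min_coding_block_size):
    log2_max_cu_size = log2_max_coding_block_size_minus3 + 3
    log2_seq_min_cu_size = (log2_max_cu_size
                            - log2_diff_max_pic_alligned_min_coding_block_size)
    return log2_max_cu_size, log2_seq_min_cu_size

print(derive_cu_size_vars(3, 3))  # (6, 3): 64x64 LCU, 8x8 sequence minimum
```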
[0063] It should be noted that Table 1 includes syntax elements
pic_width_in_luma_samples, pic_height_in_luma_samples, and
log2_min_coding_block_size_minus3, which appear in Table 1 with a
strikethrough. These syntax elements represent an alternative
example where the size of a picture may be signaled by video
encoder 22 in units of pixels. In one example, where a picture size
has a width and height that are each multiples of the width and
height of the ACU size, and the ACU size is equal to the minimum
SCU of a video sequence, as described above, video decoder 28 may
determine whether a bitstream is conforming based on whether the
values of pic_width_in_luma_samples and pic_height_in_luma_samples
are integer multiples of the minimum coding block size derived from
log2_min_coding_block_size_minus3.
[0064] Table 2, below, provides another example of a SPS RBSP
syntax, in accordance with the techniques that may be performed by
video encoder 22 and video decoder 28.
TABLE 2 Sequence parameter set RBSP syntax

seq_parameter_set_rbsp( ) {                                Descriptor
  profile_idc                                              u(8)
  reserved_zero_8bits /* equal to 0 */                     u(8)
  level_idc                                                u(8)
  seq_parameter_set_id                                     ue(v)
  max_temporal_layers_minus1                               u(3)
  pic_width_in_alligned_scu                                ue(v)
  pic_height_in_alligned_scu                               ue(v)
  bit_depth_luma_minus8                                    ue(v)
  bit_depth_chroma_minus8                                  ue(v)
  pcm_bit_depth_luma_minus1                                u(4)
  pcm_bit_depth_chroma_minus1                              u(4)
  log2_max_frame_num_minus4                                ue(v)
  pic_order_cnt_type                                       ue(v)
  if( pic_order_cnt_type = = 0 )
    log2_max_pic_order_cnt_lsb_minus4                      ue(v)
  else if( pic_order_cnt_type = = 1 ) {
    delta_pic_order_always_zero_flag                       u(1)
    offset_for_non_ref_pic                                 se(v)
    num_ref_frames_in_pic_order_cnt_cycle                  ue(v)
    for( i = 0; i < num_ref_frames_in_pic_order_cnt_cycle; i++ )
      offset_for_ref_frame[ i ]                            se(v)
  }
  max_num_ref_frames                                       ue(v)
  gaps_in_frame_num_value_allowed_flag                     u(1)
  log2_max_coding_block_size_minus3                        ue(v)
  log2_diff_max_pic_alligned_min_coding_block_size         ue(v)
  log2_min_transform_block_size_minus2                     ue(v)
  log2_diff_max_min_transform_block_size                   ue(v)
  log2_min_pcm_coding_block_size_minus3                    ue(v)
  max_transform_hierarchy_depth_inter                      ue(v)
  max_transform_hierarchy_depth_intra                      ue(v)
  chroma_pred_from_luma_enabled_flag                       u(1)
  loop_filter_across_slice_flag                            u(1)
  sample_adaptive_offset_enabled_flag                      u(1)
  adaptive_loop_filter_enabled_flag                        u(1)
  pcm_loop_filter_disable_flag                             u(1)
  cu_qp_delta_enabled_flag                                 u(1)
  temporal_id_nesting_flag                                 u(1)
  rbsp_trailing_bits( )
}
[0065] According to Table 2, a width and height of a picture may be
indicated by video encoder 22 relative to a width and height of an
aligned CU. As described above, an aligned CU may be a CU that is
used by video encoder 22 and video decoder 28 to specify a picture
size. That is, the picture width may be a multiple of a width
of an aligned CU. As described above, the aligned CU size may be a
size used to specify a picture size of a decoded picture stored in
a decoded picture buffer (DPB). In some examples, a picture may
contain one or more complete aligned CUs. In some examples, the
aligned CU is an aligned smallest CU (SCU).
[0066] Table 2 specifies the height of a picture as
pic_height_in_alligned_scu and the width of the picture as
pic_width_in_alligned_scu. pic_width_in_alligned_scu may specify
the width of the pictures in the coded video sequence in units of
aligned CUs. pic_height_in_alligned_scu may specify the height of
the pictures in the coded video sequence in units of aligned
CUs.
[0067] log2_max_coding_block_size_minus3 may specify the maximum
size of a coding block. A variable Log2MaxCUSize may be set equal
to log2_max_coding_block_size_minus3+3.
[0068] log2_diff_max_pic_alligned_min_coding_block_size may
specify a difference between a minimum size of a coding block in
the whole coded video sequence and a maximum size of a coding
block. In some examples, no picture may have a smallest CU size
smaller than this minimum.
[0069] Table 3 below provides additional syntax elements for PPS
RBSP that may be implemented by video encoder 22 and video decoder
28 in conjunction with SPS RBSP provided in either Table 1 or Table
2.
TABLE 3 Picture parameter set RBSP syntax

pic_parameter_set_rbsp( ) {                                Descriptor
  pic_parameter_set_id                                     ue(v)
  seq_parameter_set_id                                     ue(v)
  entropy_coding_mode_flag                                 u(1)
  num_temporal_layer_switching_point_flags                 ue(v)
  for( i = 0; i < num_temporal_layer_switching_point_flags; i++ )
    temporal_layer_switching_point_flag[ i ]               u(1)
  num_ref_idx_l0_default_active_minus1                     ue(v)
  num_ref_idx_l1_default_active_minus1                     ue(v)
  pic_init_qp_minus26 /* relative to 26 */                 se(v)
  pic_scu_size_delta                                       ue(v)
  constrained_intra_pred_flag                              u(1)
  slice_granularity                                        u(2)
  shared_pps_info_enabled_flag                             u(1)
  if( shared_pps_info_enabled_flag )
    if( adaptive_loop_filter_enabled_flag )
      alf_param( )
  if( cu_qp_delta_enabled_flag )
    max_cu_qp_delta_depth                                  u(4)
  rbsp_trailing_bits( )
}
[0070] In Table 3, pic_scu_size_delta may specify a minimum size of
a coding unit of the pictures referring to this picture parameter
set. This value may range from 0 to
log2_diff_max_pic_alligned_min_coding_block_size.
[0071] The variable Log2MinCUSize may be set equal to
Log2SeqMinCUSize+pic_scu_size_delta. Alternatively, if the size of
the aligned CU is a maximum of the smallest CU sizes of all the
pictures, the variable Log2MinCUSize may be set equal to
Log2SeqMinCUSize-pic_scu_size_delta. Alternatively, if the aligned
CU size can be of any possible CU size, in this case,
pic_scu_size_delta can be a signed value (se(v)) and the variable
Log2MinCUSize may be set equal to
Log2SeqMinCUSize-pic_scu_size_delta.
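The per-picture derivation of paragraph [0071] can be sketched as follows, covering both the default case (aligned CU is the minimum SCU, delta added) and the alternative where the aligned CU is the maximum of the smallest CU sizes (delta subtracted). Names and values are illustrative.

```python
# Sketch of deriving Log2MinCUSize for pictures referring to a PPS:
#   default:     Log2MinCUSize = Log2SeqMinCUSize + pic_scu_size_delta
#   alternative: Log2MinCUSize = Log2SeqMinCUSize - pic_scu_size_delta
#                (when the aligned CU is the maximum smallest-CU size)

def log2_min_cu_size(log2_seq_min_cu_size, pic_scu_size_delta,
                     acu_is_max_scu=False):
    """Log2MinCUSize for pictures referring to a given PPS."""
    if acu_is_max_scu:
        return log2_seq_min_cu_size - pic_scu_size_delta
    return log2_seq_min_cu_size + pic_scu_size_delta

print(log2_min_cu_size(2, 1))  # 3 -> these pictures use 8x8 smallest CUs
```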
[0072] In addition to the examples described above, in one
example, an LCU size for a video sequence may be defined as N by N
and the ACU size, selected according to one of the examples
described above, may be defined as M by M. In this case, the
picture size, signaled by video encoder 22 in units of the LCU
size, may be defined as WL by HL. Thus, the picture size with
respect to the aligned CU size may be derived by video decoder 28
according to the following equation:
(WL*N-crop_right_offset+M-1)/M*M by
(HL*N-crop_bottom_offset+M-1)/M*M, wherein crop_right_offset and
crop_bottom_offset are signaled by video encoder 22 in the cropping
window and are the numbers of pixels cropped from the right and
bottom boundaries, respectively. It should be noted that WL may be
the value of pic_width_in_LCU and HL the value of
pic_height_in_LCU in Table 5 below. It should also be noted that
the operations (e.g., divisions) in the equation above may be
integer calculations.
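The derivation in paragraph [0072] can be checked with a short sketch using integer arithmetic throughout, as the text notes; the example values (a 30.times.17 LCU grid of 64.times.64 LCUs, i.e., 1920.times.1088 pixels, with 8 rows cropped and an 8.times.8 aligned CU) are illustrative.

```python
# Sketch of deriving the picture size with respect to the aligned CU:
#   width  = (WL*N - crop_right_offset  + M - 1) / M * M
#   height = (HL*N - crop_bottom_offset + M - 1) / M * M
# with integer division, per paragraph [0072].

def derived_picture_size(WL, HL, N, M, crop_right_offset, crop_bottom_offset):
    width = (WL * N - crop_right_offset + M - 1) // M * M
    height = (HL * N - crop_bottom_offset + M - 1) // M * M
    return width, height

# 30x17 LCUs of 64x64 (1920x1088), 8 bottom rows cropped, 8x8 aligned CU.
print(derived_picture_size(30, 17, 64, 8, 0, 8))  # (1920, 1080)
```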
[0073] Table 4 below provides another example of additional syntax
elements for seq_parameter_set_rbsp( ). In this example, the sizes
of one or more pictures may be signaled by video encoder 22
relative to a size of a largest coding unit (LCU). The sizes of the
one or more pictures may be signaled by video encoder 22 in the
sequence parameter set, for example.
[0074] The picture size may also be signaled by video encoder 22
with a num_right_offset_ACU and num_bottom_offset_ACU, so the
picture size is (WL*N-M*num_right_offset_ACU) by
(HL*N-M*num_bottom_offset_ACU). These two parameters can be
signaled in the SPS or PPS. The decoded picture is to be stored in
the decoded picture buffer with a picture size with respect to the
aligned CU, which is (WL*N-num_crop_acu_right*M) by
(HL*N-num_crop_acu_bottom*M).
[0075] In some examples, a cropping window may be further signaled
by video encoder 22. A cropping window may define at least a right
side or a bottom side of a picture or other video block to be
cropped. However, since the cropping window can be further
signaled, the cropping window may be used to identify the picture
size when the aligned CU size is known.
TABLE-US-00004 TABLE 4
Sequence parameter set RBSP syntax
seq_parameter_set_rbsp( ) {                                Descriptor
  profile_idc                                              u(8)
  reserved_zero_8bits /* equal to 0 */                     u(8)
  level_idc                                                u(8)
  seq_parameter_set_id                                     ue(v)
  max_temporal_layers_minus1                               u(3)
  pic_width_in_LCU                                         ue(v)
  pic_height_in_LCU                                        ue(v)
  num_crop_acu_right                                       ue(v)
  num_crop_acu_bottom                                      ue(v)
  bit_depth_luma_minus8                                    ue(v)
  bit_depth_chroma_minus8                                  ue(v)
  pcm_bit_depth_luma_minus1                                u(4)
  pcm_bit_depth_chroma_minus1                              u(4)
  log2_max_frame_num_minus4                                ue(v)
  pic_order_cnt_type                                       ue(v)
  if( pic_order_cnt_type = = 0 )
    log2_max_pic_order_cnt_lsb_minus4                      ue(v)
  else if( pic_order_cnt_type = = 1 ) {
    delta_pic_order_always_zero_flag                       u(1)
    offset_for_non_ref_pic                                 se(v)
    num_ref_frames_in_pic_order_cnt_cycle                  ue(v)
    for( i = 0; i < num_ref_frames_in_pic_order_cnt_cycle; i++ )
      offset_for_ref_frame[ i ]                            se(v)
  }
  max_num_ref_frames                                       ue(v)
  gaps_in_frame_num_value_allowed_flag                     u(1)
  log2_max_coding_block_size_minus3                        ue(v)
  log2_diff_max_pic_alligned_min_coding_block_size         ue(v)
  log2_min_transform_block_size_minus2                     ue(v)
  log2_diff_max_min_transform_block_size                   ue(v)
  log2_min_pcm_coding_block_size_minus3                    ue(v)
  max_transform_hierarchy_depth_inter                      ue(v)
  max_transform_hierarchy_depth_intra                      ue(v)
  chroma_pred_from_luma_enabled_flag                       u(1)
  loop_filter_across_slice_flag                            u(1)
  sample_adaptive_offset_enabled_flag                      u(1)
  adaptive_loop_filter_enabled_flag                        u(1)
  pcm_loop_filter_disable_flag                             u(1)
  cu_qp_delta_enabled_flag                                 u(1)
  temporal_id_nesting_flag                                 u(1)
  frame_cropping_flag                                      u(1)
  if( frame_cropping_flag ) {
    frame_crop_left_offset                                 ue(v)
    frame_crop_right_offset                                ue(v)
    frame_crop_top_offset                                  ue(v)
    frame_crop_bottom_offset                               ue(v)
  }
  rbsp_trailing_bits( )
}
[0076] In the example shown in Table 4, the size of a picture in
terms of width and height is given in terms of a largest coding
unit (LCU). That is, pic_width_in_LCU may specify a size in pixels of
one or more pictures relative to an LCU. Similarly,
pic_height_in_LCU may specify a size in pixels of one or more
pictures relative to an LCU. The syntax element num_crop_acu_right
may be signaled in the cropping window and define a number of
pixels to be cropped on a right side of a picture or other video
block. Similarly, the syntax element num_crop_acu_bottom may be
signaled in the cropping window and define a number of pixels to be
cropped on a bottom side of a picture or other video block. In
other examples, other sides of cropping windows are signaled.
[0077] An example is provided for illustrative purposes only. In
this example, the LCU size is N by N and the aligned CU size is M
by M. The picture size, signaled in units of the LCU size, is given
as WL by HL. In this example, WL is the value of pic_width_in_LCU
and HL is the value of pic_height_in_LCU.
Crop_right_offset may define a number of pixels to crop on a right
side, and may be equal to num_crop_acu_right. Crop_bottom_offset
may define a number of pixels to crop on a bottom side, and may be
equal to num_crop_acu_bottom.
[0078] From the picture size relative to the LCU size and the
aligned CU size, the picture size with respect to the aligned CU
(ACU) size can be determined from the following equations:

width of picture relative to ACU=((WL*N-crop_right_offset+M-1)/M)*M (1)

height of picture relative to ACU=((HL*N-crop_bottom_offset+M-1)/M)*M (2)
[0079] It should be noted that the operations in equations 1 and 2
may be integer calculations.
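Equations (1) and (2) can be sketched with Python integer arithmetic as follows; the function and argument names are illustrative, not part of the disclosed syntax:

```python
def picture_size_relative_to_acu(wl, hl, n, m, crop_right, crop_bottom):
    """Round the cropped picture dimensions up to the next multiple of the
    aligned CU size M, per equations (1) and (2), using integer division."""
    width = (wl * n - crop_right + m - 1) // m * m
    height = (hl * n - crop_bottom + m - 1) // m * m
    return width, height

# e.g. 30x17 LCUs of 64x64 (1920x1088 LCU-aligned), 8 pixels cropped at
# the bottom, aligned CU size M = 8 -> stored size 1920x1080
assert picture_size_relative_to_acu(30, 17, 64, 8, 0, 8) == (1920, 1080)
```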
[0080] Table 5 provides yet another example of additional syntax
elements for pic_parameter_set_rbsp( ). In this example, at least
one of num_right_offset_ACU and num_bottom_offset_ACU may be
signaled. Table 5 shows num_right_offset_ACU and
num_bottom_offset_ACU being signaled in the SPS; however, these
values may be signaled elsewhere. For example, at least one of
num_right_offset_ACU and num_bottom_offset_ACU may be signaled in a
PPS.
TABLE-US-00005 TABLE 5
Sequence parameter set RBSP syntax
seq_parameter_set_rbsp( ) {                                Descriptor
  profile_idc                                              u(8)
  reserved_zero_8bits /* equal to 0 */                     u(8)
  level_idc                                                u(8)
  seq_parameter_set_id                                     ue(v)
  max_temporal_layers_minus1                               u(3)
  pic_width_in_LCU                                         ue(v)
  pic_height_in_LCU                                        ue(v)
  num_crop_acu_right                                       ue(v)
  num_crop_acu_bottom                                      ue(v)
  bit_depth_luma_minus8                                    ue(v)
  bit_depth_chroma_minus8                                  ue(v)
  pcm_bit_depth_luma_minus1                                u(4)
  pcm_bit_depth_chroma_minus1                              u(4)
  log2_max_frame_num_minus4                                ue(v)
  pic_order_cnt_type                                       ue(v)
  if( pic_order_cnt_type = = 0 )
    log2_max_pic_order_cnt_lsb_minus4                      ue(v)
  else if( pic_order_cnt_type = = 1 ) {
    delta_pic_order_always_zero_flag                       u(1)
    offset_for_non_ref_pic                                 se(v)
    num_ref_frames_in_pic_order_cnt_cycle                  ue(v)
    for( i = 0; i < num_ref_frames_in_pic_order_cnt_cycle; i++ )
      offset_for_ref_frame[ i ]                            se(v)
  }
  max_num_ref_frames                                       ue(v)
  gaps_in_frame_num_value_allowed_flag                     u(1)
  log2_max_coding_block_size_minus3                        ue(v)
  log2_diff_max_pic_alligned_min_coding_block_size         ue(v)
  log2_min_transform_block_size_minus2                     ue(v)
  log2_diff_max_min_transform_block_size                   ue(v)
  log2_min_pcm_coding_block_size_minus3                    ue(v)
  max_transform_hierarchy_depth_inter                      ue(v)
  max_transform_hierarchy_depth_intra                      ue(v)
  chroma_pred_from_luma_enabled_flag                       u(1)
  loop_filter_across_slice_flag                            u(1)
  sample_adaptive_offset_enabled_flag                      u(1)
  adaptive_loop_filter_enabled_flag                        u(1)
  pcm_loop_filter_disable_flag                             u(1)
  cu_qp_delta_enabled_flag                                 u(1)
  temporal_id_nesting_flag                                 u(1)
  frame_cropping_flag                                      u(1)
  if( frame_cropping_flag ) {
    frame_crop_left_offset                                 ue(v)
    frame_crop_right_offset                                ue(v)
    frame_crop_top_offset                                  ue(v)
    frame_crop_bottom_offset                               ue(v)
  }
  rbsp_trailing_bits( )
}
[0081] The value num_crop_acu_right in Table 5 may specify a number
of aligned CU sizes to be cropped from the LCU aligned picture from
the right. The cropped picture may be stored in the DPB. The value
num_crop_acu_bottom may specify a number of aligned CU sizes to be
cropped from the LCU aligned picture from the bottom, to get the
picture to be stored in the DPB.
[0082] In an example corresponding with Table 5, the picture size
can also be signaled with a num_right_offset_ACU and
num_bottom_offset_ACU. The picture size may be determined as:

width of picture relative to ACU=WL*N-M*num_right_offset_ACU (3)

height of picture relative to ACU=HL*N-M*num_bottom_offset_ACU (4)

[0083] A decoded picture that may be stored in a decoded picture
buffer with a picture size with respect to the aligned CU may be
given as follows:

(WL*N-num_crop_acu_right*M) by (HL*N-num_crop_acu_bottom*M) (5)
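The offset-based form of equations (3)-(5) can be sketched as follows; the function and argument names are illustrative, not part of the disclosed syntax:

```python
def picture_size_from_acu_offsets(wl, hl, n, m,
                                  num_right_offset_acu,
                                  num_bottom_offset_acu):
    """Picture size when the right/bottom offsets are counted in units of
    the aligned CU size M, per equations (3)-(5)."""
    width = wl * n - m * num_right_offset_acu
    height = hl * n - m * num_bottom_offset_acu
    return width, height

# e.g. 30x17 LCUs of 64x64 with M = 8: cropping one aligned-CU row from
# the bottom yields the 1920x1080 size stored in the decoded picture buffer
assert picture_size_from_acu_offsets(30, 17, 64, 8, 0, 1) == (1920, 1080)
```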
[0084] Thus, a size (height and width in pixels) of the aligned CU
may be signaled in the same way as in the examples above with
respect to the picture size. For example, if all pictures have the
same smallest CU (SCU) size, the size of the aligned CU may be the
SCU size. As another example, if the pictures have different SCU
sizes, the aligned CU size may be a maximum or a minimum SCU size
among all the pictures. The aligned CU size may be signaled
explicitly in at least one of the SPS or the PPS. The aligned CU
size may be equal to or less than the size of the LCU and equal to
or larger than the size of the SCU.
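The selection of the aligned CU size described above can be sketched as follows; the function name and flag are illustrative, not part of the disclosed syntax:

```python
def aligned_cu_size(scu_sizes, lcu_size, use_max=True):
    """Pick the aligned CU size from the per-picture smallest-CU sizes.

    If every picture shares one SCU size, that size results; otherwise the
    maximum (or minimum) SCU size across the pictures is taken. The result
    lies between the SCU size and the LCU size, as required above.
    """
    acu = max(scu_sizes) if use_max else min(scu_sizes)
    assert min(scu_sizes) <= acu <= lcu_size
    return acu

# pictures with 4x4 and 8x8 smallest CUs, LCU size 64x64
assert aligned_cu_size([4, 8, 4], 64, use_max=True) == 8
assert aligned_cu_size([4, 8, 4], 64, use_max=False) == 4
assert aligned_cu_size([8, 8], 64) == 8  # all pictures share one SCU size
```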
[0085] Table 6 below provides one example of frame cropping syntax
that may be used in conjunction with any of the example embodiments
described above. In one example, the cropping window may be
signaled in the sequence parameter set and follow the same
semantics as those in H.264/AVC.
TABLE-US-00006 TABLE 6
Frame cropping syntax
frame_cropping_flag                                        0  u(1)
if( frame_cropping_flag ) {
  frame_crop_left_offset                                   0  ue(v)
  frame_crop_right_offset                                  0  ue(v)
  frame_crop_top_offset                                    0  ue(v)
  frame_crop_bottom_offset                                 0  ue(v)
}
[0086] FIG. 2 is a block diagram illustrating an example video
encoder that may be configured to perform the techniques described
in this disclosure. Video encoder 50 may be configured to determine
a smallest coding unit size for each of a plurality of pictures
defining a video sequence, wherein a smallest coding unit size is
selected from a plurality of possible coding unit sizes. Further,
video encoder 50 may be configured to determine a minimum coding
unit size for the video sequence based on the smallest coding unit
size determined for each of the plurality of pictures defining the
video sequence. In addition, video encoder 50 may be configured to
determine a picture size associated with the video sequence,
wherein the picture size associated with the video sequence is a
multiple of the minimum coding unit size value. Moreover, video
encoder 50 may be configured to signal the minimum coding unit size
value in sequence level syntax information.
[0087] Video encoder 50 may correspond to video encoder 22 of
device 20, or a video encoder of a different device. As shown in
FIG. 2, video encoder 50 may include a prediction encoding module
32, quadtree partition module 31, adders 48 and 51, and a memory
34. Video encoder 50 may also include a transform module 38 and a
quantization module 40, as well as an inverse quantization module
42 and an inverse transform module 44. Video encoder 50 may also
include an entropy coding module 46, and a filter module 47, which
may include deblock filters and post loop and/or in loop filters.
The encoded video data and syntax information that defines the
manner of the encoding may be communicated to entropy encoding
module 46, which performs entropy encoding on the bitstream.
[0088] As shown in FIG. 2, prediction encoding module 32 may
support a plurality of different coding modes 35 used in the
encoding of video blocks. Prediction encoding module 32 may also
comprise a motion estimation (ME) module 36 and a motion
compensation (MC) module 37.
[0089] During the encoding process, video encoder 50 receives input
video data. Quadtree partition module 31 may partition units of
video data into smaller units. For example, quadtree partition
module 31 may break an LCU into smaller CU's and PU's according to
HEVC partitioning described above. Prediction encoding module 32
performs predictive coding techniques on video blocks (e.g., CUs
and PUs). For inter coding, prediction encoding module 32 compares
CUs or PUs to various predictive candidates in one or more video
reference frames or slices (e.g., one or more "lists" of reference
data) in order to define a predictive block. For intra coding,
prediction encoding module 32 generates a predictive block based on
neighboring data within the same video frame or slice. Prediction
encoding module 32 outputs the prediction block and adder 48
subtracts the prediction block from the CU or PU being coded in
order to generate a residual block. At least some video blocks may
be coded using advanced motion vector prediction (AMVP) described
in HEVC.
[0090] In some cases, prediction encoding module 32 may include a
rate-distortion (R-D) module that compares coding results of video
blocks (e.g., CUs or PUs) in different modes. In this case,
prediction encoding module 32 may also include a mode selection
module to analyze the coding results in terms of coding rate (i.e.,
coding bits required for the block) and distortion (e.g.,
representing the video quality of the coded block relative to the
original block) in order to make mode selections for video blocks.
In this way, the R-D module may provide analysis of the results of
different modes to allow the mode selection module to select the
desired mode for different video blocks.
[0091] Referring again to FIG. 2, after prediction encoding module
32 outputs the prediction block, and after adder 48 subtracts the
prediction block from the video block being coded in order to
generate a residual block of residual pixel values, transform
module 38 applies a transform to the residual block. The transform
may comprise a discrete cosine transform (DCT) or a conceptually
similar transform such as that defined by the ITU H.264 standard or
the HEVC standard. So-called "butterfly" structures may be defined
to perform the transforms, or matrix-based multiplication could
also be used. In some examples, consistent with the HEVC standard,
the size of the transform may vary for different CUs, e.g.,
depending on the level of partitioning that occurs with respect to
a given LCU. Transform units (TUs) may be defined in order to set
the transform size applied by transform module 38. Wavelet
transforms, integer transforms, sub-band transforms or other types
of transforms could also be used. In any case, transform module 38
applies the transform to the residual block, producing a block of
residual transform coefficients. The transform, in general, may
convert the residual information from a pixel domain to a frequency
domain.
[0092] Quantization module 40 then quantizes the residual transform
coefficients to further reduce bit rate. Quantization module 40,
for example, may limit the number of bits used to code each of the
coefficients. In particular, quantization module 40 may apply the
delta QP defined for the LCU so as to define the level of
quantization to apply (such as by combining the delta QP with the
QP of the previous LCU or some other known QP). After quantization
is performed on residual samples, entropy coding module 46 may scan
and entropy encode the data.
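One simple interpretation of the delta QP combination described above can be sketched as follows; the function name and the clipping range are assumptions for illustration, not taken from the disclosure:

```python
def lcu_qp(prev_qp, delta_qp, qp_min=0, qp_max=51):
    """Combine a signaled delta QP with the QP of the previous LCU (or
    some other known QP) to obtain the quantization level for this LCU.
    The [qp_min, qp_max] clipping range is an assumed bound."""
    return min(max(prev_qp + delta_qp, qp_min), qp_max)

assert lcu_qp(30, -4) == 26   # finer quantization than the previous LCU
assert lcu_qp(50, 5) == 51    # clipped to the assumed maximum
```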
[0093] CAVLC is one type of entropy coding technique supported by
the ITU H.264 standard and the emerging HEVC standard, which may be
applied on a vectorized basis by entropy coding module 46. CAVLC
uses variable length coding (VLC) tables in a manner that
effectively compresses serialized "runs" of coefficients and/or
syntax elements. CABAC is another type of entropy coding technique
supported by the ITU H.264 standard or the HEVC standard, which may
be applied on a vectorized basis by entropy coding module 46. CABAC
may involve several stages, including binarization, context model
selection, and binary arithmetic coding. In this case, entropy
coding module 46 codes coefficients and syntax elements according
to CABAC. Many other types of entropy coding techniques also exist,
and new entropy coding techniques will likely emerge in the future.
This disclosure is not limited to any specific entropy coding
technique.
[0094] Following the entropy coding by entropy encoding module 46,
the encoded video may be transmitted to another device or archived
for later transmission or retrieval. The encoded video may comprise
the entropy coded vectors and various syntax information. Such
information can be used by the decoder to properly configure the
decoding process. Inverse quantization module 42 and inverse
transform module 44 apply inverse quantization and inverse
transform, respectively, to reconstruct the residual block in the
pixel domain. Summer 51 adds the reconstructed residual block to
the prediction block produced by prediction encoding module 32 to
produce a reconstructed video block for storage in memory 34.
Memory 34 may include a decoded picture buffer and reconstructed
video blocks may form a decoded picture. Prior to such storage,
however, filter module 47 may apply filtering to the video block to
improve video quality. The filtering applied by filter module 47
may reduce artifacts and smooth pixel boundaries. Moreover,
filtering may improve compression by generating predictive video
blocks that comprise close matches to video blocks being coded.
[0095] FIG. 3 is a flowchart illustrating an example technique for
encoding video data that may be performed by video encoder 22 or
video encoder 50. Video encoder 22 or video encoder 50 may
determine a smallest coding unit size for each of a plurality of
pictures defining a video sequence (302). In some cases, a smallest
coding unit size may be selected from a plurality of possible
coding unit sizes. For example, the smallest coding unit may be one
of 4x4, 8x8, 16x16, 32x32 or 64x64, where 64x64 is the maximum
possible coding unit size. Video encoder 22 or video encoder 50 may
determine an aligned coding unit size for the video sequence from
the determined smallest coding units (304). Video encoder 22 or
video encoder 50 may determine the aligned coding unit size based
on the techniques described above. Video encoder 22 or video
encoder 50 may determine a picture size associated with the video
sequence, wherein the picture size associated with the video
sequence is a multiple of the aligned coding unit size value (306).
In some cases, the picture size associated with the video sequence
may be a picture size of a decoded picture stored in a decoded
picture buffer. Video encoder 22 or video encoder 50 may signal the
aligned coding unit size value in sequence level syntax information
(308).
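Steps 302-308 can be sketched as follows, assuming the aligned CU size is taken as the minimum SCU size and the picture size is padded up to a multiple of it; the function and dictionary key names are illustrative:

```python
def encoder_sequence_syntax(per_picture_scu_sizes, pic_width, pic_height):
    """Sketch of the FIG. 3 flow: derive the aligned CU size for the
    sequence (here, the minimum of the per-picture smallest CU sizes,
    step 304) and a picture size that is a multiple of it (step 306)."""
    acu = min(per_picture_scu_sizes)                 # step 304
    pad = lambda v: (v + acu - 1) // acu * acu       # round up to multiple
    padded = (pad(pic_width), pad(pic_height))       # step 306
    # both values would then be signaled in sequence level syntax (308)
    return {"aligned_cu_size": acu, "picture_size": padded}

syntax = encoder_sequence_syntax([4, 8], 1918, 1080)
assert syntax == {"aligned_cu_size": 4, "picture_size": (1920, 1080)}
```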
[0096] FIG. 4 is a block diagram illustrating an example of a video
decoder 60, which decodes a video sequence that is encoded in the
manner described herein. The techniques of this disclosure may be
performed by video decoder 60 in some examples. Video decoder 60
may be configured to obtain a coded video sequence including a
first picture coded using a first smallest coding unit size and a
second picture coded using a second smallest coding unit size.
Further, video decoder 60 may be configured to obtain a picture
size of a decoded picture to be stored in a decoded picture buffer
wherein the picture size is a multiple of one of the first coding
unit size and the second coding unit size. In addition, video
decoder 60 may be configured to store the decoded picture in a
decoded picture buffer.
[0097] Video decoder 60 includes an entropy decoding module 52,
which performs the reciprocal decoding function of the encoding
performed by entropy encoding module 46 of FIG. 2. In particular,
entropy decoding module 52 may perform CAVLC or CABAC decoding, or
any other type of entropy decoding used by video encoder 50. Video
decoder 60 also includes a prediction decoding module 54, an
inverse quantization module 56, an inverse transform module 58, a
memory 62, and a summer 64. In particular, like video encoder 50,
video decoder 60 includes a prediction decoding module 54 and a
filter module 57. Prediction decoding module 54 of video decoder 60
may include motion compensation module 86, which decodes inter
coded blocks and possibly includes one or more interpolation
filters for sub-pixel interpolation in the motion compensation
process. Prediction decoding module 54 may also include an intra
prediction module for decoding intra modes. Prediction decoding
module 54 may support a plurality of modes 35. Filter module 57 may
filter the output of summer 64, and may receive entropy decoded
filter information so as to define the filter coefficients applied
in the loop filtering.
[0098] Upon receiving encoded video data, entropy decoding module
52 performs reciprocal decoding to the encoding performed by
entropy encoding module 46 (of encoder 50 in FIG. 2). At the
decoder, entropy decoding module 52 parses the bitstream to
determine LCU's and the corresponding partitioning associated with
the LCU's. In some examples, an LCU or the CUs of the LCU may
define coding modes that were used, and these coding modes may
include the bi-predictive merge mode. Accordingly, entropy decoding
module 52 may forward the syntax information to the prediction
decoding module that identifies the bi-predictive merge mode.
Memory 62 may include a decoded picture buffer. The decoded picture
buffer may store a decoded picture. The decoded picture may be
associated with a video sequence such that the decoded picture is
referenced during prediction decoding. Syntax information may be
used by video
decoder 60 to determine the size of the decoded picture to be
stored in the decoded picture buffer according to the techniques
described herein.
[0099] FIG. 5 is a flowchart illustrating an example technique for
decoding video data that may be performed by video decoder 28 or
video decoder 60. Video decoder 28 or video decoder 60 may obtain a
coded video sequence including a first picture coded using a first
smallest coding unit size and a second picture coded using a second
smallest coding unit size (502). In one example, the first picture
may be coded using a smallest coding unit size of 4x4 and the
second picture may be coded using a smallest coding unit size of
8x8. Video decoder 28 or video decoder 60 may obtain a picture size
of a decoded picture to be stored in a decoded picture buffer,
wherein the picture size is a multiple of one of the first coding
unit size, the second coding unit size, or a maximum coding unit
size (504). In one example, the picture size may be 1920x1080.
Video decoder 28 or video decoder 60 may store the decoded picture
in a decoded picture buffer (506). Further, video decoder 28 or
video decoder 60 may determine whether a bitstream including a
video sequence is a conforming bitstream based on whether the
obtained picture size is a multiple of the aligned coding unit
size.
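The conformance determination described above can be sketched as follows; the function name is illustrative:

```python
def is_conforming(picture_size, aligned_cu_size):
    """Bitstream conformance check sketched above: the obtained picture
    size must be a multiple of the aligned CU size in both dimensions."""
    width, height = picture_size
    return width % aligned_cu_size == 0 and height % aligned_cu_size == 0

assert is_conforming((1920, 1080), 8)        # 1080 is a multiple of 8
assert not is_conforming((1920, 1080), 16)   # 1080 is not a multiple of 16
```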
[0100] In one or more examples, the functions described may be
implemented in hardware, software, firmware, or any combination
thereof. If implemented in software, the functions may be stored on
or transmitted over as one or more instructions or code on a
computer-readable medium and executed by a hardware-based
processing unit. Computer-readable media may include
computer-readable storage media, which corresponds to a tangible
medium such as data storage media, or communication media including
any medium that facilitates transfer of a computer program from one
place to another, e.g., according to a communication protocol. In
this manner, computer-readable media generally may correspond to
(1) tangible computer-readable storage media which is
non-transitory or (2) a communication medium such as a signal or
carrier wave. Data storage media may be any available media that
can be accessed by one or more computers or one or more processors
to retrieve instructions, code and/or data structures for
implementation of the techniques described in this disclosure. A
computer program product may include a computer-readable
medium.
[0101] By way of example, and not limitation, such
computer-readable storage media can comprise RAM, ROM, EEPROM,
CD-ROM or other optical disk storage, magnetic disk storage, or
other magnetic storage devices, flash memory, or any other medium
that can be used to store desired program code in the form of
instructions or data structures and that can be accessed by a
computer. Also, any connection is properly termed a
computer-readable medium. For example, if instructions are
transmitted from a website, server, or other remote source using a
coaxial cable, fiber optic cable, twisted pair, digital subscriber
line (DSL), or wireless technologies such as infrared, radio, and
microwave, then the coaxial cable, fiber optic cable, twisted pair,
DSL, or wireless technologies such as infrared, radio, and
microwave are included in the definition of medium. It should be
understood, however, that computer-readable storage media and data
storage media do not include connections, carrier waves, signals,
or other transitory media, but are instead directed to
non-transitory, tangible storage media. Disk and disc, as used
herein, include compact disc (CD), laser disc, optical disc,
digital versatile disc (DVD), floppy disk and Blu-ray disc, where
disks usually reproduce data magnetically, while discs reproduce
data optically with lasers. Combinations of the above should also
be included within the scope of computer-readable media.
[0102] Instructions may be executed by one or more processors, such
as one or more digital signal processors (DSPs), general purpose
microprocessors, application specific integrated circuits (ASICs),
field programmable logic arrays (FPGAs), or other equivalent
integrated or discrete logic circuitry. Accordingly, the term
"processor," as used herein may refer to any of the foregoing
structure or any other structure suitable for implementation of the
techniques described herein. In addition, in some aspects, the
functionality described herein may be provided within dedicated
hardware and/or software modules configured for encoding and
decoding, or incorporated in a combined codec. Also, the techniques
could be fully implemented in one or more circuits or logic
elements.
[0103] The techniques of this disclosure may be implemented in a
wide variety of devices or apparatuses, including a wireless
handset, an integrated circuit (IC) or a set of ICs (e.g., a chip
set). Various components, modules, or units are described in this
disclosure to emphasize functional aspects of devices configured to
perform the disclosed techniques, but do not necessarily require
realization by different hardware units. Rather, as described
above, various units may be combined in a codec hardware unit or
provided by a collection of interoperative hardware units,
including one or more processors as described above, in conjunction
with suitable software and/or firmware.
[0104] Various aspects of the disclosure have been described. These
and other aspects are within the scope of the following claims.
* * * * *