U.S. patent application number 13/648174 was filed with the patent office on 2013-04-11 for adaptive frame size support in advanced video codecs.
This patent application is currently assigned to QUALCOMM INCORPORATED. The applicant listed for this patent is Qualcomm Incorporated. The invention is credited to Ying Chen, Marta Karczewicz, and Ye-Kui Wang.
Application Number: 20130089154 (13/648174)
Family ID: 48042061
Filed Date: 2013-04-11

United States Patent Application 20130089154
Kind Code: A1
Chen; Ying; et al.
April 11, 2013
ADAPTIVE FRAME SIZE SUPPORT IN ADVANCED VIDEO CODECS
Abstract
Techniques are described related to receiving first and second
sub-sequences of video, wherein the first sub-sequence includes one
or more frames each having a first resolution, and the second
sub-sequence includes one or more frames each having a second
resolution, receiving a first sequence parameter set and a second
sequence parameter set for the coded video sequence, wherein the
first sequence parameter set indicates the first resolution of the
one or more frames of the first sub-sequence, and the second
sequence parameter set indicates the second resolution of the one
or more frames of the second sub-sequence, and wherein the first
sequence parameter set is different than the second sequence
parameter set, and using the first sequence parameter set and the
second sequence parameter set to decode the coded video
sequence.
Inventors: Chen; Ying (San Diego, CA); Wang; Ye-Kui (San Diego, CA); Karczewicz; Marta (San Diego, CA)
Applicant: Qualcomm Incorporated; San Diego, CA, US
Assignee: QUALCOMM INCORPORATED, San Diego, CA
Family ID: 48042061
Appl. No.: 13/648174
Filed: October 9, 2012
Related U.S. Patent Documents

Application Number: 61550276, Filing Date: Oct 21, 2011
Application Number: 61545525, Filing Date: Oct 10, 2011
Current U.S. Class: 375/240.25; 375/240.01; 375/E7.027
Current CPC Class: H04N 19/423 20141101; H04N 19/573 20141101; H04N 19/58 20141101; H04N 19/70 20141101; H04N 19/33 20141101; H04N 19/46 20141101
Class at Publication: 375/240.25; 375/240.01; 375/E07.027
International Class: H04N 7/12 20060101 H04N007/12
Claims
1. A method of decoding video data, the method comprising:
receiving a coded video sequence comprising a first sub-sequence
and a second sub-sequence, wherein the first sub-sequence includes
one or more frames each having a first resolution, and the second
sub-sequence includes one or more frames each having a second
resolution, and wherein the first sub-sequence is different than
the second sub-sequence, and the first resolution is different than
the second resolution; receiving a first sequence parameter set and
a second sequence parameter set for the coded video sequence,
wherein the first sequence parameter set indicates the first
resolution of the one or more frames of the first sub-sequence, and
the second sequence parameter set indicates the second resolution
of the one or more frames of the second sub-sequence, and wherein
the first sequence parameter set is different than the second
sequence parameter set; and using the first sequence parameter set
and the second sequence parameter set to decode the coded video
sequence.
2. The method of claim 1, wherein the first sequence parameter set
and the second sequence parameter set are coded in a received
bitstream prior to either the first sub-sequence or the second
sub-sequence.
3. The method of claim 1, wherein receiving the first sequence
parameter set and the second sequence parameter set of the coded
video sequence comprises: receiving both the first sequence
parameter set and the second sequence parameter set prior to
receiving either of the first sub-sequence and the second
sub-sequence.
4. The method of claim 1, wherein the first sequence parameter set
is coded in a received bitstream prior to the first sub-sequence
and the second sequence parameter set is coded in the received
bitstream after at least one frame of the one or more frames of the
first sub-sequence, and prior to the second sub-sequence.
5. The method of claim 1, wherein receiving the first sequence
parameter set and the second sequence parameter set of the coded
video sequence comprises: receiving the second sequence parameter
set after receiving at least one frame of the one or more frames of
the first sub-sequence, and prior to receiving the second
sub-sequence.
6. The method of claim 1, wherein the one or more frames of the
first sub-sequence and the one or more frames of the second
sub-sequence are interleaved in the coded video sequence.
7. The method of claim 1, wherein the first resolution and the
second resolution each comprise a spatial resolution.
8. An apparatus for decoding video data, the apparatus comprising a
video decoder configured to: receive a coded video sequence
comprising a first sub-sequence and a second sub-sequence, wherein
the first sub-sequence includes one or more frames each having a
first resolution, and the second sub-sequence includes one or more
frames each having a second resolution, and wherein the first
sub-sequence is different than the second sub-sequence, and the
first resolution is different than the second resolution; receive a
first sequence parameter set and a second sequence parameter set
for the coded video sequence, wherein the first sequence parameter
set indicates the first resolution of the one or more frames of the
first sub-sequence, and the second sequence parameter set indicates
the second resolution of the one or more frames of the second
sub-sequence, and wherein the first sequence parameter set is
different than the second sequence parameter set; and use the first
sequence parameter set and the second sequence parameter set to
decode the coded video sequence.
9. The apparatus of claim 8, wherein the first sequence parameter
set and the second sequence parameter set are coded in a received
bitstream prior to either the first sub-sequence or the second
sub-sequence.
10. The apparatus of claim 8, wherein to receive the first sequence
parameter set and the second sequence parameter set of the coded
video sequence, the apparatus is configured to: receive both the
first sequence parameter set and the second sequence parameter set
prior to receiving either of the first sub-sequence and the second
sub-sequence.
11. The apparatus of claim 8, wherein the first sequence parameter
set is coded in a received bitstream prior to the first
sub-sequence and the second sequence parameter set is coded in the
received bitstream after at least one frame of the one or more
frames of the first sub-sequence, and prior to the second
sub-sequence.
12. The apparatus of claim 8, wherein to receive the first sequence
parameter set and the second sequence parameter set of the coded
video sequence, the apparatus is configured to: receive the second
sequence parameter set after receiving at least one frame of the
one or more frames of the first sub-sequence, and prior to
receiving the second sub-sequence.
13. The apparatus of claim 8, wherein the one or more frames of the
first sub-sequence and the one or more frames of the second
sub-sequence are interleaved in the coded video sequence.
14. The apparatus of claim 8, wherein the first resolution and the
second resolution each comprise a spatial resolution.
15. An apparatus for decoding video data, the apparatus comprising:
means for receiving a coded video sequence comprising a first
sub-sequence and a second sub-sequence, wherein the first
sub-sequence includes one or more frames each having a first
resolution, and the second sub-sequence includes one or more frames
each having a second resolution, and wherein the first sub-sequence
is different than the second sub-sequence, and the first resolution
is different than the second resolution; means for receiving a
first sequence parameter set and a second sequence parameter set
for the coded video sequence, wherein the first sequence parameter
set indicates the first resolution of the one or more frames of the
first sub-sequence, and the second sequence parameter set indicates
the second resolution of the one or more frames of the second
sub-sequence, and wherein the first sequence parameter set is
different than the second sequence parameter set; and means for
using the first sequence parameter set and the second sequence
parameter set to decode the coded video sequence.
16. A computer-readable storage medium comprising instructions
that, when executed, cause at least one processor to decode video
data, wherein the instructions cause the at least one processor to:
receive a coded video sequence comprising a first sub-sequence and
a second sub-sequence, wherein the first sub-sequence includes one
or more frames each having a first resolution, and the second
sub-sequence includes one or more frames each having a second
resolution, and wherein the first sub-sequence is different than
the second sub-sequence, and the first resolution is different than
the second resolution; receive a first sequence parameter set and a
second sequence parameter set for the coded video sequence, wherein
the first sequence parameter set indicates the first resolution of
the one or more frames of the first sub-sequence, and the second
sequence parameter set indicates the second resolution of the one
or more frames of the second sub-sequence, and wherein the first
sequence parameter set is different than the second sequence
parameter set; and use the first sequence parameter set and the
second sequence parameter set to decode the coded video
sequence.
17. A method of encoding video data, the method comprising:
generating a coded video sequence comprising a first sub-sequence
and a second sub-sequence, wherein the first sub-sequence includes
one or more frames each having a first resolution, and the second
sub-sequence includes one or more frames each having a second
resolution, and wherein the first sub-sequence is different than
the second sub-sequence, and the first resolution is different than
the second resolution; generating a first sequence parameter set
and a second sequence parameter set for the video sequence, wherein
the first sequence parameter set indicates the first resolution of
the one or more frames of the first sub-sequence, and the second
sequence parameter set indicates the second resolution of the one
or more frames of the second sub-sequence, and wherein the first
sequence parameter set is different than the second sequence
parameter set; and transmitting the coded video sequence comprising
the first sub-sequence and the second sub-sequence, and the first
sequence parameter set and the second sequence parameter set.
18. The method of claim 17, wherein the first sequence parameter
set and the second sequence parameter set are coded in a
transmitted bitstream prior to either the first sub-sequence or the
second sub-sequence.
19. The method of claim 17, wherein transmitting the first sequence
parameter set and the second sequence parameter set of the coded
video sequence comprises: transmitting both the first sequence
parameter set and the second sequence parameter set prior to
transmitting either of the first sub-sequence and the second
sub-sequence.
20. The method of claim 17, wherein the first sequence parameter
set is coded in a transmitted bitstream prior to the first
sub-sequence and the second sequence parameter set is coded in the
transmitted bitstream after at least one frame of the one or more
frames of the first sub-sequence and prior to the second
sub-sequence.
21. The method of claim 17, wherein transmitting the first sequence
parameter set and the second sequence parameter set of the coded
video sequence comprises: transmitting the second sequence
parameter set after transmitting at least one frame of the one or
more frames of the first sub-sequence, and prior to transmitting
the second sub-sequence.
22. The method of claim 17, wherein the one or more frames of the
first sub-sequence and the one or more frames of the second
sub-sequence are interleaved in the coded video sequence.
23. The method of claim 17, wherein the first resolution and the
second resolution each comprise a spatial resolution.
24. An apparatus for coding video data, the apparatus comprising a
video coder configured to: generate a coded video sequence
comprising a first sub-sequence and a second sub-sequence, wherein
the first sub-sequence includes one or more frames each having a
first resolution, and the second sub-sequence includes one or more
frames each having a second resolution, and wherein the first
sub-sequence is different than the second sub-sequence, and the
first resolution is different than the second resolution; generate
a first sequence parameter set and a second sequence parameter set
for the video sequence, wherein the first sequence parameter set
indicates the first resolution of the one or more frames of the
first sub-sequence, and the second sequence parameter set indicates
the second resolution of the one or more frames of the second
sub-sequence, and wherein the first sequence parameter set is
different than the second sequence parameter set; and transmit the
coded video sequence comprising the first sub-sequence and the
second sub-sequence, and the first sequence parameter set and the
second sequence parameter set.
25. The apparatus of claim 24, wherein the first sequence parameter
set and the second sequence parameter set are coded in a
transmitted bitstream prior to either the first sub-sequence or the
second sub-sequence.
26. The apparatus of claim 24, wherein to transmit the first
sequence parameter set and the second sequence parameter set of the
coded video sequence, the apparatus is configured to: transmit both
the first sequence parameter set and the second sequence parameter
set prior to transmitting either of the first sub-sequence and the
second sub-sequence.
27. The apparatus of claim 24, wherein the first sequence parameter
set is coded in a transmitted bitstream prior to the first
sub-sequence and the second sequence parameter set is coded in the
transmitted bitstream after at least one frame of the one or more
frames of the first sub-sequence and prior to the second
sub-sequence.
28. The apparatus of claim 24, wherein to transmit the first
sequence parameter set and the second sequence parameter set of the
coded video sequence, the apparatus is configured to: transmit the
second sequence parameter set after transmitting at least one frame
of the one or more frames of the first sub-sequence, and prior to
transmitting the second sub-sequence.
29. The apparatus of claim 24, wherein the one or more frames of
the first sub-sequence and the one or more frames of the second
sub-sequence are interleaved in the coded video sequence.
30. The apparatus of claim 24, wherein the first resolution and the
second resolution each comprise a spatial resolution.
31. An apparatus for encoding video data, the apparatus comprising:
means for generating a coded video sequence comprising a first
sub-sequence and a second sub-sequence, wherein the first
sub-sequence includes one or more frames each having a first
resolution, and the second sub-sequence includes one or more frames
each having a second resolution, and wherein the first sub-sequence
is different than the second sub-sequence, and the first resolution
is different than the second resolution; means for generating a
first sequence parameter set and a second sequence parameter set
for the video sequence, wherein the first sequence parameter set
indicates the first resolution of the one or more frames of the
first sub-sequence, and the second sequence parameter set indicates
the second resolution of the one or more frames of the second
sub-sequence, and wherein the first sequence parameter set is
different than the second sequence parameter set; and means for
transmitting the coded video sequence comprising the first
sub-sequence and the second sub-sequence, and the first sequence
parameter set and the second sequence parameter set.
32. A computer readable storage medium comprising instructions
that, when executed, cause at least one processor of a video
encoding device to: generate a coded video sequence comprising a
first sub-sequence and a second sub-sequence, wherein the first
sub-sequence includes one or more frames each having a first
resolution, and the second sub-sequence includes one or more frames
each having a second resolution, and wherein the first sub-sequence
is different than the second sub-sequence, and the first resolution
is different than the second resolution; generate a first sequence
parameter set and a second sequence parameter set for the video
sequence, wherein the first sequence parameter set indicates the
first resolution of the one or more frames of the first
sub-sequence, and the second sequence parameter set indicates the
second resolution of the one or more frames of the second
sub-sequence, and wherein the first sequence parameter set is
different than the second sequence parameter set; and transmit the
coded video sequence comprising the first sub-sequence and the
second sub-sequence, and the first sequence parameter set and the
second sequence parameter set.
33. A computer readable storage medium, comprising a data structure
stored thereon, the data structure comprising: a coded video
sequence comprising a first sub-sequence and a second sub-sequence,
wherein the first sub-sequence includes one or more frames each
having a first resolution, and the second sub-sequence includes one
or more frames each having a second resolution, and wherein the
first sub-sequence is different than the second sub-sequence, and
the first resolution is different than the second resolution; and a
first sequence parameter set and a second sequence parameter set
for the coded video sequence, wherein the first sequence parameter
set indicates the first resolution of the one or more frames of the
first sub-sequence, and the second sequence parameter set indicates
the second resolution of the one or more frames of the second
sub-sequence, and wherein the first sequence parameter set is
different than the second sequence parameter set.
34. The computer readable medium of claim 33, wherein the first
sequence parameter set and the second sequence parameter set are
coded in a bitstream on the data structure prior to either the
first sub-sequence or the second sub-sequence.
35. The computer readable medium of claim 33, wherein the first
sequence parameter set is coded in a bitstream on the data
structure prior to the first sub-sequence and the second sequence
parameter set is coded in the bitstream on the data structure after
at least one frame of the one or more frames of the first
sub-sequence, and prior to the second sub-sequence.
Description
[0001] This application claims the benefit of:
[0002] U.S. Provisional Application No. 61/545,525, filed Oct. 10, 2011, and
[0003] U.S. Provisional Application No. 61/550,276, filed Oct. 21, 2011, the entire contents of each of which are hereby incorporated by reference.
TECHNICAL FIELD
[0004] This disclosure relates to video coding and, more
particularly, to techniques for coding video data.
BACKGROUND
[0005] Digital video capabilities can be incorporated into a wide
range of devices, including digital televisions, digital direct
broadcast systems, wireless broadcast systems, personal digital
assistants (PDAs), laptop or desktop computers, tablet computers,
e-book readers, digital cameras, digital recording devices, digital
media players, video gaming devices, video game consoles, cellular
or satellite radio telephones, so-called "smart phones," video
teleconferencing devices, video streaming devices, and the like.
Digital video devices implement video compression techniques, such
as those described in the standards defined by MPEG-2, MPEG-4,
ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding
(AVC), the High Efficiency Video Coding (HEVC) standard presently
under development, and extensions of such standards. The video
devices may transmit, receive, encode, decode, and/or store digital
video information more efficiently by implementing such video
compression techniques.
[0006] Video compression techniques perform spatial (intra-picture)
prediction and/or temporal (inter-picture) prediction to reduce or
remove redundancy inherent in video sequences. For block-based
video coding, a video slice (i.e., a video picture or a portion of
a video picture) may be partitioned into video blocks, which may
also be referred to as treeblocks, coding tree blocks (CTBs),
coding tree units (CTUs), coding units (CUs) and/or coding nodes.
Video blocks in an intra-coded (I) slice of a picture are encoded
using spatial prediction with respect to reference samples in
neighboring blocks in the same picture. Video blocks in an
inter-coded (P or B) slice of a picture may use spatial prediction
with respect to reference samples in neighboring blocks in the same
picture or temporal prediction with respect to reference samples in
other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.
[0007] Spatial or temporal prediction results in a predictive block
for a block to be coded. Residual data represents pixel differences
between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and residual data indicating the difference between the coded block and the predictive block. An intra-coded block is
encoded according to an intra-coding mode and the residual data.
For further compression, the residual data may be transformed from
the pixel domain to a transform domain, resulting in residual
transform coefficients, which then may be quantized. The quantized
transform coefficients, initially arranged in a two-dimensional
array, may be scanned in order to produce a one-dimensional vector
of transform coefficients, and entropy coding may be applied to
achieve even more compression.
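The transform-quantize-scan pipeline described above can be illustrated with a small sketch. This is illustrative only: the uniform quantizer and the diagonal scan below are simplified stand-ins, not any standard's exact definitions.

```python
def quantize(coeffs, qstep):
    # Uniform scalar quantization (illustrative; real codecs use
    # standard-specific quantization scaling and rounding).
    return [round(c / qstep) for c in coeffs]

def diagonal_scan(block):
    # Scan a 2-D coefficient array into a 1-D vector along
    # anti-diagonals, a simplified stand-in for a zigzag scan.
    n = len(block)
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1], p[0]))
    return [block[i][j] for i, j in order]
```

The resulting one-dimensional vector of quantized coefficients is what entropy coding would then compress.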
SUMMARY
[0008] In general, this disclosure describes techniques for coding
video sequences that include frames, or "pictures," having
different spatial resolutions. One aspect of this disclosure
includes using multiple sequence parameter sets in a single
resolution-adaptive coded video sequence to indicate a resolution
of a sequence of pictures in coded video. As one example, the
resolution-adaptive coded video sequence may comprise two or more
sub-sequences which may be coded, wherein each sub-sequence may
comprise a set of pictures with a common spatial resolution, and
may refer to a same active sequence parameter set. Another aspect
of this disclosure includes a novel activation process for
activating a sequence parameter set when using multiple sequence
parameter sets in a single resolution-adaptive coded video
sequence, as described above.
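As an illustration of the aspect above, the following sketch (hypothetical structures, names, and values, not the actual HEVC syntax) shows a coded video sequence in which each frame resolves its resolution through the sequence parameter set it references:

```python
from dataclasses import dataclass

@dataclass
class SPS:
    sps_id: int
    width: int
    height: int

# Two SPSs for one coded video sequence: one per spatial resolution.
sps_table = {0: SPS(0, 1920, 1080), 1: SPS(1, 960, 540)}

def frame_resolution(sps_table, sps_id):
    # A frame's resolution is indicated by the SPS it refers to.
    sps = sps_table[sps_id]
    return (sps.width, sps.height)

# Sub-sequence A references SPS 0; sub-sequence B references SPS 1.
coded_sequence = [("A0", 0), ("A1", 0), ("B0", 1), ("B1", 1)]
resolutions = [frame_resolution(sps_table, sid) for _, sid in coded_sequence]
```

In this toy model, switching between sub-sequences requires no new parameter set, only a reference to a different, already-received SPS.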
[0009] Yet another aspect of this disclosure includes novel
techniques for managing a decoded picture buffer (DPB). As one
example, a size of a DPB is not indicated using a number of frame
buffers (e.g., a number of storage locations each capable of
storing a frame, or "picture," of a fixed size), consistent with
some techniques, but rather using a different unit of size. As
another example, before inserting a decoded picture into a DPB, the
availability of the DPB to store the decoded picture is determined
based on a spatial resolution of the decoded picture to be
inserted, so as to ensure that the DPB includes sufficient empty
buffer space for inserting the decoded picture. As still another
example, after removing a decoded picture from a DPB, the
availability of the DPB to store a subsequent decoded picture is
determined based on a spatial resolution of the removed decoded
picture, and a spatial resolution of the subsequent decoded picture
to be inserted into the DPB. In other words, the proportion of the
DPB unavailable to store decoded pictures, or a "fullness" of the
DPB, after removing the decoded picture, is not decreased by an
amount corresponding to a single decoded picture of a fixed size,
consistent with some techniques, but rather by a varying amount,
depending on the spatial resolution of the removed decoded
picture.
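The buffer accounting just described can be sketched as follows. The class name, the choice of luma samples as the unit of size, and all values are illustrative assumptions, not the disclosure's normative definition:

```python
class SampleBudgetDPB:
    # Toy DPB whose capacity is expressed in luma samples rather
    # than in a fixed count of equally sized frame buffers.
    def __init__(self, capacity_samples):
        self.capacity = capacity_samples
        self.used = 0  # "fullness", in samples

    def can_insert(self, width, height):
        # Availability depends on the spatial resolution of the
        # decoded picture to be inserted.
        return self.used + width * height <= self.capacity

    def insert(self, width, height):
        if not self.can_insert(width, height):
            raise MemoryError("insufficient DPB space")
        self.used += width * height

    def remove(self, width, height):
        # Fullness decreases by an amount that depends on the
        # removed picture's resolution, not by one fixed picture.
        self.used -= width * height
```

With a budget of two 1080p pictures, for example, removing a 1080p picture frees enough space for four 960x540 pictures, which a fixed per-picture count could not express.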
BRIEF DESCRIPTION OF DRAWINGS
[0010] FIG. 1 is a block diagram illustrating an example video
encoding and decoding system that may utilize techniques described
in this disclosure.
[0011] FIG. 2 is a block diagram illustrating an example video
encoder that may implement the techniques described in this
disclosure.
[0012] FIG. 3 is a block diagram illustrating an example video
decoder that may implement the techniques described in this
disclosure.
[0013] FIGS. 4A-4D are conceptual diagrams illustrating an example
video sequence that includes a plurality of pictures that are
encoded and transmitted in accordance with the techniques of this
disclosure.
[0014] FIG. 5 is a conceptual diagram illustrating the operation of
a decoded picture buffer of a hypothetical reference decoder (HRD)
model in accordance with the techniques of this disclosure.
[0015] FIG. 6 is a flowchart illustrating an example operation of
using a first sub-sequence and a second sub-sequence to decode
video in accordance with the techniques of this disclosure.
[0016] FIG. 7 is a flowchart illustrating an example operation of
managing a decoded picture buffer in accordance with the techniques
of this disclosure.
DETAILED DESCRIPTION
[0017] The techniques of this disclosure generally relate to techniques for using multiple sequence parameter sets (SPSs) to communicate video data at different resolutions, and to techniques for managing the multiple SPSs. In the current High Efficiency
Video Coding (HEVC) design, pictures in a same coded video sequence
(CVS) have a same size, wherein the size is signaled in a sequence
parameter set (SPS) for the CVS. Additional syntax information for the CVS, also signaled in the SPS, includes the Largest Coding Unit (LCU) size and the Smallest Coding Unit (SCU) size, which define the largest and smallest block, or coding unit, sizes for each picture, respectively. In the context of H.264/AVC and High Efficiency Video Coding (HEVC), a CVS may refer to a sequence of coded pictures from an instantaneous decoding refresh (IDR) picture to another IDR picture, exclusive, in decoding order, or to the end of the coded video bitstream if the starting IDR picture is the last IDR picture in the coded video bitstream.
[0018] However, HEVC may support resolution-adaptive video
sequences that include frames with different resolutions. One
method for adaptive frame size support is described in JCTVC-F158:
Resolution switching for coding efficiency and resilience, Davies,
6th Meeting, Turin, IT, 14-22 Jul. 2011, referred to as JCTVC-F158
hereinafter.
[0019] To support resolution-adaptive video, this disclosure
describes techniques for coding multiple SPSs. Each SPS of the
multiple SPSs may include information related to a sequence of
pictures that has a different resolution. This disclosure also
introduces a new sequence, referred to as a resolution sub-sequence (RSS), that may refer back to one of the multiple SPSs in order to indicate the resolution of a sequence of pictures. This disclosure also describes techniques for activating a single SPS when multiple parameter sets may be utilized within a single CVS, as well as different techniques and orders for transmitting the different SPSs.
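A minimal sketch of such an activation process follows. It assumes, for illustration only, that each picture directly carries the id of the SPS it references; the actual signaling path (e.g., through picture parameter sets and slice headers) is more involved.

```python
def activation_events(picture_sps_ids):
    # Record an activation event whenever a picture references an
    # SPS different from the currently active one.
    active = None
    events = []
    for sps_id in picture_sps_ids:
        if sps_id != active:
            active = sps_id  # the referenced SPS becomes active
            events.append(sps_id)
    return events
```

In this model, at most one SPS is active at a time, even though several SPSs coexist within the single CVS.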
[0020] The techniques of this disclosure are also related to
techniques for managing a decoded picture buffer (DPB). For
example, a video coder (e.g., a video encoder or a video decoder)
includes a DPB. The DPB stores decoded pictures, including
reference pictures. Reference pictures are pictures that can
potentially be used for inter-predicting a picture. In other words,
the video coder may predict a picture, during coding (encoding or
decoding) of that picture, based on one or more reference pictures
stored in the DPB.
[0021] Decoded pictures used for predicting subsequent coded
pictures, and for future output, are buffered in a Decoded Picture
Buffer (DPB).
[0022] To efficiently utilize the memory of a DPB, DPB management processes are specified, including a process for storing decoded pictures into the DPB, a process for marking reference pictures, and processes for outputting and removing decoded pictures from the DPB.
DPB management includes at least the following aspects: (1) Picture
identification and reference picture identification; (2) Reference
picture list construction; (3) Reference picture marking; (4)
Picture output from the DPB; (5) Picture insertion into the DPB;
and (6) Picture removal from the DPB. Some introduction to
reference picture marking and reference picture list construction
is included below.
[0023] Each CVS may include a number of reference pictures, which
may be used to predict pixel values of other pictures (e.g.,
pictures that come before or after the reference picture). A video
coder marks each reference picture, and stores the reference
picture in the DPB. In previous video coding standards, such as H.264/AVC, the DPB holds a maximum number of reference pictures used for inter-prediction, referred to as M (num_ref_frames), which is signaled in the active sequence parameter set. When a reference picture is
decoded, the reference picture is marked as "used for reference."
If the decoding of the reference picture caused more than M
pictures to be marked as "used for reference," at least one picture
must be marked as "unused for reference." The DPB removal process
then would remove pictures marked as "unused for reference" from
the DPB if they are not needed for output as well.
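The marking behavior described above can be sketched as follows; this is a simplification that treats the DPB as a queue of picture ids and ignores the "needed for output" check:

```python
from collections import deque

def mark_reference(dpb, picture, M):
    # Decode a new reference picture; if more than M pictures are
    # now marked "used for reference", implicitly mark the oldest
    # as "unused for reference" and return it for removal.
    dpb.append(picture)
    unmarked = []
    while len(dpb) > M:
        unmarked.append(dpb.popleft())
    return unmarked
```

The pictures returned by `mark_reference` are those a removal process could then evict from the buffer.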
[0024] When a picture is decoded, the decoded picture may be either
a non-reference picture or a reference picture. A reference picture
may be a long-term reference picture or a short-term reference picture, and once a decoded picture is marked as "unused for reference," it is no longer needed for reference. In some video coding standards, there may be reference
picture marking operations that change the status of the reference
pictures.
[0025] There may be at least two types of operation modes for reference picture marking: a sliding window operation mode and an adaptive memory control operation mode. The operation mode for reference picture marking may be selected on a picture basis. The sliding window operation mode may work as a first-in, first-out queue holding a fixed number of short-term reference pictures. In other words, the short-term reference pictures with the earliest decoding times may be the first to be removed (marked as not used for reference), in an implicit fashion.
[0026] The video coder may also be tasked with constructing
reference picture lists that indicate which reference pictures may
be used for inter-prediction purposes. Two of these reference
picture lists are referred to as List 0 and List 1, respectively.
The video coder first employs default construction techniques to construct List 0 and List 1 (e.g., preconfigured construction schemes). Optionally, after the
initial List 0 and List 1 are constructed, the video decoder may
decode syntax elements, when present, that instruct the video
decoder to modify the initial List 0 and List 1.
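The two-stage construction can be sketched as below. The "most recently decoded first" default order and the swap-based modification are hedged simplifications of the standards' actual initialization and reordering rules:

```python
def init_list0(short_term_refs):
    # Default List 0 for a P picture: short-term reference
    # pictures ordered from most recently decoded to least
    # (a simplified stand-in for the standards' default order).
    return list(reversed(short_term_refs))

def apply_modification(ref_list, swaps):
    # Optional modification step, modeled here as explicit index
    # swaps that signaled syntax elements would instruct.
    out = list(ref_list)
    for i, j in swaps:
        out[i], out[j] = out[j], out[i]
    return out
```

An encoder would signal such modifications only when the default order is suboptimal for the current picture.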
[0027] The video encoder may signal syntax elements that are
indicative of identifier(s) of reference pictures in the DPB, and
the video encoder may also signal syntax elements that include
indices, within List 0, List 1, or both List 0 and List 1, that
indicate which reference picture or pictures to use to decode a
coded block of a current picture. The video decoder, in turn, uses
the received identifier to identify the index value or values for a
reference picture or reference pictures listed in List 0, List 1,
or both List 0 and List 1. From the index value(s) as well as the
identifier(s) of the reference picture or reference pictures, the
video decoder retrieves the reference picture or reference
pictures, or part(s) thereof, from the DPB, and decodes the coded
block of the current picture based on the retrieved reference
picture or pictures and one or more motion vectors that identify
blocks within the reference picture or pictures that are used for
decoding the coded block.
[0028] In the context of AVC and HEVC, a coded video sequence (CVS)
refers to a sequence of coded frames, or "pictures," ranging from
an instantaneous decoding refresh (IDR) picture to another IDR
picture, exclusive, in decoding order, or to the end of a coded
video bitstream if the starting IDR picture is the last IDR picture
in the coded video bitstream.
[0029] However, when coding a single CVS comprising pictures having
at least two different spatial resolutions, with respect to some
solutions based on HEVC, e.g., as described in JCTVC-F158, using a
DPB having a size measured in pictures may cause a number of
issues, which are described below.
[0030] First, a sub-sequence of pictures with one resolution may
have different coding parameters, such as an LCU size, than another
sub-sequence of pictures with another, different resolution.
Accordingly, it may not be sufficient to use a single active SPS to
describe characteristics of a CVS comprising the sub-sequences of
pictures with the different resolutions.
[0031] Furthermore, different sub-sequences of a CVS may have
reference pictures having different sizes, that is, different
spatial resolutions. Accordingly, one set of particular parameters
included in an SPS for the CVS, e.g., max_num_ref_frames, may be
optimal for one sub-sequence, but can be sub-optimal for all
sub-sequences included in the CVS.
[0032] Additionally, some techniques for DPB management may no
longer be effective when coding a single CVS that includes pictures
having different resolutions. As one example, because the pictures
having the different resolutions may correspond to the pictures
having different sizes, a size of a DPB used to store the pictures
can no longer be indicated using a number of frame buffers, e.g., a
number of storage locations each capable of storing a frame, or
"picture," of a fixed size.
[0033] Furthermore, to insert a decoded picture into the DPB, the
DPB must include an empty frame buffer of a size that is
sufficiently large to store the decoded picture. However, once
again, because the pictures having the different resolutions may
correspond to the pictures having different sizes, a frame buffer
of a fixed size may not correspond to a size of a particular
decoded picture to be inserted. Accordingly, merely determining
whether the DPB includes an empty frame buffer of a fixed size may
be insufficient to determine whether the DPB is available to store
the decoded picture. As one example, the DPB may have less buffer
space than is required to store the decoded picture.
[0034] Similarly, after removing a decoded picture from the DPB,
wherein the removed decoded picture has a resolution that
corresponds to a size that is different than the size of the frame
buffer, merely determining that the decoded picture has been
removed from the DPB may be insufficient to determine whether the
DPB is actually available to store a subsequent decoded picture
having a particular resolution. Furthermore, the above
determination is also insufficient to indicate the actual buffer
space that may be available within the DPB for storing additional
decoded pictures.
[0035] In another example, a single empty frame buffer of a fixed
size may exist within the DPB, and the DPB may store decoded
picture(s) having a particular resolution in the frame buffer.
However, if a video coder removes a decoded picture from the DPB,
and the removed picture has a resolution that is smaller than the
size of the frame buffer, sufficient buffer space may exist within
the DPB to insert a decoded picture with a resolution that
corresponds to a size that is larger than the size of the removed
decoded picture. Accordingly, merely determining that a particular
decoded picture has been removed from the DPB may be insufficient
to indicate the actual buffer space that may be available within
the DPB for storing additional decoded pictures.
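The DPB issues enumerated in the preceding paragraphs can be made concrete with a sketch that tracks DPB capacity as a sample budget rather than a count of fixed-size frame buffers. The class, the capacity, and the picture sizes are hypothetical and simplified (luma samples only); the point is only that an "empty frame buffer" test is not equivalent to a free-space test when picture sizes differ.

```python
# Track DPB fullness in samples instead of fixed-size frame buffers,
# so mixed-resolution pictures can be accounted for correctly.
class ByteBudgetDPB:
    def __init__(self, capacity_samples):
        self.capacity = capacity_samples
        self.pictures = {}  # name -> (width, height)

    def free(self):
        used = sum(w * h for (w, h) in self.pictures.values())
        return self.capacity - used

    def can_store(self, width, height):
        return width * height <= self.free()

    def insert(self, name, width, height):
        if not self.can_store(width, height):
            raise MemoryError("insufficient DPB space")
        self.pictures[name] = (width, height)

    def remove(self, name):
        del self.pictures[name]

dpb = ByteBudgetDPB(capacity_samples=1920 * 1080 * 2)
dpb.insert("p0", 1920, 1080)
dpb.insert("p1", 1280, 720)
# One "frame buffer" worth of pictures has been removed or never
# filled, yet a full-resolution picture no longer fits:
print(dpb.can_store(1920, 1080))  # False
dpb.remove("p1")
print(dpb.can_store(1920, 1080))  # True
```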
[0036] FIG. 1 is a block diagram illustrating an example video
encoding and decoding system 10 that may utilize techniques
described in this disclosure. In general, a reference picture set
is defined as a set of reference pictures associated with a
picture, consisting of all reference pictures that are prior to the
associated picture in decoding order, that may be used for inter
prediction of the associated picture or any picture following the
associated picture in decoding order. In some examples, the
reference pictures that are prior to the associated picture may be
reference pictures until the next instantaneous decoding refresh
(IDR) picture, or broken link access (BLA) picture. In other words,
reference pictures in the reference picture set may all be prior to
the current picture in decoding order. Also, the reference pictures
in the reference picture set may be used for inter-predicting the
current picture and/or inter-predicting any picture following the
current picture in decoding order until the next IDR picture or BLA
picture.
[0037] For example, some of the reference pictures in the reference
picture set are reference pictures that can potentially be used to
inter-predict a block of the current picture, and not pictures
following the current picture in decoding order. Some of the
reference pictures in the reference picture set are reference
pictures that can potentially be used to inter-predict a block of
the current picture, and blocks in one or more pictures following
the current picture in decoding order. Some of the reference
pictures in the reference picture set are reference pictures that
can potentially be used to inter-predict blocks in one or more
pictures following the current picture in decoding order, and
cannot be used to inter-predict a block in the current picture.
[0038] As used in this disclosure, reference pictures that can
potentially be used for inter-prediction refer to reference
pictures that can be used for inter-prediction, but do not
necessarily have to be used for inter-prediction. For example, the
reference picture set may identify reference pictures that can
potentially be used for inter-prediction. However, this does not
mean that all of the identified reference pictures must be used for
inter-prediction. Rather, one or more of these identified reference
pictures could be used for inter-prediction, but all do not
necessarily have to be used for inter-prediction.
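The three kinds of reference picture set entries described in paragraph [0037] can be sketched as a simple partition. The per-picture usability flags here are hypothetical inputs for illustration, not parsed bitstream syntax.

```python
# Partition a reference picture set into pictures usable only by the
# current picture, by the current picture and following pictures, or
# only by following pictures in decoding order.
def partition_rps(entries):
    """entries: list of (poc, used_by_current, used_by_following)."""
    curr_only, curr_and_foll, foll_only = [], [], []
    for poc, by_curr, by_foll in entries:
        if by_curr and by_foll:
            curr_and_foll.append(poc)
        elif by_curr:
            curr_only.append(poc)
        elif by_foll:
            foll_only.append(poc)
    return curr_only, curr_and_foll, foll_only

rps = [(0, True, False), (4, True, True), (8, False, True)]
print(partition_rps(rps))  # ([0], [4], [8])
```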
[0039] As shown in FIG. 1, system 10 includes a source device 12
that generates encoded video for decoding by destination device 14.
Source device 12 and destination device 14 may each be an example
of a video coding device. Source device 12 may transmit the encoded
video to destination device 14 via communication channel 16 or may
store the encoded video on a storage medium 17 or a file server 19,
such that the encoded video may be accessed by the destination
device 14 as desired.
[0040] Source device 12 and destination device 14 may comprise any
of a wide range of devices, including a wireless handset such as
so-called "smart" phones, so-called "smart" pads, or other such
wireless devices equipped for wireless communication. Additional
examples of source device 12 and destination device 14 include, but
are not limited to, a digital television, a device in a digital
direct broadcast system, a device in a wireless broadcast system, a
personal digital assistant (PDA), a laptop computer, a desktop
computer, a tablet computer, an e-book reader, a digital camera, a
digital recording device, a digital media player, a video gaming
device, a video game console, a cellular radio telephone, a
satellite radio telephone, a video teleconferencing device, a
video streaming device, a wireless communication device, or the
like.
[0041] As indicated above, in many cases, source device 12 and/or
destination device 14 may be equipped for wireless communication.
Hence, communication channel 16 may comprise a wireless channel, a
wired channel, or a combination of wireless and wired channels
suitable for transmission of encoded video data. Similarly, the
file server 19 may be accessed by the destination device 14 through
any standard data connection, including an Internet connection.
This may include a wireless channel (e.g., a Wi-Fi connection), a
wired connection (e.g., DSL, cable modem, etc.), or a combination
of both that is suitable for accessing encoded video data stored on
a file server.
[0042] The techniques of this disclosure, however, may be applied
to video coding in support of any of a variety of multimedia
applications, such as over-the-air television broadcasts, cable
television transmissions, satellite television transmissions,
streaming video transmissions, e.g., via the Internet, encoding of
digital video for storage on a data storage medium, decoding of
digital video stored on a data storage medium, or other
applications. In some examples, system 10 may be configured to
support one-way or two-way video transmission to support
applications such as video streaming, video playback, video
broadcasting, and/or video telephony.
[0043] In the example of FIG. 1, source device 12 includes a video
source 18, video encoder 20, a modulator/demodulator (MODEM) 22 and
an output interface 24. In source device 12, video source 18 may
include a source such as a video capture device, such as a video
camera, a video archive containing previously captured video, a
video feed interface to receive video from a video content
provider, and/or a computer graphics system for generating computer
graphics data as the source video, or a combination of such
sources. As one example, if video source 18 is a video camera,
source device 12 and destination device 14 may form so-called
camera phones or video phones. However, the techniques described in
this disclosure may be applicable to video coding in general, and
may be applied to wireless and/or wired applications.
[0044] The captured, pre-captured, or computer-generated video may
be encoded by video encoder 20. The encoded video information may
be modulated by modem 22 according to a communication standard,
such as a wireless communication protocol, and transmitted to
destination device 14 via output interface 24. Modem 22 may include
various mixers, filters, amplifiers or other components designed
for signal modulation. Output interface 24 may include circuits
designed for transmitting data, including amplifiers, filters, and
one or more antennas.
[0045] The captured, pre-captured, or computer-generated video that
is encoded by the video encoder 20 may also be stored onto a
storage medium 17 or a file server 19 for later consumption. The
storage medium 17 may include Blu-ray discs, DVDs, CD-ROMs, flash
memory, or any other suitable digital storage media for storing
encoded video. The encoded video stored on the storage medium 17
may then be accessed by destination device 14 for decoding and
playback.
[0046] File server 19 may be any type of server capable of storing
encoded video and transmitting that encoded video to the
destination device 14. Example file servers include a web server
(e.g., for a website), an FTP server, network attached storage
(NAS) devices, a local disk drive, or any other type of device
capable of storing encoded video data and transmitting it to a
destination device. The transmission of encoded video data from the
file server 19 may be a streaming transmission, a download
transmission, or a combination of both. The file server 19 may be
accessed by the destination device 14 through any standard data
connection, including an Internet connection. This may include a
wireless channel (e.g., a Wi-Fi connection), a wired connection
(e.g., DSL, cable modem, Ethernet, USB, etc.), or a combination of
both that is suitable for accessing encoded video data stored on a
file server.
[0047] Destination device 14, in the example of FIG. 1, includes an
input interface 26, a modem 28, a video decoder 30, and a display
device 32. Input interface 26 of destination device 14 receives
information over channel 16, as one example, or from storage medium
17 or file server 19, as alternate examples, and modem 28
demodulates the information to produce a demodulated bitstream for
video decoder 30. The demodulated bitstream may include a variety
of syntax information generated by video encoder 20 for use by
video decoder 30 in decoding video data. Such syntax may also be
included with the encoded video data stored on a storage medium 17
or a file server 19. As one example, the syntax may be embedded
with the encoded video data, although aspects of this disclosure
should not be considered limited to such a requirement. The syntax
information defined by video encoder 20, which is also used by
video decoder 30, may include syntax elements that describe
characteristics and/or processing of video blocks, such as coding
tree units (CTUs), coding tree blocks (CTBs), prediction units
(PUs), coding units (CUs) or other units of coded video, e.g.,
video slices, video pictures, and video sequences or groups of
pictures (GOPs). Each of video encoder 20 and video decoder 30 may
form part of a respective encoder-decoder (CODEC) that is capable
of encoding or decoding video data.
[0048] Display device 32 may be integrated with, or external to,
destination device 14. In some examples, destination device 14 may
include an integrated display device and also be configured to
interface with an external display device. In other examples,
destination device 14 may be a display device. In general, display
device 32 displays the decoded video data to a user, and may
comprise any of a variety of display devices such as a liquid
crystal display (LCD), a plasma display, an organic light emitting
diode (OLED) display, or another type of display device.
[0049] In the example of FIG. 1, communication channel 16 may
comprise any wireless or wired communication medium, such as a
radio frequency (RF) spectrum or one or more physical transmission
lines, or any combination of wireless and wired media.
Communication channel 16 may form part of a packet-based network,
such as a local area network, a wide-area network, or a global
network such as the Internet. Communication channel 16 generally
represents any suitable communication medium, or collection of
different communication media, for transmitting video data from
source device 12 to destination device 14, including any suitable
combination of wired or wireless media. Communication channel 16
may include routers, switches, base stations, or any other
equipment that may be useful to facilitate communication from
source device 12 to destination device 14.
[0050] Video encoder 20 and video decoder 30 may operate according
to a video compression standard, such as ITU-T H.261,
ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T
H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC
MPEG-4 AVC), including its Scalable Video Coding (SVC) and
Multiview Video Coding (MVC) extensions. In addition, there is a
new video coding standard, namely High Efficiency Video Coding
(HEVC) standard presently under development by the Joint
Collaboration Team on Video Coding (JCT-VC) of ITU-T Video Coding
Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group
(MPEG). A recent Working Draft (WD) of HEVC, referred to as HEVC
WD8 hereinafter, is available, as of Jul. 20, 2012, from
http://phenix.int-evry.fr/jct/doc_end_user/documents/10_Stockholm/wg11/JCTVC-J1003-v8.zip.
[0051] The techniques of this disclosure, however, are not limited
to any particular coding standard. For purposes of illustration
only, the techniques are described in accordance with the HEVC
standard.
[0052] Although not shown in FIG. 1, in some aspects, video encoder
20 and video decoder 30 may each be integrated with an audio
encoder and decoder, and may include appropriate MUX-DEMUX units,
or other hardware and software, to handle encoding of both audio
and video in a common data stream or separate data streams. If
applicable, MUX-DEMUX units may conform to the ITU H.223
multiplexer protocol, or other protocols such as the user datagram
protocol (UDP).
[0053] Video encoder 20 and video decoder 30 each may be
implemented as any of a variety of suitable encoder circuitry, such
as one or more processors including microprocessors, digital signal
processors (DSPs), application specific integrated circuits
(ASICs), field programmable gate arrays (FPGAs), discrete logic,
software, hardware, firmware or any combinations thereof. When the
techniques are implemented partially in software, a device may
store instructions for the software in a suitable, non-transitory
computer-readable medium and execute the instructions in hardware
using one or more processors to perform the techniques of this
disclosure.
[0054] Each of video encoder 20 and video decoder 30 may be
included in one or more encoders or decoders, either of which may
be integrated as part of a combined encoder/decoder (CODEC) in a
respective device. In some instances, video encoder 20 and video
decoder 30 may be commonly referred to as a video coder that codes
information (e.g., pictures and syntax elements). The coding of
information may refer to encoding when the video coder corresponds
to video encoder 20. The coding of information may refer to
decoding when the video coder corresponds to video decoder 30.
[0055] FIG. 2 is a block diagram illustrating an example video
encoder 20 that may implement the techniques described in this
disclosure. Video encoder 20 may perform intra- and inter-coding of
video blocks within video slices. Intra coding relies on spatial
prediction to reduce or remove spatial redundancy in video within a
given video frame or picture. Inter-coding relies on temporal
prediction to reduce or remove temporal redundancy in video within
adjacent frames or pictures of a video sequence. Intra-mode (I
mode) may refer to any of several spatial based compression modes.
Inter-modes, such as uni-directional prediction (P mode) or
bi-prediction (B mode), may refer to any of several temporal-based
compression modes.
[0056] In the example of FIG. 2, video encoder 20 includes a
partitioning unit 35, prediction processing unit 41, summer 50,
transform processing unit 52, quantization unit 54, entropy
encoding unit 56, decoded picture buffer (DPB) 64, and DPB
management unit 65. Prediction processing unit 41 includes motion
estimation unit 42, motion compensation unit 44, and intra
prediction unit 46. For video block reconstruction, video encoder
20 also includes inverse quantization unit 58, inverse transform
unit 60, and summer 62. A deblocking filter (not shown in FIG. 2)
may also be included to filter block boundaries to remove
blockiness artifacts from reconstructed video. If desired, the
deblocking filter would typically filter the output of summer 62.
Additional loop filters (in loop or post loop) may also be used in
addition to the deblocking filter.
[0057] As shown in FIG. 2, video encoder 20 receives video data,
and partitioning unit 35 partitions the data into video blocks.
This partitioning may also include partitioning into slices, tiles,
or other larger units, as well as video block partitioning, e.g.,
according to a quadtree structure of LCUs and CUs. Video encoder 20
generally illustrates the components that encode video blocks
within a video slice to be encoded. The slice may be divided into
multiple video blocks (and possibly into sets of video blocks
referred to as tiles). Prediction processing unit 41 may select one
of a plurality of possible coding modes, such as one of a plurality
of intra coding modes or one of a plurality of inter coding modes,
for the current video block based on error results (e.g., coding
rate and the level of distortion). Prediction processing unit 41
may provide the resulting intra- or inter-coded block to summer 50
to generate residual block data and to summer 62 to reconstruct the
encoded block for use as a reference picture.
[0058] Intra prediction unit 46 within prediction processing unit
41 may perform intra-predictive coding of the current video block
relative to one or more neighboring blocks in the same picture or
slice as the current block to be coded to provide spatial
compression. Motion estimation unit 42 and motion compensation unit
44 within prediction processing unit 41 perform inter-predictive
coding of the current video block relative to one or more
predictive blocks in one or more reference pictures to provide
temporal compression.
[0059] Motion estimation unit 42 may be configured to determine the
inter-prediction mode for a video slice according to a
predetermined pattern for a video sequence. The predetermined
pattern may designate video slices in the sequence as P slices or B
slices. Motion estimation unit 42 and motion compensation unit 44
may be highly integrated, but are illustrated separately for
conceptual purposes. Motion estimation, performed by motion
estimation unit 42, is the process of generating motion vectors,
which estimate motion for video blocks. A motion vector, for
example, may indicate the displacement of a PU of a video block
within a current video picture relative to a predictive block
within a reference picture.
[0060] A predictive block is a block that is found to closely match
the PU of the video block to be coded in terms of pixel difference,
which may be determined by sum of absolute difference (SAD), sum of
square difference (SSD), or other difference metrics. In some
examples, video encoder 20 may calculate values for sub-integer
pixel positions of reference pictures stored in decoded picture
buffer 64. For example, video encoder 20 may interpolate values of
one-quarter pixel positions, one-eighth pixel positions, or other
fractional pixel positions of the reference picture. Therefore,
motion estimation unit 42 may perform a motion search relative to
the full pixel positions and fractional pixel positions and output
a motion vector with fractional pixel precision.
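The full-pixel portion of the motion search described above can be sketched with the sum of absolute differences (SAD) metric mentioned in paragraph [0060]. The frames, block size, and search range are hypothetical; a fractional-pixel refinement, omitted here, would interpolate the reference samples and repeat the search at sub-pixel offsets.

```python
# Exhaustive full-pixel motion search minimizing SAD.
def sad(block, ref, ox, oy):
    """Sum of absolute differences between block and the ref region
    displaced by (ox, oy)."""
    return sum(abs(block[y][x] - ref[y + oy][x + ox])
               for y in range(len(block)) for x in range(len(block[0])))

def full_pel_search(block, ref, search_range):
    best = None
    for oy in range(search_range + 1):
        for ox in range(search_range + 1):
            cost = sad(block, ref, ox, oy)
            if best is None or cost < best[0]:
                best = (cost, (ox, oy))
    return best  # (minimum SAD, motion vector)

ref = [[0, 0, 0, 0],
       [0, 9, 9, 0],
       [0, 9, 9, 0],
       [0, 0, 0, 0]]
block = [[9, 9],
         [9, 9]]
print(full_pel_search(block, ref, search_range=2))  # (0, (1, 1))
```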
[0061] Motion estimation unit 42 calculates a motion vector for a
PU of a video block in an inter-coded slice by comparing the
position of the PU to the position of a predictive block of a
reference picture. The reference picture may be selected from a
first reference picture list (List 0) or a second reference picture
list (List 1), each of which identifies one or more reference
pictures stored in decoded picture buffer 64. Motion estimation
unit 42 sends the calculated motion vector to entropy encoding unit
56 and motion compensation unit 44.
[0062] Motion compensation, performed by motion compensation unit
44, may involve fetching or generating the predictive block based
on the motion vector determined by motion estimation, possibly
performing interpolations to sub-pixel precision. Upon receiving
the motion vector for the PU of the current video block, motion
compensation unit 44 may locate the predictive block to which the
motion vector points in one of the reference picture lists. Video
encoder 20 forms a residual video block by subtracting pixel values
of the predictive block from the pixel values of the current video
block being coded, forming pixel difference values. The pixel
difference values form residual data for the block, and may include
both luma and chroma difference components. Summer 50 represents
the component or components that perform this subtraction
operation. Motion compensation unit 44 may also generate syntax
elements associated with the video blocks and the video slice for
use by video decoder 30 in decoding the video blocks of the video
slice.
[0063] Intra-prediction unit 46 may intra-predict a current block,
as an alternative to the inter-prediction performed by motion
estimation unit 42 and motion compensation unit 44, as described
above. In particular, intra-prediction unit 46 may determine an
intra-prediction mode to use to encode a current block. In some
examples, intra-prediction unit 46 may encode a current block using
various intra-prediction modes, e.g., during separate encoding
passes, and intra-prediction unit 46 (or mode select unit 40, in
some examples) may select an appropriate intra-prediction mode to
use from the tested modes. For example, intra-prediction unit 46
may calculate rate-distortion values using a rate-distortion
analysis for the various tested intra-prediction modes, and select
the intra-prediction mode having the best rate-distortion
characteristics among the tested modes. Rate-distortion analysis
generally determines an amount of distortion (or error) between an
encoded block and an original, unencoded block that was encoded to
produce the encoded block, as well as a bit rate (that is, a number
of bits) used to produce the encoded block. Intra-prediction unit
46 may calculate ratios from the distortions and rates for the
various encoded blocks to determine which intra-prediction mode
exhibits the best rate-distortion value for the block.
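The rate-distortion selection described above amounts to minimizing a Lagrangian cost J = D + λ·R over the candidate modes. In this sketch the mode names and their distortion/rate figures are made up for illustration; only the selection rule reflects the text.

```python
# Pick the coding mode with the lowest rate-distortion cost
# J = distortion + lambda * rate.
def select_mode(candidates, lmbda):
    """candidates: dict mode -> (distortion, rate_bits)."""
    return min(candidates,
               key=lambda m: candidates[m][0] + lmbda * candidates[m][1])

modes = {
    "DC": (120.0, 10),
    "planar": (90.0, 40),
    "angular_26": (40.0, 60),
}
# A larger lambda penalizes rate more heavily, shifting the choice
# toward cheaper modes:
print(select_mode(modes, lmbda=1.0))  # 'angular_26'
print(select_mode(modes, lmbda=5.0))  # 'DC'
```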
[0064] After selecting an intra-prediction mode for a block,
intra-prediction unit 46 may provide information indicative of the
selected intra-prediction mode for the block to entropy encoding
unit 56. Entropy encoding unit 56 may encode the information
indicating the selected intra-prediction mode in accordance with
the techniques of this disclosure. Video encoder 20 may include in
the transmitted bitstream configuration data, which may include a
plurality of intra-prediction mode index tables and a plurality of
modified intra-prediction mode index tables (also referred to as
codeword mapping tables), definitions of encoding contexts for
various blocks, and indications of a most probable intra-prediction
mode, an intra-prediction mode index table, and a modified
intra-prediction mode index table to use for each of the
contexts.
[0065] After prediction processing unit 41 generates the predictive
block for the current video block via either inter-prediction or
intra-prediction, video encoder 20 forms a residual video block by
subtracting the predictive block from the current video block. The
residual video data in the residual block may be included in one or
more TUs and applied to transform processing unit 52. Transform
processing unit 52 transforms the residual video data into residual
transform coefficients using a transform, such as a discrete cosine
transform (DCT) or a conceptually similar transform. Transform
processing unit 52 may convert the residual video data from a pixel
domain to a transform domain, such as a frequency domain.
[0066] Transform processing unit 52 may send the resulting
transform coefficients to quantization unit 54. Quantization unit
54 quantizes the transform coefficients to further reduce bit rate.
The quantization process may reduce the bit depth associated with
some or all of the coefficients. The degree of quantization may be
modified by adjusting a quantization parameter. In some examples,
quantization unit 54 may then perform a scan of the matrix
including the quantized transform coefficients. Alternatively,
entropy encoding unit 56 may perform the scan.
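The quantization step described above can be sketched with a scalar quantizer whose step size is controlled by a quantization parameter. The step-size mapping used here (step doubling every 6 QP values, as in H.264/HEVC-style designs) and the coefficient values are simplified illustrations, not the normative dequantization tables.

```python
# Scalar quantization: larger QP -> larger step -> more coefficients
# rounded to zero, reducing bit rate at the cost of bit depth.
def quantize(coeffs, qp):
    step = 2 ** (qp / 6.0)  # step size doubles every 6 QP values
    return [int(round(c / step)) for c in coeffs]

coeffs = [100, 41, -13, 6, -2, 1, 0, 0]
print(quantize(coeffs, qp=12))  # [25, 10, -3, 2, 0, 0, 0, 0]
print(quantize(coeffs, qp=30))  # [3, 1, 0, 0, 0, 0, 0, 0]
```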
[0067] Following quantization, entropy encoding unit 56 entropy
encodes the quantized transform coefficients. For example, entropy
encoding unit 56 may perform context adaptive variable length
coding (CAVLC), context adaptive binary arithmetic coding (CABAC),
syntax-based context-adaptive binary arithmetic coding (SBAC),
probability interval partitioning entropy (PIPE) coding or another
entropy encoding methodology or technique. Following the entropy
encoding by entropy encoding unit 56, the encoded bitstream may be
transmitted to video decoder 30, or archived for later transmission
or retrieval by video decoder 30. Entropy encoding unit 56 may also
entropy encode the motion vectors and the other syntax elements for
the current video slice being coded.
[0068] Inverse quantization unit 58 and inverse transform unit 60
apply inverse quantization and inverse transformation,
respectively, to reconstruct the residual block in the pixel domain
for later use as a reference block of a reference picture. Motion
compensation unit 44 may calculate a reference block by adding the
residual block to a predictive block of one of the reference
pictures within one of the reference picture lists. Motion
compensation unit 44 may also apply one or more interpolation
filters to the reconstructed residual block to calculate
sub-integer pixel values for use in motion estimation. Summer 62
adds the reconstructed residual block to the motion compensated
prediction block produced by motion compensation unit 44 to produce
a reference block for storage in decoded picture buffer 64. The
reference block may be used by motion estimation unit 42 and motion
compensation unit 44 as a reference block to inter-predict a block
in a subsequent video frame or picture.
[0069] In accordance with this disclosure, prediction processing
unit 41 represents one example unit for performing the example
functions described above. For example, prediction processing unit
41 may encode syntax elements that support the use of adaptive
resolution CVSs. Prediction processing unit 41 may also generate
SPSs that may be activated by one or more resolution sub-sequences
(RSSs), and transmit the SPSs and RSSs to a video decoder. Each of
the SPSs
may include resolution information for one or more sequences of
pictures. Prediction processing unit 41 may also receive and order
one or more SPSs and cause video encoder 20 to code information
indicative of the reference pictures that belong to the reference
picture set. In addition, DPB management unit 65 may also perform
techniques related to the management of DPB 64.
[0070] Also, during the reconstruction process (e.g., the process
used to reconstruct a picture for use as a reference picture and
storage in DPB 64), prediction processing unit 41 may construct the
plurality of reference picture subsets that each identifies one or
more of the reference pictures. Prediction processing unit 41 may
also derive the reference picture set from the
constructed plurality of reference picture subsets. Also,
prediction processing unit 41 and DPB management unit 65 may
implement any one or more of the sets of example pseudo code
described below to implement one or more example techniques
described in this disclosure.
[0071] In accordance with the techniques of this disclosure,
prediction processing unit 41 may generate a coded video sequence
comprising a first sub-sequence and a second sub-sequence, wherein
the first sub-sequence includes one or more frames each having a
first resolution. The second sub-sequence may include one or more
frames each having a second resolution. The first sub-sequence may
be different than the second sub-sequence, and the first resolution
may be different than the second resolution. Prediction processing
unit 41 may further generate a first sequence parameter set and a
second sequence parameter set for the video sequence. The first
sequence parameter set may indicate the first resolution of the one
or more frames of the first sub-sequence, and the second sequence
parameter set may indicate the second resolution of the one or more
frames of the second sub-sequence. Also, the first sequence
parameter set may be different than the second sequence parameter
set. Prediction processing unit 41 may transmit the coded video
sequence comprising the first sub-sequence and the second
sub-sequence, along with the first sequence parameter set and the
second sequence parameter set. In some examples, the resolution may
comprise a spatial resolution.
[0072] Prediction processing unit 41 may also alter the coding of
the sequence parameter sets. For example, prediction processing
unit 41 may code the first sequence parameter set and the second
sequence parameter set in a transmitted bitstream prior to either
the first sub-sequence or the second sub-sequence. Prediction
processing unit 41 may also interleave in the coded video sequence
the one or more frames of the first sub-sequence and the one or
more frames of the second sub-sequence.
[0073] In some examples, to transmit the first sequence parameter
set and the second sequence parameter set of the coded video
sequence, prediction processing unit 41 may be configured to
transmit both the first sequence parameter set and the second
sequence parameter set prior to transmitting either of the first
sub-sequence and the second sub-sequence. In another example, to
transmit the first sequence parameter set and the second sequence
parameter set of the coded video sequence, prediction processing
unit 41 may be configured to transmit the second sequence parameter
set after transmitting at least one frame of the one or more frames
of the first sub-sequence, and prior to transmitting the second
sub-sequence.
[0074] In some examples, prediction processing unit 41 may code the
first sequence parameter set in a transmitted bitstream prior to
coding the first sub-sequence and prediction processing unit 41 may
also code the second sequence parameter set in the transmitted
bitstream after at least one frame of the one or more frames of the
first sub-sequence and prior to the second sub-sequence.
[0075] Decoded picture buffer 64, decoded picture buffer management
unit 65 and video encoder 20 may also perform the techniques of
this disclosure. In some examples, decoded picture buffer 64 may
receive a first decoded frame of video data, wherein the first
decoded frame is associated with a first resolution. DPB management
unit 65 may determine whether decoded picture buffer 64 is
available to store the first decoded frame based on the first
resolution, and in the event decoded picture buffer 64 is available
to store the first decoded frame, store the first decoded frame in
decoded picture buffer 64. DPB management unit 65 may further
determine whether decoded picture buffer 64 is available to store a
second decoded frame of video data, wherein the second decoded
frame is associated with a second resolution, based on the first
resolution and the second resolution, wherein the first decoded
frame is different than the second decoded frame.
[0076] In some additional examples, DPB management unit 65 may
determine an amount of information that may be stored within
decoded picture buffer 64, determine an amount of information
associated with the first decoded frame based on the first
resolution, and compare the amount of information that may be
stored within decoded picture buffer 64, and the amount of
information associated with the first decoded frame.
[0077] In one example, to determine whether decoded picture buffer
64 is available to store the second decoded frame based on the
first resolution and the second resolution, DPB management unit 65
may be configured to determine an amount of information that may be
stored within decoded picture buffer 64 based on the first
resolution, determine an amount of information associated with the
second decoded frame based on the second resolution, and compare
the amount of information that may be stored within decoded picture
buffer 64 and the amount of information associated with the second
decoded frame. DPB management unit 65 may also be configured to
remove the first decoded frame from decoded picture buffer 64, and
in some examples, the resolution may comprise a spatial
resolution.
[0078] The techniques described in this disclosure may refer to
video encoder 20 signaling information. When video encoder 20
signals information, the techniques of this disclosure generally
refer to any manner in which video encoder 20 provides the
information in a coded bitstream. For example, when video encoder
20 signals syntax elements to video decoder 30, it may mean that
video encoder 20 transmitted the syntax elements to video decoder
30 as part of a coded bitstream via output interface 24 and
communication channel 16, or that video encoder 20 stored the
syntax elements in a coded bitstream on storage medium 17 and/or
file server 19 for eventual reception by video decoder 30. In this
way, signaling from video encoder 20 to video decoder 30 should not
be interpreted as requiring transmission directly from video
encoder 20 to video decoder 30, although this may be one
possibility for real-time video applications. In other examples,
however, signaling from video encoder 20 to video decoder 30 should
be interpreted as any technique with which video encoder 20
provides information in a bitstream for eventual reception by video
decoder 30, either directly or via an intermediate storage (e.g.,
in storage medium 17 and/or file server 19).
[0079] Video encoder 20 and video decoder 30 may be configured to
implement the example techniques described in this disclosure for
coding, transmitting, receiving and activating SPSs and RSSs, as
well as for managing the DPB. For example, video encoder 20 may
invoke the techniques to support adaptive resolution CVSs and to
add and remove reference pictures from the DPB. Video decoder 30
may invoke the process in a similar manner.
[0080] To support multiple SPSs in a single adaptive-resolution
CVS, prediction processing unit 41 may utilize RSSs. Each RSS may
indicate information, such as a resolution of a series of coded
video pictures of a CVS. Prediction processing unit 41 may use one
resolution sub-sequence (RSS) at a given time. Each RSS may reference
a single SPS. As an example, if there are "n" RSSs in a given CVS,
there may be, altogether, "n" active SPSs when decoding the CVS.
However, in some examples, multiple RSSs may refer to a single SPS
in a CVS. The SPS or PPS may indicate the different resolution of
each RSS. The SPS or PPS may include a resolution ID as well as a
syntax element that indicates the resolution associated with each
resolution ID.
[0081] In accordance with the techniques of this disclosure, a
computer-readable storage medium may include a data structure that
represents CVSs, SPSs, and RSSs. In particular, the data structure
may include a coded video sequence comprising a first sub-sequence
and a second sub-sequence. The first sub-sequence may include one
or more frames each having a first resolution, and the second
sub-sequence may include one or more frames each having a second
resolution. The first sub-sequence may also be different than the
second sub-sequence, and the first resolution may be different than
the second resolution. The data structure may further comprise a
first sequence parameter set and a second sequence parameter set
for the coded video sequence. The first sequence parameter set may
indicate the first resolution of the one or more frames of the
first sub-sequence, the second sequence parameter set may indicate
the second resolution of the one or more frames of the second
sub-sequence, and the first sequence parameter set may be different
than the second sequence parameter set.
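For purposes of illustration only, the data structure described in
this paragraph may be sketched as follows. All type and field names
here are hypothetical and not taken from the disclosure; the key
point is that each frame carries the identifier of the sequence
parameter set that indicates its resolution, so one coded video
sequence can interleave frames of the two sub-sequences.

```c
#include <assert.h>

/* Hypothetical sketch of a CVS carrying two sub-sequences with two
   different SPSs; names are illustrative assumptions. */
typedef struct { int sps_id; int width, height; } SeqParamSet;
typedef struct { int sps_id; /* selects this frame's resolution */ } Frame;

typedef struct {
    SeqParamSet sps[2];   /* first and second sequence parameter set */
    Frame frames[8];      /* frames of both sub-sequences, interleaved */
    int num_frames;
} CodedVideoSeq;
```

A decoder-side lookup such as `c.sps[c.frames[i].sps_id]` then yields
the resolution of the i-th frame regardless of which sub-sequence it
belongs to.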
[0082] Prediction processing unit 41 of video encoder 20 may order
or restrict each of the RSSs according to spatial resolution
characteristics of each RSS. In general, prediction processing unit
41 may order the SPSs based on their horizontal resolutions. As an
example, if a horizontal size of a resolution "A" of an SPS is
greater than that of a resolution "B" of an SPS, a vertical size of
the resolution "A" may not be less than that of the resolution "B."
With this restriction, a resolution "C" of an SPS may be considered
to be larger than a resolution "D" of an SPS as long as one of a
horizontal size and a vertical size of the resolution "C" is
greater than a corresponding size of the resolution "D." Video
encoder 20 may assign an RSS with a largest spatial resolution a
resolution ID equal to "0," and an RSS with a second largest
spatial resolution a resolution ID equal to "1," and so forth.
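The ordering rule above may be sketched as follows; the struct and
function names are assumptions, not part of the disclosure. Under
the stated restriction that a larger horizontal size implies a
vertical size that is no smaller, sorting on the horizontal size
(with the vertical size as a tie-breaker) orders the RSSs from
largest to smallest, and resolution IDs can then be assigned in
sorted order.

```c
#include <stdlib.h>

/* Illustrative sketch of resolution ordering and ID assignment. */
typedef struct { int width, height, resolution_id; } Rss;

/* qsort comparator: larger spatial resolution sorts first. */
static int cmp_rss(const void *a, const void *b)
{
    const Rss *ra = a, *rb = b;
    if (ra->width != rb->width)
        return rb->width - ra->width;
    return rb->height - ra->height;
}

/* Largest resolution gets ID 0, second largest gets ID 1, and so on. */
static void assign_resolution_ids(Rss *rss, int n)
{
    qsort(rss, n, sizeof(Rss), cmp_rss);
    for (int i = 0; i < n; i++)
        rss[i].resolution_id = i;
}
```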
[0083] In some examples, prediction processing unit 41 may not
signal a resolution ID. Rather, video decoder 30 may derive the
resolution ID according to the spatial resolutions of the RSSs.
Prediction processing unit 41 may still order each of the RSSs in
each CVS according to the spatial resolutions of each RSS, as
described above. The RSS with the largest spatial resolution is
assigned a resolution ID equal to 0, and the RSS with the second
largest spatial resolution is assigned a resolution ID equal to 1,
and so on.
[0084] For any RSS with a resolution ID equal to "rId," during
inter-prediction, prediction processing unit 41 may refer to
decoded pictures only within the same RSS, within an RSS with a
resolution ID equal to "rId-1," or within an RSS with a resolution
ID equal to "rId+1." Prediction processing unit 41 may not refer to
decoded pictures within other RSSs when performing
inter-prediction.
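The reference restriction of this paragraph may be expressed as a
simple predicate; the function name is an illustrative assumption.
A picture in the RSS with resolution ID "rId" may reference decoded
pictures only in the same RSS or in the RSSs with IDs "rId-1" and
"rId+1."

```c
#include <stdbool.h>

/* Sketch of the inter-prediction reference restriction: the
   referenced RSS must be the current RSS or an adjacent one. */
static bool may_reference(int current_rid, int reference_rid)
{
    int diff = current_rid - reference_rid;
    return diff >= -1 && diff <= 1;
}
```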
[0085] In some examples, there may be additional restrictions on
inter-prediction amongst RSSs. In one instance, prediction
processing unit 41 may only perform inter-prediction of blocks from
two adjacent RSSs, i.e., the RSS with the immediately larger
spatial resolution and the RSS with the immediately smaller spatial
resolution. In another example, prediction processing unit 41 may
not be limited to performing inter-prediction using
spatially-neighboring RSSs, and prediction processing unit 41 may
perform inter-prediction using any RSS, not just spatially
neighboring RSSs (e.g., RSSs with rId+1 or rId-1).
[0086] The techniques of this disclosure may also include processes
and techniques for transmitting and activating picture parameter
sets (PPSs). The use of PPSs may decouple the transmission of
infrequently changing information from the transmission of coded
block data for the CVSs. Video encoder 20 and decoder 30 may, in
some applications, convey or signal the SPSs and PPSs "out-of-band,"
or using a different communication channel than that used to
communicate the coded block data of the CVSs, e.g., using a
reliable transport mechanism.
[0087] A PPS raw byte sequence payload (RBSP) may include
parameters to which coded slice network abstraction layer (NAL)
units of one or more coded pictures may refer. Each PPS RBSP is
initially considered not active at a start of a decoding process.
At most, one PPS RBSP is considered active at any given moment
during the decoding process, and activation of any particular PPS
RBSP results in deactivation of a previously-active PPS RBSP, if
any.
[0088] In some examples, prediction processing unit 41 of video
encoder 20 and prediction processing unit 81 of video decoder 30
may support RSSs each having the same resolution aspect ratio. In
other examples, video encoder 20 and decoder 30 may support
different RSSs having different resolution aspect ratios among the
different RSSs. The resolution aspect ratio of an RSS may be
defined as the proportion of the width of an RSS versus the height
of the RSS.
[0089] In the example where prediction processing units 41 and 81
support RSSs having different resolution aspect ratios, prediction
processing units 41 and 81 may crop a portion of a block of a
reference picture having a first resolution aspect ratio in order
to predict the values of a predictive block having a second,
different resolution aspect ratio. The techniques of this
disclosure define a number of syntax elements, referred to as
cropping parameters, which may be signaled in the RBSP of an SPS to
indicate how a reference picture should be cropped. The cropped
area of the reference picture may be referred to as a "cropping
window."
[0090] In order to support CVSs with adaptive resolution, the
techniques of this disclosure propose adding the following syntax
structures to the SPS. The syntax elements may include a profile
indicator or a flag that indicates the existence of more than one
spatial resolution in the CVS. Alternatively, no flag may be added,
but the existence of the more than one spatial resolution in the
CVS may be indicated by a particular value of the profile
indicator, which may be denoted as profile_idc. Additionally, the
syntax elements may include a resolution ID, a syntax element that
indicates a spatial relationship between the current resolution
sub-sequence and an adjacent spatial resolution sub-sequence, and a
syntax element that indicates the required size of the DPB in units
of 8×8 blocks.
[0091] According to the techniques of this disclosure, a modified
SPS RBSP syntax structure may be expressed as shown below in Table
I:
TABLE-US-00001
TABLE I
seq_parameter_set_rbsp( ) {                          Descriptor
  profile_idc                                        u(8)
  reserved_zero_8bits /* equal to 0 */               u(8)
  level_idc                                          u(8)
  seq_parameter_set_id                               ue(v)
  max_temporal_layers_minus1                         u(3)
  pic_width_in_luma_samples                          u(16)
  pic_height_in_luma_samples                         u(16)
  bit_depth_luma_minus8                              ue(v)
  bit_depth_chroma_minus8                            ue(v)
  pcm_bit_depth_luma_minus1                          u(4)
  pcm_bit_depth_chroma_minus1                        u(4)
  log2_max_pic_order_cnt_lsb_minus4                  ue(v)
  max_num_ref_frames                                 ue(v)
  log2_min_coding_block_size_minus3                  ue(v)
  log2_diff_max_min_coding_block_size                ue(v)
  log2_min_transform_block_size_minus2               ue(v)
  log2_diff_max_min_transform_block_size             ue(v)
  log2_min_pcm_coding_block_size_minus3              ue(v)
  max_transform_hierarchy_depth_inter                ue(v)
  max_transform_hierarchy_depth_intra                ue(v)
  chroma_pred_from_luma_enabled_flag                 u(1)
  loop_filter_across_slice_flag                      u(1)
  sample_adaptive_offset_enabled_flag                u(1)
  adaptive_loop_filter_enabled_flag                  u(1)
  pcm_loop_filter_disable_flag                       u(1)
  cu_qp_delta_enabled_flag                           u(1)
  temporal_id_nesting_flag                           u(1)
  inter_4x4_enabled_flag                             u(1)
  adaptive_spatial_resolution_flag                   u(1)
  if( adaptive_spatial_resolution_flag ) {
    resolution_id                                    ue(v)
    for( i = 0; i < 2; i++ ) {
      cropping_resolution_idc[ i ]                   u(2)
      if( cropping_resolution_idc[ i ] & 0x01 ) {
        cropped_left[ i ]                            ue(v)
        cropped_right[ i ]                           ue(v)
      }
      if( cropping_resolution_idc[ i ] & 0x02 ) {
        cropped_top[ i ]                             ue(v)
        cropped_bottom[ i ]                          ue(v)
      }
    }
  }
  max_dec_pic_buffering                              ue(v)
  rbsp_trailing_bits( )
}
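For illustration only, the adaptive-resolution additions at the end
of Table I may be parsed as sketched below. The Reader type and the
read_u/read_ue stubs are assumptions standing in for fixed-length
and Exp-Golomb bitstream reads; here they consume values from an
array so the control flow can be exercised. The top/bottom test uses
bit 1 (0x2) of cropping_resolution_idc, consistent with the Table II
semantics for a two-bit value.

```c
#include <stdint.h>

/* Stub bitstream reader: vals[] supplies already-decoded values. */
typedef struct { const uint32_t *vals; int pos; } Reader;
static uint32_t read_u(Reader *r, int nbits) { (void)nbits; return r->vals[r->pos++]; }
static uint32_t read_ue(Reader *r) { return r->vals[r->pos++]; }

typedef struct {
    uint32_t adaptive_spatial_resolution_flag;
    uint32_t resolution_id;
    uint32_t cropping_resolution_idc[2];
    uint32_t cropped_left[2], cropped_right[2];
    uint32_t cropped_top[2], cropped_bottom[2];
    uint32_t max_dec_pic_buffering;
} SpsAdaptiveRes;

/* Parse the conditional syntax at the end of Table I. */
static void parse_adaptive_res(Reader *r, SpsAdaptiveRes *s)
{
    s->adaptive_spatial_resolution_flag = read_u(r, 1);
    if (s->adaptive_spatial_resolution_flag) {
        s->resolution_id = read_ue(r);
        for (int i = 0; i < 2; i++) {
            s->cropping_resolution_idc[i] = read_u(r, 2);
            if (s->cropping_resolution_idc[i] & 0x1) {  /* left/right */
                s->cropped_left[i]  = read_ue(r);
                s->cropped_right[i] = read_ue(r);
            }
            if (s->cropping_resolution_idc[i] & 0x2) {  /* top/bottom */
                s->cropped_top[i]    = read_ue(r);
                s->cropped_bottom[i] = read_ue(r);
            }
        }
    }
    s->max_dec_pic_buffering = read_ue(r);
}
```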
[0092] An exemplary description of the new SPS syntax elements in
Table I is set forth in more detail below.
[0093] adaptive_spatial_resolution_flag: When equal to "1," the
flag indicates that a CVS containing an RSS referring to an SPS may
contain pictures with different spatial resolutions. When equal to
"0," the flag indicates that all pictures in the CVS have a same
spatial resolution, or equivalently, that there is only one RSS in
the CVS. This syntax element applies to the entire CVS, and its
value shall be identical for all SPSs that may be activated for the
CVS.
[0094] The adaptive_spatial_resolution_flag is only one example of
how adaptive resolution CVSs may be implemented. As another
example, there may be one or more profiles defined that enable
adaptive spatial resolution. Accordingly, the value of the
profile_idc syntax element, which may indicate the selection of an
adaptive resolution profile, may signal the enablement of adaptive
resolution.
[0095] resolution_id: Specifies an identifier of the RSS referring
to the SPS. A value of resolution_id may be in a range of "0" to
"7," inclusive. An RSS with a largest spatial resolution among all
RSSs in the CVS may have resolution_id equal to "0."
[0096] cropping_resolution_idc[i]: Indicates whether cropping is
needed to specify a reference region of a reference picture from a
target RSS, as defined below, used for inter-prediction as a
reference when decoding a coded picture from a current RSS.
[0097] The pseudocode that follows describes one example of how the
numbering of an RSS using the resolution_id value that refers to an
SPS may be implemented according to the techniques of this
disclosure.
[0098] Let "rId" be a resolution_id of the current RSS;
[0099] The target RSS is the RSS with a resolution_id equal to:
rId + ( i == 0 ? -1 : 1 );
[0100] If the current RSS has a resolution_id equal to 0,
cropping_resolution_idc[ 0 ] = 0
[0101] If the current RSS has a largest resolution_id among all
RSSs in the CVS, cropping_resolution_idc[ 1 ] = 0
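The derivation above may be sketched directly in code; the function
name is an illustrative assumption. For index i, the target RSS is
the spatially adjacent one: i equal to 0 selects rId-1 (the larger
resolution), and i equal to 1 selects rId+1 (the smaller
resolution). A return value of -1 indicates that the target does not
exist, in which case cropping_resolution_idc[i] is 0.

```c
/* Sketch of the target-RSS derivation from the pseudocode above. */
static int target_rss(int rid, int i, int max_rid)
{
    int t = rid + (i == 0 ? -1 : 1);
    if (t < 0 || t > max_rid)
        return -1;   /* no target RSS; cropping_resolution_idc[i] is 0 */
    return t;
}
```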
[0102] As described above, the techniques of this disclosure may
enable RSSs and SPSs that may have different aspect ratios. When
performing inter-prediction, video encoder 20 may predict the pixel
values of a block from a block of a reference picture that has a
different aspect ratio. Because of the difference in the aspect
ratios, video encoder 20 may crop the portion of the block of the
reference block in order to obtain a block with a similar
resolution aspect ratio to the predictive block. The following
syntax elements describe how video encoder 20 may perform cropping
of blocks to obtain blocks with different resolution aspect
ratios.
[0103] Cropping_resolution_idc[i] equal to "0" indicates that the
target RSS does not exist, or that no cropping is needed.
[0104] Cropping_resolution_idc[i] equal to "1" indicates that
cropping at a left and/or right side is needed.
[0105] Cropping_resolution_idc[i] equal to "2" indicates that
cropping at a top and/or bottom is needed.
[0106] Cropping_resolution_idc[i] equal to "3" indicates that
cropping at both the left/right and the top/bottom is needed.
[0107] Table II below illustrates the various values of
Cropping_resolution_idc[i], and the corresponding indications.
TABLE-US-00002
TABLE II
cropping_resolution_idc[ i ]  Indication
0  No cropping is needed
1  Cropping may happen at the left and/or right side
2  Cropping may happen at the top and/or bottom
3  Cropping may happen at both left/right and top/bottom
[0108] In addition to the "cropping_resolution_idc" value, the RBSP of
an SPS may also include syntax elements that may indicate the
number of pixels to be cropped from the top, bottom, left, and/or
right of a reference picture from an RSS. These additional cropping
syntax elements are described in further detail below.
[0109] cropped_left[i]: Specifies a number of pixels to be cropped
at a left side of a luma component of the reference picture from
the target RSS, to specify the reference region. When not present,
video encoder 20 may infer the value to be equal to "0."
[0110] cropped_right[i]: Specifies a number of pixels to be cropped
at a right side of the luma component of the reference picture from
the target RSS, to specify the reference region. When not present,
video encoder 20 may infer the value to be equal to "0."
[0111] cropped_top[i]: Specifies a number of pixels to be cropped
at a top of the luma component of the reference picture from the
target RSS, to specify the reference region. When not present,
video encoder 20 may infer the value to be equal to "0."
[0112] cropped_bottom[i]: Specifies a number of pixels to be
cropped at a bottom of the luma component of the reference picture
from the target RSS, to specify the reference region. When not
present, video encoder 20 may infer the value to be equal to
"0."
[0113] In addition to signaling a bottom, top, left, and/or right
cropping, video encoder 20 may signal the cropping window in other
ways. As an example, video encoder 20 may signal the cropping
window as the starting vertical and horizontal positions plus the
width and height. As another example, video encoder 20 may signal
the cropping window as the starting vertical and horizontal
positions and the ending vertical and horizontal positions.
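The alternative signalings in this paragraph describe the same
cropping window. As a minimal sketch, assuming illustrative struct
and function names not taken from the disclosure, the edge-offset
form (left/right/top/bottom) may be converted to the start-position-
plus-size form as follows:

```c
/* Convert edge offsets into a start position plus width and height. */
typedef struct { int left, right, top, bottom; } CropOffsets;
typedef struct { int x, y, width, height; } CropWindow;

static CropWindow offsets_to_window(int pic_w, int pic_h, CropOffsets c)
{
    CropWindow w;
    w.x = c.left;
    w.y = c.top;
    w.width  = pic_w - c.left - c.right;
    w.height = pic_h - c.top - c.bottom;
    return w;
}
```

The start/end-position form follows similarly, with the ending
positions being w.x + w.width and w.y + w.height.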
[0114] Before prediction processing unit 41 may use a coded picture
in the current RSS, prediction processing unit 41 may crop a
decoded picture from the target RSS as specified by the above
cropping syntax elements. Prediction processing unit 41 may also
scale the cropped reference picture to be the same resolution as
the coded picture in the current RSS, and scale the motion vectors
of the cropped block accordingly.
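The motion-vector scaling mentioned above may be sketched as
follows, assuming the cropped reference region is resampled to the
current RSS resolution. The function name is an assumption, and the
rounding and fixed-point details of a real codec are omitted: a
vector component expressed on the cropped reference grid is rescaled
by the ratio of the current picture dimension to the cropped region
dimension.

```c
/* Scale one motion-vector component from the cropped reference grid
   to the current RSS grid, rounding to nearest away from zero. */
static int scale_mv_component(int mv, int cur_size, int cropped_size)
{
    int num = mv * cur_size;
    return (num >= 0 ? num + cropped_size / 2 : num - cropped_size / 2)
           / cropped_size;
}
```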
[0115] As described above, video encoder 20 may include DPB 64,
which may contain decoded pictures. DPB management unit 65 may
manage DPB 64. Each decoded picture contained within DPB 64 may be
needed for either inter-prediction as a reference, or for future
output. In accordance with the techniques of this disclosure, DPB
64 may be modified to support adaptive-resolution CVSs, and more
generally to store frames of different sizes.
[0116] In accordance with the techniques of this disclosure, prior
to initialization, the DPB may be empty (i.e., an indication of a
proportion of DPB 64 that is unavailable to store decoded pictures,
or DPB "fullness," is set to "0"). When a decoded picture is stored
in DPB 64, DPB management unit 65 may increment the "fullness" of
the DPB by the number of blocks (e.g., CUs or 8×8 pixel blocks) in
the picture. Similarly, when DPB management unit 65 removes a
decoded picture from DPB 64, DPB management unit 65 may decrease
the fullness of the DPB by the number of blocks (e.g., CUs or 8×8
pixel blocks) in the removed picture.
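The block-count fullness accounting described above may be sketched
as follows; the type and function names are illustrative
assumptions. Tracking fullness in 8×8 blocks, rather than in
frames, lets pictures of different resolutions consume
proportionate shares of one DPB.

```c
/* Sketch of DPB fullness accounting in units of 8x8 blocks. */
typedef struct { int capacity; int fullness; } Dpb;   /* in 8x8 blocks */

/* Number of 8x8 blocks covering a picture of the given resolution. */
static int picture_blocks(int width, int height)
{
    return ((width + 7) / 8) * ((height + 7) / 8);
}

/* Store a decoded picture; returns 1 on success, 0 if no room. */
static int dpb_store(Dpb *d, int width, int height)
{
    int need = picture_blocks(width, height);
    if (d->fullness + need > d->capacity)
        return 0;              /* pictures must be removed first */
    d->fullness += need;
    return 1;
}

/* Remove a decoded picture, reclaiming its blocks. */
static void dpb_remove(Dpb *d, int width, int height)
{
    d->fullness -= picture_blocks(width, height);
}
```

Note how a 1920×1088 picture costs 240 × 136 = 32640 blocks while a
640×360 picture costs only 80 × 45 = 3600, so one buffer can hold
differing mixes of the two.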
[0117] To support a DPB that utilizes a block count rather than a
frame count to indicate the "fullness" of the DPB, the RBSP of an
SPS may include a syntax element that specifies a size of the DPB
in 8×8 blocks. The parameter, denoted as max_dec_pic_buffering,
specifies a required size of a decoded picture buffer (DPB), in
units of 8×8 blocks, for decoding the CVS. This syntax element may
apply to the entire CVS, and its value is identical for all SPSs
that may be activated for the CVS.
Further detail of the operation of the DPB is described with
respect to FIG. 5, below.
[0118] FIG. 3 is a block diagram illustrating an example video
decoder 30 that may implement the techniques described in this
disclosure. In the example of FIG. 3, video decoder 30 includes an
entropy decoding unit 80, prediction processing unit 81, inverse
quantization unit 86, inverse transformation unit 88, summer 90,
decoded picture buffer (DPB) 92, and DPB management unit 93.
Prediction processing unit 81 includes motion compensation unit 82
and intra prediction unit 84. Video decoder 30 may, in some
examples, perform a decoding pass generally reciprocal to the
encoding pass described with respect to video encoder 20 from FIG.
2.
[0119] During the decoding process, video decoder 30 receives an
encoded video bitstream that represents video blocks of an encoded
video slice and associated syntax elements from video encoder 20.
Entropy decoding unit 80 of video decoder 30 entropy decodes the
bitstream to generate quantized coefficients, motion vectors, and
other syntax elements. Entropy decoding unit 80 forwards the motion
vectors and other syntax elements to prediction processing unit 81.
Video decoder 30 may receive the syntax elements at the video slice
level and/or the video block level.
[0120] When the video slice is coded as an intra-coded (I) slice,
intra prediction unit 84 of prediction processing unit 81 may
generate prediction data for a video block of the current video
slice based on a signaled intra prediction mode and data from
previously decoded blocks of the current picture. When the video
picture is coded as an inter-coded (i.e., B or P) slice, motion
compensation unit 82 of prediction processing unit 81 produces
predictive blocks for a video block of the current video slice
based on the motion vectors and other syntax elements received from
entropy decoding unit 80. The predictive blocks may be produced
from one of the reference pictures within one of the reference
picture lists. Video decoder 30 may construct the reference frame
lists, List 0 and List 1, using default construction techniques
based on reference pictures stored in decoded picture buffer 92. In
some examples, video decoder 30 may construct List 0 and List 1
from the reference pictures identified in the derived reference
picture set.
[0121] Motion compensation unit 82 determines prediction
information for a video block of the current video slice by parsing
the motion vectors and other syntax elements, and uses the
prediction information to produce the predictive blocks for the
current video block being decoded. For example, motion compensation
unit 82 uses some of the received syntax elements to determine a
prediction mode (e.g., intra- or inter-prediction) used to code the
video blocks of the video slice, an inter-prediction slice type
(e.g., B slice or P slice), construction information for one or
more of the reference picture lists for the slice, motion vectors
for each inter-encoded video block of the slice, inter-prediction
status for each inter-coded video block of the slice, and other
information to decode the video blocks in the current video
slice.
[0122] Motion compensation unit 82 may also perform interpolation
based on interpolation filters. Motion compensation unit 82 may use
interpolation filters as used by video encoder 20 during encoding
of the video blocks to calculate interpolated values for
sub-integer pixels of reference blocks. In this case, motion
compensation unit 82 may determine the interpolation filters used
by video encoder 20 from the received syntax elements and use the
interpolation filters to produce predictive blocks.
[0123] Inverse quantization unit 86 inverse quantizes, i.e.,
de-quantizes, the quantized transform coefficients provided in the
bitstream and decoded by entropy decoding unit 80. The inverse
quantization process may include use of a quantization parameter
calculated by video encoder 20 for each video block in the video
slice to determine a degree of quantization and, likewise, a degree
of inverse quantization that should be applied. Inverse transform
unit 88 applies an inverse transform, e.g., an inverse DCT, an
inverse integer transform, or a conceptually similar inverse
transform process, to the transform coefficients in order to
produce residual blocks in the pixel domain.
[0124] After prediction processing unit 81 generates the predictive
block for the current video block based on either inter- or
intra-prediction, video decoder 30 forms a decoded video block by
summing the residual blocks from inverse transform unit 88 with the
corresponding predictive blocks generated by prediction processing
unit 81. Summer 90 represents the component or components that
perform this summation operation. If desired, a deblocking filter
may also be applied to filter the decoded blocks in order to remove
blockiness artifacts. Other loop filters (either in the coding loop
or after the coding loop) may also be used to smooth pixel
transitions, or otherwise improve the video quality. DPB management
unit 93 may store the decoded video blocks of a given picture in
decoded picture buffer 92, which stores reference pictures used for
subsequent motion compensation. Decoded picture buffer 92 also
stores decoded video for later presentation on a display device,
such as display device 32 of FIG. 1.
[0125] In accordance with this disclosure, prediction processing
unit 81 and DPB management unit 93 represent example units for
performing the example functions described above. For example,
prediction processing unit 81 may receive a coded video sequence
comprising a first sub-sequence and a second sub-sequence, wherein
the first sub-sequence includes one or more frames each having a
first resolution, and the second sub-sequence includes one or more
frames each having a second resolution, and wherein the first
sub-sequence is different than the second sub-sequence, and the
first resolution is different than the second resolution.
Prediction processing unit 81 may also receive a first sequence
parameter set and a second sequence parameter set for the coded
video sequence, wherein the first sequence parameter set indicates
the first resolution of the one or more frames of the first
sub-sequence, and the second sequence parameter set indicates the
second resolution of the one or more frames of the second
sub-sequence, and wherein the first sequence parameter set is
different than the second sequence parameter set. Prediction
processing unit 81 may also use the first sequence parameter set
and the second sequence parameter set to decode the coded video
sequence.
[0126] As another example in accordance with the techniques of this
disclosure, prediction processing unit 81 may also receive a first
decoded frame of video data, wherein the first decoded frame is
associated with a first resolution. DPB management unit 93 may
determine whether DPB 92 is available to store the first decoded
frame based on the first resolution, and in the event the decoded
picture buffer is available to store the first decoded frame, store
the first decoded frame in DPB 92, and determine whether DPB 92
is available to store a second decoded frame of video data, wherein
the second decoded frame is associated with a second resolution,
based on the first resolution and the second resolution, wherein
the first decoded frame is different than the second decoded
frame.
[0127] In general, video decoder 30 may perform any of the
techniques of this disclosure. In some examples, video decoder 30
may perform some or all of the techniques described above with
respect to video encoder 20 in FIG. 2. In some examples, video
decoder 30 may perform the techniques described with respect to
FIG. 2 in a reciprocal ordering or manner to that described with
respect to video encoder 20.
[0128] FIGS. 4A-4D are conceptual diagrams that illustrate examples
of a coded bitstream including coded video data in accordance with
the techniques of this disclosure. As shown in FIG. 4A, a coded
bitstream 400 may comprise one or more coded video sequences
(CVSs), in particular, CVS 402 and CVS 404. As also shown in FIG.
4A, each of CVS 402 and CVS 404 may comprise one or more frames, or
"pictures," PIC_1 (0)-PIC_1 (N), and PIC_2 (0)-PIC_2 (M),
respectively. As still further shown in FIG. 4A, each of CVS 402
and CVS 404 may further comprise a single sequence parameter set
(SPS), in particular, SPS1 and SPS2, respectively. As described
above, each of SPS1 and SPS2 may define parameters for the
corresponding one of CVS 402 and CVS 404, including LCU size, SCU
size, and other syntax information for the respective CVS that is
common to all frames, or "pictures" within the CVS.
[0129] As shown in FIG. 4B, a particular CVS, CVS 406, may further
comprise one or more picture parameter sets (PPSs), in particular,
PPS1 and PPS2. As described above, each of PPS1 and PPS2 may define
parameters for CVS 406, including syntax information that indicates
picture resolution, that are common to one or more pictures within
CVS 406, but not to all pictures within CVS 406. For example,
syntax information included within each of PPS1 and PPS2, e.g.,
picture resolution syntax information, may apply to a sub-set of
the pictures included within CVS 406. As one example, PPS1 may
indicate picture resolution for PIC_1 (0)-PIC_1 (N), and PPS2 may
indicate picture resolution for PIC_2 (0)-PIC_2 (M). Accordingly,
CVS 406 may comprise pictures having different resolutions, wherein
picture resolution for a particular one or more pictures (e.g.,
PIC_1 (0)-PIC_1 (N)) within CVS 406 that share a common picture
resolution may be specified by a corresponding one of PPS1 and
PPS2.
[0130] In cases where pictures having different resolutions are
alternated within a CVS in a decoding order, e.g., in a
resolution-adaptive CVS, a PPS may have to be signaled prior to
each picture having a different picture resolution relative to a
previous picture in the decoding order, to indicate the picture
resolution for the currently decoded picture. Accordingly, in such
cases, multiple PPSs may need to be signaled throughout decoding
the CVS, which may increase coding overhead.
[0131] As described above, a PPS RBSP may include parameters that
can be referred to by coded slice NAL units of one or more coded
pictures. Each PPS RBSP is initially considered not active at a
start of a decoding process. At most, one PPS RBSP is
considered active at any given moment during the decoding process,
and activation of any particular PPS RBSP results in deactivation
of a previously-active PPS RBSP, if any.
[0132] When a PPS RBSP (with a particular value of the
pic_parameter_set_id syntax element) is not active, and is referred
to by a coded slice NAL unit (using the particular value of
pic_parameter_set_id), the PPS referred to by that
pic_parameter_set_id is activated. This PPS RBSP is referred to as
the "active PPS RBSP" until it is deactivated by the activation of
another PPS. Video encoder 20 or decoder 30 may require the PPS with
the referenced pic_parameter_set_id value to have been received
before activating that PPS.
[0133] As an example of the PPS activation process, a NAL unit may
refer to PPS1. Video encoder 20 or decoder 30 may activate PPS1
based on the reference to PPS1 in the NAL unit. PPS1 is the active
PPS RBSP. PPS1 remains the active PPS RBSP until a NAL unit
references PPS2, at which point video encoder 20 or decoder 30 may
activate PPS2. Once activated, PPS2 becomes the active PPS RBSP,
and PPS1 is no longer the active PPS RBSP.
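The activation behavior described in this example can be summarized in a short sketch. The following Python code is illustrative only; the class and method names (PpsActivator, receive_pps, on_slice) are hypothetical and are not drawn from any codec specification.

```python
class PpsActivator:
    """Tracks the active PPS RBSP per the rule described above:
    a slice referring to an inactive PPS activates it, which
    deactivates the previously active PPS, if any."""

    def __init__(self):
        self.available = {}   # pic_parameter_set_id -> parsed PPS content
        self.active_id = None

    def receive_pps(self, pps_id, content):
        # A PPS must be received before a slice may activate it.
        self.available[pps_id] = content

    def on_slice(self, pps_id):
        # A coded slice NAL unit referring to an inactive PPS activates
        # it; the previously active PPS, if any, is deactivated.
        if pps_id not in self.available:
            raise ValueError(f"PPS {pps_id} referenced before being received")
        if pps_id != self.active_id:
            self.active_id = pps_id
        return self.available[self.active_id]
```

Under this sketch, activating PPS2 while PPS1 is active simply replaces the active identifier, mirroring the deactivation rule described above.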
[0134] Any PPS NAL unit that has the same pic_parameter_set_id
value for the active PPS RBSP for a coded picture may have the same
content as that of the active PPS RBSP for the coded picture. That
is, if the pic_parameter_set_id of the PPS NAL is the same as that
of the active PPS RBSP, the content of the active PPS RBSP may not
change. There may be an exception to this rule, however. If a PPS
NAL has the same pic_parameter_set_id as the active PPS RBSP, and
the PPS NAL follows the last Video Coding Layer (VCL) NAL unit of
the coded picture, and precedes the first VCL NAL unit of another
coded picture, then the content of the active PPS RBSP may change
(e.g., the pic_parameter_set_id value may indicate a different set
of parameters).
[0135] In accordance with the techniques of this disclosure, as
shown in FIGS. 4C-4D, syntax information that indicates picture
resolution for one or more pictures within a CVS, wherein the CVS
comprises one or more pictures having different sizes, may be
indicated using multiple SPSs for the CVS, rather than using a
plurality of PPSs, as described above with reference to FIGS.
4A-4B.
[0136] A SPS RBSP may include parameters that can be referred to by
one or more PPS RBSPs, or one or more Supplemental Enhancement
Information (SEI) NAL units containing a buffering period SEI
message. Each SPS is initially considered not active at a start of
a decoding process. At most one SPS may be considered active for
each resolution sub-sequence (RSS) at any given moment during the
decoding process, and the activation of any particular SPS may
result in the deactivation of a previously-active SPS for the same
RSS, if any.
Also, if there are "n" resolution sub-sequences within the CVS, at
most "n" SPS RBSPs may be considered active for the entire CVS at
any given moment during the decoding process.
[0137] When an SPS RBSP (with a particular value of
seq_parameter_set_id) is not already active, and is referred to by
activation of a PPS RBSP (using the particular value of
seq_parameter_set_id), or is referred to by an SEI NAL unit
containing a buffering period SEI message (using the particular
value of seq_parameter_set_id), the SPS RBSP is activated. This SPS
RBSP may be referred to as an "active SPS RBSP" for the associated
RSS (the RSS in which the coded pictures refer to the active SPS
RBSP through the PPS RBSPs), until it is deactivated by an
activation of another SPS RBSP. Video encoder 20 or decoder 30 may
require the SPS RBSP with a particular value of
seq_parameter_set_id to be available to video encoder 20 or video
decoder 30 prior to the activation of that SPS. Additionally, the
SPS may remain active for the entire RSS in the CVS.
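The per-RSS activation rule above (at most one active SPS per RSS, hence at most "n" active SPSs for "n" RSSs) may be sketched as follows. All Python names here are illustrative assumptions, not specification syntax.

```python
class SpsActivatorPerRss:
    """Sketch of per-resolution-sub-sequence SPS activation: each RSS
    has at most one active SPS, and activating an SPS for an RSS
    deactivates only that RSS's previously active SPS."""

    def __init__(self):
        self.active_by_rss = {}   # rss identifier -> seq_parameter_set_id

    def activate(self, rss, sps_id):
        # Replaces the previously active SPS for this RSS, if any;
        # SPSs active for other RSSs are unaffected.
        self.active_by_rss[rss] = sps_id

    def active_count(self):
        # With n RSSs, at most n SPSs are active at once.
        return len(self.active_by_rss)
```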
[0138] Additionally, because an instantaneous decoder refresh (IDR)
access unit begins a new CVS, and an activated SPS RBSP may remain
active for the entire RSS in the CVS, an SPS RBSP may only be
activated by a buffering period SEI message when the buffering
period SEI message is part of an IDR access unit.
[0139] Any SPS NAL unit containing the particular value of
seq_parameter_set_id for the active SPS RBSP for an RSS in a CVS may
have the same content as that of the active SPS RBSP for the RSS in
the CVS, unless it follows a last access unit of the CVS, and
precedes the first VCL NAL unit and the first SEI NAL unit
containing a buffering period SEI message (when present) of another
CVS.
[0140] Also, if a PPS RBSP or an SPS RBSP is conveyed within the
bitstream, these constraints impose an order constraint on the NAL
units that contain the PPS RBSP or the SPS RBSP, respectively.
Otherwise, if a PPS RBSP or an SPS RBSP is conveyed by other means
not specified in this disclosure, it should be available to the
decoding process in a timely fashion such that these constraints
are obeyed.
[0141] The constraints that are expressed on the relationship
between the values of the syntax elements (and the values of
variables derived from those syntax elements) in SPS and PPS, and
other syntax elements, are typically expressions of constraints
that apply only to the active SPS and the active PPS. If any SPS
RBSP is present that is not activated in the bitstream, its syntax
elements usually have values that would conform to the specified
constraints if it were activated by reference in an otherwise
conforming bitstream. If any PPS RBSP is present that is not ever
activated in the bitstream, the syntax elements of the PPS RBSP may
have values that would conform to the specified constraints if the
PPS were activated by reference in an otherwise-conforming
bitstream.
[0142] During the decoding process, the values of parameters of the
active PPS and the active SPS may be considered to be in effect.
For interpretation of SEI messages, the values of the parameters of
the PPS and SPS that are active for the operation of the decoding
process for the VCL NAL units of the primary coded picture in the
same access unit may be considered in effect unless otherwise
specified in the SEI message semantics.
[0143] As one example, as shown in FIG. 4C, CVS 408 may include one
or more SPSs, in particular, SPS1 and SPS2, that each indicate
picture resolution for PIC_1 (0), PIC_1 (1), etc., and PIC_2 (0),
PIC_2 (1), etc., respectively. In other words, SPS1 indicates
picture resolution information for PIC_1 (0), PIC_1 (1), etc., and
SPS2 indicates picture resolution information for PIC_2 (0), PIC_2
(1), etc. In this example, CVS 408 may further comprise one or more
PPSs (not shown), wherein the one or more PPSs may specify syntax
information for one or more pictures of CVS 408, but wherein the
one or more PPSs do not include any syntax information that
indicates picture resolution for any of the one or more pictures of
CVS 408.
[0144] In this example, SPS1 and SPS2 may indicate picture
resolution information for all pictures within CVS 408, even in
cases where pictures having different resolutions are alternated
within a CVS in the decoding order. Accordingly, after indicating
picture resolution information for all pictures within CVS 408 using
SPS1 and SPS2, no additional indication of this information may be
needed.
[0145] As shown in FIG. 4C, the multiple SPSs, e.g., SPS1 and SPS2,
may be located at the beginning of the corresponding CVS, e.g., CVS
408, prior to any of PIC_1 (0), PIC_1 (1) and PIC_2 (0), PIC_2 (1).
As shown in FIG. 4D, alternatively, an SPS that indicates picture
resolution information for one or more pictures may be located
before a first one of such pictures in a decoding sequence. For
example, as shown in FIG. 4D, SPS2 is located within CVS 410 prior
to a first one of pictures PIC_2 (0), PIC_2 (1), etc., but after a
first one of PIC_1 (0), PIC_1 (1), etc.
[0146] FIG. 5 is a conceptual diagram illustrating the operation of
a decoded picture buffer of a hypothetical reference decoder (HRD)
model in accordance with the techniques of this disclosure. FIG. 5
includes coded picture buffer (CPB) 502, decoded picture buffer
(DPB) 504, and DPB management unit 506. DPB management unit 506 may
remove a picture from CPB 502. Video encoder 20 or decoder 30 may
decode the picture, and DPB management unit 506 may store the
decoded picture in DPB 504.
Based on various criteria, such as an output time, output flag, or
a picture count, DPB management unit 506 may remove a picture from
DPB 504. In some cases video encoder 20 or decoder 30 may output
the decoded picture. CPB 502 may contain encoded pictures that are
removed so that video encoder 20 or decoder 30 may utilize the
decoded pictures that may be needed for inter-prediction as a
reference, or for future output. In general, DPB 504 may have a
maximum capacity. In previous video coding standards, this capacity
may be expressed as a maximum number of frames that can be stored in
the DPB. However, to support adaptive-resolution CVSs, DPB
management unit 506 may instead maintain a count of blocks contained
within the DPB to measure the "fullness" of the DPB.
[0147] This disclosure describes the removal techniques of decoded
pictures in the DPB from at least two perspectives. In the first
perspective, DPB management unit 506 of video decoder 30 may remove
decoded pictures based on an output time if the pictures are
intended for output. In the second perspective, DPB management unit
506 may remove decoded pictures based on picture order count (POC)
values if the pictures are intended for output. In either
perspective, DPB management unit 506 may remove decoded pictures
that are not needed for output (i.e., outputted already or not
intended for output) when the decoded picture is not in the
reference picture set, and prior to decoding the current picture.
Although described with respect to video decoder 30, video encoder
20 and DPB management unit 506 of video encoder 20 may also perform
any of the DPB management techniques described in this
disclosure.
[0148] DPB 504 may include a plurality of buffers, and each buffer
may store a decoded picture that is to be used as a reference
picture or is held for future output. Initially, the DPB is empty
(i.e., the DPB fullness is set to zero). In the described example
techniques, the removal of the decoded pictures from the DPB may
occur before the decoding of the current picture, but after video
decoder 30 parses the slice header of the first slice of the
current picture.
[0149] In the first perspective, the following techniques may occur
instantaneously at time t_r(n), in the following sequence. In this
example, t_r(n) is the CPB removal time (i.e., decoding time) of the
access unit n containing the current picture. As described in this
disclosure, "occurring instantaneously" means that, in the HRD
model, decoding of a picture is assumed to be instantaneous, i.e.,
the time period for decoding a picture is equal to zero.
[0150] In the first perspective, decoder 30 may invoke the
derivation process for a reference picture set. If the current
picture, which DPB management unit 506 may retrieve from CPB 502, is
an IDR picture, DPB management unit 506 may remove all decoded
pictures from DPB 504 and may set the DPB fullness to 0. If the
decoded picture is not an IDR picture, DPB management unit 506 may
remove all pictures not included in the reference picture set of the
current picture from DPB 504. DPB management unit 506 may also
remove all pictures having an OutputFlag value equal to "0", or
having a DPB output time less than or equal to the CPB removal time
of the current picture "n" (i.e., t_o,dpb(m) <= t_r(n)). The
OutputFlag may indicate that video decoder 30 should output the
picture (e.g., for display, or for transmission in the case of an
encoder).
[0151] Whenever DPB management unit 506 removes a picture from DPB
504, DPB management unit 506 may decrement the fullness of DPB 504
by the number of 8×8 blocks in the picture, i.e.,
(pic_width_in_luma_samples*pic_height_in_luma_samples)>>6.
[0152] After DPB management unit 506 has removed any pictures from
the DPB, video decoder 30 may decode and store the received picture
"n" in the DPB. DPB management unit 506 may increment the DPB
fullness by the number of 8×8 blocks in the stored decoded picture,
i.e., (pic_width_in_luma_samples*pic_height_in_luma_samples)>>6.
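The fullness arithmetic in paragraphs [0151]-[0152] can be illustrated with a short sketch. The function and class names below are hypothetical; only the (width*height)>>6 block-count formula comes from the text above.

```python
def blocks_8x8(pic_width_in_luma_samples, pic_height_in_luma_samples):
    # Number of 8x8 luma blocks in a picture:
    # (width * height) >> 6, i.e., integer division by 64.
    return (pic_width_in_luma_samples * pic_height_in_luma_samples) >> 6

class DpbFullness:
    """Toy DPB fullness counter measured in 8x8-block units, so that
    pictures of different resolutions contribute proportionally."""

    def __init__(self):
        self.fullness = 0

    def store(self, width, height):
        # Incremented when a decoded picture is stored in the DPB.
        self.fullness += blocks_8x8(width, height)

    def remove(self, width, height):
        # Decremented when a picture is removed from the DPB.
        self.fullness -= blocks_8x8(width, height)
```

For example, a 1920x1080 picture occupies (1920*1080)>>6 = 32400 blocks, while a 960x540 picture occupies 8100, so the same DPB capacity admits different numbers of pictures at each resolution.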
[0153] Each picture may also have an OutputFlag, as described
above. When the picture has an OutputFlag value equal to 1, the DPB
output time of the picture, denoted as t_o,dpb(n), may be derived by
the following equation.
t_o,dpb(n) = t_r(n) + t_c * dpb_output_delay(n)
[0154] In the equation, dpb_output_delay(n) may be the value of
dpb_output_delay specified in the picture timing SEI message
associated with access unit "n."
[0155] If the OutputFlag of a picture is equal to "1" and
t_o,dpb(n) = t_r(n), video decoder 30 may output the current
picture. Otherwise, if the value of OutputFlag is equal to 0, video
decoder 30 may not output the current picture. Otherwise (i.e., if
OutputFlag is equal to 1 and t_o,dpb(n) > t_r(n)), video decoder 30
may output the current picture later, at time t_o,dpb(n).
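The output-time derivation and the three-way output rule of paragraphs [0153]-[0155] can be sketched as follows. The function names and the string return values are illustrative assumptions.

```python
def dpb_output_time(t_r, t_c, dpb_output_delay):
    # t_o,dpb(n) = t_r(n) + t_c * dpb_output_delay(n)
    return t_r + t_c * dpb_output_delay

def output_decision(output_flag, t_r, t_o_dpb):
    """Three-way rule: output immediately when the DPB output time
    equals the CPB removal time, never output when OutputFlag is 0,
    otherwise output later at t_o,dpb(n)."""
    if output_flag == 0:
        return "not output"
    if t_o_dpb == t_r:
        return "output now"
    return "output later"
```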
[0156] As described above, in some examples, video decoder 30 may
crop the picture in the decoded picture buffer. Video decoder 30
may utilize the cropping rectangle specified in the active sequence
parameter set for the picture to determine the cropping
rectangle.
[0157] In some examples, video decoder 30 may determine a
difference between the DPB output time for a picture and the DPB
output time for a picture following the picture in output order.
When picture "n" is a picture that is output and is not the last
picture of the bitstream that is output, the value Δt_o,dpb(n) may
be defined according to the following equation.
Δt_o,dpb(n) = t_o,dpb(n_n) - t_o,dpb(n)
[0158] In the preceding equation, n_n may denote the picture that
follows picture "n" in output order and has OutputFlag equal to
1.
[0159] In the second perspective for removing decoded pictures, the
HRD may implement the techniques instantaneously when DPB
management unit 506 removes an access unit from CPB 502. Again,
video decoder 30 and DPB management unit 506 of video decoder 30
may implement the removing of decoded pictures from DPB 504, and
video decoder 30 may not necessarily include CPB 502. In some
examples, video decoder 30 and video encoder 20 may not require CPB
502. Rather, CPB 502 is described as part of the HRD model for
purposes of illustration only.
[0160] As above, in the second perspective for removing decoded
pictures, DPB management unit 506 may remove the pictures from the
DPB before the decoding of the current picture, but after parsing
the slice header of the first slice of the current picture. Also,
similar to the first perspective for removing decoded pictures, in
the second perspective, video decoder 30 and DPB management unit
506 may perform similar functions to those described above with
respect to the first perspective when the current picture is an IDR
picture.
[0161] Otherwise, if the current picture is not an IDR picture, DPB
management unit 506 may empty, without output, buffers of the DPB
that store a picture that is marked as "not needed for output" and
that store pictures not included in the reference picture set of
the current picture. DPB management unit 506 may also decrement the
DPB fullness by the number of buffers that DPB management unit 506
emptied. When there is no empty buffer (i.e., the DPB fullness is
equal to the DPB size), DPB management unit 506 may implement a
"bumping" process described below. In some examples, when there is
no empty buffer, DPB management unit 506 may implement the bumping
process repeatedly until there is an empty buffer in which video
decoder 30 can store the current decoded picture.
[0162] In general, video decoder 30 may implement the following
steps to implement the bumping process. Video decoder 30 may first
determine the picture to be outputted. For example, video decoder
30 may select the picture having the smallest PicOrderCnt (POC)
value of all the pictures in DPB 504 that are marked as "needed for
output." Video decoder 30 may crop the selected picture using the
cropping rectangle specified in the active sequence parameter set
for the picture. Video decoder 30 may output the cropped picture,
and may mark the picture as "not needed for output." Video decoder
30 may check the buffer of DPB 504 that stored the cropped and
outputted picture. If the picture is not included in the reference
picture set, DPB management unit 506 may empty that buffer and may
decrement the DPB fullness by the number of 8×8 blocks in the
removed picture.
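The bumping steps in paragraph [0162] can be sketched in code. The representation below is a loud assumption: each DPB entry is modeled as a dict with hypothetical keys 'poc', 'needed_for_output', and 'blocks' (picture size in 8×8-block units), and cropping is omitted.

```python
def bump(dpb, reference_picture_set):
    """One iteration of the bumping process sketched above:
    1. pick the smallest-POC picture marked "needed for output",
    2. (crop and) output it, marking it "not needed for output",
    3. empty its buffer only if it is not in the reference picture set.
    Returns (output POC, number of 8x8 blocks freed)."""
    candidates = [p for p in dpb if p["needed_for_output"]]
    pic = min(candidates, key=lambda p: p["poc"])
    pic["needed_for_output"] = False   # output step (cropping omitted)
    if pic["poc"] not in reference_picture_set:
        dpb.remove(pic)                # empty the buffer, free its blocks
        return pic["poc"], pic["blocks"]
    return pic["poc"], 0               # kept as a reference picture
```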
[0163] Although the above techniques for DPB management are
described in the context of video decoder 30 and DPB management
unit 65, in some examples, video encoder 20 and DPB management
unit 93 may implement similar techniques. However, video encoder 20
implementing similar techniques is not required in every example.
In some examples, video decoder 30 may implement these techniques,
and video encoder 20 may not implement these techniques.
[0164] In this manner, a video coder (e.g., video encoder 20 or
video decoder 30) may implement techniques to support CVSs having
adaptive resolution. Again, the reference picture set may identify
the reference pictures that can potentially be used for
inter-predicting the current picture and can potentially be used
for inter-predicting one or more pictures following the current
picture in decoding order.
[0165] In the above examples, the DPB size or fullness may be
signaled with respect to the number of 8×8 blocks of a picture
stored in the DPB. Alternatively, the fullness of the DPB, i.e., the
max_dec_pic_buffering syntax element, may be signaled based on the
number of smallest coding units (SCUs) of a picture. For example, if
the smallest SCU size among all active SPSs is 16×16, then the unit
of max_dec_pic_buffering may be 16×16 blocks.
[0166] As still another example, video encoder 20 or decoder 30 may
signal the DPB size, indicated by the max_dec_pic_buffering syntax
element, using units of frame buffers that are specific to the
spatial resolution indicated by the SPS. For example, if there are
two RSSs, rss1 and rss2, with resolution res1 and resolution res2,
referring to SPS sps1 and SPS sps2 respectively, wherein res1 is
greater than res2, then max_dec_pic_buffering in sps1 is counted in
frame buffers of res1, and max_dec_pic_buffering in sps2 is counted
in frame buffers of res2. In this example, video encoder 20 or
decoder 30 may be subject to the restriction that the DPB size
indicated by the max_dec_pic_buffering value in sps1, if counted in
units of 8×8 blocks, may not be less than that indicated by the
max_dec_pic_buffering value in sps2. Consequently, in the DPB
operations, when video decoder 30 removes one frame buffer of res1
from DPB 504, the freed buffer space may be sufficient for
insertion of a decoded picture of either resolution. However, when
decoder 30 removes one frame buffer of res2 from DPB 504, the freed
buffer space may not be sufficient for insertion of a decoded
picture of res1. Rather, video decoder 30 may remove multiple frame
buffers of res2 from DPB 504 in this case.
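The cross-SPS restriction in paragraph [0166] amounts to comparing the two DPB sizes after converting both to a common 8×8-block unit. The sketch below assumes a hypothetical SPS representation as a dict with keys 'width', 'height', and 'max_dec_pic_buffering' (counted in frame buffers of that SPS's own resolution).

```python
def dpb_capacity_ok(sps_high_res, sps_low_res):
    """Check the restriction sketched above: the DPB size implied by
    the higher-resolution SPS, converted to 8x8-block units, must not
    be less than that implied by the lower-resolution SPS."""
    def size_in_blocks(sps):
        per_frame = (sps["width"] * sps["height"]) >> 6
        return sps["max_dec_pic_buffering"] * per_frame
    return size_in_blocks(sps_high_res) >= size_in_blocks(sps_low_res)
```

Under this sketch, freeing one res1 frame buffer always leaves room for a picture of either resolution, whereas freeing one res2 buffer may not leave room for a res1 picture, consistent with the multi-buffer removal described above.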
[0167] Video decoder 30 may derive the reference picture set in
any manner, including the example techniques described above. Video
decoder 30 may determine whether a decoded picture stored in the
decoded picture buffer is not needed for output and is not
identified in the reference picture set. When video decoder 30 has
outputted the decoded picture and the decoded picture is not
identified in the reference picture set, video decoder 30 may
remove the decoded picture from the decoded picture buffer.
Subsequent to removing the decoded picture, video decoder 30 may
code the current picture. For example, video decoder 30 may
construct the reference picture list(s) as described above, and
code the current picture based on the reference picture
list(s).
[0168] FIG. 6 is a flowchart illustrating an example operation of
using a first sub-sequence and a second sub-sequence to decode
video in accordance with the techniques of this disclosure. For
purposes of illustration only, the method of FIG. 6 may be
performed by a video coder corresponding to either video encoder 20
or video decoder 30. In the method of FIG. 6, the video coder may
process a coded video sequence comprising a first sub-sequence and
a second sub-sequence (601). The first sub-sequence may include one
or more frames each having a first resolution, and the second
sub-sequence may include one or more frames each having a second
resolution. The first sub-sequence may be different than the second
sub-sequence, and the first resolution may be different than the
second resolution.
[0169] The video coder (e.g., video encoder 20 or video decoder 30)
may also process a first sequence parameter set (SPS) and a second
sequence parameter set for the coded video sequence (602). The
first sequence parameter set may indicate the first resolution of
the one or more frames of the first sub-sequence, and the second
sequence parameter set may indicate the second resolution of the
one or more frames of the second sub-sequence. The first sequence
parameter set may also be different than the second sequence
parameter set. The video coder (e.g., video encoder 20 or video
decoder 30) may use the first sequence parameter set and the second
sequence parameter set to code the coded video sequence (603).
[0170] In some examples, the video coder may comprise an encoder,
e.g., encoder 20 of FIGS. 1-2, or a decoder, e.g., decoder 30 of
FIGS. 1-2. In the case where the video coder
comprises a decoder, processing SPSs and sub-sequences may comprise
receiving the SPSs and sub-sequences. In this case, coding the
first and second video sequences may comprise decoding the first
and second video sequences.
[0171] In the case where the video coder comprises an encoder,
processing SPSs and sub-sequences may comprise generating the SPSs
and sub-sequences. In this case, coding the first and second video
sequences may comprise encoding the first and second video
sequences. Additionally in the case where the video coder comprises
an encoder, the video encoder may transmit the coded video sequence
comprising the first sub-sequence and the second sub-sequence
instead of receiving the video sequence comprising the first and
second sub-sequence. In some examples, the first resolution and the
second resolution may each comprise a spatial resolution.
[0172] In some examples, the video coder may code the first
sequence parameter set and the second sequence parameter in a
received bitstream prior to either the first sub-sequence or the
second sub-sequence.
[0173] In another example, to receive the first sequence parameter
set and the second sequence parameter set of the coded video
sequence, the video coder may be configured to receive both the
first sequence parameter set and the second sequence parameter set
prior to receiving either of the first sub-sequence and the second
sub-sequence.
[0174] In another example, the video coder may code the first
sequence parameter set in a received bitstream prior to the first
sub-sequence, and may code the second sequence parameter set in the
received bitstream after at least one frame of the one or more
frames of the first sub-sequence, and prior to the second
sub-sequence.
[0175] In another example, to receive the first sequence parameter
set and the second sequence parameter set of the coded video
sequence, the video coder may be configured to receive the second
sequence parameter set after receiving at least one frame of the
one or more frames of the first sub-sequence, and prior to
receiving the second sub-sequence.
[0176] In yet another example, the video coder may interleave the
one or more frames of the first sub-sequence and the one or more
frames of the second sub-sequence in the coded video sequence.
[0177] FIG. 7 is a flowchart illustrating an example operation of
managing a decoded picture buffer. For purposes of illustration
only, the method of FIG. 7 may be performed by a video coder
corresponding to either video encoder 20 or video decoder 30. In
the method of FIG. 7, a video coder may receive a coded video
sequence comprising a first sub-sequence and a second sub-sequence
(701). The first sub-sequence may include one or more frames each
having a first resolution, and the second sub-sequence may include
one or more frames each having a second resolution. The first
sub-sequence may be different than the second sub-sequence, and the
first resolution may be different than the second resolution. The
video coder may receive a first decoded frame of video data, and
the first decoded frame may be associated with a first resolution.
In some examples, the resolution may comprise a spatial
resolution.
[0178] In accordance with the method illustrated in FIG. 7, the
video coder may also determine whether a decoded picture buffer is
available to store the first decoded frame based on the first
resolution (702). In the event the decoded picture buffer is
available to store the first decoded frame, the video coder may
store the first decoded frame in the decoded picture buffer, and
determine whether the decoded picture buffer is available to store
a second decoded frame of video data. The second decoded frame of
video data may be associated with a second resolution. The video
coder may also determine whether the decoded picture buffer is
available to store the second decoded frame based on the first
resolution and the second resolution (704). The first decoded frame
may also be different than the second decoded frame.
[0179] In some examples, to determine whether the decoded picture
buffer is available to store the first decoded frame based on the
first resolution, the video coder may be configured to determine an
amount of information that may be stored within the decoded picture
buffer, determine an amount of information associated with the
first decoded frame based on the first resolution, and compare the
amount of information that may be stored within the decoded picture
buffer and the amount of information associated with the first
decoded frame.
[0180] In an example, to determine whether the decoded picture
buffer is available to store the second decoded frame based on the
first resolution and the second resolution, the video coder may be
configured to determine an amount of information that may be stored
within the decoded picture buffer based on the first resolution,
determine an amount of information associated with the second
decoded frame based on the second resolution, and compare the
amount of information that may be stored within the decoded picture
buffer and the amount of information associated with the second
decoded frame.
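The availability test described for FIG. 7 reduces to comparing the remaining DPB space against the size of the frame to be stored, both in a common unit. The following sketch assumes 8×8-block units as in the earlier examples; all parameter names are illustrative.

```python
def can_store(dpb_capacity_blocks, current_fullness_blocks, width, height):
    """Determine whether the DPB has room for a decoded frame of the
    given resolution: the amount of information associated with the
    frame ((width*height)>>6 blocks) must fit in the remaining space."""
    needed = (width * height) >> 6
    return current_fullness_blocks + needed <= dpb_capacity_blocks
```

Because the test depends on the stored frames' resolutions (through the current fullness) and the incoming frame's resolution, it naturally handles a second decoded frame whose resolution differs from the first.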
[0181] In some examples, the video coder may be further configured
to remove the first decoded frame from the decoded picture buffer.
The video coder may also be an encoder, e.g., encoder 20 of FIGS.
1-2, or a decoder, e.g., decoder 30 of FIGS. 1-2, in some
examples.
[0182] By way of example, and not limitation, such
computer-readable storage media can comprise RAM, ROM, EEPROM,
CD-ROM or other optical disk storage, magnetic disk storage, or
other magnetic storage devices, flash memory, or any other medium
that can be used to store desired program code in the form of
instructions or data structures and that can be accessed by a
computer. Also, any connection is properly termed a
computer-readable medium. For example, if instructions are
transmitted from a website, server, or other remote source using a
coaxial cable, fiber optic cable, twisted pair, digital subscriber
line (DSL), or wireless technologies such as infrared, radio, and
microwave, then the coaxial cable, fiber optic cable, twisted pair,
DSL, or wireless technologies such as infrared, radio, and
microwave are included in the definition of medium. It should be
understood, however, that computer-readable storage media and data
storage media do not include connections, carrier waves, signals,
or other transient media, but are instead directed to
non-transient, tangible storage media. Disk and disc, as used
herein, includes compact disc (CD), laser disc, optical disc,
digital versatile disc (DVD), floppy disk and Blu-ray disc, where
disks usually reproduce data magnetically, while discs reproduce
data optically with lasers. Combinations of the above should also
be included within the scope of computer-readable media.
[0183] Instructions may be executed by one or more processors, such
as one or more digital signal processors (DSPs), general purpose
microprocessors, application specific integrated circuits (ASICs),
field programmable logic arrays (FPGAs), or other equivalent
integrated or discrete logic circuitry. Accordingly, the term
"processor," as used herein may refer to any of the foregoing
structure or any other structure suitable for implementation of the
techniques described herein. In addition, in some aspects, the
functionality described herein may be provided within dedicated
hardware and/or software modules configured for encoding and
decoding, or incorporated in a combined codec. Also, the techniques
could be fully implemented in one or more circuits or logic
elements.
[0184] The techniques of this disclosure may be implemented in a
wide variety of devices or apparatuses, including a wireless
handset, an integrated circuit (IC) or a set of ICs (e.g., a chip
set). Various components, modules, or units are described in this
disclosure to emphasize functional aspects of devices configured to
perform the disclosed techniques, but do not necessarily require
realization by different hardware units. Rather, as described
above, various units may be combined in a codec hardware unit or
provided by a collection of interoperative hardware units,
including one or more processors as described above, in conjunction
with suitable software and/or firmware.
[0185] Various examples have been described. These and other
examples are within the scope of the following claims.
* * * * *