United States Patent Application 20120213286
Kind Code: A1
Wu; Yongjun; et al.
August 23, 2012

LOCAL PICTURE IDENTIFIER AND COMPUTATION OF CO-LOCATED INFORMATION
Abstract
Video decoding innovations for using local picture identifiers
and computing co-located information are described. In one aspect,
a decoder identifies reference pictures in a reference picture list
of a temporal direct prediction mode macroblock that match
reference pictures used by a co-located macroblock using local
picture identifiers. In another aspect, a decoder determines
whether reference pictures used by blocks are the same by comparing
local picture identifiers during calculation of boundary strength.
In yet another aspect, a decoder determines a picture type of a
picture and based on the picture type selectively skips or
simplifies computation of co-located information for use in
reconstructing direct prediction mode macroblocks outside the
picture.
Inventors: Wu; Yongjun (Bellevue, WA); Thumpudi; Naveen (Redmond,
WA); Gan; Kim-chyan (Sammamish, WA)
Assignee: Microsoft Corporation, Redmond, WA
Family ID: 42397706
Appl. No.: 13/459809
Filed: April 30, 2012
Related U.S. Patent Documents

Application Number    Filing Date    Patent Number
12/364,325            Feb 2, 2009    8189666
13/459809 (present application)
Current U.S. Class: 375/240.16; 375/240.12; 375/E7.248; 375/E7.255;
375/E7.265
Current CPC Class: H04N 19/46 20141101; H04N 19/176 20141101; H04N
19/44 20141101; H04N 19/70 20141101; H04N 19/159 20141101; H04N
19/102 20141101; H04N 19/513 20141101; H04N 19/61 20141101
Class at Publication: 375/240.16; 375/240.12; 375/E07.248;
375/E07.255; 375/E07.265
International Class: H04N 7/36 20060101 H04N007/36; H04N 7/34
20060101 H04N007/34; H04N 7/32 20060101 H04N007/32
Claims
1-9. (canceled)
10. A computer-implemented method for transforming encoded video
information using a video decoder, the method comprising: receiving
encoded video information in a bitstream; performing loop filtering
during decoding of the encoded video information, comprising:
calculating boundary strength values for plural blocks, wherein the
calculating comprises determining whether reference pictures used
by the plural blocks are the same by comparing local picture
identifiers of the reference pictures, wherein the local picture
identifiers are assigned to picture structures when allocated, and
wherein the decoder reuses the local picture identifiers during the
decoding based on availability of the local picture identifiers;
and outputting the filtered macroblock.
11. The method of claim 10 wherein the local picture identifiers
are 8-bit local picture identifiers, and wherein the decoder sets
the local picture identifiers independent of picture order
count.
12. The method of claim 10 wherein the local picture identifiers
are 5-bit local picture identifiers.
13. The method of claim 10 wherein the local picture identifiers
are greater than or equal to 5 bits and less than or equal to 32
bits, and wherein the decoder selectively reuses the local
picture identifiers during decoding based on which of the local
picture identifiers are in use, thereby controlling bit depth of
the local picture identifiers and speeding up the determination of
whether reference pictures used by the plural blocks are the same
during the loop filtering.
14. The method of claim 10 wherein the encoded video information is
H.264 encoded video information.
15-20. (canceled)
21. The method of claim 10 wherein the local picture identifiers
are 8-bit local picture identifiers, and wherein an invalid picture
identifier is assigned an 8-bit value of 255.
22. The method of claim 10 wherein the local picture identifiers
are 32-bit local picture identifiers.
23. The method of claim 10 wherein the loop filtering is performed
as part of deblock filtering to smooth reconstructed sample data
across block boundaries.
24. A computing device implementing a video decoder, the computing
device comprising: a processing unit; and memory; wherein the
computing device is configured to perform operations for decoding
video, the operations comprising: receiving encoded video
information in a bitstream; performing loop filtering during
decoding of the encoded video information, comprising: calculating
boundary strength values for plural blocks, wherein the calculating
comprises determining whether reference pictures used by the plural
blocks are the same by comparing local picture identifiers of the
reference pictures, wherein the local picture identifiers are
assigned to picture structures when allocated, and wherein the
decoder reuses the local picture identifiers during the decoding
based on availability of the local picture identifiers; and
outputting the filtered macroblock.
25. The computing device of claim 24 wherein the local picture
identifiers are 8-bit local picture identifiers, and wherein the
decoder sets the local picture identifiers independent of picture
order count.
26. The computing device of claim 24 wherein the local picture
identifiers are 5-bit local picture identifiers.
27. The computing device of claim 24 wherein the local picture
identifiers are greater than or equal to 5 bits and less than or
equal to 32 bits, and wherein the decoder selectively reuses the
local picture identifiers during decoding based on which of the
local picture identifiers are in use, thereby controlling bit depth
of the local picture identifiers and speeding up the determination
of whether reference pictures used by the plural blocks are the
same during the loop filtering.
28. The computing device of claim 24 wherein the encoded video
information is H.264 encoded video information.
29. The computing device of claim 24 wherein the local picture
identifiers are 8-bit local picture identifiers, and wherein an
invalid picture identifier is assigned an 8-bit value of 255.
30. The computing device of claim 24 wherein the local picture
identifiers are 32-bit local picture identifiers.
31. The computing device of claim 24 wherein the loop filtering is
performed as part of deblock filtering to smooth reconstructed
sample data across block boundaries.
32. A computer-readable storage medium storing computer-executable
instructions for causing a computing device programmed thereby to
perform a method for decoding encoded video information, the method
comprising: receiving encoded video information in a bitstream;
performing loop filtering during decoding of the encoded video
information, comprising: calculating boundary strength values for
plural blocks, wherein the calculating comprises determining
whether reference pictures used by the plural blocks are the same
by comparing local picture identifiers of the reference pictures,
wherein the local picture identifiers are assigned to picture
structures when allocated, and wherein the decoder reuses the local
picture identifiers during the decoding based on availability of
the local picture identifiers; and outputting the filtered
macroblock.
33. The computer-readable storage medium of claim 32 wherein the
local picture identifiers are one of 5-bit and 8-bit local picture
identifiers, and wherein the decoder sets the local picture
identifiers independent of picture order count.
34. The computer-readable storage medium of claim 32 wherein the
local picture identifiers are greater than or equal to 5 bits and
less than or equal to 32 bits, and wherein the decoder selectively
reuses the local picture identifiers during decoding based on which
of the local picture identifiers are in use, thereby controlling
bit depth of the local picture identifiers and speeding up the
determination of whether reference pictures used by the plural
blocks are the same during the loop filtering.
35. The computer-readable storage medium of claim 32 wherein the
encoded video information is H.264 encoded video information.
Description
RELATED APPLICATION INFORMATION
[0001] The present application is a divisional of U.S. patent
application Ser. No. 12/364,325, entitled "LOCAL PICTURE IDENTIFIER
AND COMPUTATION OF CO-LOCATED INFORMATION," filed Feb. 2, 2009, the
disclosure of which is hereby incorporated by reference.
BACKGROUND
[0002] Companies and consumers increasingly depend on computers to
process, distribute, and play back high quality video content.
Engineers use compression (also called source coding or source
encoding) to reduce the bit rate of digital video. Compression
decreases the cost of storing and transmitting video information by
converting the information into a lower bit rate form.
Decompression (also called decoding) reconstructs a version of the
original information from the compressed form. A "codec" is an
encoder/decoder system.
[0003] Compression can be lossless, in which the quality of the
video does not suffer, but decreases in bit rate are limited by the
inherent amount of variability (sometimes called source entropy) of
the input video data. Or, compression can be lossy, in which the
quality of the video suffers, and the lost quality cannot be
completely recovered, but achievable decreases in bit rate are more
dramatic. Lossy compression is often used in conjunction with
lossless compression--lossy compression establishes an
approximation of information, and the lossless compression is
applied to represent the approximation.
[0004] A basic goal of lossy compression is to provide good
rate-distortion performance. So, for a particular bit rate, an
encoder attempts to provide the highest quality of video. Or, for a
particular level of quality/fidelity to the original video, an
encoder attempts to provide the lowest bit rate encoded video. In
practice, considerations such as encoding time, encoding
complexity, encoding resources, decoding time, decoding complexity,
decoding resources, overall delay, and/or smoothness in quality/bit
rate changes also affect decisions made in codec design as well as
decisions made during actual encoding.
[0005] In general, video compression techniques include
"intra-picture" compression and "inter-picture" compression.
Intra-picture compression techniques compress a picture with
reference to information within the picture, and inter-picture
compression techniques compress a picture with reference to a
preceding and/or following picture (often called a reference or
anchor picture) or pictures.
[0006] For intra-picture compression, for example, an encoder
splits a picture into 8x8 blocks of samples, where a sample is a
number that represents the intensity of brightness or the intensity
of a color component for a small, elementary region of the picture,
and the samples of the picture are organized as arrays or planes.
The encoder applies a frequency transform to individual blocks. The
frequency transform converts an 8x8 block of samples into an 8x8
block of transform coefficients. The encoder quantizes the
transform coefficients, which may result in lossy compression. For
lossless compression, the encoder entropy codes the quantized
transform coefficients.
[0007] Inter-picture compression techniques often use motion
estimation and motion compensation to reduce bit rate by exploiting
temporal redundancy in a video sequence. Motion estimation is a
process for estimating motion between pictures. For example, for an
8x8 block of samples or other unit of the current picture,
the encoder attempts to find a match of the same size in a search
area in another picture, the reference picture. Within the search
area, the encoder compares the current unit to various candidates
in order to find a candidate that is a good match. When the encoder
finds an exact or "close enough" match, the encoder parameterizes
the change in position between the current and candidate units as
motion data (such as a motion vector ("MV")). In general, motion
compensation is a process of reconstructing pictures from reference
picture(s) using motion data.
[0008] The example encoder also computes the sample-by-sample
difference between the original current unit and its
motion-compensated prediction to determine a residual (also called
a prediction residual or error signal). The encoder then applies a
frequency transform to the residual, resulting in transform
coefficients. The encoder quantizes the transform coefficients and
entropy codes the quantized transform coefficients.
[0009] If an intra-compressed picture or motion-predicted picture
is used as a reference picture for subsequent motion compensation,
the encoder reconstructs the picture. A decoder also reconstructs
pictures during decoding, and it uses some of the reconstructed
pictures as reference pictures in motion compensation. For example,
for an 8x8 block of samples of an intra-compressed picture, an
example decoder reconstructs a block of quantized transform
coefficients. The example decoder and encoder perform inverse
quantization and an inverse frequency transform to produce a
reconstructed version of the original 8x8 block of samples.
[0010] As another example, the example decoder or encoder
reconstructs an 8x8 block from a prediction residual for the block.
The decoder decodes entropy-coded information representing the
prediction residual. The decoder/encoder inverse quantizes and
inverse frequency transforms the data, resulting in a reconstructed
residual. In a separate motion compensation path, the
decoder/encoder computes an 8x8 predicted block using motion vector
information for displacement from a reference picture. The
decoder/encoder then combines the predicted block with the
reconstructed residual to form the reconstructed 8x8 block.
I. Video Codec Standards.
[0011] Over the last two decades, various video coding and decoding
standards have been adopted, including the H.261, H.262 (MPEG-2)
and H.263 series of standards and the MPEG-1 and MPEG-4 series of
standards. More recently, the H.264 standard (sometimes referred to
as H.264/AVC) and VC-1 standard have been adopted. For additional
details, see representative versions of the respective
standards.
[0012] Such a standard typically defines options for the syntax of
an encoded video bit stream according to the standard, detailing
the parameters that must be in the bit stream for a video sequence,
picture, block, etc. when particular features are used in encoding
and decoding. The standards also define how a decoder conforming to
the standard should interpret the bit stream parameters--the bit
stream semantics. In many cases, the standards provide details of
the decoding operations the decoder should perform to achieve
correct results. Often, however, the low-level implementation
details of the operations are not specified, or the decoder is able
to vary certain implementation details to improve performance, so
long as the correct decoding results are still achieved.
[0013] During development of a standard, engineers may concurrently
generate reference software, sometimes called verification model
software or JM software, to demonstrate rate-distortion performance
advantages of the various features of the standard. Typical
reference software provides a "proof of concept" implementation
that is not algorithmically optimized or optimized for a particular
hardware platform. Moreover, typical reference software does not
address multithreading implementation decisions, instead assuming a
single threaded implementation for the sake of simplicity.
II. Acceleration of Video Decoding and Encoding.
[0014] While some video decoding and encoding operations are
relatively simple, others are computationally complex. For example,
inverse frequency transforms, fractional sample interpolation
operations for motion compensation, in-loop deblock filtering,
post-processing filtering, color conversion, and video re-sizing
can require extensive computation. This computational complexity
can be problematic in various scenarios, such as decoding of
high-quality, high-bit rate video (e.g., compressed high-definition
video). In particular, decoding tasks according to more recent
standards such as H.264 and VC-1 can be computationally intensive
and consume significant memory resources.
[0015] Some decoders use video acceleration to offload selected
computationally intensive operations to a graphics processor. For
example, in some configurations, a computer system includes a
primary central processing unit ("CPU") as well as a graphics
processing unit ("GPU") or other hardware specially adapted for
graphics processing. A decoder uses the primary CPU as a host to
control overall decoding and uses the GPU to perform simple
operations that collectively require extensive computation,
accomplishing video acceleration.
[0016] In a typical software architecture for video acceleration
during video decoding, a video decoder controls overall decoding
and performs some decoding operations using a host CPU. The decoder
signals control information (e.g., picture parameters, macroblock
parameters) and other information to a device driver for a video
accelerator (e.g., with GPU) across an acceleration interface.
[0017] The acceleration interface is exposed to the decoder as an
application programming interface ("API"). The device driver
associated with the video accelerator is exposed through a device
driver interface ("DDI"). In an example interaction, the decoder
fills a buffer with instructions and information then calls a
method of an interface to alert the device driver through the
operating system. The buffered instructions and information, opaque
to the operating system, are passed to the device driver by
reference, and video information is transferred to GPU memory if
appropriate. While a particular implementation of the API and DDI
may be tailored to a particular operating system or platform, in
some cases, the API and/or DDI can be implemented for multiple
different operating systems or platforms.
[0018] In some cases, the data structures and protocol used to
parameterize acceleration information are conceptually separate
from the mechanisms used to convey the information. In order to
impose consistency in the format, organization and timing of the
information passed between the decoder and device driver, an
interface specification can define a protocol for instructions and
information for decoding according to a particular video decoding
standard or product. The decoder follows specified conventions when
putting instructions and information in a buffer. The device driver
retrieves the buffered instructions and information according to
the specified conventions and performs decoding appropriate to the
standard or product. An interface specification for a specific
standard or product is adapted to the particular bit stream syntax
and semantics of the standard/product.
[0019] Given the critical importance of video compression and
decompression to digital video, it is not surprising that
compression and decompression are richly developed fields. Whatever
the benefits of previous techniques and tools, however, they do not
have the advantages of the following techniques and tools.
SUMMARY
[0020] In summary, techniques and tools are described for various
aspects of video decoder implementations. These techniques and
tools help, for example, to increase decoding speed to facilitate
real time decoding, reduce computational complexity, and/or reduce
memory utilization (e.g., for use in scenarios such as those with
processing power constraints and/or delay constraints).
[0021] According to one aspect of the techniques and tools
described herein, a decoder receives encoded video information in a
bitstream and during decoding identifies a temporal direct
prediction mode macroblock, where the temporal direct prediction
mode macroblock is associated with a reference picture list, and
where reference pictures of the reference picture list are
identified using local picture identifiers. The decoder then
identifies a co-located macroblock of the temporal direct
prediction mode macroblock, where the co-located macroblock uses
one or more reference pictures. Next, the decoder identifies one or
more reference pictures in the reference picture list that match
the one or more reference pictures used by the co-located
macroblock, where the identifying the one or more reference
pictures in the reference picture list uses local picture
identifiers. Finally, the decoder uses the identified one or more
reference pictures in reconstruction of the temporal direct
prediction mode macroblock. In a specific implementation, the local
picture identifiers are 8-bit local picture identifiers. In other
implementations, different length local picture identifiers are
used (e.g., 5-bit and 32-bit local picture identifiers).
[0022] In a specific implementation, a table is used to identify
matching reference pictures. For example, the decoder creates a
table that stores reference picture list index values for reference
pictures in the reference picture list, where the stored reference
picture list index values are indexed in the table by their
respective local picture identifiers. The decoder performs the
identification by looking up local picture identifiers of the one
or more reference pictures used by the co-located macroblock in the
table and retrieving corresponding reference picture list index
values, where the retrieved reference picture list index values
identify the one or more reference pictures in the reference
picture list of the temporal direct prediction mode macroblock that
match the one or more reference pictures used by the co-located
macroblock.
[0023] According to another aspect of the techniques and tools
described herein, a decoder receives encoded video information in a
bitstream and during decoding performs loop filtering on a
macroblock. For example, the loop filtering comprises calculating
boundary strength values for plural blocks, where the calculating
comprises determining whether reference pictures used by the plural
blocks are the same by comparing local picture identifiers of the
reference pictures. In a specific implementation, the local picture
identifiers are 8-bit local picture identifiers. In other
implementations, different length local picture identifiers are
used (e.g., 5-bit and 32-bit local picture identifiers).
[0024] According to yet another aspect of the techniques and tools
described herein, a decoder receives encoded video information in a
bitstream and during decoding determines a picture type of a
picture and based on the picture type selectively skips or
simplifies computation of co-located information for use in
reconstructing direct prediction mode macroblocks (e.g., temporal
or spatial direct prediction mode macroblocks) outside the
picture.
[0025] The various techniques and tools can be used in combination
or independently. Additional features and advantages will be made
more apparent from the following detailed description of different
embodiments, which proceeds with reference to the accompanying
figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0026] FIG. 1 is a block diagram illustrating a generalized example
of a suitable computing environment in which several of the
described embodiments may be implemented.
[0027] FIG. 2 is a block diagram of a generalized video decoder in
conjunction with which several of the described embodiments may be
implemented.
[0028] FIG. 3 is a flowchart illustrating an example method for
decoding video information using local picture identifiers.
[0029] FIG. 4 is a flowchart illustrating an example technique for
determining a picture type.
[0030] FIG. 5 is a flowchart illustrating an example method for
simplifying computation of co-located information during decoding
of video information.
DETAILED DESCRIPTION
[0031] The present application relates to innovations in
implementations of video decoders. Many of these innovations reduce
decoding complexity and/or increase decoding speed to improve
decoding performance. These innovations include the use of local
picture identifiers (IDs). Local picture identifiers can be used
during computation of co-located information and during deblock
filtering. For example, an 8-bit local picture ID can be used in
place of a global 64-bit picture ID. These innovations also include
improvements in computation of co-located information. For example,
a picture type can be used during computation of co-located
information to improve computation efficiency (e.g., speed and
memory utilization) during video decoding.
[0032] The innovations described herein can be implemented by
single-threaded or multi-threaded decoders. In some
implementations, a multi-threaded decoder uses decoder modules that
facilitate multi-threaded decoding. For example, in some
implementations a picture extent discovery (PED) module is used.
The PED module finds a
complete picture from the bit stream and initializes the parameters
and data structures that will be used for decoding the picture. The
PED module populates some of the initialized parameters and
structures with parameters parsed from the bit stream. The PED
module also enters the initialized (but as yet un-decoded) picture
into a live DPB (decoded picture buffer), which facilitates
multithreaded decoding. For
additional detail, see U.S. Patent Application Publication No.
2009-0003446-A1, entitled "COMPUTING COLLOCATED MACROBLOCK
INFORMATION FOR DIRECT MODE MACROBLOCKS," the disclosure of which
is hereby incorporated by reference.
[0033] Collectively, these improvements are at times loosely
referred to as "optimizations." As used conventionally and as used
herein, the term "optimization" means an improvement that is deemed
to provide a good balance of performance in a particular scenario
or platform, considering computational complexity, memory use,
processing speed, and/or other factors. Use of the term
"optimization" does not foreclose the possibility of further
improvements, nor does it foreclose the possibility of adaptations
for other scenarios or platforms.
[0034] With these innovations, efficient decoder implementations
have been provided for diverse platforms. The implementations
include media players for gaming consoles with complex,
special-purpose hardware and graphics capabilities, personal
computers, and set-top boxes/digital video receivers.
[0035] Various alternatives to the implementations described herein
are possible. For example, certain techniques described with
reference to flowchart diagrams can be altered by changing the
ordering of stages shown in the flowcharts, by repeating or
omitting certain stages, etc., while achieving the same result. As
another example, although some implementations are described with
reference to specific macroblock formats, other formats also can be
used. As another example, while several of the innovations
described below are presented in terms of H.264/AVC decoding
examples, the innovations are also applicable to other types of
decoders (e.g., MPEG-2, VC-1) that provide or support the same or
similar decoding features.
[0036] The various techniques and tools described herein can be
used in combination or independently. For example, although
flowcharts in the figures typically illustrate techniques in
isolation from other aspects of decoding, the illustrated
techniques in the figures can typically be used in combination with
other techniques (e.g., shown in other figures). Different
embodiments implement one or more of the described techniques and
tools. Some of the techniques and tools described herein address
one or more of the problems noted in the Background. Typically, a
given technique/tool does not solve all such problems, however.
Rather, in view of constraints and tradeoffs in decoding time
and/or resources, the given technique/tool improves performance for
a particular implementation or scenario.
I. Computing Environment
[0037] FIG. 1 illustrates a generalized example of a suitable
computing environment (100) in which several of the described
embodiments may be implemented. The computing environment (100) is
not intended to suggest any limitation as to scope of use or
functionality, as the techniques and tools may be implemented in
diverse general-purpose or special-purpose computing
environments.
[0038] With reference to FIG. 1, the computing environment (100)
includes at least one CPU (110) and associated memory (120) as well
as at least one GPU or other co-processing unit (115) and
associated memory (125) used for video acceleration. In FIG. 1,
this most basic configuration (130) is included within a dashed
line. The processing unit (110) executes computer-executable
instructions and may be a real or a virtual processor. In a
multi-processing system, multiple processing units execute
computer-executable instructions to increase processing power. A
host encoder or decoder process offloads certain computationally
intensive operations (e.g., fractional sample interpolation for
motion compensation, in-loop deblock filtering) to the GPU (115).
The memory (120, 125) may be volatile memory (e.g., registers,
cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory,
etc.), or some combination of the two. The memory (120, 125) stores
software (180) for a decoder implementing one or more of the
decoder innovations described herein.
[0039] A computing environment may have additional features. For
example, the computing environment (100) includes storage (140),
one or more input devices (150), one or more output devices (160),
and one or more communication connections (170). An interconnection
mechanism (not shown) such as a bus, controller, or network
interconnects the components of the computing environment (100).
Typically, operating system software (not shown) provides an
operating environment for other software executing in the computing
environment (100), and coordinates activities of the components of
the computing environment (100).
[0040] The computer-readable storage medium (140) may be removable
or non-removable, and includes magnetic disks, magnetic tapes or
cassettes, CD-ROMs, DVDs, or any other tangible medium which can be
used to store information and which can be accessed within the
computing environment (100). The computer-readable storage medium
(140) may also include the memory (120) and (125) (e.g., RAM, ROM,
flash memory, etc.). The storage (140) stores instructions for the
software (180). The computer-readable storage medium (140) does not
include the communication medium (170) described below (e.g.,
signals).
[0041] The input device(s) (150) may be a touch input device such
as a keyboard, mouse, pen, or trackball, a voice input device, a
scanning device, or another device that provides input to the
computing environment (100). For audio or video encoding, the input
device(s) (150) may be a sound card, video card, TV tuner card, or
similar device that accepts audio or video input in analog or
digital form, or a CD-ROM or CD-RW that reads audio or video
samples into the computing environment (100). The output device(s)
(160) may be a display, printer, speaker, CD-writer, or another
device that provides output from the computing environment
(100).
[0042] The communication connection(s) (170) enable communication
over a communication medium to another computing entity. The
communication medium conveys information such as
computer-executable instructions, audio or video input or output,
or other data in a modulated data signal. A modulated data signal
is a signal that has one or more of its characteristics set or
changed in such a manner as to encode information in the signal. By
way of example, and not limitation, communication media include
wired or wireless techniques implemented with an electrical,
optical, RF, infrared, acoustic, or other carrier.
[0043] The techniques and tools can be described in the general
context of computer-executable instructions, such as those included
in program modules, being executed in a computing environment on a
target real or virtual processor. Generally, program modules
include routines, programs, libraries, objects, classes,
components, data structures, etc. that perform particular tasks or
implement particular abstract data types. The functionality of the
program modules may be combined or split between program modules as
desired in various embodiments. Computer-executable instructions
for program modules may be executed within a local or distributed
computing environment.
[0044] For the sake of presentation, the detailed description uses
terms like "decide," "make" and "get" to describe computer
operations in a computing environment. These terms are high-level
abstractions for operations performed by a computer, and should not
be confused with acts performed by a human being. The actual
computer operations corresponding to these terms vary depending on
implementation.
II. Example Organization of Video Frames
[0045] For progressive video, lines of a video frame contain
samples starting from one time instant and continuing through
successive lines to the bottom of the frame. An interlaced video
frame consists of two scans--one for the even lines of the frame
(the top field) and the other for the odd lines of the frame (the
bottom field).
[0046] A progressive video frame can be divided into 16x16
macroblocks. For 4:2:0 format, a 16x16 macroblock includes four 8x8
blocks (Y0 through Y3) of luma (or brightness) samples and two 8x8
blocks (Cb, Cr) of chroma (or color component) samples, which are
collocated with the four luma blocks but half resolution
horizontally and vertically.
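As a rough illustration (a hypothetical layout for illustration
only, not a data structure from this application), a 4:2:0
macroblock's samples can be sketched as:

    /* 4:2:0 macroblock: 16x16 luma samples (four 8x8 blocks Y0..Y3)
       plus two 8x8 chroma blocks at half resolution in each
       dimension. */
    typedef struct {
        unsigned char luma[16][16];
        unsigned char cb[8][8];
        unsigned char cr[8][8];
    } Macroblock420;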
[0047] An interlaced video frame includes alternating lines of the
top field and bottom field. The two fields may represent two
different time periods or they may be from the same time period.
When the two fields of a frame represent different time periods,
this can create jagged tooth-like features in regions of the frame
where motion is present.
[0048] Therefore, interlaced video frames can be rearranged
according to a field structure, with the odd lines grouped together
in one field, and the even lines grouped together in another field.
This arrangement, known as field coding, is useful in high-motion
pictures. For an interlaced video frame organized for
encoding/decoding as separate fields, each of the two fields of
the interlaced video frame is partitioned into macroblocks. In the
luma plane, a 16x16 macroblock of the top field includes 16 lines
from the top field, and a 16x16 macroblock of the bottom field
includes 16 lines from the bottom field, and each line is 16
samples long.
[0049] On the other hand, in stationary regions, image detail in
the interlaced video frame may be more efficiently preserved
without rearrangement into separate fields. Accordingly, frame
coding (at times referred to as coding with MBAFF pictures) is often
used in stationary or low-motion interlaced video frames. An
interlaced video frame organized for encoding/decoding as a frame
is also partitioned into macroblocks. In the luma plane, each
macroblock includes 8 lines from the top field alternating with 8
lines from the bottom field for 16 lines total, and each line is 16
samples long. Within a given macroblock, the top-field information
and bottom-field information may be coded jointly or separately at
any of various phases--the macroblock itself may be field-coded or
frame-coded.
III. Generalized Video Decoder
[0050] FIG. 2 is a block diagram of a generalized video decoder
(200) in conjunction with which several described embodiments may
be implemented. A corresponding video encoder (not shown) may also
implement one or more of the described embodiments.
[0051] The relationships shown between modules within the decoder
(200) indicate general flows of information in the decoder; other
relationships are not shown for the sake of simplicity. In
particular, while a decoder host performs some operations of
modules of the decoder (200), a video accelerator performs other
operations (such as inverse frequency transforms, fractional sample
interpolation, motion compensation, in-loop deblocking filtering,
color conversion, post-processing filtering and/or picture
re-sizing). For example, the decoder (200) passes instructions and
information to the video accelerator as described in "Microsoft
DirectX VA: Video Acceleration API/DDI," version 1.01, a later
version of DXVA or another acceleration interface. In general, once
the video accelerator reconstructs video information, it maintains
some representation of the video information rather than passing
information back. For example, after a video accelerator
reconstructs an output picture, the accelerator stores it in a
picture store, such as one in memory associated with a GPU, for use
as a reference picture. The accelerator then performs in-loop
deblock filtering and fractional sample interpolation on the
picture in the picture store.
[0052] In some implementations, different video acceleration
profiles result in different operations being offloaded to a video
accelerator. For example, one profile may only offload out-of-loop,
post-decoding operations, while another profile offloads in-loop
filtering, fractional sample interpolation and motion compensation
as well as the post-decoding operations. Still another profile can
further offload frequency transform operations. In still other
cases, different profiles each include operations not in any other
profile.
[0053] Returning to FIG. 2, the decoder (200) processes video
pictures, which may be video frames, video fields or combinations
of frames and fields. The bit stream syntax and semantics at the
picture and macroblock levels may depend on whether frames or
fields are used. The decoder (200) is block-based and uses a 4:2:0
macroblock format for frames. For fields, the same or a different
macroblock organization and format may be used. 8x8 blocks
may be further sub-divided at different stages. Alternatively, the
decoder (200) uses a different macroblock or block format, or
performs operations on sets of samples of different size or
configuration.
[0054] The decoder (200) receives information (295) for a
compressed sequence of video pictures and produces output including
a reconstructed picture (205) (e.g., progressive video frame,
interlaced video frame, or field of an interlaced video frame). The
decoder system (200) decompresses predicted pictures and key
pictures. For the sake of presentation, FIG. 2 shows a path for key
pictures through the decoder system (200) and a path for predicted
pictures. Many of the components of the decoder system (200) are
used for decompressing both key pictures and predicted pictures.
The exact operations performed by those components can vary
depending on the type of information being decompressed.
[0055] A demultiplexer (290) receives the information (295) for the
compressed video sequence and makes the received information
available to the entropy decoder (280). The entropy decoder (280)
entropy decodes entropy-coded quantized data as well as
entropy-coded side information, typically applying the inverse of
entropy encoding performed in the encoder. A motion compensator
(230) applies motion information (215) to one or more reference
pictures (225) to form motion-compensated predictions (235) of
sub-blocks, blocks and/or macroblocks of the picture (205) being
reconstructed. One or more picture stores store previously
reconstructed pictures for use as reference pictures.
[0056] The decoder (200) also reconstructs prediction residuals. An
inverse quantizer (270) inverse quantizes entropy-decoded data. An
inverse frequency transformer (260) converts the quantized,
frequency domain data into spatial domain video information. For
example, the inverse frequency transformer (260) applies an inverse
block transform to sub-blocks and/or blocks of the frequency
transform coefficients, producing sample data or prediction
residual data for key pictures or predicted pictures, respectively.
The inverse frequency transformer (260) may apply an 8x8, 8x4,
4x8, 4x4, or other size inverse frequency transform.
[0057] For a predicted picture, the decoder (200) combines
reconstructed prediction residuals (245) with motion compensated
predictions (235) to form the reconstructed picture (205). A motion
compensation loop in the video decoder (200) includes an adaptive
deblocking filter (210). The decoder (200) applies in-loop
filtering (210) to the reconstructed picture to adaptively smooth
discontinuities across block/sub-block boundary rows and/or columns
in the picture. The decoder stores the reconstructed picture in a
picture buffer (220) for use as a possible reference picture.
[0058] Depending on implementation and the type of compression
desired, modules of the decoder can be added, omitted, split into
multiple modules, combined with other modules, and/or replaced with
like modules. In alternative embodiments, encoders or decoders with
different modules and/or other configurations of modules perform
one or more of the described techniques. Specific embodiments of
video decoders typically use a variation or supplemented version of
the generalized decoder (200).
[0059] For the sake of presentation, the following table provides
example explanations for acronyms and selected shorthand terms used
herein.
TABLE 1

Term       Explanation
block      arrangement (in general, having any size) of sample values
           for pixel data or residual data, for example, including the
           possible blocks in H.264/AVC: 4x4, 4x8, 8x4, 8x8, 8x16,
           16x8, and 16x16
CABAC      context adaptive binary arithmetic coding
CAVLC      context adaptive variable length coding
DPB        decoded picture buffer
ED         entropy decoding
FIFO       first in first out
INTRA      spatial intra-prediction
LF         loop filtering
MB         megabyte OR macroblock, depending on context; a macroblock
           is, e.g., a 16x16 arrangement of sample values for luma
           with associated arrangements of sample values for chroma
MBAFF      macroblock adaptive frame field
MC         motion compensation
MMCO       memory management control operation
NALU       network abstraction layer unit
PED        picture extent discovery
PICAFF     picture adaptive frame field
PPS        picture parameter set
PROG       progressive
SEI        supplemental enhancement information
SIMD       single instruction multiple data
SPS        sequence parameter set
stage      one of a set of different passes/steps to decode a picture,
           such as PED, ED, MC decoding, and so on
sub-block  a partition of a sub-MB, e.g., 8x4, 4x8 or 4x4 block or
           other size block
sub-MB     a partition of an MB, e.g., 16x8, 8x16 or 8x8 block or
           other size block; in some contexts, the term sub-MB also
           indicates sub-blocks
task       a stage plus input data
wave       a set of portions of a picture (e.g., a diagonal set of
           macroblocks in the picture) such that each portion within
           one wave can be processed in parallel, without dependencies
           on the other portions within the same wave; a picture can
           then be processed as a sequence of waves, where each wave
           is dependent on the data resulting from processing the
           preceding waves
IV. Local Picture Identifier Innovations for a Video Decoder
[0060] In some embodiments, a decoder uses one or more local
picture identifier (ID) innovations when decoding video.
Collectively, the local picture ID innovations improve computation
efficiency (e.g., speed and memory utilization) during video
decoding.
[0061] A. Overall Local Picture Identifier Framework
[0062] In order to identify a picture in a bitstream, the picture's
picture identifier (ID) needs to be known. Initially, the picture
ID is computed as ((POC << 1) + structure), where POC is the
Picture Order Count and structure could be frame, top field, or
bottom field. Since POC is a 32-bit variable, generally 33 bits are
needed. In a typical computing system, the result is a 64-bit
picture ID to identify a picture. In an H.264/AVC decoder, there
are two places where a determination must be made whether two
pictures are the same or not. One is in the computation of
co-located pictures for obtaining motion vector information of
direct MBs in a B slice, and the other is in the strength
computation of loop filtering.
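For illustration, the initial global picture ID computation can be
sketched as follows (a minimal sketch; the function name and
structure codes are assumptions for illustration, not taken from
this application):

    #include <stdint.h>

    /* Assumed structure codes. */
    enum { STRUCT_FRAME = 0, STRUCT_TOP_FIELD = 1,
           STRUCT_BOTTOM_FIELD = 2 };

    /* Global picture ID: ((POC << 1) + structure). Because POC is a
       32-bit value, the result needs 33 bits, so a 64-bit integer is
       used in a typical computing system. */
    static uint64_t global_picture_id(int32_t poc, int structure)
    {
        return (((uint64_t)(uint32_t)poc) << 1) + (uint64_t)structure;
    }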
[0063] Using a local picture ID (e.g., an 8-bit or 5-bit local
picture ID), which can also be called a reduced-bit picture ID, in
place of a global 64-bit picture ID provides various performance
advantages. For example, 8-bit local picture IDs use one-eighth the
memory of 64-bit picture IDs. Local picture IDs also improve
computation efficiency (e.g., using 8-bit comparisons instead of
64-bit comparisons): the x86 architecture handles a 64-bit
comparison using two instructions, while reducing the 64-bit
picture ID to an 8-bit data structure allows the comparison to
execute in one instruction. The reduction in bits used to represent
the picture ID also shrinks the ref_pic_num and co-located
remapping data structures. In a specific test scenario, an
H.264/AVC decoder using 8-bit local picture IDs showed 4 to 7 MB of
memory savings with a multi-threading implementation.
[0064] B. Usage of Picture ID
[0065] In an H.264/AVC decoder, there are two places where a
determination needs to be made whether two pictures are the same or
not. The first place is with the computation of co-located
information for direct macroblocks (MBs). In H.264/AVC, when
direct_spatial_mv_pred_flag is 0 (temporal mode is used for direct
macroblock), motion vector (MV) and reference picture information
needs to be retrieved from the co-located MBs. Specifically, the
reference pictures used by the co-located MB of the co-located
picture need to be found in reference list 0 of the current slice.
Therefore, the picture IDs of the reference pictures used by the
co-located MB need to be compared with those in reference list 0
of the current slice.
[0066] The second place in an H.264/AVC decoder where a
determination needs to be made whether two pictures are the same or
not is in the loop filter. In the loop filter, when computing the
strength for deblocking, a comparison needs to be made to determine
whether two inter blocks are using the same reference pictures or
not. In this case, all the pictures used for reference in a picture
come from the same Decoded Picture Buffer (DPB), and a DPB can only
contain, at most, 16x3 = 48 different pictures. If all the
pictures in the DPB have different local picture IDs, a
determination can be made whether two pictures are the same or
not.
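As a loose illustration (a minimal sketch with assumed structure
and field names, covering only the single-reference inter/inter
case), the reference-picture test inside the boundary strength
calculation then reduces to a one-byte comparison:

    #include <stdlib.h>  /* abs() */

    /* Assumed per-block data; this application does not define such
       a structure. */
    typedef struct {
        unsigned char localPicID; /* 8-bit local ID of the reference picture */
        int mv_x, mv_y;           /* motion vector in quarter-sample units */
    } BlockInfo;

    /* Sketch of the inter/inter case of the H.264 boundary strength
       rule: bS is 1 if the two blocks use different reference
       pictures or their motion vectors differ by 4 or more quarter
       samples in either component, else 0. The reference-picture
       test is a single 8-bit compare instead of a 64-bit one. */
    static int boundary_strength_inter(const BlockInfo *p,
                                       const BlockInfo *q)
    {
        if (p->localPicID != q->localPicID)
            return 1;
        if (abs(p->mv_x - q->mv_x) >= 4 || abs(p->mv_y - q->mv_y) >= 4)
            return 1;
        return 0;
    }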
[0067] C. 8-Bit Local Picture ID
[0068] In a specific implementation, an 8-bit local picture ID is
used in place of the global 64-bit picture ID. An 8-bit picture ID
provides a sufficient number of picture identifiers to perform
H.264/AVC decoding even with the use of large-scale multi-threaded
decoding.
[0069] Generally, there will be fewer than 32 pictures (frame, top
field, or bottom field picture) in flight at the same time, i.e.,
fewer than 32 pPicHolders, even with large-scale multi-threading.
Assume each of the 32 pictures is a frame picture and will be split
into two fields. The 32 pictures in flight will then use 96 (32x3)
StorablePicture structures. According to the H.264/AVC
specification, the maximum DPB size is 16. Therefore, the DPB will
use 48 (16x3) StorablePicture structures at most.
[0070] In addition, if two pictures' frame_num values have a gap,
a function will be called to fill in the frame_num gap. The maximum
number of StorablePicture structures used to fill a frame_num gap
is 48 (16x3). Because a mechanism is used to release the pictures
used to fill a frame_num gap right after they are bumped out of the
DPB, in total only 96 (16x3x2) such StorablePicture structures are
needed, assuming the worst case in which the pictures used to fill
one frame_num gap are bumped out by the pictures used to fill
another frame_num gap.
[0071] Overall, there are a maximum of 240 (96+48+96)
StorablePicture structures in flight during the lifetime of an
H.264/AVC decoder. When a StorablePicture structure is allocated, a
unique 8-bit picture ID can be assigned to it. An 8-bit local
picture ID provides 255 unique values, and is thus able to
accommodate the maximum of 240 StorablePicture structures. The
8-bit picture ID will be attached to the StorablePicture structure
and remain the same during the lifetime of the H.264/AVC
decoder.
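As a minimal sketch (all names hypothetical; this application only
requires that each allocated StorablePicture structure carry a
unique 8-bit ID, with IDs reused based on availability), the ID
pool might look like:

    #define NUM_LOCAL_IDS 255  /* 0..254 are valid IDs; 255 is
                                  reserved as the invalid picture ID */

    static unsigned char idInUse[NUM_LOCAL_IDS]; /* zero-initialized:
                                                    all IDs free */

    /* Assign a free 8-bit local picture ID when a StorablePicture
       structure is allocated. */
    static int alloc_local_pic_id(void)
    {
        int id;
        for (id = 0; id < NUM_LOCAL_IDS; id++) {
            if (!idInUse[id]) {
                idInUse[id] = 1;
                return id;
            }
        }
        return -1;  /* unreachable if at most 240 structures are in
                       flight */
    }

    /* Return an ID to the pool when its StorablePicture structure is
       freed, making the ID available for reuse. */
    static void free_local_pic_id(int id)
    {
        idInUse[id] = 0;
    }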
[0072] This specific implementation of a local 8-bit picture ID
assumes there will be up to 32 pictures (frame, top field, or
bottom field picture) in flight at the same time. However, a local
8-bit picture ID can support up to 37 pictures in flight at the
same time. If more than 37 pictures in flight are required, the
local picture ID can be extended beyond 8 bits (e.g., a 16-bit
local picture ID can be used).
[0073] With the loop filter, because the StorablePicture structures
come from the same DPB, different StorablePicture structures in the
DPB will have different 8-bit picture IDs. Determining whether two
reference pictures are the same or not can be done easily with the
8-bit picture ID.
[0074] In the computation of co-located information, an 8-bit local
picture ID is sufficient to decode content conforming to the
H.264/AVC specification. The fact that an 8-bit local picture ID
can be used to decode conforming content may not be initially
obvious when considering the process that finds the corresponding
picture in reference list 0 of the current slice for the reference
picture used by the co-located MB of the co-located picture.
However, it can be proven that this process operates correctly
using an 8-bit local picture ID.
[0075] Assume there is one slice per picture, without loss of
generality. Current picture A is using some pictures as reference
in list 0 and list 1. Co-located picture B is using some other
pictures as reference in list 0 and list 1. The corresponding
pictures in list 0 of current picture A need to be found for the
reference pictures used by picture B. In decoding order, co-located
picture B is decoded first, some pictures in the middle, and then
current picture A. During the decoding process from picture B to
picture A, some pictures used as reference by co-located picture B
may be bumped out from the DPB, deleted (with picture ID x, POC y,
and structure z), and reused again (with picture ID x, POC m, and
structure n), since the 8-bit local picture ID stays the same
throughout the lifetime of the H.264/AVC decoder. In this case
the two StorablePicture structures have the same 8-bit local
picture ID, even though they are actually different pictures. If
the StorablePicture structure with a picture ID x, POC y, and
structure z is in the reference lists of co-located picture B, and
the StorablePicture structure with a picture ID x, POC m, and
structure n is in the reference lists of current picture A, they
will be treated as the same picture, because now they have the same
picture ID x. If this situation ever occurs, it may cause
corruption of the decoded content. However, this situation will
never occur for conforming content.
[0076] According to Section 8.4.1.2.3 of the H.264/AVC
specification, when a picture in list 0 or list 1 of the co-located
picture is used as reference picture by a co-located MB, the same
picture shall be in the list 0 of current picture. That means in
the decoding process from co-located picture B to current picture
A, the picture cannot get bumped out from DPB and deleted. It also
means that when a picture is used as a reference picture by a
co-located MB, the picture found in list 0 of the current picture
must be the correct match. When a direct MB is decoded in current
picture A, the location in list 0 (of current picture A) of the
picture used as a reference by the co-located MB is needed. If
those reference indices/positions are correct, the direct MB can be
decoded correctly. As for those pictures that get bumped out from
DPB, deleted, and reused during the decoding process from
co-located picture B to current picture A, they will never be used
as reference pictures by co-located MB, and therefore it is
irrelevant whether the matching for them is correct or not.
[0077] D. 5-Bit Local Picture ID
[0078] In another specific implementation, a 5-bit local picture ID
is used in place of the 64-bit picture ID. A 5-bit local picture ID
can be used, for example, with a single-threaded decoder (e.g.,
either in a DXVA implementation or a software implementation).
[0079] E. Alternative Local Picture ID Implementations
[0080] Depending on implementation details, a 5-bit or 8-bit local
picture ID may not be the most efficient choice. For example, with
the Xbox 360 architecture, 32-bit operations are more efficient
than 8-bit operations. Therefore, with the Xbox 360, a 32-bit local
picture ID can be used (in place of a 64-bit picture ID). Such a
32-bit local picture ID only needs to include 8 bits of relevant
information (e.g., the upper three bytes of the 32-bit local
picture ID are not used).
[0081] F. Choice of Invalid Picture ID
[0082] The JM reference code sets the invalid picture ID to
0x8000000000000000. In the boundary strength computation of the
loop filter, comparing picture IDs then involves a branch. For the
8-bit local picture ID design, the invalid picture ID value is set
to 255. This allows the local picture ID to be compared with
shifting and logical operations, and in turn speeds up the
computation process.
[0083] The JM reference code reads as follows:
    if (refidx >= 0)
        q0 = ref_pic_num[slice_id][list][refidx];
    else
        q0 = 0x8000000000000000;
[0084] When modified to support the 8-bit local picture ID, the
code reads as follows:

[0085]

    (((refidx) >> (sizeof(RefPicNumType)*8 - 1)) |
     (ref_pic_num[slice_id][list][refidx]))

where sizeof(RefPicNumType) is 1.
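A brief walk-through of the branch-free form (assuming refidx is a
signed 32-bit integer and an arithmetic right shift of negative
values, as on x86):

    /* Shift count = sizeof(RefPicNumType)*8 - 1 = 7.
       refidx <  0:  refidx >> 7 is all ones (arithmetic shift), the
                     OR sets every bit, and truncation to 8 bits
                     yields 255, the invalid picture ID.
       refidx >= 0:  refidx >> 7 is 0 (legal H.264 reference indices
                     are far below 128), so the expression returns
                     ref_pic_num[slice_id][list][refidx] unchanged. */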
[0086] Depending on the number of bits used for the local picture
ID (e.g., 5-bit, 16-bit, 32-bit), a similar invalid picture ID can
be used. For example, for a 32-bit local picture ID, 0xffffffff can
be used.
[0087] G. Table Based Remapping for Co-Located Computation
[0088] A reference index (ref_idx in H.264) in a slice is an index
to a picture in a reference picture list of the slice. In different
slices, reference indices with the same value (e.g., 3) may refer
to different pictures because the reference picture lists for the
different slices can be different. When the decoder retrieves
collocated macroblock information for a direct mode macroblock in a
B slice, the decoder determines which picture (if any) in the B
slice's reference picture list corresponds to the reference picture
used for reference by the collocated macroblock that provides the
collocated macroblock information.
[0089] In co-located computation, the reference pictures used by
co-located MBs in co-located pictures need to be mapped to those in
list 0 of the current slice. In a specific implementation, a table
is used in the remapping procedure as follows.

[0090] First, every table entry is initialized to -1, marking all
pictures as absent from list 0 of the current slice:

    memset(rgPicIDRefIdxMap, -1, sizeof(char)*256);
[0091] Next, the index of each reference picture in list 0 of the
current slice is stored in the table. Note that duplicate
reference pictures are skipped in list 0 of the current slice
because the reference picture used by the co-located MB in the
co-located picture is mapped to the first matching picture in list
0 of the current slice.
    for (i = 0; i < pSliceHolder->listXsize[LIST_0]; i++) {
        RefPicNumType StorablePicID =
            pSliceHolder->listX[LIST_0][i]->StorablePicID;
        H264_ASSERT(StorablePicID < INVALID_REF_PIC_NUM);
        if (-1 == rgPicIDRefIdxMap[StorablePicID]) {
            rgPicIDRefIdxMap[StorablePicID] = (char)i;
        }
    }
[0092] Using the remapping process, the index in list 0 of the
current slice can be retrieved for the reference picture used by
the co-located MB directly from the index table above. The
remapping can improve computation efficiency by up to 16 or 32
times, since a single table lookup replaces a linear search of a
reference list that can hold up to 16 frame (or 32 field) entries.
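For example, the lookup can be sketched as follows (a minimal
sketch; colocatedPicID is an assumed variable holding the 8-bit
local picture ID of the reference picture used by the co-located
MB, and the table name follows the snippet above):

    /* O(1) remapping: a single table read replaces a scan over
       list 0 of the current slice. */
    char refIdxL0 = rgPicIDRefIdxMap[colocatedPicID];
    if (refIdxL0 >= 0) {
        /* matching picture found at index refIdxL0 in list 0 */
    } else {
        /* -1: the co-located MB's reference picture is not in
           list 0 */
    }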
[0093] H. Example Local Picture ID Implementation
[0094] FIG. 3 depicts an example method 300 for decoding video
information using local picture identifiers. At 310, a temporal
direct prediction mode macroblock is identified. The macroblock is
associated with a reference picture list (e.g., reference picture
list 0) and the reference pictures of the reference picture list
are identified using local picture identifiers (e.g., 8-bit local
picture IDs).
[0095] At 320, a co-located macroblock of the temporal direct
prediction mode macroblock is identified. The co-located macroblock
uses one or more reference pictures.
[0096] At 330, one or more reference pictures are identified in the
reference picture list that match the one or more reference
pictures used by the co-located macroblock, where the identifying
the one or more reference pictures in the reference picture list
uses local picture identifiers.
[0097] At 340, the temporal direct prediction mode macroblock is
reconstructed using the identified reference pictures.
[0098] In the example method 300, the local picture IDs can be, for
example, 5-bit local picture IDs, 8-bit local picture IDs, or
32-bit local picture IDs.
[0099] In some implementations, a table can be used to identify
matching reference pictures (330). For example, a table can be
created that stores reference picture list index values for the
reference pictures in the reference picture list, indexed by their
respective local picture identifiers. Once the table has been
created, identification is performed by looking up, in the table,
the local picture identifiers of the one or more reference pictures
used by the co-located macroblock and retrieving the corresponding
reference picture list index values. The retrieved index values
identify the one or more reference pictures in the reference picture
list of the temporal direct prediction mode macroblock that match
the one or more reference pictures used by the co-located
macroblock.
[0100] I. Hardware Acceleration
[0101] The local picture ID framework can be implemented with
software decoders and hardware accelerated decoders. For example,
the local picture ID framework can be implemented with hardware
accelerated decoders that support DirectX Video Acceleration
(DXVA).
V. Innovations in Computation of Co-Located Information for a Video
Decoder
[0102] In some embodiments, a decoder uses one or more innovations
related to the computation of co-located information when decoding
video. Collectively, these innovations improve computation
efficiency (e.g., speed and memory utilization) during video
decoding.
[0103] A direct mode macroblock uses information from a
corresponding macroblock in a collocated picture when determining
which motion vectors to apply in motion compensation. The
information from the corresponding macroblock is an example of
collocated macroblock information. In many encoding scenarios, more
than half of the macroblocks in B slices are direct mode
macroblocks, and efficient determination of collocated macroblock
information is important to performance.
[0104] A. Overall Computation Framework
[0105] In an H.264/AVC encoded video bitstream, B slices can
contain many direct MBs. For direct MBs, there is no MV or RefIdx
information encoded in the bitstream. The MV and RefIdx information
is derived from co-located MBs and their spatial neighbors.
[0106] When spatial mode is used for direct MBs, the MV and RefIdx
information is obtained from spatial neighbors with median
prediction. However, a check needs to be made to determine whether
the co-located MB is moving or not. If the co-located MB is not
moving, the MV will be reset to 0. Otherwise, the MV and RefIdx
information from median prediction is used.
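The moving check can be sketched as follows (a hedged sketch following the H.264 colZeroFlag condition; names are hypothetical, and the long-term-picture case discussed in Section V(C) is handled separately):

    #include <stdlib.h>

    /* A co-located block is treated as NOT moving when it references
       picture index 0 and both motion vector components lie within
       [-1, 1] in quarter-pel units; otherwise it is "moving". */
    static int coloc_is_moving(int refidx_col, int mvx_col, int mvy_col)
    {
        int not_moving = (refidx_col == 0) &&
                         (abs(mvx_col) <= 1) && (abs(mvy_col) <= 1);
        return !not_moving;
    }

If coloc_is_moving() returns 0, the direct MB's MV is reset to 0; otherwise the median-predicted MV and RefIdx are used.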
[0107] When temporal mode is used for direct MBs, the MV and RefIdx
information is obtained from co-located MBs. The reference picture
used by a co-located MB is found in list 0 of the current slice.
This reference picture in list 0 of the current slice is one of the
reference pictures for the direct MB. The co-located picture is the
other reference picture for the direct MB.
[0108] To set up the MV and RefIdx information for direct MBs, the
MV and RefIdx information of the co-located picture needs to be
accessed, and some computation needs to be performed. Various
optimizations can be performed depending on the picture type of the
co-located picture.
[0109] For example, if the co-located picture type is identified as
"I picture," then its side information, motion vectors, macroblock
type, and reference index do not need to be checked, so the
information retrieval and checking operations can be eliminated
entirely. Similarly, if the co-located picture type is identified as
"P picture," then only half of the information retrieval and
checking/computation needs to be performed.
[0110] B. Definition of Picture Type
[0111] There is no picture type in the H.264/AVC specification. In
a specific implementation, in order to support the improvements in
computation of co-located information, a picture type is defined as
follows. When a picture is encountered in PED, its picture type is
assigned to one of the below types, as follows:
[0112] I picture (bIPicture): all the slices in the picture are I
slices,
[0113] P picture (bPPicture): all the slices in the picture are I
or P slices but not all the slices are I slices,
[0114] B picture (bBPicture): at least one slice in the picture is
B slice.
[0115] According to the above definition, the three types are
mutually exclusive: every picture is assigned exactly one of them.
[0116] FIG. 4 is a flowchart illustrating an example technique 400
for determining a picture type, using the definition described
above. In the flowchart 400, a picture is encountered in PED
410.
[0117] At 420, a check is made to determine whether all the slices
in the picture are I slices. If yes, the picture type is set to "I
Picture" 430. If not, the technique proceeds to 440.
[0118] At 440, a check is made to determine whether all the slices
in the picture are I or P slices (with at least one P slice). If
yes, the picture type is set to "P Picture" 450. If not, the
technique proceeds to 460.
[0119] At 460, a check is made to determine if at least one slice
in the picture is a B slice. If yes, the picture type is set to "B
Picture" 470. If not, the technique proceeds to 480. Alternatively,
if the determination at 440 is "no," then the picture can be
automatically set to "B Picture" 470 because that is the only
remaining picture type (i.e., the check at 460 can be skipped).
[0120] At 480, a check is made to see if there are any remaining
pictures. If so, the next picture is assigned a picture type 410.
Otherwise, the technique ends.
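For illustration, the classification can be sketched in C as follows (a minimal sketch; the enum and function names are hypothetical, not from the reference implementation, but the logic follows the three-way definition in Section V(B)):

    typedef enum { SLICE_I, SLICE_P, SLICE_B } SliceType;
    typedef enum { PIC_TYPE_I, PIC_TYPE_P, PIC_TYPE_B } PicType;

    /* All slices I => I picture; only I/P slices with at least one P
       slice => P picture; otherwise (at least one B slice) => B
       picture. */
    static PicType classify_picture(const SliceType *slice, int num_slices)
    {
        int all_i = 1, any_b = 0;
        for (int i = 0; i < num_slices; i++) {
            if (slice[i] != SLICE_I) all_i = 0;
            if (slice[i] == SLICE_B) any_b = 1;
        }
        if (all_i) return PIC_TYPE_I;
        if (!any_b) return PIC_TYPE_P;   /* I and P slices only */
        return PIC_TYPE_B;
    }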
[0121] C. Computation of Co-Located Information
[0122] For 16×16 direct MBs with spatial mode, the following four
optimizations regarding computation of co-located information can be
performed.
[0123] First, when the co-located picture (the picture containing
the co-located macroblock of the direct macroblock being decoded) is
a long term picture, the co-located MB is always treated as
"moving". Therefore, there is no need to retrieve any information
from the co-located picture. The whole direct MB has the same MV and
RefIdx, and it can be recast into a 16×16 MB.
[0124] Second, when the co-located picture is an I picture, the
co-located MB is always treated as "moving". Therefore, there is no
need to retrieve any information from the co-located picture. The
whole direct MB has the same MV and RefIdx, and it can be recast
into a 16×16 MB.
[0125] Third, when the co-located picture is a P picture, only the
information from list 0 of the co-located picture (not from list 1)
needs to be retrieved, because list 1 does not exist for a P
picture. The computation for "moving" detection has to be done for
the list 0 information. A check needs to be made to determine
whether the whole direct MB can be recast into a 16×16 MB.
[0126] Fourth, when the co-located picture is a B picture, the
information from both list 0 and list 1 of the co-located picture
needs to be retrieved. The computation for "moving" detection has to
be done for the information from both lists. A check needs to be
made to determine whether the whole direct MB can be recast into a
16×16 MB.
[0127] For 16×16 direct MBs with temporal mode, the following three
optimizations regarding computation of co-located information can be
performed (a dispatch sketch follows this list).
[0128] First, when the co-located picture is an I picture, the
information coming from the co-located MB is fixed (i.e., all
invalid RefIdxs). Therefore, there is no need to retrieve any
information from the co-located picture. The whole direct MB has the
same MV and RefIdx (i.e., all 0 MVs and 0 RefIdxs), and it can be
recast into a 16×16 MB.
[0129] Second, when the co-located picture is a P picture, only the
information from list 0 of the co-located picture (not from list 1)
needs to be retrieved, because list 1 does not exist for a P
picture. A check needs to be made to determine whether the whole
direct MB can be recast into a 16×16 MB.
[0130] Third, when the co-located picture is a B picture, the
information from both list 0 and list 1 of the co-located picture
needs to be retrieved. A check needs to be made to determine whether
the whole direct MB can be recast into a 16×16 MB.
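The temporal-mode dispatch can be sketched as follows (a hedged sketch using the PicType enum from the earlier sketch; the opaque types and helper functions are hypothetical placeholders for the retrieval, moving-detection, and recast steps described in this section):

    typedef struct Picture Picture;
    typedef struct Macroblock Macroblock;

    /* Hypothetical helpers for the steps described above. */
    void recast_direct_mb_16x16(Macroblock *mb);
    void fetch_coloc_info(const Picture *coloc_pic, int list, Macroblock *mb);
    void try_recast_16x16(Macroblock *mb);

    /* Decide how much co-located information to fetch and check,
       based on the co-located picture's type. */
    static void derive_temporal_direct(PicType coloc_pic_type,
                                       const Picture *coloc_pic,
                                       Macroblock *mb)
    {
        switch (coloc_pic_type) {
        case PIC_TYPE_I:
            /* Fixed result (all-0 MVs and RefIdxs): skip retrieval and
               recast the whole direct MB as one 16x16 block. */
            recast_direct_mb_16x16(mb);
            break;
        case PIC_TYPE_P:
            /* List 1 does not exist for a P picture: fetch and check
               list 0 only (half the work of the B-picture case). */
            fetch_coloc_info(coloc_pic, 0, mb);
            try_recast_16x16(mb);
            break;
        case PIC_TYPE_B:
            /* Both lists must be fetched and checked. */
            fetch_coloc_info(coloc_pic, 0, mb);
            fetch_coloc_info(coloc_pic, 1, mb);
            try_recast_16x16(mb);
            break;
        }
    }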
[0131] A direct MB is a 16×16 block. By default it is treated as 16
4×4 blocks or 4 8×8 blocks, each with its own side information,
including motion vectors and reference frames. However, if all 16
4×4 blocks (or all 4 8×8 blocks) have the same side information,
then the block partition does not matter, and the direct MB can be
treated as one 16×16 block. Performing motion compensation and
deblocking operations on a whole 16×16 block is, in typical
scenarios, more efficient than performing such operations on 16 4×4
blocks or 4 8×8 blocks.
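A minimal sketch of the uniformity test that enables the recast (the structure and names are hypothetical; the 16 4×4 case is analogous):

    #include <stdint.h>

    typedef struct { int16_t mvx, mvy; int8_t refidx; } SideInfo;

    /* If all four 8x8 partitions carry identical side information,
       the direct MB can be processed as a single 16x16 block for
       motion compensation and deblocking. */
    static int can_recast_16x16(const SideInfo part[4])
    {
        for (int i = 1; i < 4; i++) {
            if (part[i].mvx    != part[0].mvx ||
                part[i].mvy    != part[0].mvy ||
                part[i].refidx != part[0].refidx)
                return 0;
        }
        return 1;
    }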
[0132] FIG. 5 depicts an example method 500 for simplifying
computation of co-located information during decoding of video
information. At 510, encoded video information is received (e.g.,
in a bitstream).
[0133] At 520, a picture type is determined for a picture based on
slice type of one or more slices in the picture. In a specific
implementation, the picture is assigned a picture type according to
the flowchart depicted in FIG. 4, and as described in Section V(B)
above. The picture can be called a "co-located picture" because it
may contain a co-located macroblock of a direct prediction
macroblock to be decoded.
[0134] At 530, based on the picture type of the picture, the
decoder selectively skips or simplifies computation of co-located
information for use in reconstruction of one or more direct
prediction mode macroblocks outside the picture.
[0135] As part of the reconstruction, a direct prediction mode
macroblock is identified. The direct prediction mode macroblock can
be a temporal direct prediction mode macroblock or a spatial direct
prediction mode macroblock. In a specific implementation, the
skipping and simplifications described in Section V(C) above are
performed.
[0136] Depending on the content and encoding parameters used, the
above optimizations can save significant resources during
computation of co-located information. For example, experimental
results with HD-DVD clips show a large number of direct MBs in B
slices (approximately 50% of the MBs are direct MBs in some
situations). In addition, B pictures are not used for reference in
HD-DVD clips. With such HD-DVD clips, the above optimizations can
reduce the computation of co-located information by approximately
half.
[0137] In view of the many possible embodiments to which the
principles of the disclosed invention may be applied, it should be
recognized that the illustrated embodiments are only preferred
examples of the invention and should not be taken as limiting the
scope of the invention. Rather, the scope of the invention is
defined by the following claims. We therefore claim as our
invention all that comes within the scope and spirit of these
claims.
* * * * *