U.S. patent application number 13/539086 was filed with the patent office on 2012-06-29 and published on 2013-01-03 for fast encoding method for lossless coding.
This patent application is currently assigned to FUTUREWEI TECHNOLOGIES, INC. Invention is credited to Gregory Cook, Wen Gao, Jin Song, Mingyuan Yang, Haoping Yu.
Application Number | 20130003839 13/539086 |
Family ID | 46506637 |
Filed Date | 2012-06-29 |
United States Patent Application | 20130003839 |
Kind Code | A1 |
Gao; Wen ; et al. | January 3, 2013 |
Fast Encoding Method for Lossless Coding
Abstract
An apparatus comprising a processor configured to receive a
current block of a video frame, and determine a coding mode for the
current block based on only a bit rate cost function, wherein the
coding mode is selected from a plurality of available coding modes,
and wherein calculation of the bit rate cost function does not
consider distortion of the current block.
Inventors: | Gao; Wen (West Windsor, NJ); Cook; Gregory (San Jose, CA); Yang; Mingyuan (Shenzhen, CN); Song; Jin (Shenzhen, CN); Yu; Haoping (Carmel, IN) |
Assignee: | FUTUREWEI TECHNOLOGIES, INC., Plano, TX |
Family ID: | 46506637 |
Appl. No.: | 13/539086 |
Filed: | June 29, 2012 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61503534 | Jun 30, 2011 | |
61506958 | Jul 12, 2011 | |
Current U.S. Class: | 375/240.12; 375/E7.243 |
Current CPC Class: | H04N 19/176 20141101; H04N 19/132 20141101; H04N 19/129 20141101; H04N 19/146 20141101; H04N 19/136 20141101; H04N 19/50 20141101; H04N 19/61 20141101; H04N 19/593 20141101; H04N 19/46 20141101 |
Class at Publication: | 375/240.12; 375/E07.243 |
International Class: | H04N 7/32 20060101 H04N007/32 |
Claims
1. An apparatus comprising: a processor configured to: receive a
current block of a video frame; and determine a coding mode for the
current block based on only a bit rate cost function, wherein the
coding mode is selected from a plurality of available coding modes,
and wherein calculation of the bit rate cost function does not
consider distortion of the current block.
2. The apparatus of claim 1, wherein the coding mode results in a
least number of bits needed to encode the current block compared
with all other coding modes in the plurality of available coding
modes.
3. The apparatus of claim 2, wherein the current block is encoded
using a transform bypass encoding scheme, wherein a transform step
and a quantization step are bypassed in the transform bypass
encoding scheme.
4. The apparatus of claim 2, wherein the current block is encoded
using a transform without quantization encoding scheme, wherein a
quantization step is bypassed in the transform without quantization
encoding scheme.
5. A method comprising: receiving a current block of a video frame;
and determining a coding mode for the current block based on only a
bit rate cost function, wherein the coding mode is selected from a
plurality of available coding modes, and wherein calculation of the
bit rate cost function does not consider distortion of the current
block.
6. The method of claim 5, wherein the coding mode results in a
least number of bits needed to encode the current block compared
with all other coding modes in the plurality of available coding
modes.
7. The method of claim 6, wherein the current block is encoded
using a transform bypass encoding scheme, wherein a transform step
and a quantization step are bypassed in the transform bypass
encoding scheme.
8. The method of claim 6, wherein the current block is encoded
using a transform without quantization encoding scheme, wherein a
quantization step is bypassed in the transform without quantization
encoding scheme.
9. An apparatus used in video coding comprising: a processor
configured to: for each of a plurality of pixels in a block,
determine a difference with one of a plurality of corresponding
pixels in a reference block, wherein each difference is based on
two color values of a pair of compared pixels; and if each of the
differences is within a pre-set boundary, generate information to
signal the block as a skipped block, wherein the information
identifies the block and the reference block, and include the
information into a bitstream without further encoding of the
block.
10. The apparatus of claim 9, wherein the block is a coding unit
(CU), and wherein the reference block is a reference CU.
11. The apparatus of claim 10, wherein the information comprises: a
plurality of coordinates of the CU; and a plurality of coordinates
of the reference CU.
12. The apparatus of claim 10, wherein the pre-set boundary is
±1.
13. The apparatus of claim 10, wherein the pre-set boundary is
0.
14. The apparatus of claim 12, wherein the block is located at a
first position in a video frame, wherein the reference block is
located at a second position in a reference video frame, wherein
the first position and second position are equal in coordinates,
wherein each pair of compared pixels are located at a same position
in the block and the reference block, wherein the video frame or
the reference video frame is a predicted frame (P-frame), an
intra-coded frame (I-frame), or a bi-directionally predicted frame
(B-frame).
15. The apparatus of claim 12, wherein the block is located at a
first position in a video slice, wherein the reference block is
located at a second position in a reference video slice, wherein
the first position and second position are equal in coordinates,
wherein each pair of compared pixels are located at a same position
in the block and the reference block, wherein the video slice or
the reference video slice is a predicted slice (P-slice), an
intra-coded slice (I-slice), or a bi-directionally predicted slice
(B-slice).
16. The apparatus of claim 12, wherein the processor is further
configured to: if any of the differences exceeds the pre-set
boundary, determine a coding mode for the block based on only a bit
rate cost function, wherein the coding mode is selected from a
plurality of available coding modes, wherein calculation of the bit
rate cost function does not consider distortion of the block, and
wherein the coding mode results in a least number of bits needed to
encode the current block compared with all other coding modes in
the plurality of available coding modes.
17. A method used in video coding comprising: for each of a
plurality of pixels in a block, determining a difference with one
of a plurality of corresponding pixels in a reference block,
wherein each difference is based on two color values of a pair of
compared pixels; and if each of the differences is within a pre-set
boundary, generating information to signal the block as a skipped
block, wherein the information identifies the block and the
reference block, and including the information into a bitstream
without further encoding of the block.
18. The method of claim 17, wherein the block is a coding unit
(CU), and wherein the reference block is a reference CU.
19. The method of claim 18, wherein the information comprises: a
plurality of coordinates of the CU; and a plurality of coordinates
of the reference CU.
20. The method of claim 18, wherein the pre-set boundary is
±1.
21. The method of claim 18, wherein the pre-set boundary is 0.
22. The method of claim 20, wherein the block is located at a first
position in a video frame, wherein the reference block is located
at a second position in a reference video frame, wherein the first
position and second position are equal in coordinates, wherein each
pair of compared pixels are located at a same position in the block
and the reference block, wherein the video frame or the reference
video frame is a predicted frame (P-frame), an intra-coded frame
(I-frame), or a bi-directionally predicted frame (B-frame).
23. The method of claim 20, wherein the block is located at a first
position in a video slice, wherein the reference block is located
at a second position in a reference video slice, wherein the first
position and second position are equal in coordinates, wherein each
pair of compared pixels are located at a same position in the block
and the reference block, wherein the video slice or the reference
video slice is a predicted slice (P-slice), an intra-coded slice
(I-slice), or a bi-directionally predicted slice (B-slice).
24. The method of claim 20, further comprising: if any of the
differences exceeds the pre-set boundary, determining a coding mode
for the block based on only a bit rate cost function, wherein the
coding mode is selected from a plurality of available coding modes,
wherein calculation of the bit rate cost function does not consider
distortion of the block, and wherein the coding mode results in a
least number of bits needed to encode the current block compared
with all other coding modes in the plurality of available coding
modes.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority to U.S. Provisional
Patent Application No. 61/503,534 filed Jun. 30, 2011 by Wen Gao et
al. and entitled "Lossless Coding Tools for Compound Video", and
U.S. Provisional Patent Application No. 61/506,958 filed Jul. 12,
2011 by Wen Gao et al. and entitled "Additional Lossless Coding
Tools for Compound Video", each of which is incorporated herein by
reference as if reproduced in its entirety.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
[0002] Not applicable.
REFERENCE TO A MICROFICHE APPENDIX
[0003] Not applicable.
BACKGROUND
[0004] The amount of video data needed to depict even a relatively
short film can be substantial, which may result in difficulties
when the data is to be streamed or otherwise communicated across a
communications network with limited bandwidth capacity. Thus, video
data is generally compressed prior to being communicated across
modern day telecommunications networks. Video compression devices
often use software and/or hardware at the source to code the video
data prior to transmission, thereby decreasing the quantity of data
needed to represent digital video images. The compressed data is
then received at the destination by a video decompression device
that decodes the video data. Due to limited network resources,
improved compression and decompression techniques that increase
compression ratios without substantially reducing image quality are
desirable.
SUMMARY
[0005] In one embodiment, the disclosure includes an apparatus
comprising a processor configured to receive a current block of a
video frame, and determine a coding mode for the current block
based on only a bit rate cost function, wherein the coding mode is
selected from a plurality of available coding modes, and wherein
calculation of the bit rate cost function does not consider
distortion of the current block.
[0006] In another embodiment, the disclosure includes a method
comprising receiving a current block of a video frame, and
determining a coding mode for the current block based on only a bit
rate cost function, wherein the coding mode is selected from a
plurality of available coding modes, and wherein calculation of the
bit rate cost function does not consider distortion of the current
block.
[0007] In yet another embodiment, the disclosure includes an
apparatus used in video coding comprising a processor configured to
for each of a plurality of pixels in a block, determine a
difference with one of a plurality of corresponding pixels in a
reference block, wherein each difference is based on two color
values of a pair of compared pixels, and if each of the differences
is within a pre-set boundary, generate information to signal the
block as a skipped block, wherein the information identifies the
block and the reference block, and include the information into a
bitstream without further encoding of the block.
[0008] In yet another embodiment, the disclosure includes a method
used in video coding comprising for each of a plurality of pixels
in a block, determining a difference with one of a plurality of
corresponding pixels in a reference block, wherein each difference
is based on two color values of a pair of compared pixels, and if
each of the differences is within a pre-set boundary, generating
information to signal the block as a skipped block, wherein the
information identifies the block and the reference block, and
including the information into a bitstream without further encoding
of the block.
[0009] These and other features will be more clearly understood
from the following detailed description taken in conjunction with
the accompanying drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] For a more complete understanding of this disclosure,
reference is now made to the following brief description, taken in
connection with the accompanying drawings and detailed description,
wherein like reference numerals represent like parts.
[0011] FIG. 1 is a schematic diagram of an embodiment of a
transform bypass encoding scheme.
[0012] FIG. 2 is a schematic diagram of an embodiment of a
transform bypass decoding scheme.
[0013] FIG. 3 is a schematic diagram of an embodiment of a
transform without quantization encoding scheme.
[0014] FIG. 4 is a schematic diagram of an embodiment of a
transform without quantization decoding scheme.
[0015] FIG. 5 is a schematic diagram of an embodiment of a lossy
encoding scheme.
[0016] FIG. 6 is a schematic diagram of an embodiment of a lossy
decoding scheme.
[0017] FIG. 7 is a flowchart of an embodiment of an encoding
method.
[0018] FIG. 8 is a flowchart of an embodiment of a decoding
method.
[0019] FIG. 9 is a flowchart of an embodiment of an encoding mode
selection method.
[0020] FIG. 10 is a schematic diagram of an embodiment of a network
unit.
[0021] FIG. 11 is a schematic diagram of a general-purpose computer
system.
DETAILED DESCRIPTION
[0022] It should be understood at the outset that, although an
illustrative implementation of one or more embodiments is provided
below, the disclosed systems and/or methods may be implemented
using any number of techniques, whether currently known or in
existence. The disclosure should in no way be limited to the
illustrative implementations, drawings, and techniques illustrated
below, including the exemplary designs and implementations
illustrated and described herein, but may be modified within the
scope of the appended claims along with their full scope of
equivalents.
[0023] Typically, video media involves displaying a sequence of
still images or frames in relatively quick succession, thereby
causing a viewer to perceive motion. Each frame may comprise a
plurality of picture elements or pixels, each of which may
represent a single reference point in the frame. During digital
processing, each pixel may be assigned an integer value (e.g., 0,
1, . . . or 255) that represents an image quality or
characteristic, such as luminance or chrominance, at the
corresponding reference point. In use, an image or video frame may
comprise a large number of pixels (e.g., 2,073,600 pixels in a
1920×1080 frame), thus it may be cumbersome and inefficient
to encode and decode (referred to hereinafter simply as code) each
pixel independently. To improve coding efficiency, a video frame is
usually broken into a plurality of rectangular blocks or
macroblocks, which may serve as basic units of processing such as
prediction, transform, and quantization. For example, a typical
N×N block may comprise N² pixels, where N is an integer
greater than one and is often a multiple of four.
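For purposes of illustration only, the block partitioning described above may be sketched as follows. The function name partition_into_blocks and the clipping of blocks at the right and bottom frame edges are assumptions made for this sketch and are not taken from the disclosure.

```python
def partition_into_blocks(width, height, n):
    """Tile a width x height frame with n x n blocks.

    Returns (x, y, block_width, block_height) tuples in raster order.
    Assumption for this sketch: blocks on the right and bottom edges are
    clipped when the frame size is not a multiple of n.
    """
    blocks = []
    for y in range(0, height, n):
        for x in range(0, width, n):
            blocks.append((x, y, min(n, width - x), min(n, height - y)))
    return blocks


blocks = partition_into_blocks(1920, 1080, 64)
# Every pixel belongs to exactly one block, so the block areas sum to the frame area.
total_pixels = sum(w * h for _, _, w, h in blocks)
```

With a 64×64 block size, a 1920×1080 frame yields 30 block columns and 17 block rows, the last row being clipped to 56 pixels in height.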
[0024] In a working draft of the International Telecommunications
Union (ITU) Telecommunications Standardization Sector (ITU-T) and
the International Organization for Standardization
(ISO)/International Electrotechnical Commission (IEC), High
Efficiency Video Coding (HEVC), which is poised to be the next
video standard, new block concepts have been introduced. For
example, coding unit (CU) may refer to a sub-partitioning of a
video frame into rectangular blocks of equal or variable size. In
HEVC, a CU may replace the macroblock structure of previous standards.
Depending on a mode of inter or intra prediction, a CU may comprise
one or more prediction units (PUs), each of which may serve as a
basic unit of prediction. For example, for intra prediction, a
64×64 CU may be symmetrically split into four 32×32
PUs. For another example, for an inter prediction, a 64×64 CU
may be asymmetrically split into a 16×64 PU and a 48×64
PU. Similarly, a PU may comprise one or more transform units (TUs),
each of which may serve as a basic unit for transform and/or
quantization. For example, a 32×32 PU may be symmetrically
split into four 16×16 TUs. Multiple TUs of one PU may share a
same prediction mode, but may be transformed separately. Herein,
the term block may generally refer to any of a macroblock, CU, PU,
or TU.
[0025] Depending on the application, a block may be coded in either
a lossless mode (i.e., no distortion or information loss) or a
lossy mode (i.e., with distortion). In use, high quality videos
(e.g., with YUV subsampling of 4:4:4) may be coded using a lossless
mode, while low quality videos (e.g., with YUV subsampling of
4:2:0) may be coded using a lossy mode. Sometimes, a single video
frame or slice (e.g., with YUV subsampling of either 4:4:4 or
4:2:0) may employ both lossless and lossy modes to code a plurality
of regions, which may be rectangular or irregular in shape. Each
region may comprise a plurality of blocks. For example, a compound
video may comprise a combination of different types of contents,
such as texts, computer graphics, and natural-view content (e.g.,
camera-captured video). In a compound frame, regions of texts and
graphics may be coded in a lossless mode, while regions of
natural-view content may be coded in a lossy mode. Lossless coding
of texts and graphics may be desired, e.g. in computer screen
sharing applications, since lossy coding may lead to poor quality
or fidelity of texts and graphics, which may cause eye fatigue.
Current HEVC test models (HMs), such as HM 3.0, may code
natural-view content fairly efficiently. However, the current HMs
may lack a lossless coding mode for certain videos, thus their
coding efficiency and speed may be limited.
[0026] In lossy coding schemes of current HMs, a bit rate and
distortion of a coded video may need to be balanced. To achieve low
distortion, often more information (e.g., pixel values or transform
coefficients) needs to be encoded, leading to more encoded bits and
thus a higher bit rate. On the other hand, to achieve a smaller bit
rate, certain information may need to be removed. For example,
through a two-dimensional transform operation, pixel values in a
spatial domain are converted to transform coefficients in a
frequency domain. In a transform coefficient matrix, high-index
transform coefficients (e.g., in bottom-right corner) corresponding
to small spatial features may have relatively small values. Thus,
in a subsequent quantization operation, larger quantization
coefficients may be applied on the high-index transform
coefficients. After integer rounding, a number of zero-valued
transform coefficients may be created in the high-index positions,
which may then be skipped in following encoding steps. Although
quantization may lower the bit rate, information for small spatial
features may be lost in the coding process. The lost information
may be irretrievable, thus distortion may be increased and coding
fidelity lowered in the decoded video.
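For purposes of illustration only, the information loss caused by quantization may be sketched as follows. The step-size rule (a step that grows with the sum of the row and column index), the function names, and the coefficient values are assumptions chosen for this sketch; they are not the quantizer of any HM.

```python
def step_size(row, col, base_step=2):
    # Hypothetical rule: coarser quantization for higher-index coefficients.
    return base_step * (1 + row + col)


def quantize(coeffs, base_step=2):
    """Divide each coefficient by its step size and round to an integer level."""
    return [[round(c / step_size(r, k, base_step)) for k, c in enumerate(row)]
            for r, row in enumerate(coeffs)]


def dequantize(levels, base_step=2):
    """Scale the integer levels back; the rounding error cannot be recovered."""
    return [[q * step_size(r, k, base_step) for k, q in enumerate(row)]
            for r, row in enumerate(levels)]


coeffs = [[120, 17, 5, 1],
          [20, 6, 2, 0],
          [4, 2, 1, 0],
          [1, 0, 0, 0]]
levels = quantize(coeffs)
restored = dequantize(levels)
zero_levels = sum(level == 0 for row in levels for level in row)
```

In this example the small high-index coefficients become zero after rounding, and the restored matrix differs from the original, which is the irretrievable loss described above.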
[0027] In use, there may be a plurality of coding modes to code a
video frame. For example, a particular slice of the frame may use
various block partitions (number, size, and shape). For each
partition, if inter prediction is to be used, there may be various
motion vectors associated with one or more reference frames.
Otherwise, if intra prediction is to be used, there may be various
reference pixels corresponding to various intra prediction modes.
Each coding mode may lead to a different bit rate and/or
distortion. Thus, a rate-distortion optimization (RDO) module in a
video encoder may be configured to select a best coding mode from
the plurality of coding modes to determine an optimal balance or
trade-off between the bit rate and distortion.
[0028] Current HMs may jointly evaluate an overall cost of bit rate
and distortion by using a joint rate-distortion (RD) cost. For
example, a bit rate (denoted as R) and a distortion cost (denoted
as D) may be combined into a single joint rate-distortion (RD) cost
(denoted as J), which may be mathematically presented as:
J = D + λR
where λ is a Lagrangian coefficient representing the
relationship between a bit rate and a particular quality level.
[0029] Various mathematical metrics may be used to calculate
distortion, such as a sum of squared distortion (SSD), sum of
absolute error (SAE), sum of absolute differences (SAD), mean of
absolute difference (MAD), or mean of squared errors (MSE). Using
any of these distortion metrics, the RDO process may attempt to
find a coding mode that minimizes J.
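For purposes of illustration only, several of the distortion metrics named above and the joint cost J = D + λR may be sketched as follows. The function names and the sample pixel values are assumptions made for this sketch.

```python
def _differences(original, reconstructed):
    """Per-pixel differences between two equally sized 2-D blocks."""
    return [o - r
            for o_row, r_row in zip(original, reconstructed)
            for o, r in zip(o_row, r_row)]


def sad(original, reconstructed):
    """Sum of absolute differences."""
    return sum(abs(d) for d in _differences(original, reconstructed))


def ssd(original, reconstructed):
    """Sum of squared differences."""
    return sum(d * d for d in _differences(original, reconstructed))


def mse(original, reconstructed):
    """Mean of squared errors."""
    diffs = _differences(original, reconstructed)
    return sum(d * d for d in diffs) / len(diffs)


def rd_cost(distortion, rate_bits, lam):
    """Joint rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


original = [[10, 12], [14, 16]]
reconstructed = [[10, 11], [15, 18]]
j = rd_cost(ssd(original, reconstructed), rate_bits=20, lam=0.5)
```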
[0030] In current HMs, the selection of an optimal coding mode in
an encoder may be a complex process. For example, for every
available coding mode (denoted as m) of every block, the encoder
may code the block using mode m and calculate R, which is the
number of bits required to code the block. Then, the encoder may
reconstruct the block and calculate D, which is a difference
between the original and reconstructed block. Then, the encoder may
calculate the mode cost J_m using the equation above. This
process may be repeated for every available coding mode. Then, the
encoder may choose a mode that gives the minimum J_m. The RDO
process in the encoder may be a computationally intensive process,
since there may be potentially hundreds of possible coding modes,
e.g., based on various combinations of block sizes, inter
prediction frames, intra prediction directions. Both R and D of the
block may need to be calculated hundreds of times before the best
coding mode may be determined.
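For purposes of illustration only, the exhaustive mode selection loop described above may be sketched as follows. The two toy coding modes ("coarse" and "fine"), their bit counts, and the use of a sum of squared differences as the distortion D are assumptions made for this sketch; a real encoder would evaluate far more modes.

```python
def ssd(original, reconstructed):
    """Sum of squared differences between two flat lists of pixel values."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed))


def code_coarse(block):
    # Hypothetical lossy mode: 2 bits per pixel, values floored to a multiple of 4.
    return 2 * len(block), [(v // 4) * 4 for v in block]


def code_fine(block):
    # Hypothetical exact mode: 8 bits per pixel, no distortion.
    return 8 * len(block), list(block)


def select_mode_rdo(block, modes, lam):
    """Code the block with every mode, reconstruct it, and keep the minimum J."""
    best_name, best_cost = None, float("inf")
    for name, code in modes.items():
        rate_bits, reconstructed = code(block)       # R for this mode
        distortion = ssd(block, reconstructed)       # D for this mode
        cost = distortion + lam * rate_bits          # J_m = D + lambda * R
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost


modes = {"coarse": code_coarse, "fine": code_fine}
block = [10, 13, 15, 12]
```

In this sketch, a larger λ weights the bit rate more heavily and favors the coarse mode, while a smaller λ favors the distortion-free mode. Note that both R and D are computed once per mode, which is the source of the computational cost discussed above.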
[0031] In addition, when a sequence of video frames is being coded,
sometimes certain regions may remain stable for a relatively long
period of time. For example, in video conferencing applications, a
background region of each user may remain unchanged for tens of
minutes. In current encoders, the RDO module may still evaluate bit
rate and/or distortion for blocks in these regions, which may
consume valuable computation resource and time.
[0032] Disclosed herein are systems and methods for improved video
coding. The disclosure provides a lossless coding mode and a forced
skip mode, which may complement a lossy coding mode in coding of a
video such as a compound video. The lossless mode may include a
transform bypass coding scheme and a transform without quantization
coding scheme. In lossless coding of a block, since no distortion
(or only slight distortion) may be induced, the RDO mode selection
process may be simplified. In an embodiment, only a bit rate
portion of a joint RD cost is preserved. Thus, from a plurality of
available coding modes, the RDO process may only need to determine
an optimal coding mode that leads to a least number of bits. A
reconstructed block may not need to be compared with an original
source block, which may save both computation resource and time.
Furthermore, if a video frame or slice comprises one or more
regions which remain stable for a relatively long period (e.g.,
tens of seconds or minutes), the RDO process may implement a forced
skip mode in the one or more regions. In an embodiment of the
forced skip mode, if a CU is found to be an exact match (or an
approximate match with difference in a pre-set boundary) with a
corresponding reference CU in a reference frame, the CU may be
skipped in the rest of the encoding steps. Due to implementation of
the simplified RDO mode selection scheme and the forced skip mode,
videos may be coded both faster and more efficiently.
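For purposes of illustration only, the simplified rate-only mode selection and the forced skip check may be sketched as follows. The function names, the mode names, and the fixed bit-count estimators are assumptions made for this sketch; the bound parameter corresponds to the pre-set boundary (e.g., 0 or ±1).

```python
def select_mode_rate_only(block, bit_estimators):
    """Pick the mode needing the fewest bits; no reconstruction or distortion term."""
    return min(bit_estimators, key=lambda name: bit_estimators[name](block))


def is_forced_skip(block, reference_block, bound=0):
    """True if every co-located pixel pair differs by at most `bound`."""
    return all(abs(a - b) <= bound
               for row, ref_row in zip(block, reference_block)
               for a, b in zip(row, ref_row))


# Hypothetical per-mode bit estimators for one block (constant for simplicity).
bit_estimators = {
    "intra_dc": lambda blk: 40,
    "intra_horizontal": lambda blk: 28,
    "inter": lambda blk: 33,
}

current = [[100, 101], [102, 103]]
reference = [[100, 100], [103, 103]]

if is_forced_skip(current, reference, bound=1):
    decision = "skip"
else:
    decision = select_mode_rate_only(current, bit_estimators)
```

In this sketch the block is skipped when the boundary is ±1, whereas with a boundary of 0 the block falls through to the rate-only selection.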
[0033] In use, there may be a module before an encoder to analyze
contents of a video frame, and identify certain regions (e.g.,
texts and/or graphics regions) where lossless encoding is desired.
Information or instructions regarding which regions to encode in a
lossless mode may be passed to the encoder. Based on the
information, the encoder may encode the identified regions using
the lossless mode. Alternatively, a user may manually define
certain regions to be encoded using a lossless mode, and provide
the encoder with information identifying these regions. Thus, a
video (e.g., a compound video) may be encoded in a lossless mode
and/or a lossy mode, depending on information received by the
encoder. Herein, the lossless encoding mode may include transform
bypass encoding and transform without quantization encoding. These
two lossless encoding schemes as well as a lossy encoding scheme
are described herein.
[0034] Likewise, based on information contained in a received
bitstream, a video decoder may decode a video frame using a
lossless mode and/or a lossy mode. The lossless decoding mode may
include transform bypass decoding and transform without
quantization decoding. The two lossless decoding schemes as well as
a lossy decoding scheme are described herein.
[0035] FIG. 1 illustrates an embodiment of a transform bypass
encoding scheme 100, which may be implemented in a video encoder.
The transform bypass encoding scheme 100 may comprise a
rate-distortion optimization (RDO) module 110, a prediction module
120, an entropy encoder 130, and a reconstruction module 140
arranged as shown in FIG. 1. In operation, an input video
comprising a sequence of video frames (or slices) may be received
by the encoder. Herein, a frame may refer to any of a predicted
frame (P-frame), an intra-coded frame (I-frame), or a bi-predictive
frame (B-frame). Likewise, a slice may refer to any of a P-slice,
an I-slice, or a B-slice.
[0036] The RDO module 110 may be configured to make logic decisions
for one or more of other modules. In an embodiment, based on one or
more previously encoded frames, the RDO module 110 may determine
how a current frame (or slice) being encoded is partitioned into a
plurality of CUs, and how a CU is partitioned into one or more PUs
and TUs. For example, homogeneous regions of the current frame
(i.e., no or slight difference from previously encoded frames) may
be partitioned into relatively larger blocks, and detailed regions
of the current frame (i.e., significant difference from previously
encoded frames) may be partitioned into relatively smaller
blocks.
[0037] In addition, the RDO module 110 may control the prediction
module 120 by determining how the current frame is predicted. The
current frame may be predicted via inter and/or intra prediction.
Inter prediction (i.e., inter frame prediction) may exploit
temporal redundancies in a sequence of frames, e.g. similarities
between corresponding blocks of successive frames, to reduce
compression data. In inter prediction, the RDO module 110 may
determine a motion vector of a block in the current frame based on
a corresponding block in one or more reference frames. On the other
hand, intra prediction may exploit spatial redundancies within a
single frame, e.g., similarities between adjacent blocks, to reduce
compression data. In intra prediction, reference pixels adjacent to
a current block may be used to generate a prediction block. Intra
prediction (i.e., intra frame prediction) may be implemented using
any of a plurality of available prediction modes or directions
(e.g., 34 modes in HEVC), which may be determined by the RDO module
110. For example, the RDO module 110 may calculate a sum of
absolute error (SAE) for each prediction mode, and select a
prediction mode that results in the smallest SAE.
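For purposes of illustration only, selecting an intra prediction mode by the smallest SAE may be sketched as follows. Only three simplified predictors (vertical, horizontal, and DC) are shown as stand-ins for the full set of available modes; their names and exact definitions are assumptions made for this sketch.

```python
def predict_vertical(top, left, n):
    # Each column repeats the reference pixel above it.
    return [list(top) for _ in range(n)]


def predict_horizontal(top, left, n):
    # Each row repeats the reference pixel to its left.
    return [[left[r]] * n for r in range(n)]


def predict_dc(top, left, n):
    # Every pixel is the rounded mean of all reference pixels.
    dc = (sum(top) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


def sae(block, prediction):
    """Sum of absolute error between a block and its prediction."""
    return sum(abs(b - p)
               for b_row, p_row in zip(block, prediction)
               for b, p in zip(b_row, p_row))


def select_intra_mode(block, top, left):
    n = len(block)
    predictors = {"vertical": predict_vertical,
                  "horizontal": predict_horizontal,
                  "dc": predict_dc}
    costs = {name: sae(block, f(top, left, n)) for name, f in predictors.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


block = [[10] * 4, [20] * 4, [30] * 4, [40] * 4]
top = [12, 12, 12, 12]
left = [10, 20, 30, 40]
```

Because each row of this example block equals the reference pixel to its left, the horizontal predictor reproduces the block exactly.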
[0038] Based on logic decisions made by the RDO module 110, the
prediction module 120 may utilize either one or more reference
frames (inter prediction) or a plurality of reference pixels (intra
prediction) to generate a prediction block, which may be an
estimate of a current block. Then, the prediction block may be
subtracted from the current block, thereby generating a residual
block. The residual block may comprise a plurality of residual
values, each of which may indicate a difference between a pixel in
the current block and a corresponding pixel in the prediction
block. Then, all values of the residual block may be scanned and
encoded by the entropy encoder 130 into an encoded bitstream. The
entropy encoder 130 may employ any entropy encoding scheme, such as
context-adaptive binary arithmetic coding (CABAC) encoding,
exponential Golomb encoding, or fixed length encoding, or any
combination thereof. In the transform bypass encoding scheme 100,
since the residual block is encoded without a transform step or a
quantization step, no information loss may be induced in the
encoding process.
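For purposes of illustration only, the transform bypass path (residual generation followed directly by entropy encoding) may be sketched as follows. The sketch assumes zero-order exponential Golomb codes, a signed-to-unsigned mapping of the kind used for signed syntax elements in H.264, and a raster scan; these choices and the function names are assumptions, and the disclosure equally permits CABAC or fixed length encoding.

```python
def exp_golomb_unsigned(code_num):
    """Zero-order Exp-Golomb code word for a non-negative integer, as a bit string."""
    binary = bin(code_num + 1)[2:]
    return "0" * (len(binary) - 1) + binary


def exp_golomb_signed(value):
    """Map a signed value to a code number (positive -> odd, non-positive -> even)."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return exp_golomb_unsigned(code_num)


def encode_transform_bypass(block, prediction):
    """Compute the residual and entropy encode it with no transform or quantization."""
    residual = [[c - p for c, p in zip(c_row, p_row)]
                for c_row, p_row in zip(block, prediction)]
    # Raster scan: left to right, top to bottom.
    bits = "".join(exp_golomb_signed(v) for row in residual for v in row)
    return residual, bits


block = [[52, 50], [49, 50]]
prediction = [[50, 50], [50, 50]]
residual, bits = encode_transform_bypass(block, prediction)
```

Since every residual value is written exactly, the bit string carries all information needed to recover the block given the same prediction.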
[0039] To facilitate continuous encoding of video frames, the
residual block may also be fed into the reconstruction module 140,
which may generate either reference pixels for intra prediction of
future blocks or reference frames for inter prediction of future
frames. If desired, filtering may be performed on the reference
frames/pixels before they are used for inter/intra prediction. A
person skilled in the art is familiar with the functioning of the
prediction module 120 and the reconstruction module 140, so these
modules will not be further described. It should be noted that FIG.
1 may be a simplified illustration of a video encoder, thus it may
only include a portion of modules present in the encoder. Other
modules (e.g., filter, scanner, and transmitter), although not
shown in FIG. 1, may also be included to facilitate video encoding.
Prior to transmission from the encoder, the encoded bitstream may
be further configured to include other information, such as video
resolution, frame rate, block partitioning information (sizes,
coordinates), prediction modes, etc., so that the encoded sequence
of video frames may be properly decoded.
[0040] FIG. 2 illustrates an embodiment of a transform bypass
decoding scheme 200, which may be implemented in a video decoder.
The transform bypass decoding scheme 200 may correspond to the
transform bypass encoding scheme 100, and may comprise an entropy
decoder 210, a prediction module 220, and a reconstruction module
230 arranged as shown in FIG. 2. In operation, an encoded bitstream
containing information of a sequence of video frames may be
received by the entropy decoder 210, which may decode the bitstream
to an uncompressed format. The entropy decoder 210 may employ any
entropy decoding scheme, such as CABAC decoding, exponential Golomb
decoding, or fixed length decoding, or any combination thereof.
[0041] For a current block being decoded, a residual block may be
generated after the execution of the entropy decoder 210. In
addition, information containing a prediction mode of the current
block may also be decoded by the entropy decoder 210. Then, based
on the prediction mode, the prediction module 220 may generate a
prediction block for the current block based on previously decoded
blocks or frames. If the prediction mode is an inter mode, one or
more previously decoded reference frames may be used to generate
the prediction block. Otherwise, if the prediction mode is an intra
mode, a plurality of previously decoded reference pixels in
reference blocks may be used to generate the prediction block.
Then, the reconstruction module 230 may combine the residual block
with the prediction block to generate a reconstructed block.
Additionally, to facilitate continuous decoding of video frames,
the reconstructed block may be used in a reference frame to inter
predict future frames. Some pixels of the reconstructed block may
also serve as reference pixels for intra prediction of future
blocks in the same frame.
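For purposes of illustration only, the decoding path (entropy decoding of a residual block followed by reconstruction) may be sketched as follows. The sketch assumes the residual values were written as zero-order signed exponential Golomb codes in raster order; this choice and the function names are assumptions, not a format specified by the disclosure.

```python
def decode_exp_golomb_signed(bits, count):
    """Decode `count` signed zero-order Exp-Golomb values from a bit string."""
    values, pos = [], 0
    for _ in range(count):
        zeros = 0
        while bits[pos] == "0":          # count the leading-zero prefix
            zeros += 1
            pos += 1
        code_num = int(bits[pos:pos + zeros + 1], 2) - 1
        pos += zeros + 1
        # Odd code numbers are positive values, even code numbers are non-positive.
        values.append((code_num + 1) // 2 if code_num % 2 else -(code_num // 2))
    return values


def reconstruct_block(bits, prediction):
    """Add the decoded residual to the prediction block."""
    n_rows, n_cols = len(prediction), len(prediction[0])
    residual = decode_exp_golomb_signed(bits, n_rows * n_cols)
    return [[prediction[r][c] + residual[r * n_cols + c] for c in range(n_cols)]
            for r in range(n_rows)]


prediction = [[50, 50], [50, 50]]
reconstructed = reconstruct_block("0010010111", prediction)
```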
[0042] In use, if an original block is encoded and decoded using
lossless schemes, such as the transform bypass encoding scheme 100
and the transform bypass decoding scheme 200, no information loss
may be induced in the entire coding process. Thus, barring
distortion caused during transmission, a reconstructed block may be
exactly the same as the original block. This high fidelity of
coding may improve a user's experience in viewing video contents
such as texts and graphics.
[0043] During lossless coding of certain regions in a video frame,
sometimes it may be desirable to include a transform step into the
coding process. For example, for some blocks of a text region, an
added transform step may generate a shorter bitstream compared to a
transform bypass coding scheme. In an embodiment, a RDO module may
be configured to determine whether to include the transform step.
For example, a test transform may be performed to convert a
residual block to a matrix of transform coefficients. If the number
of bits needed to encode the transform coefficients is smaller than
the number of bits needed to encode the residual values of the
residual block without a transform, the transform step may be
included. Otherwise, the transform step may be bypassed. FIG. 3
illustrates an embodiment of a transform without quantization
encoding scheme 300, which may comprise a RDO module 310, a
prediction module 320, a transform module 330, an entropy encoder
340, an inverse transform module 350, and a reconstruction module
360. Some aspects of the transform without quantization encoding
scheme 300 may be the same or similar to the transform bypass
encoding scheme 100 in FIG. 1, thus the similar aspects will not be
further described in the interest of clarity.
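
The transform decision described above may be sketched, under stated assumptions, as a comparison of two bit counts. In the Python example below, the signed exponential Golomb code length is used as a stand-in bit estimate; an actual RDO module may instead count CABAC bits. The helper names (`signed_exp_golomb_bits`, `block_bits`, `should_include_transform`) and the sample blocks are hypothetical.

```python
def signed_exp_golomb_bits(value):
    """Length in bits of a signed order-0 exponential Golomb codeword."""
    # Map the signed value to a non-negative code number: 0, 1, -1, 2, -2, ...
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return 2 * (code_num + 1).bit_length() - 1


def block_bits(block):
    """Total bits needed to code every value of a block (stand-in estimate)."""
    return sum(signed_exp_golomb_bits(v) for row in block for v in row)


def should_include_transform(residual, coefficients):
    """Include the transform only if it yields strictly fewer bits."""
    return block_bits(coefficients) < block_bits(residual)


# A flat residual compacts well: an unnormalized 2x2 Hadamard test
# transform turns it into a single non-zero coefficient.
flat_residual = [[4, 4], [4, 4]]
flat_coefficients = [[16, 0], [0, 0]]

# An irregular residual does not compact, so the transform is bypassed.
noisy_residual = [[3, -1], [0, 2]]
noisy_coefficients = [[4, 2], [0, 6]]
```

With these inputs, the flat residual costs 28 bits untransformed and 14 bits transformed, so the transform step would be included, whereas the irregular residual costs 14 bits untransformed and 20 bits transformed, so the transform step would be bypassed.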
[0044] The transform without quantization encoding scheme 300 may
be implemented in a video encoder, which may receive an input video
comprising a sequence of video frames. The RDO module 310 may be
configured to control one or more of other modules, and may be the
same or similar to the RDO module 110 in FIG. 1. Based on logic
decisions made by the RDO module 310, the prediction module 320 may
utilize either reference frames (inter prediction) or reference
pixels (intra prediction) to generate a prediction block, which is
an estimate of a current block. Then, the prediction block may be
subtracted from the current block, thereby generating a residual
block. The prediction module 320 may be the same or similar to the
prediction module 120 in FIG. 1.
[0045] Instead of being entropy encoded directly, the residual
block in the transform without quantization encoding scheme 300 may
be first transformed from a spatial domain to a frequency domain by
the transform module 330. The transform module 330 may convert the
values of the residual block (i.e., residual values) to a transform
matrix comprising a plurality of transform coefficients. The
transform module 330 may be implemented using any appropriate
algorithm, such as a discrete cosine transform (DCT), a fractal
transform (FT), or a discrete wavelet transform (DWT). In use, some
algorithms, such as a 4×4 integer transform defined in
H.264/advanced video coding (AVC), may not induce any information
loss, while other algorithms, such as an 8×8 integer DCT
transform defined in the HEVC working draft, may induce slight
information loss. For example, since the 8×8 integer DCT
transform in HEVC may not be fully reversible, recovered values of
the residual block after the inverse transform module 350 may be
slightly different (e.g., up to ±2 values) from the original
values of the residual block before the transform module 330. When
slight information loss is induced, the encoding may be near
lossless instead of lossless. However, compared with a quantization
step, the information loss caused by the transform step may be
insignificant or unnoticeable, thus the transform without
quantization encoding scheme 300 may also be included herein as
part of a lossless coding scheme.
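
To illustrate what a fully reversible transform means in practice, the following Python sketch round-trips a residual block through a 4×4 Walsh-Hadamard transform. This transform is chosen here only because its integer inverse is exact; it is an assumption of the example and is neither the H.264/AVC 4×4 integer transform nor the HEVC 8×8 integer DCT discussed above. All names are hypothetical.

```python
# 4x4 Walsh-Hadamard matrix; its rows are mutually orthogonal, so H * H^T = 4 * I.
H = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul(a, b):
    """Multiply two integer matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform(residual):
    """Spatial domain to transform domain: Y = H * X * H^T."""
    return matmul(matmul(H, residual), transpose(H))


def inverse_transform(coefficients):
    """Transform domain back to spatial domain: X = (H^T * Y * H) / 16."""
    scaled = matmul(matmul(transpose(H), coefficients), H)
    # The division is exact because H^T * H = 4 * I on each side.
    return [[v // 16 for v in row] for row in scaled]


residual = [
    [3, -1, 0, 2],
    [1, 0, 0, -2],
    [0, 4, -3, 1],
    [5, 0, 0, 0],
]
recovered = inverse_transform(forward_transform(residual))
```

Here the recovered block matches the residual block exactly, corresponding to the lossless case. A transform whose inverse involves rounding would instead return an approximation, corresponding to the near lossless case.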
[0046] Transform coefficients generated by the transform module 330
may be scanned and encoded by the entropy encoder 340 into an
encoded bitstream. The entropy encoder 340 may be the same or
similar to the entropy encoder 130 in FIG. 1. To facilitate continuous
encoding of video frames, the transform coefficients may also be
fed into the inverse transform module 350, which may perform the
inverse of the transform module 330 and generate an exact version
(i.e., lossless) or an approximation (i.e., near lossless) of the
residual block. Then, the residual block may be fed into the
reconstruction module 360, which may generate either reference
pixels for intra prediction of future blocks or reference frames
for inter prediction of future frames. The reconstruction module
360 may be the same or similar to the reconstruction module 140 in
FIG. 1. Prior to transmission from the encoder, the encoded
bitstream may include other information, such as video resolution,
frame rate, block partitioning information (sizes, coordinates),
prediction modes, etc., so that the encoded sequence of video
frames may be properly decoded.
[0047] FIG. 4 illustrates an embodiment of a transform without
quantization decoding scheme 400, which may be implemented in a
video decoder. The transform without quantization decoding scheme 400 may
correspond to the transform without quantization encoding scheme
300, and may comprise an entropy decoder 410, an inverse transform
module 420, a prediction module 430, and a reconstruction module
440 arranged as shown in FIG. 4. In operation, an encoded bitstream
containing information of a sequence of video frames may be
received by the entropy decoder 410, which may decode the bitstream
to an uncompressed format. The entropy decoder 410 may be the same
or similar to the entropy decoder 210 in FIG. 2.
[0048] After execution of the entropy decoder 410, a matrix of
transform coefficients may be generated, which may then be fed into
the inverse transform module 420. The inverse transform module 420
may convert the transform coefficients in a frequency domain to
residual pixel values in a spatial domain. In use, depending on
whether an algorithm used by the inverse transform module 420 is
fully reversible, an exact version (i.e., lossless) or an
approximation (i.e., near lossless) of the residual block may be
generated. The inverse transform module 420 may be the same or
similar to the inverse transform module 350 in FIG. 3.
[0049] In addition, information containing a prediction mode of the
current block may also be decoded by the entropy decoder 410. Based
on the prediction mode, the prediction module 430 may generate a
prediction block for the current block. The prediction module 430
may be the same or similar to the prediction module 220 in FIG.
2. Then, the reconstruction module 440 may combine the residual
block with the prediction block to generate a reconstructed block.
Additionally, to facilitate continuous decoding of video frames,
the reconstructed block may be used in a reference frame to inter
predict future frames. Some pixels of the reconstructed block may
also serve as reference pixels for intra prediction of future
blocks in the same frame.
[0050] In use, if an original block is encoded and decoded using
near lossless schemes, such as the transform without quantization
encoding scheme 300 and the transform without quantization decoding
scheme 400, only slight distortion may be induced in the coding
process. Thus, barring significant distortion caused during
transmission, a reconstructed block may be almost the same as the
original block. Transform without quantization coding schemes may
be desired sometimes, as they may achieve higher compression ratio
than the transform bypass schemes, without noticeable sacrifice of
coding fidelity.
[0051] As mentioned previously, in current encoders a RDO module
may select an optimal coding mode based on a joint RD cost. In
contrast, in either a transform bypass lossless coding scheme or a
transform without quantization coding scheme disclosed herein, a
quantization step may be bypassed. Without information loss induced
by quantization, any distortion of an original current block due to
encoding may be negligible. Thus, a RDO module (e.g., the
RDO module 110 in FIG. 1 or the RDO module 310 in FIG. 3) may
exclude distortion from consideration in its selection of a best
coding mode. With removal of the distortion factor, only a bit rate
portion of a joint RD cost function may be preserved, thus the RD
cost may be referred to as a bit rate cost. In an embodiment, the
bit rate cost may be mathematically expressed as:
J = \lambda R
[0052] Based on the disclosed bit rate cost function, the RDO
module may test a subset or all of a plurality of available coding
modes for a current block. Tested coding modes may vary in block
size, motion vector, inter prediction reference frame, intra
prediction mode, or reference pixels, or any combination thereof.
For each tested coding mode, a number of bits may be calculated for
a coded residual block of the current block or a coded matrix of
transform coefficients for the current block. After comparing all
resulting bit counts, the RDO module may select the coding mode that
results in the least number of bits.
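
A minimal sketch of this selection, assuming the number of bits for each tested coding mode has already been calculated, may look as follows in Python. The function name `select_mode_by_rate`, the mode labels, and the bit counts are hypothetical and serve only to illustrate the bit rate cost J = \lambda R.

```python
def select_mode_by_rate(bits_per_mode, lam=1.0):
    """Return the (mode, cost) pair minimizing the bit rate cost J = lam * R.

    No distortion term is evaluated, so no reconstructed block needs to be
    compared with the original block.
    """
    if not bits_per_mode:
        raise ValueError("at least one coding mode must be tested")
    costs = {mode: lam * bits for mode, bits in bits_per_mode.items()}
    best_mode = min(costs, key=costs.get)
    return best_mode, costs[best_mode]


# Hypothetical bit counts produced by testing three coding modes.
tested = {"intra_dc": 120, "intra_planar": 96, "inter_16x16": 104}
best_mode, best_cost = select_mode_by_rate(tested)
```

Since the multiplier is the same positive value for every tested mode in this sketch, the selected mode is simply the one producing the fewest bits.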
[0053] In comparison with current encoders which calculate both D
and R in determining the optimal coding mode, the disclosed coding
mode selection scheme may be simpler. For example, with
removal of the D portion, a reconstructed block may not need to be
compared with its original block anymore. Thus, several calculation
steps may be removed from the evaluation process in each coding
mode, which may save coding time and computation resources.
Considering there may be potentially hundreds of coding modes for
the current block in the evaluation, the savings may be significant
and encoding may be made faster, which may greatly facilitate a
real-time encoding process.
[0054] Sometimes it may be unnecessary to code an entire video
frame using a lossless mode. For example, regions containing
natural-view contents (e.g., captured by a low resolution camera)
in a compound video may not require lossless coding, because the
original video quality may already be limited, or because
distortion due to lossy coding may not be significant. FIG. 5
illustrates an embodiment of a lossy encoding scheme 500, which may
be the same or similar to encoding schemes used in current HMs.
The lossy encoding scheme 500 may comprise a RDO module 510, a
prediction module 520, a transform module 530, a quantization
module 540, an entropy encoder 550, a de-quantization module 560,
an inverse transform module 570, and a reconstruction module 580.
Some aspects of the lossy encoding scheme 500 may be the same or
similar to the transform without quantization encoding scheme 300
in FIG. 3, thus the similar aspects will not be further described
in the interest of clarity.
[0055] The lossy encoding scheme 500 may be implemented in a video
encoder, which may receive a sequence of video frames. The RDO
module 510 may be configured to control one or more of other
modules. Based on logic decisions made by the RDO module 510, the
prediction module 520 may utilize either reference frames or
reference pixels to generate a prediction block. Then, the prediction
block may be subtracted from a current block of the input video
to generate a residual block. The residual block may be fed
into the transform module 530, which may convert residual pixel
values into a matrix of transform coefficients.
[0056] In contrast to the transform without quantization encoding
scheme 300, in the lossy encoding scheme 500, the transform
coefficients may be quantized by the quantization module 540 before
being fed into the entropy encoder 550. The quantization module 540
may alter the scale of the transform coefficients and round them to
integers, which may reduce the number of non-zero coefficients.
Consequently, a compression ratio may be increased at a cost of
information loss.
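
The effect of scaling and rounding may be illustrated with a simple uniform quantizer. The Python sketch below is an assumption made for illustration and is not the quantizer of any particular standard or of the quantization module 540; the step size, function names, and coefficient values are hypothetical.

```python
def quantize(coefficients, step):
    """Divide each coefficient by `step`, rounding half away from zero."""
    def level(c):
        magnitude = (abs(c) + step // 2) // step
        return -magnitude if c < 0 else magnitude
    return [[level(c) for c in row] for row in coefficients]


def dequantize(levels, step):
    """Restore the original scale; the rounding error is not recoverable."""
    return [[lv * step for lv in row] for row in levels]


def count_nonzero(block):
    return sum(1 for row in block for v in row if v != 0)


coefficients = [[40, -7, 3, 0], [9, -2, 1, 0]]
levels = quantize(coefficients, step=8)
recovered = dequantize(levels, step=8)
```

In this example the number of non-zero values falls from six to three, and the de-quantized coefficients differ from the originals, which is the information loss traded for a higher compression ratio.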
[0057] Quantized transform coefficients generated by the
quantization module 540 may be scanned. Non-zero-valued
coefficients may be encoded by the entropy encoder 550 into an
encoded bitstream. The quantized transform coefficients may also be
fed into the de-quantization module 560 to recover the original
scale of the transform coefficients. Then, the inverse transform
module 570 may perform the inverse of the transform module 530 and
generate a noisy version of the original residual block. Then, the
lossy residual block may be fed into the reconstruction module 580,
which may generate either reference pixels for intra prediction of
future blocks or reference frames for inter prediction of future
frames.
[0058] FIG. 6 illustrates an embodiment of a lossy decoding scheme
600, which may be implemented in a video decoder. The lossy
decoding scheme 600 may correspond to the lossy encoding scheme
500, and may comprise an entropy decoder 610, a de-quantization
module 620, an inverse transform module 630, a prediction module
640, and a reconstruction module 650 arranged as shown in FIG. 6.
In operation, an encoded bitstream containing information of a
sequence of video frames may be received by the entropy decoder
610, which may decode the bitstream to an uncompressed format. A
matrix of quantized transform coefficients may be generated, which
may then be fed into the de-quantization module 620, which may be
the same or similar to the de-quantization module 560 in FIG. 5.
Then, output of the de-quantization module 620 may be fed into the
inverse transform module 630, which may convert transform
coefficients to residual values of a residual block. In addition,
information containing a prediction mode of the current block may
also be decoded by the entropy decoder 610. Based on the prediction
mode, the prediction module 640 may generate a prediction block for
the current block. Then, the reconstruction module 650 may combine
the residual block with the prediction block to generate a
reconstructed block. Additionally, to facilitate continuous
decoding, the reconstructed block may be used in a reference frame
to inter predict future frames. Some pixels of the reconstructed
block may also serve as reference pixels for intra prediction of
future blocks in the same frame.
[0059] In an embodiment, if desired, all of the aforementioned
encoding schemes, including the transform bypass encoding scheme
100, the transform without quantization encoding scheme 300, and
the lossy encoding scheme 500, may be implemented in a single
encoder. For example, when encoding a compound video, the encoder
may receive information regarding which regions should be encoded
in a lossless mode and/or which regions should be encoded in a
lossy mode. Based on the information, the encoder may encode
certain regions using a lossy mode and other regions using a
lossless mode. In the lossless mode, a RDO module (e.g., the RDO
module 110 in FIG. 1) of the encoder may determine whether to
bypass a transform step, after comparing bitstream lengths resulting
from the transform bypass encoding scheme 100 and the transform
without quantization encoding scheme 300. Similarly, if desired,
all of the aforementioned decoding schemes, including the transform
bypass decoding scheme 200, the transform without quantization
decoding scheme 400, and the lossy decoding scheme 600, may be
implemented in a single decoder.
[0060] For a decoder to properly reconstruct an encoded video
frame, it should recognize one or more encoding schemes that have
been used to encode the video frame. Lossless encoding may be
applied to only some regions of the video frame (referred to
hereinafter as lossless encoding regions), while lossy encoding may
be applied to the other regions (referred to hereinafter as lossy or
regular encoding regions). Information signaling lossless encoding
regions and/or lossy encoding regions may be conveyed in a
bitstream that carries the encoded video frame. In use, such
information may be packed in a high level syntax structure, such as
a sequence parameter set (SPS) or a picture parameter set (PPS) of
the bitstream. A SPS or PPS may be a key normative part of the
bitstream, and may be defined by a video coding standard. After
receiving the bitstream, the decoder may extract region
indication information from the SPS or PPS, and then reconstruct
each region according to its encoding mode. In an embodiment, the
SPS or PPS may include a number of rectangular lossless encoding
regions as well as information identifying their positions in the
video frame (e.g., top-left and bottom-right coordinates, or
top-right and bottom-left coordinates). In another embodiment, the
SPS or PPS may include a number of rectangular lossy encoding
regions as well as information identifying their positions in the
video frame (e.g., top-left and bottom-right coordinates, or
top-right and bottom-left coordinates).
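
One possible in-memory representation of such region indication information is sketched below in Python. The sketch assumes each rectangular lossless encoding region is identified by inclusive top-left and bottom-right pixel coordinates; the class `Region`, the function `coding_mode_for_pixel`, and the sample coordinates are hypothetical and do not represent actual SPS or PPS syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A rectangle given by inclusive top-left and bottom-right coordinates."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def coding_mode_for_pixel(x, y, lossless_regions):
    """Return 'lossless' if (x, y) lies in any signaled region, else 'lossy'."""
    if any(region.contains(x, y) for region in lossless_regions):
        return "lossless"
    return "lossy"


# Two hypothetical lossless regions, e.g., a text panel and a toolbar.
signaled_regions = [Region(0, 0, 639, 99), Region(100, 200, 299, 399)]
```

A decoder holding such a list could look up the mode of each block from its position, treating everything outside the signaled rectangles as a lossy encoding region.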
[0061] In some applications, such as sharing a screen during a
video conference, certain regions of a video may remain stable
across a plurality of video frames. In this case, region indication
information may only change at a relatively low frequency (e.g.,
once in tens of seconds), thus bitrate overhead caused by this
signaling method may be negligible.
[0062] Within a lossless encoding region, a transform bypass scheme
and/or a transform without quantization scheme may be used. To
allow proper decoding, a bitstream may also contain information
regarding which blocks have been encoded via the transform bypass
scheme and which blocks via the transform without quantization
scheme. In an embodiment, two transform bypass flags may be
introduced for each PU in the lossless encoding region. A luminance
(luma) transform bypass flag may indicate whether a transform step
is bypassed (or skipped) in the coding of luma pixels of a PU, and
a chrominance (chroma) transform bypass flag may indicate whether a
transform step is bypassed in the coding of chroma pixels of the
PU. For example, if a transform module (e.g., the transform module
330 in FIG. 3) is bypassed for the luma pixels, the luma transform
bypass flag may be set to `1`. Otherwise, if the transform module
is used and a quantization module (e.g., the quantization module
540) is bypassed, the luma transform bypass flag may be set to `0`.
Alternatively, if desired, the luma transform bypass flag may be
set to `0` if the transform module is bypassed, and `1` if the
transform module is used. The chroma transform bypass flag may be
set using a same or similar approach as the luma transform bypass
flag.
[0063] Both the luma and chroma transform bypass flags may be
encoded by an entropy encoder (e.g., the entropy encoder 130 in
FIG. 1). The entropy encoder may use a CABAC algorithm, which may
use a plurality of context models. In an embodiment, three context
models may be used for each of the luma and chroma transform bypass
flags. To improve coding efficiency, the entropy encoder may select
a context model based on an index, which may be correlated to
transform bypass flags of adjacent PUs. Consider, for example, the
coding of a luma transform bypass flag for a current PU, with the
assumption that a chroma transform bypass flag for the current PU
may be coded in a same or similar way. Two adjacent PUs--an upper
PU and a left PU--may also have luma transform bypass flags. A sum
of the two luma transform bypass flags may be configured to be the
index of the context models. If either the upper PU or the left PU
does not have a luma transform bypass flag (e.g., when the current PU
is on a boundary of a lossless encoding region), `0` may be used in
place of the missing luma transform bypass flag. After entropy
encoding using the selected context model, the encoded luma and
chroma transform bypass flags may be included in the bitstream.
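
The context model selection described above may be sketched in Python as follows. The function name `context_index` and the use of `None` to represent a neighboring PU without a transform bypass flag are assumptions made for this example.

```python
def context_index(upper_flag, left_flag):
    """Select one of three context models (index 0, 1, or 2).

    Each argument is the transform bypass flag (0 or 1) of a neighboring PU,
    or None when that PU has no such flag, in which case 0 is substituted.
    """
    upper = 0 if upper_flag is None else upper_flag
    left = 0 if left_flag is None else left_flag
    return upper + left


# Both neighbors bypassed the transform: the third context model is used.
index_both = context_index(1, 1)
# The upper PU lies outside the lossless encoding region.
index_boundary = context_index(None, 1)
```

Because each flag is either 0 or 1, the sum can only take the values 0, 1, and 2, which is consistent with the three context models per flag described above.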
[0064] In an embodiment, the luma and chroma components of a PU may
share a same lossless coding scheme, and both components may bypass
or include a transform step in their coding process. In this case,
a single transform bypass flag may be used for both components.
Compared with separate transform bypass flags for the luma and
chroma components, the single transform bypass flag may lead to
less signaling overhead in the bitstream. Moreover, it should be
noted that, although transform bypass flags (luma and/or chroma)
are set on the PU level in the descriptions above, if desired, the
transform bypass flags may also be similarly set on a TU level,
which may result in finer granularity but more signaling
overhead.
[0065] FIG. 7 is a flowchart of an embodiment of an encoding method
700, which may implement some or all of the aforementioned encoding
schemes in a video encoder. The method 700 may start in step 702,
where an input video comprising a sequence of video frames or
slices may be received. For each frame or a set of frames,
information or instructions indicating one or more lossless
encoding regions and/or lossy encoding regions may also be
received. Next, in step 703, region indication information may be
added to a high level syntax of the compressed bitstream, which may
identify these lossless encoding regions and/or lossy encoding
regions. The syntax may be included in the SPS or PPS of a
bitstream. In an embodiment, the region indication information may
include a number of rectangular lossless encoding regions and their
positions in the video frame (e.g., top-left and bottom-right
coordinates, or top-right and bottom-left coordinates). In another
embodiment, the region indication information may include a number
of rectangular lossy encoding regions and their positions in the
video frame (e.g., top-left and bottom-right coordinates, or
top-right and bottom-left coordinates).
[0066] Next, in step 704, based on received information, the method
700 may determine if a region (e.g., rectangular) currently being
encoded is a lossless encoding region. If the condition in the
block 704 is met, the method 700 may proceed to step 706 to encode
the current region in a lossless mode (e.g., using the transform
bypass encoding scheme 100 and/or the transform without
quantization encoding scheme 300). Otherwise, the method 700 may
proceed to step 730 to encode the current region in a lossy mode
(e.g., using the lossy encoding scheme 500).
[0067] Next, in step 706, a residual block may be generated for
each block of the current region. To generate the residual block, a
RDO module (e.g., the RDO module 110 in FIG. 1) may make logic
decisions, such as selecting a best block partitioning scheme for
the current region, as well as determining a best inter or intra
prediction mode for a current block (e.g., a PU). Based on logic
decisions of the RDO module, a prediction module (e.g., the
prediction module 120) may generate a prediction block, which may
then be subtracted from the current block to obtain the residual
block.
[0068] Next, in step 708, the method 700 may determine if a
transform step should be bypassed for luma and/or chroma components
of the current block, which may be implemented through the RDO
module. If the condition in the block 708 is met, the method 700
may proceed to step 710, where one or more transform bypass flags
for the current block may be set to `1`. Otherwise, the method 700
may proceed to step 720, where the one or more transform bypass
flags may be set to `0`. The binary value may be arbitrarily set. For
example, if desired, the one or more transform bypass flags may be
set to `0` in step 710 and `1` in step 720. In use, luma and chroma
components may use separate transform bypass flags. If the two
components always use a same encoding scheme, they may also share a
transform bypass flag.
[0069] Step 710 may be followed by step 712, where the residual
block may be encoded using an entropy encoder (e.g., the entropy
encoder 130 in FIG. 1) into a compressed bitstream. The entropy
encoder may use any suitable algorithm, such as a CABAC algorithm.
In addition, the one or more `1` transform bypass flags may be
encoded by the entropy encoder. In an embodiment, three context
models may be used for each of the luma and chroma components.
[0070] Step 720 may be followed by step 722, where the residual
block may be converted in a transform module (e.g., the transform
module 330 in FIG. 3) into a two-dimensional matrix of transform
coefficients. The transform module may use any suitable transform,
such as an integer DCT transform or an integer DCT-like transform.
Next, in step 724, the transform coefficients may be encoded using
an entropy encoder (e.g., the entropy encoder 340 in FIG. 3) into a
compressed bitstream. In addition, the one or more `0` transform
bypass flags may be encoded by the entropy encoder.
[0071] If a lossy encoding mode is chosen for the current region in
step 704, the method 700 may proceed to step 730, where a residual
block may be generated for each block of the current region. To
generate the residual block, a RDO module (e.g., the RDO module 510
in FIG. 5) may select a block partitioning scheme for the current
region and an inter or intra prediction mode for a current block
(e.g., a PU). Based on logic decisions of the RDO module, a
prediction module (e.g., the prediction module 520) may generate a
prediction block, which may then be subtracted from the current
block to obtain the residual block. Next, in step 732, the residual
block may be converted in a transform module (e.g., the transform
module 530) into a matrix of transform coefficients. Next, in step
734, the matrix may be quantized in a quantization module (e.g.,
the quantization module 540) into another matrix of quantized
transform coefficients. Next, in step 736, the quantized transform
coefficients may be encoded using an entropy encoder (e.g., the
entropy encoder 550) into the bitstream which may already have the
region indication information.
[0072] Each block of the current region may be encoded using some
of steps 702-736. In an embodiment, after encoding all blocks in
the current region, in step 740, the bitstream may be transmitted,
for example, over a network to a decoder. It should be understood
that the method 700 may only include a portion of all necessary
encoding steps, thus other steps, such as de-quantization and
inverse transform, may also be incorporated into the encoding
process wherever necessary.
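
The branching of the method 700 for a single block may be summarized by the following Python sketch, which only lists the steps taken on each path and performs no actual encoding. The function name `encoding_steps` and the step descriptions are a paraphrase made for illustration, with a transform bypass flag of `1` assumed to mean that the transform is bypassed.

```python
def encoding_steps(region_is_lossless, bypass_transform=False):
    """List the steps of method 700 applied to one block, in order."""
    if not region_is_lossless:
        # Lossy path: transform and quantization are both applied.
        return [
            "730: generate residual block",
            "732: transform residual block",
            "734: quantize transform coefficients",
            "736: entropy encode quantized coefficients",
        ]
    steps = ["706: generate residual block"]
    if bypass_transform:
        steps += [
            "710: set transform bypass flag to 1",
            "712: entropy encode residual block and flag",
        ]
    else:
        steps += [
            "720: set transform bypass flag to 0",
            "722: transform residual block",
            "724: entropy encode transform coefficients and flag",
        ]
    return steps


lossless_bypass_path = encoding_steps(True, bypass_transform=True)
lossless_transform_path = encoding_steps(True, bypass_transform=False)
lossy_path = encoding_steps(False)
```

Only the lossy path contains a quantization step, and only the lossless paths carry a transform bypass flag.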
[0073] FIG. 8 is a flowchart of an embodiment of a decoding method
800, which may correspond to the encoding method 700 and may
implement some or all of the aforementioned decoding schemes in a
video decoder. The method 800 may start in step 802, where a
bitstream comprising a sequence of video frames may be received.
Next, in step 804, a high level syntax (e.g., SPS or PPS) of the
bitstream may be checked for region indication information, which
may signal which regions in a frame or a set of frames have been
encoded in a lossless mode. Next, in step 806, based on the region
indication information, the method 800 may determine if a region
(e.g., rectangular) currently being decoded has been encoded in a
lossless mode. If the condition in the block 806 is met, the method
800 may proceed to step 808 to decode the current region in a
lossless mode (e.g., using the transform bypass decoding scheme 200
and/or the transform without quantization decoding scheme 400).
Otherwise, the method 800 may proceed to step 830 to decode the
current region in a lossy mode (e.g., using the lossy decoding
scheme 600).
[0074] For each block of the current region, in step 808, one or
more encoded transform bypass flags may be decoded in an entropy
decoder (e.g., the entropy decoder 210 in FIG. 2), which may
perform the inverse of an entropy encoder. If luma and chroma
components of a current block use separate transform bypass flags,
two flags may be decoded for the current block. Alternatively, if
the luma and chroma components share a transform bypass flag, one
flag may be decoded. Next, in step 810, the method 800 may
determine if the transform bypass flag is `1`. As mentioned above,
a transform bypass flag of `1` may indicate that a transform step
has been bypassed in the encoding process of the current block, and
a transform bypass flag of `0` may indicate that a transform step
has been used without quantization. It should be understood that
the binary value here may be interpreted based on a corresponding
encoding method (e.g., the method 700). For example, if the method
700 reverses the meaning of `1` and `0`, the method 800 may also be
adjusted accordingly. If the condition in the block 810 is met, the
method 800 may proceed to step 812, where a residual block of the
current block may be decoded using the entropy decoder into an
uncompressed format. Otherwise, the method 800 may proceed to step
820, where a matrix of transform coefficients may be decoded using
the entropy decoder. Step 820 may be followed by step 822, where
the transform coefficients may be converted to a residual block of
the current block using an inverse transform module (e.g., the
inverse transform module 420 in FIG. 4).
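
A minimal sketch of the flag-dependent branch in steps 810 through 822 is given below in Python. It assumes the entropy decoder has already produced a matrix of values, and that a transform bypass flag of `1` means the transform was bypassed. The function `decode_residual` is hypothetical, and `toy_inverse` is a placeholder standing in for an actual inverse transform.

```python
def decode_residual(transform_bypass_flag, decoded_values, inverse_transform):
    """Return the residual block for the current block.

    flag == 1: the decoded values already are the residual block (step 812).
    flag == 0: the decoded values are transform coefficients and are passed
               through the inverse transform (steps 820 and 822).
    """
    if transform_bypass_flag == 1:
        return decoded_values
    return inverse_transform(decoded_values)


def toy_inverse(coefficients):
    """Placeholder inverse transform for the example: halve every value."""
    return [[v // 2 for v in row] for row in coefficients]


bypassed = decode_residual(1, [[2, -4], [6, 0]], toy_inverse)
transformed = decode_residual(0, [[2, -4], [6, 0]], toy_inverse)
```

If an encoder reverses the meaning of `1` and `0`, the comparison in `decode_residual` would be reversed accordingly.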
[0075] If the current region needs to be decoded in a lossy
decoding mode (determined by block 806), the method 800 may proceed
to step 830, where a matrix of quantized transform coefficients may
be decoded in an entropy decoder (e.g., the entropy decoder 610 in
FIG. 6). Next, in step 832, the quantized transform coefficients
may be de-quantized to recover an original scale of the transform
coefficients. Next, in step 834, the transform coefficients may be
inverse transformed to a residual block of the current block.
[0076] After obtaining the residual block using either a lossless
or lossy decoding mode, in step 840, a prediction block may be
generated. The prediction block may be based on information
(decoded from the bitstream using the entropy decoder) comprising a
prediction mode, as well as one or more previously coded frames or
blocks. Next, in step 842, the residual block may be added to the
prediction block, thus generating a reconstructed block. Depending
on the encoding and decoding schemes used, the reconstructed block
may be an exact, approximate, or noisy version of the original
block (before encoding). Barring distortion introduced during
transmission, all information from the original block may be
preserved in transform bypass coding. Depending on properties of
transform and inverse transform, all (or nearly all) information
may be preserved in transform without quantization coding. Certain
information may be lost in lossy coding, and the degree of loss may
mostly depend on the quantization and de-quantization steps. To
facilitate continuous decoding of blocks, some pixels of the
reconstructed block may also serve as reference pixels for decoding
of future blocks. Likewise, the current frame may also serve as a
reference frame for decoding of future frames.
[0077] As mentioned previously, when a sequence of video frames is
being coded, sometimes certain regions may remain stable for a
relatively long period of time. For example, in video conferencing
applications, a background region of each user may remain unchanged
for tens of minutes. For another example, in computer screen
sharing applications (e.g., used in online video gaming), one or
more regions containing text and/or graphics may remain unchanged
for tens of seconds or minutes. Since continuous coding of these
stable regions may unnecessarily consume computation resources and
time, it may be desirable to skip these regions in the coding
process.
[0078] In use, a RDO module (e.g., the RDO module 110 in FIG. 1,
the RDO module 310 in FIG. 3, or the RDO module 510 in FIG. 5) may
initiate a forced skip mode (also referred to hereafter as a skip
mode). Consider, for example, a CU currently being encoded in a
P-slice. It should be noted that any other type of block (e.g.,
macroblock or PU) and any other type of slice or frame (e.g.,
B-slice, I-slice, P-frame, B-frame, I-frame) may be coded using a
same or similar skip mode. The RDO module may select an optimal
coding mode for the current CU in the P-slice. In an embodiment,
before generating any residual value, the current CU may be first
compared with one or more corresponding CUs (referred to hereafter
as reference CUs) positioned at a same position in one or more
reference slices. The reference slices of the P-slice may be of any
type. If an exact match is found between all corresponding pixels
of the current CU and a reference CU, a forced skip mode may be
determined as the optimal coding mode for the current CU.
Alternatively, in an embodiment, if differences between all
corresponding pixels of the current CU and the reference CU are
found to be within a small pre-set boundary (e.g., ±1), the
forced skip mode may also be determined as the optimal coding mode
for the current CU.
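The comparison described above may be sketched in Python as
follows. This is a minimal illustration rather than the disclosed
implementation: the function names, the representation of a CU as a
list of pixel rows, and the default tolerance of 0 (exact match)
are assumptions made for the example.

```python
def max_abs_difference(current_cu, reference_cu):
    """Largest absolute difference between co-located pixels."""
    return max(abs(c - r)
               for crow, rrow in zip(current_cu, reference_cu)
               for c, r in zip(crow, rrow))


def find_skip_reference(current_cu, reference_cus, tolerance=0):
    """Return the index of the first reference CU whose pixels all lie
    within the tolerance of the current CU, or None if there is none."""
    for index, reference_cu in enumerate(reference_cus):
        if max_abs_difference(current_cu, reference_cu) <= tolerance:
            return index
    return None


current = [[10, 10], [20, 21]]
references = [
    [[10, 12], [20, 21]],  # differs by 2 at one pixel
    [[10, 11], [20, 20]],  # differs by at most 1
]

exact = find_skip_reference(current, references)               # exact match only
near = find_skip_reference(current, references, tolerance=1)   # within +/-1
```

With these sample values, no reference CU matches exactly, but the
second reference CU qualifies once a tolerance of 1 is allowed, so
the forced skip mode could be chosen in that case.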
[0079] In the forced skip mode, the RDO module may skip the rest of
the RDO and coding steps for the current CU, which may improve
encoding speed. For example, the RDO module may skip a RDO process
where RD or bit rate costs are calculated in various coding modes
(e.g., various inter/intra prediction modes and/or PU/TU
partitions). Instead, the current CU may be flagged or signaled as
a skipped CU. Information identifying the skipped CU and its
matching reference CU may be included in a bitstream. In an
embodiment, for each of the skipped CU and its matching reference
CU, the signaling information may comprise a size and/or a
plurality of coordinates (e.g. top-left and bottom-right
coordinates, or top-right and bottom-left coordinates). No residual
value or transform coefficient of the skipped CU may be needed in
the bitstream.
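One possible in-memory form of this signaling information is
sketched below in Python. The class names, the field names, the
reference_index field, and the use of inclusive pixel coordinates
are all hypothetical; the disclosure does not define a bitstream
syntax, and this sketch is not one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLocation:
    top_left: tuple      # (x, y), inclusive
    bottom_right: tuple  # (x, y), inclusive

    @property
    def size(self):
        """Width and height derived from the two corner coordinates."""
        return (self.bottom_right[0] - self.top_left[0] + 1,
                self.bottom_right[1] - self.top_left[1] + 1)


@dataclass(frozen=True)
class SkipSignal:
    skipped_cu: BlockLocation
    reference_cu: BlockLocation
    reference_index: int  # hypothetical: which reference slice matched


location = BlockLocation(top_left=(16, 32), bottom_right=(31, 47))
signal = SkipSignal(skipped_cu=location, reference_cu=location,
                    reference_index=0)
```

The record deliberately carries no residual values or transform
coefficients, consistent with the description of a skipped CU.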
[0080] Upon receiving the bitstream, a video decoder may check
to see if a current CU has been encoded in a forced skip mode based
on signaling information contained in the bitstream. If yes, then
pixel values of the matching reference CU may be used to
reconstruct the current CU. Since there may be potentially a large
number of CUs that may be coded in the forced skip mode, the bit
rate of coding these CUs may be significantly reduced. Further, the
coding process may be made faster, and computation resources may be
saved accordingly.
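The decoder-side behavior may be illustrated with the following
Python sketch. The dictionary keys, the decode_normally callback,
and the inclusive coordinate convention are assumptions introduced
for the example; they are not taken from the disclosure.

```python
def decode_cu(cu_info, reference_frame, decode_normally):
    """Reconstruct one CU from hypothetical parsed signaling information."""
    if cu_info["forced_skip"]:
        # Copy the pixel values of the matching reference CU.
        (x0, y0) = cu_info["ref_top_left"]
        (x1, y1) = cu_info["ref_bottom_right"]
        return [row[x0:x1 + 1] for row in reference_frame[y0:y1 + 1]]
    # Otherwise fall back to the regular decoding path.
    return decode_normally(cu_info)


# A 4x4 reference frame whose pixel value encodes its row and column.
reference_frame = [[r * 10 + c for c in range(4)] for r in range(4)]

skipped = {"forced_skip": True,
           "ref_top_left": (1, 1),
           "ref_bottom_right": (2, 2)}
block = decode_cu(skipped, reference_frame,
                  decode_normally=lambda info: None)
```

Here the skipped CU is rebuilt purely by copying the 2x2 region of
the reference frame, without parsing any residual data.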
[0081] FIG. 9 illustrates an embodiment of an encoding mode
selection method 900. The method 900 may be complementary to the
encoding method 700 in FIG. 7; thus, if desired, both methods may
be implemented in a same encoder. The method 900 may start in step
910, where a current block (e.g., a CU or macroblock) may be
compared with one or more corresponding reference blocks. The
corresponding reference blocks may be located in a reference frame
or slice of any type. In an embodiment, luma and/or chroma
components of each pixel within the current block may be compared
with luma and/or chroma components of each corresponding pixel
(located at a same position) within the one or more reference
blocks. A difference of pixel value may be generated for each pair
of compared pixels.
[0082] Next, in step 920, the method 900 may determine if all
differences are within a pre-set boundary, tolerance, or range
(e.g., ±1). If the condition in the block 920 is met, the method
900 may proceed to step 930. Otherwise, the method 900 may proceed
to step 940. In step 930, information may be included in the
bitstream to signal that the current block is encoded in a forced
skip mode. The information may identify the skipped CU and its matching
reference CU. In an embodiment, for each of the skipped CU and its
matching reference CU, the signaling information may comprise a
size and/or a plurality of coordinates (e.g. top-left and
bottom-right coordinates, or top-right and bottom-left
coordinates). The rest of encoding steps (e.g., RDO mode selection,
encoding of residual block) may be skipped for the current
block.
[0083] In step 940, the method 900 may determine if the current
block is located within a lossless encoding region. If the
condition in the block 940 is met, the method 900 may proceed to
step 950. Otherwise, the method 900 may proceed to step 970. In
step 950, an encoding mode leading to a least number of bits may be
selected as an optimal mode. The optimal mode may be determined by
a RDO module (e.g., the RDO module 110 in FIG. 1 or the RDO module
310 in FIG. 3), which may test a plurality of combinations of
various block sizes, motion vectors, inter prediction reference
frames, intra prediction modes, and/or reference pixels. Since no
distortion or only slight distortion may be induced in the lossless
mode, the RDO module may exclude the distortion portion of a RD
cost function in determining the optimal coding mode. Next, in step
960, the current block may be encoded in a lossless mode using a
transform bypass lossless coding scheme and/or a transform without
quantization coding scheme.
[0084] In step 970, an encoding mode leading to a smallest RD cost
may be selected as an optimal mode. The RD cost of different
encoding modes may take into account both the bit rate portion and
the distortion portion in determining the optimal coding mode.
Next, in step 980, the current block may be encoded in a lossy mode
using a lossy encoding scheme. It should be understood that the
method 900 may only include a portion of all necessary encoding
steps, thus other steps, such as transform, quantization,
de-quantization, inverse transform, and transmission, may also be
incorporated into the encoding process wherever appropriate.
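The overall flow of the method 900 may be sketched in Python as
shown below. The sketch assumes that a block outside the lossless
encoding region proceeds to step 970, and that the RD cost takes
the common Lagrangian form J = D + lambda * R; the function name,
the candidate_modes mapping, the mode names, and the sample bit and
distortion figures are hypothetical and serve only to make the
example runnable.

```python
def select_encoding(block, reference_blocks, in_lossless_region,
                    candidate_modes, tolerance=1, lagrange_multiplier=1.0):
    """Return (decision, mode) following the flow of steps 910 to 980.

    candidate_modes maps a mode name to a hypothetical
    (bits, distortion) pair measured for that mode.
    """
    # Steps 910 and 920: compare with each co-located reference block.
    for reference in reference_blocks:
        if all(abs(c - r) <= tolerance
               for crow, rrow in zip(block, reference)
               for c, r in zip(crow, rrow)):
            return ("forced_skip", None)          # step 930

    if in_lossless_region:                        # step 940
        # Step 950: bit rate only, the distortion term is ignored.
        mode = min(candidate_modes,
                   key=lambda m: candidate_modes[m][0])
        return ("lossless", mode)                 # step 960

    # Step 970: full RD cost, distortion plus weighted bit rate.
    mode = min(candidate_modes,
               key=lambda m: candidate_modes[m][1]
               + lagrange_multiplier * candidate_modes[m][0])
    return ("lossy", mode)                        # step 980


modes = {"intra_dc": (120, 40), "inter_16x16": (150, 5)}
block = [[10, 10], [10, 10]]
far_reference = [[[50, 50], [50, 50]]]
near_reference = [[[10, 11], [10, 10]]]

skip = select_encoding(block, near_reference, False, modes)
lossless = select_encoding(block, far_reference, True, modes)
lossy = select_encoding(block, far_reference, False, modes)
```

With these sample figures, the bit-rate-only criterion picks the
mode with fewer bits, whereas the RD criterion picks the other mode
because its lower distortion outweighs its higher bit count.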
[0085] FIG. 10 illustrates an embodiment of a network unit 1000,
which may comprise an encoder and decoder that process video
frames as described above, for example, within a network or system.
The network unit 1000 may comprise a plurality of ingress ports
1010 and/or receiver units (Rx) 1012 for receiving data from other
network units or components, a logic unit or processor 1020 to
process data and determine which network unit to send the data to,
and a plurality of egress ports 1030 and/or transmitter units (Tx)
1032 for transmitting data to the other network units. The logic
unit or processor 1020 may be configured to implement any of the
schemes described herein, such as the transform bypass encoding
scheme 100, the transform without quantization encoding scheme 300,
at least one of the encoding method 700 and the decoding method
800, and/or the encoding mode selection method 900. The logic unit
1020 may be implemented using hardware, software, or both.
[0086] The schemes described above may be implemented on any
general-purpose network component, such as a computer or network
component with sufficient processing power, memory resources, and
network throughput capability to handle the necessary workload
placed upon it. FIG. 11 illustrates a schematic diagram of a
typical, general-purpose network component or computer system 1100
suitable for implementing one or more embodiments of the methods
disclosed herein, such as the encoding method 700 and the decoding
method 800. The general-purpose network component or computer
system 1100 includes a processor 1102 (which may be referred to as
a central processor unit or CPU) that is in communication with
memory devices including secondary storage 1104, read only memory
(ROM) 1106, random access memory (RAM) 1108, input/output (I/O)
devices 1110, and network connectivity devices 1112. Although
illustrated as a single processor, the processor 1102 is not so
limited and may comprise multiple processors. The processor 1102
may be implemented as one or more CPU chips, cores (e.g., a
multi-core processor), field-programmable gate arrays (FPGAs),
application specific integrated circuits (ASICs), and/or digital
signal processors (DSPs), and/or may be part of one or more ASICs.
The processor 1102 may be configured to implement any of the
schemes described herein, including the transform bypass encoding
scheme 100, the transform without quantization encoding scheme 300,
at least one of the encoding method 700 and the decoding method
800, and/or the encoding mode selection method 900. The processor
1102 may be implemented using hardware, software, or both.
[0087] The secondary storage 1104 is typically comprised of one or
more disk drives or tape drives and is used for non-volatile
storage of data and as an over-flow data storage device if the RAM
1108 is not large enough to hold all working data. The secondary
storage 1104 may be used to store programs that are loaded into the
RAM 1108 when such programs are selected for execution. The ROM
1106 is used to store instructions and perhaps data that are read
during program execution. The ROM 1106 is a non-volatile memory
device that typically has a small memory capacity relative to the
larger memory capacity of the secondary storage 1104. The RAM 1108
is used to store volatile data and perhaps to store instructions.
Access to both the ROM 1106 and the RAM 1108 is typically faster
than to the secondary storage 1104.
[0088] At least one embodiment is disclosed and variations,
combinations, and/or modifications of the embodiment(s) and/or
features of the embodiment(s) made by a person having ordinary
skill in the art are within the scope of the disclosure.
Alternative embodiments that result from combining, integrating,
and/or omitting features of the embodiment(s) are also within the
scope of the disclosure. Where numerical ranges or limitations are
expressly stated, such express ranges or limitations should be
understood to include iterative ranges or limitations of like
magnitude falling within the expressly stated ranges or limitations
(e.g., from about 1 to about 10 includes 2, 3, 4, etc.; greater
than 0.10 includes 0.11, 0.12, 0.13, etc.). For example, whenever a
numerical range with a lower limit, $R_l$, and an upper limit,
$R_u$, is disclosed, any number falling within the range is
specifically disclosed. In particular, the following numbers within
the range are specifically disclosed:
$R = R_l + k \cdot (R_u - R_l)$, wherein k is a variable ranging from
1 percent to 100 percent with a 1 percent increment, i.e., k is 1
percent, 2 percent, 3 percent, 4 percent, 5 percent, . . . , 70
percent, 71 percent, 72 percent, . . . , 95 percent, 96 percent, 97
percent, 98 percent, 99 percent, or 100 percent. Moreover, any
numerical range defined by two R numbers as defined in the above is
also specifically disclosed. The use of the term "about" means
±10% of the subsequent number, unless otherwise stated. Use of
the term "optionally" with respect to any element of a claim means
that the element is required, or alternatively, the element is not
required, both alternatives being within the scope of the claim.
Use of broader terms such as comprises, includes, and having should
be understood to provide support for narrower terms such as
consisting of, consisting essentially of, and comprised
substantially of. Accordingly, the scope of protection is not
limited by the description set out above but is defined by the
claims that follow, that scope including all equivalents of the
subject matter of the claims. Each and every claim is incorporated
as further disclosure into the specification and the claims are
embodiment(s) of the present disclosure. The discussion of a
reference in the disclosure is not an admission that it is prior
art, especially any reference that has a publication date after the
priority date of this application. The disclosures of all patents,
patent applications, and publications cited in the disclosure are
hereby incorporated by reference, to the extent that they provide
exemplary, procedural, or other details supplementary to the
disclosure.
[0089] While several embodiments have been provided in the present
disclosure, it may be understood that the disclosed systems and
methods might be embodied in many other specific forms without
departing from the spirit or scope of the present disclosure. The
present examples are to be considered as illustrative and not
restrictive, and the intention is not to be limited to the details
given herein. For example, the various elements or components may
be combined or integrated in another system or certain features may
be omitted, or not implemented.
[0090] In addition, techniques, systems, subsystems, and methods
described and illustrated in the various embodiments as discrete or
separate may be combined or integrated with other systems, modules,
techniques, or methods without departing from the scope of the
present disclosure. Other items shown or discussed as coupled or
directly coupled or communicating with each other may be indirectly
coupled or communicating through some interface, device, or
intermediate component whether electrically, mechanically, or
otherwise. Other examples of changes, substitutions, and
alterations are ascertainable by one skilled in the art and may be
made without departing from the spirit and scope disclosed
herein.
* * * * *