U.S. patent application number 12/723287 was filed with the patent office on 2010-03-12 and published on 2010-07-01 for a method and apparatus for video compression.
This patent application is currently assigned to TANDBERG TELECOM AS. The invention is credited to Gisle Bjontegaard and Tom-Ivar JOHANSEN.
Application Number | 12/723287 (Publication No. 20100166059) |
Family ID | 19914786 |
Publication Date | 2010-07-01 |
United States Patent Application | 20100166059 |
Kind Code | A1 |
JOHANSEN; Tom-Ivar; et al. |
July 1, 2010 |
METHOD AND APPARATUS FOR VIDEO COMPRESSION
Abstract
A unified solution to coding/decoding of different video formats
such as 4:2:0, 4:2:2 and 4:4:4 is provided. A method of video
coding includes transforming a first m×n macro block of residual
chrominance pixel values of moving pictures by a first
integer-transform function, generating a corresponding second m×n
macro block of integer-transform coefficients, and further
transforming DC values of the integer-transform coefficients by a
second integer-transform function to generate a third block of
integer-transformed DC coefficients. The method further includes
generating the second m×n macro block of integer-transform
coefficients by utilizing a k×k integer-transform function on each
k×k sub-block of the first m×n macro block, wherein n and m are
each a multiple of k, and generating the third block of
coefficients by utilizing a second i×j integer-transform function
on the DC values, resulting in a (m/k)×(n/k) third block of
integer-transformed DC coefficients.
Inventors: | JOHANSEN; Tom-Ivar (Oslo, NO); Bjontegaard; Gisle (Oppegard, NO) |
Correspondence Address: | OBLON, SPIVAK, MCCLELLAND MAIER & NEUSTADT, L.L.P., 1940 DUKE STREET, ALEXANDRIA, VA 22314, US |
Assignee: | TANDBERG TELECOM AS, Lysaker, NO |
Family ID: | 19914786 |
Appl. No.: | 12/723287 |
Filed: | March 12, 2010 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10844054 | May 12, 2004 | 7684489 |
12723287 | | |
Current U.S. Class: | 375/240.2; 375/E7.226 |
Current CPC Class: | H04N 19/593 20141101; H04N 19/60 20141101; H04N 19/186 20141101; H04N 19/85 20141101 |
Class at Publication: | 375/240.2; 375/E07.226 |
International Class: | H04N 7/30 20060101 H04N007/30 |
Foreign Application Data
Date | Code | Application Number |
May 22, 2003 | NO | 20032319 |
Claims
1. A method of video coding comprising: transforming a first m×n
macro block of residual chrominance pixel values of moving pictures
by a first integer-transform function, thereby generating a
corresponding second m×n macro block of integer-transform
coefficients, the transforming including generating the second m×n
macro block of integer-transform coefficients by utilizing a k×k
integer-transform function on each k×k sub-block of the first m×n
macro block, wherein each of n and m is a multiple of k; and
transforming DC values of the integer-transform coefficients by a
second integer-transform function, thereby generating a third block
of integer-transformed DC coefficients, the transforming including
generating the third block of coefficients by utilizing a second
i×j integer-transform function on the DC values, resulting in a
(m/k)×(n/k) third block of integer-transformed DC coefficients.
2. A method of video decoding comprising: transforming a first
block of integer-transformed DC coefficients by a first inverse
integer-transform function, thereby generating a number of DC
values of a first m×n macro block of integer-transform
coefficients, the transforming including generating the number of
DC values of the first m×n macro block of integer-transform
coefficients by utilizing a first i×j inverse integer-transform
function on the first block of integer-transformed DC coefficients;
and transforming the first m×n macro block of integer-transform
coefficients by a second inverse integer-transform function,
thereby generating a second m×n macro block of residual chrominance
pixel values of moving pictures, the transforming including
generating the second m×n macro block of residual chrominance pixel
values by utilizing a k×k inverse integer-transform function on
each k×k sub-block of the first m×n macro block of
integer-transform coefficients, wherein each of n and m is a
multiple of k, and the first block of integer-transformed DC
coefficients is of the size (m/k)×(n/k).
Description
CROSS-REFERENCE TO RELATED APPLICATION
[0001] The present application is a continuation application of
U.S. Ser. No. 10/844,054 filed May 12, 2004, which is based upon
and claims the benefit of priority to Norwegian Application No.
20032319, filed May 22, 2003, each of which is incorporated herein
by reference.
BACKGROUND OF THE INVENTION
[0002] Transmission of moving pictures in real time is employed in
several applications, such as video conferencing, net meetings, TV
broadcasting and video telephony.
[0003] However, representing moving pictures requires a large amount
of information, as digital video is typically described by
representing each pixel in a picture with 8 bits (1 byte) or more.
Such uncompressed video data results in large bit volumes and cannot
be transferred over conventional communication networks and
transmission lines in real time due to limited bandwidth.
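To put a number on these "large bit volumes", the sketch below computes the raw bit rate of uncompressed CIF video in the 4:2:0 format at 8 bits per sample. It uses the CIF frame sizes given in [0024]. The frame rate of 30 frames per second is an assumption for illustration only and does not come from this document.

```python
# Illustrative arithmetic: raw bit rate of uncompressed CIF 4:2:0 video.
# The CIF sizes come from the description; FRAMES_PER_SECOND is assumed.
LUMA_W, LUMA_H = 352, 288        # CIF luminance resolution
CHROMA_W, CHROMA_H = 176, 144    # each of the two chrominance components (4:2:0)
BITS_PER_SAMPLE = 8
FRAMES_PER_SECOND = 30           # assumed frame rate, not from the source


def raw_bits_per_frame() -> int:
    """Bits needed to store one frame without any compression."""
    samples = LUMA_W * LUMA_H + 2 * CHROMA_W * CHROMA_H
    return samples * BITS_PER_SAMPLE


bits_per_second = raw_bits_per_frame() * FRAMES_PER_SECOND
print(f"{raw_bits_per_frame()} bits/frame, {bits_per_second / 1e6:.1f} Mbit/s")
# prints: 1216512 bits/frame, 36.5 Mbit/s
```

Even this modest resolution needs roughly 36 Mbit/s under the assumed frame rate. That rate is far beyond many conventional transmission lines, which is why compression is required.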
[0004] Thus, enabling real-time video transmission requires a high
degree of data compression. Data compression may, however,
compromise picture quality. Therefore, great efforts have been made
to develop compression techniques that allow real-time transmission
of high-quality video over bandwidth-limited data connections.
[0005] In video compression systems, the main goal is to represent
the video information with as little capacity as possible. Capacity
is measured in bits, either as a constant value or as bits per time
unit. In both cases, the main goal is to reduce the number of
bits.
[0006] The most common video coding methods are described in the
MPEG-x and H.26x standards. The video data undergo four main
processes before transmission, namely prediction, transformation,
quantization and entropy coding.
[0007] The prediction process significantly reduces the number of
bits that must be transferred for each picture in a video sequence.
It takes advantage of the similarity of parts of the sequence with
other parts of the sequence. Since the predictor part is known to
both encoder and decoder, only the difference has to be
transferred. This difference typically requires much less capacity
for its representation. The prediction is mainly based on motion
vectors. The prediction process is typically performed on square
blocks (e.g. 16×16 pixels).
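A minimal sketch of such motion-based prediction follows. It assumes a full-search block matcher with a sum-of-absolute-differences (SAD) cost, which is only one common way to find a motion vector; the description does not prescribe a search method. The function names, the search range and the toy frames are hypothetical.

```python
# Sketch of motion-compensated (inter) prediction using an assumed
# full-search block matcher with a SAD cost. All names are hypothetical.
from typing import List, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int, mv: Tuple[int, int], size: int) -> int:
    """SAD between the block at (bx, by) in cur and the block displaced by mv in ref."""
    dx, dy = mv
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def best_motion_vector(cur: Frame, ref: Frame, bx: int, by: int,
                       size: int, search: int) -> Tuple[int, int]:
    """Test every displacement within +/- search pixels that stays inside ref."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= bx + dx <= w - size and 0 <= by + dy <= h - size):
                continue  # candidate block would leave the reference frame
            cost = sad(cur, ref, bx, by, (dx, dy), size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def residual(cur: Frame, ref: Frame, bx: int, by: int,
             mv: Tuple[int, int], size: int) -> Frame:
    """Difference between the current block and its motion-compensated predictor."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(size)]
            for y in range(size)]


# Toy frames: the current picture is the reference content moved one pixel left.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y][min(x + 1, 7)] for x in range(8)] for y in range(8)]
mv = best_motion_vector(cur, ref, bx=2, by=2, size=4, search=2)
res = residual(cur, ref, 2, 2, mv, 4)
print(mv, res)  # (1, 0) and an all-zero 4x4 residual
```

In this toy case, only the vector (1, 0) and an all-zero residual would need to be sent. This illustrates why the difference usually requires much less capacity than the block itself.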
[0008] Note that in some cases, such as in H.264/AVC, pixels are
predicted from adjacent pixels in the same picture rather than from
pixels of preceding pictures. This is referred to as intra
prediction, as opposed to inter prediction. In H.264/AVC, there are
many different modes for such prediction, both for luminance blocks
and chrominance blocks. One of the prediction modes is called
DC-prediction. It predicts all pixels in a block to have the same
value. Given the characteristics of the particular transform used
for residual coding, this means that only the DC coefficient of the
residual block data is changed compared to transformation of the
block data without prediction. All AC coefficients are unchanged.
For this reason the prediction mode is named DC-prediction.
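The claim that only the DC coefficient changes can be checked numerically. The sketch below assumes the H.264/AVC 4×4 forward core transform Y = C·X·Cᵀ, with post-scaling and quantization omitted. The pixel values and the prediction value 60 are hypothetical. Because the transform is linear and a constant block transforms to a single DC value, subtracting a constant prediction shifts only Y[0][0], and it does so by 16 times the prediction value.

```python
# Sketch: effect of a constant (DC) prediction under the assumed H.264/AVC
# 4x4 forward core transform Y = C X C^T (scaling/quantization omitted).
C = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]


def core_transform(x):
    """Compute C @ x @ C^T with plain integer arithmetic."""
    cx = [[sum(C[i][t] * x[t][j] for t in range(4)) for j in range(4)] for i in range(4)]
    return [[sum(cx[i][t] * C[j][t] for t in range(4)) for j in range(4)] for i in range(4)]


block = [[52, 55, 61, 66],
         [63, 59, 55, 90],
         [62, 59, 68, 113],
         [63, 58, 71, 122]]
dc_pred = 60  # hypothetical DC-prediction value for the whole block
residual = [[p - dc_pred for p in row] for row in block]

plain = core_transform(block)
predicted = core_transform(residual)
diff = [[plain[i][j] - predicted[i][j] for j in range(4)] for i in range(4)]
print(diff)  # [[960, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```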
[0009] The residual, represented as a block of data (e.g. 4×4
pixels), still contains internal correlation. A well-known method of
taking advantage of this is to perform a two-dimensional block
transform. In H.263 an 8×8 Discrete Cosine Transform (DCT) is used,
whereas H.264 uses a 4×4 integer transform. This transforms 4×4
pixels into 4×4 transform coefficients, which can usually be
represented by fewer bits than the pixel representation. Transforming
a 4×4 array of pixels with internal correlation will probably result
in a 4×4 block of transform coefficients with far fewer non-zero
values than the original 4×4 pixel block.
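As a small illustration of this energy compaction, the sketch below applies the assumed H.264/AVC 4×4 forward core transform (without post-scaling or quantization) to a smooth ramp block. The ramp is a hypothetical stand-in for a 4×4 block with strong internal correlation. All 16 pixels are non-zero, but only 5 of the 16 coefficients are.

```python
# Sketch of energy compaction with the assumed H.264/AVC 4x4 forward core
# transform Y = C X C^T (scaling/quantization omitted). The ramp block is a
# hypothetical example of a correlated 4x4 block.
C = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]


def core_transform(x):
    """Compute C @ x @ C^T with plain integer arithmetic."""
    cx = [[sum(C[i][t] * x[t][j] for t in range(4)) for j in range(4)] for i in range(4)]
    return [[sum(cx[i][t] * C[j][t] for t in range(4)) for j in range(4)] for i in range(4)]


block = [[100 + i + j for j in range(4)] for i in range(4)]
coeffs = core_transform(block)
nonzero_pixels = sum(v != 0 for row in block for v in row)
nonzero_coeffs = sum(v != 0 for row in coeffs for v in row)
print(nonzero_pixels, nonzero_coeffs)  # 16 5
```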
[0010] A macro block is a part of the picture consisting of several
sub-blocks for luminance (luma) as well as for chrominance
(chroma).
[0011] There are typically two chrominance components (Cr, Cb), each
with half the resolution of luminance both horizontally and
vertically. This is in contrast to, for instance, RGB (red, green,
blue), which is typically the representation used in the camera
sensor and the monitor display.
[0012] The patent literature contains examples disclosing video
encoding/decoding and methods of compression. In particular, U.S.
Pat. No. 6,256,347 B1 (Yu et al.) should be mentioned. It discloses
an image processor that receives prediction error values from
decompressed MPEG coded digital video signals in the form of pixel
blocks containing luminance and chrominance data in a 4:2:2 or
4:2:0 format and recompresses the pixel blocks to a predetermined
resolution. Luminance and chrominance data are processed with
different compression laws during recompression. Luminance data are
recompressed to an average of six bits per pixel, whereas
chrominance data are recompressed to an average of four bits per
pixel. Thus Yu et al. disclose a method for bit compression of data
in the 4:2:2 and 4:2:0 formats, which is therefore not a general
method applicable to a plurality of formats.
[0013] Further it should be mentioned that US 2003/0043921 A1
(Dufour et al.) discloses a method for video encoding applied to an
input signal which includes a sequence of frames represented by a
luminance matrix and two chrominance matrices.
[0014] Most video coding standards are mainly designed for 4:2:0.
The MPEG-2 professional profile covers 4:2:2 using a special
chrominance block arrangement. The same is true for H.263. Generally
this means that each format needs a special solution.
SUMMARY
The invention relates to handling various picture resolutions in an
extended version of the compression standard H.264/AVC or other
similar standards.
[0015] The present invention provides a unified solution to
coding/decoding of different video formats such as 4:2:0, 4:2:2 and
4:4:4.
[0016] In particular, the present invention provides a method of
video coding for transforming a first m×n macro block of residual
chrominance pixel values of moving pictures by a first
integer-transform function generating a corresponding second m×n
macro block of integer-transform coefficients, then further
transforming DC values of the integer-transform coefficients by a
second integer-transform function to generate a third block of
integer-transformed DC coefficients, wherein the method further
includes the steps of generating the second m×n macro block of
integer-transform coefficients by utilizing a k×k integer-transform
function on each k×k sub-block of the first m×n macro block,
wherein n and m are each a multiple of k, and then generating the
third block of coefficients by utilizing a second i×j
integer-transform function on the DC values, resulting in a
(m/k)×(n/k) third block of integer-transformed DC coefficients.
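The two-level transform summarized in [0016] can be sketched as follows. This is a minimal, non-normative illustration under several assumptions: k = 4, the H.264/AVC 4×4 forward core transform at the first level, and a Sylvester-ordered Hadamard transform at the second level. The description names a two-dimensional Hadamard transform but does not fix its row ordering. Scaling and quantization are omitted, and all function names are hypothetical.

```python
# Sketch of the two-level chroma residual transform (assumptions: k = 4,
# H.264/AVC 4x4 core transform, Sylvester-ordered Hadamard for the DC level,
# no scaling or quantization). Function names are hypothetical.
from typing import List, Tuple

Matrix = List[List[int]]
K = 4
C4 = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def hadamard(size: int) -> Matrix:
    """Sylvester construction; size must be a power of two."""
    h = [[1]]
    while len(h) < size:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def two_level_transform(block: Matrix) -> Tuple[Matrix, Matrix]:
    """block has n rows (vertical) and m columns (horizontal), both multiples of K."""
    n, m = len(block), len(block[0])
    if m % K or n % K:
        raise ValueError("m and n must be multiples of k")
    coeffs = [[0] * m for _ in range(n)]
    dc = [[0] * (m // K) for _ in range(n // K)]  # (n/k) rows x (m/k) columns
    for by in range(0, n, K):
        for bx in range(0, m, K):
            sub = [row[bx:bx + K] for row in block[by:by + K]]
            y = matmul(matmul(C4, sub), transpose(C4))  # first level: k x k transform
            for r in range(K):
                coeffs[by + r][bx:bx + K] = y[r]
            dc[by // K][bx // K] = y[0][0]             # collect the DC value
    # Second level: separable Hadamard transform of the array of DC values.
    # (coeffs still holds the first-level DC values in place for inspection.)
    hv, hh = hadamard(n // K), hadamard(m // K)
    return coeffs, matmul(matmul(hv, dc), transpose(hh))


# Usage: a constant 4:2:2 chroma residual (m = 8 wide, n = 16 tall).
coeffs, dc_coeffs = two_level_transform([[1] * 8 for _ in range(16)])
print(dc_coeffs)  # [[128, 0], [0, 0], [0, 0], [0, 0]]
```

The same function handles 8×8, 8×16 and 16×16 chroma blocks. Only the size of the second-level Hadamard changes, which mirrors the unified treatment of the formats described here.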
[0017] The present invention also provides a method of video
decoding, which is the inverse of the method of video coding.
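On the decoder side, the second-level DC transform must be inverted. The following sketch assumes the same separable, Sylvester-ordered Hadamard form D' = Hv·D·Hhᵀ. Because Hvᵀ·Hv = rows·I and Hhᵀ·Hh = cols·I, the inverse is Hvᵀ·D'·Hh / (rows·cols). The normalization is shown explicitly here; a practical decoder would typically fold it into dequantization scaling. The names and the sample DC values are hypothetical.

```python
# Sketch of the inverse second-level (DC) transform, assuming the separable
# Sylvester-ordered Hadamard form D' = Hv D Hh^T. Names are hypothetical.
def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def hadamard(size):
    """Sylvester construction; size must be a power of two."""
    h = [[1]]
    while len(h) < size:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def forward_dc(dc):
    hv, hh = hadamard(len(dc)), hadamard(len(dc[0]))
    return matmul(matmul(hv, dc), transpose(hh))


def inverse_dc(dc_coeffs):
    rows, cols = len(dc_coeffs), len(dc_coeffs[0])
    hv, hh = hadamard(rows), hadamard(cols)
    t = matmul(matmul(transpose(hv), dc_coeffs), hh)
    # Every entry of t is an exact multiple of rows * cols, so // is exact here.
    return [[v // (rows * cols) for v in row] for row in t]


dc = [[16, -3], [5, 0], [7, 9], [-2, 4]]  # hypothetical 4x2 DC block (4:2:2)
print(inverse_dc(forward_dc(dc)) == dc)    # True
```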
BRIEF DESCRIPTION OF THE DRAWINGS
[0018] The foregoing and other objects, features and advantages of
the invention will be apparent from the following more particular
description of preferred embodiments of the invention, as
illustrated in the accompanying drawings in which like reference
characters refer to the same parts throughout the different views.
The drawings are not necessarily to scale, emphasis instead being
placed upon illustrating the principles of the invention.
[0019] FIG. 1 shows how a macro block in the 4:2:0 format, with
16×16 luma pixels and two chroma components of 8×8 pixels each, is
divided into 4×4 blocks, which in turn are arranged in subgroups of
four 4×4 blocks. It also shows how DC coefficients are extracted
from each of the 4 chroma blocks to form separate chroma DC elements
consisting of 2×2 blocks.
[0020] FIG. 2 shows one component of chroma pixels in a macro block
of different picture formats.
[0021] FIG. 3 shows a second-level transform of DC values for
different formats. FIG. 4 indicates the basis of a DC prediction of
an 8×16 block.
DETAILED DESCRIPTION
[0022] The present invention provides an extension of the H.264/AVC
video coding standard to include formats like the above-described
4:2:2 and 4:4:4. The method is based on the way chrominance is
already treated in H.264/AVC. A macroblock consists of a part of the
picture with 16×16 luminance pixels and two chrominance components
with 8×8 pixels each. This is illustrated in FIG. 1.
[0023] The description is mainly related to the encoding process.
However, this has implications for how decoding must be performed.
For instance, if transformation is performed on two levels at the
encoder, the decoder must perform inverse transformation on two
levels. The word "coding" is often used as a short expression
covering the whole process of encoding and decoding. The invention
covers the whole coding process, which is defined to contain both
encoding and decoding.
[0024] As noted, FIG. 1 shows that the macro block consists of
16×16 luminance pixels and two chrominance components with 8×8
pixels each. Each of the components is further broken down into
4×4 blocks, which are represented by the small squares. For coding
purposes, both luma and chroma 4×4 blocks are grouped together in
8×8 sub-blocks and designated Y0-Y3 and Cr, Cb. The chroma part of
this format is in some contexts denoted as 4:2:0, and is shown to
the left in FIG. 2. The abbreviation is not very self-explanatory.
It means that the chrominance has half the resolution of luminance
horizontally as well as vertically. For the conventional video
format CIF, this means that a luminance frame has 352×288 pixels,
whereas each of the chrominance components has 176×144 pixels.
[0025] In an alternative format, denoted 4:2:2 and shown in the
middle part of FIG. 2, chrominance has half of the luminance
resolution in the horizontal direction and the same resolution as
luminance in the vertical direction. This format is typically used
for high-quality interlaced TV signals, where the interlace
structure makes it challenging to use half chrominance resolution
vertically.
[0026] In yet another format, denoted 4:4:4 and shown to the right
in FIG. 2, luminance and chrominance signals have the same
resolution in both the horizontal and vertical directions. One
typical area of application is graphics material, where colors are
used in such a way that it is desirable to have the same resolution
for chrominance as for luminance.
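The chroma dimensions implied by the three formats can be tabulated with a small helper. It is derived only from the subsampling factors stated in [0024] to [0026], and its names are hypothetical. It reports (width, height) of each chrominance component for a CIF frame and for one 16×16 luminance macroblock.

```python
# Hypothetical helper: chrominance dimensions per format, derived from the
# subsampling described in the text. Divisors are (horizontal, vertical).
SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def chroma_size(luma_w: int, luma_h: int, fmt: str) -> tuple:
    """Return (width, height) of one chrominance component."""
    sx, sy = SUBSAMPLING[fmt]
    return luma_w // sx, luma_h // sy


for fmt in SUBSAMPLING:
    print(fmt, "CIF chroma:", chroma_size(352, 288, fmt),
          "MB chroma:", chroma_size(16, 16, fmt))
# 4:2:0 CIF chroma: (176, 144) MB chroma: (8, 8)
# 4:2:2 CIF chroma: (176, 288) MB chroma: (8, 16)
# 4:4:4 CIF chroma: (352, 288) MB chroma: (16, 16)
```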
[0027] The first aspect of the present invention relates to the
coding of the residual signal. In H.264/AVC the chrominance
residual signal is described with a two-level transform. The 4:2:0
box in FIG. 2 indicates that the 8×8 pixel chrominance block is
divided into 4×4 pixel sub-blocks. The residual signal in each of
the 4×4 sub-blocks undergoes a 4×4 transformation, resulting in one
DC coefficient and 15 AC coefficients. The DC coefficient
represents the average value over the 4×4 block.
[0028] According to the first aspect of the present invention, the
4×4 block size of the first transform of the chrominance residual
signal is maintained. The number of such sub-blocks will then
differ between picture formats. In general notation, a k×k
transform is used on a macro block of m×n (m in the horizontal
direction, n in the vertical direction) chrominance pixels.
[0029] In a further transformation, the DC coefficients of the 4×4
blocks undergo a 2×2 transform, as indicated for the 4:2:0 format
in FIG. 3. In the general case, an i×j transform is used for the DC
coefficients, where i and j take values such that i×k equals the
horizontal number of chrominance pixels in a macroblock and j×k
equals the vertical number of chrominance pixels in a macroblock.
The transform type is preferably chosen to be a two-dimensional
Hadamard transform.
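The relation between the chroma macroblock size m×n, the first-level block size k and the second-level i×j transform can be checked per format with the sketch below. It assumes k = 4 and the per-macroblock chroma sizes 8×8, 8×16 and 16×16 (m horizontal, n vertical) that follow from [0024] to [0026]. The function name is hypothetical.

```python
# Sketch of the size bookkeeping i*k = m, j*k = n (assumes k = 4).
K = 4
CHROMA_MB = {"4:2:0": (8, 8), "4:2:2": (8, 16), "4:4:4": (16, 16)}  # (m, n)


def dc_transform_size(m: int, n: int, k: int = K) -> tuple:
    """Return (i, j) such that i * k = m and j * k = n."""
    if m % k or n % k:
        raise ValueError("m and n must be multiples of k")
    return m // k, n // k


for fmt, (m, n) in CHROMA_MB.items():
    i, j = dc_transform_size(m, n)
    print(f"{fmt}: {i * j} sub-blocks of {K}x{K}, second-level transform {i}x{j}")
```

The loop prints 4, 8 and 16 sub-blocks with second-level transform sizes 2x2, 2x4 and 4x4, respectively. These match the sizes given for the three formats.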
[0030] The present invention also relates to the intra prediction
part of the coding. In a preferred embodiment of the invention,
DC-prediction for the 4:2:2 format is provided. DC-prediction
predicts one value for a whole block. In this case we want to
predict one value for all the pixels in an 8×16 block from the
neighboring, already coded and decoded pixels. This is indicated in
FIG. 4, where the 8×16 block is predicted from the 24 neighboring
pixels in bold.
[0031] A natural prediction would be to take the average of all 24
bold pixels:
[0032] $$\text{Prediction} = \frac{\text{Sum}(24\ \text{neighboring pixels})}{24}$$
[0033] However, it is desirable to avoid the division by 24.
Therefore we use the following definition:
[0034] $$\text{Prediction} = \frac{2 \times \text{Sum}(8\ \text{pixels above}) + \text{Sum}(16\ \text{pixels to the left})}{32}$$
[0035] In this way, the division by 32 can easily be implemented
with a shift operation.
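The 8×16 case can be sketched as below. The argument layout (two plain lists of neighboring sample values) is a hypothetical choice. Doubling the weight of the 8 pixels above makes the total weight 2·8 + 16 = 32, so the division becomes a right shift by 5. The comparison line shows that this weighted prediction differs from the plain 24-pixel average when the two edges differ.

```python
# Sketch of the 8x16 DC prediction (block 8 pixels wide, 16 pixels tall).
# The list-based argument layout is a hypothetical choice.
def dc_predict_8x16(above: list, left: list) -> int:
    """Predict one value for the whole block from its 24 neighboring pixels."""
    if len(above) != 8 or len(left) != 16:
        raise ValueError("expected 8 pixels above and 16 pixels to the left")
    total = 2 * sum(above) + sum(left)  # total weight 2*8 + 16 = 32
    return total >> 5                   # equals total // 32 for non-negative sums


above, left = [80] * 8, [120] * 16
print(dc_predict_8x16(above, left))     # 100
print((sum(above) + sum(left)) / 24)    # plain 24-pixel average, about 106.7
```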
[0036] To take advantage of the shift operation in the general
case, the DC-prediction has to be executed on rectangular blocks of
size $2^q \times 2^r$, where q and r are integers and q > r; q is
defined to represent a first dimension of the block, and r is
defined to represent a second dimension of the block. The first
dimension may represent the vertical size and the second dimension
the horizontal size of the block, or vice versa. DC prediction of
the block is formed as:
[0037] $$\text{Prediction} = \frac{\text{Sum}(\text{neighboring pixels along the first dimension}) + 2^{q-r} \times \text{Sum}(\text{neighboring pixels along the second dimension})}{2^{q+1}}$$
[0038] It follows from the discussion above that $m = 2^q$ and
$n = 2^r$.
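The general rule can be sketched as a single function. In the sketch, `first` holds the 2^q neighboring pixels along the first (longer) dimension and `second` holds the 2^r neighboring pixels along the second dimension. This argument layout is a hypothetical choice. The shorter side is weighted by 2^(q-r), so the weights sum to 2^q + 2^(q-r)·2^r = 2^(q+1) and the division is again a right shift. With q = 4 and r = 3, the function reduces to the 8×16 formula above.

```python
# Sketch of the general DC prediction for a 2**q x 2**r block with q > r.
# The argument layout is a hypothetical choice.
def dc_predict(first: list, second: list, q: int, r: int) -> int:
    if not q > r >= 0:
        raise ValueError("requires integers q > r >= 0")
    if len(first) != 2 ** q or len(second) != 2 ** r:
        raise ValueError("expected 2**q and 2**r neighboring pixels")
    # Weight 2**(q-r) on the shorter side makes the weights sum to 2**(q+1).
    total = sum(first) + (sum(second) << (q - r))
    return total >> (q + 1)  # division by 2**(q+1) as a right shift


# 8x16 block (q=4, r=3): 16 pixels to the left, 8 pixels above.
print(dc_predict([120] * 16, [80] * 8, q=4, r=3))  # 100
```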
[0039] With the present invention, the first-level transform is
kept unchanged in the sense that the chrominance pixels of a
macroblock are divided into 4×4 sub-blocks, as indicated for 4:2:2
and 4:4:4 in FIG. 2, and each sub-block undergoes a 4×4 transform.
The second-level transform of DC coefficients will be of size 2×4
and 4×4 for the two higher formats, as depicted in FIG. 3. Hence the
main difference between coding the different formats is the
second-level residual chrominance transform.
[0040] Note that the scope of the present invention is not limited
to H.264/AVC. It could advantageously also be utilized in
connection with other video coding standards, such as SIP.
[0041] While this invention has been particularly shown and
described with references to preferred embodiments thereof, it will
be understood by those skilled in the art that various changes in
form and details may be made therein without departing from the
scope of the invention encompassed by the appended claims.