U.S. patent application number 14/184688, for a video coding method using at least evaluated visual quality and related video coding apparatus, was filed with the patent office on 2014-02-19 and published on 2014-09-11.
This patent application is currently assigned to MEDIATEK INC. The applicant listed for this patent is MEDIATEK INC. Invention is credited to Ding-Yun Chen, Cheng-Tsai Ho, Chi-Cheng Ju.
Application Number | 20140254659 14/184688 |
Document ID | / |
Family ID | 51487782 |
Filed Date | 2014-02-19 |
United States Patent Application | 20140254659
Kind Code | A1
Ho; Cheng-Tsai ; et al. | September 11, 2014
VIDEO CODING METHOD USING AT LEAST EVALUATED VISUAL QUALITY AND
RELATED VIDEO CODING APPARATUS
Abstract
One exemplary video coding method includes at least the
following steps: utilizing a visual quality evaluation module for
evaluating visual quality based on data involved in a coding loop;
and referring to at least the evaluated visual quality for deciding
a target configuration of at least one of a coding unit, a
transform unit and a prediction unit. Another exemplary video
coding method includes at least the following steps: utilizing a
visual quality evaluation module for evaluating visual quality
based on data involved in a coding loop; and referring to at least
the evaluated visual quality for deciding a target coding parameter
associated with at least one of a coding unit, a transform unit and
a prediction unit in video coding.
Inventors: | Ho; Cheng-Tsai (Taichung City, TW); Ju; Chi-Cheng (Hsinchu City, TW); Chen; Ding-Yun (Taipei City, TW)
Applicant: |
Name | City | State | Country | Type
MEDIATEK INC. | Hsin-Chu | | TW |
Assignee: | MEDIATEK INC. (Hsin-Chu, TW)
Family ID: | 51487782
Appl. No.: | 14/184688
Filed: | February 19, 2014
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
61776053 | Mar 11, 2013 |
Current U.S. Class: | 375/240.02; 375/240.12; 375/240.13; 375/240.16
Current CPC Class: | H04N 19/154 20141101; H04N 19/117 20141101; H04N 19/82 20141101; H04N 19/46 20141101; H04N 19/18 20141101; H04N 19/196 20141101
Class at Publication: | 375/240.02; 375/240.12; 375/240.16; 375/240.13
International Class: | H04N 19/105 20060101 H04N019/105; H04N 19/18 20060101 H04N019/18; H04N 19/196 20060101 H04N019/196; H04N 19/139 20060101 H04N019/139
Claims
1. A video coding method, comprising: utilizing a visual quality
evaluation module for evaluating visual quality based on data
involved in a coding loop; and referring to at least the evaluated
visual quality for deciding a target configuration of at least one
of a coding unit, a transform unit and a prediction unit.
2. The video coding method of claim 1, wherein the data involved in
the coding loop is raw data of a source frame.
3. The video coding method of claim 1, wherein the data involved in
the coding loop is processed data derived from raw data of a source
frame.
4. The video coding method of claim 3, wherein the processed data
includes transformed coefficients, quantized coefficients,
reconstructed pixel data, motion-compensated pixel data, or
intra-predicted pixel data.
5. The video coding method of claim 1, wherein the evaluated visual
quality is derived from checking at least one image characteristic
that affects human visual perception, and the at least one image
characteristic includes sharpness, noise, blur, edge, dynamic
range, blocking artifact, mean intensity, color temperature, scene
composition, human face, animal presence, image content that
attracts more or less interest, spatial masking, temporal masking,
or frequency masking.
6. The video coding method of claim 1, wherein the step of
evaluating the visual quality comprises: calculating a single
visual quality metric according to the data involved in the coding
loop; and determining each evaluated visual quality solely based on
the single visual quality metric.
7. The video coding method of claim 1, wherein the step of
evaluating the visual quality comprises: calculating a plurality of
distinct visual quality metrics according to the data involved in
the coding loop; and determining each evaluated visual quality
based on the distinct visual quality metrics.
8. The video coding method of claim 7, wherein the step of
determining each evaluated visual quality based on the distinct
visual quality metrics comprises: determining a plurality of
weighting factors; and determining each evaluated visual quality by
combining the distinct visual quality metrics according to the
weighting factors.
9. The video coding method of claim 8, wherein the weighting
factors are determined by training.
10. The video coding method of claim 1, wherein the step of
deciding the target configuration of at least one of the coding
unit, the transform unit and the prediction unit comprises:
deciding a best mode from different intra modes of the coding unit;
deciding a best mode from different inter modes of the coding unit;
or deciding that the coding unit is an intra-mode coding unit or an
inter-mode coding unit.
11. The video coding method of claim 1, wherein the step of
deciding the target configuration of at least one of the coding
unit, the transform unit and the prediction unit comprises:
deciding a size of the prediction unit; or deciding that the
prediction unit is a symmetric prediction unit or an asymmetric
prediction unit.
12. The video coding method of claim 1, wherein the step of
deciding the target configuration of at least one of the coding
unit, the transform unit and the prediction unit comprises:
deciding a size of the transform unit; deciding a quad-tree depth
of the transform unit; or deciding that the transform unit is a
residual quad-tree (RQT) transform unit or a non-square quad-tree
(NSQT) transform unit.
13. The video coding method of claim 1, further comprising:
calculating pixel-based distortion based on at least a portion of
raw data of a source frame and at least a portion of processed data
derived from the raw data of the source frame; wherein the step of
deciding the target configuration of at least one of the coding
unit, the transform unit and the prediction unit comprises:
deciding the target configuration of at least one of the coding
unit, the transform unit and the prediction unit according to the
evaluated visual quality and the pixel-based distortion.
14. The video coding method of claim 13, wherein the step of
deciding the target configuration of at least one of the coding
unit, the transform unit and the prediction unit according to the
evaluated visual quality and the pixel-based distortion comprises:
performing a coarse decision according to one of the evaluated
visual quality and the pixel-based distortion to determine a
plurality of coarse configuration settings; and performing a fine
decision according to another of the evaluated visual quality and
the pixel-based distortion to determine at least one fine
configuration setting from the coarse configuration settings,
wherein the target configuration is derived from the at least one
fine configuration setting.
15. A video coding method, comprising: utilizing a visual quality
evaluation module for evaluating visual quality based on data
involved in a coding loop; and referring to at least the evaluated
visual quality for deciding a target coding parameter associated
with at least one of a coding unit, a transform unit and a
prediction unit in video coding.
16. The video coding method of claim 15, wherein the target coding
parameter is a quantization parameter or a transform parameter.
17. The video coding method of claim 15, wherein the data involved
in the coding loop is raw data of a source frame.
18. The video coding method of claim 15, wherein the data involved
in the coding loop is processed data derived from raw data of a
source frame.
19. The video coding method of claim 18, wherein the processed data
includes transformed coefficients, quantized coefficients,
reconstructed pixel data, motion-compensated pixel data, or
intra-predicted pixel data.
20. The video coding method of claim 15, wherein the evaluated
visual quality is derived from checking at least one image
characteristic that affects human visual perception, and the at
least one image characteristic includes sharpness, noise, blur,
edge, dynamic range, blocking artifact, mean intensity, color
temperature, scene composition, human face, animal presence, image
content that attracts more or less interest, spatial masking,
temporal masking, or frequency masking.
21. The video coding method of claim 15, wherein the step of
evaluating the visual quality comprises: calculating a single
visual quality metric according to the data involved in the coding
loop; and determining each evaluated visual quality solely based on
the single visual quality metric.
22. The video coding method of claim 15, wherein the step of
evaluating the visual quality comprises: calculating a plurality of
distinct visual quality metrics according to the data involved in
the coding loop; and determining each evaluated visual quality
based on the distinct visual quality metrics.
23. The video coding method of claim 22, wherein the step of
determining each evaluated visual quality based on the distinct
visual quality metrics comprises: determining a plurality of
weighting factors; and determining each evaluated visual quality by
combining the distinct visual quality metrics according to the
weighting factors.
24. The video coding method of claim 23, wherein the weighting
factors are determined by training.
25. The video coding method of claim 15, further comprising:
calculating pixel-based distortion based on at least a portion of
raw data of a source frame and at least a portion of processed data
derived from the raw data of the source frame; wherein the step of
deciding the target coding parameter comprises: deciding the target
coding parameter according to the evaluated visual quality and the
pixel-based distortion.
26. The video coding method of claim 25, wherein the step of
deciding the target coding parameter according to the evaluated
visual quality and the pixel-based distortion comprises: performing
a coarse decision according to one of the evaluated visual quality
and the pixel-based distortion to determine a plurality of coarse
parameter settings; and performing a fine decision according to
another of the evaluated visual quality and the pixel-based
distortion to determine at least one fine parameter setting from
the coarse parameter settings, wherein the target coding parameter
is derived from the at least one fine parameter setting.
27. The video coding method of claim 15, wherein the target coding
parameter is included in a bitstream generated by encoding a source
frame.
28. A video coding apparatus, comprising: a visual quality
evaluation module, arranged to evaluate visual quality based on
data involved in a coding loop; and a coding circuit, comprising
the coding loop, the coding circuit arranged to refer to at least
the evaluated visual quality for deciding a target configuration of
at least one of a coding unit, a transform unit and a prediction
unit.
29. A video coding apparatus, comprising: a visual quality
evaluation module, arranged to evaluate visual quality based on
data involved in a coding loop; and a coding circuit, comprising
the coding loop, the coding circuit arranged to refer to at least
the evaluated visual quality for deciding a target coding parameter
associated with at least one of a coding unit, a transform unit and
a prediction unit in video coding.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. provisional
application No. 61/776,053, filed on Mar. 11, 2013 and incorporated
herein by reference.
BACKGROUND
[0002] The disclosed embodiments of the present invention relate to
video coding, and more particularly, to a video coding method using
at least evaluated visual quality determined by one or more visual
quality metrics and a related video coding apparatus.
[0003] The conventional video coding standards generally adopt a
block based (or coding unit based) coding technique to exploit
spatial redundancy. For example, the basic approach is to divide
the whole source frame into a plurality of blocks (coding units),
perform prediction on each block (coding unit), transform residues
of each block (coding unit) using discrete cosine transform, and
perform quantization and entropy encoding. Besides, a reconstructed
frame is generated in a coding loop to provide reference pixel data
used for coding following blocks (coding units). For certain video
coding standards, in-loop filter(s) may be used for enhancing the
image quality of the reconstructed frame. For example, a
de-blocking filter is included in an H.264 coding loop, and a
de-blocking filter and a sample adaptive offset (SAO) filter are
included in an HEVC (High Efficiency Video Coding) coding loop.
[0004] Generally speaking, the coding loop is composed of a
plurality of processing stages, including transform, quantization,
intra/inter prediction, etc. Based on the conventional video coding
standards, one processing stage selects a video coding mode based
on pixel-based distortion value derived from a source frame (i.e.,
an input frame to be encoded) and a reference frame (i.e., a
reconstructed frame generated during the coding procedure). For
example, the pixel-based distortion value may be a sum of absolute
differences (SAD), a sum of absolute transformed differences (SATD), or a
sum of square differences (SSD). However, the pixel-based
distortion value merely considers pixel value differences between
pixels of the source frame and the reference frame, and sometimes
is not correlated to the actual visual quality of a reconstructed
frame generated from decoding an encoded frame. Specifically, based
on experimental results, different processed images, each derived
from an original image and having the same pixel-based distortion
(e.g., the same mean square error (MSE)) with respect to the
original image, may present different visual quality to a viewer.
That is, smaller pixel-based distortion does not necessarily mean better
visual quality in the human visual system. Hence, an encoded frame
generated based on video coding modes each selected due to a
smallest pixel-based distortion value does not guarantee that a
reconstructed frame generated from decoding the encoded frame would
have the best visual quality.
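The point that equal pixel-based distortion need not imply equal visual quality can be illustrated with a small sketch. The pixel values and the two error patterns below are made up for illustration and are not taken from the application: a uniform brightness shift and an alternating high-frequency error yield identical MSE, SAD and SSD, although a viewer may well perceive them differently.

```python
# Illustrative only: two different error patterns with identical
# pixel-based distortion. All pixel values are hypothetical.

def sad(a, b):
    """Sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def ssd(a, b):
    """Sum of square differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mse(a, b):
    """Mean square error."""
    return ssd(a, b) / len(a)

# Hypothetical 8-pixel row of a source block.
source = [100, 102, 104, 106, 108, 110, 112, 114]

# Pattern A: uniform brightness shift of +4 on every pixel.
shifted = [p + 4 for p in source]

# Pattern B: alternating +4 / -4 error (a high-frequency pattern).
alternating = [p + (4 if i % 2 == 0 else -4) for i, p in enumerate(source)]

mse_shift = mse(source, shifted)
mse_alt = mse(source, alternating)
print(mse_shift, mse_alt)
```

A mode decision driven only by these distortion values cannot tell the two patterns apart, which is the gap the evaluated visual quality is meant to address.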
SUMMARY
[0005] In accordance with exemplary embodiments of the present
invention, a video coding method using at least evaluated visual
quality obtained by one or more visual quality metrics and a
related video coding apparatus are proposed.
[0006] According to a first aspect of the present invention, an
exemplary video coding method is disclosed. The exemplary video
coding method includes at least the following steps: utilizing a
visual quality evaluation module for evaluating visual quality
based on data involved in a coding loop; and referring to at least
the evaluated visual quality for deciding a target configuration of
at least one of a coding unit, a transform unit and a prediction
unit.
[0007] According to a second aspect of the present invention,
another exemplary video coding method is disclosed. This other
exemplary video coding method includes at least the following
steps: utilizing a visual quality evaluation module for evaluating
visual quality based on data involved in a coding loop; and
referring to at least the evaluated visual quality for deciding a
target coding parameter associated with at least one of a coding
unit, a transform unit and a prediction unit in video coding.
[0008] According to a third aspect of the present invention, an
exemplary video coding apparatus is disclosed. The exemplary video
coding apparatus includes a visual quality evaluation module and a
coding circuit. The visual quality evaluation module is arranged to
evaluate visual quality based on data involved in a coding loop.
The coding circuit has the coding loop included therein, and is
arranged to refer to at least the evaluated visual quality for
deciding a target configuration of at least one of a coding unit, a
transform unit and a prediction unit.
[0009] According to a fourth aspect of the present invention,
another exemplary video coding apparatus is disclosed. This other
exemplary video coding apparatus includes a visual quality
evaluation module and a coding circuit. The visual quality
evaluation module is arranged to evaluate visual quality based on
data involved in a coding loop. The coding circuit has the coding
loop included therein, and is arranged to refer to at least the
evaluated visual quality for deciding a target coding parameter
associated with at least one of a coding unit, a transform unit and
a prediction unit in video coding.
[0010] These and other objectives of the present invention will no
doubt become obvious to those of ordinary skill in the art after
reading the following detailed description of the preferred
embodiment that is illustrated in the various figures and
drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIG. 1 is a block diagram illustrating a video coding
apparatus according to an embodiment of the present invention.
[0012] FIG. 2 is a diagram illustrating inter modes for a coding
unit to be configured by the video coding apparatus shown in FIG.
1.
[0013] FIG. 3 is a diagram illustrating intra 4×4 modes for a
coding unit to be configured by the video coding apparatus shown in
FIG. 1.
[0014] FIG. 4 is a diagram illustrating intra 16×16 modes for
a coding unit to be configured by the video coding apparatus shown
in FIG. 1.
[0015] FIG. 5 is a diagram illustrating partition modes for a
prediction unit to be configured by the video coding apparatus
shown in FIG. 1.
[0016] FIG. 6 is a diagram illustrating partition modes for a
transform unit to be configured by the video coding apparatus shown
in FIG. 1.
[0017] FIG. 7 is a flowchart illustrating a video coding method
according to a first embodiment of the present invention.
[0018] FIG. 8 is a flowchart illustrating a video coding method
according to a second embodiment of the present invention.
DETAILED DESCRIPTION
[0019] Certain terms are used throughout the description and
following claims to refer to particular components. As one skilled
in the art will appreciate, manufacturers may refer to a component
by different names. This document does not intend to distinguish
between components that differ in name but not function. In the
following description and in the claims, the terms "include" and
"comprise" are used in an open-ended fashion, and thus should be
interpreted to mean "include, but not limited to . . . ". Also, the
term "couple" is intended to mean either an indirect or direct
electrical connection. Accordingly, if one device is coupled to
another device, that connection may be through a direct electrical
connection, or through an indirect electrical connection via other
devices and connections.
[0020] The concept of the present invention is to incorporate
characteristics of a human visual system into a video coding
procedure to improve the video compression efficiency or visual
quality. More specifically, visual quality evaluation is involved
in the video coding procedure such that a reconstructed frame
generated from decoding an encoded frame is capable of having
enhanced visual quality. Further details of the proposed visual
quality based video coding design are described as below.
[0021] FIG. 1 is a block diagram illustrating a video coding
apparatus according to an embodiment of the present invention. The
video coding apparatus 100 is used to encode a source frame
IMG_IN to generate a bitstream BS carrying encoded frame
information corresponding to the source frame IMG_IN. In this
embodiment, the video coding apparatus 100 includes a coding
circuit 102 and a visual quality evaluation module 104. By way of
example, but not limitation, the architecture of the coding circuit
102 may be configured based on any conventional video encoding
architecture. It should be noted that the coding circuit 102 may
follow the conventional video encoding architecture to have a
plurality of processing stages implemented therein; however, this
by no means implies that each of the processing stages included in
the coding circuit 102 must be implemented using a conventional
design. For example, any of the processing stages that is
associated with the visual quality evaluation performed by the
visual quality evaluation module 104 and/or is affected/controlled
by the visual quality obtained by the visual quality evaluation
module 104 still falls within the scope of the present
invention.
[0022] As shown in FIG. 1, the coding circuit 102 includes a coding
loop composed of a splitting module 111, a subtractor (i.e., an
adder configured to perform a subtraction operation) 112, a
transform module 113, a quantization module 114, an inverse
quantization module 116, an inverse transform module 117, an adder
118, a de-blocking filter 119, a sample adaptive offset (SAO)
filter 120, a frame buffer 121, an inter prediction module 122, and
an intra prediction module 123, where the inter prediction module
122 includes a motion estimation unit 124 and a motion compensation
unit 125. The coding circuit 102 further includes an entropy coding
module 115 arranged to generate the bitstream BS by performing
entropy encoding upon quantized coefficients generated from the
quantization module 114. It should be noted that one or both of the
de-blocking filter 119 and the SAO filter 120 may be
omitted/bypassed for certain applications. That is, the de-blocking
filter 119 and/or the SAO filter 120 may be optional, depending
upon actual design requirement. As a person skilled in the
pertinent art should readily understand fundamental operations of
the processing stages included in the coding circuit 102, further
description is omitted here for brevity. Concerning one or more of
the processing stages that are affected/controlled by the visual
quality determined by the visual quality evaluation module 104,
further description will be given as below.
[0023] The key feature of the present invention is using the visual
quality evaluation module 104 for evaluating visual quality based
on data involved in the coding loop of the coding circuit 102. In
one embodiment, the data involved in the coding loop and processed
by the visual quality evaluation module 104 may be raw data of the
source frame IMG_IN. In another embodiment, the data involved
in the coding loop and processed by the visual quality evaluation
module 104 may be processed data derived from raw data of the
source frame IMG_IN. For example, the processed data used to
evaluate the visual quality may be transformed coefficients
generated by the transform module 113, quantized coefficients
generated by the quantization module 114, reconstructed pixel data
before the optional de-blocking filter 119, reconstructed pixel
data after the optional de-blocking filter 119, reconstructed pixel
data before the optional SAO filter 120, reconstructed pixel data
after the optional SAO filter 120, reconstructed pixel data stored
in the frame buffer 121, motion-compensated pixel data generated by
the motion compensation unit 125, or intra-predicted pixel data
generated by the intra prediction module 123.
[0024] The visual quality evaluation performed by the visual
quality evaluation module 104 may calculate one or more visual
quality metrics to decide the evaluated visual quality. For
example, the evaluated visual quality is derived from checking at
least one image characteristic that affects human visual
perception, and the at least one image characteristic may include
sharpness, noise, blur, edge, dynamic range, blocking artifact,
mean intensity (e.g., brightness/luminance), color temperature,
scene composition (e.g., landscape, portrait, night scene, etc.),
human face, animal presence, image content that attracts more or
less interest (e.g., region of interest (ROI)), spatial masking
(i.e., human visual insensitivity to more complex texture),
temporal masking (i.e., human visual insensitivity to high-speed
moving objects), or frequency masking (i.e., human visual
insensitivity to higher pixel value variation). By way of example,
the noise metric may be obtained by calculating an ISO 15739 visual
noise value VN, where
$$VN = \sigma_{L^*} + 0.852\,\sigma_{u^*} + 0.323\,\sigma_{v^*}.$$
Alternatively, the noise metric may be obtained by calculating
other visual noise metric, such as an S-CIELAB metric, a vSNR
(visual signal-to-noise ratio) metric, or a Keelan NPS (noise power
spectrum) based metric. The sharpness/blur metric may be obtained
by measuring edge widths. The edge metric may be a ringing metric
obtained by measuring ripples or oscillations around edges.
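As a rough illustration of the noise metric, the weighted sum of per-channel standard deviations can be sketched as follows. This is a simplified sketch, not the ISO 15739 procedure: it assumes the samples are already expressed in CIE L*u*v* coordinates, assumes the third term refers to the v* channel, and skips any filtering or color conversion steps. The function name `visual_noise` and the sample values are hypothetical.

```python
from statistics import pstdev

def visual_noise(l_star, u_star, v_star):
    """Weighted sum of the standard deviations of the L*, u*, v* samples.

    Assumption: inputs are already in CIE L*u*v* coordinates.
    """
    return (pstdev(l_star)
            + 0.852 * pstdev(u_star)
            + 0.323 * pstdev(v_star))

# Hypothetical samples measured on a nominally uniform patch.
l_star = [50.0, 52.0, 48.0, 50.0]
u_star = [10.0, 10.0, 12.0, 8.0]
v_star = [5.0, 5.0, 5.0, 5.0]

vn = visual_noise(l_star, u_star, v_star)
print(vn)
```

A flat v* channel contributes nothing, so in this example the result is driven by the lightness and u* fluctuations only.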
[0025] In one exemplary design, the visual quality evaluation
module 104 calculates a single visual quality metric (e.g., one of
the aforementioned visual quality metrics) according to the data
involved in the coding loop of the coding circuit 102, and
determines each evaluated visual quality solely based on the single
visual quality metric. In other words, each evaluated visual
quality may be obtained by referring to a single visual quality
metric only.
[0026] In another exemplary design, the visual quality evaluation
module 104 calculates a plurality of distinct visual quality
metrics (e.g., many of the aforementioned visual quality metrics)
according to the data involved in the coding loop of the coding
circuit 102, and determines each evaluated visual quality based on
the distinct visual quality metrics. In other words, each evaluated
visual quality may be obtained by referring to a composition of
multiple visual quality metrics. For example, the visual quality
evaluation module 104 may be configured to assign a plurality of
pre-defined weighting factors to multiple visual quality metrics
(e.g., a noise metric and a sharpness metric), and decide one
evaluated visual quality by a weighted sum derived from the
weighting factors and the visual quality metrics. For another
example, the visual quality evaluation module 104 may employ a
Minkowski equation to determine a plurality of non-linear weighting
factors for the distinct visual quality metrics, respectively; and
then determine one evaluated visual quality by combining the
distinct visual quality metrics according to respective non-linear
weighting factors. Specifically, based on the Minkowski equation,
the evaluated visual quality $\Delta Q_m$ is calculated using the
following equation:
$$\Delta Q_m = \left( \sum_i (\Delta Q_i)^{n_m} \right)^{1/n_m}, \quad \text{where} \quad n_m = 1 + 2 \tanh\left( \frac{(\Delta Q)_{\max}}{16.9} \right),$$
$\Delta Q_i$ is derived from each of the distinct visual quality
metrics, and 16.9 is a single universal parameter based on
psychophysical experiments. For yet another example, the visual
quality evaluation module 104 may employ a training-based manner
(e.g., a support vector machine (SVM)) to determine a plurality of
trained weighting factors for the distinct visual quality metrics,
respectively; and then determine one evaluated visual quality by
combining the distinct visual quality metrics according to
respective trained weighting factors. Specifically, supervised
learning models with associated learning algorithms are employed to
analyze the distinct visual quality metrics and recognize
patterns, and accordingly determine the trained weighting
factors.
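The first two combination schemes described above (a weighted sum with pre-defined weights and the Minkowski combination) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the per-metric values are assumed to be non-negative, and $(\Delta Q)_{\max}$ is taken to be the largest of the individual $\Delta Q_i$ values. The training-based scheme is not shown.

```python
import math

def weighted_sum(metrics, weights):
    """Combine metrics linearly with pre-defined weighting factors."""
    return sum(w * m for w, m in zip(weights, metrics))

def minkowski_combine(delta_qs, scale=16.9):
    """Combine per-metric values with a Minkowski sum.

    Assumptions: every value is non-negative, and (Delta Q)_max is the
    largest individual value.
    """
    if any(dq < 0 for dq in delta_qs):
        raise ValueError("per-metric values must be non-negative")
    dq_max = max(delta_qs)
    n_m = 1.0 + 2.0 * math.tanh(dq_max / scale)
    return sum(dq ** n_m for dq in delta_qs) ** (1.0 / n_m)

# Hypothetical noise and sharpness metric values.
linear_score = weighted_sum([2.0, 4.0], [0.25, 0.75])
combined = minkowski_combine([3.0, 4.0])
print(linear_score, combined)
```

Because the exponent grows with the largest value, the Minkowski result lies between the largest single value and the plain sum, so one dominant degradation weighs more heavily than several small ones.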
[0027] After the evaluated visual quality is generated by the
visual quality evaluation module 104, the evaluated visual quality
is referenced by the coding circuit 102 to control/configure one or
more of the processing stages within the coding circuit 102. As the
evaluated visual quality is involved in making the video coding
mode decision, the source frame IMG_IN is encoded based on
characteristics of the human visual system to thereby allow a
decoded/reconstructed frame to have enhanced visual quality.
[0028] In a first application, the coding circuit 102 may be
arranged to refer to the evaluated visual quality decided by the
visual quality evaluation module 104 for deciding a target configuration
of at least one of a coding unit at the splitting module 111, a
transform unit at the transform module 113 and a prediction unit at
the intra prediction module 123/inter prediction module 122, where
the evaluated visual quality in this case may provide visual
quality information for candidate video coding modes. In an
alternative design, both of the evaluated visual quality (which is
generated based on data involved in the coding loop) and the
pixel-based distortion (which is generated based on at least a
portion of raw data of the source frame IMG_IN and at least a
portion of processed data derived from the raw data of the source
frame IMG_IN) are used to decide the target configuration of at
least one of the coding unit at the splitting module 111, the
transform unit at the transform module 113 and the prediction unit
at the intra prediction module 123/inter prediction module 122,
where the evaluated visual quality in this case may provide visual
quality information for candidate video coding modes, and the
pixel-based distortion in this case may provide distortion
information for candidate video coding modes. Further details
are described as below.
[0029] Concerning the splitting module 111, the evaluated visual
quality (each is determined based on a single visual quality metric
or a composition of multiple visual quality metrics) may be
referenced to decide a best video coding mode for a coding unit.
Specifically, the splitting module 111 may refer to evaluated
visual quality determined by the visual quality evaluation module
104 for each candidate inter mode and each candidate intra mode to
decide which one of candidate inter modes and candidate intra modes
should be selected for a coding unit. Taking the video coding modes
for a coding unit as specified in an H.264 standard for example,
the inter modes include four MB-modes and four 8×8-modes as
shown in FIG. 2, and the intra modes include nine 4×4 modes
as shown in FIG. 3 and four 16×16 modes as shown in FIG. 4.
The conventional video coding design calculates a pixel-based
distortion value Distortion (C, R) for each candidate mode, where C
represents pixels in a current source frame, R represents pixels in a
reconstructed frame, and the distortion value Distortion (C, R) may
be an SAD value, an SATD value or an SSD value. Next, the
conventional video coding design finds a best inter mode (e.g.,
$$\min_{16\times16,\ 16\times8,\ 8\times16,\ 8\times8,\ 4\times8,\ 8\times4} \{\mathrm{Distortion}(C, R)\}$$),
finds a best intra 16×16 mode (e.g.,
$$\min_{4\;16\times16\;\text{modes}} \{\mathrm{Distortion}(C, R)\}$$),
finds a best intra 4×4 mode (e.g.,
$$\min_{9\;4\times4\;\text{modes}} \{\mathrm{Distortion}(C, R)\}$$),
finds a best intra mode (e.g., min (best intra 4×4 mode, best
intra 16×16 mode)), and finally decides a best video coding
mode for the coding unit (e.g., min (best intra mode, best inter
mode)). In contrast to the conventional video coding design, the
present invention proposes using the evaluated visual quality VQ(C
or R') derived from data involved in the coding loop of the coding
circuit 102 to find the best video coding mode for a coding unit,
where each evaluated visual quality VQ(C or R') for each candidate
mode may be obtained by a single visual quality metric or a
composition of multiple visual quality metrics, C represents raw
data of the source frame IMG_IN, and R' represents processed
data derived from raw data of the source frame IMG_IN.
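The conventional distortion-driven selection described in this paragraph can be sketched as a minimum search over candidate modes. The block values, the candidate reconstructions and the helper name `best_mode_by_distortion` are hypothetical; a real encoder would evaluate full blocks and typically also account for bit cost, which is outside this sketch.

```python
def sad(cur, rec):
    """Sum of absolute differences between current and reconstructed pixels."""
    return sum(abs(c - r) for c, r in zip(cur, rec))

def ssd(cur, rec):
    """Sum of square differences between current and reconstructed pixels."""
    return sum((c - r) ** 2 for c, r in zip(cur, rec))

def best_mode_by_distortion(current, reconstructions, distortion=sad):
    """Return the candidate mode whose reconstruction has minimum distortion.

    reconstructions maps a mode name to its reconstructed pixels.
    """
    return min(reconstructions,
               key=lambda mode: distortion(current, reconstructions[mode]))

# Hypothetical 4-pixel block and per-mode reconstructions.
current = [10, 20, 30, 40]
candidates = {
    "16x16": [12, 22, 32, 40],  # SAD 6, SSD 12
    "16x8": [10, 20, 30, 45],   # SAD 5, SSD 25
    "8x8": [13, 23, 30, 40],    # SAD 6, SSD 18
}

print(best_mode_by_distortion(current, candidates, sad))
print(best_mode_by_distortion(current, candidates, ssd))
```

In this made-up example SAD and SSD already disagree on the winning mode, which shows how sensitive the decision is to the chosen pixel-based measure.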
[0030] Preferably, the splitting module 111 finds a best inter mode
(e.g.,
$$\operatorname{best}_{16\times16,\ 16\times8,\ 8\times16,\ 8\times8,\ 4\times8,\ 8\times4} \{VQ(C \text{ or } R')\}$$),
finds a best intra 16×16 mode (e.g.,
$$\operatorname{best}_{4\;16\times16\;\text{modes}} \{VQ(C \text{ or } R')\}$$),
finds a best intra 4×4 mode (e.g.,
$$\operatorname{best}_{9\;4\times4\;\text{modes}} \{VQ(C \text{ or } R')\}$$),
finds a best intra mode (e.g., min (best intra 4×4 mode, best
intra 16×16 mode)), and finally determines a best video
coding mode for the coding unit (e.g., min (best intra mode, best
inter mode)). Briefly summarized, the operation of referring to the
evaluated visual quality for deciding the target configuration of
the coding unit at the splitting module 111 may include: deciding a
best mode from different intra modes of the coding unit; deciding a
best mode from different inter modes of the coding unit; and/or
deciding that the coding unit is an intra-mode coding unit or an
inter-mode coding unit.
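The hierarchical selection summarized above (best inter mode, best intra 16×16 mode, best intra 4×4 mode, best intra mode, then the final intra/inter choice) might be sketched as below. The sketch assumes that each candidate mode already has a visual quality score and that a higher score means better quality; the function names, mode labels and score values are hypothetical, and only a subset of the nine intra 4×4 modes is listed.

```python
from typing import Dict, Tuple


def pick_best(vq: Dict[str, float]) -> Tuple[str, float]:
    """Return (mode, score) with the highest score (assumption: higher is better)."""
    mode = max(vq, key=vq.get)
    return mode, vq[mode]


def decide_cu_mode(
    inter_vq: Dict[str, float],
    intra16_vq: Dict[str, float],
    intra4_vq: Dict[str, float],
) -> Tuple[str, str]:
    """Return ("inter" or "intra", chosen mode) for one coding unit."""
    best_inter = pick_best(inter_vq)
    best_intra16 = pick_best(intra16_vq)
    best_intra4 = pick_best(intra4_vq)
    # Best intra mode: the better of the two intra winners.
    best_intra = max(best_intra4, best_intra16, key=lambda ms: ms[1])
    # Final decision: intra-mode coding unit or inter-mode coding unit.
    if best_inter[1] >= best_intra[1]:
        return "inter", best_inter[0]
    return "intra", best_intra[0]


# Hypothetical visual quality scores per candidate mode.
inter_vq = {"16x16": 0.81, "16x8": 0.84, "8x16": 0.79, "8x8": 0.83}
intra16_vq = {"i16_vertical": 0.70, "i16_horizontal": 0.72, "i16_dc": 0.75, "i16_plane": 0.74}
intra4_vq = {"i4_vertical": 0.88, "i4_horizontal": 0.82, "i4_dc": 0.80}

print(decide_cu_mode(inter_vq, intra16_vq, intra4_vq))  # ('intra', 'i4_vertical')
```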
[0031] Alternatively, both of the evaluated visual quality (e.g.,
VQ(C or R')) and the pixel-based distortion (e.g., Distortion (C,
R)) may be involved in deciding the best video coding mode for a
coding unit. For example, the splitting module 111 refers to the
evaluated visual quality and the calculated pixel-based distortion
to find a best inter mode (e.g., one of
$\operatorname{best}_{16\times16,\ 16\times8,\ 8\times16,\ 8\times8,\ 4\times8,\ 8\times4} \{\mathrm{VQ}(C \text{ or } R')\}$ and
$\min_{16\times16,\ 16\times8,\ 8\times16,\ 8\times8,\ 4\times8,\ 8\times4} \{\mathrm{Distortion}(C, R)\}$),
find a best intra 16×16 mode (e.g., one of
$\operatorname{best}_{\text{four } 16\times16 \text{ modes}} \{\mathrm{VQ}(C \text{ or } R')\}$ and
$\min_{\text{four } 16\times16 \text{ modes}} \{\mathrm{Distortion}(C, R)\}$),
find a best intra 4×4 mode (e.g., one of
$\operatorname{best}_{\text{nine } 4\times4 \text{ modes}} \{\mathrm{VQ}(C \text{ or } R')\}$ and
$\min_{\text{nine } 4\times4 \text{ modes}} \{\mathrm{Distortion}(C, R)\}$),
find a best intra mode (e.g., min (best intra 4×4 mode, best
intra 16×16 mode)), and finally determine a best video coding
mode for the coding unit (e.g., min (best intra mode, best inter
mode)).
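The text above states that one of the visual-quality winner and the minimum-distortion winner is selected, but it does not specify the selection rule. The sketch below uses one hypothetical rule purely for illustration: keep the visual-quality winner unless its distortion exceeds the minimum distortion by more than a relative `tolerance`. The function name, the `tolerance` parameter and all score values are assumptions.

```python
from typing import Dict


def select_mode(vq: Dict[str, float], dist: Dict[str, float], tolerance: float = 0.25) -> str:
    """Select one of the VQ-best mode and the minimum-distortion mode.

    Hypothetical rule (not taken from the source text): prefer the VQ-best
    mode while its distortion stays within (1 + tolerance) times the minimum.
    """
    vq_best = max(vq, key=vq.get)        # best{VQ(C or R')}, higher is better
    dist_best = min(dist, key=dist.get)  # min{Distortion(C, R)}
    if dist[vq_best] <= dist[dist_best] * (1.0 + tolerance):
        return vq_best
    return dist_best


# Hypothetical per-mode scores for three inter partitions.
vq = {"16x16": 0.80, "16x8": 0.86, "8x8": 0.83}
dist = {"16x16": 100.0, "16x8": 120.0, "8x8": 140.0}

print(select_mode(vq, dist))                  # 16x8
print(select_mode(vq, dist, tolerance=0.1))   # 16x16
```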
[0032] For another example, the splitting module 111 performs a
coarse decision according to one of the evaluated visual quality
and the calculated pixel-based distortion to determine a plurality
of coarse configuration settings for a coding unit, and performs a
fine decision according to another of the evaluated visual quality
and the pixel-based distortion to determine at least one fine
configuration setting from the coarse configuration settings,
wherein the target configuration for the coding unit is derived
from the at least one fine configuration setting. Taking the video
coding modes for the coding unit as specified in an H.264 standard
for example, the evaluated visual quality may be used to find M
coarse configuration settings for a coding unit, such as some inter
modes, some intra 4×4 modes and/or some intra 16×16
modes, from all possible N candidate configuration settings, and
then the pixel-based distortion may be used to select P fine
configuration settings for the coding unit (N > M and
M > P ≥ 1) from the inter modes, intra 4×4 modes and/or
intra 16×16 modes selected based on the evaluated visual
quality. In a case where P=1, a best video coding mode for the
coding unit is determined by the fine decision based on the
pixel-based distortion. In an alternative design, the pixel-based
distortion may be used to find M coarse configuration settings for
a coding unit, such as some inter modes, some intra 4×4 modes
and/or some intra 16×16 modes, from all possible N candidate
configuration settings, and then the evaluated visual quality may
be used to select P fine configuration settings for the coding
unit (N > M and M > P ≥ 1) from the inter modes, intra
4×4 modes and/or intra 16×16 modes selected based on
the pixel-based distortion. In a case where P=1, a best video
coding mode for the coding unit is determined by the fine decision
based on the evaluated visual quality.
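The coarse decision followed by the fine decision can be sketched as a two-stage filter that narrows N candidates to M and then to P, with N > M > P ≥ 1. The function name `coarse_to_fine`, the `vq_first` flag, the mode labels and the score values are hypothetical; visual quality is assumed to be "higher is better" and distortion "lower is better".

```python
from typing import Dict, List


def coarse_to_fine(
    vq: Dict[str, float],
    dist: Dict[str, float],
    m: int,
    p: int,
    vq_first: bool = True,
) -> List[str]:
    """Keep M of N candidates by one criterion, then P of those M by the other."""
    n = len(vq)
    if set(vq) != set(dist):
        raise ValueError("vq and dist must cover the same candidates")
    if not (n > m > p >= 1):
        raise ValueError("require N > M > P >= 1")

    def by_vq(mode: str) -> float:
        return -vq[mode]   # negate so that sorting ascending puts best quality first

    def by_dist(mode: str) -> float:
        return dist[mode]  # smallest distortion first

    coarse_key, fine_key = (by_vq, by_dist) if vq_first else (by_dist, by_vq)
    coarse = sorted(vq, key=coarse_key)[:m]   # M coarse configuration settings
    return sorted(coarse, key=fine_key)[:p]   # P fine configuration settings


# Hypothetical scores for N = 5 candidate configuration settings.
vq = {"inter_16x16": 0.80, "inter_8x8": 0.86, "intra4_dc": 0.84,
      "intra16_dc": 0.70, "intra16_plane": 0.65}
dist = {"inter_16x16": 100, "inter_8x8": 140, "intra4_dc": 120,
        "intra16_dc": 90, "intra16_plane": 150}

print(coarse_to_fine(vq, dist, m=3, p=1))                  # ['inter_16x16']
print(coarse_to_fine(vq, dist, m=3, p=1, vq_first=False))  # ['intra4_dc']
```

With P=1 the returned list holds the single best video coding mode; swapping the order of the two criteria changes the result on this input.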
[0033] Concerning the intra prediction module 123/inter prediction
module 122, the evaluated visual quality (each is determined based
on a single visual quality metric or a composition of multiple
visual quality metrics) may be referenced to decide a best video
coding mode for a prediction unit. Taking the video coding modes
for a prediction unit as specified in an HEVC (High Efficiency
Video Coding) standard for example, the intra prediction includes
two partition modes for the prediction unit, and the inter
prediction includes eight partition modes for the prediction unit,
as shown in FIG. 5. The conventional video coding design calculates
a distortion value Distortion (C, R) for each candidate mode, where
C represents pixels in a current source frame, R represents pixels in
a reconstructed frame, and the distortion value Distortion (C, R)
may be an SAD value, an SATD value or an SSD value. Next, the
conventional video coding design decides the configuration of a
prediction unit by finding a best video coding mode with a smallest
distortion value among distortion values of the candidate modes. In
contrast to the conventional video coding design, the present
invention proposes using the evaluated visual quality VQ(C or R')
derived from data involved in the coding loop of the coding circuit
102 to find a best video coding mode for a prediction unit, where
each evaluated visual quality VQ(C or R') may be a single visual
quality metric or a composition of multiple visual quality metrics,
C represents raw data of the source frame IMG_IN, and R'
represents processed data derived from raw data of the source frame
IMG_IN. Briefly summarized, the operation of referring to the
evaluated visual quality for deciding the target configuration of
the prediction unit at the intra prediction module 123/inter
prediction module 122 may include: deciding a size of the
prediction unit; and/or deciding that the prediction unit is a
symmetric prediction unit or an asymmetric prediction unit.
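For reference, the candidate set that such a prediction unit decision ranges over can be enumerated as below. The partition names follow common HEVC notation and are assumed to correspond to the two intra and eight inter partition modes of FIG. 5; the function names are hypothetical.

```python
from typing import List, Tuple

INTRA_PARTITIONS = ("2Nx2N", "NxN")
INTER_PARTITIONS = ("2Nx2N", "2NxN", "Nx2N", "NxN",
                    "2NxnU", "2NxnD", "nLx2N", "nRx2N")
ASYMMETRIC = frozenset({"2NxnU", "2NxnD", "nLx2N", "nRx2N"})


def partition_sizes(mode: str, cu_size: int) -> List[Tuple[int, int]]:
    """Return (width, height) of each prediction unit in a cu_size x cu_size coding unit."""
    s, h, q = cu_size, cu_size // 2, cu_size // 4
    table = {
        "2Nx2N": [(s, s)],
        "2NxN": [(s, h), (s, h)],
        "Nx2N": [(h, s), (h, s)],
        "NxN": [(h, h)] * 4,
        "2NxnU": [(s, q), (s, s - q)],   # short partition on top
        "2NxnD": [(s, s - q), (s, q)],   # short partition at the bottom
        "nLx2N": [(q, s), (s - q, s)],   # narrow partition on the left
        "nRx2N": [(s - q, s), (q, s)],   # narrow partition on the right
    }
    return table[mode]


def is_asymmetric(mode: str) -> bool:
    """True if the mode splits the coding unit into unequal prediction units."""
    return mode in ASYMMETRIC


print(partition_sizes("2NxnU", 32))  # [(32, 8), (32, 24)]
print(is_asymmetric("2NxN"))         # False
```

Deciding the size of the prediction unit and deciding between a symmetric and an asymmetric prediction unit both reduce to choosing one entry from this candidate set.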
[0034] Alternatively, both of the evaluated visual quality (e.g.,
VQ(C or R')) and the pixel-based distortion (e.g., Distortion (C,
R)) may be involved in deciding the best video coding mode for a
prediction unit. For example, the intra prediction module 123/inter
prediction module 122 refers to the evaluated visual quality to
find a first video coding mode with best visual quality, refers to
the calculated pixel-based distortion to find a second video coding
mode with smallest distortion, and selects one of the first video
coding mode and the second video coding mode as the best video
coding mode for the prediction unit. For another example, the intra
prediction module 123/inter prediction module 122 performs a coarse
decision according to one of the evaluated visual quality and the
pixel-based distortion to determine a plurality of coarse
configuration settings for a prediction unit, and performs a fine
decision according to another of the evaluated visual quality and
the pixel-based distortion to determine at least one fine
configuration setting from the coarse configuration settings,
wherein the target configuration for the prediction unit is derived
from the at least one fine configuration setting.
[0035] Concerning the transform module 113, the evaluated visual
quality (each is determined based on a single visual quality metric
or a composition of multiple visual quality metrics) may be
referenced to decide a best video coding mode for a transform unit.
Taking the video coding modes for a transform unit as specified in
an HEVC (High Efficiency Video Coding) standard for example, the
transform unit includes several partition modes as shown in FIG. 6.
The conventional video coding design calculates a distortion value
Distortion (C, R) for each candidate mode, where C represents pixels
in a current source frame, R represents pixels in a reconstructed
frame, and the distortion value Distortion (C, R) may be an SAD
value, an SATD value or an SSD value. Next, the conventional video
coding design decides the configuration of a transform unit by
finding a best mode with a smallest distortion value among
distortion values of the candidate modes. In contrast to the
conventional video coding design, the present invention proposes
using the evaluated visual quality VQ(C or R') derived from data
involved in the coding loop of the coding circuit 102 to find a best
video coding mode for a transform unit, where each evaluated visual
quality VQ(C or R') may be a single visual quality metric or a
composition of multiple visual quality metrics, C represents raw
data of the source frame IMG_IN, and R' represents processed
data derived from raw data of the source frame IMG_IN. Briefly
summarized, the operation of referring to the evaluated visual
quality for deciding the target configuration of the transform unit
at the transform module 113 may include: deciding a size of the
transform unit; deciding a quad-tree depth of the transform unit;
and/or deciding that the transform unit is a residual quad-tree
(RQT) transform unit or a non-square quad-tree (NSQT) transform
unit.
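Deciding the size and quad-tree depth of a transform unit from evaluated visual quality might look like the recursive sketch below. The split rule is an assumption made only for illustration: a block is split when the mean score of its four children exceeds its own score. The `score` callable stands in for a visual quality evaluation (higher is better), and `toy_score` is a made-up example rather than a real metric.

```python
from typing import Any, Callable, List, Tuple

Leaf = Tuple[int, int, int]  # (x, y, size) of an unsplit transform unit


def decide_tu_tree(x: int, y: int, size: int, depth: int, max_depth: int,
                   score: Callable[[int, int, int], float]) -> Tuple[float, Any]:
    """Return (score, tree); tree is a Leaf or a list of four sub-trees."""
    unsplit = score(x, y, size)
    leaf = (x, y, size)
    if depth >= max_depth:
        return unsplit, leaf
    half = size // 2
    children = [decide_tu_tree(cx, cy, half, depth + 1, max_depth, score)
                for cy in (y, y + half) for cx in (x, x + half)]
    split_score = sum(s for s, _ in children) / 4
    # Hypothetical rule: split only if it strictly improves the score.
    if split_score > unsplit:
        return split_score, [tree for _, tree in children]
    return unsplit, leaf


def leaves(tree: Any) -> List[Leaf]:
    """Flatten a decision tree into its list of transform units."""
    if isinstance(tree, tuple):
        return [tree]
    return [leaf for sub in tree for leaf in leaves(sub)]


def toy_score(x: int, y: int, size: int) -> float:
    """Made-up score: large blocks touching the top-left 16x16 area look worse."""
    in_detail = x < 16 and y < 16
    return 0.5 if in_detail and size > 8 else 0.75


best_score, tree = decide_tu_tree(0, 0, 32, 0, 2, toy_score)
print(best_score)    # 0.75
print(leaves(tree))  # four 8x8 units in the top-left quadrant, three 16x16 units elsewhere
```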
[0036] Alternatively, both of the evaluated visual quality (e.g.,
VQ(C or R')) and the pixel-based distortion (e.g., Distortion (C,
R)) may be involved in deciding a best video coding mode for a
transform unit. For example, the transform module 113 refers to the
evaluated visual quality to find a first video coding mode with
best visual quality, refers to the calculated pixel-based
distortion to find a second video coding mode with smallest
distortion, and selects one of the first video coding mode and the
second video coding mode as the best video coding mode for the
transform unit. For another example, the transform module 113
performs a coarse decision according to one of the evaluated visual
quality and the pixel-based distortion to determine a plurality of
coarse configuration settings for a transform unit, and performs a
fine decision according to another of the evaluated visual quality
and the pixel-based distortion to determine at least one fine
configuration setting from the coarse configuration settings,
wherein the target configuration for the transform unit is derived
from the at least one fine configuration setting.
[0037] FIG. 7 is a flowchart illustrating a video coding method
according to a first embodiment of the present invention. Provided
that the result is substantially the same, the steps are not
required to be executed in the exact order shown in FIG. 7. The
video coding method may be briefly summarized as below.
[0038] Step 700: Start.
[0039] Step 702: Evaluate visual quality based on data involved in
a coding loop, wherein the data involved in the coding loop may be
raw data of a source frame or processed data derived from the raw
data of the source frame, and each evaluated visual quality may be
obtained from a single visual quality metric or a composition of
multiple visual quality metrics.
[0040] Step 704: Check if pixel-based distortion should be used for
video coding mode decision. If yes, go to step 706; otherwise, go
to step 710.
[0041] Step 706: Calculate the pixel-based distortion based on at
least a portion of raw data of the source frame and at least a
portion of processed data derived from the raw data of the source
frame.
[0042] Step 708: Refer to both of the evaluated visual quality and
the calculated pixel-based distortion for deciding a target
configuration of at least one of a coding unit, a transform unit
and a prediction unit. Go to step 712.
[0043] Step 710: Refer to the evaluated visual quality for deciding
a target configuration of at least one of a coding unit, a
transform unit and a prediction unit.
[0044] Step 712: End.
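The flow of steps 700 to 712 can be sketched as a small driver function. The callables `evaluate_vq`, `compute_distortion` and `combine` are hypothetical placeholders for the visual quality evaluation, the pixel-based distortion calculation and the joint decision of step 708; the weighted rule used in the example is an assumption, since the steps above do not fix how the two measures are combined.

```python
from typing import Callable, Dict, Iterable, Optional


def decide_target_configuration(
    candidates: Iterable[str],
    evaluate_vq: Callable[[str], float],
    compute_distortion: Optional[Callable[[str], float]] = None,
    combine: Optional[Callable[[Dict[str, float], Dict[str, float]], str]] = None,
) -> str:
    """Follow FIG. 7: return the chosen configuration among the candidates."""
    # Step 702: evaluate visual quality for every candidate (higher is better).
    vq = {c: evaluate_vq(c) for c in candidates}
    # Step 704: is pixel-based distortion used for the decision?
    if compute_distortion is None:
        # Step 710: refer to the evaluated visual quality only.
        return max(vq, key=vq.get)
    if combine is None:
        raise ValueError("a combine rule is required when distortion is used")
    # Step 706: calculate the pixel-based distortion for every candidate.
    dist = {c: compute_distortion(c) for c in vq}
    # Step 708: refer to both measures.
    return combine(vq, dist)


# Hypothetical lookup tables standing in for real measurements.
vq_table = {"16x16": 0.80, "8x8": 0.86}
dist_table = {"16x16": 100.0, "8x8": 180.0}


def weighted(vq: Dict[str, float], dist: Dict[str, float]) -> str:
    """Assumed joint rule: quality minus a small penalty per unit of distortion."""
    return max(vq, key=lambda c: vq[c] - 0.001 * dist[c])


print(decide_target_configuration(vq_table, vq_table.get))                          # 8x8
print(decide_target_configuration(vq_table, vq_table.get, dist_table.get, weighted))  # 16x16
```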
[0045] As a person skilled in the art can readily understand the
details of each step in FIG. 7 after reading the above paragraphs,
further description is omitted here for brevity.
[0046] As mentioned above, the evaluated visual quality determined
by the visual quality evaluation module 104 can be used to
determine a target configuration of at least one of a coding unit,
a transform unit and a prediction unit. However, this is not meant
to be a limitation of the present invention. In a second
application, the coding circuit 102 may be arranged to refer to the
aforementioned visual quality determined by the visual quality
evaluation module 104 for deciding target coding parameter(s)
associated with at least one of a coding unit, a transform unit and
a prediction unit in video coding, where the evaluated visual
quality in this case may provide visual quality information for
candidate video coding modes, the coding unit may be configured at
the splitting module 111, the transform unit may be configured at
the transform module 113, and the prediction unit may be configured
at the intra prediction module 123/inter prediction module 122. By
way of example, but not limitation, the target coding parameter(s)
may include a quantization parameter (which is used by quantization
module 114 and inverse quantization module 116) and/or a transform
parameter (which is used by transform module 113 and inverse
transform module 117). In addition, the target coding parameter(s)
set based on the evaluated visual quality may be included in the
bitstream BS generated by encoding the source frame IMG_IN.
That is, the target coding parameter(s) can be transmitted to a
video decoding apparatus to facilitate the decoder-side video
processing operation. As the visual quality evaluation performed by
the visual quality evaluation module 104 has been detailed above,
further description directed to obtaining the evaluated visual
quality based on one or more visual quality metrics is omitted here
for brevity.
[0047] In an alternative design, both of the evaluated visual
quality (which is obtained based on data involved in the coding
loop) and the pixel-based distortion (which is generated based on
at least a portion of raw data of the source frame IMG_IN and
at least a portion of processed data derived from the raw data of
the source frame IMG_IN) are used to decide target coding
parameter(s) associated with at least one of a coding unit, a
transform unit and a prediction unit in video coding, wherein the
evaluated visual quality in this case may provide visual quality
information for candidate video coding modes, and the calculated
pixel-based distortion in this case may provide distortion
information for candidate video coding modes. For example, the
transform module 113 refers to the evaluated visual quality to
decide a first transform parameter setting with best visual
quality, refers to the calculated pixel-based distortion to decide
a second transform parameter setting with smallest distortion, and
selects one of the first transform parameter setting and the second
transform parameter setting to set the transform parameter.
Similarly, the quantization module 114 refers to the evaluated
visual quality to decide a first quantization parameter setting
with best visual quality, refers to the calculated pixel-based
distortion to decide a second quantization parameter setting with
smallest distortion, and selects one of the first quantization
parameter setting and the second quantization parameter setting to
set the quantization parameter.
[0048] For another example, the transform module 113 performs a
coarse decision according to one of the evaluated visual quality
and the pixel-based distortion to determine a plurality of coarse
parameter settings for a transform parameter, and performs a fine
decision according to another of the evaluated visual quality and
the pixel-based distortion to determine at least one fine parameter
setting from the coarse parameter settings, wherein a target coding
parameter (i.e., the transform parameter) is derived from the at
least one fine parameter setting. Similarly, the quantization
module 114 performs a coarse decision according to one of the
evaluated visual quality and the pixel-based distortion to
determine a plurality of coarse parameter settings for a
quantization parameter, and performs a fine decision according to
another of the evaluated visual quality and the pixel-based
distortion to determine at least one fine parameter setting from
the coarse parameter settings, wherein a target coding parameter
(i.e., the quantization parameter) is derived from the at least one
fine parameter setting.
[0049] FIG. 8 is a flowchart illustrating a video coding method
according to a second embodiment of the present invention. Provided
that the result is substantially the same, the steps are not
required to be executed in the exact order shown in FIG. 8. The
video coding method may be briefly summarized as below.
[0050] Step 800: Start.
[0051] Step 802: Evaluate visual quality based on data involved in
a coding loop, wherein the data involved in the coding loop may be
raw data of a source frame or processed data derived from the raw
data of the source frame, and each evaluated visual quality may be
obtained from a single visual quality metric or a composition of
multiple visual quality metrics.
[0052] Step 804: Check if pixel-based distortion should be used for
coding parameter decision. If yes, go to step 806; otherwise, go to
step 810.
[0053] Step 806: Calculate the pixel-based distortion based on at
least a portion of raw data of the source frame and at least a
portion of processed data derived from the raw data of the source
frame.
[0054] Step 808: Refer to both of the evaluated visual quality and
the calculated pixel-based distortion for deciding a target coding
parameter (e.g., a quantization parameter or a transform parameter)
associated with at least one of a coding unit, a transform unit and
a prediction unit in video coding. Go to step 812.
[0055] Step 810: Refer to the evaluated visual quality for deciding
a target coding parameter (e.g., a quantization parameter or a
transform parameter) associated with at least one of a coding unit,
a transform unit and a prediction unit in video coding.
[0056] Step 812: End.
[0057] As a person skilled in the art can readily understand the
details of each step in FIG. 8 after reading the above paragraphs,
further description is omitted here for brevity.
[0058] Those skilled in the art will readily observe that numerous
modifications and alterations of the device and method may be made
while retaining the teachings of the invention. Accordingly, the
above disclosure should be construed as limited only by the metes
and bounds of the appended claims.
* * * * *