U.S. patent application number 11/501746 was filed with the patent office on 2006-08-10 and published on 2006-11-30 for image processing apparatus and method, recording medium, and program.
This patent application is currently assigned to Sony Corporation. Invention is credited to Kazushi Sato, Kuniaki Takahashi, Yoichi Yagasaki.
Application Number | 20060269156 11/501746 |
Document ID | / |
Family ID | 19055309 |
Filed Date | 2006-08-10 |
United States Patent Application | 20060269156 |
Kind Code | A1 |
Takahashi; Kuniaki; et al. | November 30, 2006 |
Image processing apparatus and method, recording medium, and
program
Abstract
A normalized activity calculating unit calculates a normalized
activity, a quantity otherwise calculated from luminance component
pixel values of an original image, on the basis of information from
an information buffer such as the bit rate, the amount of bits
generated in each frame, and the amount of bits generated and the
quantization step size in each macroblock. The normalized activity
calculating unit then outputs the normalized activity to a code
amount control unit. Using the normalized activity inputted from the
normalized activity calculating unit, the code amount control unit
calculates a quantization scale code that corresponds to a target
bit rate and matches visual characteristics, and then outputs the
quantization scale code to a quantizing unit. The quantizing unit
quantizes discrete cosine transform coefficients inputted from a
band limiting unit on the basis of the quantization scale code
inputted to the quantizing unit.
Inventors: |
Takahashi; Kuniaki (Kanagawa, JP); Sato; Kazushi (Chiba, JP);
Yagasaki; Yoichi (Tokyo, JP) |
Correspondence
Address: |
RADER FISHMAN & GRAUER PLLC
LION BUILDING
1233 20TH STREET N.W., SUITE 501
WASHINGTON
DC
20036
US
|
Assignee: |
Sony Corporation
|
Family ID: |
19055309 |
Appl. No.: |
11/501746 |
Filed: |
August 10, 2006 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10196259 | Jul 17, 2002 | 7116835 |
11501746 | Aug 10, 2006 | |
Current U.S.
Class: |
382/251 ;
375/E7.119; 375/E7.139; 375/E7.145; 375/E7.155; 375/E7.158;
375/E7.166; 375/E7.198; 375/E7.211; 375/E7.26 |
Current CPC
Class: |
H04N 19/40 20141101;
H04N 19/56 20141101; H04N 19/59 20141101; H04N 19/152 20141101;
H04N 19/523 20141101; H04N 19/176 20141101; H04N 19/132 20141101;
H04N 19/124 20141101; H04N 19/15 20141101; H04N 19/186 20141101;
H04N 19/18 20141101; H04N 19/61 20141101; H04N 19/172 20141101 |
Class at
Publication: |
382/251 |
International
Class: |
G06K 9/00 20060101
G06K009/00 |
Foreign Application Data
Date | Code | Application Number |
Jul 26, 2001 | JP | P2001-221674 |
Claims
1. An image processing apparatus for converting image data coded at
a first bit rate into image data coded at a second bit rate, the
image processing apparatus comprising: a first quantization scale
code calculating unit for calculating a first quantization scale
code required to code the image data at the first bit rate on the
basis of the image data coded at the first bit rate; a second
quantization scale code calculating unit for calculating a second
quantization scale code obtained by subjecting the first
quantization scale code to adaptive quantization on the basis of
the image data coded at the first bit rate; a normalized activity
calculating unit for calculating a normalized activity on the basis
of the first quantization scale code and the second quantization
scale code; a decoding unit for decoding image data coded at the
first bit rate; a band limiting unit for performing a band limiting
in the decoded image data at the first bit rate; a third
quantization scale code calculating unit for calculating a third
quantization scale code required to code the image data at the
second bit rate; and a fourth quantization scale code calculating
unit for calculating a fourth quantization scale code obtained on
the basis of the third quantization scale code and the normalized
activity; and an encoding unit for encoding the band limited image
data at the first bit rate based on the fourth quantization scale
code.
2. The image processing apparatus according to claim 1, wherein the
band limiting unit cuts high-frequency component coefficients in a
horizontal direction in the decoded image data.
3. The image processing apparatus according to claim 1, wherein the
coded image data is image data compressed by an MPEG method.
4. The image processing apparatus according to claim 1, wherein the
image data coded at the first bit rate includes an amount of codes
of the image data itself, an amount of bits generated in each
frame, an amount of bits generated in each macroblock, and a
quantization step size in each macroblock.
5. The image processing apparatus according to claim 4, wherein the
first quantization scale code calculating unit calculates the first
quantization scale code on the basis of the amount of codes, the
amount of bits generated in each frame, and the amount of bits
generated in each macroblock.
6. The image processing apparatus according to claim 1, wherein the
normalized activity calculating unit calculates the normalized
activity by dividing the second quantization scale code by the
first quantization scale code.
7. An image processing method for converting image data coded at a
first bit rate into image data coded at a second bit rate, the
image processing method comprising: calculating a first
quantization scale code required to code the image data at the
first bit rate on the basis of the image data coded at the first
bit rate; calculating a second quantization scale code obtained by
subjecting the first quantization scale code to adaptive
quantization on the basis of the image data coded at the first bit
rate; calculating a normalized activity on the basis of the first
quantization scale code and the second quantization scale code;
decoding image data coded at the first bit rate; performing a band
limiting in the decoded image data at the first bit rate;
calculating a third quantization scale code required to code the
image data at the second bit rate; calculating a fourth
quantization scale code obtained on the basis of the third
quantization scale code and the normalized activity; and encoding
the band limited image data at the first bit rate based on the
fourth quantization scale code.
8. The image processing method according to claim 7, wherein the
band limiting cuts high-frequency component coefficients in a
horizontal direction in the decoded image data.
9. The image processing method according to claim 7, wherein the
coded image data is image data compressed by an MPEG method.
10. The image processing method according to claim 7, wherein the
image data coded at the first bit rate includes an amount of codes
of the image data itself, an amount of bits generated in each
frame, an amount of bits generated in each macroblock, and a
quantization step size in each macroblock.
11. The image processing method according to claim 10, wherein the
first quantization scale code calculating calculates the first
quantization scale code on the basis of the amount of codes, the
amount of bits generated in each frame, and the amount of bits
generated in each macroblock.
12. The image processing method according to claim 7, wherein the
normalized activity calculating calculates the normalized activity
by dividing the second quantization scale code by the first
quantization scale code.
13. A computer program product, for converting image data coded at
a first bit rate into image data coded at a second bit rate, the
computer program product stored on a computer readable medium and
adapted to perform operations comprising: calculating a first
quantization scale code required to code the image data at the
first bit rate on the basis of the image data coded at the first
bit rate; calculating a second quantization scale code obtained by
subjecting the first quantization scale code to adaptive
quantization on the basis of the image data coded at the first bit
rate; calculating a normalized activity on the basis of the first
quantization scale code and the second quantization scale code;
decoding image data coded at the first bit rate; performing a band
limiting in the decoded image data at the first bit rate;
calculating a third quantization scale code required to code the
image data at the second bit rate; calculating a fourth
quantization scale code obtained on the basis of the third
quantization scale code and the normalized activity; and encoding
the band limited image data at the first bit rate based on the
fourth quantization scale code.
14. The computer program product according to claim 13, wherein the
band limiting cuts high-frequency component coefficients in a
horizontal direction in the decoded image data.
15. The computer program product according to claim 13, wherein the
coded image data is image data compressed by an MPEG method.
16. The computer program product according to claim 13, wherein the
image data coded at the first bit rate includes an amount of codes
of the image data itself, an amount of bits generated in each
frame, an amount of bits generated in each macroblock, and a
quantization step size in each macroblock.
17. The computer program product according to claim 16, wherein the
first quantization scale code calculating calculates the first
quantization scale code on the basis of the amount of codes, the
amount of bits generated in each frame, and the amount of bits
generated in each macroblock.
18. The computer program product according to claim 13, wherein the
normalized activity calculating calculates the normalized activity
by dividing the second quantization scale code by the first
quantization scale code.
Description
[0001] This is a divisional of application Ser. No. 10/196,259,
filed Jul. 17, 2002, the entire contents of which are hereby
incorporated by reference. The present application also claims
priority based on Japanese Patent Application No. 2001-221674,
filed Jul. 26, 2001, the entire contents of which are also hereby
incorporated herein by reference.
BACKGROUND OF THE INVENTION
[0002] The present invention relates to an apparatus and a method
for image processing, a recording medium, and a program and,
particularly, to an apparatus and a method for image processing, a
recording medium, and a program that make it possible to realize
high-speed processing for converting image information (bit stream)
compressed by an orthogonal transform, such as a discrete cosine
transform in MPEG (Moving Picture Experts Group) or the like, and
motion compensation into image data with a lower bit rate.
[0003] Recently, apparatuses based on MPEG and other schemes that
handle image data as digital information and compress the image
data by an orthogonal transform, such as a discrete cosine
transform, and motion compensation using redundancy specific to the
image data for the purposes of highly efficient transmission and
storage of the information have been spreading for use by
broadcasting stations or the like in distributing information or in
ordinary households in receiving information.
[0004] MPEG2 (ISO/IEC 13818-2), in particular, is defined as a
general-purpose image-coding scheme, and is a standard provided for
interlaced-scanning images and progressive scanning images as well
as standard-resolution images and high-resolution images. MPEG2 is
therefore expected to continue to be used in the future in a wide
variety of application software for business use by the
broadcasting industry and for use by general users. With an MPEG2
compression method, by assigning a code amount (bit rate) of 4 to 8
Mbps to an interlaced-scanning image of standard resolution formed
of 720 pixels × 480 pixels and a code amount (bit rate) of 18
to 22 Mbps to an interlaced-scanning image of high resolution
formed of 1920 pixels × 1088 pixels, for example, it is
possible to compress the images while maintaining a high
compression ratio and good image quality and realize transmission
and storage of the compressed images.
[0005] However, an image of high resolution has an enormous amount
of information. Even with compression using a coding scheme such as
MPEG, as described above, a code amount (bit rate) of about 18 to
about 22 Mbps or more is required for a 30-Hz interlaced-scanning
image of 1920 pixels × 1080 pixels, for example, in order to
obtain sufficient image quality. Hence, the code amount (bit rate)
needs to be further reduced while minimizing degradation in image
quality, so as to adjust to the bandwidth of a transmission line
when image information compressed by the MPEG method is to be
transmitted via a network, such as cable television or satellite
broadcasting, for example, and so as to adjust to the capacity of a
recording medium when image information compressed by the MPEG
method is to be stored (recorded) on a recording medium, such as an
optical disk, a magnetic disk, or a magneto-optical disk. Such
reduction of the code amount (bit rate) also may be required when
compressed image information (bit stream) of an image of standard
resolution (for example a 30-Hz interlaced-scanning image of 720
pixels × 480 pixels) as well as an image of high resolution is
to be transmitted via a network or recorded on a recording medium
as described above.
[0006] As means for solving such a problem, there are methods such
as hierarchical coding (scalability) processing and image
information converting (transcoding) processing. In relation to the
former method, SNR (Signal to Noise Ratio) scalability is
standardized in MPEG2 to thereby enable the hierarchical coding of
high-SNR compressed image information (bit stream) and low-SNR
compressed image information (bit stream). However, hierarchical
coding requires that the constraints on the bandwidth of the
network medium or the storage capacity of the recording medium be
known at the time of coding, whereas such information is unknown in
most actual systems. Thus, the latter method, transcoding, may be
said to have a higher degree of freedom and to be better suited to
an actual system.
[0007] The image information converting (transcoding) processing,
for example, converts compressed image information compressed by
the MPEG2 method into compressed image information with a lower bit
rate. In the image information converting processing, information
such as a picture coding type, a quantization width in each
macroblock, and a quantization matrix is first extracted from the
compressed image information compressed by the MPEG2 method. Then,
the compressed image information is variable-length-decoded and
rearranged into two-dimensional data as quantized discrete cosine
transform coefficients. The quantized discrete cosine transform
coefficients rearranged in the form of two-dimensional data are
then inversely quantized on the basis of the quantization width and
the quantization matrix mentioned above. Predetermined
high-frequency component coefficients are cut from the inversely
quantized discrete cosine transform coefficients. The resulting
inversely quantized discrete cosine transform coefficients are
requantized with a quantization width (quantization scale code)
generated on the basis of a target bit rate (lower than the
original bit rate), variable-length-coded again by the MPEG2
method, and then outputted.
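The requantization path described above can be sketched for a single block as follows. This is a deliberately simplified illustration, not the MPEG2 arithmetic: it assumes a plain uniform quantizer, ignores the quantization matrix and the variable-length coding stages, and cuts high-frequency coefficients by column only. The function name `requantize_block` and its parameters are hypothetical.

```python
def requantize_block(levels, old_step, new_step, keep_columns=8):
    """Requantize one block of quantized DCT levels (simplified sketch).

    levels       -- rows of quantized integer levels
    old_step     -- quantization step used by the input bit stream
    new_step     -- coarser step chosen for the lower target bit rate
    keep_columns -- horizontal coefficients kept; the rest are cut
    Assumption: uniform quantizer, no quantization matrix weighting.
    """
    requantized = []
    for row in levels:
        new_row = []
        for u, level in enumerate(row):
            if u >= keep_columns:
                # Cut the predetermined high-frequency coefficients.
                new_row.append(0)
            else:
                coefficient = level * old_step                  # inverse quantization
                new_row.append(round(coefficient / new_step))   # requantization
        requantized.append(new_row)
    return requantized
```

Doubling the step while keeping four of eight columns, for instance, halves each retained level and zeroes the rest, which is where the bit rate reduction comes from.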
[0008] The quantization width (quantization scale code)
corresponding to the image information compressed by the MPEG2
method is determined by processing, to be explained with reference
to the flowchart of FIG. 1, to thereby control the amount of codes.
The following description takes as an example compressed image
information compressed by the MPEG2 Test Model 5 (ISO/IEC
JTC1/SC29/WG11 N0400) method. In this code amount control, a
target code amount (target bit rate) and a GOP (Group of Pictures)
formation are input variables. The GOP in this case is a group of
three picture types used in image compression by the MPEG2 method:
an I (Intra Coded) picture (a picture coded by itself, without
reference to other pictures), a P (Predictive Coded) picture (a
picture coded with reference to a temporally previous (past)
I-picture or P-picture), and a B (Bidirectionally Predictive Coded)
picture (a picture coded with reference to a temporally previous or
subsequent (past or future) I-picture or P-picture).
[0009] At a step S1, an amount of bits is allocated to each
picture in the GOP on the basis of the amount of bits (hereinafter
referred to as an assigned bit amount R) to be assigned to the
pictures not yet coded in the GOP, including the picture targeted
for the allocation. The allocation is repeated in the coding order
of the pictures in the GOP. In this case, an amount of codes is
assigned to each picture using the two assumptions described below.
[0010] As a first assumption, it is assumed that the product of the
average quantization scale code used in coding each picture and the
amount of codes generated is constant for each picture type unless
the screen changes. Thus, variables $X_i$, $X_p$, and $X_b$ (global
complexity measures) indicating the complexity of the screen are
updated by the following equations (1) to (3). The relation between
the amount of codes generated and the quantization scale code when
the next picture is coded is estimated from these parameters.

$$X_i = S_i Q_i \qquad (1)$$
$$X_p = S_p Q_p \qquad (2)$$
$$X_b = S_b Q_b \qquad (3)$$

where $S_i$, $S_p$, and $S_b$ denote the amount of code bits
generated at the time of coding the picture; and $Q_i$, $Q_p$, and
$Q_b$ denote the average quantization scale code at the time of
coding the picture. Initial values are set as expressed by the
following equations (4) to (6) using a target code amount (target
bit rate) bit_rate (bits/sec).

$$X_i = 160 \times \mathit{bit\_rate} / 115 \qquad (4)$$
$$X_p = 60 \times \mathit{bit\_rate} / 115 \qquad (5)$$
$$X_b = 42 \times \mathit{bit\_rate} / 115 \qquad (6)$$
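As a minimal sketch of equations (1) to (6), the complexity measures could be initialized and updated as shown below. The function names and the dictionary keys "I", "P", and "B" are hypothetical conveniences and do not come from the source.

```python
def initial_complexity(bit_rate):
    """Initial global complexity measures, equations (4) to (6).

    bit_rate is the target code amount in bits per second.
    """
    return {
        "I": 160 * bit_rate / 115,
        "P": 60 * bit_rate / 115,
        "B": 42 * bit_rate / 115,
    }


def update_complexity(generated_bits, average_qscale):
    """Equations (1) to (3): X = S * Q for the picture just coded."""
    return generated_bits * average_qscale
```

After each picture is coded, the entry for its picture type would be replaced by `update_complexity(S, Q)` computed from that picture's generated bits and average quantization scale code.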
[0011] As a second assumption, it is assumed that overall picture
quality is optimized at all times when the ratios $K_p$ and $K_b$
of the quantization scale codes of a P-picture and a B-picture with
respect to the quantization scale code of an I-picture are the
values defined by equations (7) and (8).

$$K_p = Q_p / Q_i = 1.0 \qquad (7)$$
$$K_b = Q_b / Q_i = 1.4 \qquad (8)$$
[0012] Specifically, the quantization scale code of a B-picture is
1.4 times the quantization scale codes of an I-picture and a
P-picture at all times. This assumes that when the B-picture is
coded somewhat more roughly than the I-picture and the P-picture
and the amount of codes thus saved in the B-picture is added to the
I-picture and the P-picture, the picture quality of the I-picture
and the P-picture is improved and, in turn, the picture quality of
the B-picture, which uses the I-picture and the P-picture as
references, is also improved.
[0013] On the basis of the above two assumptions, bit amounts
($T_i$, $T_p$, and $T_b$) assigned to the pictures in the GOP are
the values expressed by equations (9) to (11).

$$T_i = \max\left\{ \frac{R}{1 + \dfrac{N_p X_p}{X_i K_p} + \dfrac{N_b X_b}{X_i K_b}},\ \frac{\mathit{bit\_rate}}{8 \times \mathit{picture\_rate}} \right\} \qquad (9)$$
$$T_p = \max\left\{ \frac{R}{N_p + \dfrac{N_b K_p X_b}{K_b X_p}},\ \frac{\mathit{bit\_rate}}{8 \times \mathit{picture\_rate}} \right\} \qquad (10)$$
$$T_b = \max\left\{ \frac{R}{N_b + \dfrac{N_p K_b X_p}{K_p X_b}},\ \frac{\mathit{bit\_rate}}{8 \times \mathit{picture\_rate}} \right\} \qquad (11)$$

where $N_p$ and $N_b$ denote the numbers of P-pictures and
B-pictures not coded yet in the GOP. On the basis of the thus
obtained assigned code amounts, the assigned bit amount R assigned
to pictures not coded yet in the GOP is updated by the following
equation (12) each time a picture is coded.

$$R = R - S_{i,p,b} \qquad (12)$$
[0014] When a first picture in the GOP is coded, the assigned bit
amount R is updated by an equation (13).

$$R = \frac{\mathit{bit\_rate} \times N}{\mathit{picture\_rate}} + R \qquad (13)$$

where N denotes the number of pictures in the GOP. An initial value
of the assigned bit amount R at the start of the sequence is zero.
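The bit allocation of step S1, equations (9) to (13), can be sketched as below. The function names, the argument order, and the use of a dictionary `X` keyed by picture type are assumptions made for illustration; only the arithmetic follows the equations.

```python
def target_bits(picture_type, R, N_p, N_b, X, bit_rate, picture_rate,
                K_p=1.0, K_b=1.4):
    """Assigned bit amount T for the next picture, equations (9) to (11).

    X maps "I", "P", "B" to the global complexity measures.
    N_p and N_b count the P- and B-pictures not coded yet in the GOP.
    """
    floor = bit_rate / (8 * picture_rate)  # lower bound shared by all types
    if picture_type == "I":
        denom = 1 + N_p * X["P"] / (X["I"] * K_p) + N_b * X["B"] / (X["I"] * K_b)
    elif picture_type == "P":
        denom = N_p + N_b * K_p * X["B"] / (K_b * X["P"])
    elif picture_type == "B":
        denom = N_b + N_p * K_b * X["P"] / (K_p * X["B"])
    else:
        raise ValueError("picture_type must be 'I', 'P', or 'B'")
    return max(R / denom, floor)


def start_gop(R, bit_rate, picture_rate, N):
    """Equation (13): add the budget of a new GOP of N pictures to R."""
    return bit_rate * N / picture_rate + R


def after_picture(R, generated_bits):
    """Equation (12): subtract the bits actually spent on the coded picture."""
    return R - generated_bits
```

For example, with a 4 Mbps target, 30 pictures per second, and a 15-picture GOP, `start_gop(0, 4_000_000, 30, 15)` gives a budget of 2,000,000 bits, which `target_bits` then splits according to the relative complexities.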
[0015] At a step S2, in order that the bit amounts ($T_i$, $T_p$,
and $T_b$) assigned to the pictures obtained by the equations (9)
to (11) in the processing of the step S1 coincide with the amounts
of codes actually generated, the quantization scale code is
obtained by feedback control in macroblock units on the basis of
the capacity of three virtual buffers set independently for each
picture type. In the following description, a macroblock is of a
two-dimensional 8 × 8 formation.
[0016] Prior to the coding of a $j$-th macroblock, the occupancy
quantity of the virtual buffers is obtained by equations (14) to
(16),

$$d_j^i = d_0^i + B_{j-1} - \frac{T_i \times (j-1)}{MB_{cnt}} \qquad (14)$$
$$d_j^p = d_0^p + B_{j-1} - \frac{T_p \times (j-1)}{MB_{cnt}} \qquad (15)$$
$$d_j^b = d_0^b + B_{j-1} - \frac{T_b \times (j-1)}{MB_{cnt}} \qquad (16)$$

where $d_0^i$, $d_0^p$, and $d_0^b$ denote the initial occupancy
quantity of the virtual buffers for the I-picture, P-picture, and
B-picture, respectively; $B_j$ denotes the amount of bits generated
from the head of a picture up to the $j$-th macroblock; and
$MB_{cnt}$ denotes the number of macroblocks within one picture.
[0017] The virtual buffer occupancy quantity at the time of the end
of the coding of each picture ($d_{MB_{cnt}}^i$, $d_{MB_{cnt}}^p$,
$d_{MB_{cnt}}^b$) is used as an initial value ($d_0^i$, $d_0^p$,
and $d_0^b$) of the virtual buffer occupancy quantity for the next
picture of the same picture type.
[0018] Next, the quantization scale code for the $j$-th macroblock
is calculated by the following equation (17):

$$Q_j = \frac{d_j \times 31}{r} \qquad (17)$$

where r is a parameter for controlling the response speed of the
feedback loop, referred to as a reaction parameter. The parameter r
is given by the following equation (18):

$$r = 2 \times \frac{\mathit{bit\_rate}}{\mathit{picture\_rate}} \qquad (18)$$
[0019] Initial values of the virtual buffers at the start of a
sequence are given by the following equations (19) to (21):

$$d_0^i = 10 \times \frac{r}{31} \qquad (19)$$
$$d_0^p = K_p d_0^i \qquad (20)$$
$$d_0^b = K_b d_0^i \qquad (21)$$
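The feedback control of step S2, equations (14) to (21), can be sketched with four small functions. The names are hypothetical, and the sketch omits any clamping of the resulting quantization scale code to a legal range, since the text above does not describe one.

```python
def reaction_parameter(bit_rate, picture_rate):
    """Equation (18): r = 2 * bit_rate / picture_rate."""
    return 2 * bit_rate / picture_rate


def initial_buffers(r, K_p=1.0, K_b=1.4):
    """Equations (19) to (21): virtual buffer occupancy at sequence start."""
    d0_i = 10 * r / 31
    return {"I": d0_i, "P": K_p * d0_i, "B": K_b * d0_i}


def buffer_occupancy(d0, bits_before, target_bits, j, mb_count):
    """Equations (14) to (16) for the j-th macroblock (j counts from 1).

    bits_before is B_{j-1}, the bits generated by macroblocks 1..j-1.
    """
    return d0 + bits_before - target_bits * (j - 1) / mb_count


def quantization_scale(d_j, r):
    """Equation (17): Q_j = d_j * 31 / r."""
    return d_j * 31 / r
```

With these initial values, the first macroblock of the first I-picture gets a quantization scale code of about 10, because equation (19) is equation (17) solved for $Q_j = 10$. If more bits are generated than the per-macroblock share of the target, the occupancy and therefore the quantization scale code rise for later macroblocks.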
[0020] At a step S3, the quantization scale code obtained by the
processing of the step S2 is changed by a variable referred to as
activity for each macroblock such that finer quantization is
performed in a flat portion where degradation tends to be visually
more noticeable and rougher quantization is performed in a portion
of a complex pattern where degradation tends to be less
noticeable.
[0021] The activity is given by the following equations (22) to
(24) using pixel values of a luminance signal of an original image,
or pixel values of a total of eight blocks, that is, four blocks in
a frame discrete cosine transform mode and four blocks in a field
discrete cosine transform mode:

$$act_j = 1 + \min_{sblk = 1, \ldots, 8} (\mathit{var\_sblk}) \qquad (22)$$
$$\mathit{var\_sblk} = \frac{1}{64} \sum_{k=1}^{64} (P_k - \bar{P})^2 \qquad (23)$$
$$\bar{P} = \frac{1}{64} \sum_{k=1}^{64} P_k \qquad (24)$$

where $P_k$ is a pixel value within a block of the luminance signal
of the original image. A minimum value is obtained in the equation
(22) because quantization is made finer when there is a flat
portion even in a part of the macroblock.
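Equations (22) to (24) amount to taking the smallest pixel variance among the eight blocks and adding one. A minimal sketch follows, assuming each block is passed as a flat list of 64 luminance values; the function names are hypothetical.

```python
def block_variance(pixels):
    """Equations (23) and (24): variance of the pixel values in one block."""
    count = len(pixels)
    mean = sum(pixels) / count
    return sum((p - mean) ** 2 for p in pixels) / count


def activity(sub_blocks):
    """Equation (22): one plus the minimum variance over the sub-blocks.

    sub_blocks holds the eight blocks of a macroblock (four in frame
    DCT mode and four in field DCT mode).
    """
    return 1 + min(block_variance(block) for block in sub_blocks)
```

A single flat block is enough to pull the activity down to 1, which is the behavior the minimum in equation (22) is meant to produce. The per-pixel loops here are also the cost that paragraph [0024] identifies as the drawback of this method.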
[0022] Then, a normalized activity $Nact_j$ having a value in a
range of 0.5 to 2 is obtained by an equation (25).

$$Nact_j = \frac{2 \times act_j + \mathit{avg\_act}}{act_j + 2 \times \mathit{avg\_act}} \qquad (25)$$

where avg_act is the average value of $act_j$ in the immediately
preceding coded picture. A quantization scale code $mquant_j$ in
which visual characteristics are taken into consideration is given
by an equation (26) on the basis of the value of the quantization
scale code $Q_j$ obtained at the step S2.

$$mquant_j = Q_j \times Nact_j \qquad (26)$$
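Equations (25) and (26) can be sketched directly. The function names are hypothetical; the arithmetic is as given above.

```python
def normalized_activity(act_j, avg_act):
    """Equation (25): maps the activity into the range 0.5 to 2."""
    return (2 * act_j + avg_act) / (act_j + 2 * avg_act)


def mquant(q_j, nact_j):
    """Equation (26): quantization scale code adjusted for visual characteristics."""
    return q_j * nact_j
```

A macroblock whose activity equals the picture average gets a normalized activity of 1 and keeps $Q_j$ unchanged. Flatter macroblocks approach 0.5 (finer quantization) and busier ones approach 2 (rougher quantization).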
[0023] With the quantization scale code mquant.sub.j thus obtained,
the compressed image information compressed by the MPEG2 method is
converted into compressed image information with a lower target bit
rate.
[0024] However, the method described above requires the calculation
of an average pixel value for each macroblock in the equations (22)
to (24) every time the image conversion processing is performed,
thus requiring an enormous amount of processing for the
calculation. As a result, the processing takes time, and the cost
of the apparatus is increased because hardware capable of the
enormous calculations is required.
[0025] In addition, while the activity described above is
calculated using the pixel values of a luminance signal of an
original image, it is not possible to know the pixel values of the
luminance signal of the original image in the image conversion
processing. Therefore, when the input compressed image information
has been subjected to efficient adaptive quantization adapted to
the complexity of the image by detection of skin color or detection
of red, for example, adaptive quantization using similar normalized
activity information cannot be performed at the time of
requantization.
SUMMARY OF THE INVENTION
[0026] The present invention has been made in view of such a
situation, and it is accordingly an object of the present invention
to realize high-speed processing for converting image information
(bit stream) compressed by an orthogonal transform, such as a
discrete cosine transform in MPEG or the like, and motion
compensation and then coded into image data with a lower bit rate
when the image information is to be transmitted via a network, such
as satellite broadcasting, cable television, or the Internet, and
when the image information is to be recorded (stored) on a
recording medium, such as an optical disk, a magnetic disk, or a
magneto-optical disk.
[0027] According to the present invention, there is provided an
image processing apparatus which includes: first quantization scale
code calculating means for calculating a first quantization scale
code required to code image data at a first bit rate on the basis
of the image data coded at the first bit rate; second quantization
scale code calculating means for calculating a second quantization
scale code obtained by subjecting the first quantization scale code
to adaptive quantization according to visual characteristics on the
basis of the image data coded at the first bit rate; normalized
activity calculating means for calculating a normalized activity on
the basis of the first quantization scale code and the second
quantization scale code; third quantization scale code calculating
means for calculating a third quantization scale code required to
code the image data at a second bit rate; and fourth quantization
scale code calculating means for calculating a fourth quantization
scale code obtained by subjecting the third quantization scale code
to adaptive quantization according to visual characteristics on the
basis of the third quantization scale code and the normalized
activity.
[0028] The second bit rate can be lower than the first bit
rate.
[0029] The coded image data can be image data compressed by an MPEG
method.
[0030] The image data coded at the first bit rate can include an
amount of codes of the image data itself, an amount of bits
generated in each frame, an amount of bits generated in each
macroblock, and a quantization step size in each macroblock.
[0031] The first quantization scale code calculating means can
calculate the first quantization scale code on the basis of the
amount of codes, the amount of bits generated in each frame, and
the amount of bits generated in each macroblock.
[0032] The second quantization scale code calculating means can
calculate the second quantization scale code obtained by subjecting
the first quantization scale code to adaptive quantization
according to visual characteristics by dividing the quantization
step size in each macroblock by 2.
[0033] The normalized activity calculating means can calculate the
normalized activity by dividing the second quantization scale code
by the first quantization scale code.
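The relations stated in the preceding paragraphs can be sketched as follows. The division by 2 and the ratio of the second to the first quantization scale code come from the text above. Multiplying the third quantization scale code by the normalized activity to obtain the fourth is an assumption made here by analogy with equation (26); this passage only says the fourth code is obtained on the basis of those two values. All function names are hypothetical.

```python
def second_qscale(step_size):
    """Second quantization scale code: macroblock quantization step size / 2."""
    return step_size / 2


def normalized_activity_from_stream(first_qscale, second_qscale_value):
    """Normalized activity: second quantization scale code / first."""
    return second_qscale_value / first_qscale


def fourth_qscale(third_qscale, nact):
    """Fourth quantization scale code.

    Assumption: the product form mirrors equation (26); it is not
    stated explicitly in this passage.
    """
    return third_qscale * nact
```

The point of this arrangement is that the normalized activity is recovered from values already carried by the input bit stream, so no pixel-domain variance needs to be computed during conversion.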
[0034] According to the present invention, there is provided an
image processing method which includes: a first quantization scale
code calculating step for calculating a first quantization scale
code required to code image data at a first bit rate on the basis
of the image data coded at the first bit rate; a second
quantization scale code calculating step for calculating a second
quantization scale code obtained by subjecting the first
quantization scale code to adaptive quantization according to
visual characteristics on the basis of the image data coded at the
first bit rate; a normalized activity calculating step for
calculating a normalized activity on the basis of the first
quantization scale code and the second quantization scale code; a
third quantization scale code calculating step for calculating a
third quantization scale code required to code the image data at a
second bit rate; and a fourth quantization scale code calculating
step for calculating a fourth quantization scale code obtained by
subjecting the third quantization scale code to adaptive
quantization according to visual characteristics on the basis of
the third quantization scale code and the normalized activity.
[0035] According to the present invention, there is provided a
program on a recording medium, the program including: a first
quantization scale code calculation control step for controlling
the calculation of a first quantization scale code required to code
image data at a first bit rate on the basis of the image data coded
at the first bit rate; a second quantization scale code calculation
control step for controlling the calculation of a second
quantization scale code obtained by subjecting the first
quantization scale code to adaptive quantization according to
visual characteristics on the basis of the image data coded at the
first bit rate; a normalized activity calculation control step for
controlling the calculation of a normalized activity on the basis
of the first quantization scale code and the second quantization
scale code; a third quantization scale code calculation control
step for controlling the calculation of a third quantization scale
code required to code the image data at a second bit rate; and a
fourth quantization scale code calculation control step for
controlling the calculation of a fourth quantization scale code
obtained by subjecting the third quantization scale code to
adaptive quantization according to visual characteristics on the
basis of the third quantization scale code and the normalized
activity.
[0036] According to the present invention, there is provided a
program executed by a computer, the program including: a first
quantization scale code calculation control step for controlling
the calculation of a first quantization scale code required to code
image data at a first bit rate on the basis of the image data coded
at the first bit rate; a second quantization scale code calculation
control step for controlling the calculation of a second
quantization scale code obtained by subjecting the first
quantization scale code to adaptive quantization according to
visual characteristics on the basis of the image data coded at the
first bit rate; a normalized activity calculation control step for
controlling the calculation of a normalized activity on the basis
of the first quantization scale code and the second quantization
scale code; a third quantization scale code calculation control
step for controlling the calculation of a third quantization scale
code required to code the image data at a second bit rate; and a
fourth quantization scale code calculation control step for
controlling calculation of a fourth quantization scale code
obtained by subjecting the third quantization scale code to
adaptive quantization according to visual characteristics on the
basis of the third quantization scale code and the normalized
activity.
[0037] The image processing apparatus and method and the program
according to the present invention calculate a first quantization
scale code required to code image data at a first bit rate on the
basis of the image data coded at the first bit rate, calculate a
second quantization scale code obtained by subjecting the first
quantization scale code to adaptive quantization according to
visual characteristics on the basis of the image data coded at the
first bit rate, calculate a normalized activity on the basis of the
first quantization scale code and the second quantization scale
code, calculate a third quantization scale code required to code
the image data at a second bit rate, and calculate a fourth
quantization scale code obtained by subjecting the third
quantization scale code to adaptive quantization according to
visual characteristics on the basis of the third quantization scale
code and the normalized activity.
BRIEF DESCRIPTION OF THE DRAWINGS
[0038] FIG. 1 is a flowchart of assistance in explaining
conventional quantization scale code determination processing;
[0039] FIG. 2 is a block diagram showing a configuration of an
embodiment of a transcoder to which the present invention is
applied;
[0040] FIG. 3 is a flowchart of assistance in explaining bit rate
conversion processing;
[0041] FIG. 4 is a flowchart of assistance in explaining normalized
activity calculation processing;
[0042] FIGS. 5A and 5B are diagrams of assistance in explaining
scan methods;
[0043] FIGS. 6A and 6B are diagrams of assistance in explaining the
processing for cutting high-frequency components;
[0044] FIGS. 7A and 7B are diagrams of assistance in explaining the
processing for cutting high-frequency components;
[0045] FIG. 8 is a flowchart of assistance in explaining motion
compensation error correcting processing;
[0046] FIGS. 9A, 9B, 9C, and 9D are diagrams of assistance in
explaining motion compensative prediction processing with a
1/4-pixel precision;
[0047] FIG. 10 is a diagram of assistance in explaining inverse
discrete cosine transform processing and discrete cosine transform
processing based on Wang's fast algorithm;
[0048] FIG. 11 is a diagram of assistance in explaining the
processing for cutting high-frequency components; and
[0049] FIG. 12 is a diagram of assistance in explaining a
medium.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0050] FIG. 2 is a diagram showing a configuration of an embodiment
of a transcoder 1 according to the present invention. The
transcoder 1 converts the bit rate of input compressed image
information compressed by a predetermined compression method and
then outputs the result. Specifically, the transcoder 1, for
example, performs the process of converting compressed image
information with a bit rate of 20 Mbps already compressed by the
MPEG2 method into compressed image information with a bit rate of
16 Mbps compressed by the same MPEG2 method. Of course, the bit
rate of the input compressed image information and the bit rate of
the compressed image information after the conversion may be other
than the above bit rates.
[0051] A code buffer 11 receives input compressed image information
with a high bit rate (compressed image information (bit stream)
having a large amount of codes), temporarily stores the input
compressed image information, and then sequentially outputs the
input compressed image information to a compressed information
analyzing unit 12. The compressed image information (bit stream)
stored in the code buffer 11 is coded so as to satisfy a
constraint of a VBV (Video Buffering Verifier: a device for
monitoring the overflow or underflow of the bit stream), not shown,
specified by MPEG2. Thus, the compressed image information is coded
so as to prevent an overflow or underflow in the code buffer
11.
[0052] The compressed information analyzing unit 12 analyzes the
compressed image information inputted from the code buffer 11
according to a syntax defined by MPEG2. The compressed information
analyzing unit 12 thereby extracts from the compressed image
information (bit stream) information such as an amount of codes, an
amount of bits generated in each frame, an amount of bits generated
in each macroblock, a quantization scale code, a quantization step
size in each macroblock, a q_scale_type flag, a quantization
matrix, and a scan method, and then stores the information in an
information buffer 21. The compressed information analyzing unit 12
outputs information of the quantization width, the quantization
matrix, and the scan method to a variable length decoding unit 13
in addition to the compressed image information. The amount of
codes is a value indicating the so-called bit rate. The amount of
bits generated in each frame or the amount of bits generated in
each macroblock is an amount of bits used in the frame unit or the
macroblock unit of the compressed image. The quantization scale
code is a code for specifying a value serving as a reference in
quantization processing. The quantization step size in each
macroblock is a value indicating a quantization step interval
specified by the quantization scale code. The q_scale_type flag
indicates by 1 or 0 whether a relation between the quantization
scale code and a quantizing value in quantization is linear (Linear
Q) or non-linear (Non-Linear Q). The quantization matrix is used in
the quantizing operation. The scan method is information indicating
a scan method such as, for example, a zigzag scan or an alternate
scan.
[0053] The variable length decoding unit 13 first subjects, to
variable length decoding, data of the compressed image information
inputted from the compressed information analyzing unit 12, which
data is coded as a difference value with respect to an adjacent
block for a direct-current component of an intra-macroblock, and
data of the compressed image information inputted from the
compressed information analyzing unit 12, which data is coded by
run (number of consecutive zeros in the codes) and level (value
other than zero in the codes) for other coefficients. The variable
length decoding unit 13 thereby obtains quantized one-dimensional
discrete cosine transform coefficients. The variable length
decoding unit 13 next rearranges the quantized discrete cosine
transform coefficients as two-dimensional data on the basis of the
information on the scan method (zigzag scan or alternate scan) of
the image which information is extracted by the compressed
information analyzing unit 12, and outputs the result to an inverse
quantization unit 14 together with the information of the
quantization width and the quantization matrix. Incidentally, the
scan method will be described later with reference to FIGS. 5A and
5B.
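By way of illustration only, the expansion of run/level data into a
one-dimensional coefficient list may be sketched as follows. This is
a minimal sketch under stated assumptions: the function name
expand_run_level and the sample pairs are hypothetical, the variable
length code tables themselves are not modeled, and the differential
coding of the direct-current component is omitted.

```python
def expand_run_level(pairs, size=64):
    """Expand (run, level) pairs into a flat list of `size` coefficients.

    `run` is the number of consecutive zeros preceding a coefficient and
    `level` is the non-zero coefficient value that follows them.
    """
    coeffs = [0] * size
    pos = 0
    for run, level in pairs:
        pos += run                      # skip `run` zero coefficients
        if pos >= size:
            raise ValueError("run/level data overruns the block")
        coeffs[pos] = level             # place the non-zero level
        pos += 1
    return coeffs                       # positions after the last pair stay zero


# Hypothetical sample data: three non-zero coefficients in one block.
flat = expand_run_level([(0, 12), (2, -3), (5, 1)])
```

The resulting list is still in scan order; it becomes two-dimensional
data only after it is rearranged according to the scan method.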
[0054] The inverse quantization unit 14 inversely quantizes the
quantized discrete cosine transform coefficients in the form of the
two-dimensional data inputted from the variable length decoding
unit 13 on the basis of the information on the quantization width
and the quantization matrix and then outputs the result as discrete
cosine transform coefficients to an adder 15. The adder 15
subtracts error components of the discrete cosine transform
coefficients inputted from a motion compensation error correcting
unit 20 from the discrete cosine transform coefficients inputted
from the inverse quantization unit 14. The adder 15 then outputs
the motion-compensated discrete cosine transform coefficients to a
band limiting unit 16 and also to an adder 32 in the motion
compensation error correcting unit 20.
[0055] The band limiting unit 16 cuts high-frequency component
coefficients in a horizontal direction in each 8×8 block on
the basis of the motion-compensated, discrete cosine transform
coefficients obtained as an output of the adder 15 and then outputs
the result to a quantizing unit 17.
[0056] The quantizing unit 17 quantizes the 8×8 discrete
cosine transform coefficients inputted from the band limiting unit
16 on the basis of a quantization scale code corresponding to a
target bit rate, which quantization scale code is inputted from a
code amount control unit 23. The quantizing unit 17 outputs the
quantized discrete cosine transform coefficients to a variable
length coding unit 18 and to an inverse quantization unit 31 in the
motion compensation error correcting unit 20. The variable length
coding unit 18 subjects the quantized discrete cosine transform
coefficients inputted from the quantizing unit 17 to variable
length coding by the MPEG2 method. The variable length coding unit
18 outputs the result to a code buffer 19, so that the result is
temporarily stored in the code buffer 19. The code buffer 19
temporarily stores the compressed image information converted to a
low bit rate. The code buffer 19 outputs the compressed image
information; and it also outputs the compressed image information
to the code amount control unit 23.
[0057] A normalized activity calculating unit 22 calculates a
normalized activity that is calculated from the luminance
component pixel values of the original image on the basis of
information such as the bit rate, the amount of bits generated in
each frame, and the amount of bits generated and the quantization
step size in each macroblock from the information buffer 21. The
normalized activity calculating unit 22 then outputs the normalized
activity to the code amount control unit 23.
[0058] The code amount control unit 23 calculates the quantization
scale code corresponding to the target bit rate, which quantization
scale code matches visual characteristics, using the normalized
activity inputted from the normalized activity calculating unit 22,
and then outputs the quantization scale code to the quantizing unit
17.
[0059] The inverse quantization unit 31 in the motion compensation
error correcting unit 20 inversely quantizes the quantized discrete
cosine transform coefficients inputted from the quantizing unit 17
on the basis of the information on the quantization width and the
quantization matrix and then outputs the result to the adder 32.
The adder 32 calculates the differences between the discrete cosine
transform coefficients inputted from the inverse quantization unit
31 and the discrete cosine transform coefficients inputted from the
adder 15 and then outputs the differences to an inverse discrete
cosine transform unit 33. The inverse discrete cosine transform
unit 33 subjects the difference values between the discrete cosine
transform coefficients inputted from the inverse quantization unit
31 and the discrete cosine transform coefficients inputted from the
adder 15 to inverse discrete cosine transform processing, and
thereby generates motion compensation error correcting information
(error data). The inverse discrete cosine transform unit 33 outputs
the motion compensation error correcting information to a video
memory 34, so that the motion compensation error correcting
information is stored in the video memory 34.
[0060] A motion compensative predicting unit 35 performs motion
compensation processing based on the error data in the video memory
34 on the basis of motion vector information and motion
compensative prediction mode information (field motion compensative
prediction mode or frame motion compensative prediction mode and
forward prediction mode, backward prediction mode, or
bi-directional prediction mode) in the input compressed image
information (bit stream). The motion compensative predicting unit
35 thereby generates error correcting values in a spatial domain
and then outputs the error correcting values to a discrete cosine
transform unit 36. The discrete cosine transform unit 36 subjects
the error correcting values in the spatial domain inputted from the
motion compensative predicting unit 35 to discrete cosine transform
processing. The discrete cosine transform unit 36 thereby obtains
error correcting values in a frequency domain and then outputs the
error correcting values in the frequency domain to the adder
15.
[0061] The bit rate conversion processing for converting compressed
image information compressed by the MPEG2 method and with a bit
rate of 20 Mbps into compressed image information compressed by the
MPEG2 method and with a bit rate of 16 Mbps will be described next
with reference to the flowchart of FIG. 3. At a step S11, the code
buffer 11 temporarily stores the input compressed image
information, and then it outputs the input compressed image
information to the compressed information analyzing unit 12. At
step S12, the compressed information analyzing unit 12 extracts
information such as the amount of bits generated in each frame, the
amount of bits generated in each macroblock, the quantization scale
code, the quantization step size, the q_scale_type flag, the
quantization matrix, and the scan method from the compressed image
information (bit stream) inputted from the code buffer 11. The
compressed information analyzing unit 12 then stores the
information in the information buffer 21. The compressed
information analyzing unit 12 outputs the compressed image
information to the variable length decoding unit 13.
[0062] At a step S13, the normalized activity calculating unit 22
performs normalized activity calculation (reverse operation)
processing for calculating a normalized activity that is calculated
from luminance component pixel values of the original image, by
the following method using the input information for each
macroblock stored in the information buffer 21.
[0063] The normalized activity calculation processing will next be
described with reference to the flowchart of FIG. 4. The normalized
activity calculating unit 22 calculates the normalized activity
using the following method, regardless of whether the q_scale_type
flag indicating a quantization type, which flag is extracted from
the input compressed image information (bit stream) compressed by
the MPEG2 method, indicates Linear Q or Non-Linear Q.
[0064] At a step S31, prior to the coding of a j-th macroblock,
the normalized activity calculating unit 22 calculates the
occupancy quantity of virtual buffers at the time of coding the
input bit stream by equations (27) to (29).
$$d_{in,j}^{i} = d_{in,0}^{i} + B_{in,j-1} - \frac{T_{in,i} \times (j-1)}{MB_{cnt}} \qquad (27)$$
$$d_{in,j}^{p} = d_{in,0}^{p} + B_{in,j-1} - \frac{T_{in,p} \times (j-1)}{MB_{cnt}} \qquad (28)$$
$$d_{in,j}^{b} = d_{in,0}^{b} + B_{in,j-1} - \frac{T_{in,b} \times (j-1)}{MB_{cnt}} \qquad (29)$$
$d_{in,0}^{i}$, $d_{in,0}^{p}$, and $d_{in,0}^{b}$ denote the
initial occupancy quantity of the virtual buffers supposed for the
time of coding the input compressed image information (bit stream);
$B_{in,j}$ denotes an amount of bits generated from a head to a
j-th macroblock of a picture of the input compressed image
information (bit stream); $MB_{cnt}$ denotes a number of
macroblocks within one picture; and $T_{in,i}$, $T_{in,p}$, and
$T_{in,b}$ denote an amount of bits generated in an input frame.
The virtual buffer occupancy quantity at the time of end of the
coding of each picture ($d_{in,MBcnt}^{i}$, $d_{in,MBcnt}^{p}$,
$d_{in,MBcnt}^{b}$) is used as an initial value ($d_{in,0}^{i}$,
$d_{in,0}^{p}$, and $d_{in,0}^{b}$) of the virtual buffer occupancy
quantity for a next picture in the same picture type.
[0065] At a step S32, the normalized activity calculating unit 22
calculates a quantization scale code $Q_{in,j}$ for the j-th
macroblock by the following equation (30):
$$Q_{in,j} = \frac{d_{in,j} \times 31}{r_{in}} \qquad (30)$$
where $r_{in}$ is a parameter for controlling the response speed of
a feedback loop, referred to as a reaction parameter. The parameter
$r_{in}$ is given by the following equation (31) using the amount
of codes (bit rate) of the input bit stream:
$$r_{in} = \frac{2 \times input\_bit\_rate}{picture\_rate} \qquad (31)$$
[0066] Initial values of the virtual buffers at the start of a
sequence are given by the following equations (32) to (34):
$$d_{in,0}^{i} = \frac{10 \times r_{in}}{31} \qquad (32)$$
$$d_{in,0}^{p} = K_{p} \, d_{in,0}^{i} \qquad (33)$$
$$d_{in,0}^{b} = K_{b} \, d_{in,0}^{i} \qquad (34)$$
where $K_{p}$ and $K_{b}$ are calculated for the input compressed
image information (bit stream) as in the method used for
controlling the amount of codes of output compressed image
information (bit stream).
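By way of illustration only, the calculations of steps S31 and S32
(equations (27) to (34)) may be sketched as follows. This is a
minimal sketch, not a definitive implementation: all function and
variable names are hypothetical, the picture rate of 30 and the
frame and macroblock bit counts are invented sample values, and the
values of K_p and K_b are deliberately left as a parameter because
they are not given here.

```python
def reaction_parameter(input_bit_rate, picture_rate):
    """Equation (31): r_in = 2 * input_bit_rate / picture_rate."""
    return 2.0 * input_bit_rate / picture_rate


def initial_occupancy(r_in, k=1.0):
    """Equations (32)-(34): d0 = k * 10 * r_in / 31.

    k = 1 gives the I-picture value of equation (32); passing K_p or K_b
    gives equations (33) and (34). Their values are not assumed here.
    """
    return k * 10.0 * r_in / 31.0


def occupancy(d0, bits_so_far, t_in, j, mb_cnt):
    """Equations (27)-(29): d_j = d0 + B_{j-1} - T_in * (j - 1) / MB_cnt."""
    return d0 + bits_so_far - t_in * (j - 1) / mb_cnt


def q_in(d_j, r_in):
    """Equation (30): Q_in,j = d_j * 31 / r_in."""
    return d_j * 31.0 / r_in


# Illustrative numbers only: 20 Mbps input at 30 pictures per second,
# a 400,000-bit frame of 1350 macroblocks.
r_in = reaction_parameter(20_000_000, 30)
d0_i = initial_occupancy(r_in)
q_first = q_in(occupancy(d0_i, bits_so_far=0, t_in=400_000, j=1, mb_cnt=1350), r_in)
q_second = q_in(occupancy(d0_i, bits_so_far=500, t_in=400_000, j=2, mb_cnt=1350), r_in)
```

With these sample numbers the first macroblock of the first I-picture
yields a quantization scale code of 10, which follows directly from
substituting equation (32) into equation (30). The second value is
slightly larger because the first macroblock is assumed to have used
more bits than its even share of the frame.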
[0067] At a step S33, the normalized activity calculating unit 22
determines a quantization scale code $mquant_{in,j}$ with visual
characteristics taken into consideration (quantization scale code
$mquant_{in,j}$ matching the visual characteristics) by an equation
(35) using the quantization step size $Q\_step\_size_{in,j}$ of the
input compressed image information (bit stream).
$$mquant_{in,j} = \frac{Q\_step\_size_{in,j}}{2} \qquad (35)$$
[0068] At a step S34, the normalized activity calculating unit 22
calculates a normalized activity $Nact_{j}$ used within the frame
in code amount control at the time of coding the input compressed
image information (bit stream), by an equation (36) using the
quantization scale code $mquant_{in,j}$ determined by the equation
(35) in consideration of the visual characteristics and the
quantization scale code $Q_{in,j}$ for the j-th macroblock
calculated from the virtual buffer occupancy quantity at the time
of coding the input bit stream.
$$Nact_{j} = \frac{mquant_{in,j}}{Q_{in,j}} \qquad (36)$$
[0069] Such processing eliminates the conventional need for
restoring the compressed image information of the original image to
information of each pixel and determining the average value and the
variance of the luminance pixel values to calculate a normalized
activity for each macroblock in requantization based on discrete
cosine transform coefficients.
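By way of illustration only, steps S33 and S34 (equations (35) and
(36)) may be sketched as follows. The function names and the sample
values (a quantization step size of 24 and a quantization scale code
Q_in,j of 10) are hypothetical and chosen only to make the arithmetic
easy to follow.

```python
def mquant_from_step_size(q_step_size):
    """Equation (35): mquant_in,j = Q_step_size_in,j / 2."""
    return q_step_size / 2.0


def normalized_activity(mquant_in, q_in):
    """Equation (36): Nact_j = mquant_in,j / Q_in,j."""
    if q_in <= 0:
        raise ValueError("Q_in,j must be positive")
    return mquant_in / q_in


# Hypothetical sample values for one macroblock.
mquant_in = mquant_from_step_size(24)        # 12.0
nact = normalized_activity(mquant_in, 10.0)  # 1.2
```

Only values already carried in the bit stream or derived from its bit
counts enter the calculation, so no pixel data needs to be decoded.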
[0070] The description now returns to the flowchart of FIG. 3.
[0071] After the normalized activity is calculated by the
normalized activity calculation processing at step S13, the
variable length decoding unit 13 rearranges the quantized discrete
cosine transform coefficients of the input compressed image
information as two-dimensional data on the basis of the scan method
in the order of a zigzag scan as shown in FIG. 5A or an alternate
scan as shown in FIG. 5B, at a step S14. The variable length
decoding unit 13 then outputs the result to the inverse
quantization unit 14 together with the information of the
quantization width and the quantization matrix.
[0072] FIG. 5A and FIG. 5B show the scan order of 8×8
discrete cosine transform coefficients by numbers. Specifically, as
shown in FIG. 5A, the zigzag scan is performed sequentially from
"0" at the upper left of the figure to "1" to the right thereof,
and then to "2" to the lower left thereof, "3" thereunder, "4" to
the upper right thereof, "5" to the upper right thereof, "6" to the
right thereof, "7" to the lower left thereof, "8" to the lower left
thereof, . . . in that order until "63" at the far-right column and
the lowest row is reached. As shown in FIG. 5B, the alternate scan
is performed sequentially from "0" at the upper left of the figure
to "1" thereunder, and then to "2" thereunder, "3" thereunder, "4"
one column to the right and three rows above, "5" thereunder, "6"
to the upper right thereof, "7" thereunder, "8" to the lower left
thereof, "9" thereunder, "10" to the lower left thereof, . . . in
that order until "63" at the far-right column and the lowest row is
reached.
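By way of illustration only, the two scan orders and the
rearrangement into two-dimensional data may be sketched as follows.
The zigzag table is generated by walking the anti-diagonals in the
order described above. The alternate scan table is written out
literally; it is reproduced from the MPEG-2 alternate scan pattern
and is believed, as an assumption, to correspond to FIG. 5B, since it
agrees with every position described above. The names zigzag_order,
ALTERNATE, and to_block are hypothetical.

```python
def zigzag_order(n=8):
    """Return table[row][col] = scan number for the zigzag scan."""
    table = [[0] * n for _ in range(n)]
    k = 0
    for s in range(2 * n - 1):                 # s = row + col selects an anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked from lower left to upper right,
        # odd diagonals from upper right to lower left.
        for r in (reversed(rows) if s % 2 == 0 else rows):
            table[r][s - r] = k
            k += 1
    return table


ZIGZAG = zigzag_order()

# Alternate scan numbers, table[row][col] (assumed to match FIG. 5B).
ALTERNATE = [
    [0, 4, 6, 20, 22, 36, 38, 52],
    [1, 5, 7, 21, 23, 37, 39, 53],
    [2, 8, 19, 24, 34, 40, 50, 54],
    [3, 9, 18, 25, 35, 41, 51, 55],
    [10, 17, 26, 30, 42, 46, 56, 60],
    [11, 16, 27, 31, 43, 47, 57, 61],
    [12, 15, 28, 32, 44, 48, 58, 62],
    [13, 14, 29, 33, 45, 49, 59, 63],
]


def to_block(flat, table):
    """Rearrange 64 coefficients in scan order into an 8x8 block."""
    n = len(table)
    return [[flat[table[r][c]] for c in range(n)] for r in range(n)]


# Feeding the scan numbers themselves through to_block reproduces the table.
block = to_block(list(range(64)), ZIGZAG)
```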
[0073] At a step S15, the inverse quantization unit 14 inversely
quantizes the quantized discrete cosine transform coefficients
inputted thereto on the basis of the quantization scale code and
the quantization matrix extracted by the compressed information
analyzing unit 12 and then outputs the result as discrete cosine
transform coefficients to the adder 15.
[0074] At a step S16, the adder 15 subtracts motion compensation
errors resulting from motion compensation error correcting
processing by the motion compensation error correcting unit 20,
which will be described later, from the discrete cosine transform
coefficients inputted from the inverse quantization unit 14. The
adder 15 then outputs the error-corrected discrete cosine transform
coefficients to the band limiting unit 16 and also to the adder
32.
[0075] At a step S17, the band limiting unit 16 cuts high-frequency
component coefficients in a horizontal direction in each block of
the error-corrected discrete cosine transform coefficients inputted
from the adder 15. In this case, the band limiting unit 16 cuts the
high-frequency component coefficients in the horizontal direction
of a luminance signal and a color-difference signal separately.
Specifically, as to a luminance signal, as shown in FIG. 6A, the
values of 8×6 discrete cosine transform coefficients (black
circles in the figure), which are the horizontal-direction
low-frequency components of the 8×8 discrete cosine transform
coefficients, are preserved, and the remaining values are replaced
with zero (0). As to a color-difference signal, as shown in FIG.
6B, the values of 8×4 discrete cosine transform coefficients (black
circles in the figure), which are the horizontal-direction
low-frequency components of the 8×8 discrete cosine transform
coefficients, are preserved, and the remaining values are replaced
with zero (0). In the case of input compressed image information
(bit stream) of an interlaced scanning image, no band limitation is
placed in the vertical direction, because in a frame discrete
cosine transform mode the high-frequency components in the vertical
direction of the discrete cosine transform coefficients include
information on the difference in time between fields, and limiting
them leads to a significant degradation in image quality. As shown
in this example, by placing a greater band limitation on a
color-difference signal, whose degradation is less visible to the
human eye, than on a luminance signal, whose degradation is more
visible to the human eye, requantization distortion is reduced
while minimizing degradation in image quality.
[0076] It is to be noted that the processing of the band limiting
unit 16 may be performed by methods other than those illustrated in
FIGS. 6A and 6B; for example, instead of the replacement with zero
(0), the horizontal-direction high-frequency components of the
discrete cosine transform coefficients may be multiplied by a
weighting factor provided in advance so as to produce a similar
effect.
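By way of illustration only, the horizontal band limitation of step
S17 may be sketched as follows. This is a minimal sketch: the
function name band_limit and the constants LUMA_KEEP and CHROMA_KEEP
are hypothetical, and the weighting factor of 0.5 in the last call is
an invented sample value, since no particular factor is specified.

```python
LUMA_KEEP = 6     # columns preserved for the luminance signal (FIG. 6A)
CHROMA_KEEP = 4   # columns preserved for the color-difference signal (FIG. 6B)


def band_limit(block, keep_cols, weight=0):
    """Scale horizontal high-frequency columns of an 8x8 coefficient block.

    Columns with index >= keep_cols are multiplied by `weight`; the default
    of 0 replaces them with zero. Rows are left untouched, so no limitation
    is applied in the vertical direction.
    """
    return [
        [v if c < keep_cols else v * weight for c, v in enumerate(row)]
        for row in block
    ]


ones = [[1] * 8 for _ in range(8)]
luma = band_limit(ones, LUMA_KEEP)          # last two columns become 0
chroma = band_limit(ones, CHROMA_KEEP)      # last four columns become 0
soft = band_limit(ones, LUMA_KEEP, 0.5)     # weighting instead of zeroing
```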
[0077] At a step S18, the code amount control unit 23 calculates a
quantization scale code $Q_{out,j}$ corresponding to compressed
image information with the target bit rate (16 Mbps in this case).
This processing is performed by the equation (17) in the
conventional method and, therefore, a description of the processing
will be omitted.
[0078] At a step S19, the code amount control unit 23 determines a
quantization scale code $mquant_{out,j}$ corresponding to the
target bit rate in consideration of visual characteristics, using
the following equation (37) on the basis of the quantization scale
code $Q_{out,j}$ calculated by the processing of step S18 and the
normalized activity $Nact_{j}$:
[0079] $$mquant_{out,j} = Q_{out,j} \times Nact_{j} \qquad (37)$$
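By way of illustration only, equation (37) may be sketched as
follows. The function name and the sample values are hypothetical. No
rounding or clamping to a valid code range is shown, because none is
described here.

```python
def mquant_out(q_out, nact):
    """Equation (37): mquant_out,j = Q_out,j * Nact_j."""
    return q_out * nact


# Hypothetical sample: one output scale code applied to three macroblocks
# whose normalized activities were recovered from the input bit stream.
q_out = 14.0
nacts = [0.5, 1.0, 2.0]
codes = [mquant_out(q_out, n) for n in nacts]   # [7.0, 14.0, 28.0]
```

The ratios between the resulting codes equal the ratios between the
normalized activities, so the relative weighting of macroblocks chosen
when the input was coded carries over to the output.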
[0080] At a step S20, the code amount control unit 23 determines
whether the quantization scale code $mquant_{in,j}$ of the input
compressed image information is smaller than the quantization scale
code $mquant_{out,j}$ calculated by the processing of step S19.
When the code amount control unit 23 determines that the
quantization scale code $mquant_{in,j}$ of the original input
compressed image information is greater than the quantization scale
code $mquant_{out,j}$, the processing proceeds to a step S21.
[0081] At step S21, the code amount control unit 23 outputs the
calculated quantization scale code $mquant_{out,j}$ to the
quantizing unit 17.
[0082] At step S22, the quantizing unit 17 quantizes the
band-limited compressed image information inputted from the band
limiting unit 16 on the basis of the quantization scale code
$mquant_{out,j}$ or the quantization scale code $mquant_{in,j}$
inputted from the code amount control unit 23 and then outputs the
result to the variable length coding unit 18.
[0083] At step S23, the variable length coding unit 18
variable-length-codes a difference between a direct-current
component of the quantized discrete cosine transform coefficients
inputted thereto and a direct-current component coefficient of an
immediately preceding block, using the direct-current component
coefficient of the immediately preceding block as a predicted
value. The variable length coding unit 18 rearranges other
components into data in a one-dimensional arrangement on the basis
of the preset scan method (zigzag scan or alternate scan) and
performs variable length coding using a combination of numbers of
consecutive zero coefficients (runs) and non-zero coefficients
(levels), whereby the compressed image information inputted via the
code buffer 11 is converted into compressed image information with
the lower bit rate.
[0084] In this case, when coefficients subsequent in the scan order
within the block are zero, the variable length coding unit 18
outputs a code referred to as EOB (End of Block) to thereby end the
variable length coding of the block. When the coefficients of a
block in the input compressed image information (bit stream) are as
shown in FIG. 7A (in FIGS. 7A and 7B, black circles indicate
non-zero coefficients and white circles indicate zero
coefficients), for example, and discrete cosine transform
coefficients are cut as shown in FIG. 6A, the non-zero coefficients
are distributed as shown in FIG. 7B. When the discrete cosine
transform coefficients are subjected to variable length coding by
the zigzag scan, as shown in FIG. 5A, a last non-zero coefficient
is situated at a scan number of "50" (coefficient at a fifth column
from the left and a seventh row from the top in FIG. 5A). On the
other hand, when the scan method is changed and the discrete cosine
transform coefficients are subjected to variable length coding by
the alternate scan, as shown in FIG. 5B, the last non-zero
coefficient is situated at a scan number of "44" (coefficient at
the fifth column from the left and the seventh row from the top in
FIG. 5B). Thus, with the alternate scan the EOB code is output at
an earlier scan number than with the zigzag scan, so that fewer
bits are spent on the block. Therefore, a correspondingly smaller
value can be assigned to the quantization width to reduce
quantization distortion resulting from requantization.
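By way of illustration only, the scan numbers quoted above may be
checked with the following sketch. Both scan tables are written out
literally as table[row][col]; they are reproduced from the MPEG-2
scan patterns and are assumed, not confirmed, to correspond to FIGS.
5A and 5B. The sample block is hypothetical and is not the block of
FIGS. 7A and 7B; it merely places a non-zero coefficient at the fifth
column and seventh row.

```python
ZIGZAG = [
    [0, 1, 5, 6, 14, 15, 27, 28],
    [2, 4, 7, 13, 16, 26, 29, 42],
    [3, 8, 12, 17, 25, 30, 41, 43],
    [9, 11, 18, 24, 31, 40, 44, 53],
    [10, 19, 23, 32, 39, 45, 52, 54],
    [20, 22, 33, 38, 46, 51, 55, 60],
    [21, 34, 37, 47, 50, 56, 59, 61],
    [35, 36, 48, 49, 57, 58, 62, 63],
]
ALTERNATE = [
    [0, 4, 6, 20, 22, 36, 38, 52],
    [1, 5, 7, 21, 23, 37, 39, 53],
    [2, 8, 19, 24, 34, 40, 50, 54],
    [3, 9, 18, 25, 35, 41, 51, 55],
    [10, 17, 26, 30, 42, 46, 56, 60],
    [11, 16, 27, 31, 43, 47, 57, 61],
    [12, 15, 28, 32, 44, 48, 58, 62],
    [13, 14, 29, 33, 45, 49, 59, 63],
]


def last_nonzero_scan_number(block, table):
    """Largest scan number holding a non-zero coefficient, or -1 if none."""
    return max(
        (table[r][c] for r in range(8) for c in range(8) if block[r][c] != 0),
        default=-1,
    )


# Hypothetical block: a DC term plus one coefficient at row 7, column 5
# (counted from 1), i.e. indices [6][4].
block = [[0] * 8 for _ in range(8)]
block[0][0] = 40
block[6][4] = 3
last_zigzag = last_nonzero_scan_number(block, ZIGZAG)        # 50
last_alternate = last_nonzero_scan_number(block, ALTERNATE)  # 44

# Worst case after the 8x6 band limitation: every preserved position non-zero.
full = [[1] * 6 + [0] * 2 for _ in range(8)]
worst_zigzag = last_nonzero_scan_number(full, ZIGZAG)        # 58
worst_alternate = last_nonzero_scan_number(full, ALTERNATE)  # 49
```

Under these assumed tables, the alternate scan visits the preserved
left-hand columns earlier, which is why the last non-zero coefficient
is reached sooner once the right-hand columns have been cut.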
[0085] When the code amount control unit 23 determines at step S20
that the quantization scale code $mquant_{in,j}$ of the original
compressed image information is not greater than the quantization
scale code $mquant_{out,j}$, the code amount control unit 23
outputs the quantization scale code $mquant_{in,j}$ to the
quantizing unit 17 at a step S24.
[0086] Specifically, when the code amount control unit 23
determines by the processing of step S20 that the quantization
scale code $mquant_{in,j}$ < the quantization scale code
$mquant_{out,j}$, it means that a macroblock once quantized roughly
is requantized more finely. Distortion resulting from rough
quantization is not reduced by finer requantization. In addition, a
large amount of bits is used for the macroblock. This results in a
reduction of the bits assigned to other macroblocks, causing
further degradation in image quality. Thus, when the code amount
control unit 23 determines that the quantization scale code
$mquant_{in,j}$ < the quantization scale code $mquant_{out,j}$,
the code amount control unit 23 uses the quantization scale code
$mquant_{in,j}$ rather than the quantization scale code
$mquant_{out,j}$ as the quantization scale code to be used in
quantization.
[0087] The motion compensation error correcting processing of the
motion compensation error correcting unit 20 will be described next
with reference to a flowchart of FIG. 8.
[0088] First, a cause of the occurrence of a motion compensation
error will be described. Supposing that a pixel value of an
original image is zero, consideration will be given to a reference
pixel value L(Q1) decoded with a quantization width Q1 of input
compressed image information (bit stream), which value corresponds
to the pixel value of zero, and a pixel value L(Q2) of a reference
image when decoded with a quantization width Q2 of compressed image
information (bit stream) after recoding, which value corresponds to
the pixel value of zero.
[0089] In the case of a pixel of an inter-macroblock of a P-picture
or a B-picture, the difference value 0-L(Q1) is subjected to
discrete cosine transform and then coded. However, the output
compressed image information (bit stream) with a reduced amount of
codes (bit rate) is decoded on the assumption that 0-L(Q2) has been
subjected to discrete cosine transform and coded. In this case,
supposing that the motion compensation error correcting processing
of the motion compensation error correcting unit 20 is not
performed, the quantization width Q1 is not equal to the
quantization width Q2 in general, and, therefore, the difference
value 0-L(Q1) and the difference value 0-L(Q2) are different from
each other. Such a phenomenon occurs in a P-picture and a
B-picture, thus causing a motion compensation error.
[0090] Moreover, degradation in picture quality occurring in a
P-picture is propagated to a subsequent P-picture and a B-picture
using the P-picture as a reference, thereby causing further
degradation in picture quality. According to such a principle,
there occurs a phenomenon (drift) in which as decoding of pictures
of a GOP proceeds toward a later stage, an accumulation of motion
compensation errors degrades picture quality, and good picture
quality is regained at the start of the next GOP. Thus, the motion
compensation error correcting processing compensates for the error
that arises because the quantization width Q1 and the quantization
width Q2 differ from each other.
[0091] At a step S51, the inverse quantization unit 31 inversely
quantizes discrete cosine transform coefficients inputted from the
quantizing unit 17 on the basis of information on quantization
width and a quantization matrix stored in the information buffer 21
and then outputs the result to the adder 32. At a step S52, the
adder 32 subtracts discrete cosine transform coefficients inputted
from the adder 15 from the discrete cosine transform coefficients
inputted from the inverse quantization unit 31 and then outputs the
difference values to the inverse discrete cosine transform unit 33.
At a step S53, the inverse discrete cosine transform unit 33
subjects the difference values inputted thereto to inverse discrete
cosine transform processing and then stores the result as motion
compensation error correcting information in the video memory
34.
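By way of illustration only, steps S51 and S52 may be sketched as
follows for a short list of coefficients. This is a minimal sketch
under loud assumptions: a plain uniform quantizer with a single step
size stands in for the quantization width and quantization matrix,
the rounding rule is a simple round-to-nearest and not the rule of
any standard, band limitation is ignored, and all names and sample
values are hypothetical. The inverse discrete cosine transform of
step S53 is not shown.

```python
def quantize(coeffs, step):
    """Uniform quantizer with sign-symmetric rounding to nearest (assumed)."""
    return [int(abs(c) / step + 0.5) * (1 if c >= 0 else -1) for c in coeffs]


def dequantize(levels, step):
    """Inverse of the uniform quantizer: level * step."""
    return [level * step for level in levels]


def requantization_error(coeffs, step_out):
    """Difference between the requantized-and-restored coefficients and the
    coefficients before requantization (the output of the adder 32)."""
    restored = dequantize(quantize(coeffs, step_out), step_out)   # step S51
    return [r - c for r, c in zip(restored, coeffs)]              # step S52


# Hypothetical coefficients from the adder 15 and an output step size of 16.
errors = requantization_error([100, -37, 12, 0], 16)   # [-4, 5, 4, 0]
```

These difference values are what would then be transformed back to
the spatial domain and stored as error data for later pictures.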
[0092] At a step S54, the motion compensative predicting unit 35
performs motion compensative prediction processing on the basis of
the error data in the video memory 34, motion vector information
and motion compensative prediction mode information (field motion
compensative prediction mode or frame motion compensative
prediction mode and forward prediction mode, backward prediction
mode, or bi-directional prediction mode) in input compressed image
information (bit stream). The motion compensative predicting unit
35 then outputs error correcting values in a spatial domain thereby
generated to the discrete cosine transform unit 36. At a step S55,
the discrete cosine transform unit 36 subjects the error correcting
values inputted from the motion compensative predicting unit 35 to
discrete cosine transform processing. The discrete cosine transform
unit 36 then outputs the result as error correcting values in a
frequency domain to the adder 15.
[0093] A fast algorithm, as illustrated in "A fast computational
algorithm for the discrete cosine transform" (IEEE Trans. Commun.,
vol. 25, no. 9, pp. 1004-1009, 1977), for example, may be applied to
the processing of the inverse discrete cosine transform unit 33 and
the discrete cosine transform unit 36. Regarding the processing of
the inverse discrete cosine transform unit 33 and the discrete
cosine transform unit 36, since high-frequency component
coefficients of discrete cosine transform coefficients in the
horizontal direction are replaced with zero by the band limiting
unit 16, the inverse discrete cosine transform processing and the
discrete cosine transform processing for the high-frequency
component coefficients can be omitted to thereby reduce the amount
of computational processing. It is therefore possible to simplify
the configuration of the hardware for the computational
processing.
[0094] In addition, since degradation in the color-difference
signal of an image is less visible to the human eye than
degradation in the luminance signal of the image, applying the above
motion compensation error correcting processing to only the
luminance signal reduces the amount of computational processing
while keeping degradation in image quality to a minimum. It
is therefore possible to simplify the configuration of the hardware
for the computational processing.
[0095] Moreover, while an error in a P-picture is propagated to a
B-picture, an error in a B-picture is not propagated further. In
addition, a B-picture has a bi-directional prediction mode and thus
requires an enormous amount of computational processing. Thus, by
applying the motion compensation error correcting processing only to
P-pictures, the amount of computational processing can be reduced
while keeping degradation in image quality to a minimum. It
is therefore possible to simplify the configuration of the hardware
for the computational processing. In addition, by omitting
processing for B-pictures, it is possible to save the capacity of
the video memory 34.
[0096] Furthermore, in the example described above, all of the
8×8 discrete cosine transform coefficient components are used
as error correcting value components. In particular, when
the discrete cosine transform mode is a frame DCT mode and the scan
method of the input compressed image information (bit stream) is the
interlaced scanning method, image quality is degraded if the motion
compensation error correcting processing of the motion compensation
error correcting unit 20 omits errors of high-frequency components
in the vertical direction. However,
it is known that the omission of the four high-frequency components
in the horizontal direction does not cause visually noticeable
degradation in image quality. By taking advantage of this fact, the
amount of computational processing can be reduced while keeping
degradation in image quality to a minimum. It is
therefore possible to simplify the configuration of the hardware
for the computational processing. In addition, it is possible to
save the capacity of the video memory 34.
[0097] Thus, the inverse discrete cosine transform unit 33 and the
discrete cosine transform unit 36 may perform usual eighth-order
processing in the vertical direction and perform processing in the
horizontal direction using only fourth-order coefficients, which
are low-frequency components. This reduces the horizontal-direction
resolution of the video memory 34 to 1/2 to thereby save the
capacity of the video memory 34.
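The reduced-resolution inverse transform described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the units 33 and 36: the transforms are taken to be orthonormal, the block is indexed as `block[v][u]` with `u` the horizontal frequency, and the 1/sqrt(2) scaling of the horizontal coefficients is an assumption chosen so that the mean level of the half-width result equals that of the full-width block. The names `idct_1d` and `idct_8x4` are hypothetical.

```python
import math


def idct_1d(coeffs):
    """Orthonormal inverse DCT of a list of N coefficients (direct form)."""
    n = len(coeffs)
    out = []
    for x in range(n):
        s = coeffs[0] / math.sqrt(n)
        for k in range(1, n):
            s += math.sqrt(2.0 / n) * coeffs[k] * math.cos(
                (2 * x + 1) * k * math.pi / (2 * n))
        out.append(s)
    return out


def idct_8x4(block):
    """Inverse transform an 8x8 coefficient block into 8 rows x 4 columns.

    Only the four lowest horizontal frequencies (u = 0..3) are used.
    """
    # Horizontal pass: fourth-order inverse transform on the low-frequency part.
    rows = [idct_1d([c / math.sqrt(2.0) for c in row[:4]]) for row in block]
    # Vertical pass: usual eighth-order inverse transform on each column.
    cols = [idct_1d([rows[v][x] for v in range(8)]) for x in range(4)]
    return [[cols[x][y] for x in range(4)] for y in range(8)]


block = [[0.0] * 8 for _ in range(8)]
block[0][0] = 80.0          # DC coefficient only
half_width = idct_8x4(block)
```

Each block then occupies 8×4 samples instead of 8×8, which is the halving of the horizontal resolution of the video memory 34 mentioned above.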
[0098] In this case, however, the motion compensative predicting
unit 35 requires motion compensation processing with a 1/4-pixel
precision. As shown in FIGS. 9A, 9B, 9C, and 9D, this processing
can sufficiently suppress degradation in image quality resulting
from motion compensation errors by performing linear interpolation
according to the value of a motion vector in compressed image
information (bit stream). Processing for the horizontal direction
is performed as follows.
[0099] When there are pixels of original motion vectors $mv_1$ to
$mv_4$, as shown in FIG. 9A, and the pixels are situated at a
position displaced to the right in the horizontal direction by 1/4
of the distance between the pixels as shown in FIG. 9B, a motion
vector $mv_1'$ is calculated, as shown in the following equation
(38):
$$mv_1' = \frac{3}{4}\,mv_1 + \frac{1}{4}\,mv_2 \qquad (38)$$
[0100] Thus, by a weighted average corresponding to the different
positions, the motion vector at the position displaced by a 1/4
pixel is calculated. Similarly, in the case of a position displaced
to the right by the distance of a 2/4 (=1/2) pixel as shown in FIG.
9C, a motion vector $mv_1''$ is calculated as shown in the
following equation (39):
$$mv_1'' = \frac{1}{2}\,mv_1 + \frac{1}{2}\,mv_2 \qquad (39)$$
[0101] Similarly, in a case of a position displaced to the right by
a distance of a 3/4 pixel as shown in FIG. 9D, a motion vector
$mv_1'''$ is calculated as shown in the following equation (40):
$$mv_1''' = \frac{1}{4}\,mv_1 + \frac{3}{4}\,mv_2 \qquad (40)$$
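The three weighted averages of equations (38) to (40) share one form, which can be sketched as follows. The function name `quarter_pel` and the use of exact fractions are illustrative assumptions.

```python
from fractions import Fraction


def quarter_pel(left, right, quarter_steps):
    """Linearly interpolate between two horizontally adjacent values.

    quarter_steps is the displacement to the right in units of 1/4 of the
    sample spacing (0, 1, 2 or 3). Steps 1, 2 and 3 give the weights of
    equations (38), (39) and (40) respectively.
    """
    if quarter_steps not in (0, 1, 2, 3):
        raise ValueError("quarter_steps must be 0, 1, 2 or 3")
    w = Fraction(quarter_steps, 4)          # weight of the right-hand value
    return (1 - w) * left + w * right


values = [quarter_pel(8, 16, k) for k in range(4)]
```

With the left value 8 and the right value 16, the four displacements give 8, 10, 12, and 14, moving linearly from one value toward the other.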
[0102] Thus, with the above processing, the inverse discrete cosine
transform unit 33 subjects only the low-frequency fourth-order
coefficients of the eighth-order discrete cosine transform
coefficients to fourth-order inverse discrete cosine transform
processing. The discrete cosine transform unit 36 subjects the
8×8 error correcting values of each block in the pixel domain,
generated by motion compensation based on the error data from the
video memory 34, to fourth-order discrete cosine transform
processing that omits the high-frequency components in the
horizontal direction, and thereby outputs error correcting values
in a 4×8 frequency domain.
[0103] By using a fast algorithm for the fourth-order inverse
discrete cosine transform processing and discrete cosine transform
processing on the horizontal low-frequency components, it is
possible to further reduce the amount of processing. FIG. 10
illustrates a method based on Wang's algorithm (Zhongde Wang,
"Fast Algorithms for the Discrete W Transform and for the Discrete
Fourier Transform," IEEE Trans. Acoust., Speech, Signal Process.,
vol. ASSP-32, no. 4, pp. 803-816, Aug. 1984), which is an example of
a fast algorithm. In FIG. 10,
processing using F(0) to F(3) as input values and f(0) to f(3) as
output values realizes an inverse discrete cosine transform,
while processing using f(0) to f(3) as input values and F(0) to
F(3) as output values realizes a discrete cosine transform. In this
case, operators A to D are defined as expressed by the following
equations (41) to (44):
$$A = \frac{1}{\sqrt{2}} \qquad (41)$$
$$B = -\cos\left(\frac{\pi}{8}\right) + \cos\left(\frac{3\pi}{8}\right) \qquad (42)$$
$$C = \cos\left(\frac{\pi}{8}\right) + \cos\left(\frac{3\pi}{8}\right) \qquad (43)$$
$$D = \cos\left(\frac{3\pi}{8}\right) \qquad (44)$$
[0104] In performing the inverse discrete cosine transform,
operations of the following equations (45) to (48) are performed:
f(0)=(F(0)+F(2))×A+F(1)×C-(F(1)-F(3))×D (45)
f(1)=(F(0)-F(2))×A+F(3)×B+(F(1)-F(3))×D (46)
f(2)=(F(0)-F(2))×A-F(3)×B-(F(1)-F(3))×D (47)
f(3)=(F(0)+F(2))×A-F(1)×C+(F(1)-F(3))×D (48)
[0105] In performing the discrete cosine transform, the inputs and
the outputs are interchanged with each other, and the operations of
the equations (45) to (48) are performed.
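The fourth-order fast transform can be checked numerically with a short sketch such as the following. Several points are assumptions: A is taken as 1/sqrt(2), the sign attached to the shared (F(1) - F(3)) × D product is fixed by requiring agreement with a direct evaluation of the four-point inverse transform, and the 1/2 scaling of the forward transform is chosen only so that the round trip returns the input. The function names are hypothetical.

```python
import math

# Operators A to D of equations (41) to (44); A = 1/sqrt(2) is assumed.
A = 1.0 / math.sqrt(2.0)
B = -math.cos(math.pi / 8) + math.cos(3 * math.pi / 8)
C = math.cos(math.pi / 8) + math.cos(3 * math.pi / 8)
D = math.cos(3 * math.pi / 8)


def idct4_fast(F):
    """Four-point inverse DCT using five multiplications."""
    even_sum = (F[0] + F[2]) * A
    even_diff = (F[0] - F[2]) * A
    p = F[1] * C
    q = F[3] * B
    shared = (F[1] - F[3]) * D      # product reused by all four outputs
    return [even_sum + p - shared,
            even_diff + q + shared,
            even_diff - q - shared,
            even_sum - p + shared]


def idct4_direct(F):
    """Reference: f(n) = F(0)/sqrt(2) + sum over k=1..3 of F(k) cos((2n+1) k pi / 8)."""
    return [F[0] * A + sum(F[k] * math.cos((2 * n + 1) * k * math.pi / 8)
                           for k in range(1, 4))
            for n in range(4)]


def dct4_direct(f):
    """Forward transform paired with idct4_direct (1/2 scaling is an assumption)."""
    out = [0.5 * A * sum(f)]
    for k in range(1, 4):
        out.append(0.5 * sum(f[n] * math.cos((2 * n + 1) * k * math.pi / 8)
                             for n in range(4)))
    return out


samples = [1.0, 2.0, 3.0, 5.0]
round_trip = idct4_fast(dct4_direct(samples))
```

The direct form needs sixteen multiplications per four outputs, whereas the fast form needs five, which is the reduction in processing referred to above.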
[0106] It is known that, in general, degradation in a
color-difference signal is less visible to the human eye than
degradation in a luminance signal. Thus, as shown in FIG. 11,
the inverse discrete cosine transform unit 33 and the discrete
cosine transform unit 36 may use, for correction, only the
low-frequency component coefficients in the vertical direction (for
example, 4×4) of the 4×8 error correcting components of a
color-difference signal, as mentioned above, and replace the
remaining high-frequency components with zero, thereby further
reducing the amount of computational processing involved in error
correction. In the figure, black circles indicate
low-frequency components, and white circles indicate high-frequency
components.
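The replacement of vertical high-frequency components with zero can be sketched as follows. The layout is an assumption: the 4×8 error correcting components are held as 8 rows (vertical frequency) of 4 columns (horizontal frequency), and the function name `limit_chroma_vertical` is hypothetical.

```python
def limit_chroma_vertical(block, keep=4):
    """Keep the lowest `keep` vertical frequencies and zero the remaining rows.

    block: 8 rows x 4 columns of color-difference error correcting components.
    """
    return [list(row) if v < keep else [0] * len(row)
            for v, row in enumerate(block)]


chroma = [[v * 4 + u + 1 for u in range(4)] for v in range(8)]
limited = limit_chroma_vertical(chroma)
```

After this step only the 4×4 low-frequency part is nonzero, so the vertical transform for color-difference blocks can also be shortened.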
[0107] As described above, when calculating a quantization scale
code in processing for converting the bit rate of compressed image
information, it is not necessary to perform variance operations as
expressed by the equations (22) to (24) on each pixel to obtain the
normalized activity. Therefore, it is possible to reduce the amount
of computational processing and thereby increase the processing
speed.
[0108] The series of processing steps described above can be
carried out not only by hardware but also by software. When the
series of processing steps is to be carried out by software, a
program forming the software is installed from a program storing
medium onto a computer that is incorporated in special hardware or
a general purpose personal computer that can perform various
functions by installing various programs thereon, for example.
[0109] FIG. 12 shows a configuration of an embodiment of a personal
computer when the transcoder 1 is realized by software. A CPU 101
of the personal computer controls the entire operation of the
personal computer. When a user inputs a command from an input unit
106, formed by a keyboard, a mouse, and the like, via a bus
104 and an input/output interface 105, the CPU 101 executes a
program stored in a ROM (Read Only Memory) 102 in response to the
command. Alternatively, the CPU 101 loads into a RAM (Random Access
Memory) 103 a program that has been read from a magnetic disk 111,
an optical disk 112, a magneto-optical disk 113, or a semiconductor
memory 114 connected to a drive 110 and that has been installed in
a memory unit 108, and then the CPU 101 executes the program. The
functions of the above-described image processing apparatus 1 are
thus realized by software. The CPU 101 also controls a
communication unit 109 to communicate and exchange data with the
outside.
[0110] As shown in FIG. 12, the program storing medium having the
program recorded thereon is formed not only by packaged media
distributed to users to provide the program separately from the
computer, namely the magnetic disk 111
(including a flexible disk), the optical disk 112 (including a CD-ROM
(Compact Disk-Read Only Memory) and a DVD (Digital Versatile Disk)),
the magneto-optical disk 113 (including an MD (Mini-Disk)), the
semiconductor memory 114, or the like having the program recorded
thereon, but also by the ROM 102, a hard disk included
in the memory unit 108, or the like which has the program recorded
thereon and is provided to the user in a state of being
preincorporated in the computer.
[0111] It is to be noted that in the present specification, the
steps describing the program recorded on the program storing medium
include not only processing carried out in time series in the
described order but also processing carried out in parallel or
individually and not necessarily in time series.
[0112] The image processing apparatus and method and the program
according to the present invention calculate a first quantization
scale code required to code image data at a first bit rate on the
basis of the image data coded at the first bit rate, calculate a
second quantization scale code obtained by subjecting the first
quantization scale code to adaptive quantization according to
visual characteristics on the basis of the image data coded at the
first bit rate, calculate a normalized activity on the basis of the
first quantization scale code and the second quantization scale
code, calculate a third quantization scale code required to code
the image data at a second bit rate, and calculate a fourth
quantization scale code obtained by subjecting the third
quantization scale code to adaptive quantization according to
visual characteristics on the basis of the third quantization scale
code and the normalized activity. It is therefore possible to
realize high-speed processing for converting image information
compressed by an orthogonal transform and motion compensation and
coded into image data with a lower bit rate.
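The reuse of the normalized activity summarized above can be sketched as follows. The sketch assumes the multiplicative relation used in MPEG-2 Test Model 5, in which the adaptively quantized scale code equals the reference scale code multiplied by the normalized activity, and it assumes clipping to the range 1 to 31. The function names are hypothetical.

```python
def normalized_activity(q_first, q_second):
    """Recover the normalized activity from the first and second scale codes.

    Assumption: q_second = q_first * normalized activity.
    """
    return q_second / q_first


def fourth_scale_code(q_third, n_act, q_min=1, q_max=31):
    """Apply the recovered activity to the scale code for the second bit rate."""
    q = round(q_third * n_act)
    return max(q_min, min(q_max, q))    # clip to the assumed legal range


n_act = normalized_activity(q_first=10, q_second=15)
q_fourth = fourth_scale_code(q_third=16, n_act=n_act)
```

No per-pixel variance is computed in this sketch, since the activity is obtained from values already available for the image data coded at the first bit rate.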
[0113] While the preferred embodiments of the present invention
have been described using specific terms, such description is for
illustrative purposes only, and it is to be understood that changes
and variations may be made without departing from the spirit or
scope of the following claims.
* * * * *