U.S. patent application number 10/473441 was filed with the patent office on 2003-09-26 and published on 2004-07-08 for image processing apparatus, image processing method, image processing program, and recording medium.
Invention is credited to Sugiyama, Akira, Togashi, Haruo.
Application Number | 20040131116 10/473441 |
Family ID | 26612508 |
Filed Date | 2003-09-26 |
United States Patent
Application |
20040131116 |
Kind Code |
A1 |
Sugiyama, Akira ; et
al. |
July 8, 2004 |
Image processing apparatus, image processing method, image
processing program, and recording medium
Abstract
In the case that the amount of a code generated in one frame is
controlled, when the picture quality is optimized using an adaptive
quantization, even if a scene change occurs between two frames, the
adaptive quantization can be accurately performed. When data for
eight lines that compose a macro block of only a first field of an
N-th picture is stored in a buffer, the calculation of activities
of macro blocks of the first field is started. Corresponding to the
activities of the first field, an average activity is calculated.
The average activity of the first field is applied to the first
field and the second field of the N-th picture. As a result, a
normalized activity of the N-th picture is calculated.
From the normalized activity, a quantizer scale mquant that takes
a visual characteristic into account is calculated. Thus,
the N-th picture is adaptively quantized. Since the normalized
activity is calculated from the picture's own data, even if a
scene change occurs, the picture quality of the following picture
does not deteriorate. In addition, since only the first field is
used for the calculation, the system delay can be suppressed to a
small value. In addition, the capacity of the buffer memory can be
reduced.
Inventors: |
Sugiyama, Akira; (Kanagawa,
JP) ; Togashi, Haruo; (Kanagawa, JP) |
Correspondence
Address: |
William S Frommer
Frommer Lawrence & Haug
745 Fifth Avenue
New York
NY
10151
US
|
Family ID: |
26612508 |
Appl. No.: |
10/473441 |
Filed: |
September 26, 2003 |
PCT Filed: |
March 28, 2002 |
PCT NO: |
PCT/JP02/03060 |
Current U.S.
Class: |
375/240.03 ;
348/439.1; 375/240.24; 375/E7.129; 375/E7.139; 375/E7.14;
375/E7.143; 375/E7.15; 375/E7.156; 375/E7.163; 375/E7.176;
375/E7.181; 375/E7.199; 375/E7.211; 375/E7.217; 375/E7.274 |
Current CPC
Class: |
H04N 19/172 20141101;
H04N 19/112 20141101; H04N 19/126 20141101; H04N 21/23602 20130101;
H04N 19/46 20141101; H04N 19/15 20141101; H04N 19/124 20141101;
H04N 21/4342 20130101; H04N 19/176 20141101; H04N 19/61 20141101;
H04N 19/70 20141101; H04N 19/122 20141101; H04N 19/137
20141101 |
Class at
Publication: |
375/240.03 ;
348/439.1; 375/240.24 |
International
Class: |
H04N 007/12 |
Foreign Application Data
Date | Code | Application Number |
Mar 29, 2001 | JP | 2001-95298 |
May 25, 2001 | JP | 2001-156818 |
Claims
1. A picture processing apparatus, comprising: average activity
calculating means for calculating an average activity with a first
field of picture data; normalized activity calculating means for
applying the average activity calculated with the first field by
the average activity calculating means to the first field and a
second field of the same frame as the first field so as to
calculate a normalized activity; and quantizing means for
quantizing the first field and the second field with the normalized
activity calculated by the normalized activity calculating
means.
2. The picture processing apparatus as set forth in claim 1,
wherein the average activity calculating means starts calculating
the activity when the minimum picture data for constructing one of
the blocks into which the picture data is divided has been input,
cumulates the activity in the first field, and calculates the average
activity.
3. The picture processing apparatus as set forth in claim 1,
wherein the activity calculating means calculates the activity with
data corresponding to a macro block of only the first field, the
macro block being dealt in the MPEG.
4. A picture processing method, comprising the steps of:
calculating an average activity with a first field of picture data;
applying the average activity calculated with the first field at
the average activity calculating step to the first field and a
second field of the same frame as the first field so as to
calculate a normalized activity; and quantizing the first field and
the second field with the normalized activity calculated at the
normalized activity calculating step.
5. A picture processing program for causing a computer apparatus to
execute a picture processing method for quantizing picture data,
the picture processing method, comprising the steps of: calculating
an average activity with a first field of picture data; applying
the average activity calculated with the first field at the average
activity calculating step to the first field and a second field of
the same frame as the first field so as to calculate a normalized
activity; and quantizing the first field and the second field with
the normalized activity calculated at the normalized activity
calculating step.
6. A recording medium on which a picture processing program has
been recorded, the picture processing program for causing a
computer apparatus to execute a picture processing method for
quantizing picture data, the picture processing method, comprising
the steps of: calculating an average activity with a first field of
picture data; applying the average activity calculated with the
first field at the average activity calculating step to the first
field and a second field of the same frame as the first field so as
to calculate a normalized activity; and quantizing the first field
and the second field with the normalized activity calculated at the
normalized activity calculating step.
Description
TECHNICAL FIELD
[0001] The present invention relates to a picture processing
apparatus, a picture processing method, a picture processing
program, and a recording medium for controlling a generated code
amount in compression encoding for a picture signal using
quantization of each block of the picture signal so that the code
amount of each frame does not exceed a predetermined amount.
BACKGROUND ART
[0002] In a conventional picture data compression-encoding system,
picture data is quantized in the unit of one block that is composed
of a predetermined number of pixels. MPEG2 (Moving Picture
Experts Group 2), for example, uses such a compression-encoding
system. In MPEG2, picture data is transformed in the
unit of one block that is composed of a plurality of pixels by the
DCT (Discrete Cosine Transform) method. The obtained DCT
coefficients are quantized. As a result, the picture data is
compression-encoded. In MPEG2, a quantizer step is designated
with a quantizer scale.
[0003] In a conventional picture quality optimizing method using
the compression-encoding of the MPEG2, an activity that is a
coefficient that represents the complexity and smoothness of a
picture to be compressed is calculated. With an adaptive
quantization based on the activity, the picture quality is
optimized.
[0004] This method is performed as follows. In a simple and smooth
picture region, the deterioration of the picture quality caused by
the compressing process is conspicuous. In such a region
(referred to as a plane region, that is, a flat region), the picture
data is finely quantized
with a quantizer scale whose quantizer step is small. In contrast,
in a complicated picture region, the deterioration of the picture
quality caused by the compressing process is not
conspicuous. In such a region, the picture data is coarsely quantized
with a quantizer scale whose quantizer step is large. Thus, with a
limited code amount, the picture quality can be effectively
optimized.
[0005] When picture data is compressed, as described above, each
picture region is divided into pixel blocks each having a
predetermined size. The picture data is DCT-transformed and
quantized in the unit of one pixel block. The MPEG2 standard
prescribes a block of eight pixels × eight lines as the minimum
process unit. In addition, the MPEG2 standard prescribes that each
block of eight pixels × eight lines should be DCT-transformed
and that the resultant DCT coefficients should be quantized in the
unit of a macro block of 16 pixels × 16 lines.
[0006] On the other hand, although the MPEG2 standard does not
clearly prescribe the unit for calculating the foregoing activity,
the MPEG2 TM5 (Test Model 5) has proposed that the
activity should be processed in the unit of one sub block of eight
pixels × eight lines, which is the same as one DCT block.
[0007] Next, the activity calculating method in "adaptive
quantization considering visual characteristic" that has been
proposed in the MPEG2 TM5 will be described.
[0008] The adaptive quantization varies a quantizer scale Q_j,
which depends on the state of a picture, with an activity of each
macro block so as to control the generated code amount of, for
example, one frame and improve the picture quality. The quantizer
scale Q_j is varied with the activity so that in a plane (flat) region,
where the deterioration of the picture quality is conspicuous, the
picture data is quantized with the quantizer scale Q_j whose
quantizer step is small and in a complicated picture region, where
the deterioration of the picture quality is not conspicuous, the
picture data is quantized with the quantizer scale Q_j whose
quantizer step is large.
[0009] The activity is obtained with pixel values of a luminance
signal of an original picture rather than a prediction error. For
example, the activity act_j of the j-th macro block is obtained by
calculating the following Formulas (1) to (3) in the reverse order,
namely Formula (3), Formula (2), and then Formula (1), over eight
sub blocks of the macro block: four blocks of the frame DCT encoding
mode and four blocks of the field DCT encoding mode.
$$act_j = 1 + \min_{sblk = 1, \ldots, 8} (var\_sblk) \qquad (1)$$
$$var\_sblk = \frac{1}{64} \sum_{k=1}^{64} (P_k - P_{avg})^2 \qquad (2)$$
$$P_{avg} = \frac{1}{64} \sum_{k=1}^{64} P_k \qquad (3)$$
[0010] where P_k is a pixel value of a block of a luminance
signal of the original picture. In Formula (3), 64 pixel values of
a block of 8 × 8 are summed and the result is divided by 64. As
a result, the average value Pavg of the pixel values P_k of the
block is obtained. Next, in Formula (2), the difference between the
average value Pavg and each pixel value P_k is squared, and the
squared differences are averaged. As a result, the average squared
difference (variance) var_sblk of the block of
8 × 8 is calculated. In Formula (1), with the minimum value of
the values var_sblk of the eight sub blocks, the activity act_j of
the j-th macro block is obtained. The minimum value is used because
even if only a part of the macro block contains a flat portion, it is
necessary to finely quantize the macro block.
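As an illustration only, Formulas (1) to (3) can be sketched in Python as follows. The sketch assumes NumPy, a 16 × 16 luminance macro block held as an array, and that the four field-mode blocks are formed from the even lines and the odd lines of the macro block. The function name macroblock_activity is hypothetical and appears neither in the TM5 nor in this application.

```python
import numpy as np


def macroblock_activity(mb: np.ndarray) -> float:
    """Activity act_j of one 16x16 luminance macro block (Formulas (1)-(3))."""
    if mb.shape != (16, 16):
        raise ValueError("expected a 16x16 luminance macro block")
    mb = mb.astype(np.float64)

    # Four 8x8 sub blocks in frame organisation (consecutive lines).
    frame_blocks = [mb[r:r + 8, c:c + 8] for r in (0, 8) for c in (0, 8)]
    # Four 8x8 sub blocks in field organisation (assumption: even lines
    # form one field, odd lines the other).
    field_blocks = [mb[p::2, c:c + 8] for p in (0, 1) for c in (0, 8)]

    variances = []
    for blk in frame_blocks + field_blocks:
        p_avg = blk.sum() / 64.0                              # Formula (3)
        variances.append(((blk - p_avg) ** 2).sum() / 64.0)   # Formula (2)

    return 1.0 + min(variances)                               # Formula (1)
```

Because the minimum over the eight sub blocks is taken, a macro block with a single flat sub block yields the lowest possible activity of 1, which matches the reasoning given for Formula (1).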
[0011] In the MPEG2 TM5, a normalized activity Nact_j that has
values in the range from "0.5" to "2.0" is obtained from the activities
act_j of macro blocks according to the following Formula
(4).
$$Nact_j = \frac{2 \times act_j + avg\_act}{act_j + 2 \times avg\_act} \qquad (4)$$
[0012] where "avg_act" represents the average value (average
activity) of the activities act_j of the encoded frame that
immediately precedes the frame (picture) that is currently being
processed.
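The behaviour of Formula (4) can be checked with a minimal sketch such as the one below. The function name normalized_activity is hypothetical, and the sample values are made up purely to show that the result stays between 0.5 and 2.0.

```python
def normalized_activity(act_j: float, avg_act: float) -> float:
    """Normalized activity Nact_j according to Formula (4)."""
    return (2.0 * act_j + avg_act) / (act_j + 2.0 * avg_act)


# A macro block whose activity equals the average is left unscaled.
same_as_average = normalized_activity(10.0, 10.0)
# A very flat macro block in a busy picture approaches the lower bound 0.5.
flat_block = normalized_activity(1.0, 1000.0)
# A very busy macro block in a flat picture approaches the upper bound 2.0.
busy_block = normalized_activity(1.0e6, 10.0)
```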
[0013] A quantizer scale mquant_j that considers a visual
characteristic is given by the following Formula (5) from
a quantizer scale Q_j that is obtained for controlling a
generated code amount of one frame.
$$mquant_j = Q_j \times Nact_j \qquad (5)$$
[0014] When each macro block is quantized with such a quantizer
scale mquant_j, while the code amount of one whole frame is
kept in a predetermined range, each macro block is optimally
quantized corresponding to flatness and complexity of a picture of
the frame. As a result, while a limited code amount is effectively
used, the picture is efficiently compressed with the deterioration
of the picture quality largely suppressed.
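A minimal sketch of Formula (5), combined with Formula (4), is given below. The function name and the numeric values are assumptions chosen only to show that a flat macro block receives a smaller quantizer scale than a complicated one for the same Q_j. Any clipping of the result to a legal quantizer scale range is deliberately left out.

```python
def adaptive_quantizer_scale(q_j: float, act_j: float, avg_act: float) -> float:
    """mquant_j = Q_j x Nact_j (Formula (5)), with Nact_j from Formula (4)."""
    nact_j = (2.0 * act_j + avg_act) / (act_j + 2.0 * avg_act)
    return q_j * nact_j


# Made-up values: Q_j = 16 from rate control, average activity 100.
mquant_flat = adaptive_quantizer_scale(16.0, act_j=1.0, avg_act=100.0)
mquant_busy = adaptive_quantizer_scale(16.0, act_j=1000.0, avg_act=100.0)
```

With these values the flat macro block is quantized with a scale of roughly half of Q_j, while the complicated macro block is quantized with a scale well above Q_j.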
[0015] However, if there is a scene change between two frames,
pictures largely change before and after it. Thus, the correlation
between the frames is lost.
[0016] FIG. 1A, FIG. 1B, FIG. 1C, and FIG. 1D show an example of a
process for obtaining a normalized activity Nact_j according to
related art. FIG. 1A shows a frame signal of which one frame is
composed of one set of a low region and a high region. FIG. 1B
shows a picture that is input. One picture corresponds to one
frame. In the interlace scanning, one frame is composed of two
fields that are a first field as a first half portion and a second
field as a second half portion.
[0017] Next, the case that a normalized activity Nact_j is obtained
for an input picture will be described. As was described above, in the
MPEG2, to process a picture signal in the unit of one block, an
input picture signal that is input in the unit of one line should
be converted into blocks so that each block has a predetermined
pixel size. In the case that one block is composed of 16
pixels × 16 lines, when data of the first eight lines of a second
field of a frame has been input, a first block of the frame can be
formed. Thus, to obtain a normalized activity Nact_j for an
input picture, as shown in FIG. 1C, a delay for at least 1 field
(0.5 frame) + 8 lines occurs.
[0018] With the delay for 1 field (0.5 frame) + 8 lines of the input
picture, an average activity avg_act of an (N-1)-th picture is calculated
with one frame, namely a first field and a second field, of the
(N-1)-th picture. A normalized activity Nact_j is obtained with the
average activity avg_act of the (N-1)-th picture, and a quantizer
scale mquant_j is calculated with the normalized activity Nact_j
according to the foregoing Formula (5). With the quantizer scale
mquant_j, the next N-th picture is quantized. As a result, an
output picture is generated (see FIG. 1D).
[0019] Now, it is considered that a scene change has occurred when
an N-th picture has changed to an (N+1)-th picture of input
pictures. In this case, as shown in FIG. 1D, the (N+1)-th picture,
which has been input after the scene change, is quantized with the
quantizer scale mquant_j obtained with the normalized activity
Nact_j obtained with the average activity avg_act of the
picture (N-th picture), which has been output before the scene
change. The resultant quantized picture is output.
[0020] In the method described in the section of the Background
Art, to obtain the normalized activity Nact_j, the average
activity avg_act of the immediately preceding frame that has been
encoded is used. When the correlation between two frames is lost
due to, for example, a scene change, the activity of the frame that
has been input after the scene change is normalized with the
average activity avg_act obtained with the frame that has been
input before the scene change. This average activity avg_act
differs from the average activity of the frame that is actually
being normalized. Thus, the activity of
the frame that has been input after the scene change cannot be
properly normalized. As a result, the picture quality
deteriorates.
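The effect described above can be made concrete with a small numeric sketch. All activity values below are made up: a flat scene precedes the scene change and a detailed scene follows it. The function name normalized_activity is hypothetical.

```python
def normalized_activity(act_j: float, avg_act: float) -> float:
    """Normalized activity Nact_j according to Formula (4)."""
    return (2.0 * act_j + avg_act) / (act_j + 2.0 * avg_act)


# Made-up macro block activities before and after a scene change.
prev_frame_acts = [4.0, 5.0, 6.0]          # flat scene
next_frame_acts = [200.0, 400.0, 800.0]    # detailed scene

avg_prev = sum(prev_frame_acts) / len(prev_frame_acts)
avg_own = sum(next_frame_acts) / len(next_frame_acts)

# Background-art behaviour: the new frame is normalized with the old average.
nact_with_prev = [normalized_activity(a, avg_prev) for a in next_frame_acts]
# Normalization with the average of the frame actually being quantized.
nact_with_own = [normalized_activity(a, avg_own) for a in next_frame_acts]
```

With the stale average, every normalized activity is pushed close to 2.0, so all macro blocks are quantized coarsely and the relative differences between them are almost lost. With the frame's own average, the values are spread around 1.0.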
[0021] Therefore, an object of the present invention is to provide
a picture processing apparatus, a picture processing method, a
picture processing program, and a recording medium that allow
pictures to be accurately adaptively quantized so as to control a
generated code amount of one frame even if a scene change occurs
between frames and optimize the picture quality using the adaptive
quantization.
DISCLOSURE OF THE INVENTION
[0022] To solve the foregoing problem, the present invention is a
picture processing apparatus, comprising average activity
calculating means for calculating an average activity with a first
field of picture data; normalized activity calculating means for
applying the average activity calculated with the first field by
the average activity calculating means to the first field and a
second field of the same frame as the first field so as to
calculate a normalized activity; and quantizing means for
quantizing the first field and the second field with the normalized
activity calculated by the normalized activity calculating
means.
[0023] In addition, the present invention is a picture processing
method, comprising the steps of calculating an average activity
with a first field of picture data; applying the average activity
calculated with the first field at the average activity calculating
step to the first field and a second field of the same frame as the
first field so as to calculate a normalized activity; and
quantizing the first field and the second field with the normalized
activity calculated at the normalized activity calculating
step.
[0024] In addition, the present invention is a picture processing
program for causing a computer apparatus to execute a picture
processing method for quantizing picture data, the picture
processing method, comprising the steps of calculating an average
activity with a first field of picture data; applying the average
activity calculated with the first field at the average activity
calculating step to the first field and a second field of the same
frame as the first field so as to calculate a normalized activity;
and quantizing the first field and the second field with the
normalized activity calculated at the normalized activity
calculating step.
[0025] The present invention is a recording medium on which a
picture processing program has been recorded, the picture
processing program for causing a computer apparatus to execute a
picture processing method for quantizing picture data, the picture
processing method, comprising the steps of calculating an average
activity with a first field of picture data; applying the average
activity calculated with the first field at the average activity
calculating step to the first field and a second field of the same
frame as the first field so as to calculate a normalized activity;
and quantizing the first field and the second field with the
normalized activity calculated at the normalized activity
calculating step.
[0026] As was described above, according to the present invention,
an average activity calculated with a first field of picture
data is applied to the first field and a second field of the
same frame so as to calculate a normalized activity, and the first
field and the second field are quantized with that normalized
activity. Thus, picture data of one
frame can be quantized with a normalized activity based on its own
picture data with a small system delay.
BRIEF DESCRIPTION OF DRAWINGS
[0027] FIG. 1A, FIG. 1B, FIG. 1C, and FIG. 1D show a time chart
that represents an example of a process for obtaining a normalized
activity Nact_j according to background art;
[0028] FIG. 2A, FIG. 2B, FIG. 2C, FIG. 2D, FIG. 2E, and FIG. 2F
show a time chart that represents an example of a process for
obtaining a normalized activity Nact_j according to a first
embodiment of the present invention;
[0029] FIG. 3A, FIG. 3B, FIG. 3C, FIG. 3D, FIG. 3E, and FIG. 3F
show a time chart that represents an example of a process for
pre-calculating an average activity avg_act of a frame to be
normalized and normalizing an activity of the frame;
[0030] FIG. 4A and FIG. 4B show a block diagram that represents an
example of the structure of a digital VTR according to an
embodiment of the present invention;
[0031] FIG. 5A, FIG. 5B, and FIG. 5C show a block diagram that
represents a concrete example of the structure of an MPEG
encoder;
[0032] FIG. 6A, FIG. 6B, and FIG. 6C show schematic diagrams that
represent examples of the structures of streams transferred by each
portion of the MPEG encoder;
[0033] FIG. 7A, FIG. 7B, and FIG. 7C show schematic diagrams that
represent examples of the structures of streams transferred by each
portion of the MPEG encoder;
[0034] FIG. 8A and FIG. 8B show schematic diagrams that represent
examples of the structures of streams transferred by each portion
of the MPEG encoder;
[0035] FIG. 9A and FIG. 9B show schematic diagrams that represent
examples of the structures of streams transferred by each portion
of the MPEG encoder;
[0036] FIG. 10A, FIG. 10B, and FIG. 10C show schematic diagrams
that represent examples of the structures of streams transferred by
each portion of the MPEG encoder;
[0037] FIG. 11A, FIG. 11B, and FIG. 11C show schematic diagrams
that represent examples of the structures of streams transferred by
each portion of the MPEG encoder;
[0038] FIG. 12 shows a schematic diagram that represents an example
of the structure of a stream transferred by each portion of the
MPEG encoder;
[0039] FIG. 13 shows a flow chart that represents an example in
which a process of the MPEG encoder is implemented by software;
and
[0040] FIG. 14 shows a schematic diagram for describing a block
segmentation for calculating an activity according to an embodiment
of the present invention.
BEST MODES FOR CARRYING OUT THE INVENTION
[0041] Next, with reference to the accompanying drawings, the
present invention will be described. According to the present
invention, a normalized activity Nact_j of a frame is
calculated with an average activity avg_act obtained from picture
data of a first field of the frame. According to an embodiment of
the present invention, the average activity avg_act and the
normalized activity Nact_j can also be calculated by the method
corresponding to Formula (1) to Formula (5) described in the
section of the Background Art.
[0042] FIG. 2A, FIG. 2B, FIG. 2C, FIG. 2D, FIG. 2E, and FIG. 2F
show an example of a process for obtaining a normalized activity
Nact_j according to a first embodiment of the present
invention. FIG. 2A shows a frame signal of which one frame is
composed of one set of a low region and a high region. FIG. 2B
shows input pictures. One picture corresponds to one frame. In the
interlace scanning, one frame is composed of two fields that are a
first field as a first half portion and a second field as a second
half portion. Now, a process for an N-th picture will be
considered. In addition, it is assumed that an average activity
avg_act is calculated in the unit of one macro block of 16
pixels × 16 lines.
[0043] Since the average activity avg_act is calculated with data
of only a first field, when the first eight lines of the first field
have been input and stored in a buffer, data of a macro block of
the first field is obtained. At that point, the calculation of the
average activity avg_act can be started. Thus, the delay after the
N-th picture is input until the calculation is started is at least
eight lines as shown in FIG. 2C.
[0044] When all data of a first field of an N-th picture has been
input and then an average activity avg_act of the first field has
been calculated, the calculated result is applied to the first and
second fields. As a result, a normalized activity Nact_j of the
N-th picture is obtained (see FIG. 2E). With the normalized
activity Nact_j and a quantizer scale Q_j obtained in a
method that will be described later, the calculation corresponding
to Formula (5) is performed. As a result, a quantizer scale
mquant_j that considers a visual characteristic is calculated.
With the quantizer scale mquant_j, the N-th picture is
quantized.
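The flow of this paragraph can be sketched as follows, assuming that the activities of the macro blocks of each field have already been calculated with Formulas (1) to (3). The function names, the list-based interface, and the sample values are hypothetical and serve only to show that the average is taken over the first field alone and then applied to both fields.

```python
def normalize_with_first_field(first_field_acts, second_field_acts):
    """Apply the average activity of the first field to both fields."""
    # The average activity avg_act uses macro blocks of the first field only.
    avg_act = sum(first_field_acts) / len(first_field_acts)

    def nact(act_j):
        # Formula (4) with the first-field average.
        return (2.0 * act_j + avg_act) / (act_j + 2.0 * avg_act)

    nact_first = [nact(a) for a in first_field_acts]
    nact_second = [nact(a) for a in second_field_acts]
    return avg_act, nact_first, nact_second


def mquant(q_j, nact_j):
    """Formula (5): quantizer scale that considers the visual characteristic."""
    return q_j * nact_j


# Made-up activities for two macro blocks per field.
avg_act, nact_first, nact_second = normalize_with_first_field(
    [10.0, 30.0], [20.0, 20.0]
)
scales_second = [mquant(10.0, n) for n in nact_second]
```

Since nothing from the preceding picture enters the calculation, the result is the same whether or not a scene change occurred before the N-th picture.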
[0045] Since the normalized activity Nact_j of the N-th picture
is obtained only from the average activity avg_act of the first
field of the N-th picture, as shown in FIG. 2F, the delay after the
N-th picture is input until the N-th picture is adaptively
quantized with the quantizer scale mquant_j, which considers the
visual characteristic, becomes 0.5 frame + α.
[0046] Now, it is supposed that a scene change has occurred when
the N-th picture has changed to the (N+1)-th picture. According to
the present invention, as was described above, with the average
activity avg_act calculated with the first field of the N-th
picture, the normalized activity Nact_j of the N-th picture is
obtained. In other words, unlike with the description in the
section of the Background Art, according to the present invention,
when the normalized activity Nact_j is obtained, since an
average activity of the immediately preceding frame is not used,
even if a scene change occurs between two frames, the normalized
activity Nact_j can be properly calculated. Thus, the picture
quality of a picture that is input immediately after a scene change
can be prevented from
deteriorating. Consequently, when pictures are edited for example
frame by frame, a good picture quality is obtained.
[0047] Alternatively, an average activity avg_act of a frame to be
normalized may be pre-calculated. With the pre-calculated average
activity avg_act, the activity of the frame may be normalized. In
this method, regardless of occurrence of a scene change between two
frames, with the optimum average activity avg_act, the activity can
be normalized.
[0048] FIG. 3A, FIG. 3B, FIG. 3C, FIG. 3D, FIG. 3E, and FIG. 3F
show an example of a process for pre-calculating an average
activity avg_act of a frame to be normalized and then normalizing
an activity of the frame with the pre-calculated average activity
avg_act. The process shown in FIG. 3A and FIG. 3B is the same as
the process shown in FIG. 1A and FIG. 1B. In the example shown in
FIG. 3, as shown in FIG. 3D, when all data of a picture (for
example, an N-th picture) to be quantized, namely a first field and
a second field of the N-th picture, has been stored in a buffer,
with the data stored in the buffer, the average activity avg_act of
the picture is calculated.
[0049] In this case, with a delay for at least 0.5 frame + 8 lines
after the picture (N-th picture) has been input, the average
activity avg_act is calculated (see FIG. 3C and FIG. 3D). With the
obtained average activity avg_act of the picture, the normalized
activity Nact_j of the picture itself is obtained.
[0050] Since the normalized activity Nact_j of the picture is
obtained with the average activity avg_act calculated with the
picture's own data, even if a scene change occurs immediately after
for example the N-th picture, the normalized activity Nact_j can be
properly calculated (see FIG. 3E).
[0051] However, in this case, after the average activity avg_act
has been calculated with all data of the first and second fields of
the picture, the normalization of the activity of the picture is
started. Thus, a delay (system delay) after the average activity
avg_act is calculated until the activity of the picture is
normalized becomes large.
[0052] In the example shown in FIG. 3A, FIG. 3B, FIG. 3C, FIG. 3D,
FIG. 3E, and FIG. 3F, the calculation of the average activity
avg_act is started with a delay for 0.5 frame + 8 lines after the
N-th picture is input. In addition, since the activity is normalized
only after the average activity avg_act of all the first and second
fields of the N-th picture has been calculated, a delay for at
least one frame is added. In other words, as shown in FIG. 3F, the
total delay amount (system delay) becomes at least 1.5
frames + α.
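To put this system delay side by side with the delay of 0.5 frame + α obtained when only the first field is used, the following sketch converts both to milliseconds. The frame rate of 30 frames per second and the treatment of α as zero are assumptions made only for this illustration, since the text specifies neither value. The function name is hypothetical.

```python
def system_delay_ms(frames: float, frame_rate_hz: float = 30.0,
                    alpha_ms: float = 0.0) -> float:
    """Delay of `frames` frame periods plus alpha, in milliseconds."""
    return frames * 1000.0 / frame_rate_hz + alpha_ms


# Average activity taken from the first field only: 0.5 frame + alpha.
delay_first_field_only = system_delay_ms(0.5)
# Average activity taken from both fields of the frame: 1.5 frames + alpha.
delay_whole_frame = system_delay_ms(1.5)
# The difference is one full frame period, independent of alpha.
saving_ms = delay_whole_frame - delay_first_field_only
```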
[0053] When the system delay becomes large, the system is not
suitable for a real-time editing operation, and the delay is likely to
adversely affect video equipment used in a broadcasting station. In
addition, since the activity of the picture is normalized after the
average activity avg_act for one picture has
been calculated, a buffer memory
that stores data of the first and second fields of the picture is
required. As a result, the capacity of the required buffer memory
becomes large.
[0054] In contrast, in the method according to the present
invention described with reference to FIG. 2A, FIG. 2B, FIG. 2C,
FIG. 2D, FIG. 2E, and FIG. 2F, since an average activity is
calculated only with a first field, the calculation time becomes
shorter than that for the case that an average activity is
calculated with a frame (two fields) whose average activity is to
be calculated. As a result, the system delay can be shortened.
Consequently, the time period necessary for the picture compressing
process can be shortened. Thus, the time restriction for each
operation and each process for a VTR, an encoder, and so forth can
be alleviated.
[0055] Unless a special editing process is performed, a scene
change in a VTR or the like occurs in the unit of one frame, namely
at a frame boundary. Thus, when a
frame is processed with a first field thereof, the picture quality
can be prevented from being affected by a scene change.
[0056] Next, an embodiment of the present invention will be
described. The embodiment that follows is a preferred embodiment of
the present invention. Although the embodiment contains many
technically preferred limitations, it should be noted that the
scope of the present invention is not limited to such an embodiment
unless otherwise specified.
[0057] FIG. 4A and FIG. 4B show an example of a structure of a
digital VTR according to an embodiment of the present invention.
The digital VTR can directly record a digital video signal that has
been compressed and encoded corresponding to the MPEG system onto a
recording medium.
[0058] First of all, the structure and the processing operation of
a recording system of the digital VTR will be described. Signals
that are input from the outside to the recording system are two
types of serial digital interface signals that are an SDI (Serial
Digital Interface) signal and an SDTI (Serial Data Transport
Interface) signal, an analog interface signal, and an external
reference signal REF that is a control signal.
[0059] The SDI is an interface prescribed by the SMPTE so as to
transmit a (4:2:2) component video signal, a digital audio signal,
and additional data. The SDTI is an interface through which an MPEG
elementary stream (referred to as MPEG ES), namely a stream in
which a digital video signal has been compression-encoded according
to the MPEG system, is transmitted. The ES is 4:2:2 components. The
ES is a stream of all I pictures having the relation of 1 GOP = 1
picture. In the SDTI-CP (Content Package) format, the MPEG ES is
separated into access units and packed into packets in the unit of
one frame. The SDTI-CP uses a sufficient transmission band (clock
rate: 27 MHz or 36 MHz, or stream bit rate: 270 Mbps or 360 Mbps).
In one frame period, the ES can be transmitted as a burst.
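The relation between the quoted clock rates and stream bit rates can be checked with the short sketch below. It assumes 10-bit words per clock period, as used on the serial digital interface. This word width is not stated in the text and is introduced here only to show how the two pairs of figures correspond.

```python
# Assumption (not stated in the text): one 10-bit word per clock period.
BITS_PER_WORD = 10


def stream_bit_rate_bps(clock_hz: int) -> int:
    """Serial stream bit rate for a given word clock rate."""
    return clock_hz * BITS_PER_WORD


rate_27mhz = stream_bit_rate_bps(27_000_000)   # 270 Mbps
rate_36mhz = stream_bit_rate_bps(36_000_000)   # 360 Mbps
```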
[0060] The SDI signal transmitted through the SDI is input to an
SDI input portion 101. An analog input signal as an analog video
signal is input to an analog input portion 120. The analog input
portion 120 converts the analog input signal into a digital signal,
maps the digital signal to for example the aforementioned SDI
format, and outputs the resultant SDI signal. The SDI signal, where
the analog input signal has been converted and mapped to the SDI
format, is supplied to the SDI input portion 101.
[0061] The SDI input portion 101 converts the supplied SDI signal
as a serial signal into a parallel signal. In addition, the SDI
input portion 101 extracts an input synchronous signal as an input
phase reference from the SDI signal and outputs the extracted input
synchronous signal to a timing generator TG 102.
[0062] In addition, the SDI input portion 101 separates a video
signal and an audio signal from the converted parallel signal. The
separated video input signal and audio signal are output to an MPEG
encoder 103 and a delay circuit 104, respectively.
[0063] The timing generator TG 102 extracts a reference synchronous
signal from an input external reference signal REF. In
synchronization with a designated one of the reference synchronous
signal or the input synchronous signal, which has been supplied
from the SDI input portion 101, the timing generator TG 102
generates a timing signal necessary for the digital VTR and
supplies it as timing pulses to each block.
[0064] The MPEG encoder 103 converts the input video signal into
coefficient data according to the DCT method, quantizes the
coefficient data, and encodes the quantized data with a variable
length code. Variable-length-coded (VLC) data, which has been
output from the MPEG encoder 103, is an elementary stream (ES)
according to the MPEG2. The elementary stream, which has been
output from the MPEG encoder 103, is supplied to one of the input
terminals of a recording side multi-format converter 106
(hereinafter referred to as recording side MFC 106).
[0065] The delay circuit 104 functions as a delay line that delays
the input audio signal as a non-compressed signal corresponding to
the delay of the video signal of the MPEG encoder 103. The audio
signal, which has been delayed by the delay circuit 104, is output
to an ECC encoder 107. This is because the digital VTR according to
the embodiment of the present invention treats the audio signal as
a non-compressed signal.
[0066] The SDTI signal supplied from the outside through the SDTI
is input to an SDTI input portion 105. The SDTI input portion 105
detects the synchronization of the SDTI signal. The SDTI signal is
temporarily buffered and the elementary stream is extracted
therefrom. The extracted elementary stream is supplied to the other
input terminal of the recording side MFC 106. The synchronous
signal detected by the SDTI input portion 105 is supplied to the
timing generator TG 102 (not shown).
[0067] The SDTI input portion 105 extracts a digital audio signal
from the input SDTI signal. The extracted digital audio signal is
supplied to the ECC encoder 107.
[0068] In the digital VTR according to the present invention, an
MPEG ES can be directly input independently of the base band video
signal, which has been input from the SDI input portion 101.
[0069] The recording side MFC 106 has a stream converter and a
selector. The recording side MFC 106 selects one of the MPEG ES
supplied from the SDI input portion 101 and the MPEG ES supplied
from the SDTI input portion 105, and collects the DCT coefficients
of the DCT blocks of one macro block so that they are rearranged in
the order of frequency components. The resultant stream, in which
the coefficients of the MPEG ES have been rearranged, is referred
to as a converted elementary stream. Since the MPEG ES is
rearranged in this manner, as many DC coefficients and low order AC
coefficients as possible can be collected when a search
reproduction is performed, which contributes to the improvement of
the quality of the search picture. The converted elementary stream
is supplied to the ECC encoder 107.
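The rearrangement performed by the stream converter can be pictured with a minimal sketch. It assumes that the coefficients of each DCT block are already listed in ascending frequency order and that they are available as plain integers (the actual converter operates on the variable-length-coded stream). The function names to_frequency_order and to_block_order are hypothetical and are not taken from the embodiment.

```python
from typing import List


def to_frequency_order(blocks: List[List[int]]) -> List[int]:
    """Recording side: gather the coefficients of all DCT blocks of one
    macro block so that all DC coefficients come first, followed by the
    lowest order AC coefficients of every block, and so on."""
    n_coeffs = len(blocks[0])
    if any(len(block) != n_coeffs for block in blocks):
        raise ValueError("all DCT blocks must have the same length")
    return [block[k] for k in range(n_coeffs) for block in blocks]


def to_block_order(stream: List[int], n_blocks: int) -> List[List[int]]:
    """Reproducing side: restore the order of DCT blocks from a stream
    arranged in the order of frequency components."""
    if len(stream) % n_blocks != 0:
        raise ValueError("stream length must be a multiple of n_blocks")
    # Coefficient k of block b sits at index k * n_blocks + b.
    return [stream[b::n_blocks] for b in range(n_blocks)]


if __name__ == "__main__":
    blocks = [[10, 1, 2], [20, 3, 4]]      # two toy "DCT blocks"
    converted = to_frequency_order(blocks)
    print(converted)                        # DC terms 10 and 20 lead the stream
    print(to_block_order(converted, 2))
```

Because the DC and low order AC coefficients are grouped at the head of the converted stream, a partially read stream still carries the components that matter most for a search picture.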
[0070] A main memory (not shown) having a large storage capacity is
connected to the ECC encoder 107. The ECC encoder 107 has a packing
and shuffling portion, an audio outer code encoder, a video outer
code encoder, an inner code encoder, an audio shuffling portion, a
video shuffling portion, and so forth. In addition, the ECC encoder
107 has a sync block ID adding circuit and a synchronous signal
adding circuit. According to the first embodiment of the present
invention, as an error correction code for the video signal and
audio signal, a product code is used. In the product code, a data
symbol is dually encoded in such a manner that the two-dimensional
array of the video signal or audio signal is encoded in the
vertical direction with an outer code and that the two-dimensional
array is encoded in the horizontal direction with an inner code. As
the outer code and inner code, the Reed-Solomon code can be
used.
[0071] The converted elementary stream, which has been output from
the recording side MFC 106, is supplied to the ECC encoder 107. In
addition, the audio signals, which are output from the SDTI input
portion 105 and the delay circuit 104, are supplied to the ECC
encoder 107. The ECC encoder 107 shuffles the converted elementary
stream and audio signals, encodes them with an error correction
code, adds IDs and a synchronous signal to sync blocks, and outputs
the resultant signal as record data.
[0072] The record data, which has been output from the ECC encoder
107, is converted into a record RF signal by an equalizer EQ 108
that has a recording amplifier. The record RF signal is supplied to
a rotating drum 109. The rotating drum 109 records the record
signal on a magnetic tape 110. The rotating drum 109 has a rotating
head disposed in a predetermined manner. In reality, a plurality of
magnetic heads whose azimuths are different from each other and
that form adjacent tracks are disposed.
[0073] When necessary, the record data may be scrambled. When data
is recorded, it may be digitally modulated. In addition, the
partial response class 4 and Viterbi code may be used. The
equalizer EQ 108 has both a recording side structure and a
reproducing side structure.
[0074] Next, the structure and the processing operation of the
reproducing system of the digital VTR will be described. In the
reproduction mode, a reproduction signal that is reproduced from
the magnetic tape 110 by the rotating drum 109 is supplied to the
reproducing side structure of the equalizer EQ 108 that has a
reproducing amplifier and so forth. The equalizer EQ 108 equalizes
the reproduction signal and trims the wave shape thereof. When
necessary, the equalizer EQ 108 demodulates the digital modulation
and decodes the Viterbi code. An output of the equalizer EQ 108 is
supplied to an ECC decoder 111.
[0075] The ECC decoder 111 performs the reverse process of the ECC
encoder 107. The ECC decoder 111 has a main memory having a large
storage capacity, an inner code decoder, an audio deshuffling
portion, a video deshuffling portion, and an outer code decoder. In
addition, the ECC decoder 111 has a deshuffling and depacking
portion and a data interpolating portion for video data. Likewise,
the ECC decoder 111 has an audio AUX separating portion and a data
interpolating portion for audio data.
[0076] The ECC decoder 111 detects the synchronization of
reproduction data. In other words, the ECC decoder 111 detects a
synchronous signal added at the beginning of a sync block and
extracts the sync block from the reproduction signal. The ECC
decoder 111 corrects an error of each sync block of the
reproduction data with an inner code. Thereafter, the ECC decoder
111 performs an ID interpolating process for each sync block. The
ECC decoder 111 separates video data and audio data from the
reproduction data, where IDs have been interpolated. The ECC
decoder 111 deshuffles the video data and audio data so that they
are restored to the original data. The ECC decoder 111 corrects an
error of the deshuffled data with an outer code.
[0077] When the ECC decoder 111 cannot correct an error of data,
which exceeds its error correcting performance, the ECC decoder 111
sets an error flag to the data. For an error of video data, the ECC
decoder 111 outputs a signal ERR that represents that data has an
error.
[0078] The reproduction audio data, whose error has been corrected,
is supplied to an SDTI output portion 115. A delay circuit 114
delays the reproduction audio data for a predetermined amount and
supplies the delayed reproduction audio data to an SDI output
portion 116. The delay circuit 114 absorbs the delay of the video
data processed in an MPEG decoder 113 that will be described
later.
[0079] On the other hand, the video data, whose error has been
corrected, is supplied as converted reproduction elementary stream
to a reproducing side MFC circuit 112. The aforementioned signal
ERR is also supplied to the reproducing side MFC circuit 112. The
reproducing side MFC circuit 112 performs the reverse process of
the recording side MFC 106. The reproducing side MFC circuit 112
has a stream converter. The stream converter performs the reverse
process of the recording side stream converter. In other words, the
stream converter rearranges DCT coefficients arranged in the order
of frequency components to those arranged in the order of DCT
blocks. As a result, the reproduction signal is converted into an
elementary stream according to the MPEG2. At that point, when the
signal ERR is supplied from the ECC decoder 111 to the reproducing
side MFC circuit 112, it replaces the corresponding data with a
signal that perfectly complies with the MPEG2.
[0080] The MPEG ES, which has been output from the reproducing side
MFC circuit 112, is supplied to the MPEG decoder 113 and the SDTI
output portion 115. The MPEG decoder 113 decodes the supplied MPEG
ES so as to restore it to the original video signal, which has not
been compressed. In other words, the MPEG decoder 113 performs an
inversely quantizing process and an inverse DCT process for the
supplied MPEG ES. The decoded video signal is supplied to an SDI
output portion 116.
[0081] As described above, the audio data, which had been separated
from the video data by the ECC decoder 111, has been supplied to
the SDI output portion 116 through the delay circuit 114. The SDI
output portion 116 maps the supplied video data and audio data in
the SDI format so as to convert them into the SDI signal having the
SDI data structure. The SDI signal is output to the outside.
[0082] On the other hand, as described above, the audio data, which
had been separated from the video data by the ECC decoder 111, has
been supplied to the SDTI output portion 115. The SDTI output
portion 115 maps the video data and audio data as the supplied
elementary stream in the SDTI format so as to convert them into the
SDTI signal having the SDTI data structure. The SDTI signal is
output to the outside.
[0083] A system controller 117 (abbreviated as sys-con 117 in FIG.
4A and FIG. 4B) is composed of, for example, a microcomputer. The
system controller 117 communicates with each block using a digital
signal SY_IO so as to control the entire operation of the digital
VTR. A servo 118 communicates with the system controller 117 using
a signal SY_SV. Using the signal SV_IO, the servo 118 controls the
traveling of the magnetic tape 110 and drives and controls the
rotating drum 109.
[0084] FIG. 5A, FIG. 5B, and FIG. 5C show in more detail the
structure of the foregoing example of the MPEG encoder. FIG. 6A,
FIG. 6B, FIG. 6C, FIG. 7A, FIG. 7B, FIG. 7C, FIG. 8A, FIG. 8B, FIG.
9A, FIG. 9B, FIG. 10A, FIG. 10B, FIG. 10C, FIG. 11A, FIG. 11B, FIG.
11C, and FIG. 12 show examples of the structures of streams
transferred in the individual portions of FIG. 5A, FIG. 5B, and
FIG. 5C.
[0085] The MPEG encoder 103 is composed of an input field activity
averaging process portion 103A, a pre-encoding process portion
103B, and an encoding process portion 103C. The input field
activity averaging process portion 103A obtains the average value
of activities of the input video data and supplies the obtained
average value to the pre-encoding process portion 103B. The
pre-encoding process portion 103B estimates the generated code
amount of quantized input video data with the average value of the
activities. According to the estimated result, while controlling
the code amount, the encoding process portion 103C actually
quantizes the input video data, encodes the quantized video data
with a variable length code, and outputs the resultant data as an
MPEG ES.
[0086] A timing generator TG 220 generates a timing signal
necessary for the MPEG encoder 103 with a horizontal synchronous
signal HD, a vertical synchronous signal VD, and a field
synchronous signal FLD that have been supplied from, for example,
the timing generator TG 102 shown in FIG. 4A and FIG. 4B. A CPU I/F
block 221 is an interface with the system controller 117 shown in
FIG. 4A and FIG. 4B. With a control signal and data transferred
through the CPU I/F block 221, the operation of the MPEG encoder
103 is controlled.
[0087] First of all, the process of the input field activity
averaging process portion 103A will be described. Video data, which
has been output from the SDI input portion 101 and input to the
MPEG encoder 103, is supplied to an input portion 201. The input
portion 201 converts the video data into data that can be stored in
a main memory 203 and checks a parity for the video data. The video
data, which has been output from the input portion 201, is supplied
to a header creating portion 202. Using a vertical blanking region
or the like, headers according to the MPEG, for example
sequence_header, quantizer_matrix, and gop_header, are extracted.
The extracted headers are stored in the main memory 203. These
headers are designated mainly by the CPU I/F block 221. Outside the
vertical blanking region, the header creating portion 202 stores
the video data supplied from the input portion 201 in the main
memory 203.
[0088] The main memory 203 is a frame memory for a picture. The
main memory 203 rearranges video data and absorbs the system delay.
The video data is rearranged by for example an address controller
(not shown) that controls read addresses of the main memory 203. In
FIG. 5A, FIG. 5B, and FIG. 5C, 8 lines, 0.5 frame, and 1 frame in
the block of the main memory 203 represent delay values as read
timings of the main memory 203. They are properly controlled
corresponding to a command issued from the timing generator TG
220.
[0089] A raster scan/block scan converting portion 204 extracts
macro blocks according to the MPEG from each line of the video
data, which has been stored in the main memory 203, and supplies
the extracted macro blocks to an activity portion 205 disposed
downstream thereof. According to the embodiment, with only the
first field, activities are calculated. Thus, the macro blocks that
are output from the raster scan/block scan converting portion 204
are composed of video data of the first field.
[0090] As shown in FIG. 6A, at the beginning of the stream that is
output from the raster scan/block scan converting portion 204,
address information of the macro block in the vertical and
horizontal directions is placed. The address information is
followed by
a blank area having a predetermined size, followed by picture data
for one macro block.
[0091] The stream has a data length of 576 words, each of which is
composed of, for example, eight bits. The last 512 words (referred
to as data portion) are assigned as a picture data area for one
macro block. The first 64 words (referred to as header portion)
contain
the aforementioned address information of the macro block. The
other portion is a blank area for data and flags embedded by each
portion disposed downstream of the raster scan/block scan
converting portion 204.
[0092] A macro block according to the MPEG is a matrix of 16
pixels × 16 lines. However, as described with reference to FIG.
3A, FIG. 3B, FIG. 3C, FIG. 3D, FIG. 3E and FIG. 3F, the MPEG
encoder 103 performs the process for obtaining activities with only
the first field. Thus, when eight lines of the first field have
been stored in the main memory 203, the process can be started. In
reality, corresponding to a command issued from the timing
generator TG 220, the process is properly started.
[0093] The activity portion 205 calculates an activity of each
macro block. In the MPEG encoder 103, an activity of each macro
block of only a first field is calculated. The calculated result is
output as a field activity signal field_act. The signal field_act
is supplied to an averaging portion 206, where it is cumulated for
one field. As a result, an average value avg_act is obtained. The
average value avg_act is supplied to an activity portion 209 of
the pre-encoding process portion 103B. The activity portion 209
performs a pre-encoding process for both the first field and the
second field with the average value avg_act.
[0094] Thus, after the average value avg_act of the activities of
the first field has been obtained, a pre-encoding process can be
performed with the average value in consideration of adaptive
quantization.
[0095] Next, the pre-encoding process portion 103B will be
described. A raster scan/block scan converting portion 207A
performs basically the same process as the foregoing raster
scan/block scan converting portion 204. However, since the raster
scan/block scan converting portion 207A is to perform a
pre-encoding process for estimating a code amount, the raster
scan/block scan converting portion 207A requires video data of both
the first field and the second field. Thus, when eight lines of the
second field have been stored in the main memory 203, the raster
scan/block scan converting portion 207A can form a macro block
having a size of 16 pixels × 16 lines, which is the size handled in
the MPEG. At that point, the raster scan/block scan converting
portion 207A can start the process. In reality, the raster
scan/block scan converting portion 207A properly starts the process
corresponding to a command received from the timing generator TG
220.
[0096] Video data that is output from the raster scan/block scan
converting portion 207A is supplied to a DCT mode portion 208.
The DCT mode portion 208 decides whether to encode the video data
in a field DCT encoding mode or a frame DCT encoding mode.
[0097] In this case, instead of actually encoding the video data,
the DCT mode portion 208 calculates the sum of absolute values of
difference values of vertically adjacent pixels in both the field
DCT encoding mode and the frame DCT encoding mode, compares the two
sums, and selects the mode whose calculated sum is the smaller.
The selected result is temporarily placed as DCT mode type data
that is a flag in the stream. The flag is transferred to each
portion downstream of the DCT mode portion 208. As shown in FIG.
6B, the DCT mode type data dct_type is placed on the rear end side
of a blank area of the header portion.
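The mode decision described above can be illustrated with a minimal sketch. It assumes that a macro block is given as a list of 16 rows of luminance values, that the frame mode compares adjacent lines of the interleaved frame, and that the field mode compares adjacent lines within each field separately. The tie-breaking rule (frame mode on equal sums) and the names vertical_sad and choose_dct_type are assumptions made for illustration only.

```python
from typing import List


def vertical_sad(rows: List[List[int]]) -> int:
    """Sum of absolute differences between vertically adjacent pixels."""
    return sum(abs(a - b)
               for upper, lower in zip(rows, rows[1:])
               for a, b in zip(upper, lower))


def choose_dct_type(macro_block: List[List[int]]) -> str:
    """Return "field" or "frame", whichever gives the smaller sum."""
    frame_sum = vertical_sad(macro_block)
    first_field = macro_block[0::2]    # even lines
    second_field = macro_block[1::2]   # odd lines
    field_sum = vertical_sad(first_field) + vertical_sad(second_field)
    # Assumption: the frame mode is kept when both sums are equal.
    return "field" if field_sum < frame_sum else "frame"


if __name__ == "__main__":
    # Lines alternate between 0 and 100, as with strong inter-field motion.
    moving = [[0] * 16 if y % 2 == 0 else [100] * 16 for y in range(16)]
    # A smooth vertical ramp, as in a still picture.
    still = [[10 * y] * 16 for y in range(16)]
    print(choose_dct_type(moving), choose_dct_type(still))
```

In the first test pattern each field is flat on its own, so the field sum is zero and the field DCT encoding mode is chosen. In the second pattern adjacent frame lines are closer to each other than adjacent field lines, so the frame DCT encoding mode is chosen.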
[0098] The activity portion 209 performs basically the same process
as the foregoing activity portion 205. However, as was described
above, the activity portion 209 is to perform the pre-encoding
process. Thus, the activity portion 209 calculates an activity of
each macro block using data of both the first field and the second
field. The activity portion 209 first obtains an activity act and
places it after a macro block address of the header portion as
shown in FIG. 6C. Thereafter, with the activity act and the average
value avg_act of the field activity obtained from the foregoing
averaging portion 206, the activity portion 209 obtains a
normalized activity Nact.
[0099] The obtained normalized activity Nact is temporarily placed
as normalized activity data norm_act that is a flag in the header
portion of the stream as shown in FIG. 7A. The flag is transferred
to the individual portions downstream of the activity portion 209.
In the stream, the activity act is overwritten with the normalized
activity data norm_act.
[0100] An output of the activity portion 209 is supplied to a DCT
portion 210A. The DCT portion 210A divides the supplied macro block
into DCT blocks each of which is composed of eight
pixels × eight pixels, performs the two-dimensional DCT for
each DCT block, and generates DCT coefficients. As shown in FIG.
7B, the DCT coefficients are placed in the data portion of the
stream and supplied to a quantizer table portion 211A.
[0101] The quantizer table portion 211A quantizes the DCT
coefficients, which have been transformed by the DCT portion 210A,
with a quantizer matrix (quantizer_matrix). As shown in FIG. 7C,
the DCT coefficients, which have been quantized by the quantizer
table portion 211A, are placed in the data portion of the stream
and then output. An output of the quantizer table portion 211A is
supplied to a multi-staged quantizer portion composed of a
plurality of Q_n (quantizer) portions 212, 212, . . . , VLC
portions 213, 213, . . . , cumulating portions Σ 214, 214, . . . ,
and cumulating portions Σ 215, 215, . . . . The DCT coefficients
quantized by the quantizer table portion 211A are quantized on
multiple stages of the quantizer portion.
[0102] The Q_n portions 212, 212, . . . quantize the DCT
coefficients with different quantizer scales (quantizer_scale) Q.
The values of the quantizer scales Q are prescribed in, for
example, the MPEG2 standard. The Q_n portions 212, 212, . . . are
composed of, for example, 31 quantizers according to the standard.
At that point, since n=31, the Q_n portions 212, 212, . . . are a
Q_1 portion, a Q_2 portion, . . . , and a Q_31 portion. The Q_n
portions 212 quantize the DCT coefficients with the quantizer
scales Qn assigned thereto, at a total of 31 steps. Hereinafter,
the quantizer scale values of the Q_n portions 212, 212, . . . are
denoted by the quantizer scale Qn values.
[0103] The Q_n portions 212, 212, . . . quantize DCT coefficients
with their quantizer scale Qn values. At that point, the adaptive
quantization is performed with the quantizer scale mquant, which
takes the visual characteristic into consideration and which is
obtained by the following Formula (6) from the normalized activity
data norm_act obtained by the activity portion 209.
$$\mathit{mquant} = Q\_n \times \mathit{norm\_act} \qquad (6)$$
[0104] The DCT coefficients adaptively quantized by the Q_n
portions 212, 212, . . . with the quantizer scales Qn are placed in
the data portion of the stream as shown in FIG. 8A and supplied to
the VLC portions 213, 213, . . . . The VLC portions 213, 213, . . .
scan the DCT coefficients for the individual quantizer scales Qn
according to, for example, the zigzag scanning method and encode
them with a variable length code with reference to a VLC table
according to, for example, the two-dimensional Huffman code.
[0105] The data, which has been encoded with the variable length
code by the VLC portions 213, 213, . . . , is placed in the data
portion of the stream as shown in FIG. 8B and then output. Outputs
of the VLC portions 213, 213, . . . are supplied to the
corresponding cumulating portions Σ 214, 214, . . . .
[0106] The cumulating portions Σ 214, 214, . . . cumulate the
generated code amounts for each macro block. As described above,
when 31 types of quantizing devices are used, 31 types of generated
code amounts are obtained for each macro block. As shown in FIG.
9A, the generated code amounts cumulated by the cumulating portions
Σ 214, 214, . . . are placed in the header portion of the
stream. In other words, the code amounts generated by the
quantization of the Q_1 portion 212 to the Q_n portion 212 for each
macro block are placed in the header portion of the stream. The
data portion of the stream is deleted. The stream of each macro
block is supplied to the main memory 203.
[0107] The generated code amounts for each macro block, which have
been output from the cumulating portions Σ 214, 214, . . . ,
are supplied to the respective cumulating portions Σ 215,
215, . . . . The cumulating portions Σ 215, 215, . . . select, from
the generated code amounts obtained by the cumulating portions
Σ 214, those for each macro block quantized with quantizer_scale
(=mquant), in which the foregoing visual characteristic has been
considered, and cumulate them for one frame.
[0108] The generated code amounts (frame data rates) cumulated by
the cumulating portions Σ 215, 215, . . . for the quantizer
scales Qn are supplied as an n-word stream as shown in FIG. 9B to a
rate controlling portion 217. When 31 types of quantizing devices
are used as in the foregoing example, the corresponding 31 types of
generated code amounts for each frame are obtained.
[0109] Next, a method for obtaining a generated code amount will be
described in more detail. For example, the "generated code amount
of Q_4 portion 212" can be obtained as follows.
[0110] For example, in the case
[0111] norm_act [1]=1.3
[0112] norm_act [2]=1.5
[0113] norm_act [3]=0.8
[0114] norm_act [4]=1.0
[0115] . . . ,
[0116] mquant [1]=4 × 1.3=5.2
[0117] : The generated code amount of the Q_5 portion 212 is
obtained from the header portion of FIG. 9A.
[0118] mquant [2]=4 × 1.5=6.0
[0119] : The generated code amount of the Q_6 portion 212 is
obtained from the header portion of FIG. 9A.
[0120] mquant [3]=4 × 0.8=3.2
[0121] : The generated code amount of the Q_3 portion 212 is
obtained from the header portion of FIG. 9A.
[0122] mquant [4]=4 × 1.0=4.0
[0123] : The generated code amount of the Q_4 portion 212 is
obtained from the header portion of FIG. 9A.
[0124] . . .
[0125] These generated code amounts are cumulated for one frame.
This operation is performed for each of the Q_1 portion 212 to the
Q_n portion 212. As a result, the generated code amount for one
frame is obtained for each quantizer scale.
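The lookup and cumulation of paragraphs [0109] to [0125] can be sketched as follows. The sketch assumes that the product of Formula (6) is rounded to the nearest integer and clamped to the range 1 to 31 (the worked example is equally consistent with truncation, so the rounding rule is an assumption), and that the per-macro-block code amounts of FIG. 9A are available as a table indexed by quantizer number. The names mquant_index, frame_code_amount, and all_frame_code_amounts are hypothetical.

```python
from typing import List, Sequence

N_SCALES = 31  # number of Q_n portions in the example of the text


def mquant_index(q_n: int, norm_act: float) -> int:
    """Number of the Q portion whose code amount is looked up."""
    mquant = q_n * norm_act  # Formula (6)
    # Assumption: round to the nearest integer and clamp to 1..31.
    return max(1, min(N_SCALES, int(mquant + 0.5)))


def frame_code_amount(q_n: int,
                      norm_acts: Sequence[float],
                      mb_code_amounts: Sequence[Sequence[int]]) -> int:
    """Cumulate, for one frame, the code amount each macro block would
    generate when quantized with mquant derived from Q_n.

    mb_code_amounts[i][k - 1] is the code amount of macro block i as
    measured by the Q_k portion (the header portion of FIG. 9A)."""
    total = 0
    for norm_act, amounts in zip(norm_acts, mb_code_amounts):
        total += amounts[mquant_index(q_n, norm_act) - 1]
    return total


def all_frame_code_amounts(norm_acts: Sequence[float],
                           mb_code_amounts: Sequence[Sequence[int]]
                           ) -> List[int]:
    """Frame code amounts for Q_1 to Q_31 (the n-word stream of FIG. 9B)."""
    return [frame_code_amount(q, norm_acts, mb_code_amounts)
            for q in range(1, N_SCALES + 1)]


if __name__ == "__main__":
    norm_acts = [1.3, 1.5, 0.8, 1.0]
    # Made-up table: code amount shrinks as the quantizer scale grows.
    table = [[100 * (i + 1) + (32 - k) for k in range(1, N_SCALES + 1)]
             for i in range(len(norm_acts))]
    print([mquant_index(4, a) for a in norm_acts])
    print(frame_code_amount(4, norm_acts, table))
```

With the normalized activities of the worked example and Q_4, the lookups go to the Q_5, Q_6, Q_3, and Q_4 entries, matching paragraphs [0116] to [0123]. The table values themselves are invented test data.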
[0126] Next, the encoding process portion 103C will be described.
The encoding process portion 103C performs the final encoding
process. As described above, the pre-encoding process portion 103B
estimates the generated code amounts for one frame in various
quantizing operations. The encoding process portion 103C encodes
data corresponding to the generated code amount, which has been
estimated for one frame so that the generated code amount does not
exceed the pre-designated target generated code amount and outputs
an MPEG ES.
[0127] The data used in the encoding process portion 103C has been
stored in the main memory 203. However, as described above, when
the generated code amounts for one frame have been estimated in
various quantizing operations by the pre-encoding process portion
103B, the encoding process portion 103C can start the process. As
described above, the process of each portion of the encoding
process portion 103C can be properly started corresponding to a
command issued from the timing generator TG 220.
[0128] Video data that has been read from the main memory 203 is
supplied to a raster scan/block scan converting portion 207B. The
raster scan/block scan converting portion 207B performs a process
similar to the process of the raster scan/block scan converting
portion 207A and extracts a macro block of 16 pixels × 16 lines
from the video data. As shown in FIG. 10A, the extracted macro
block is placed in a data portion corresponding to the header
portion shown in FIG. 9A and supplied to a DCT mode portion
216.
[0129] Like the DCT mode portion 208, the DCT mode portion 216
decides to use the field DCT encoding mode or the frame DCT
encoding mode to encode data. At that point, the DCT mode portion
208 has already decided the encoding mode and temporarily placed
the result as DCT type data dct_type in the stream (see FIG. 10A).
The DCT mode portion 216 detects the DCT type data dct_type from
the stream and switches to the field encoding mode or the frame
encoding mode corresponding to the detected DCT type data dct_type.
An output of the DCT mode portion 216 is shown in FIG. 10B.
[0130] A macro block that has been output from the DCT mode portion
216 is supplied to a DCT portion 210B. Like the DCT portion 210A,
the DCT portion 210B two-dimensionally transforms the macro block
into DCT coefficients in the unit of one DCT block of eight
pixels × eight pixels. As shown in FIG. 10C, the DCT
coefficients, into which the macro block has been two-dimensionally
transformed corresponding to the two-dimensional DCT method, are
placed in the data portion of the stream and then output from the
DCT portion 210B.
[0131] A quantizer table portion 211B can be structured in the same
manner as the foregoing quantizer table portion 211A. The quantizer
table portion 211B quantizes the DCT coefficients transformed by
the DCT portion 210B with a quantizer matrix. As shown in FIG. 11A,
the DCT coefficients quantized by the quantizer table portion 211B
are placed in the data portion of the stream and supplied to a rate
controlling portion 217.
[0132] The rate controlling portion 217 selects one from the frame
data rates, which have been obtained by the cumulating portions
Σ 215, 215, . . . of the pre-encoding process portion 103B
for each quantizer scale Qn so that the selected one does not
exceed the maximum generated code amount per frame designated by
the system controller 117 and is the closest to the designated
value. The quantizer scale (mquant) for each macro block used in
the quantizing device corresponding to the selected frame data rate
is obtained from the normalized activity data norm_act placed in
the stream and supplied to a quantizing portion 218.
[0133] The quantizer scale for each macro block is placed as
quantizer_scale on the rear end side of the header portion of the
stream as shown in FIG. 11B and then sent to the quantizing portion
218.
[0134] The maximum generated code amount per frame is designated by
for example the system controller 117 and supplied to the rate
controlling portion 217 through the CPU I/F block 221.
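The selection performed by the rate controlling portion 217 can be sketched as a simple search over the frame data rates. The sketch assumes that the rates are given as a list indexed by quantizer number and that no quantizer scale is returned when every rate exceeds the designated maximum; the source does not describe that case, so the None result is an assumption. The function name select_quantizer_scale is hypothetical.

```python
from typing import Optional, Sequence


def select_quantizer_scale(frame_code_amounts: Sequence[int],
                           max_code_amount: int) -> Optional[int]:
    """Return n (1-based) of the quantizer scale Qn whose frame code
    amount does not exceed max_code_amount and is closest to it."""
    best_n: Optional[int] = None
    best_amount = -1
    for n, amount in enumerate(frame_code_amounts, start=1):
        if amount <= max_code_amount and amount > best_amount:
            best_n, best_amount = n, amount
    # Assumption: None signals that no quantizer scale fits the budget.
    return best_n


if __name__ == "__main__":
    rates = [900, 700, 500, 300]   # invented frame data rates for Q_1..Q_4
    print(select_quantizer_scale(rates, 600))
```

With the invented rates above and a maximum of 600, the third quantizer scale is chosen because 500 is the largest amount that stays within the limit.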
[0135] At that point, the value of the quantizer scale (mquant) for
each macro block can be decreased by one step in the range that
does not exceed the difference between the maximum generated code
amount per frame, which has been designated by the system
controller 117 and transferred through the CPU I/F block 221, and
the generated code amount corresponding to the quantizer scale
(mquant) for each macro block, which has been obtained from the
normalized activity data norm_act placed in the stream. Thus, since
a code amount close to the maximum generated code amount designated
by the system controller 117 and transferred through the CPU I/F
block 221 is obtained, high picture quality can be
accomplished.
[0136] The quantizing portion 218 extracts the quantizer scale
(quantizer_scale) designated by the rate controlling portion 217 in
the foregoing manner from the stream and, corresponding to the
extracted quantizer scale, quantizes the DCT coefficients that have
been quantized by the quantizer table portion 211B. At that point,
since the quantizer scale supplied from the rate controlling
portion 217 is the value of the quantizer scale (mquant) obtained
from the normalized activity data norm_act, adaptive quantization
is performed in consideration of the visual characteristic.
[0137] The DCT coefficients quantized by the quantizing portion 218
are placed in the data portion of the stream as shown in FIG. 11C
and supplied to a VLC portion 219. The DCT coefficients supplied to
the VLC portion 219 are scanned corresponding to for example the
zigzag scanning method. The resultant DCT coefficients are encoded
with a variable length code with reference to a VLC table according
to the two-dimensional Huffman code. The variable length code is
bit-shifted so that it is byte-aligned and then output as an MPEG
ES.
[0138] At that point, as shown in FIG. 12, the header portion as
the first half portion of the stream is replaced with the MPEG
header portion in which the MPEG header information of the slice
layer or below is placed. The variable length code is placed in the
data portion on the second half side of the stream.
[0139] The foregoing example shows that the process of the MPEG
encoder 103 is implemented by hardware. However, according to the
present invention, the process of the MPEG encoder 103 is not
limited to such an example. In other words, the process of the MPEG
encoder 103 can be implemented by software. In this case, for
example, a computer apparatus is provided with analog and digital
interfaces for a video signal. Software installed on the computer
apparatus is executed with a CPU and a memory. In the foregoing
digital VTR structure, the CPU and memory may be substituted for
the MPEG encoder 103.
[0140] The software is recorded as program data on a recording
medium such as a CD-ROM (Compact Disc--Read Only Memory). The
recording medium, on which the software has been recorded, is
loaded to the computer apparatus. With a predetermined operation,
the software is installed onto the computer apparatus. As a result,
the process of the software can be executed. Since the structure of
the computer apparatus is well known, its description will be
omitted in the following.
[0141] FIG. 13 is a flow chart showing an example of the process of
the MPEG encoder 103 implemented by software. Since the process of
the flow chart is the same as the process implemented by the
foregoing hardware, the process of the flow chart will be briefly
described in consideration of the process implemented by hardware.
Steps S1 to S7 correspond to the process of the foregoing input
field activity averaging process portion 103A. Steps S11 to S21
correspond to the process of the foregoing pre-encoding process
portion 103B. Steps S31 to S38 correspond to the process of the
foregoing encoding process portion 103C.
[0142] At step S1, the first step, video data is captured. At step
S2, the next step, each MPEG header is extracted from the captured
video data in the vertical blanking region and stored in a memory.
In other than the vertical blanking region, the captured video data
is stored in the memory.
[0143] At step S3, video data is converted from raster scan data
into block scan data. As a result, a macro block is extracted. This
operation is performed by controlling read addresses of the video
data stored in the memory. At step S4, an activity of the extracted
macro block of the first field of the video data is calculated. At
step S5, the calculated result Activity (act) is cumulated and
stored as a cumulated value sum in a memory. The process from step
S3 to step S5 is repeated until it has been determined that the
last macro block of the first field has been processed at step S6.
In other words, the cumulated value sum becomes the sum of
activities of macro blocks for one field.
[0144] When it has been determined that the last macro block of one
field has been processed at step S6, the cumulated value sum stored
in the memory is divided by the number of macro blocks for one
field at step S7. As a result, the average value Activity (avg_act)
of the field activities for one field is obtained and stored in the
memory.
[0145] When the average value Activity (avg_act) of the field
activities has been obtained, the flow advances to step S11. Like
at step S3, at step S11, the video data stored in the memory is
converted from raster scan data into block scan data. As a result,
a macro block is extracted.
[0146] At step S12, the field DCT encoding mode or the frame DCT
encoding mode is selected so as to process each DCT. The selected
result is stored as DCT mode type data dct_typ in the memory. At
step S13, with the first and second fields, an activity of each
macro block is calculated. With the average value Activity
(avg_act) of the field activities obtained and stored in the memory
at step S7, a normalized activity Activity (norm_act) is obtained.
The obtained normalized activity Activity (norm_act) is stored in
the memory.
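The normalization at step S13 can be sketched as follows. This section does not reproduce the normalization formula itself, so the sketch assumes the widely used MPEG-2 Test Model 5 form; the function names are hypothetical.

```python
def normalized_activity(act, avg_act):
    """Normalize a macro block activity against the average activity.

    Assumed TM5-style form. For positive inputs the result lies strictly
    between 0.5 and 2.0 and equals 1.0 when act == avg_act.
    """
    return (2.0 * act + avg_act) / (act + 2.0 * avg_act)


def adaptive_quantizer_scale(quantizer_scale, act, avg_act):
    """Scale a quantizer scale by the normalized activity of the macro block."""
    return quantizer_scale * normalized_activity(act, avg_act)
```

Under this assumption a flat macro block (activity below the average) receives a finer quantizer scale and a busy macro block a coarser one, which matches the visual characteristic the adaptive quantization is meant to exploit.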
[0147] At step S14, the macro block extracted from the video data
at step S11 is divided into DCT blocks each of which is composed of
eight pixels × eight pixels. The DCT blocks are
transformed by the two-dimensional DCT method. As
a result, DCT coefficients are obtained. At step S15, the DCT
coefficients are quantized with a quantizer table
(quantizer_table). Thereafter, the flow advances to step S16.
[0148] The process from steps S16 to S20 is repeated with each
quantizer scale (quantizer_scale) Qn value. As a result, the
processes corresponding to the foregoing Q_n portions 212, 212, ...,
the foregoing VLC portions 213, 213, ..., the foregoing cumulating
portions Σ 214, 214, ..., and the foregoing cumulating portions Σ
215, 215, ... are performed. In other
words, at step S16, the DCT coefficients are quantized with
quantizer scale Q=1. At step S17, the DCT coefficients quantized
with reference to the VLC table are encoded with a variable length
code. At step S18, the generated code amount of the macro block
with the variable length code is calculated. At step S19, the
generated code amount of each macro block for one frame is
cumulated. At step S20, it is determined whether or not there is
another quantizer scale Qn. When it has been determined that there
is another quantizer scale Qn, the flow returns to step S16. At
step S16, the process for another quantizer scale Qn is performed.
The generated code amounts for one frame corresponding to the
individual quantizer scales Qn are stored in the memory.
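The nested loop of steps S16 to S21 can be sketched as below. The cost model toy_code_length is a hypothetical stand-in for the VLC table, chosen only so that the example runs; it is not the two-dimensional Huffman code of the description, and the truncating division is likewise an assumption.

```python
def estimate_code_amounts(macro_blocks, quantizer_scales, code_length):
    """Return {Qn: estimated code amount for one frame}.

    macro_blocks: iterable of coefficient lists, already quantized with
        the quantizer table (step S15).
    code_length: callable standing in for the VLC table lookup.
    """
    totals = {q: 0 for q in quantizer_scales}
    for coeffs in macro_blocks:                       # loop closed at step S21
        for q in quantizer_scales:                    # loop closed at step S20
            quantized = [int(c / q) for c in coeffs]  # step S16, truncating
            totals[q] += code_length(quantized)       # steps S17 to S19
    return totals


def toy_code_length(quantized):
    """Toy bit cost, NOT a real VLC table: magnitude bits plus one bit
    for each non-zero coefficient."""
    return sum(abs(v).bit_length() + 1 for v in quantized if v != 0)
```

For two macro blocks [16, -8, 3, 0] and [32, 1, 0, 0] and the scales 1 and 4, this toy model yields 23 bits at scale 1 and 12 bits at scale 4, showing how a coarser scale reduces the estimated code amount.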
[0149] When it has been determined that the cumulated value of the
generated code amounts for the frame corresponding to all the
quantizer scale values Qn has been obtained at step S20, the flow
advances to step S21. At step S21, it is determined whether or not
the last macro block (MB) of one frame has been processed. When the
last macro block has not been processed, the flow returns to step
S11. When the last macro block has been processed and the generated
code amount for one frame has been estimated, the flow advances to
step S31. At step S31, the real encoding process is performed.
[0150] As at step S11, at step S31, the video data stored in the
memory is converted from raster scan data into block scan data. As
a result, a macro block is extracted. At step S32, corresponding to
the DCT mode type data dct_typ stored in the memory at step S12,
the DCT encoding mode is designated.
[0151] At step S33, the macro block extracted from the video data
at step S31 is divided into DCT blocks each of which is composed of
eight pixels × eight pixels. The DCT blocks are
transformed by the two-dimensional DCT method. As a
result, DCT coefficients are obtained. The DCT coefficients are
quantized with the quantizer table (quantizer_table) at step S34.
Thereafter, the flow advances to step S35.
[0152] At step S35, corresponding to the generated code amounts for
one frame corresponding to the quantizer scales Qn, which have been
estimated at steps S11 to S21, the quantizer scales Qn used at step
S36 are designated for each macro block so as to control the
generated code amount in the real encoding process.
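One simple way to turn the estimated code amounts into a choice of quantizer scale is sketched below. It is a frame-level simplification under stated assumptions: the description designates scales for each macro block, whereas this sketch picks a single scale, and both the function name and the fallback rule are hypothetical.

```python
def choose_quantizer_scale(code_amounts, target_bits):
    """Pick the finest quantizer scale whose estimate fits the target.

    code_amounts: {Qn: estimated code amount for one frame}.
    Falls back to the coarsest available scale when no estimate fits.
    """
    fitting = [q for q, bits in code_amounts.items() if bits <= target_bits]
    if fitting:
        return min(fitting)  # smallest scale gives the finest quantization
    return max(code_amounts)  # max over the dict keys, i.e. the scales
```

Choosing the smallest fitting scale keeps the picture quality as high as the per-frame code amount limit allows.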
[0153] Thereafter, the flow advances to step S36. With the
quantizer scales Qn designated at step S35, the DCT coefficients
quantized with the quantizer table at step S34 are quantized. With
reference to the VLC table, the DCT coefficients quantized at step
S36 are encoded with a variable length code at step S37. At step
S38, it is determined whether or not the last macro block of one
frame has been processed. When it has been determined that the last
macro block of one frame has not been processed, the flow returns
to step S31. At step S31, the quantizing process and the variable
length code encoding process are performed for the next macro
block. In contrast, when it has been determined that the last macro
block of one frame has been processed at step S38, the encoding
process for one frame is completed.
[0154] The foregoing example shows that the pre-encoding process
from steps S11 to S21 is separate from the encoding process from
steps S31 to S38. However, it should be noted that the present
invention is not limited to such an example. For example, the
generated code amounts estimated at steps S11 to S21 are stored in
the memory. Data that is obtained in the real encoding process is
selected from the stored data. As a result, the process from steps
S31 to S38 can be contained as a loop in the process from steps S11
to S21.
[0155] FIG. 14 is a schematic diagram for describing a block
segmentation for calculating an activity according to an embodiment
of the present invention. According to the embodiment of the
present invention, when activities are calculated for only one
field, they are calculated in the unit of one sub block composed of
eight pixels × four lines.
[0156] One macro block composed of 16 pixels × 16 lines is
shown on the left of FIG. 14. In the macro block of one frame shown
on the left side of FIG. 14, hatched lines represent lines of a
second field. The other lines represent lines of a first field. The
macro block is divided into four DCT blocks each of which is
composed of eight pixels × eight lines. As shown on the right
of FIG. 14, components (lines) that compose the first field are
extracted. As a result, four sub blocks each of which is composed
of eight pixels × four lines are obtained. A field activity
field_act of the sub blocks composed of data of the first field is
calculated.
[0157] As was described above, according to the embodiment of the
present invention, when data for eight lines of the first field has
been stored in the main memory 203, the process is started. Thus,
actually, sub blocks each of which is composed of eight
pixels × four lines are not extracted from a macro block
composed of 16 pixels × 16 lines. Instead, when data for eight
lines of the first field has been stored in the main memory 203,
four sub blocks each of which is composed of eight
pixels × four lines are obtained by the raster scan/block scan
converting portion 204.
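The block segmentation described above can be sketched as follows. The sketch assumes that the eight first-field lines of a macro block are available as eight rows of 16 luminance values, and that in a frame macro block the first field occupies the even-numbered lines; the function names are hypothetical.

```python
def first_field_sub_blocks(field_lines):
    """Split 8 first-field lines of 16 pixels into four sub blocks
    of eight pixels x four lines."""
    if len(field_lines) != 8 or any(len(line) != 16 for line in field_lines):
        raise ValueError("expected 8 lines of 16 pixels")
    blocks = []
    for top in (0, 4):        # upper and lower group of four field lines
        for left in (0, 8):   # left and right half of the macro block
            rows = field_lines[top:top + 4]
            blocks.append([line[left:left + 8] for line in rows])
    return blocks


def first_field_of(frame_macro_block):
    """Take the first-field lines of a 16-line frame macro block.

    Assumes the first field occupies frame lines 0, 2, ..., 14.
    """
    return frame_macro_block[0::2]
```

Working on the eight field lines directly gives the same four sub blocks as extracting the first-field lines from each of the four DCT blocks, but it can start before the second field arrives.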
[0158] An average value (P) of luminance level values of individual
pixels of each sub block composed of eight pixels × four lines
is obtained corresponding to Formula (7).
$$P = \frac{1}{32}\sum_{k=1}^{32} Y_k \qquad (7)$$
[0159] In other words, 32 luminance level values (Yk) of eight
pixels × four lines of each sub block are summed and then
divided by "32". As a result, an average value (P) is obtained.
[0160] Next, the difference between the luminance level value (Yk)
of each pixel of the sub block of eight pixels × four lines and the
average value (P) is squared. An average difference value
(var_sblk) of the squared difference values is obtained
corresponding to Formula (8).
$$\mathrm{var\_sblk} = \frac{1}{32}\sum_{k=1}^{32}(Y_k - P)^2 \qquad (8)$$
[0161] Since one macro block is composed of four sub blocks, the
minimum value is obtained from the four average difference values
(var_sblk). The minimum value is used as a field activity field_act
of the macro block (see Formula (9)).
$$\mathrm{field\_act} = 1 + \min_{sblk=1,\ldots,4}(\mathrm{var\_sblk}) \qquad (9)$$
[0162] The foregoing calculating process is repeated for each macro
block. In such a manner, an average value (field_avg_act) of the
field activities field_act of the first field is obtained (Formula
(10)).
$$\mathrm{field\_avg\_act} = \frac{1}{MBnum}\sum_{m=1}^{MBnum}\mathrm{field\_act}[m] \qquad (10)$$
[0163] where the value MBnum represents the total number of macro
blocks of one frame.
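The activity calculation described in the preceding paragraphs can be sketched as follows. Each sub block is assumed to be a list of four rows of eight luminance values; the function names are hypothetical.

```python
def sub_block_variance(sub_block):
    """Mean squared deviation of the luminance values from their mean."""
    pixels = [y for row in sub_block for y in row]
    mean = sum(pixels) / len(pixels)          # average value P
    return sum((y - mean) ** 2 for y in pixels) / len(pixels)  # var_sblk


def field_activity(sub_blocks):
    """field_act of one macro block: one plus the smallest sub block variance."""
    return 1 + min(sub_block_variance(b) for b in sub_blocks)


def field_average_activity(activities):
    """field_avg_act: mean of field_act over all macro blocks."""
    return sum(activities) / len(activities)
```

Because the minimum is taken over the four sub blocks, a macro block that contains even one flat region is treated as flat, and the added 1 keeps the activity strictly positive.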
[0164] Thereafter, calculations corresponding to Formula (1) to
Formula (5) described in the section of the Related Art are
performed so as to perform an adaptive quantization in
consideration of an activity of each macro block.
[0165] The calculating method for the foregoing field activity
field_act is just an example. In other words, the field activity
field_act may be calculated by another method.
[0166] To allow the MPEG encoder 103 according to the present
invention to be suitable for a VTR used in a broadcasting station,
the MPEG encoder 103 deals with only intra pictures and controls a
code amount so that the data amount per frame does not exceed a
predetermined value. Thus, since the pre-encoding process portion
103B performs the pre-encoding process, a delay for one frame is
added to the system delay shown in FIG. 3F. However, this delay for
one frame results from the code amount control in the pre-encoding
process, not from the adaptive quantization that, according to the
present invention, is performed in consideration of the activity of
each macro block with the average activity obtained from the first
field of the picture. In other words, when the pre-encoding process
is performed, a delay for one frame is added not only at the timing
according to the present invention as shown in FIG. 2B, FIG. 2C,
FIG. 2D, FIG. 2E, and FIG. 2F, but also at the timing according to
the related art as shown in FIG. 1A, FIG. 1B, FIG. 1C, and FIG. 1D
and at the timing in the other method as shown in FIG. 3A, FIG. 3B,
FIG. 3C, FIG. 3D, FIG. 3E, and FIG. 3F. As a result, the advantage
of the embodiment is not lost.
[0167] In the foregoing example, the picture data encoding system
is the MPEG2 system, in which a picture is divided into blocks and
each block is DCT-transformed. However, the present invention is
not limited to such an example. As long as a picture is encoded in
the unit of one block, another encoding system can be used.
[0168] In the foregoing example of the present invention, a digital
VTR has been described. However, the present invention is not
limited to such an example. The present invention can be applied to
a picture processing apparatus that compression-encodes picture
data in a block encoding method, for example a picture transmitting
apparatus that compression-encodes picture data and transmits the
encoded picture data.
[0169] As was described above, according to the embodiment of the
present invention, when a normalized activity of a frame is
calculated, with an average activity obtained from picture data of
a first field of the frame, the normalized activity is calculated.
Thus, according to the embodiment, when the normalized activity is
calculated, since an average activity of the immediately preceding
frame is not used, even if a scene change occurs between two
frames, the normalized activity can be properly calculated. As a
result, the picture quality does not deteriorate.
[0170] In addition, according to the present invention, the
calculation time can be shortened and the system delay can be
shortened in comparison with the method for calculating the average
activity with the entire frame (two fields) and obtaining the
normalized activity.
[0171] In addition, since a buffer for one frame is not necessary
for obtaining a normalized activity, the memory capacity can be
reduced.
[0172] Thus, when the present invention is applied to a VTR for an
editing operation that has a restriction of a system delay, the
picture quality can be optimized in consideration of the visual
characteristic.
* * * * *