U.S. patent number 5,467,136 [Application Number 08/197,862] was granted by the patent office on 1995-11-14 for video decoder for determining a motion vector from a scaled vector and a difference vector.
This patent grant is currently assigned to Kabushiki Kaisha Toshiba. Invention is credited to Yoshihiro Kikuchi, Tadaaki Masuda, Toshinori Odaka, Tadahiro Oku, Hideyuki Ueno, Yoshiharu Uetani, Noboru Yamaguchi, Tomoo Yamakage.
United States Patent 5,467,136
Odaka, et al.
* November 14, 1995
Video decoder for determining a motion vector from a scaled vector
and a difference vector
Abstract
A video decoding apparatus including a memory for storing at
least first and second reference pictures, a prediction signal
forming circuit for forming a prediction signal to produce an
output signal, using a first reference signal designated by a first
motion vector for the first reference picture and a second
reference signal designated by a second motion vector for the
second reference picture read out of the memory. Also included is a
decoding circuit for decoding the transmitted first motion vector
and a transmitted motion vector difference, and for further
decoding the second motion vector by adding a scaled motion vector
and the motion vector difference. The scaled motion vector is
derived by scaling the first motion vector to a motion vector for
the second reference picture. Further included is a decoding
circuit for decoding a transmitted difference signal and for
obtaining the output signal by adding the difference signal and the
prediction signal.
Inventors: Odaka; Toshinori (Yokohama, JP), Uetani; Yoshiharu
(Kawasaki, JP), Masuda; Tadaaki (Tokyo, JP), Yamakage; Tomoo
(Kawasaki, JP), Ueno; Hideyuki (Tokyo, JP), Yamaguchi; Noboru
(Yashio, JP), Kikuchi; Yoshihiro (Yokohama, JP), Oku; Tadahiro
(Urayasu, JP)
Assignee: Kabushiki Kaisha Toshiba (Kawasaki, JP)
[*] Notice: The portion of the term of this patent subsequent to
June 13, 2012 has been disclaimed.
Family ID: 27549562
Appl. No.: 08/197,862
Filed: February 17, 1994
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number | Issue Date
890705 | May 29, 1992 | 5317397 |
Foreign Application Priority Data
May 31, 1991 [JP] | 3-130012
Oct 17, 1991 [JP] | 3-298316
Oct 17, 1991 [JP] | 3-298317
Oct 31, 1991 [JP] | 3-286855
Feb 18, 1992 [JP] | 4-30923
Current U.S. Class: 375/240.15;
348/699; 375/E7.105; 375/E7.113; 375/E7.133; 375/E7.14; 375/E7.148;
375/E7.15; 375/E7.155; 375/E7.156; 375/E7.163; 375/E7.17;
375/E7.171; 375/E7.176; 375/E7.18; 375/E7.193; 375/E7.194;
375/E7.211; 375/E7.217; 375/E7.218; 375/E7.244; 375/E7.25;
375/E7.255; 375/E7.256; 375/E7.257; 375/E7.26
Current CPC Class: H04N 19/105 (20141101); H04N 19/503 (20141101);
H04N 19/159 (20141101); H04N 19/176 (20141101); H04N 19/51 (20141101);
H04N 19/61 (20141101); H04N 19/107 (20141101); H04N 19/112 (20141101);
H04N 19/126 (20141101); H04N 19/137 (20141101); H04N 19/152 (20141101);
H04N 19/16 (20141101); H04N 19/174 (20141101); H04N 19/80 (20141101);
H04N 19/82 (20141101); H04N 19/523 (20141101); H04N 19/58 (20141101);
H04N 19/577 (20141101); H04N 19/50 (20141101); H04N 19/149 (20141101);
H04N 19/15 (20141101); H04N 19/146 (20141101)
Current International Class: G06T 9/00 (20060101); H04N 7/26 (20060101);
H04N 7/46 (20060101); H04N 7/50 (20060101); H04N 7/32 (20060101);
H04N 7/36 (20060101); H04N 007/36 ()
Field of Search: 348/416,413,402,415,412,411,409,401,400,390,384,699,700,701
References Cited
U.S. Patent Documents
Other References
Information Technology, "Generic Coding of Moving Pictures and
Associated Audio Information: Video", Recommendation ITU-T H.262,
ISO/IEC 13818-2 Draft International Standard, Nov. 9, 1994, pp. i,
87 & 88.
Information Technology, "Generic Coding of Moving Pictures and
Associated Audio", Recommendation H.26x, ISO/IEC Draft
International Standard, May 12, 1993, pertinent section
7.2.5.
Primary Examiner: Chin; Tommy P.
Assistant Examiner: Tung; Brian S.
Attorney, Agent or Firm: Oblon, Spivak, McClelland, Maier
& Neustadt
Parent Case Text
CROSS-REFERENCE TO THE RELATED APPLICATION
This application is a continuation-in-part of U.S. patent
application Ser. No. 07/890,705 filed May 29, 1992, now U.S. Pat.
No. 5,317,397.
Claims
What is claimed is:
1. A video decoding apparatus comprising:
memory means for storing at least first and second reference
pictures;
decoding means for decoding a transmitted first motion vector and a
transmitted motion vector difference, and for further decoding a
second motion vector by adding a scaled motion vector and the
motion vector difference, said scaled motion vector being derived
by scaling the first motion vector to a motion vector for the
second reference picture;
prediction signal forming means for forming a prediction signal,
using a first reference signal designated by the first motion
vector for the first reference picture and a second reference
signal designated by the second motion vector for the second
reference picture read out of said memory means; and
decoding means for decoding a transmitted difference signal and for
obtaining an output signal by adding the difference signal and said
prediction signal.
2. The video decoding apparatus according to claim 1, wherein said
motion vector difference is decoded from a variable length
code.
3. The video decoding apparatus according to claim 1, wherein said
motion vector difference is limited within a range of values
obtained only when the directions of the first and second motion
vectors are close to each other.
4. The video decoding apparatus according to claim 3, wherein said
motion vector difference is decoded from a variable length
code.
5. A video decoding apparatus comprising:
memory means for storing at least first and second reference
pictures;
decoding means for decoding a transmitted first motion vector and a
transmitted motion vector difference, and for further decoding a
second motion vector corresponding to a motion vector obtained by
adding a scaled motion vector and the motion vector difference,
said scaled motion vector being derived by scaling the first motion
vector to a motion vector for the second reference picture;
prediction signal forming means for forming a prediction signal,
using a first reference signal designated by the first motion
vector for the first reference picture and a second reference
signal designated by the second motion vector for the second
reference picture read out of said memory means; and
decoding means for decoding a transmitted difference signal and for
obtaining an output signal by adding the difference signal and said
prediction signal.
6. The video decoding apparatus according to claim 5, wherein said
motion vector difference is decoded from a variable length
code.
7. The video decoding apparatus according to claim 5, wherein said
motion vector difference is limited within a range of values
obtained only when the directions of the first and second motion
vectors are close to each other.
8. The video decoding apparatus according to claim 7, wherein said
motion vector difference is decoded from a variable length
code.
9. A method of decoding a video signal comprising the steps of:
storing at least first and second reference pictures in a
memory;
decoding a transmitted first motion vector and a transmitted motion
vector difference, and for further decoding a second motion vector
by adding a scaled motion vector and the motion vector difference,
said scaled motion vector being derived by scaling the first motion
vector to a motion vector for the second reference picture;
forming a prediction signal, using a first reference signal
designated by the first motion vector for the first reference
picture and a second reference signal designated by the second
motion vector for the second reference picture read out of said
memory means; and
decoding a transmitted difference signal and obtaining an output
signal by adding the difference signal and the prediction
signal.
10. The method according to claim 9, wherein said motion vector
difference is decoded by a variable length code.
11. The method according to claim 9, wherein said motion vector
difference is limited within a range of values obtained only when
the directions of the first and second motion vectors are close to
each other.
12. The video decoding apparatus according to claim 11, wherein
said motion vector difference is decoded by a variable length
code.
13. A method of decoding a video signal comprising the steps
of:
storing at least first and second reference pictures in a
memory;
decoding a transmitted first motion vector and a transmitted motion
vector difference and for further decoding a second motion vector
corresponding to a motion vector obtained by adding a scaled motion
vector and the motion vector difference, said scaled motion vector
being derived by scaling the first motion vector to a motion vector
for the second reference picture;
forming a prediction signal, using a first reference signal
designated by the first motion vector for the first reference
picture and a second reference signal designated by the second
motion vector for the second reference picture read out of said
memory means; and
decoding a transmitted difference signal and obtaining an output
signal by adding the difference signal and the prediction
signal.
14. The method according to claim 13, wherein said motion vector
difference is decoded by a variable length code.
15. The method according to claim 13, wherein said motion vector
difference is limited within a range of values obtained only when
the directions of the first and second motion vectors are close to
each other.
16. The video decoding apparatus according to claim 15, wherein
said motion vector difference is decoded by a variable length code.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an apparatus for coding and
decoding a video signal in video communication, video transmission,
video storing, and broadcasting and, more particularly, to a video
coding and decoding apparatus using motion compensation predictive
coding.
2. Description of the Related Art
A TV telephone, a TV meeting system, an optical disk apparatus, a
VTR, a CATV, and the like require a technique for coding a video
signal. As such a video coding scheme, so-called motion
compensation prediction coding is known. In this scheme, a pixel
value of a picture to be coded (to be referred to as a to-be-coded
picture hereinafter) is predicted by using a pixel value of a coded
picture designated by a motion vector, and a corresponding
predictive error and the motion vector are coded.
Assume that such a motion compensation predictive coding scheme is
applied to an interlaced video (field picture) signal. In this
case, the scheme is not suitable for a motion precision higher than
an intra-field 1/2 line, e.g., a higher precision than an
intra-frame one line, because there is no corresponding pixel value
in a reference picture.
For this reason, a method of performing motion compensation by
interpolating a pixel value of a corresponding pixel which does not
exist on a reference picture using the pictures of the two adjacent
fields has been proposed (e.g., "Adaptive Line Interpolated
Inter-field Motion Compensation Method", Image Coding Symposium,
1990, (PCSJ90), 8-1). In this motion compensation method, a
to-be-coded picture is coded by using a reference picture and an
optimum motion vector. The reference video signal is formed by
interpolation using a signal, of coded video signals of past two
fields, which is located at a position designated by a motion
vector obtained by a motion vector searching circuit. More
specifically, three field memories are prepared, and a signal
obtained by performing intra-field interpolation using an output
from the first field memory is mixed with an output from the second
field memory at a mixing ratio of km:1-km. The value km varies
depending on the motion magnitude detected by a motion magnitude
detection circuit on the basis of outputs from the first and third
field memories.
According to this conventional technique, an interpolation value is
formed by using the video signals of two adjacent fields in
accordance with the motion magnitude so that an appropriate
reference video signal corresponding to a motion precision higher
than an intra-field 1/2 line (a motion precision higher than an
intra-frame 1 line) can be generated for a field picture, thereby
allowing high-precision motion compensation predictive coding.
In this scheme, however, the motion between two reference pictures
must be detected, as described above, and hence a motion magnitude
detection circuit is required. In addition, in order to perform
motion magnitude detection, the pictures of three adjacent fields
must be coded before the detection. If the pictures of three
adjacent fields are not coded before motion magnitude detection,
the detection cannot be performed.
In a conventional video coding apparatus using the above-described
motion compensation predictive coding scheme, when a search for a
motion vector for motion compensation is performed in a forward or
backward direction, a reference picture for searching for the
motion vector is limited to one coded picture when the
to-be-coded picture is a non-interlaced video. For this reason,
accurate motion compensation cannot be performed with respect to a
video which moves between adjacent pictures in units of 1/2
pixels.
Among the above-described video coding schemes, one
with a transmission rate of about 1 to 2 Mbps has been developed
as a standard, termed "MPEG1", for picture storage media
such as VCRs and optical disks. This scheme is based on motion
compensation inter-frame prediction and DCT (Discrete Cosine
Transform).
A scheme for coding a video having high quality equal to or higher
than quality for TV broadcasting at about 2 to 10 Mbps has been
studied for the same purpose as described above. The MPEG1 coding
scheme is designed for a non-interlaced video input signal.
However, since the standard TV signal is an interlaced video,
applying the MPEG1 coding scheme to it requires a
new means suitable for interlaced video. An
inter-field/inter-frame adaptive prediction scheme is known as a
coding method of interlaced video. In this scheme, a field having
the same scan phase as that of a coding (to-be-coded) field (an
odd-numbered field when an odd-numbered field is coded and vice
versa), and a field having a scan phase different from that of the
coding (to-be-coded) field and close in time thereto (e.g., an
even-numbered field when an odd-numbered field is coded and vice
versa) are switched as a prediction signal. In addition,
interpolation prediction has recently been studied, which forms
prediction signals by averaging signals extracted from previous
fields (e.g., F. Wang et al., "High-quality coding of the
even-numbered fields based on the odd-numbered fields of interlaced
video sequences", IEEE trans. CS).
When an interlaced video is subjected to predictive coding using
previous fields as in the MPEG1 coding scheme, the even- and
odd-numbered fields suitable for the interlaced video are applied to
the prediction. In this case, since the amount of motion vector data
is increased when motion vectors are sent for the respective
fields, means for decreasing the amount of motion vector data
without a decrease in efficiency is required. That is, it is
required to improve the prediction precision with respect to an
interlaced video and decrease the data amount of predictive error
coded outputs. In addition, it is required to minimize an increase
in motion vector data. However, no effective techniques capable of
satisfying such requirements have been proposed yet.
As described above, in the conventional technique, in order to
interpolate between the pixels on a reference picture using two
field pictures adjacent to the reference picture, motion magnitude
detection is required for the reference picture. Therefore, a
motion magnitude detection circuit is required, and the hardware
inevitably becomes complicated. In addition, if three adjacent
fields are not coded before motion magnitude detection, the
detection cannot be performed.
Furthermore, according to the conventional technique, since a
reference picture is limited to one coded picture in a search for a
motion vector, accurate motion compensation cannot be performed
with respect to a video which moves between pictures in units of
1/2 pixels. Further, if a prediction signal is formed referring to
plural frames, since a large amount of arithmetic operation is
required to search for a motion vector, the motion vector search
time is prolonged or the circuit size of the hardware is
increased.
Moreover, in the conventional technique, the prediction precision
with respect to an interlaced video cannot be effectively improved,
and the amount of motion vector data sent for the respective fields
is undesirably increased.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a video coding
apparatus which predicts one field from the other field within one
frame.
It is another object of the present invention to provide a video
coding apparatus which searches for a region adjacent to the
partial pictures which are to be coded from among a plurality of
coded frames, and obtains prediction signals by performing
spatial-temporal filtering of the pictures.
It is another object of the present invention to provide a video
coding apparatus which obtains a prediction signal by limiting the
searching range of motion vectors.
It is another object of the present invention to provide a video
coding apparatus which forms different prediction values in
accordance with the values of motion vectors.
It is another object of the present invention to provide a video
decoding apparatus for decoding the video signal coded by the above
coding apparatus.
According to the present invention, there is provided a video
coding apparatus comprising:
a memory for storing a coded video signal used for prediction as a
reference video signal;
a motion vector detecting circuit for detecting, using one
field picture signal read out from the memory, a motion vector
regarding the other field picture signal to be coded, a pair of
said one field picture signal and the other field picture signal
forming a frame video signal;
a prediction signal producing circuit for producing a prediction
signal based on a reference video signal designated by the motion
vector detected by the motion vector detecting circuit; and
a coding circuit for coding a difference between the prediction
signal and a video signal corresponding to the other field picture
signal to be coded.
According to the present invention, there is provided a video
coding apparatus comprising:
a memory for storing a coded video signal used as a reference video
signal;
a vector detecting circuit for detecting, from plural field picture
signals or plural frame picture signals which are read out from the
memory and the reference picture signals, an optimum motion vector
regarding a picture signal to be coded;
a prediction signal producing circuit for subjecting a
spatial-temporal filtering to a reference picture signal designated
by the optimum motion vector to produce a prediction signal;
and
a coding circuit for coding the picture signal to be coded on the
basis of the prediction signal.
According to the present invention, there is provided a video
coding apparatus comprising:
a memory for storing a coded video signal used as a reference video
signal;
a motion vector detecting circuit for detecting, from a picture
read out from the memory, a motion vector regarding the to-be-coded
picture signal;
a searching range limiting circuit for limiting a range of the
reference picture which said motion vector detecting means searches
for in a motion vector detection;
a prediction signal producing circuit for producing a prediction
signal based on the reference video signal designated by the motion
vector detected by the motion vector detecting circuit; and
a coding circuit for coding the video signal on the basis of the
motion vector and the prediction signal.
According to the invention there is provided a video coding
apparatus comprising:
a memory for storing a coded video signal used as a reference
picture signal;
a motion vector candidate generating circuit for generating plural
motion vector candidates for designating plural partial pictures of
a plurality of reference pictures read out from the memory;
a prediction signal candidate producing circuit for subjecting a
filtering processing corresponding to a type of each of the motion
vector candidates output from the motion vector candidate
generating circuit to each of the partial pictures to produce a
plurality of prediction signal candidates;
a motion vector detecting circuit for selecting, from among the
prediction signal candidates, the optimum prediction signal for
the to-be-coded picture, and outputting the optimum
prediction signal and an optimum motion vector corresponding
thereto; and
a coding circuit for coding the to-be-coded picture signal on the
basis of a difference between the to-be-coded picture signal and
the optimum prediction signal and the optimum motion vector.
According to the present invention, there is provided a video decoding
apparatus comprising:
memory means for storing at least first and second reference
pictures;
prediction signal forming means for forming a prediction signal to
produce an output signal, using a first reference signal designated
by a first motion vector for the first reference picture and a
second reference signal designated by a second motion vector for
the second reference picture read out of the memory means;
decoding means for decoding the transmitted first motion vector and
a transmitted motion vector difference, and for further decoding
the second motion vector by adding a scaled motion vector and the
motion vector difference, the scaled motion vector being derived by
scaling the first motion vector to a motion vector for the second
reference picture; and
decoding means for decoding a transmitted difference signal and for
obtaining the output signal by adding the difference signal and the
prediction signal.
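The derivation of the second motion vector described above can be
sketched as follows. This is a minimal illustration, not the circuit
of the embodiments: the function names, the use of temporal distances
(dist1, dist2) as the scaling basis, and the rounding rule are all
assumptions, since the text at this point only states that the first
motion vector is scaled to a motion vector for the second reference
picture and that the transmitted difference is then added.

```python
import math


def scale_motion_vector(mv1, dist1, dist2):
    """Scale mv1, which spans dist1 picture intervals, to span dist2.

    Assumption: linear scaling by the ratio of temporal distances,
    rounded half up to the nearest vector unit.
    """
    if dist1 == 0:
        raise ValueError("dist1 must be non-zero")
    return tuple(math.floor(v * dist2 / dist1 + 0.5) for v in mv1)


def decode_second_motion_vector(mv1, mv_diff, dist1, dist2):
    """Second vector = scaled first vector + transmitted difference."""
    scaled = scale_motion_vector(mv1, dist1, dist2)
    return tuple(s + d for s, d in zip(scaled, mv_diff))


# Hypothetical values: the first reference is two intervals away,
# the second reference is one interval away.
mv2 = decode_second_motion_vector(mv1=(4, -2), mv_diff=(1, 0), dist1=2, dist2=1)
print(mv2)
```

With these hypothetical inputs the scaled vector is (2, -1), so the
decoded second motion vector is (3, -1). Only the first vector and the
small difference need to be transmitted, which is the saving in motion
vector data that the decoder relies on.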
Additional objects and advantages of the invention will be set
forth in the description which follows, and in part will be obvious
from the description, or may be learned by practice of the
invention. The objects and advantages of the invention may be
realized and obtained by means of the instrumentalities and
combinations particularly pointed out in the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute
a part of the specification, illustrate presently preferred
embodiments of the invention, and together with the general
description given above and the detailed description of the
preferred embodiments given below, serve to explain the principles
of the invention.
FIG. 1 is a block diagram of a video coding apparatus in accordance
with the first embodiment of the present invention;
FIGS. 2 to 4 are views each showing the relationship between video
signals of the respective frames according to the first
embodiment;
FIG. 5 is a block diagram of a video coding apparatus in accordance
with the second embodiment of the present invention;
FIGS. 6A and 6B are views each showing the relationship between the
video signals of the respective frames according to the second
embodiment;
FIG. 7 is a block diagram of a video coding apparatus in accordance
with the third embodiment of the present invention;
FIG. 8 is a block diagram of a video coding apparatus in accordance
with the fourth embodiment of the present invention;
FIG. 9 is a block diagram of a video coding apparatus in accordance
with the fifth embodiment of the present invention;
FIG. 10 is a block diagram of a video decoder;
FIG. 11 is a view showing a motion vector searching operation
according to the present invention;
FIG. 12 is a view showing a motion vector searching operation
according to the present invention;
FIG. 13 is a view showing a motion vector searching operation
according to the present invention;
FIG. 14 is a view showing a motion vector searching operation
according to the present invention;
FIG. 15 is a view showing a motion vector searching operation
according to the present invention;
FIG. 16 is a view showing a motion vector searching operation
according to the present invention;
FIG. 17 is a block diagram showing a video coding apparatus
according to the sixth embodiment of the present invention;
FIG. 18 is a view showing an input picture format in the sixth
embodiment;
FIG. 19 is a view showing the hierarchical structure of coding
units in the sixth embodiment;
FIGS. 20A and 20B illustrate the arrangement of a group of pictures
and a coding sequence in the sixth embodiment, respectively;
FIGS. 21A and 21B are views for explaining prediction methods for
the respective pictures in the sixth embodiment;
FIG. 22 is a block diagram showing a video decoding apparatus
corresponding to the video coding apparatus in FIG. 17;
FIG. 23 is a block diagram showing an inter-field/inter-frame
adaptive prediction circuit in FIG. 17;
FIG. 24 is a view showing a telescopic search sequence in the sixth
embodiment;
FIGS. 25A and 25B are views for explaining inter-field/inter-frame
adaptive prediction processing in the sixth embodiment;
FIG. 26 is a view showing a manner of transmitting motion vectors
in the sixth embodiment;
FIGS. 27A and 27B are a detailed example in which the moving vector
of a color signal is obtained from a motion vector in the sixth
embodiment;
FIG. 28 is a flow chart showing part of the process of coding and
rate control in the sixth embodiment;
FIG. 29 is a flow chart showing the remaining part of the process
of coding and rate control in the sixth embodiment;
FIG. 30 is a view showing allocation of amounts of bits to N
pictures in the sixth embodiment;
FIG. 31 is a view for explaining a method of determining the ratio
between the amounts of bits allocated to the P1 picture and the B
picture in the sixth embodiment;
FIG. 32 is a view showing a virtual buffer used for intra-picture
rate control in the sixth embodiment;
FIG. 33 is a block diagram showing the overall arrangement of a
picture decoding system according to the present invention;
FIG. 34 is a block diagram showing a variable length code decoder
according to an embodiment of the present invention;
FIG. 35 is a block diagram showing an arrangement of an input data
temporary storage circuit in FIG. 34;
FIG. 36 is a block diagram showing an arrangement of a code length
detection/decoded value transform circuit in FIG. 34;
FIG. 37 is a block diagram showing another arrangement of a code
length detection/decoded value transform circuit;
FIG. 38 is a block diagram showing a video coding apparatus
according to another embodiment of the present invention;
FIG. 39 is a view showing the relationship between to-be-coded
pictures and reference pictures to explain a video coding method
according to the present invention;
FIG. 40 is a view showing the relationship between to-be-coded
pictures and reference pictures to explain another video coding
method according to the present invention;
FIG. 41 is a block diagram of a DMUX/VLD shown in FIG. 22; and
FIG. 42 is a block diagram of an adaptive prediction circuit shown
in FIG. 22.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
According to an embodiment of the present invention shown in FIG.
1, a video coding apparatus comprises a coding circuit 14 for
coding a video signal, two field memories 15 and 16 for storing the
coded video signals of past two fields, a motion vector searching
circuit 17, an interpolation circuit 19, and a local decoding
circuit 25.
A to-be-coded video signal 11 is coded by the coding circuit 14 by
using a reference video signal 12 output from the interpolation
circuit 19 and an optimum motion vector 13 output from the motion
vector searching circuit 17. The reference video signal 12 is
formed by the interpolation circuit 19 using a video signal, of the
coded video signals of past two fields stored in the field memories
15 and 16, which is located at a position designated by a motion
vector candidate 18 output from the motion vector searching circuit
17.
The motion vector searching circuit 17 is constituted by a
correlation arithmetic operation circuit 20, a motion vector
candidate generating circuit 21, an optimum vector decision circuit
22, and a switch circuit 23. During a motion vector searching
operation, motion vector candidates 24 sequentially generated by
the motion vector candidate generating circuit 21 are input to the
field memories 15 and 16 through the switch circuit 23, and a
correlation arithmetic operation between the reference video signal
12 generated by the interpolation circuit 19 and the to-be-coded
video signal 11 is performed by the correlation arithmetic
operation circuit 20 on the basis of these motion vector
candidates. The optimum vector decision circuit 22 stores a motion
vector at which the correlation between the to-be-coded video
signal 11 and the reference video signal 12 is maximized, and
outputs the optimum motion vector 13 to the coding circuit 14 and
the switch circuit 23 upon completion of the motion vector
searching operation. The to-be-coded video signal 11 is coded by
the coding circuit 14 in accordance with this optimum motion vector
13 and the optimum reference video signal 12 from the interpolation
circuit 19. As a result, coded data 26 is output from the coding
circuit 14.
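The searching operation described above (generate candidates, evaluate
the correlation for each, keep the best) can be sketched as follows.
This is a simplified illustration under stated assumptions: the
correlation measure is taken to be the sum of absolute differences
(lower SAD standing in for higher correlation), the candidates are
integer displacements inside a square window, and the reference is
read directly from a single stored picture rather than formed by the
interpolation circuit 19. All function and parameter names are
hypothetical.

```python
def sad(block, ref, top, left):
    """Sum of absolute differences between block and the ref region at (top, left)."""
    return sum(
        abs(block[r][c] - ref[top + r][left + c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )


def search_motion_vector(block, ref, top, left, search_range):
    """Try every candidate (dy, dx) in the window and keep the one with minimum SAD."""
    h, w = len(block), len(block[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that point outside the reference picture.
            if y < 0 or x < 0 or y + h > len(ref) or x + w > len(ref[0]):
                continue
            cost = sad(block, ref, y, x)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


# Hypothetical 6x6 reference picture with distinct pixel values.
ref = [[10 * r + c for c in range(6)] for r in range(6)]
# The 2x2 block to be coded sits at (1, 1) but matches ref at (2, 3).
block = [row[3:5] for row in ref[2:4]]
print(search_motion_vector(block, ref, top=1, left=1, search_range=2))
```

In this toy case the search returns the displacement (1, 2) with a
cost of 0. The role of the optimum vector decision circuit 22
corresponds to the running best_mv/best_cost pair.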
The local decoding circuit 25 forms a local decoded video signal 27
on the basis of the optimum reference video signal 12 output from
the interpolation circuit 19 and the coded data 26 output from the
coding circuit 14. The local decoded video signal 27 is input to
either the field memory 15 or the field memory 16 through a switch
circuit 28. The output from the field memory 15 or 16 is input to
the interpolation circuit 19 through a switch circuit 29. In this
case, the switch circuits 28 and 29 are switched such that two
video signals for forming a predetermined reference picture for a
to-be-coded video are input to the interpolation circuit 19.
The interpolation circuit 19 comprises an intra-field interpolation
circuit 30, multipliers 31 and 32, and an adder 33. The
interpolation circuit 19 forms the reference video signal 12 by
mixing a signal, formed by the intra-field interpolation circuit 30
using an output signal from the field memory 15, with an output
signal from the field memory 16 at a mixing ratio of k:1-k.
The motion vector candidate 18 output from the motion vector
searching circuit 17 is also input to the interpolation circuit 19
to control a parameter k for determining the mixture ratio between
output signals from the field memories 15 and 16. More
specifically, if the vertical component of the motion vector
candidate 18 corresponds to intra-field n+1/2 lines (n is an
integer), control is performed to set k=1 so that a corresponding
pixel value stored in the field memory 15 (in this case, it is
assumed that a video close to a to-be-coded video is stored in the
field memory 15) is directly output as the reference video signal
12.
If the vertical component of the motion vector candidate 18
corresponds to intra-field n lines, an interpolation value
(Δ) for the video signal of an adjacent field is formed by
using output signals from the field memories 15 and 16. More
specifically, as shown in FIG. 2, an interpolation value (Δ)
49 serving as a reference video signal for a pixel value 45 in a
to-be-coded video signal 44 is the sum of a value, obtained by
multiplying the average value of pixel values 46 and 47 formed by
intra-field interpolation by k, and a value obtained by
multiplying a pixel value 48 by (1-k). If the absolute value of the
vertical component of the motion vector candidate 18 is larger than
a certain threshold value, it is considered that the motion of the
reference picture is also large. In this case, since it is proper
that the interpolation value (Δ) is interpolated by mainly
using a signal adjacent to the interpolation value (Δ) of the
same field stored in the field memory 15, the parameter k is
increased. In contrast to this, if the absolute value of the
vertical component of the motion vector candidate 18 is smaller
than the threshold value, it is considered that the motion of the
reference picture is also small. In this case, since it is proper
that the interpolation value (Δ) is interpolated by mainly
using a signal adjacent to the interpolation value (Δ) in the
field memory 16, the parameter k is decreased.
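The mixing rule described above can be sketched in Python as follows. This is only an illustrative sketch and not the disclosed circuit: the function names `mix_reference` and `choose_k`, the two numeric values of k, and the handling of a vertical component exactly equal to the threshold are assumptions made for the example.

```python
def mix_reference(p46, p47, p48, k):
    """Return k * (average of p46 and p47) + (1 - k) * p48."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    intra_field = (p46 + p47) / 2.0  # intra-field interpolation (field memory 15)
    return k * intra_field + (1.0 - k) * p48  # p48 comes from field memory 16


def choose_k(mv_vertical, threshold, k_large=0.75, k_small=0.25):
    """Pick a larger k for large vertical motion, a smaller k otherwise.

    k_large and k_small are hypothetical values; the text only states that
    k is increased or decreased. Equality with the threshold is treated as
    small motion here, which is also an assumption.
    """
    return k_large if abs(mv_vertical) > threshold else k_small
```

With k=1 the result is the intra-field average alone, and with k=0 it is the pixel value 48 alone, which matches the two limiting cases discussed in the text.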
Note that if the to-be-coded picture 44 is adjacent to a reference
picture 41, as shown in FIG. 2, the effective use of a signal from
the field memory 16, obtained by decreasing the parameter k to a
value close to "0", as an interpolation value for the reference
picture 41 is almost limited to the case wherein the absolute value
of the vertical component of the motion vector candidate 18 is "0".
In contrast to this, when two adjacent fields (fields 1, 2, 7, and
8 in FIG. 3) are to be coded in advance in the coding sequence
shown in FIG. 3, i.e., at intervals of a plurality of fields (six
fields in FIG. 3), and the remaining four fields (fields 3 to 6 in
FIG. 3) are to be subjected to prediction coding for motion
compensation by using the two adjacent coded fields, a to-be-coded
picture 53 and a reference picture 51 may be relatively spaced
apart from each other, as shown in FIG. 4. In such a case, a signal
from the field memory 16, obtained by decreasing the parameter k to
a value close to "0", can be effectively used as an interpolation
value for the reference picture 51 even if the absolute value of
the vertical component of the motion vector is not "0".
The second embodiment of the present invention will be described
below with reference to FIG. 5. Since this embodiment is the same
as the one shown in FIG. 1 except for the arrangement of an
interpolation circuit 19, a detailed description thereof will be
omitted.
The interpolation circuit 19 is constituted by two interpolators 34
and 35 and a switch circuit 36 for selecting one of outputs from
the interpolators 34 and 35. Each of the interpolators 34 and 35
comprises an intra-field interpolation circuit 30, multipliers 31
and 32, and an adder 33, similar to the interpolation circuit 19
shown in FIG. 1.
The relationship between a plurality of video signals in the
embodiment shown in FIG. 5 will be described below with reference
to FIGS. 6A and 6B.
The first interpolator 34 generates an interpolation value which is
effective when the motion magnitude of a reference picture 61 is
large. That is, as shown in FIG. 6A, when the vertical component of
a motion vector candidate 18 corresponds to intra-field n+1/2 lines
(n is an integer), a corresponding pixel value ○ stored
in a field memory 15 (in this case, it is assumed that a picture
close to a to-be-coded picture 63 is stored in the field memory 15)
is directly output as a reference video signal (control is
performed to set k=1).
When the vertical component of the motion vector candidate 18
corresponds to intra-field n lines (n is an integer), the first
interpolator 34 forms an interpolation value Δ on the basis
of the average value of two pixel values ○ of the video
signal 61 in the field memory 15 which are adjacent to a pixel
Δ to be interpolated.
The second interpolator 35 generates an interpolation value which
is effective when the motion magnitude of the reference picture 61
is small. That is, as shown in FIG. 6B, when the vertical component
of the motion vector candidate 18 corresponds to intra-field n+1/2
lines (n is an integer), a corresponding pixel value ○
stored in the field memory 15 (in this case, it is assumed that a
picture close to the to-be-coded picture 63 is stored in the field
memory 15) is directly output as a reference video signal (control
is performed to set k=1). If, for example, n=0, a reference video
signal corresponding to a to-be-coded video signal 64 corresponds
to a pixel value 65.
In addition, when the vertical component of the motion vector
candidate 18 corresponds to intra-field n lines, the second
interpolator 35 sets an adjacent signal ○ in a field
memory 16 as a pixel value ◎ to be interpolated. If,
for example, n=0, an interpolation value 68 of a reference video
signal for the to-be-coded video signal 64 corresponds to a pixel
value 66.
If the vertical component of the motion vector candidate 18
corresponds to intra-field n/2+1/4 lines, an interpolation value
Δ is formed on the basis of the average value of pixel values
○ from the field memories 15 and 16 which are adjacent
to a pixel Δ to be interpolated. For example, if n=0, an
interpolation value 67 of a reference video signal for the
to-be-coded video signal 64 corresponds to the average value of the
pixel values 65 and 66.
In this manner, the interpolator 35 can generate reference video
signals in units of intra-field 1/4 lines to realize effective
motion compensation for a high-resolution picture having a small
motion magnitude.
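The three cases handled by the second interpolator 35 can be sketched in Python as follows. The sketch assumes a single pixel column in which `field15[m]` lies at m + 1/2 intra-field lines and `field16[m]` lies at m lines; this geometry, the function name `interpolator_35`, and the integer argument `v_quarter` (vertical position in units of 1/4 line) are assumptions chosen to reproduce the n=0 examples in the text, not details taken from FIG. 6B.

```python
def interpolator_35(field15, field16, v_quarter):
    """Reference value at a vertical position given in 1/4 intra-field lines.

    Assumed geometry for one pixel column: field15[m] lies at m + 1/2
    lines and field16[m] lies at m lines.
    """
    if v_quarter < 0:
        raise ValueError("this sketch only handles non-negative positions")
    m, r = divmod(v_quarter, 4)
    if r == 2:   # n + 1/2 lines: pixel of field memory 15 is output directly
        return float(field15[m])
    if r == 0:   # n lines: adjacent pixel of field memory 16 is used
        return float(field16[m])
    if r == 1:   # m + 1/4 lines: average of field16[m] and field15[m]
        return (field16[m] + field15[m]) / 2.0
    # m + 3/4 lines: average of field15[m] and field16[m + 1]
    return (field15[m] + field16[m + 1]) / 2.0
```

For example, with `v_quarter=1` (1/4 line, i.e., n=0 in the n/2+1/4 case) the result is the average of the two neighbouring pixels, corresponding to the average of the pixel values 65 and 66 in the text.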
In this case, the switch circuit 36 selects an output from the
interpolator 34 when the absolute value of the vertical component
of the motion vector candidate 18 output from a motion vector
searching circuit 17 is larger than a certain threshold value. When
this value is smaller than the threshold value, the switch circuit
36 selects an output from the interpolator 35. With this operation,
proper reference video signals are output. According to another
switching method, when the absolute value of the vertical component
of the motion vector candidate 18 output from the motion vector
searching circuit 17 is large, an output from the interpolator 34
is selected, whereas when the value is small, outputs from both the
interpolators 34 and 35 are used as reference video signals, and a
correlation arithmetic operation between the two signals is
performed to select one of the outputs. This method is also
effective.
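The two switching methods of the switch circuit 36 can be sketched in Python as follows. This is a hedged illustration: the function names are hypothetical, and the "correlation arithmetic operation" is modelled here as a sum of absolute differences against the to-be-coded block, which is an assumption because the text does not define the operation.

```python
def sad(a, b):
    """Sum of absolute differences between two equally long pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def switch_36(mv_vertical, threshold, out34, out35):
    """First method: select by comparing |vertical component| with a threshold."""
    return out34 if abs(mv_vertical) > threshold else out35


def switch_36_by_correlation(mv_vertical, threshold, out34, out35, target):
    """Second method: for small motion, keep the output that matches better.

    'target' is the to-be-coded block; using it as the reference for the
    comparison is an assumption of this sketch.
    """
    if abs(mv_vertical) > threshold:
        return out34
    return out34 if sad(out34, target) < sad(out35, target) else out35
```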
A video coding apparatus according to the third embodiment of the
present invention will be described below.
Referring to FIG. 7, picture data input to an input terminal 001 is
temporarily stored in an input buffer memory 100. The data is then
read out, as partial picture data, from the input buffer memory 100
in units of partial regions, each constituted by a plurality of
pixels, in the order of to-be-coded pictures. Partial picture data
read out from the input buffer memory 100 is input to a motion
vector detection circuit 200. The motion vector detection circuit
200 obtains a partial picture, from pictures coded and reproduced
in the past, which can efficiently code the input data, and outputs
a motion vector (address data) indicating the region data and
position of the partial picture.
The partial picture data output from the input buffer memory 100 is
also input to a local coding circuit 300 together with the partial
picture data and the motion vector output from the motion vector
detection circuit 200. The local coding circuit 300 codes either
the partial picture data output from the input buffer memory 100 or
difference data relative to the partial picture data designated by
the motion vector. In this case, the coded data corresponding to
the difference relative to the region designated by the motion
vector includes data obtained by variable length coding of the
motion vector.
The data coded by the local coding circuit 300 is input to a local
decoding circuit 400 to be decoded together with the partial
picture data output from the motion vector detection circuit 200.
With this operation, a reproduced picture is obtained. In addition,
if the data is coded by using a motion vector, the decoded data is
added to the partial picture data output from the motion vector
detection circuit 200 to obtain a reproduced picture. This
reproduced picture data is input to the motion vector detection
circuit 200 and is temporarily stored to code the next input
picture data.
Operations of the motion vector detection circuit 200, the local
coding circuit 300, and the local decoding circuit 400 will be
described in detail below.
In the motion vector detection circuit 200, data input from the
input buffer memory 100 are sequentially written in picture
memories (211 to 214) holding picture data no longer needed to search
for motion vectors, under the control of a write control
circuit 222. In this manner, coded picture data stored in the
picture memories (211 to 214) are sequentially read out in units of
regions, starting from a picture close in time to the coded
picture, by a read control circuit 221 and a data switching circuit
231. The data are then input to a difference circuit 241. The
difference circuit 241 calculates the differences between the coded
picture data and input data in units of regions.
An estimation circuit 242 sequentially compares the sums of
differences in units of regions to control the searching direction
of the read control circuit 221 in the picture memories. Every time
a partial region of a coded picture which is less different from
the input partial picture than the previously detected partial
picture is detected, the estimation circuit 242 causes a vector
register 243 to store address data indicating this region of the
partial picture, thus obtaining a partial region of the coded
picture which is closest to the input partial picture. In this
manner, the address data indicating the partial region of the coded
picture which is least different from the input partial picture
stored in the vector register 243 is input to a read control
circuit 223 and a switch circuit 232. As a result, the reproduced
picture of coded data corresponding to the partial region of the
coded picture is read out from one of reproduced picture memories
(215 to 218), and is input to the local coding circuit 300 together
with its address data.
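The search performed by the difference circuit 241, the estimation circuit 242, and the vector register 243 can be sketched in Python as an exhaustive block-matching loop. This is a minimal sketch under stated assumptions: the difference measure is taken to be a sum of absolute differences, the pictures are small lists of rows, every position is visited rather than steering the search direction, and the names `sad_at` and `search_best_region` are hypothetical.

```python
def sad_at(block, picture, top, left):
    """Sum of absolute differences between block and a region of picture."""
    return sum(
        abs(block[r][c] - picture[top + r][left + c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )


def search_best_region(block, coded_pictures):
    """Return (picture_index, top, left, cost) of the least-different region.

    coded_pictures is assumed to be ordered starting from the picture
    closest in time. 'best' plays the role of the vector register 243: it
    is overwritten whenever a less different region is found.
    """
    bh, bw = len(block), len(block[0])
    best = None
    for idx, pic in enumerate(coded_pictures):
        for top in range(len(pic) - bh + 1):
            for left in range(len(pic[0]) - bw + 1):
                cost = sad_at(block, pic, top, left)
                if best is None or cost < best[3]:
                    best = (idx, top, left, cost)
    return best
```

The returned picture index and position correspond to the address data that the text describes as being passed to the read control circuit 223 and the switch circuit 232.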
According to this embodiment, in the local coding circuit 300, DCT
(discrete cosine transformation) as one of orthogonal
transformation schemes, quantization, and variable length coding
are used as a coding method for a motion compensation error. In the
local coding circuit 300, partial picture data output from the
input buffer memory 100 is input to a difference circuit 311 so
that the difference between the partial picture data and partial
picture data obtained by reproducing coded data output from the
motion vector detection circuit 200 is calculated. A switch circuit
312 sequentially switches and outputs difference picture data input
from the difference circuit 311 and partial picture data input from
the input buffer memory 100 in accordance with control signals
input to a terminal 002.
A DCT circuit 320 sequentially frequency-converts partial data and
difference picture data sequentially output from the switch circuit
312, and outputs the resultant data. A quantizer 330 quantizes the
frequency-converted data output from the DCT circuit 320 with a
preset quantization step size, and outputs the resultant data. An
entropy coder 340 codes the quantized data together with its
quantization step size data and identification data indicating
whether the data is partial picture data or difference data. In
addition, in coding of difference picture data, the entropy coder
340 performs variable length coding of the data together with a
motion vector corresponding to the partial picture data output from
the vector register 243 by using Huffman codes or the like in
accordance with the respective occurrence probabilities. If this
identification code and the motion vector code are combined to form
one Huffman code, efficient coding can be realized. Furthermore, in
this coding, with regard to picture data obtained by reproducing
coded data in a region designated by a predetermined rule, or input
picture data which differs from fixed data by a predetermined value
or less, if the number of such successive partial pictures is coded
by variable length coding, the coding efficiency is further
improved.
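The frequency conversion and quantization steps can be sketched in Python as follows. For brevity the sketch uses a one-dimensional orthonormal DCT-II on a single row of samples rather than a two-dimensional block transform, and a uniform quantizer that rounds to the nearest level; both choices, and the names `dct_1d` and `quantize`, are assumptions for illustration and are not taken from the description of the DCT circuit 320 or the quantizer 330.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * acc)
    return out


def quantize(coeffs, step):
    """Uniform quantization: divide by the step size and round half up."""
    return [int(math.floor(c / step + 0.5)) for c in coeffs]
```

A flat row of samples produces a single non-zero (DC) level after quantization, which is the kind of sparse output that the entropy coder 340 can then code compactly.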
An amount-of-bits estimation circuit 351 compares the amount of
bits of coded data, obtained by coding the difference between a
partial picture to be coded and a picture in a region designated by
a motion vector, with that of coded data obtained by directly
coding input data by DCT, and outputs coded data with higher coding
efficiency to an output buffer 360 and the local decoding circuit
400.
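The comparison made by the amount-of-bits estimation circuit 351 can be sketched in Python as follows. The sketch assumes that "higher coding efficiency" simply means fewer bytes and that a tie favours the directly coded data; the function name and the mode labels are hypothetical.

```python
def select_coded_data(direct_coded: bytes, difference_coded: bytes):
    """Return (mode, data) for whichever coded form is shorter.

    Ties are resolved in favour of the directly coded data, which is an
    assumption of this sketch.
    """
    if len(difference_coded) < len(direct_coded):
        return "difference", difference_coded
    return "direct", direct_coded
```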
The output buffer 360 temporarily stores this coded data for
adjustment of the output data rate, and controls a quantization
step size used by the quantizer 330 and a coding table used by the
entropy coder 340.
In the local decoding circuit 400, the partial picture data output
from the motion vector detection circuit 200 is temporarily stored
in a data memory 441, and the coded data output from the
amount-of-bits estimation circuit 351 is input to a variable length
decoder 410, so that the motion vector including the identification
code and the quantized data before coding are decoded. This decoded
quantized data is input to an inverse quantizer 420 to be converted
(inversely quantized) into a typical value having a dynamic range
before quantization. The converted value is input to an adder 450.
The data inversely quantized by the inverse quantizer 420 is input
to an inverse DCT circuit 430 so that the partial picture or the
difference picture data is reproduced. A gate circuit 442 permits
the passage of the partial picture data output from the data memory
441 if it is determined on the basis of the identification code
decoded by the variable length decoder 410 that the reproduced data
output from the inverse DCT circuit 430 is difference picture data.
Otherwise, the gate circuit 442 sets the output data to be "0". The
gate circuit 442 outputs the resultant data to the adder 450.
If the picture data subjected to inverse DCT in this manner
corresponds to a coded difference picture, the data is added to the
partial picture data output from the motion vector detection
circuit 200. Otherwise, a reproduced picture is obtained by the
adder 450 without using the partial picture data output from the
motion vector detection circuit 200. This reproduced picture data
is input to the motion vector detection circuit 200 to be
temporarily stored so as to be used for coding the next input
picture data.
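The roles of the inverse quantizer 420, the gate circuit 442, and the adder 450 can be sketched in Python as follows. The inverse DCT is deliberately omitted, so `decoded` stands for data that has already been inversely transformed; the function names and the flat list representation of a partial picture are assumptions for illustration.

```python
def inverse_quantize(levels, step):
    """Map quantized levels back to representative values."""
    return [q * step for q in levels]


def reconstruct(decoded, prediction, is_difference):
    """Add the gated prediction to the decoded data.

    The gate passes the prediction only for difference picture data and
    outputs zeros otherwise, mirroring the gate circuit 442.
    """
    gated = prediction if is_difference else [0] * len(decoded)
    return [d + g for d, g in zip(decoded, gated)]
```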
FIG. 8 shows the fourth embodiment of the present invention. This
embodiment is different from the previous embodiment in that a data
memory 460 for storing quantized data is used in place of the
variable length decoder 410 included in the decoding circuit 400
shown in FIG. 7. In this case, data obtained by performing DCT and
quantization of difference picture data based on the difference
between the partial picture data output from an input buffer memory
100 and partial picture designated by a corresponding motion vector
are temporarily stored in the data memory 460. Thereafter, picture
data corresponding to coded data selected by an amount-of-bits
estimation circuit 352 and output from an output buffer 360 is
output to an inverse quantizer 420. If this picture data is
difference picture data, the data is added to partial picture data
output from a data memory 441 by an adder 450 to obtain a
reproduced picture, as in the embodiment shown in FIG. 7.
In this embodiment, since no arithmetic operation is required for
decoding, the processing time is shorter than that in the
embodiment shown in FIG. 7.
FIG. 9 shows the fifth embodiment of the present invention, in
which part of a decoding circuit 400 is used as a motion vector
detection circuit. Similar to the above-described embodiments, in
this embodiment, a motion vector detection circuit 200 reads out
partial picture data of decoded and reproduced picture data through
a switch circuit 232 under the control of a read control circuit
224, and outputs a motion vector (address data) indicating the
position of the corresponding region.
Referring to FIG. 9, similar to the embodiments shown in FIGS. 7
and 8, picture data input to an input terminal 001 are temporarily
stored in an input buffer memory 100. The data are then read out
from the input buffer 100 in units of regions, each consisting of a
plurality of pixels, in the order of to-be-coded pictures, and are
input to a local coding circuit 300. In response to a control
signal input to an input terminal 002, DCT, quantization, and
coding of partial picture data output from the input buffer memory
100 are performed without calculating the difference between the
partial picture data and output data from the motion vector
detection circuit 200, which is input through a gate circuit 313,
in the same manner as in the embodiments shown in FIGS. 7 and 8.
The quantized data is stored in a data memory 460, and the coded
data and the amount-of-bits data are stored in an amount-of-bits
estimation circuit 353. Subsequently, a different control signal is
input to the input terminal 002 to calculate the difference between
the partial picture output from the input buffer memory 100 and the
partial picture data output from the motion vector detection
circuit 200, and DCT, quantization, and variable length coding of
the difference data are performed in the same manner as in the
embodiments shown in FIGS. 7 and 8.
The amount-of-bits estimation circuit 353 estimates coded data from
an entropy coder 340 and controls the read control circuit 224 in
accordance with the estimation result. With this operation, every
time the amount-of-bits estimation circuit 353 detects a region in
which the amount of bits can be further reduced, the circuit 353
causes the read control circuit 224 to store corresponding coded
data. In addition, the circuit 353 causes a data memory 441 to
store partial picture data output from the motion vector detection
circuit 200, and causes the data memory 460 to store data obtained
by performing DCT and quantization of a corresponding difference
picture. In this manner, coded data having the minimum amount of
bits is finally output to an output buffer 360. Quantized data
corresponding to this coded data is reproduced by the local
decoding circuit 400 in the same manner as in the embodiments shown
in FIGS. 7 and 8. The reproduced data is input to the motion vector
detection circuit 200 to be temporarily stored so as to be used for
coding the next input picture data.
In the embodiments shown in FIGS. 7 and 8, a more accurate motion
vector is obtained as an optimum motion vector. In contrast to
this, in the embodiment shown in FIG. 9, a motion vector realizing
the maximum coding efficiency is obtained.
FIG. 10 shows a reproduction circuit for reproducing coded data in
the embodiments shown in FIGS. 7 to 9. In this circuit, a variable
length decoder 510 decodes coded data input from an input terminal
004 to reproduce quantization step size data, motion vector data
(including an identification code indicating whether the reproduced
data is difference picture data relative to a partial picture
designated by the motion vector), and quantized data. The quantized
data is reproduced through an inverse quantizer 520 and an inverse
DCT circuit 530. If this reproduced data is difference picture data
relative to the partial picture designated by the motion vector, a
read control circuit 621 reads out the corresponding partial
picture data from one of reproduced picture memories 611 to 614,
outputs the data from a motion vector detection circuit 600, and
inputs it to a local decoding circuit 500. This partial picture
data is added to the difference picture data by an adder 550
through a gate circuit 540 to form reproduced picture data. The
reproduced picture data is input to the motion vector detection
circuit 600 to be temporarily stored so as to be used for
reproduction of the next input coded data. In addition, the
reproduced picture data is input to an output buffer 560 to be
output in the original picture order.
According to the above embodiment, it is possible to subject an
interlaced video to predictive coding based on a motion compensation
method with good precision, without motion magnitude detection
with respect to the reference picture. Accordingly, the video
coding apparatus can be simply constructed, since no motion
magnitude detection circuit is required. Further, the motion
compensation prediction coding can be applied to a coding sequence
in which motion magnitude detection cannot be performed, for
example, a sequence arranged to support special reproduction by a
VTR.
Motion vector searching operations to be performed in the
estimation circuits 242 and the read control circuits 221 in FIGS.
7 and 8 and in the amount-of-bits estimation circuit 353 and the
read control circuit 224 in FIG. 9 will be described below.
FIGS. 11 to 16 respectively show examples of motion vector
searching processing in the present invention. Referring to FIGS.
11 to 16, each of reference symbols s1, s2, . . . , s6 denotes one
frame or one field picture, and each of reference numerals 101,
102, . . . , 120 denotes one pixel or a plurality of pixels in the
horizontal or vertical direction.
In the motion vector searching operation shown in FIG. 11, in
searching for a partial picture to be coded (e.g., s4-104), all the
pixels (101 to 120) of the coded picture s3 close in time to the
to-be-coded picture are searched, and the search range (area) in
the coded picture s2 remoter in time from the to-be-coded picture
than the coded picture s3 is limited to a certain range (e.g., 105
to 115) on the basis of an optimum motion vector obtained upon this
searching operation, i.e., a motion vector realizing the highest
coding efficiency or the most accurate motion vector (e.g.,
s3-107).
In a searching operation in the coded picture s2 which is remoter
in time from the to-be-coded picture, an optimum motion vector
(e.g., s2-109 or s3-107) is obtained in addition to the previously
obtained optimum motion vector (e.g., s3-107).
Subsequently, the difference between a picture in the region
designated by the obtained optimum motion vector (e.g., s2-107) and
the partial picture to be coded (e.g., s4-104), i.e., a motion
compensation error, is obtained, and the optimum motion vector and
the motion compensation error are coded.
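One way to derive the limited search range of FIG. 11 can be sketched in Python as follows. The text does not state how the range is computed, so the rule used here is an assumption: the displacement found in the nearer picture is extrapolated linearly to the remoter picture and a window of fixed half-width is placed around that position. The name `limited_range` and its parameters are hypothetical.

```python
def limited_range(target_pos, best_pos_near, half_width, lo, hi,
                  distance_ratio=2):
    """Return (first, last) region numbers to search in the remoter picture.

    target_pos     region number of the partial picture to be coded
    best_pos_near  best-matching region number in the nearer coded picture
    distance_ratio temporal distance of the remoter picture relative to
                   the nearer one (2 when s2 is twice as far as s3)
    lo, hi         first and last valid region numbers
    """
    displacement = best_pos_near - target_pos
    center = target_pos + distance_ratio * displacement
    return max(lo, center - half_width), min(hi, center + half_width)
```

With the example numbers from the text (partial picture s4-104, optimum vector pointing at s3-107, regions 101 to 120) and a half-width of 5, this rule yields the range 105 to 115.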
The motion vector searching operations shown in FIGS. 12 to 14 are
suitable for a case wherein a picture (e.g., s3) close in time to a
to-be-coded picture (e.g., s4) is coded by using a motion vector
between this picture (s3) and a picture (e.g., s1) which is remoter
in time from the to-be-coded picture. In this case, since a motion
vector between a partial picture (s4-104) of the to-be-coded
picture and the picture (s3) close in time thereto can be
predicted, the search range in the picture (s3) close in time to
the to-be-coded picture is limited to, e.g., the range between
partial regions 103 and 112. In addition, if a motion vector
realizing high coding efficiency is obtained between the
to-be-coded picture (s4) and the picture (s3) close in time
thereto, the range of regions as vector candidates between the
partial region (s4-104) of the to-be-coded picture and a picture
(e.g., s2) which is remote in time therefrom can be limited to a
narrow range, e.g., the range between partial regions 107 to 110,
as shown in FIGS. 13 and 14, by using a motion vector between a
partial region (e.g., s1-110) of a picture remoter in time from the
to-be-coded picture than a partial region (e.g., s3-107) closer in
time to the to-be-coded picture, which is designated by the motion
vector, and the partial region (s4-104) of the to-be-coded
picture.
The motion vector searching operation shown in FIG. 15 is suitable
for a case wherein since the motion between pictures is large, no
motion vector candidate exists even in the picture s3 close in time
to the to-be-coded picture s4, or a case wherein since a motion
vector (e.g., s3-116 to s4-104) obtained as a candidate in the
picture s3 close in time to the to-be-coded picture is very large,
it can be predicted that no motion vector realizing high coding
efficiency exists in the picture s2 remote in time from the
to-be-coded picture.
In such a case, a search for a motion vector is not performed in
the picture s2 which is remote in time from the to-be-coded
picture, or the search range can be further limited to, e.g., the
range between partial regions 116 and 120. With this operation, a
motion vector realizing high coding efficiency can be obtained with
a small arithmetic operation amount required for a motion vector
searching operation.
The motion vector searching operation shown in FIG. 16 is suitable
for a case wherein a motion vector (e.g., s3-107 to s4-104)
obtained between the partial picture s4-104 of the to-be-coded
picture and the picture s3 close in time thereto is relatively
large, or a case wherein no proper motion vector can be obtained.
In such a case, a search for a motion vector in the picture s2
remote in time from the to-be-coded picture is not performed. A
motion vector searching operation in the picture s2 remote in time
from the to-be-coded picture, however, is performed by limiting the
search range in the picture s2 to the range between partial regions
near the position of the partial picture of the to-be-coded
picture, e.g., the range between the partial regions 103 and 105,
only when the motion vector between the to-be-coded picture and the
picture s3 close in time thereto is small.
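The conditional search of FIG. 16 can be sketched in Python as follows. The threshold deciding whether a vector counts as small, the half-width of the window, and the name `far_picture_search_range` are assumptions for illustration; `None` is used to signal that no search is performed in the remoter picture.

```python
def far_picture_search_range(target_pos, near_vector, small_threshold,
                             half_width, lo, hi):
    """Return the region range to search in the remoter picture, or None.

    near_vector is the displacement found in the nearer picture, or None
    when no proper motion vector was obtained there.
    """
    if near_vector is None or abs(near_vector) > small_threshold:
        return None  # large or missing vector: skip the remoter picture
    return max(lo, target_pos - half_width), min(hi, target_pos + half_width)
```

For the partial picture s4-104 with a half-width of 1, a small vector gives the range 103 to 105 quoted in the text, whereas a vector of 3 regions (s3-107 to s4-104) with a threshold of 2 suppresses the search.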
If the motion between pictures close in time to each other is
large, since a blur caused by the motion exists even in one picture
obtained by normal photography using a camera, it is difficult to
produce differences in units of 1/2 pixels in motion compensation
between the pictures. That is, in such a case, even if a motion
vector is obtained between the to-be-coded picture and a picture
remoter in time therefrom, the coding efficiency cannot be improved
much in spite of an increase in arithmetic operation amount for the
searching operation. According to the example shown in FIG. 16, in
the case wherein the motion between pictures is large, since a
motion vector between the to-be-coded picture and a picture remote
in time therefrom is not coded, the number of types of coding is
decreased, and the coding efficiency is further improved.
According to the above embodiment, an accurate motion compensation
and a high coding efficiency can be attained notwithstanding that
an amount of arithmetic operation required for the motion vector
searching is decreased.
Note that as a coding method, the present invention can employ any
one of the following methods: intra-frame coding, intra-field
coding, inter-frame difference coding, and inter-field difference
coding, and can use such a method in combination with another
coding scheme. In addition, the motion vector searching method of
the present invention can be applied to a case wherein pictures
coded in advance at intervals of an arbitrary number of pictures
are used as reference pictures, and a motion vector in a picture
between the reference pictures is obtained. Therefore, a search for
a motion vector can be performed by using pictures before and after
a to-be-coded picture.
A video coding apparatus according to the sixth embodiment of the
present invention will be described below with reference to FIG.
17.
In this embodiment, a coding algorithm basically employs a motion
compensation and DCT scheme, and the input picture format is an
interlace format such as a picture format defined by CCIR Rec.
601/525. FIG. 18 shows this picture format. Referring to FIG. 18,
reference symbol Y denotes a brightness signal; and Cr and Cb,
color signals, respectively. The number of pixels of each signal
per field is also shown in FIG. 18.
Coding units are arranged in a hierarchy. That is, as shown in FIG.
19, "block", "macroblock", "slice", "picture", "group of pictures"
(not shown), and "sequence" (not shown) are hierarchically arranged
in the order named. A block is constituted by 8×8 pixels. DCT
is performed in units of blocks. A macroblock is constituted by a
total of four blocks, i.e., two Y blocks, one Cr block, and one Cb
block. Motion compensation and selection of the respective coding
modes are performed in units of macroblocks.
A group of pictures (GOP) has the following arrangement. Pictures
are roughly classified into I, P, and B pictures depending on the
types of modes allowed as prediction modes and selected in units of
macroblocks. The modes include four modes (to be described in
detail later), i.e., an intra-field mode (Intra), a forward
prediction mode (Inter: including motion compensation), a backward
prediction mode, and a bidirectional interpolation prediction mode.
As shown in Table 1, pictures are classified into three types,
i.e., I, P, and B pictures depending on the types of prediction
modes to be used. These pictures I, P, and B are coded in
accordance with quantization step sizes determined by a relation of
I<P<B. Namely, the picture I is coded with a quantization
step size smaller than in the picture P, and the picture P with a
smaller quantization step size than in the picture B. The ratio of
the amounts of bits of these coded pictures is previously
determined.
TABLE 1
______________________________________
Picture Type   Macroblock Mode which can be selected
______________________________________
I picture      • intra-field mode
P picture      • intra-field mode
               • forward prediction mode
B picture      • intra-field mode
               • forward prediction mode
               • backward prediction mode
               • bidirectional interpolation prediction mode
______________________________________
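The relation in Table 1 between picture type and selectable macroblock modes can be expressed as a small lookup in Python. This is only an illustrative sketch; the dictionary name, the short mode labels, and the helper `mode_allowed` are hypothetical.

```python
# Selectable macroblock modes per picture type, following Table 1.
ALLOWED_MODES = {
    "I": ("intra_field",),
    "P": ("intra_field", "forward"),
    "B": ("intra_field", "forward", "backward", "bidirectional"),
}


def mode_allowed(picture_type, mode):
    """Return True if the macroblock mode may be selected in this picture."""
    return mode in ALLOWED_MODES[picture_type]
```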
This embodiment employs an interlace format as a coding picture
format. In this format, since different prediction methods are used
even for pictures of the same type depending on their positions in
GOP, the pictures can be further classified.
The arrangement of GOP and the manner of predicting pictures from
specific pictures will be described below with reference to FIGS.
20A and 20B.
As shown in FIGS. 20A and 20B, GOP is defined by a group of
pictures, starting from the B0 picture preceding one of the I
pictures, which are periodically arranged as entry points for
random access and special reproduction, and ending at the P2
picture located before the next I picture. The I pictures appear
only in even-numbered fields. In addition, Table 2 and FIGS. 21A
and 21B show the manner of predicting the respective pictures from
specific pictures.
TABLE 2
______________________________________
Picture Type   Prediction Method
______________________________________
I              • No prediction
P0             • Inter-field forward prediction from past I
P1             • Inter-field/inter-frame adaptive forward
                 prediction from past I and P0 (or P1 and P2)
P2             • Selection from inter-field/inter-frame adaptive
                 forward prediction from past I and P0 (or P1 and
                 P2) and inter-field forward prediction from past
                 P1
B0 to B3       • Selection from inter-field/inter-frame adaptive
                 forward prediction from past I and P0 (or P1 and
                 P2), inter-field/inter-frame adaptive backward
                 prediction from I and P0 (or P1 and P2) in the
                 future, and bidirectional interpolation
                 prediction from past P1 and P2 (or I and P0) and
                 P1 and P2 (or I and P0) in the future.
______________________________________
There are two prediction methods: an inter-field prediction method
in which prediction is performed only from an even-numbered field;
and an inter-field/inter-frame adaptive prediction method in which
prediction is adaptively performed from even- and odd-numbered
fields in units of fields. Referring to FIGS. 21A and 21B, each
arrow indicates that inter-field prediction is performed, and each
symbol consisting of one arrow and two lines combined with each
other by the arrow indicates that inter-field/inter-frame adaptive
prediction is performed. FIG. 20B shows the coding order of the
respective pictures within one GOP.
In consideration of the above-described points, the video coding
apparatus shown in FIG. 17 will be described below.
Referring to FIG. 17, an interlaced video signal is input to an
input terminal 700. Successive components of the input video signal
which correspond to eight fields are stored in field memories 701.
The video signal in the field memories 701 is supplied to a first
motion vector detector 710, and motion vector detection is
performed by a telescopic search (to be described later) using an
original motion picture with a precision of one pixel. This
processing is the first step of motion compensation.
Subsequently, in preprocessing for the second step of motion
compensation, the motion vector obtained by the motion vector
detector 710 using the original motion picture is refined by
searching the whole range of ±1 pixel around the target pixel
using local decoded signals stored in field memories 708 in a local
decoder loop.
In the second step of motion compensation, motion vector detection
is performed with a precision of 1/2 pixels by the field memories
708, an inter-field/inter-frame adaptive prediction circuit 709,
and a second motion vector detector 711 using the local decoded
signals. As a result, a prediction signal is generated by the
prediction circuit 709. This prediction signal is input to a
subtracter 702, and the difference between the prediction signal
and the video signal from the field memories 701 is calculated. The
difference is then output as a predictive error signal.
Discrete cosine transformation of the predictive error signal is
performed by a DCT circuit 703 to obtain DCT coefficient data. The
DCT coefficient data is quantized by a quantizer 704 and is
subjected to adaptive scanning. The resultant data is then input to
a multiplexer 714 through a two-dimensional variable length coding
circuit 712. The quantized DCT coefficient data is locally decoded
through an inverse quantizer 705 and an inverse DCT circuit 706,
and only the I and P pictures are written in the field memories
708. The number of field memories 708 to be prepared corresponds to
four fields required for adaptive prediction.
In the multiplexer 714, the following data are multiplexed: a
motion vector (a motion vector to a base field, a motion vector to
a non-base field (in the case of the frame interpolation mode), and
field parity data indicating which field is the base field
(described hereinafter)) and prediction mode data (described
hereinafter), which are sent out from the second motion vector
detector 711; attendant data (data indicating a prediction
direction (forward, backward, or bidirectional), and a field
number) and a step size, which are sent out from a coding
controller 717; and DCT coefficient data. The data
multiplexed by the multiplexer 714 is transmitted at a constant
transmission rate through a buffer 715 to be output to a storage
device such as a VTR or a recording device.
The coding controller 717 controls the quantization step size in
the quantizer 704 on the basis of the buffer capacity of the buffer
715 and an intra-macroblock activity (I picture) calculated by an
activity calculator 716 or the intra-macroblock activity (P or B
picture) of a signal before quantization in the same mode, located
immediately before the current picture.
Motion compensation and motion vector detection in this embodiment
will be described in detail below.
The motion compensation is performed in units of macroblocks, as
described above. The motion vector detection includes motion vector
detection performed in the first motion vector detection circuit
710 by normal block matching with a precision of one pixel, and
motion vector detection performed in the second motion vector
detection circuit 711, which searches for motion vectors at points
obtained by performing adaptive spatial-temporal interpolation with
respect to pixel positions set, with a precision of 1/2 pixels,
around a reference picture designated by the motion vector with a
precision of one pixel. A method of performing spatial-temporal
interpolation will be described in detail later.
The motion vector detection in the first motion vector detection
circuit 710 is performed to search for an optimum motion vector by
searching all the pixels of each field with a precision of one
pixel using each field picture of an input video. A searching
operation between fields spaced apart from each other is performed
by using telescopic search processing (refer to Simulation Model
No. 3). With regard to field pictures, a plurality of paths can be
considered as search paths between fields. In this embodiment,
search paths are determined in accordance with the following
rules:
1) A search between in-phase fields is performed by using only
in-phase fields.
2) A search between reverse phase fields is performed by using as
few reverse phase fields as possible.
A search between reverse phase fields must be included in search
paths only once. In this embodiment, as indicated by the telescopic
search order shown in FIG. 24, fields having different phases are
searched first. For example, when a motion vector between the I and
P2 pictures is to be detected, search processing is performed in
the order of I, P0, B1, B3, P2 but is not performed in the order of
I, B0, B2, P1, P2. Note that the hatched portion in FIG. 24
indicates a search range. This telescopic search processing is
independently performed in the forward and backward directions. In
addition, the search range between adjacent fields is defined by
horizontal ±15 pixels and vertical ±7 pixels.
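The one-pixel-precision search between adjacent fields can be sketched as a full-search block matching over the stated range of horizontal ±15 pixels and vertical ±7 pixels. The following Python fragment is only a minimal illustration: the function names, the use of the sum of absolute differences as the matching cost, the 16-pixel default block size, and the handling of field borders are assumptions that do not come from the description.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in `cur` and a displaced block in `ref`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size=16, range_x=15, range_y=7):
    """Return (dx, dy, cost) minimising the SAD within +/-range_x and +/-range_y."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-range_y, range_y + 1):
        for dx in range(-range_x, range_x + 1):
            # Skip displacements whose block would fall outside the reference field
            # (an assumed border policy).
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

In a telescopic search, a function of this kind would be applied field by field along the chosen search path, with each result serving as the starting point of the search in the next field.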
The motion vector detection in the second motion vector detection
circuit 711 is performed by using pictures stored in the field
memories 708 in the local decoding loop as reference pictures. In
preprocessing for this motion vector detection, the motion vector
is refined by searching the whole range of ±1 pixel around the
reference picture designated by the motion vector obtained by the
first motion vector detection circuit 710. In the main process of
the motion vector detection in the second motion vector detection
circuit 711, the values of predictive error power of all the
prediction signal candidates generated, by a method to be described
later, at positions with a precision of 1/2 pixels around the
reference picture designated by the motion vector obtained by the
refining process are estimated and compared with each other,
thereby selecting an optimum prediction signal candidate.
Note that motion vector detection is not performed with respect to
a color signal, but motion compensation is performed on the basis
of a motion vector obtained with respect to a brightness
signal.
Processing to be performed by the inter-field/inter-frame adaptive
prediction circuit 709 shown in FIG. 17 will be described below
with reference to FIG. 23 which is a block diagram showing part of
the inter-field/inter-frame adaptive prediction circuit 709.
As described above, in the second step of motion compensation,
searching is performed in a range of 1/2 pixels around the
reference picture designated by the motion vector obtained by the
first motion vector detection circuit 710. FIG. 25B shows the state
of this processing. This inter-field/inter-frame adaptive
prediction is performed with respect to each of the P1, P2, and B
pictures by using, e.g., a pair of even- and odd-numbered fields #1
and #2 before a to-be-coded picture (field to be coded) as
reference pictures (reference fields).
Assume that optimum points (indicated by " ") at the reference
fields #1 and #2 are respectively obtained by the motion vector
detection circuit 710 using motion vectors V1 and V2 in the first
step of motion compensation shown in FIG. 25A. In the second step
of motion compensation shown in FIG. 25B, a plurality of prediction
signal candidates are obtained by performing spatial-temporal
filtering of reference pictures in ranges of 1/2 pixels around the
two optimum points designated by the motion vectors V1 and V2 in
the inter-field/inter-frame adaptive prediction circuit 709.
Subsequently, a search for an optimum motion vector is performed by
the motion vector detection circuit 711 on the basis of the
prediction signal candidates and the motion vectors detected by the
motion vector detection circuit 710. In this case, prediction
signal candidates are generated in two modes, i.e., a field
interpolation mode and a frame interpolation mode.
In the field interpolation mode, prediction signal candidates are
generated by using only a spatial filter. That is, the pixel values
of pixels "A" to "C" in FIG. 25B are obtained according to the
following equations:
In this field interpolation mode, there are nine search points for
prediction signal candidates for even- and odd-numbered fields
each, i.e., a total of 18 search points.
In contrast to this, in the frame interpolation mode, prediction
signal candidates are formed by performing spatial-temporal
filtering of signals obtained by performing motion compensation for
the respective fields. For example, the pixel values of the pixels
"A" to "C" in FIG. 25B are obtained according to the following
equations:
In this case, a field for providing data at a pixel position with a
precision of 1/2 pixels will be referred to as a base field. In
this frame interpolation mode, although each of even and
odd-numbered search fields has nine search points for prediction
signal candidates, since prediction signals at the positions of
pixels 0 coincide with each other, a total of 17 search points for
prediction signal candidates exist.
In the second step of motion compensation, searching of prediction
signal candidates at 35 search points, i.e., the sum of the search
points in the two modes, is performed, and a prediction signal
candidate having the minimum predictive error is determined as a
prediction signal. If, however, the directions of motion vectors
between two fields greatly deviate from each other, the frame
interpolation mode is not selected (to be described in detail
later).
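The count of 35 search points can be checked by enumerating the candidates. The sketch below assumes that the nine search points of each field are the 3 x 3 half-pixel offsets around the one-pixel-precision optimum point, and that the single coinciding frame-mode candidate is the one with zero offset; the tuple layout and names are illustrative only.

```python
from itertools import product

# Half-pixel offsets (in units of 1/2 pixel) around an integer-precision optimum point.
HALF_PEL_OFFSETS = list(product((-1, 0, 1), repeat=2))  # 9 positions


def field_mode_candidates():
    """One candidate per half-pixel offset in each of the two reference fields."""
    return {("field", ref, off) for ref in (1, 2) for off in HALF_PEL_OFFSETS}


def frame_mode_candidates():
    """One candidate per half-pixel offset for each choice of base field.

    With a zero offset both choices of base field give the same prediction
    signal, so that candidate is stored once under a shared key.
    """
    cands = set()
    for base in (1, 2):
        for off in HALF_PEL_OFFSETS:
            key = "either" if off == (0, 0) else base
            cands.add(("frame", key, off))
    return cands


all_candidates = field_mode_candidates() | frame_mode_candidates()
print(len(field_mode_candidates()), len(frame_mode_candidates()), len(all_candidates))
# prints: 18 17 35
```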
Data indicating the selection of the field interpolation mode or
the frame interpolation mode (prediction mode data) and data
indicating a specific field selected as a reference field used for
the field interpolation mode or as a base field used for the frame
interpolation mode (field parity data) are transmitted by one-bit
flags. Note that prediction by means of fields immediately before
the P0 and P2 pictures is performed by only a mode in which the
field interpolation mode is adapted for a single field. In this
case, mode selection flags (prediction mode data and field parity
data) are not transmitted.
Although the above description is associated with the P picture,
almost the same processing applies to the B picture. With regard to
the B picture, however, only fields which are in phase with a field
to be coded are selected as reference fields in the field
interpolation mode and as base fields in the frame interpolation
mode. It is apparent that in this case, a flag (field parity data)
indicating the selection of a specific field is not
transmitted.
The manner of realizing the above principle on hardware in practice
will be described below with reference to the block diagram of FIG.
23.
Of the video signals output from the field memories 708
corresponding to four fields, shown in FIG. 17, signal components
designated by motion vectors obtained by preprocessing for the
second motion vector detection are stored in cache memories 901a,
901b, 902a, and 902b. In this case, a wider range of signal
components is stored in the cache memories 901a, 901b, 902a and
902b to supply the extra signal components required by the
taps of a spatial filter described hereinafter. In accordance with
control signals from the second motion vector detection circuit
711, a switch circuit 900 distributes outputs from the cache
memories 901a and 901b to a temporal filter 903 and a spatial
filter 905, and also distributes outputs from the cache memories
902a and 902b to a temporal filter 904 and a spatial filter
906.
A selector 907 selects either a signal passing through only the
spatial filter 905 or a signal passing through both the spatial and
temporal filters 905 and 903 in accordance with a control signal
output from the second motion vector detection circuit 711 which
instructs whether to select the field interpolation mode or the
frame interpolation mode. A selector 908 has the same function as
that of the selector 907. Assume that a signal representing a pixel
"G" shown in FIG. 25B is input from the switch circuit 900 to the
temporal filter 903, and a signal representing a pixel "B" shown in
FIG. 25B is output from the spatial filter 905. In this case, a
signal obtained by averaging the signals representing the pixels
"G" and "B", i.e., by spatial-temporal filtering, can be obtained as an
output from the temporal filter 903. Therefore, an output from the
spatial filter 905 is selected in the field interpolation mode, and
an output from the temporal filter 903 is selected in the frame
interpolation mode.
Outputs from the selectors 907 and 908 are directly input to a
selector 911. At the same time, the outputs are added together by
an adder 909, and the sum is divided by 2 by a divider 910. The
resultant signal is then input to the selector 911. The selector
911 selectively outputs these three inputs, i.e., the signal
predicted in the forward direction by using the reference fields #1
and #2 as an output from the selector 907, the signal predicted in
the backward direction by using the reference fields #3 and #4 as
an output from the selector 908, and the signal bidirectionally
predicted by averaging the outputs of the selectors 907 and 908 as
an output from the divider 910. In other words, the second motion
vector detection circuit 711 supplies a control signal to the
inter-field/inter-frame adaptive prediction circuit 709 to cause it
to produce all of these prediction signal candidates sequentially.
For each of the forward and
backward predictions of these prediction signal candidates (35
candidates for one side direction), a prediction signal candidate
having the minimum predictive error is determined as a prediction
signal by the motion vector detection circuit 711 in the
above-described manner. The motion vector detection circuit 711
then sends data indicating this determination to the
inter-field/inter-frame adaptive prediction circuit 709. For a B
picture, which can be predicted in both directions, the most suitable
candidate among forward prediction candidates, the most suitable
candidate among backward prediction candidates and an average
candidate between both of the most suitable candidates are
obtained. Next, the candidate having the smallest prediction error
among the three candidates (i.e., two suitable candidates and
average candidate) is selected, and the data indicating the
selected candidate is sent to the inter-field/inter-frame adaptive
prediction circuit 709. With this operation, the
inter-field/inter-frame adaptive prediction circuit 709 outputs the
prediction signal designated by the motion vector detection circuit
711 to the subtracter 702 and an adder 707.
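The final selection for a B picture among the best forward candidate, the best backward candidate, and their average can be sketched as follows. This is an illustration under stated assumptions: blocks are flat lists of pixel values, predictive error power is taken as the sum of squared differences, and the average is formed by integer division; none of these details is fixed by the description, and the function names are hypothetical.

```python
def block_error(block, pred):
    """Predictive error power, assumed here to be the sum of squared differences."""
    return sum((b - p) ** 2 for b, p in zip(block, pred))


def select_b_prediction(block, forward, backward):
    """Pick forward, backward, or their average, whichever has the least error."""
    # Average of the two predictions (adder 909 followed by divider 910).
    # Integer division is an assumption; the text does not state the rounding.
    average = [(f + b) // 2 for f, b in zip(forward, backward)]
    candidates = {
        "forward": forward,
        "backward": backward,
        "bidirectional": average,
    }
    mode = min(candidates, key=lambda m: block_error(block, candidates[m]))
    return mode, candidates[mode]
```

For example, `select_b_prediction([10, 20, 30, 40], [8, 18, 28, 38], [12, 22, 32, 42])` selects the bidirectional candidate, because the average of the two predictions reproduces the block exactly.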
The manner in which a motion vector from the second motion vector
detection circuit 711 is coded to be transmitted will be described
below.
A motion vector in the field interpolation mode or a motion vector
for the base field in the frame interpolation mode is coded as a
base vector by a variable length coding circuit 713. The base
vector is then transmitted to the multiplexer 714. This motion
vector (base vector) has a precision of 1/2 pixels. If the frame
interpolation mode is selected, the difference between a motion
vector at a non-base field and a motion vector obtained by scaling
a motion vector in the base field (base vector) into a value at the
non-base field is coded and transmitted. That is, the difference
between a point nearest to the point at which the extended line of
the motion vector in the base field crosses the non-base field and
the motion vector for the non-base field is transmitted with a
precision of one pixel. If the difference between the directions of
these two motion vectors exceeds a range of ±1, the frame
interpolation mode is not selected, assuming that the frame
interpolation mode is not effective unless the directions of the
two motion vectors are close to each other.
FIG. 26 shows a detailed example of how the above-mentioned motion
vector is transmitted, in which the difference (indicated by the
arrow (difference vector d) extending in the vertical direction in
FIG. 26) between a motion vector with a precision of one pixel
(indicated by the arrow (motion vector b) on the lower side in FIG.
26) in the reference field #2 and a point " " nearest to a point
"Δ" at which a motion vector with a precision of 1/2 pixels
(indicated by the arrow (motion vector a) on the upper side in FIG.
26) in the reference field #1 crosses the reference field #2 is
transmitted with a precision of one pixel. In the case shown in
FIG. 26, the difference is -1. With this operation, the data amount
of a motion vector can be saved without causing a deterioration in
prediction performance.
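The encoder-side handling of the difference vector can be sketched as below. The sketch takes the scaled and rounded base vector as an input rather than computing it, treats both vectors as (horizontal, vertical) pairs in one-pixel units, and applies the ±1 limit to each component separately; the component-wise test and the function name are assumptions.

```python
def difference_vector(non_base_mv, scaled_base_mv):
    """Return d = b - c, or None if any component falls outside +/-1.

    non_base_mv    -- motion vector b to the non-base field (one-pixel units)
    scaled_base_mv -- scaled and rounded base vector c (one-pixel units)
    """
    d = tuple(b - c for b, c in zip(non_base_mv, scaled_base_mv))
    if any(abs(component) > 1 for component in d):
        return None  # frame interpolation mode is not selected for this block
    return d
```

With `non_base_mv = (4, -2)` and `scaled_base_mv = (5, -2)`, the function returns `(-1, 0)`, which corresponds to a transmitted difference of -1 as in the case of FIG. 26.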
In summary, the variable length coder 713 subjects the base vector
of the motion vector data sent from the second motion vector
detector 711 to variable length coding without any modification.
To code the non-base vector, however, the base vector is scaled
into a motion vector in the non-base field, the difference between
the non-base vector and the scaled motion vector is calculated, and
this difference is subjected to variable length coding. The motion
vector data includes the field parity data described above. In
addition, the prediction mode data is also sent from the second
motion vector detector 711. These data are also subjected to
variable length coding.
Motion compensation for color signals will be described next.
As shown in FIG. 19, in one macroblock, the brightness signal Y and
the color signals Cr and Cb have the same number of pixels in the
vertical direction, but the number of each color signal is 1/2 that
of the brightness signal Y in the horizontal direction. For this
reason, when a motion vector obtained from a brightness signal is
to be applied to a color signal, the horizontal component of the
motion vector is reduced to 1/2. When the component is divided by
2, fractions are rounded toward zero. This equally applies to both
the field interpolation mode and the frame interpolation mode.
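The derivation of the color-signal vector can be sketched in a few lines. The fragment assumes that the vector is held as a pair of integers (horizontal, vertical) in units of 1/2 pixel; the function name is hypothetical. Note that integer floor division would round negative values away from zero, so the sign is handled separately.

```python
def chroma_motion_vector(luma_mv):
    """Derive the color-signal vector from a brightness-signal vector (x, y).

    Only the horizontal component is halved, because the color signals have
    half the horizontal resolution and the same vertical resolution.
    """
    x, y = luma_mv
    # Python's // rounds toward negative infinity, so halve the magnitude
    # and restore the sign to round toward zero instead.
    half_x = abs(x) // 2
    return (half_x if x >= 0 else -half_x, y)
```

For example, horizontal components of 5 and -5 map to 2 and -2, respectively, while the vertical component is passed through unchanged.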
FIGS. 27A and 27B respectively show detailed examples of how the
motion vector of a color signal is obtained from the motion vector
of a brightness signal in the field interpolation mode and the
frame interpolation mode. Referring to FIGS. 27A and 27B, the
broken circles indicate pixel positions where no color signals
exist. Assume that the pixel position obtained in the first step is
indicated by the circle at the central position, and the point
obtained in the second step is indicated by "x". Assume that in
either mode, the horizontal coordinate value of an origin is larger
than that of the circle at the central position. In the field
interpolation mode shown in FIG. 27A, if the motion vector of a
brightness signal is obtained at the position of the point "x", the
1/4-pixel-precision component of a color signal is rounded toward
zero to form an interpolated pixel "Δ". This interpolated pixel "Δ"
is formed by
Similarly, in the frame interpolation mode shown in FIG. 27B, an
interpolated pixel "Δ" in the reference field #1 is formed
by
In the reference field #2, the 1/2-pixel-precision component of a
color signal is rounded toward zero, and a pixel
at a position "I" is used as an interpolated pixel. That is, in the
case shown in FIG. 27B, the value of the spatio-temporally
interpolated pixel "Δ" is obtained by
The arrangement of a video decoding apparatus corresponding to the
video coding apparatus shown in FIG. 17 will be described below
with reference to FIG. 22.
A signal read out from a recording device is input to an input
terminal 800 and is temporarily stored in a buffer 801. The signal
read out from the buffer 801 is input to a demultiplexer/variable
length decoding circuit 802. As a result, DCT coefficient data,
motion vector data, and the attendant data such as step size data
multiplexed by the multiplexer 714 shown in FIG. 17 are separated
from the signal and decoded.
More specifically, the DCT coefficient data is subjected to
two-dimensional variable length decoding and scan conversion. The
resultant data is then subjected to decoding through an inverse
quantizer 803 and an inverse DCT circuit 804, similar to the local
decoding loop in the video coding apparatus shown in FIG. 17. The
decoded data is input to an adaptive prediction circuit 807 through
an adder 805 and field memories 806 corresponding to four fields.
Since the motion vector data is transmitted as difference data, as
described above, the data is subjected to variable length decoding
and is subsequently restored to the original data. The restored
data is then supplied to the adaptive prediction circuit 807. The
adaptive prediction circuit 807 generates a prediction signal. When
this prediction signal and the signal from the inverse DCT circuit
804 are added together by the adder 805, the original video signal
is extracted through the field memories 806. The outputs from the
field memories 806 are supplied to a picture display circuit (not
shown).
In a picture display operation, the decoding order and the display
order are different from each other, and the B picture need not be
stored in the field memories because it is not used for prediction.
For these reasons, switching is performed such that the I and P
pictures are output from the field memories 806, while the B
picture is directly output while it is decoded.
The operation of the DMUX/VLD 802 in FIG. 22 will be described with
reference to a block diagram.
FIG. 41 is a block diagram showing the arrangement of the DMUX/VLD
802 in FIG. 22. Referring to FIG. 41, the sent data is separated by
a demultiplex circuit 1710 into DCT coefficient data, a step size,
prediction mode data, attendant data (data indicating a prediction
direction and a field number), and motion vector data (vector data
to the base field, difference vector data when a frame
interpolation mode is selected, and field parity data). These data
are input to VLD table ROMs 1719 to 1725 through FIFOs 1712 to 1718
and are subjected to a variable length decoding, respectively.
Thereafter, the DCT coefficient data and the step size are sent to
an inverse quantization circuit 803. The prediction mode data,
attendant data (data indicating prediction direction and field
number), field parity data and the motion vector data which
corresponds to the base field, among the decoded results of the
motion vector data are sent to an adaptive prediction circuit 807.
The motion vector corresponding to the base field is scaled by a
motion vector scaling circuit 1727 into a motion vector to the
non-base field. This scaled value is added to a difference vector
by an adder 1728, thereby
obtaining a motion vector to the non-base field. The resultant
motion vector is sent to the adaptive prediction circuit 807.
During this calculation, the motion vector scaling circuit 1727
performs scaling using a distance between a field being decoded and
a predictive reference field calculated with reference to the data
indicating the prediction direction and the field number data. This
operation will be described with reference to FIG. 26.
In FIG. 26, the field distance between the base field #1 and the
field being decoded (this field has been referred to as a to-be-coded
field for descriptive convenience for coding) can be calculated,
using the field number of base field #1 and that of the field being
decoded, and the data indicating the prediction direction of the
field being decoded. Assume that this field distance is "n". In
this case, to scale a motion vector a to the reference field #1
serving as the base field into a motion vector to the reference
field #2 serving as the non-base field, the following expression is
calculated, and the resultant fraction is rounded to a nearest
point, as indicated by a vector c to a black dot in FIG. 26.
The motion vector to the reference field #2 serving as the non-base
field is obtained by adding the difference vector d (-1 in FIG. 26)
to the scaled and rounded motion vector c. Referring to FIG. 26, if
the reference field #2 is the base field, the motion vector to the
base field #2 is multiplied by (n+1)/n to perform scaling into
the motion vector to the reference field #1 serving as the non-base
field. Either case is selected from the field parity data.
As a special case, if the field being decoded is a field adjacent
to the reference field #2 (n=2), the required multiplication factor
is 1/2, so that scaling can be performed by a bit shift
alone.
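The decoder-side reconstruction of the non-base vector can be sketched as follows. The scaling expression itself is not reproduced in this text, so the factor (n-1)/n used for the case where the reference field #1 is the base field is an inference from the statements that n=2 requires a multiplication by 1/2 and that the opposite case uses (n+1)/n. The tie-breaking rule for rounding, the half-pixel integer representation of the base vector, and all function names are likewise assumptions.

```python
from fractions import Fraction
from math import floor


def scale_base_vector(base_mv_half_pel, n, base_is_field_1=True):
    """Scale a base-field vector (half-pixel units) to the non-base field.

    Returns integer-pixel components rounded to the nearest pixel.
    """
    # Assumed factors: (n - 1)/n when field #1 is the base, (n + 1)/n otherwise.
    factor = Fraction(n - 1, n) if base_is_field_1 else Fraction(n + 1, n)
    scaled = []
    for component in base_mv_half_pel:
        pixels = Fraction(component, 2) * factor  # exact value in pixels
        # Round to nearest; ties round up (the tie-breaking is an assumption).
        scaled.append(floor(pixels + Fraction(1, 2)))
    return tuple(scaled)


def reconstruct_non_base_vector(base_mv_half_pel, diff_mv, n, base_is_field_1=True):
    """Add the transmitted difference vector d to the scaled vector c."""
    c = scale_base_vector(base_mv_half_pel, n, base_is_field_1)
    return tuple(ci + di for ci, di in zip(c, diff_mv))
```

For a base vector of (9, -4) in half-pixel units, i.e., (4.5, -2.0) pixels, and n=2, the scaled vector c is (2, -1), and a difference vector of (-1, 0) yields a non-base vector of (1, -1).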
An operation of the adaptive prediction circuit 807 in FIG. 22 will
be described with reference to a block diagram in FIG. 42. The
arrangement of the adaptive prediction circuit 807 is basically the
same as that of the inter-field/inter-frame adaptive prediction
circuit for an encoder shown in FIG. 23.
The picture areas represented by the integer part of a motion
vector (1/2 pixel precision) to the base field and the motion
vector to the non-base field (integer pixel precision) sent from
the DMUX/VLD 802 are loaded from the field memory 806 to the cache
memories 1729 and 1730 (the motion vector to the non-base field is
indefinite when the field interpolation mode is selected).
Of all inputs to the cache memories, inputs 1 and 2 are used to
load a picture corresponding to forward prediction, and inputs 3
and 4 are used to load a picture corresponding to backward
prediction. Inputs 1 and 3 correspond to odd-numbered fields, and
inputs 2 and 4 correspond to even-numbered fields. A switching
circuit 1731 selects an even- or odd-numbered field as the base
field. Control signal 1 for this switching is field parity data
sent from the DMUX/VLD 802. Base field picture data is connected to
a spatial filter 1733 (or 1735) and subjected to interpolation with
1/2 pixel precision. In this case, the coefficient of the spatial
filter is changed depending on the presence/absence of 1/2 pixel
precision component of the vertical and horizontal motion vector
(two control signals 2 are illustrated in FIG. 42 because the
spatial filter 1733 corresponds to forward prediction and the
spatial filter 1735 corresponds to backward prediction, and the
control signals are input to the corresponding spatial
filters).
An output from each spatial filter is filtered together with
non-base field picture data in a time direction by a temporal
filter 1732 (or 1734) to generate a signal corresponding to a frame
interpolation mode. A signal directly input from the spatial filter
1733 (or 1735) to a switching circuit 1736 (or 1737) is the signal
corresponding to the field interpolation mode. Of all input
signals, a filter output signal designated by the prediction mode
data (control signal 3) is selected by the switching circuit 1736
(or 1737) (two control signals 3 are illustrated in FIG. 42 due to
the same reason as control signals 2, so that control signals 3 are
also input to the corresponding switching circuits).
The signals corresponding to forward prediction and backward
prediction are input to an adder 1738 and a multiplier 1739,
thereby generating signals corresponding to bidirectional
interpolation prediction. Of these signals corresponding to the
forward prediction, the backward prediction and the bidirectional
interpolation prediction, a signal selected at the encoder side is
selected by a switching circuit 1740 in accordance with data
representing the prediction direction (control signal 4). The
selected signal is output as a predictive signal to the adder
805.
According to the above embodiment, motion compensation prediction
suitable for an interlaced video can be realized with high
efficiency and precision by effectively using a number of fields,
while suppressing the accompanying increase in motion vector
information.
Rate control will be described next with reference to the flow
charts shown in FIGS. 28 and 29. The rate control is performed in
the following three processes:
(1) allocating an amount of bits (a bit rate) to N pictures from
the I picture to the B3 picture immediately before the next I
picture;
(2) allocating and updating an amount of bits for each picture;
and
(3) controlling the quantization step size in each picture by using
a virtual buffer.
In this embodiment, an estimation amount called an activity is
calculated by the activity calculator 716, as described above, and
this estimation amount is used for rate control. The activity of
the I picture is the sum total of the absolute values of values
obtained by subtracting an average value in a block from the
respective pixel values. The activity of each of pictures of the
other types is the sum total of the absolute values of DCT
coefficients of motion compensation adaptive predictive error
signals obtained through the subtracter 702 and the DCT circuit
703. The activity of the I picture is calculated one field before
coding. The activities of the other pictures are calculated in
coding. The rate control of a given picture is performed by using
the activity of the same type of picture coded immediately before
the current picture.
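The activity of the I picture, as defined above, can be sketched as follows. The representation of a block as a list of pixel rows and of a picture as a list of blocks is an assumption, as are the function names; the activity of the other picture types would instead sum the absolute values of the DCT coefficients of the predictive error signal and is not shown.

```python
def block_activity(block):
    """Sum of absolute deviations of the pixel values from the block average."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels)


def picture_activity(blocks):
    """Total activity of a picture given as an iterable of blocks."""
    return sum(block_activity(b) for b in blocks)
```

A flat block contributes zero activity, whereas a 2 x 2 block with the values 1, 3, 5, and 7 has an average of 4 and an activity of 8.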
The processes (1) to (3) of the above-described rate control will
be described in detail below with reference to the flow charts
shown in FIGS. 28 and 29.
(1) Bit rate allocation in units of N pictures (step S1)
Bit rate control is performed such that the amount of bits of N
pictures from the I picture to the B picture immediately before the
next I picture approaches a fixed value CAG
(constant-allocate-gop). An excess or shortage of the amount of
bits obtained by this control is carried over to the amount of bits
allocated to next N pictures, as shown in FIG. 30. At the end of
coding of N pictures, a difference LOBG (left-over-bits-gop)
between an allocated amount of bits ABG (allocated-bits-gop) and an
amount of bits GBG (generated-bits-gop) is calculated according to
the following equation (1):
An amount of bits obtained by adding the fixed allocated amount of
bits CAG to the excess or shortage LOBG of amount of bits according
to the following equation (2) is allocated to the next N
pictures:
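The carry-over of bits between successive groups of N pictures can be sketched as follows. Equations (1) and (2) are not reproduced in this text, so the forms LOBG = ABG - GBG and ABG = CAG + LOBG used below are reconstructed from the surrounding prose and should be read as assumptions; the function name is hypothetical.

```python
def next_gop_allocation(cag, abg, gbg):
    """Return (LOBG, next ABG) after coding one group of N pictures.

    cag -- fixed allocation per group (constant-allocate-gop)
    abg -- bits allocated to the group just coded (allocated-bits-gop)
    gbg -- bits actually generated for that group (generated-bits-gop)
    """
    lobg = abg - gbg        # assumed form of equation (1)
    next_abg = cag + lobg   # assumed form of equation (2)
    return lobg, next_abg
```

For example, if 1100 bits are generated against an allocation of 1000 bits with CAG equal to 1000, the shortage of 100 bits reduces the allocation of the next N pictures to 900 bits.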
(2) Allocation of amount of bits to each picture
An amount of bits allocated to each picture is determined to
satisfy the above-described conditions in accordance with the
following rules (I) and (II):
(I) When the ratios of quantization step sizes Q·I, Q·P, and Q·B of
the I, P1, P2, and B pictures are
set according to the following equation (3), high coding efficiency
is ensured, and no deterioration in picture quality occurs:
where 'Q0 and 'Q are constants. The ratio of the amounts of bits
allocated to the respective pictures is updated such that the ratio
of the quantization step sizes coincides with the ratio defined by
equation (3).
(II) Excess bits of coded pictures of the N pictures are carried
over for amount-of-bits allocation to non-coded pictures in the N
pictures, and allocated amounts of bits are determined in units of
pictures such that the amounts of bits are allocated at the ratio
defined by the rule (I).
The process of amount-of-bits allocation will be described in
detail below.
(2-1) Allocation of amount of bits to I picture (step S3)
An allocated amount of bits AB·I (allocate-bits-I) of the first I
picture in a sequence is determined on the basis of an activity
A·I (activity-I) of the I picture, calculated in step S2
in FIG. 22, according to the following equation (4), which is set
such that the amount of bits is not too large and the quantization
step size of the I picture is not much smaller than that of
the P picture,
where C1 and C2 are constants.
The allocated amount of bits of each other I picture is determined
immediately before coding of the I picture. Based on an amount of
bits obtained by correcting the amount of bits of the immediately
preceding I picture using an activity, the average quantization
step size of the immediately preceding I picture is compared with
the average quantization step size of the immediately preceding P1
and P2 pictures. If the quantization step size of the I picture is
larger, the allocated amount of bits of the I picture is increased,
and vice versa, according to the following formula (5): ##EQU1## where
DQ (difference qscale) is the quantization step size difference,
AQP (averaged qscale previous) is the immediately preceding average
quantization step size, GBP (generated bits previous) is the
immediately preceding amount of bits, AQP (averaged qscale
previous) is the immediately preceding activity, C5 is a constant
equal to or larger than 1, C6 is a constant equal to or smaller
than 1, and f5 and f6 are functions using a difference
DQ·I·P12 between the quantization step sizes of
the I picture and the P1 and P2 pictures as a parameter. For
example, the functions f5 and f6 are defined by the following
equations (6):
As the allocated amount of bits of the I picture is updated in the
above-described manner, the ratio of the quantization step sizes of
the I picture and the P1 and P2 pictures approaches an optimum
value, thus ensuring high coding efficiency.
If the allocated amount of bits determined in the above-described
manner exceeds 23% of CAG, the amount of bits is reduced to 23% of
CAG. That is,
if (AB·I > CAG×0.23) AB·I = CAG×0.23;
(2-2) Allocation of amount of bits to P0 picture
No allocated amount of bits is determined in advance with respect
to the P0 picture. However, the amount of bits of the P0 picture is
limited by the rate control in the picture so as not to exceed 25%
of the allocated amount of bits of GOP, as will be described
below.
(2-3) Allocation of amounts of bits to P1, P2, and B pictures
(steps S8, S10, S12, S14, S16, and S18)
The allocated amounts of bits of the P1, P2, and B pictures are
determined immediately before coding of the respective pictures in
such a manner that the ratio of allocated amounts of bits AR of the
P1, P2, and B pictures is set to be
AR·P1:AR·P2:AR·B, and the amount of bits
of the N pictures coincides with the allocated amount of bits. That
is,
where GBG (generated-bits-gop) is the amount of bits which has been
generated in the N pictures, and RN·P1 (residual-num), RN·P2, and
RN·B are the numbers of P1, P2, and B pictures, of the N pictures,
which have not been coded yet.
Generated amounts of bits satisfying this equation are calculated
by the following equations, respectively:
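A minimal C sketch of one allocation that satisfies the two stated conditions is given here. It assumes that the bits still available (ABG minus GBG) are shared among the uncoded P1, P2, and B pictures so that the per-picture amounts stand in the ratio AR·P1:AR·P2:AR·B and sum to the remainder. The type and function names are hypothetical, and the use of floating point is an assumption.

```c
/* One value per picture type (P1, P2, B). Hypothetical type name. */
typedef struct {
    double p1;
    double p2;
    double b;
} PictureTriple;

/* abg: bits allocated to the N pictures, gbg: bits generated so far,
   ar: allocation ratio AR, rn_*: numbers of pictures not yet coded.
   Returns the per-picture allocation for each type, or all zeros when
   nothing remains to be distributed. */
PictureTriple allocate_picture_bits(double abg, double gbg, PictureTriple ar,
                                    int rn_p1, int rn_p2, int rn_b)
{
    PictureTriple ab = {0.0, 0.0, 0.0};
    double remaining = abg - gbg;
    double weight = ar.p1 * rn_p1 + ar.p2 * rn_p2 + ar.b * rn_b;

    if (weight <= 0.0 || remaining <= 0.0) {
        return ab;
    }
    /* Bits per unit of ratio, so that the uncoded pictures together
       consume exactly the remaining bits. */
    double unit = remaining / weight;
    ab.p1 = unit * ar.p1;
    ab.p2 = unit * ar.p2;
    ab.b = unit * ar.b;
    return ab;
}
```

With 600 bits remaining, a ratio of 3:2:1, and one uncoded picture of each type, the sketch yields 300, 200, and 100 bits respectively.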
(2-4) Updating of ratio of allocated amounts of bits of P1, P2, and
B pictures
The ratio of the allocated amounts of bits of the P1, P2, and B
pictures is set to a certain initial value at the start of a
sequence. This ratio is updated immediately before coding of the P1
picture as follows.
(2-4-1) Updating of ratio of allocated code amounts of P1 and P2
pictures
When the P1 and P2 pictures are quantized with the same
quantization step size, the ratio of the activities and the ratio
of the generated amounts of bits have the following
relationship:
where C3 and C4 are constants.
The ratio of the allocated amounts of bits of the P1 and P2
pictures is determined on the basis of this relationship as
follows:
With this updating, the quantization step size of the P1 picture
becomes almost equal to that of the P2 picture, thus suppressing a
deterioration in picture quality caused when the quantization step
size is increased.
(2-4-2) Updating of ratio of allocated bit rates of P1 and B
pictures
As shown in FIG. 31, the generated amounts of bits of those of the
B0, B1, B2, and B3 pictures whose quantization step sizes are close
to the ratio 1:αQ relative to the quantization step size of the P1
picture are weighted more heavily and averaged, and the allocated
amount-of-bits ratio is updated so that it coincides with the ratio
of the generated amounts of bits of the P1 and B pictures.
A weighted average amount of bits WAGB·B
(weighted-averaged-generated-bits) is calculated first on the basis
of the average quantization step sizes and generated amounts of bits
of the B0, B1, B2, and B3 pictures according to the following
formula (14): ##EQU2##
for
ΔQBi = |((AQ·P1 + AQ·P2)/2) × αQ - AQ·Bi| and i = 0, . . . , 3. ##EQU3##
This value is an amount of bits obtained by averaging the generated
amounts of bits of the B0, B1, B2, and B3 pictures which are
respectively weighted by larger values as the deviation from the
optimum quantization step size (the quantization step size of the P
picture × αQ) is decreased. The ratio of the allocated
amounts of bits of the P1 and B pictures is updated according to
the following equation (15) to coincide with the ratio of the
amount of bits of the P1 picture and the weighted average amount of
bits of the B picture:
When the allocated amount-of-bits ratio is updated in the
above-described manner, the ratio of the quantization step sizes of
the P and B pictures approaches an optimum ratio, i.e., 1:αQ,
thereby allowing optimum amount-of-bits allocation in terms of both
coding efficiency and picture quality.
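The weighted averaging of the B picture bit counts can be sketched in C as follows. The weighting function of formula (14) is not reproduced in this text, so the weight 1/(1 + ΔQBi) used here is purely an illustrative assumption that only preserves the stated property: a smaller deviation from the optimum step size gives a larger weight. The function name is hypothetical.

```c
/* gb_b[i]: generated bits of picture Bi, aq_b[i]: its average
   quantization step size (i = 0..3). aq_p1, aq_p2: average step sizes
   of the P1 and P2 pictures. alpha_q: the constant called alpha-Q. */
double weighted_avg_generated_bits_b(const double gb_b[4],
                                     const double aq_b[4],
                                     double aq_p1, double aq_p2,
                                     double alpha_q)
{
    /* Optimum B step size: mean P step size scaled by alpha-Q. */
    double target = (aq_p1 + aq_p2) / 2.0 * alpha_q;
    double weight_sum = 0.0;
    double acc = 0.0;

    for (int i = 0; i < 4; i++) {
        double d = target - aq_b[i];
        double dq = d < 0.0 ? -d : d;   /* deviation from the optimum */
        double w = 1.0 / (1.0 + dq);    /* ASSUMED weight, not formula (14) */
        acc += w * gb_b[i];
        weight_sum += w;
    }
    return acc / weight_sum;
}
```

When all four B pictures sit exactly at the optimum step size, the result reduces to the plain average of their bit counts.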
(3) Rate control in picture
In a picture, feedback control is performed in units of slices by
using a virtual buffer. The quantization step size of the first
slice of the I picture is obtained on the basis of an allocated
amount of bits and an activity, whereas that of each picture of
the other types is determined on the basis of the quantization step
size of a picture coded in the past.
(3-1) Quantization step size of first slice of picture
The quantization step size of the first slice of the I picture is
determined on the basis of an allocated amount of bits
AB·I and a sum A·I of activities in the picture
according to the following equation (16):
The quantization step size of the first slice of the P0 picture is
determined on the basis of a value obtained by multiplying the
average quantization step size of the I picture by a constant αQ0
according to the following equation (17): ##EQU4##
The quantization step size of the first slice of the first P1
picture in the N pictures is determined on the basis of the average
quantization step size of the I picture according to the following
equation (18): ##EQU5##
The quantization step size of the first slice of each P1 picture
other than the first P1 picture is determined on the basis of the
average quantization step size of the immediately preceding P1
picture according to the following equation (19): ##EQU6##
The quantization step size of the first slice of the P2 picture is
determined on the basis of the average quantization step size of
the P1 picture according to the following equation (20), thus
preventing a deterioration in picture quality caused when the
difference between the quantization step sizes of the P1 and P2
pictures is large. ##EQU7##
The quantization step size of the first slice of the B0 picture is
determined on the basis of a value obtained by multiplying the
average value of the quantization step sizes of the P1 and P2
pictures by αQ according to the following equation (21) such that
the ratio between the determined quantization step size and the
quantization step sizes of the P1 and P2 pictures is optimized.
##EQU8##
The quantization step sizes of the first slices of the B1, B2, and
B3 pictures are determined on the basis of the quantization step
sizes of the last slices of the B0, B1, and B2 pictures according
to the following equations (22):
(3-2) Rate control in I, P1, and P2 pictures
The quantization step sizes of the I, P1, and P2 pictures are
updated in units of slices in a coding apparatus (identical to the
apparatus shown in FIG. 17) 800 by a quantization step size
updating circuit 803 in accordance with the occupied amount of a
virtual buffer 801 shown in FIG. 32.
Prior to coding of each picture, the buffer occupied amount is
reset to "0". Code bits GB (slices) generated by the coding
apparatus 1000 in units of slices are stored in the buffer 1001,
and allocated amounts of bits AB (slices) represented by the
following equation (23) are read out from the buffer 1001 in units
of slices. Note that the allocated amounts of bits are allocated in
units of slices as follows. With regard to the I picture, each of
amounts of bits allocated in units of pictures is distributed among
slices in proportion to the activities of the respective slices.
With regard to the P1 and P2 pictures, each of amounts of bits
allocated in units of pictures is distributed among slices in
proportion to the activities of the slices of an immediately
preceding picture of the same type at the same spatial positions.
These amounts of bits are allocated in units of picture types by a
picture-unit amount-of-bits allocation circuit 1002. ##EQU9##
A buffer occupied amount BC (buffer content) is monitored in units
of slices. If this buffer occupied amount BC is larger (or smaller)
than a threshold value BT (buffer threshold) and is larger (or
smaller) than the buffer occupied amount of the previous slice, a
quantization step size Q is incremented by one (or decremented by
one). Note that the quantization step size Q is limited to 1 to 31.
The threshold value BT is defined by the following equation:
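The above description can be expressed as follows:
BC(0)=0;
for (slice=1; slice<NM; slice++) {
  BC(slice)=BC(slice-1)+GB(slice)-AB(slice);
  /* buffer above +BT and still rising: coarser quantization */
  if (BC(slice)>BT && BC(slice)>BC(slice-1) && Q<31) Q+=1;
  /* buffer below -BT and still falling: finer quantization */
  else if (BC(slice)<-BT && BC(slice)<BC(slice-1) && Q>1) Q-=1;
}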
In this manner, amounts of bits read out from the buffer in units
of slices are allocated in proportion to the activities of the
respective slices, and a change in quantization step size is
limited to a maximum of ±1. In addition, the conditions for a
change in quantization step size are made stricter than those in
the conventional method. Therefore, the quantization step size is
not greatly changed in units of slices, preventing a deterioration
in picture quality.
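The slice-level update for the I, P1, and P2 pictures can be written as a self-contained C function. This is a minimal sketch under stated assumptions: the function name and the array-based interface are hypothetical, index 0 of each array is unused so that slices run from 1 to NM-1 as in the description, and the threshold BT is passed in because its defining equation is not reproduced here.

```c
/* gb[s]: bits generated for slice s, ab[s]: bits allocated to slice s,
   nm: loop bound NM, bt: threshold BT, q: initial quantization step
   size. Returns the step size after the last slice. */
int update_q_per_slice(const long *gb, const long *ab, int nm, long bt, int q)
{
    long bc_prev = 0;                          /* BC(0) = 0 */

    for (int slice = 1; slice < nm; slice++) {
        long bc = bc_prev + gb[slice] - ab[slice];

        if (bc > bt && bc > bc_prev && q < 31) {
            q += 1;                            /* buffer high and rising */
        } else if (bc < -bt && bc < bc_prev && q > 1) {
            q -= 1;                            /* buffer low and falling */
        }
        bc_prev = bc;
    }
    return q;
}
```

Because each slice can change Q by at most one and only while the buffer keeps moving away from zero, a single large slice cannot cause an abrupt change in step size.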
Note that if the amount of bits in the I picture exceeds 25% of the
allocated amount of bits of N pictures, all the coefficients are
forcibly set to be "0" in the subsequent operation to prevent
generation of codes.
(3-3) Rate control in B picture
In the B picture, the same control as that performed for the P1 and
P2 pictures is performed except for the following points.
(1) The quantization step size is updated in units of 10
slices.
(2) The execution of update processing is determined only by
checking whether the absolute value of the buffer occupied amount
exceeds a threshold value; a comparison with the buffer
occupied amount of the previous slice is not performed.
The above description can be expressed as follows:
BC(0)=0;
for (slice=1; slice<NM; slice++) {
  BC(slice)=BC(slice-1)+GB(slice)-AB(slice);
  if (slice%10==0) {
    if (BC(slice)>BT && Q<31) Q+=1;
    else if (BC(slice)<-BT && Q>1) Q-=1;
  }
}
The quantization step size of the B picture is updated in units of
10 slices, i.e., at a long period, and a change in quantization
step size is limited to a maximum of ±1. Therefore, the
quantization step size of the B picture is kept at a value close
to the quantization step size of the first slice of the B0 picture,
which is set to optimize the coding efficiency, over a long period
of time, thus realizing high coding efficiency.
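The B picture variant can be sketched in C in the same way. The function name and array interface are hypothetical; as in the description, index 0 is unused, the step size is examined only at every tenth slice, and no comparison with the previous buffer occupancy is made.

```c
/* gb[s]: bits generated for slice s, ab[s]: bits allocated to slice s,
   nm: loop bound NM, bt: threshold BT, q: initial quantization step
   size. Returns the step size after the last slice. */
int update_q_b_picture(const long *gb, const long *ab, int nm, long bt, int q)
{
    long bc = 0;                               /* BC(0) = 0 */

    for (int slice = 1; slice < nm; slice++) {
        bc += gb[slice] - ab[slice];
        if (slice % 10 == 0) {                 /* update once per 10 slices */
            if (bc > bt && q < 31) {
                q += 1;
            } else if (bc < -bt && q > 1) {
                q -= 1;
            }
        }
    }
    return q;
}
```

With 20 slices, the step size can change by at most two over the whole picture, which is what keeps it close to the value set for the first slice.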
(3-4) Rate control in P0 picture
In the P0 picture, the quantization step size is updated in units
of slices in accordance with the occupied amount of the virtual
buffer in such a manner that the amount of bits does not exceed 25%
of the allocated amount of bits of N pictures. A bit count BOB
(buffer-out-bits) read out from the buffer is a value obtained by
dividing 20% of the allocated amount of bits of N pictures by the
number of slices in the picture according to the following
equation:
The buffer occupied amount is monitored in units of slices. If the
buffer occupied amount exceeds a threshold value, the quantization
step size is incremented by one. However, the quantization step
size is limited to a maximum of 31. The threshold value is the same
as the amount of bits taken out from the buffer. In addition, if
the amount of bits in the P0 picture exceeds 25% of the allocated
amount of bits of N pictures, control is forcibly performed to
prevent generation of codes for all the subsequent macroblocks.
In the P0 picture, rate control is performed only in a special
case, e.g., a case wherein the amount of bits is excessively
increased. In actual coding operations, such control is very rarely
performed. In most cases, the quantization step size of the P0
picture is kept equal to the average quantization step size of the
I picture, i.e., the set value of the first slice of the P0
picture. Therefore, the quantization step size of the P0 picture is
not greatly changed as compared with the I picture, preventing a
deterioration in picture quality.
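The P0 picture control can be sketched in C as follows. Several points are assumptions: the buffer is taken to start at zero as for the other picture types, the 25% check is made per slice rather than per macroblock for brevity, integer arithmetic is used, and the type and function names are hypothetical.

```c
/* Result of the sketch: final step size and the index of the first
   slice for which code generation is suppressed (-1 if never). */
typedef struct {
    int q;
    int suppressed_from;
} P0Result;

/* gb[s]: bits generated for slice s (s = 0..num_slices-1),
   abg: bits allocated to the N pictures, q: initial step size. */
P0Result control_p0_picture(const long *gb, int num_slices, long abg, int q)
{
    P0Result r = { q, -1 };
    long bob = (abg * 20 / 100) / num_slices;  /* bits drained per slice */
    long limit = abg * 25 / 100;               /* hard ceiling for P0 */
    long bc = 0;
    long total = 0;

    for (int s = 0; s < num_slices; s++) {
        total += gb[s];
        if (total > limit) {
            r.suppressed_from = s + 1;         /* no codes from here on */
            break;
        }
        bc += gb[s] - bob;
        if (bc > bob && r.q < 31) {            /* threshold equals BOB */
            r.q += 1;
        }
    }
    return r;
}
```

In the common case where the P0 picture stays well under 20% of the allocation, the buffer never exceeds the threshold and the step size is left unchanged.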
As described above, since rate control is performed to set constant
amounts of bits in units of N pictures, the coding rate can be
controlled to be a set value. In addition, the coding efficiency
which ensures an optimum step size ratio of the respective pictures
can be obtained, and amount-of-bits allocation can be performed to
set an amount-of-bits ratio which prevents a deterioration in
picture quality. Therefore, the coding efficiency can be
considerably improved. Furthermore, since rate control is performed
to cause the amount of bits to approach a set amount of bits while
a change in quantization step size is limited, the amount of bits
can be controlled to be a set value without causing a deterioration
in picture quality, which occurs when the quantization step size is
excessively changed.
An embodiment of a variable length code decoder will be described
below with reference to FIG. 33.
Referring to FIG. 33, variable length code data is input to an
input terminal 1101 at a predetermined rate. Although this variable
length code data is not specifically limited, it is assumed, in
this case, that this data includes data obtained by transforming
picture data into a coefficient (DCT coefficient) having a small
correlation value using DCT (discrete cosine transformation) in
combination with adaptive prediction, and quantizing the resultant
data, quantization characteristic data, and prediction method
data.
A variable length code decoder 1110 decodes the variable length
code data and outputs the reproduction picture data to an inverse
quantizer 1120 at a pixel rate. In addition, the decoder 1110
outputs the quantization characteristic data and the prediction
method data to the inverse quantizer 1120 and an adaptive
prediction circuit 1150 at proper timings, respectively. The data
inversely quantized by the inverse quantizer 1120 is processed by
an inverse DCT circuit 1130, and difference data relative to the
adaptive prediction signal is restored.
The difference data output from the inverse DCT circuit 1130 is
input to an adder 1140 to be added to the adaptive prediction
signal output from the adaptive prediction circuit 1150 so as to
reproduce picture data. Of the reproduced picture data, picture
data to be referred to by the adaptive prediction circuit 1150 is
stored in a reference picture temporary storage circuit 1160 and is
output to a scan converter 1170 when the next reference picture is
input.
The scan converter 1170 serves to change the order of output
pixels. The scan converter 1170 switches parallel signals, obtained
by scan-converting picture data reproduced/output from the adder
1140 in units of two-dimensional blocks into brightness signals and
color signals in units of horizontal scanning lines, and parallel
signals, constituted by brightness signals and color signals output
from the reference picture temporary storage circuit 1160 in units
of horizontal scanning lines, in the original picture order, and
outputs them to a picture display unit such as a TV monitor.
The variable length code decoder 1110 shown in FIG. 33 will be
described below with reference to FIG. 34.
The variable length code decoder 1110 is mainly constituted by an
input data temporary storage circuit 1210 and a code length
detection/decoded value transform circuit 1220, as shown in FIG.
34. Variable length code data input to an input terminal 1201 at a
predetermined rate is temporarily stored in the input data
temporary storage circuit 1210. The input data temporary storage
circuit 1210 outputs data associated with the data of at least one
coefficient to the code length detection/decoded value transform
circuit 1220, in units of the number of bits (M bits) corresponding
to the maximum code length of the variable length code, from a bit
position designated by code length data 1221 from the circuit 1220,
at a pixel rate.
The code length detection/decoded value transform circuit 1220
detects the code length of the variable length code data associated
with the data of each pixel, and transforms the variable length
code data into a decoded value. The circuit 1220 then transmits the
code length data 1221 indicating the code length of the decoded
variable length code data to the input data temporary storage
circuit 1210 to designate the start bit of the next variable length
code data, and also outputs the decoded data to an output terminal
1222.
The input data temporary storage circuit 1210 has the arrangement
shown in FIG. 35, in which variable length code data input to an
input terminal 1301 at a predetermined rate is temporarily stored
in a buffer memory 1310, and the variable length code data is
output in parallel in units of M bits designated by an address
counter 1320. Code length data indicating the code length of a
variable length code decoded at each clock by the code length
detection/decoded value transform circuit 1220 in FIG. 34 is input
to an input terminal 1331.
An adder 1330 adds data (remainder data) 1341 indicating the
remainder obtained by dividing the accumulated bit count of decoded
variable length codes by M to the code length data input from the
input terminal 1331 at each clock, and outputs the sum data to a
one-clock delay circuit 1340. The one-clock delay circuit 1340
outputs the remainder data 1341 to the adder 1330 and a selector
1350, and at the same time outputs data (quotient data) 1342,
obtained by dividing the sum data output from the adder 1330 by M,
to the address counter 1320 and shift registers 1360 and 1365.
Referring to FIG. 35, M is a power of 2 which is equal to or larger
than the maximum code length. With this arrangement, the remainder
data 1341 and the quotient data 1342 can be simply obtained on the
basis of only outputs from the one-clock delay circuit 1340 without
requiring any special circuits for calculating the data 1341 and
1342.
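The reason a power-of-two M removes the need for a divider can be shown with a short C sketch: the remainder is the low bits of the sum and the quotient is the bit just above them. The names are hypothetical, and M is passed as its base-2 logarithm so that the split is a mask and a shift.

```c
/* Output of one clock of the accumulator: the new bit offset within
   the current M-bit word, and whether a whole word was consumed. */
typedef struct {
    unsigned remainder;
    unsigned quotient;
} BitPointerStep;

/* remainder: current offset (0..M-1), code_length: length of the code
   just decoded (at most M), log2_m: M expressed as a power of two. */
BitPointerStep advance_bit_pointer(unsigned remainder, unsigned code_length,
                                   unsigned log2_m)
{
    unsigned m = 1u << log2_m;
    unsigned sum = remainder + code_length;    /* always below 2*M */
    BitPointerStep step;

    step.remainder = sum & (m - 1u);           /* low bits: sum mod M */
    step.quotient = sum >> log2_m;             /* carry bit: sum div M */
    return step;
}
```

With M = 32, an offset of 30 and a 7-bit code give a new offset of 5 and a quotient of 1, signalling that the read address must advance.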
The address counter 1320 increments the read address of the buffer
memory 1310 every time the quotient data 1342 indicates that M-bit
data is decoded. Loading operations of the shift registers 1360 and
1365 are also controlled by the quotient data 1342 such that M-bit
data are sequentially shifted and loaded every time the read
address of the buffer memory 1310 is changed. Note that the shift
registers 1360 and 1365 are initially set such that data are loaded
from the buffer memory 1310 to the registers 1360 and 1365 in units
of M bits.
The selector 1350 is designed to perform selection at M:1. The
selector 1350 selects M-bit data from the start position of
variable length code data in accordance with the remainder data
1341, and outputs the data to an output terminal 1351.
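The M:1 selection can be modelled in C as extracting an M-bit window from two consecutive M-bit words. This is a sketch under assumptions not stated in the text: M is fixed at 32, bits are consumed most significant bit first, and the function name is hypothetical.

```c
#include <stdint.h>

/* first_word, second_word: contents of the two shift registers, in
   stream order. remainder: bit offset of the next code (0..31).
   Returns the 32 bits starting at that offset. */
uint32_t select_window(uint32_t first_word, uint32_t second_word,
                       unsigned remainder)
{
    if (remainder == 0u) {
        return first_word;                     /* avoid a shift by 32 */
    }
    return (first_word << remainder) | (second_word >> (32u - remainder));
}
```

Holding two words at once is what guarantees that a full M-bit window is available at any offset, even when a code straddles a word boundary.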
FIG. 36 is a block diagram showing an arrangement of the code
length detection/decoded value transform circuit 1220 in FIG. 34.
All the bits of M-bit data 1402 input from the input data temporary
storage circuit 1210 in FIG. 34 to an input terminal 1401 are input
to a sequencer 1410. Upper bits 1403 of the data 1402 are input to
a regular pattern length detector 1420. Lower bits 1404 of the data
1402 are input to a selector 1430. The sequencer 1410 detects fixed
length control data 1411, and at the same time monitors decoded
data 1441 to perform switching and control operations in accordance
with the type of data to be decoded next, e.g., switching of the
types of variable length code decoding tables 1440 constituted by
ROMs, switching of the types of regular patterns detected by the
regular pattern length detector 1420, switching of decoding methods
in an attendant code decoded value transformer 1460, and control of
a selector 1480. The selector 1480 performs switching between the
code length of a fixed length code detected by the sequencer 1410
and the code length of a variable length code.
The regular pattern length detector 1420 detects the number of
successive "0"s or "1"s (0/1 run), the number of alternately
successive "0"s and "1"s in units of a specific number of bits, or
the like in accordance with a design rule for a variable length
code, and outputs detection data 1421 to the variable length code
decoding table 1440, the selector 1430, and an adder 1470.
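The simplest of the regular patterns named above, a run of leading "0"s or "1"s, can be detected as in the following C sketch. It covers only that case, not the alternating patterns, and assumes a 32-bit input word read most significant bit first; the function name is hypothetical.

```c
#include <stdint.h>

/* Counts how many leading bits of word equal bit_value (0 or 1).
   Returns a value between 0 and 32. */
unsigned leading_run_length(uint32_t word, unsigned bit_value)
{
    unsigned run = 0u;
    uint32_t mask = 0x80000000u;               /* start at the top bit */

    while (mask != 0u && (((word & mask) != 0u) == (bit_value != 0u))) {
        run++;
        mask >>= 1;
    }
    return run;
}
```

In hardware the same result would typically come from a priority encoder rather than a loop; the loop form is used here only for clarity.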
The selector 1430 outputs lower bit data 1431, obtained by removing
the bit count detected by the regular pattern length detector 1420
from the upper bits of the data 1402, to a selector 1450, and
outputs data 1432 consisting of upper bits of the data 1431 to the
variable length code decoding table 1440.
The variable length code decoding table 1440 outputs a decoded
value 1441 corresponding to a variable length code, code length
data 1442 following the regular pattern of the variable length
code, code length data 1443 following the variable length code,
data 1444 indicating whether the entire decoded value is expressed by
an attendant code, and the like on the basis of the regular pattern
length data 1421, the partial data 1431 following the regular
pattern, and code type data from the sequencer 1410.
The selector 1450 outputs lower bit data 1451, obtained by removing
the bit count indicated by the code length data 1442 from the upper
bits of the output data 1431 from the selector 1430, to the
attendant code decoded value transformer 1460.
The attendant code decoded value transformer 1460 decodes data
accompanying the variable length code by using data of a bit count
corresponding to the attendant code length data 1443 of the output
data 1451 from the selector 1450 in accordance with a rule
designated by the sequencer 1410.
A selector 1485 normally outputs both the decoded value 1441 and
the output data from the attendant code decoded value transformer
1460 to an output terminal 1486. If the data 1444 indicates that
the entire decoded value is expressed by an attendant code, the
selector 1485 outputs only the output data from the attendant code
decoded value transformer 1460, as a decoded value, to the output
terminal 1486.
The adders 1470 and 1478 calculate the code length of the variable
length code including the accompanying data, and output the data
indicating the code length of the decoded variable length code to
an output terminal 1481 through the selector 1480.
FIG. 37 is a block diagram showing another arrangement of the code
length detection/decoded value transform circuit 1220, which is
suitable for a case wherein a code accompanying a variable length
code is a regular variable length code. The difference between this
circuit and the circuit shown in FIG. 36 will be mainly described
below.
A variable length code decoding table 1540 outputs data 1543
indicating the minimum code length of an attendant code with
respect to a decoded variable length code. An attendant code
decoded value transformer 1560 outputs data indicating the
difference between the code length indicated by the minimum code
length data 1543 and the actual attendant code length to an adder
1574. The adder 1574 calculates the total code length of the
attendant code on the basis of the difference data.
In this embodiment, no buffer is arranged at a decoded value data
output section. If, however, some buffer is used at the decoded
value data output section, decoding can be performed at a clock
rate slightly higher than a pixel rate, and attendant data, such as
control data, other than pixel value data, can be decoded with a
sufficient margin.
According to the above-described embodiment, at any coding rate,
the memory capacity required to convert reproduced picture data
into data with a pixel rate can be greatly reduced as compared with
the prior art, even if variable length codes of picture data are
decoded by clocks corresponding to the pixel rate at a relatively
low speed. That is, the operation speed and the memory capacity can
be reduced at once.
An embodiment of a video coding apparatus will be described below
with reference to FIG. 38.
Referring to FIG. 38, an input video signal 1610 representing a
video constituting a to-be-coded picture is input to a subtracter
1611. The subtracter 1611 calculates the difference between the
input video signal 1610 and a prediction video signal 1612 formed
in the manner described later, thus forming a predictive difference
signal 1613. The predictive difference signal 1613 is subjected to
discrete cosine transformation in a DCT (discrete cosine
transformation) circuit 1614. The DCT coefficient data obtained by
the DCT circuit 1614 is quantized by a quantizer 1615. The signal
quantized by the quantizer 1615 is branched in two directions. One
signal is coded into a variable length code by a variable length
coder 1617 and is converted into data with a predetermined
transmission rate by a buffer 1618. The data is then output to a
transmission path 1619. The other signal sequentially undergoes
processing, in an inverse quantizer 1620 and an IDCT (inverse
discrete cosine transformation) circuit 1621, which is inverse to
the processing performed by the quantizer 1615 and the DCT circuit
1614. The resultant signal is added to the prediction video signal
1612. With this operation, a local decoded signal 1623 is
generated.
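The local decoding loop can be illustrated for a single sample with the C sketch below. It is deliberately simplified: the DCT and IDCT are omitted (treated as identity), and a uniform quantizer with rounding to the nearest level is assumed, neither of which is specified by the description. The function names are hypothetical.

```c
/* Uniform quantizer with symmetric rounding (an assumption). */
int quantize(int coeff, int step)
{
    if (coeff >= 0) {
        return (coeff + step / 2) / step;
    }
    return -((-coeff + step / 2) / step);
}

/* Inverse quantizer: scales the level back by the step size. */
int dequantize(int level, int step)
{
    return level * step;
}

/* One pass through the loop: form the predictive difference, quantize
   it, then rebuild the sample exactly as a decoder would. */
int local_decode_sample(int input, int prediction, int step)
{
    int diff = input - prediction;             /* predictive difference */
    int level = quantize(diff, step);          /* what is transmitted */
    return prediction + dequantize(level, step);
}
```

Because the coder stores this locally decoded value rather than the original input, its reference pictures match those reconstructed by the decoder, so quantization error does not accumulate between the two sides.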
The local decoded signal 1623 generated in the above-described
manner is selectively input to first and second frame memories 1625
and 1626 through a switch circuit 1624. Motion vectors from first
and second motion vector detection circuits 1627 and 1628 are
respectively input to the frame memories 1625 and 1626. The motion
vector detection circuits 1627 and 1628 respectively detect motions
between frames by using the input video signal 1610 and the output
signals from the frame memories 1625 and 1626, and generate data
indicating the directions and amounts of the motions, i.e., motion
vectors. In accordance with these motion vectors, video signals
which have undergone motion compensation are read out from the
frame memories 1625 and 1626.
Output signals 1631 and 1632 from the frame memories 1625 and 1626
are input to an averaging circuit 1630 and a switch circuit 1634.
The averaging circuit 1630 outputs a signal 1633 obtained by
averaging the output signals 1631 and 1632 from the frame memories
1625 and 1626. The switch circuit 1634 selects one of the output
signals 1631 and 1632 from the frame memories 1625 and 1626 or the
output signal from the averaging circuit 1630, and outputs it as
the prediction video signal 1612.
A coding control circuit 1627 switches the quantization step sizes
in the quantizer 1615 and the inverse quantizer 1620 to finer or
coarser step sizes depending on whether the input video signal 1610
represents a first or second type frame, and performs rate control
in accordance with the occupied amount of the buffer 1618. In
addition, the coding control circuit 1627 performs switching
control of the switch circuits 1624 and 1634. Note that switching
of the quantization step sizes is performed in accordance with a
predetermined rule, e.g., every other frame of an input video
signal. The switch circuit 1634 selects a signal having the minimum
predictive difference signal 1613 from the output signals 1631,
1632, and 1633 from the frame memories 1625 and 1626 and the
averaging circuit 1630 under the control of the coding control
circuit 1627.
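The selection among the two frame memory outputs and their average can be sketched in C as follows. The measure of the "minimum predictive difference" is not specified in the description; the sum of absolute differences used here is an assumption, as are the rounding of the average, the 8-bit sample type, and all names.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { PRED_FM1 = 0, PRED_FM2 = 1, PRED_AVERAGE = 2 } PredictionMode;

/* input: block to be coded, fm1/fm2: motion-compensated blocks from
   the two frame memories, n: number of samples. Writes the chosen
   prediction to prediction[] and returns which candidate was chosen. */
PredictionMode select_prediction(const uint8_t *input, const uint8_t *fm1,
                                 const uint8_t *fm2, size_t n,
                                 uint8_t *prediction)
{
    unsigned long cost[3] = {0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        int cand[3] = { fm1[i], fm2[i], (fm1[i] + fm2[i] + 1) / 2 };
        for (int m = 0; m < 3; m++) {
            int d = (int)input[i] - cand[m];
            cost[m] += (unsigned long)(d < 0 ? -d : d);
        }
    }

    int best = 0;                              /* ties keep the lower index */
    for (int m = 1; m < 3; m++) {
        if (cost[m] < cost[best]) {
            best = m;
        }
    }

    for (size_t i = 0; i < n; i++) {
        if (best == PRED_FM1) {
            prediction[i] = fm1[i];
        } else if (best == PRED_FM2) {
            prediction[i] = fm2[i];
        } else {
            prediction[i] = (uint8_t)((fm1[i] + fm2[i] + 1) / 2);
        }
    }
    return (PredictionMode)best;
}
```

The averaged candidate wins when the block to be coded lies between the two references, for example during a fade or smooth motion.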
FIG. 39 is a view illustrating the relationship between to-be-coded
frames and reference frames used to form prediction video signals.
In this embodiment, as shown in FIG. 39, in coding of to-be-coded
frames, frames are classified into first type frames P1, P2, . . .
for which a relatively fine quantization step size is used, and
second type frames P1', P2', . . . for which a relatively coarse
quantization step size is used, and the two past first type frames
relative to each to-be-coded frame are used as reference pictures.
In a conventional method, for example, the first type frame P2 is
used for the second type frame P2' and the first type frame P3, and
the first type frame P3 is used for the second type frame P3' and
the first type frame P4. That is, one past first type frame is used
as a reference frame. In contrast to this, in this embodiment, for
example, the first type frame P1 is used as a reference frame in
addition to the first type frame P2 for the to-be-coded frames P2'
and P3, and the frame P2 is used as a reference frame in addition
to the frame P3 for the to-be-coded frames P3' and P4. That is, one
first type frame is additionally used as a reference frame.
Table 3 shows the relationship between to-be-coded frames
represented by the input video signal 1610 and input/output of the
frame memories 1625 and 1626.
TABLE 3
______________________________________
Coding
______________________________________
To-be-coded picture  P2   P2'  P3'  P3   P4
FM1 output           P0   P2   P2   P2   P2
FM2 output           P1   P1   P1   P3   P3
FM1 input            P2   --   --   --   P4
FM2 input            --   --   P3   --   --
______________________________________
As described above, in this embodiment, the two past first type
frames relative to each to-be-coded frame are used as reference
frames, and either one of the video signals representing these two
frames (the motion-compensated video signals output from the frame
memories 1625 and 1626) or a signal obtained by averaging, i.e.,
interpolating, the two signals using the averaging circuit 1630 is
selected by the switch circuit 1634. That is, the signal yielding
the minimum predictive difference signal 1613 is selected as the
prediction video signal 1612. Therefore, in comparison with the
conventional method, the hardware size is increased because each of
the coding and decoding apparatuses requires two frame memories.
However, since two first type frames are used as reference frames
to form the prediction video signal 1612, the prediction precision
is increased, and the coding efficiency is improved. In addition,
since future frames are not used as reference frames, an increase
in delay time in coding/decoding operations can be prevented.
FIG. 40 is a view illustrating the relationship between to-be-coded
frames and reference frames used to form prediction video signals
according to another embodiment. In this embodiment, a past frame
is used as an additional reference frame for the second type frames
P1', P2', P3', . . . of another conventional method.
More specifically, in another conventional method, for example, a
frame P2 is used for the frame P2', and a frame P3 is used for the
frame P3'. That is, one past first type frame is used as a
reference frame. In contrast to this, in this embodiment, for
example, a frame P1 is used as a reference frame in addition to the
frame P2 for the to-be-coded frame P2', and the frame P2 is used as
a reference frame in addition to the frame P3 for the to-be-coded
frame P3'. That is, the past first type frame second nearest to the
to-be-coded frame is added as a reference frame.
Table 4 shows the relationship between to-be-coded frames
represented by the input video signal 1610 and input/output of the
frame memories 1625 and 1626 in this embodiment.
TABLE 4
______________________________________
Coding
______________________________________
To-be-coded picture   P2    P2'   P3'   P3    P4
FM1 output            P1'   P2    P2    P2    P3'
FM2 output            P1    P1    P2'   P3    P3
FM1 input             P2    --    --    P3'   P4
FM2 input             --    P2'   P3    --    --
______________________________________
As described above, in this embodiment, since the maximum number
(=2) of reference frames is not increased, the hardware size is not
increased. In addition, when a to-be-coded frame is a second type
frame, since two first type frames can be used as reference
pictures to form the prediction video signal 1612, the precision of
the prediction video signal 1612 for a second type frame is
increased, and the coding efficiency is improved.
In the above-described embodiments, first and second type frames
are arranged alternately in terms of time. However, the present
invention can be applied to a case wherein first and second type
frames are arranged in a different manner in terms of time. In
addition, in each embodiment described above, two past frames are
used as reference frames. However, the present invention can also be
applied to a case wherein a larger number of reference frames are
used.
As has been described above, according to the present invention,
when video signals are to be coded by using an inter-frame
predictive coding scheme, especially the motion compensation
inter-frame predictive coding scheme, the coding efficiency with
respect to moving pictures having low intra-frame correlation and
high inter-frame correlation, which are generally considered to be
difficult to code, can be improved without increasing the delay
time in coding/decoding operations, thereby realizing video coding
suitable for fields of application requiring a short delay time,
e.g., real-time communication.
Additional advantages and modifications will readily occur to those
skilled in the art. Therefore, the invention in its broader aspects
is not limited to the specific details and representative devices
shown and described herein. Accordingly, various modifications may
be made without departing from the spirit or scope of the general
inventive concept as defined by the appended claims and their
equivalents.
* * * * *