U.S. patent application number 09/482704 was filed with the patent office on 2000-01-13 and published on 2002-07-04 for high speed variable length decording processor.
Invention is credited to Yagi, Osamu.
Application Number | 20020084922 09/482704 |
Document ID | / |
Family ID | 11666494 |
Filed Date | 2000-01-13 |
United States Patent
Application |
20020084922 |
Kind Code |
A1 |
Yagi, Osamu |
July 4, 2002 |
HIGH SPEED VARIABLE LENGTH DECORDING PROCESSOR
Abstract
The invention provides a general-purpose processor capable of
high-speed variable-length decoding. The general-purpose processor
is provided with a video data register dedicated to storing the
variable-length code, which can hold data longer than the maximum
length of the variable-length code to be decoded; a data counter
register dedicated to storing the length of the data in the video
data register that has not yet been subjected to variable-length
decoding; and a pointer register dedicated to storing the address
of the variable-length code to be read out next from a
variable-length code bit stream stored in cache memory. An ALU for
general-purpose operation decodes the variable-length code stored
in the video data register by controlling the video data register,
the data counter register, and the pointer register.
Inventors: |
Yagi, Osamu; (Tokyo,
JP) |
Correspondence
Address: |
SONNENSCHEIN NATH AND ROSENTHAL
P.O. BOX 061080
WACKER DRIVE STATION - SEARS TOWNER
CHICAGO
IL
60606-1080
US
|
Family ID: |
11666494 |
Appl. No.: |
09/482704 |
Filed: |
January 13, 2000 |
Current U.S.
Class: |
341/67 ;
375/E7.144; 375/E7.226; 375/E7.231 |
Current CPC
Class: |
H04N 19/60 20141101;
H04N 19/91 20141101; H03M 7/40 20130101 |
Class at
Publication: |
341/67 |
International
Class: |
H03M 007/40 |
Foreign Application Data
Date |
Code |
Application Number |
Jan 14, 1999 |
JP |
P11-007462 |
Claims
What is claimed is:
1. An operation unit for general-purpose operation comprising:
general-purpose storage means for storing the data for said
general-purpose operation; variable-length code storage means for
exclusively storing a variable-length code capable of storing the
data having the length equal to or longer than the maximum length
of said variable-length code to be subjected to variable-length
decoding; length storage means for exclusively storing the length
of the data not subjected to variable-length decoding out of the
data stored in said variable-length code storage means; position
storage means for exclusively storing the position of said
variable-length code to be read out next in a bit stream of said
variable-length code; and operation means for said general-purpose
operation for variable-length decoding said variable-length code
stored in said variable-length code storage means by controlling
said variable-length code storage means, length storage means, and
position storage means.
2. The operation unit as claimed in claim 1, wherein the data
length stored in said variable-length code storage means is longer
than the data length stored in said general-purpose storage
means.
3. The operation unit as claimed in claim 1, wherein the said
operation means is provided with a barrel shifter comprising path
transistors and an operation circuit comprising path transistors
for operating OR in addition to said circuit for carrying out
general-purpose operation.
Description
BACKGROUND OF THE INVENTION
[0001] 1. Field of the Invention
[0002] This invention relates to an operation unit, and more
particularly to an operation unit which is capable of high-speed
variable-length decoding in, for example, a general-purpose
processor.
[0003] 2. Description of Related Art
[0004] FIG. 4 shows an exemplary structure of a conventional VLD
(variable-length decoding) circuit which utilizes a general-purpose
processor.
[0005] For example, a transport stream based on MPEG (Moving
Picture Experts Group) 2 system is supplied to a DMUX
(demultiplexer) 3, and the DMUX 3 separates the elementary streams
of video and audio from the transport stream. The video elementary
stream obtained in the DMUX 3 is supplied to a main memory 2
through a bus 4 and stored therein.
[0006] The main memory 2 stores the video elementary stream
supplied from the DMUX 3 as described herein above and has a stored
program for operating a general-purpose processor 1 as a VLD
circuit, and the general-purpose processor 1 decodes a
variable-length code stored in the main memory 2 by operating the
program stored in the main memory 2.
[0007] In detail, the general-purpose processor 1 comprises an
instruction fetch section 11, an instruction decoder 12, an ALU
(Arithmetic Logic Unit) 13, a register group 14, a cache memory 15,
and an internal bus 16, and the program stored in the main memory 2
is supplied to the cache memory 15 through the bus 4 as needed
and stored therein.
[0008] The instruction fetch section 11 suitably fetches a command
(instruction) which constitutes the program stored in the cache
memory 15 and supplies it to the instruction decoder 12. The
instruction decoder 12 decodes a command supplied from the
instruction fetch section 11 and supplies the decoded result to the
ALU 13. The ALU 13 performs various general processes as required
according to the decoded result supplied from the instruction
decoder 12 while reading and writing the data from and in the
register group 14.
[0009] In detail, the main memory 2 has the stored program used for
variable-length decoding, and the ALU 13 performs processes
required for variable-length decoding.
[0010] In detail, the video elementary stream stored in the main
memory 2 is transferred as needed to the cache memory 15 through
the bus 4 and stored therein. The video elementary stream stored in
the cache memory 15 is transferred as needed to the register group
14 through the internal bus 16 and stored therein, and is subjected
to variable-length decoding in the ALU 13.
[0011] The register group 14 comprises, for example, a plurality of
32-bit registers. In the ALU 13, one of these registers is assigned
to the buffer Bfr (video stream data buffer), which stores the
variable-length code to be subjected to variable-length decoding
out of the variable-length codes which are components of the video
elementary stream (therefore, the buffer Bfr is 32 bits). The data
stored in the buffer Bfr is subjected to general-purpose operation
corresponding to functions such as show_bits( ), get_bits( ), and
flush_buffer( ) to perform variable-length decoding.
[0012] The function show_bits(int N) is for observing N bits from
MSB (Most Significant Bit) of the buffer Bfr, and described as
shown herein under, for example, in C language.
    unsigned int show_bits(int N)
    {
        return Bfr >> (32-N);
    }
[0013] According to the function show_bits(int N), the content of
the buffer Bfr, that is, one register of the register group 14 as
shown in FIG. 5A, is copied to another register (temporary register)
Temp of the register group 14 and shifted to the right by 32-N
bits (Bfr>>(32-N)). As the result, the upper N bits of the buffer
Bfr are placed in the lower N bits of the register Temp as shown in
FIG. 5C, and this is returned as the functional value of the
function show_bits(int N) (return Bfr>>(32-N)). In this case,
the stored value of the buffer Bfr is not changed.
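As a minimal sketch of the operation described above, the following C fragment models the buffer Bfr as a 32-bit variable. The fixed-width type uint32_t and the range check on N are assumptions added for illustration (in C, a shift by 32 on a 32-bit value is undefined, so N = 0 is excluded); they are not part of the conventional code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the 32-bit general-purpose register assigned to Bfr. */
uint32_t Bfr;

/* Return the upper N bits of Bfr, right-aligned. Bfr itself is not modified. */
uint32_t show_bits(int N)
{
    assert(N >= 1 && N <= 32);   /* assumption: keeps the shift amount in 0..31 */
    return Bfr >> (32 - N);
}
```

For example, with Bfr holding 0xA5000000, show_bits(4) yields 0xA and show_bits(8) yields 0xA5, while Bfr keeps its value.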
[0014] The function flush_buffer(int N) discards the higher N bits
of the buffer Bfr observed by the function show_bits( ) or
get_bits( ) and, in the case that the number of bits remaining in
the buffer Bfr after discarding is equal to or smaller than the
maximum length of the variable-length code, reads the subsequent
data from the cache memory 15 to supplement the buffer. For
example, it is described in C language as shown herein under:
    void flush_buffer(int N)
    {
        Bfr <<= N;
        Incnt -= N;
        if (Incnt <= 24) {
            do {
                Bfr |= *Rdptr++ << (24-Incnt);
                Incnt += 8;
            } while (Incnt <= 24);
        } else { . . . }
    }
[0015] The variable Rdptr is a pointer to the address (position) of
the variable-length code to be read out next in the video elementary
stream stored in the cache memory 15, and one of the registers which
are components of the register group 14 is assigned to the variable
Rdptr. The variable Incnt is a variable for storing the length of
the data which has not been subjected to variable-length decoding
out of the data stored in the buffer Bfr, and one of the registers
which are components of the register group 14 is assigned to the
variable Incnt.
[0016] According to the function flush_buffer(int N), as shown in
FIG. 6A, when variable-length decoding of the higher N bits of the
buffer Bfr is completed, the stored value of the buffer Bfr is
bit-shifted to the left by N bits (Bfr<<=N), and as the
result the stored value of the buffer Bfr is changed as shown in
FIG. 6B.
[0017] The variable Incnt is decremented by N, and the variable
Incnt is changed so as to indicate the length of the data (portion
described as Next_data in FIG. 6B and FIG. 6D) which has not been
subjected to variable-length decoding after the stored value of the
buffer Bfr is left-shifted by N bits as shown in FIG. 6B.
[0018] Furthermore, in the case that the variable Incnt after
changing is equal to or shorter than the maximum length (herein,
for example, 24 bits) of the variable-length code (if
(Incnt<=24)), the stuffing process is performed, that is, the data
subsequent to the data which has not been subjected to
variable-length decoding stored in the buffer Bfr is read out from
the cache memory 15 and supplemented until the variable Incnt
exceeds 24 bits, namely the maximum length of the variable-length
code (while (Incnt<=24)).
[0019] In detail, the data (which is a component of the video
elementary stream) stored at the address pointed to by the pointer
Rdptr is read out from the cache memory 15, and the pointer Rdptr
is then incremented by 1 (*Rdptr++). Herein it is assumed that, for
example, 8-bit data is stored at the address of the cache memory 15
which is pointed to by the pointer Rdptr, so that 8-bit data is
read out from the cache memory 15.
[0020] The 8-bit data nd (represented by *Rdptr) read out from the
cache memory 15 is stored in the lower 8 bits of one of the
registers which are components of the register group 14 through the
internal bus 16. The ALU 13 reads out the stored value of the
register where the 8-bit data nd is stored, and shifts it to the
left by 24-Incnt bits as shown in FIG. 6C
(*Rdptr++<<(24-Incnt)).
[0021] Furthermore, the ALU 13 operates OR (referred to as bit OR
suitably) for each bit of the bit shift result (FIG. 6C) and the
stored value (FIG. 6B) of the buffer Bfr, and stores the operation
result in the buffer Bfr (Bfr |= *Rdptr++<<(24-Incnt))
as shown in FIG. 6D. As the result, the buffer Bfr holds the data
(FIG. 6D) formed by supplementing, with the subsequent 8-bit data,
the data (FIG. 6B) which remained after the higher N bits of the
original stored value (FIG. 6A) were discarded and which has not
been subjected to variable-length decoding.
[0022] The ALU 13 increments the variable Incnt which represents
the length of the data which has not been subjected yet to
variable-length decoding out of the data stored in the buffer Bfr
by 8 namely the data quantity of the data nd (Incnt+=8), and the
above-mentioned stuffing process (process for supplementing the
buffer Bfr with the data in 8-bit units) is repeated until the
variable Incnt exceeds 24 bits namely the maximum length of the
variable-length code (while (Incnt<=24)).
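The stuffing process of paragraphs [0016] to [0022] can be traced with a small self-contained C model. This is a sketch under stated assumptions: Bfr is modeled as a uint32_t, the bit stream as an array of bytes, the branch elided as { . . . } in the listing above is not modeled, and the caller is assumed to guarantee that enough stream bytes remain. The precondition check on N is an addition for illustration.

```c
#include <assert.h>
#include <stdint.h>

uint32_t Bfr;            /* undecoded bits, left-aligned (MSB first)    */
int Incnt;               /* number of valid, undecoded bits held in Bfr */
const uint8_t *Rdptr;    /* next unread byte of the bit stream          */

/* Discard the upper N bits of Bfr, then stuff bytes until Incnt exceeds 24. */
void flush_buffer(int N)
{
    assert(N >= 1 && N <= 24 && N <= Incnt);  /* assumption: 24 = max code length */
    Bfr <<= N;                                /* FIG. 6A -> FIG. 6B */
    Incnt -= N;
    while (Incnt <= 24) {
        /* Place the next byte directly below the remaining valid bits. */
        Bfr |= (uint32_t)*Rdptr++ << (24 - Incnt);   /* FIG. 6C, FIG. 6D */
        Incnt += 8;
    }
}
```

Starting from Bfr = 0x12345678 and Incnt = 32, with Rdptr pointing at the bytes 0x9A, 0xBC, a call flush_buffer(12) leaves Bfr = 0x456789A0 and Incnt = 28 after a single stuffing step.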
[0023] The function get_bits(int N) performs the process which
corresponds to both functions show_bits( ) and flush_buffer( ), and
is described as shown herein under in, for example, C language:
    unsigned int get_bits(int N)
    {
        unsigned int Val;
        Val = show_bits(N);
        flush_buffer(N);
        return Val;
    }
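To illustrate how the three functions cooperate during variable-length decoding, the sketch below decodes a toy prefix code. The code table (0 for 'A', 10 for 'B', 11 for 'C') and the helpers init_buffer( ) and decode_symbol( ) are hypothetical and chosen only for illustration; they are not the MPEG-2 tables and do not appear in the original listing.

```c
#include <assert.h>
#include <stdint.h>

uint32_t Bfr;            /* undecoded bits, left-aligned */
int Incnt;               /* number of valid bits in Bfr  */
const uint8_t *Rdptr;    /* next unread stream byte      */

uint32_t show_bits(int N) { return Bfr >> (32 - N); }

void flush_buffer(int N)
{
    assert(N >= 1 && N <= Incnt);
    Bfr <<= N;
    Incnt -= N;
    while (Incnt <= 24) {                 /* stuffing, one byte at a time */
        Bfr |= (uint32_t)*Rdptr++ << (24 - Incnt);
        Incnt += 8;
    }
}

uint32_t get_bits(int N)
{
    uint32_t Val = show_bits(N);
    flush_buffer(N);
    return Val;
}

/* Hypothetical helper: load the first four stream bytes so Bfr starts full. */
void init_buffer(const uint8_t *stream)
{
    Bfr = ((uint32_t)stream[0] << 24) | ((uint32_t)stream[1] << 16) |
          ((uint32_t)stream[2] << 8) | (uint32_t)stream[3];
    Incnt = 32;
    Rdptr = stream + 4;
}

/* Toy prefix code (hypothetical): 0 -> 'A', 10 -> 'B', 11 -> 'C'. */
char decode_symbol(void)
{
    if (show_bits(1) == 0) {      /* peek one bit without consuming it */
        flush_buffer(1);
        return 'A';
    }
    return get_bits(2) == 2 ? 'B' : 'C';
}
```

With a stream whose first byte is 0x5A (bits 0 10 11 0 10), successive calls to decode_symbol( ) return 'A', 'B', 'C', 'A', 'B'. The peek-then-flush pattern is what makes the right shift in show_bits( ) and the stuffing loop in flush_buffer( ) the hot path of the decoder.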
[0024] In the above-mentioned conventional VLD circuit, because the
register group 14 comprises general-purpose registers which the ALU
13 for general-purpose operation uses for operation, it is
difficult to perform high-speed VLD processing.
SUMMARY OF THE INVENTION
[0025] The present invention was accomplished to solve this problem.
It is the object of the present invention to provide an operation
unit which is capable of high-speed VLD processing in a
general-purpose processor.
[0026] An operation unit for general-purpose operation comprises a
general-purpose storage means for storing the data for the
general-purpose operation, a variable-length code storage means for
exclusively storing a variable-length code capable of storing the
data having the length equal to or longer than the maximum length
of the variable-length code to be subjected to variable-length
decoding, a length storage means for exclusively storing the length
of the data not subjected to variable-length decoding out of the
data stored in the variable-length code storage means, a position
storage means for exclusively storing the position of the
variable-length code to be read out next in a bit stream of the
variable-length code, and an operation means for the
general-purpose operation for variable-length decoding the
variable-length code stored in the variable-length code storage
means by controlling the variable-length code storage means, length
storage means, and position storage means.
[0027] In the operation unit having the above-mentioned structure,
the general-purpose storage means stores the data for performing
general-purpose operation. The variable-length code storage means
serves exclusively for storing a variable-length code and is
capable of storing the data having the length equal to or longer
than the maximum length of the variable-length code to be subjected
to variable-length decoding, and the length storage means serves
for storing the length of the data which has not been subjected to
variable-length decoding out of the data stored in the
variable-length code storage means. The position storage means
serves exclusively for storing the position of the variable-length
code to be read out next out of codes of a variable-length code bit
stream, and the operation means serves for performing
general-purpose operation and variable-length decodes the
variable-length code stored in the variable-length code storage
means by controlling the variable-length code storage means, length
storage means, and position storage means.
BRIEF DESCRIPTION OF THE DRAWINGS
[0028] FIG. 1 is a block diagram for illustrating an exemplary
structure of an embodiment of a VLD circuit to which the present
invention is applied.
[0029] FIG. 2 is a circuit diagram for illustrating an exemplary
structure of a barrel shifter comprising path transistors mounted
on the ALU 13 shown in FIG. 1.
[0030] FIG. 3A and FIG. 3B are circuit diagrams for illustrating
exemplary structures of circuits for operating OR mounted on the
ALU 13 shown in FIG. 1.
[0031] FIG. 4 is a block diagram for illustrating an exemplary
structure of a conventional VLD circuit.
[0032] FIG. 5A to FIG. 5C are diagrams for describing the process
performed by the function show_bits( ).
[0033] FIG. 6A to FIG. 6D are diagrams for describing the process
performed by the function flush_buffer( ).
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0034] FIG. 1 is an exemplary structure of an embodiment of a VLD
circuit to which the present invention is applied. The same
components as shown in FIG. 4 are given the same reference numerals
as in FIG. 4, and their description is omitted hereinafter. In
detail, the VLD circuit shown in FIG. 1 comprises a general-purpose
processor 1, a main memory 2, a DMUX 3, and a bus 4, and has the
same structure as shown in FIG. 4 basically.
[0035] However, the general-purpose processor 1 is provided with a
video data register 21 for storing predetermined exclusive data
(variable-length code storage means), a data counter 22 (length
storage means), and a pointer register 23 (position storage means)
in addition to the register group 14 (general-purpose storage
means) for storing the data for general-purpose operation.
[0036] The video data register 21 is to be assigned to the
above-mentioned buffer Bfr, and is structured so as to be a register
for exclusively storing the variable-length code read out from the
cache memory which is to be subjected to variable-length decoding
by the ALU 13. The video data register 21 stores more data than each
register which is a component of the general-purpose register group
14, namely data of more than 32 bits, for example 64-bit
data.
[0037] The data counter register 22 is to be assigned to the
above-mentioned variable Incnt, and structured so as to be a
register for exclusively storing the value which the variable Incnt
has to hold.
[0038] The pointer register 23 is to be assigned to the
above-mentioned pointer Rdptr, and is structured to be a register
for exclusively storing the address (position) of the cache memory
which the pointer Rdptr points to.
[0039] In the ALU 13 (operation means), the process corresponding
to the above-mentioned functions show_bits( ), get_bits( ), and
flush_buffer( ) is performed while the exclusive video data
register 21 (Bfr), data counter register 22 (Incnt), and
pointer register 23 (Rdptr) are being controlled, and the
variable-length code stored in the video data register 21 is
subjected to variable-length decoding.
[0040] Because the video data register 21 can store data of
more bits, namely 64 bits, than each register of the
general-purpose register group 14, the number of repetitions of the
stuffing process described herein above is reduced during the
processing of the function flush_buffer( ), and as the result
high-speed variable-length decoding process can be realized.
[0041] In detail, in the case that one of the registers of the
register group 14 is assigned to the buffer Bfr, the data length of
the buffer Bfr is equal to the data length of each register of the
register group 14, namely 32 bits. It is assumed that
variable-length codes which have not been subjected to
variable-length decoding fill the whole buffer Bfr (accordingly, 32
bits of variable-length code), that the buffer is subjected to
variable-length decoding, and that variable-length decoding of the
higher N bits is completed. In this case, the data which has not
been subjected to variable-length decoding has 32-N bits. Because N
ranges from 1 bit to 24 bits, namely the maximum length of the
variable-length code, this number of bits is likely to be equal to
or less than 24 bits (it is so whenever N is 8 or more), and as the
result the stuffing process (data transfer from the cache memory to
the buffer Bfr) is repeated many times in the function
flush_buffer( ). Because the stuffing process is repeated many
times, it takes a long time to complete the whole process of
variable-length decoding.
[0042] On the other hand, in the case that the large-sized video
data register 21 for storing the variable-length code is provided
in addition to the general-purpose register group 14, the data
length of the buffer Bfr is equal to the data length of the video
data register 21, namely 64 bits. Similarly to the above-mentioned
case, it is assumed that variable-length codes which have not been
subjected to variable-length decoding fill the whole buffer Bfr
(accordingly, 64 bits of variable-length code), that the buffer is
subjected to variable-length decoding, and that variable-length
decoding of the higher N bits is completed. In this case, the data
which has not been subjected to variable-length decoding has 64-N
bits. The maximum value of N is the maximum length of the
variable-length code, namely 24 bits, and 64-N is therefore at
least 40, which is larger than 24. As the result, the number of
repetitions of the stuffing process is reduced, and high-speed
variable-length decoding process is realized in a short time.
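The comparison between the 32-bit and 64-bit buffers can be checked numerically by tracking only the counter Incnt. The sketch below counts how many flush_buffer( ) calls have to enter the stuffing branch for a given sequence of code lengths. It rests on an assumption the text does not state: once stuffing is triggered (Incnt <= 24), the buffer is topped up in 8-bit units to its full capacity. For a 32-bit buffer this coincides with the conventional loop (until Incnt exceeds 24). The function name refill_events( ) is hypothetical.

```c
#include <assert.h>

/* Count the flush calls that trigger stuffing for a buffer of `width` bits.
 * lens[i] is the length in bits of the i-th decoded code (1..24). */
unsigned refill_events(int width, const int *lens, int n)
{
    int incnt = width;              /* buffer starts completely full */
    unsigned events = 0;

    for (int i = 0; i < n; i++) {
        assert(lens[i] >= 1 && lens[i] <= 24);
        incnt -= lens[i];           /* discard the decoded bits */
        if (incnt <= 24) {          /* 24 = maximum code length */
            events++;
            while (incnt <= width - 8)   /* assumption: refill to capacity */
                incnt += 8;
        }
    }
    return events;
}
```

Under these assumptions, sixteen consecutive 8-bit codes trigger stuffing on every call with a 32-bit buffer (16 events) but on only 3 calls with a 64-bit buffer. The total number of bytes read is the same in both cases; what shrinks is how often decoding is interrupted to refill.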
[0043] Next, by providing the exclusive video data register 21
serving as the buffer Bfr, it is made possible to provide an
exclusive command (function) for performing high-speed processing,
of the kind employed in a so-called embedded microcomputer.
[0044] To take advantage of this, a function vld_bit_ext(int N),
which can extract the higher N bits of the buffer Bfr without
copying the content of the buffer Bfr into the temporary register
Temp (FIG. 5), is provided instead of the above-mentioned function
show_bits(int N). According to the function vld_bit_ext(int N),
because it is not required to copy the content of the buffer Bfr,
the higher N bits of the buffer Bfr can be extracted in one step,
and as the result higher-speed variable-length decoding process is
realized.
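The result that vld_bit_ext(int N) is described as producing can be modeled in C as follows. This is only a behavioral sketch: the text does not define the instruction's encoding or the permitted range of N, so the 64-bit variable VideoData (standing in for the video data register 21), the 32-bit return type, and the limit of 1 to 32 bits are assumptions. A C model cannot express the saving itself, which lies in the hardware not needing a separate copy to a temporary register.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the dedicated 64-bit video data register 21 (buffer Bfr). */
uint64_t VideoData;

/* Behavioral model: return the upper N bits of the register, right-aligned.
 * The register content is left unchanged. */
uint32_t vld_bit_ext(int N)
{
    assert(N >= 1 && N <= 32);   /* assumption: result fits a 32-bit register */
    return (uint32_t)(VideoData >> (64 - N));
}
```

For example, with VideoData holding 0xA5C3000000000000, vld_bit_ext(4) gives 0xA and vld_bit_ext(12) gives 0xA5C.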
[0045] Next, in the case of the VLD circuit shown in FIG. 1, the
number of repetitions of the stuffing process in the function
flush_buffer( ) is reduced, but some repetitions of the stuffing
process are still required. In the stuffing process, the data
pointed to by the pointer Rdptr (*Rdptr) is bit-shifted as
described herein above, and bit unit OR of the bit shift result and
the stored value of the buffer Bfr is operated. It takes a long
time to complete the bit shifting and the operation of the bit unit
OR in the case that a general-purpose ALU 13 is used.
[0046] To solve the above-mentioned problem, in the present
invention, for example, a barrel shifter having the same structure
as the barrel shifter comprising path transistors as shown in FIG.
2 is incorporated in the ALU 13.
[0047] In FIG. 2, a latticed circuit formed by connecting sources
or drains of N-channel FETs (field effect transistors) to drains or
sources of P-channel FETs respectively is provided, and a
predetermined bit of the input data input[ ] to be bit-shifted is
supplied to the connection point of the source of an N-channel FET
and the drain of a P-channel FET.
[0048] A predetermined bit of the four bit output data result[ ]
obtained as the result of bit shifting of the input data input[ ]
is generated from the connection point of the drain of the
N-channel FET and the source of the P-channel FET of the latticed
circuit.
[0049] A predetermined bit of the four-bit shift quantity data
shift[ ] for indicating the number of bits to be shifted is supplied
to the gates of the N-channel FETs and P-channel FETs.
[0050] In the barrel shifter shown in FIG. 2, 4-bit output data
result[3] to result[0] is generated as the bit shift result from
7-bit input data input[6] to input[0] (input[i] represents the
(i+1)-th bit from LSB (Least Significant Bit)), and the number of
bits to be shifted is determined based on the shift quantity data
shift[3] to shift[0].
[0051] In detail, in the case that the first bit shift[0] is 1 and
other bits are 0 in the shift quantity data shift[ ], the first bit
to the fourth bit input[3:0] of the input data input[ ] are
generated as the output data result[ ]. Accordingly, no bit
shifting is performed in this case.
[0052] In the case that the second bit shift[1] (second bit from
LSB) out of the shift quantity data shift[ ] is 1 and other bits
are 0, the second bit to the fifth bit input[4:1] of the input data
input[ ] are generated as the output data result[ ]. Accordingly, 1
bit right shifting is performed in this case.
[0053] In the case that the third bit shift[2] (third bit from LSB)
out of the shift quantity data shift[ ] is 1 and other bits are 0,
the third bit to the sixth bit input[5:2] of the input data input[
] are generated as the output data result[ ]. Accordingly, 2 bit
right shifting is performed in this case.
[0054] In the case that the fourth bit shift[3] (fourth bit from
LSB) out of the shift quantity data shift[ ] is 1 and other bits
are 0, the fourth bit to the seventh bit input[6:3] of the input
data input[ ] are generated as the output data result[ ].
Accordingly, 3 bit right shifting is performed in this case.
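The selection behavior of the barrel shifter described in paragraphs [0050] to [0054] can be summarized in a short software model. This is a sketch of the logical function only, not of the pass-transistor circuit: the shift quantity is treated as one-hot, so that setting bit k of shift[3:0] selects input[k+3:k], that is, a right shift by k bits. The function name barrel_shift( ) and the handling of a non-one-hot shift value (rejected by an assertion) are assumptions, since the text does not describe that case.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the 7-bit-in, 4-bit-out barrel shifter of FIG. 2.
 * input: input[6:0] in bits 6..0.
 * shift: one-hot shift quantity in bits 3..0; bit k selects a right shift by k. */
uint8_t barrel_shift(uint8_t input, uint8_t shift)
{
    assert((input & 0x80u) == 0);          /* only input[6:0] exist */
    for (int k = 0; k < 4; k++) {
        if (shift == (1u << k))            /* exactly bit k is set */
            return (uint8_t)((input >> k) & 0x0Fu);   /* result[3:0] */
    }
    assert(0 && "shift must have exactly one of shift[3:0] set");
    return 0;
}
```

For input[6:0] = 1011011 (0x5B), the four valid shift values 0001, 0010, 0100, and 1000 give result[3:0] = 0xB, 0xD, 0x6, and 0xB respectively. Because each output bit is connected to its source bit through a single selected transistor path, the hardware produces any of these shifts in one step rather than iterating bit by bit.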
[0055] According to the barrel shifter comprising path transistors
as described herein above, high-speed bit shifting is realized.
[0056] Next, in the ALU 13, a general-purpose circuit comprising 6
FETs, for example, as shown in FIG. 3A may be incorporated as the
circuit for operating the bit unit OR. However, because operating
the bit unit OR only requires generating an output which indicates
that either of the 2 input signals in1 and in2 is an H (High) level
signal, a circuit comprising path transistors as shown in FIG. 3B
may be incorporated in the ALU 13 as the circuit for operating the
bit unit OR. In this case, the circuit for operating the bit unit
OR can comprise two FETs, which is fewer than in the case shown in
FIG. 3A, and the OR can be obtained at even higher speed.
[0057] It is made possible to perform bit shifting and bit unit OR
operation in one step by incorporating (mounting) a barrel shifter
comprising path transistors and a circuit for operating OR as
described herein above in the ALU 13. Herein, a function for
performing bit shifting and bit unit OR operation in one step is
defined as vld_1s_bor(int N). By using the function
vld_1s_bor(int N), it is made possible to perform the stuffing
process of the function flush_buffer( ) in a short cycle. As the
result, the execution cycle required for variable-length decoding
process is shortened further.
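As a rough illustration of how a fused shift-and-OR step fits into the stuffing loop with the three dedicated registers, the following C model uses a 64-bit buffer. Everything beyond the source text is an assumption and is named accordingly: the variables VideoData, DataCount, and Pointer stand in for registers 21, 22, and 23; shift_or_byte( ) is a hypothetical stand-in for the one-step operation and is not the definition of the function named in the text; and the refill policy (top up to capacity once the count is 24 or less) is not specified by the source.

```c
#include <assert.h>
#include <stdint.h>

uint64_t VideoData;       /* video data register 21 (Bfr), left-aligned */
int DataCount;            /* data counter register 22 (Incnt)           */
const uint8_t *Pointer;   /* pointer register 23 (Rdptr)                */

/* Hypothetical fused step: fetch one byte, shift it left, OR it into the
 * buffer. In the hardware described, shift and OR complete in one step. */
void shift_or_byte(int shift)
{
    VideoData |= (uint64_t)*Pointer++ << shift;
}

/* 64-bit counterpart of flush_buffer(), built on the fused step. */
void flush_buffer64(int N)
{
    assert(N >= 1 && N <= 24 && N <= DataCount);
    VideoData <<= N;
    DataCount -= N;
    if (DataCount <= 24) {                  /* 24 = maximum code length */
        while (DataCount <= 56) {           /* assumption: refill to capacity */
            shift_or_byte(56 - DataCount);  /* byte lands just below valid bits */
            DataCount += 8;
        }
    }
}
```

Starting with a full register, two calls flush_buffer64(24) illustrate the effect: the first leaves 40 valid bits and needs no stuffing, and only the second, which leaves 16 bits, runs the stuffing loop.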
[0058] Because the general-purpose processor 1 shown in FIG. 1
comprises the general-purpose processor, the video data register
21, the data counter 22, and the pointer register 23, which are
exclusively used for variable-length decoding process, and the ALU
13 comprising the barrel shifter having path transistors or the
circuit for operating OR as shown in FIG. 2 or FIG. 3 respectively,
the general-purpose processor 1 can be used not only for high-speed
variable-length decoding process as described herein above but also
for general-purpose operation as in the conventional use (a
processor excellent not only in general-purpose operation but also
in media processing (herein, variable-length decoding process) is
called a media processor).
[0059] The size of the video data register 21 is 64 bits in the
present embodiment, but the size of the video data register is by no
means limited to 64 bits. Basically, the larger the size is, the
more effectively the number of repetitions of the stuffing process
is reduced.
* * * * *