U.S. patent application number 13/162075 was filed with the patent office on 2011-06-16 and published on 2012-12-13 for video decoder with transposing vector processor and methods for use therewith.
This patent application is currently assigned to VIXS SYSTEMS, INC. Invention is credited to Edward Hong, Indra Laksono, Dong Liu, Hongri (Grace) Wang, Kai Yang.
Application Number | 20120314775 13/162075 |
Document ID | / |
Family ID | 47293189 |
Filed Date | 2011-06-16 |
United States Patent Application | 20120314775 |
Kind Code | A1 |
Laksono; Indra ; et al. | December 13, 2012 |
VIDEO DECODER WITH TRANSPOSING VECTOR PROCESSOR AND METHODS FOR USE
THEREWITH
Abstract
A multi-format video decoder includes an entropy decoding device
that generates entropy decoded (EDC) data from an encoded video
signal. A multi-format video decoding device includes a memory
module that stores a plurality of operational instructions
including at least one vector read instruction that includes vector
read orientation data. A plurality of vector processor units
generate a decoded video signal from the EDC data, wherein at least
one of the plurality of vector processor units includes a matrix memory
that stores matrix data corresponding to a plurality of rows and
columns and that generates vector read data in a first read
orientation when the vector read orientation data has a first value
and that generates vector read data in a second read orientation
when the vector read orientation data has a second value.
Inventors: | Laksono; Indra; (Richmond Hill, CA) ; Liu; Dong; (Markham, CA) ; Wang; Hongri (Grace); (Richmond Hill, CA) ; Hong; Edward; (North York, CA) ; Yang; Kai; (Richmond Hill, CA) |
Assignee: | VIXS SYSTEMS, INC., Toronto, CA |
Family ID: | 47293189 |
Appl. No.: | 13/162075 |
Filed: | June 16, 2011 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61494614 | Jun 8, 2011 | |
Current U.S. Class: | 375/240.25 ; 375/E7.027 |
Current CPC Class: | H04N 19/30 20141101; H04N 19/12 20141101; H04N 19/42 20141101; H04N 19/44 20141101; H04N 19/40 20141101 |
Class at Publication: | 375/240.25 ; 375/E07.027 |
International Class: | H04N 7/26 20060101 H04N007/26 |
Claims
1. A video decoder comprising: an entropy decoding device that
generates entropy decoded (EDC) data from an encoded video signal;
a video decoding device, coupled to the entropy decoding device,
that includes: a memory module that stores a plurality of
operational instructions including at least one vector read
instruction that includes vector read orientation data; a plurality
of vector processor units, coupled to the memory, for generating a
decoded video signal from the EDC data, wherein at least one of the
plurality of vector processor units includes: a matrix memory that
stores matrix data corresponding to a plurality of rows and columns
and that generates vector read data in a first read orientation
when the vector read orientation data has a first value and that
generates vector read data in a second read orientation when the
vector read orientation data has a second value.
2. The video decoder of claim 1 wherein the vector read data in the
first read orientation corresponds to matrix data in one of the
plurality of rows of the matrix memory.
3. The video decoder of claim 1 wherein the vector read data in the
second read orientation corresponds to matrix data in one of the
plurality of columns of the matrix memory.
4. The video decoder of claim 1 wherein the plurality of
operational instructions further includes at least one vector write
instruction that includes vector write orientation data; wherein
the matrix memory stores vector write data in a first write
orientation when the vector write orientation data has a third
value and that stores vector write data in a second write
orientation when the vector write orientation data has a fourth
value.
5. The video decoder of claim 4 wherein the vector write data in
the first write orientation corresponds to matrix data in one of
the plurality of rows of the matrix memory.
6. The video decoder of claim 4 wherein the vector write data in
the second write orientation corresponds to matrix data in one of
the plurality of columns of the matrix memory.
7. The video decoder of claim 1 wherein the encoded video signal is
encoded in accordance with a VP8 coding standard.
8. A method comprising: generating entropy decoded (EDC) data from
an encoded video signal; generating a decoded video signal from the
EDC data, via a plurality of vector processor units, in response to
a plurality of operational instructions including at least one
vector read instruction that includes vector read orientation data,
wherein at least one of the plurality of vector processor units
operates by: storing matrix data in a matrix memory corresponding
to a plurality of rows and columns; generating vector read data in
a first read orientation when the vector read orientation data has
a first value; and generating vector read data in a second read
orientation when the vector read orientation data has a second
value.
9. The method of claim 8 wherein the vector read data in the first
read orientation corresponds to matrix data in one of the plurality
of rows of the matrix memory.
10. The method of claim 8 wherein the vector read data in the
second read orientation corresponds to matrix data in one of the
plurality of columns of the matrix memory.
11. The method of claim 8 wherein the plurality of operational
instructions further includes at least one vector write instruction
that includes vector write orientation data, and wherein the at
least one of the plurality of vector processor units operates by:
storing vector write data in a first write orientation when the
vector write orientation data has a third value; and storing vector
write data in a second write orientation when the vector write
orientation data has a fourth value.
12. The method of claim 11 wherein the vector write data in the
first write orientation corresponds to matrix data in one of the
plurality of rows of the matrix memory.
13. The method of claim 11 wherein the vector write data in the
second write orientation corresponds to matrix data in one of the
plurality of columns of the matrix memory.
14. The method of claim 8 wherein the encoded video signal is
encoded in accordance with a VP8 coding standard.
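The row and column orientations recited in claims 1-6 and 8-13 can be illustrated with a minimal Python sketch. It is only a software model under stated assumptions: the class name `MatrixMemory`, the `Orientation` enum and its numeric values are hypothetical, and the claims do not say how the orientation data is encoded or how the hardware is organized.

```python
from enum import Enum


class Orientation(Enum):
    # Hypothetical encodings; the claims only require two distinct values.
    ROW = 0
    COLUMN = 1


class MatrixMemory:
    """Software model of a matrix memory with orientation-selectable access."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self.data = [[0] * cols for _ in range(rows)]

    def write_vector(self, index, vector, orientation):
        """Store a vector as row `index` or as column `index`."""
        if orientation is Orientation.ROW:
            if len(vector) != self.cols:
                raise ValueError("row vector has wrong length")
            self.data[index] = list(vector)
        else:
            if len(vector) != self.rows:
                raise ValueError("column vector has wrong length")
            for r, value in enumerate(vector):
                self.data[r][index] = value

    def read_vector(self, index, orientation):
        """Return row `index` or column `index` as a vector."""
        if orientation is Orientation.ROW:
            return list(self.data[index])
        return [self.data[r][index] for r in range(self.rows)]


def transpose(mem):
    """Transpose by reading columns and writing them back as rows."""
    out = MatrixMemory(mem.cols, mem.rows)
    for c in range(mem.cols):
        column = mem.read_vector(c, Orientation.COLUMN)
        out.write_vector(c, column, Orientation.ROW)
    return out


mem = MatrixMemory(2, 3)
mem.write_vector(0, [1, 2, 3], Orientation.ROW)
mem.write_vector(1, [4, 5, 6], Orientation.ROW)
column_1 = mem.read_vector(1, Orientation.COLUMN)
transposed = transpose(mem).data
```

In this model a transposition needs no separate data shuffle: reading in one orientation and writing in the other is enough, which is one plausible reading of why the orientation data is carried in the instruction itself.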
Description
CROSS REFERENCE TO RELATED PATENTS
[0001] The present application claims priority under 35 USC 119(e)
to the provisionally filed application entitled, "VIDEO DECODER
WITH VECTOR PROCESSING UNIT AND METHODS FOR USE THEREWITH," having
Ser. No. 61/494,614, filed on Jun. 8, 2011, the contents of which
are incorporated herein by reference thereto.
TECHNICAL FIELD OF THE INVENTION
[0002] The present invention relates to coding used in devices such
as video encoders/decoders for stereoscopic television signals.
DESCRIPTION OF RELATED ART
[0003] Video encoding has become an important issue for modern
video processing devices. Robust encoding algorithms allow video
signals to be transmitted with reduced bandwidth and stored in less
memory. However, the accuracy of these encoding methods faces the
scrutiny of users that are becoming accustomed to greater
resolution and higher picture quality. Standards have been
promulgated for many encoding methods including the H.264 standard
that is also referred to as MPEG-4, part 10 or Advanced Video
Coding (AVC), and the VP8 standard set forth by On2 Technologies,
Inc. While these standards set forth many powerful techniques,
further improvements are possible to improve the performance and
speed of implementation of such methods. The video signal encoded
by these encoding methods must be similarly decoded for playback on
most video display devices.
[0004] The Moving Picture Experts Group (MPEG) has presented a
Scalable Video Coding (SVC) Annex G extension to H.264/MPEG-4 AVC
for standardization. SVC provides for encoding of video bitstreams
that include subset bitstreams that can represent lower spatial
resolution, lower temporal resolution or otherwise lower quality
video. A subset bitstream can be derived by dropping packets from
the total bitstream. SVC streams allow end devices to flexibly
scale the temporal resolution, spatial resolution or video
fidelity, for example, to match the capabilities of a particular
device.
[0005] Efficient and fast encoding and decoding of video signals is
important to the implementation of many video devices, particularly
video devices that are destined for home use. Further limitations
and disadvantages of conventional and traditional approaches will
become apparent to one of ordinary skill in the art through
comparison of such systems with the present invention.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0006] FIGS. 1-3 present pictorial diagram representations of
various video devices in accordance with embodiments of the present
invention.
[0007] FIG. 4 presents a block diagram representation of a video
system in accordance with an embodiment of the present
invention.
[0008] FIG. 5 presents a block diagram representation of a video
decoder 102 in accordance with an embodiment of the present
invention.
[0009] FIG. 6 presents a block diagram representation of a
multi-format video decoder 150 in accordance with an embodiment of
the present invention.
[0010] FIG. 7 presents a block diagram representation of a
multi-format video decoder 150 in accordance with an embodiment of
the present invention.
[0011] FIG. 8 presents a block diagram representation of a decoding
process in accordance with an embodiment of the present
invention.
[0012] FIG. 9 presents a block diagram representation of a matrix
vector processing unit 190 in accordance with another embodiment of
the present invention.
[0013] FIG. 10 presents a block diagram representation of a filter
vector processing unit in accordance with another embodiment of the
present invention.
[0014] FIG. 11 presents a block diagram representation of a VPU
instruction 180 in accordance with another embodiment of the
present invention.
[0015] FIG. 12 presents a block diagram representation of a VPU 200
in accordance with an embodiment of the present invention.
[0016] FIG. 13 presents a block diagram representation of a VPU 300
in accordance with an embodiment of the present invention.
[0017] FIG. 14 presents a graphical representation of a write
operation in accordance with an embodiment of the present
invention.
[0018] FIG. 15 presents a graphical representation of a write
operation in accordance with an embodiment of the present
invention.
[0019] FIG. 16 presents a graphical representation of a read
operation in accordance with an embodiment of the present
invention.
[0020] FIG. 17 presents a graphical representation of a read
operation in accordance with an embodiment of the present
invention.
[0021] FIG. 18 presents a block diagram representation of a VPU 325
in accordance with an embodiment of the present invention.
[0022] FIG. 19 presents a block diagram representation of a video
distribution system 375 in accordance with an embodiment of the
present invention.
[0023] FIG. 20 presents a block diagram representation of a video
storage system 179 in accordance with an embodiment of the present
invention.
[0024] FIG. 21 presents a flow diagram representation of a method
in accordance with an embodiment of the present invention.
[0025] FIG. 22 presents a flow diagram representation of a method
in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION INCLUDING THE PRESENTLY
PREFERRED EMBODIMENTS
[0026] FIGS. 1-3 present pictorial diagram representations of
various video devices in accordance with embodiments of the present
invention. In particular, set top box 10 with built-in digital
video recorder functionality or a stand alone digital video
recorder, television or monitor 20 and portable computer 30
illustrate electronic devices that incorporate a video decoder in
accordance with one or more features or functions of the present
invention. While these particular devices are illustrated, the
present invention can be implemented in any device that is capable
of decoding and/or transcoding video content in accordance with the
methods and systems described in conjunction with FIGS. 4-15 and
the appended claims.
[0027] FIG. 4 presents a block diagram representation of a video
system in accordance with an embodiment of the present
invention. In particular, this video system includes a receiving
module 100, such as a server, cable head end, television receiver,
cable television receiver, satellite broadcast receiver, broadband
modem, 3G transceiver or other information receiver or transceiver
that is capable of receiving a received signal 98 and generating a
video signal 110 that has been encoded via a video encoding format.
Video processing device 125 includes video decoder 102 and is
coupled to the receiving module 100 to decode or transcode the
video signal for storage, editing, and/or playback in a format
corresponding to video display device 104. Video processing device
125 can include set top box 10 with built-in digital video recorder
functionality or a stand alone digital video recorder. While shown
as separate from video display device 104, video processing device
125, including video decoder 102, can be incorporated in television
or monitor 20, portable computer 30 or other device that
includes a video decoder, such as video decoder 102.
[0028] In an embodiment of the present invention, the received
signal 98 is a broadcast video signal, such as a television signal,
high definition television signal, enhanced definition television
signal or other broadcast video signal that has been transmitted
over a wireless medium, either directly or through one or more
satellites or other relay stations or through a cable network,
optical network or other transmission network. In addition,
received signal 98 can be generated from a stored video file,
played back from a recording medium such as a magnetic tape,
magnetic disk or optical disk, and can include a streaming video
signal that is transmitted over a public or private network such as
a local area network, wide area network, metropolitan area network
or the Internet.
[0029] Video signal 110 can include a digital video signal
complying with a digital video codec standard such as H.264, MPEG-4
Part 10 Advanced Video Coding (AVC) including an SVC signal, an
encoded stereoscopic video signal having a base layer that includes
a 2D compatible base layer and an enhancement layer generated by
processing in accordance with an MVC extension of MPEG-4 AVC, or
another digital format such as a Moving Picture Experts Group
(MPEG) format (such as MPEG1, MPEG2 or MPEG4), Quicktime format,
Real Media format, Windows Media Video (WMV) or Audio Video
Interleave (AVI), video coding one (VC-1), VP8, etc.
[0030] Video display device 104 can include a television, monitor,
computer, handheld device or other video display device that
creates an optical image stream either directly or indirectly, such
as by projection, based on the processed video signal 112 either as
a streaming video signal or by playback of a stored digital video
file.
[0031] FIG. 5 presents a block diagram representation of a video
decoder 102 in accordance with an embodiment of the present
invention. Video decoder 102 includes an entropy decoding device
140 having a processing module 142 that generates entropy decoded
(EDC) data 146 from an encoded video signal such as video signal
110. The EDC data 146 can include run level data, motion vector
differential data, and macroblock header data and/or other data
that results from the entropy decoding of an encoded video signal.
Multi-format video decoding device 150 includes a processing module
152, a memory module 154 and a hardware accelerator module 156 that
operate to generate a decoded video signal, such as processed video
signal 112, from the EDC data 146.
[0032] In an embodiment of the present invention, the entropy
decoding device 140 and the multi-format video decoding device 150
operate contemporaneously in a pipelined process where the
multi-format video decoding device 150 generates a first portion of
the decoded video signal during at least a portion of time that the
entropy decoding device 140 generates EDC data 146 from a second
portion of the encoded video signal.
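The overlap described in this paragraph can be sketched as a two-stage schedule. This is a minimal illustration only: the function name `pipeline_schedule` and the idea of numbering "portions" by integer index are assumptions made for the example, and the text does not specify the granularity of a portion.

```python
def pipeline_schedule(num_portions):
    """Return, per time step, (portion being entropy decoded, portion being decoded).

    None means the stage is idle at that step. The decoding stage always
    works one portion behind the entropy decoding stage.
    """
    schedule = []
    for step in range(num_portions + 1):
        entropy_portion = step if step < num_portions else None
        decode_portion = step - 1 if step >= 1 else None
        schedule.append((entropy_portion, decode_portion))
    return schedule


schedule = pipeline_schedule(3)
```

For three portions, every step except the first and the last has both stages busy, which is the contemporaneous operation the paragraph describes.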
[0033] The processing modules 142 and 152 can each be implemented
using a single processing device or a plurality of processing
devices. Such a processing device may be a microprocessor,
co-processors, a micro-controller, digital signal processor,
microcomputer, central processing unit, field programmable gate
array, programmable logic device, state machine, logic circuitry,
analog circuitry, digital circuitry, and/or any device that
manipulates signals (analog and/or digital) based on operational
instructions that are stored in a memory, such as memory modules
144 and 154. These memories may each be a single memory device or a
plurality of memory devices. Such a memory device can include a
hard disk drive or other disk drive, read-only memory, random
access memory, volatile memory, non-volatile memory, static memory,
dynamic memory, flash memory, cache memory, and/or any device that
stores digital information. Note that when the processing modules
142 and 152 implement one or more of their functions via a state
machine, analog circuitry, digital circuitry, and/or logic
circuitry, the memory storing the corresponding operational
instructions may be embedded within, or external to, the circuitry
comprising the state machine, analog circuitry, digital circuitry,
and/or logic circuitry. In an embodiment of the present invention,
the processing modules 142 and 152 each include a processor
produced by ARC International to implement the neighbor management
module 218; however, other processor configurations could likewise
be employed.
[0034] FIG. 6 presents a block diagram representation of a
multi-format video decoder 150 in accordance with an embodiment of
the present invention. The memory module 154 includes a software
library 160 that stores format configuration data corresponding to
a plurality of video coding formats such as H.264, MPEG-4 Part 10
Advanced Video Coding (AVC) including the SVC and MVC extensions,
MPEG2, MPEG4, Quicktime format, Real Media format, Windows Media
Video (WMV) or Audio Video Interleave (AVI), video coding one
(VC-1), VP8, or other video coding/compression format, etc. The
hardware accelerator module 156 includes a plurality of vector
processor units (VPU 1, VPU 2, . . . VPU N) that operate in
conjunction with processing module 152 to generate a decoded video
signal from the EDC data 146. The plurality of vector processing
units and the processing module 152 are configured, based on the
configuration data, to a selected one of the plurality of video
coding formats. In this fashion, a single video decoder 102 can be
configured for operation with the particular video coding format or
formats of the video signal 110.
[0035] In an embodiment of the present invention, the multi-format
video decoder 150 can receive selection data from a user or
designer that indicates the particular video coding format. In
another embodiment of the present invention, EDC data 146 can be
analyzed by processing module 152 to identify the video coding
format of the video signal 110. In either case, the multi-format
video decoder 150 responds to the selection by retrieving the
configuration data from the software library 160 and by configuring
the processing module 152 and the vector processing units to decode
the selected video coding format.
[0036] Configuration data can include loading program instructions
executed by the processing module 152 and the vector processing
units of the hardware accelerator module 156 along with other data
used in conjunction with the decoding of the EDC data 146. For
example, when a particular video coding format is selected,
software for processing module 152 and VPU instructions for the
hardware accelerator module 156 are selected to be executed. In one
mode of operation, the VPU instructions include one or more
instructions that configure the vector processing units of hardware
accelerator module 156 to the selected coding format, in addition
to instructions that perform the particular decoding operations
performed by the vector processing units in accordance with the
selected video coding format.
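The format selection and configuration flow of the preceding paragraphs might look like the following Python sketch. The format names come from the text; everything else (the dictionary layout, the program and instruction names, and the `"format"` field used to identify the format from EDC data) is hypothetical and chosen only for illustration.

```python
# Hypothetical software library: one entry of configuration data per format.
SOFTWARE_LIBRARY = {
    "H.264": {"software": "h264_decode_sw",
              "vpu_instructions": ["cfg_h264", "idct", "deblock"]},
    "VP8": {"software": "vp8_decode_sw",
            "vpu_instructions": ["cfg_vp8", "idct", "loop_filter"]},
    "VC-1": {"software": "vc1_decode_sw",
             "vpu_instructions": ["cfg_vc1", "inverse_transform", "deblock"]},
}


def configure_decoder(library, selection=None, edc_data=None):
    """Pick a format from explicit selection data or from EDC data, then
    return the software and VPU instructions to load for that format."""
    if selection is not None:
        fmt = selection
    elif edc_data is not None:
        fmt = edc_data.get("format")  # hypothetical field name
    else:
        fmt = None
    if fmt not in library:
        raise ValueError("no configuration data for format: %r" % (fmt,))
    entry = library[fmt]
    return {
        "format": fmt,
        "processing_module_software": entry["software"],
        "vpu_instructions": list(entry["vpu_instructions"]),
    }


by_user = configure_decoder(SOFTWARE_LIBRARY, selection="VP8")
by_stream = configure_decoder(SOFTWARE_LIBRARY, edc_data={"format": "H.264"})
```

In this sketch the first VPU instruction of each entry stands in for the instructions that configure the vector processing units, and the remaining entries stand in for the decoding operations themselves.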
[0037] As will be discussed further in conjunction with FIGS. 7-12,
the vector processors can include one or more matrix vector
processors that perform parallel matrix operations such as matrix
addition, multiplication, transposition, inversion and/or other
matrix operation. The vector processors can further include one or
more filter vector processors that perform parallel filter
operations. These vector processing units are configured via VPU
programming instructions that include vector instructions, scalar
instructions and branching instructions, to operate in accordance
with the selected video coding format.
[0038] FIG. 7 presents a block diagram representation of a
multi-format video decoder 150 in accordance with an embodiment of
the present invention. In particular, multi-format video decoder
150 includes a processing module 152 and a memory module 154 as
described in conjunction with FIG. 5. In addition, the multi-format
video decoding device 150 further includes a bus 221, a signal
interface 158, decode motion compensation module 204, neighbor
management module 218, deblocking filter module 222, inverse
transform module 276, inverse quantization module 274, and inverse
intra prediction module 211. While a particular bus architecture is
shown that represents the functionality of communication between
the various modules of multi-format video decoding device 150,
other architectures can be implemented in accordance with the broad
scope of the present invention.
[0039] In operation, the signal interface 158 receives EDC data 146
and optionally buffers and preprocesses the EDC data 146 for
processing by the other modules of multi-format video decoding
device 150. Similarly, the decoded video signal generated via
processing by the other modules of multi-format video decoding
device 150 is optionally buffered, such as via a ring buffer or
other buffer structure implemented in conjunction with memory
locations of memory module 154 and formatted for output as
processed video signal 112.
[0040] The decode motion compensation module 204, neighbor
management module 218, deblocking filter module 222, inverse
transform module 276, inverse quantization module 274, and inverse
intra prediction module 211 are configured to operate to decode the
EDC data 146 in accordance with the selected video format such as
VP8, H.264 (including MVC and/or SVC), VC-1 or other compression
standard. In an embodiment of the present invention, the decode
motion compensation module 204, neighbor management module 218,
deblocking filter module 222, inverse transform module 276, inverse
quantization module 274, inverse intra prediction module 211 are
implemented using software stored in memory module 154 and executed
via processing module 152 as well as via vector processing unit
instructions executed by the plurality of vector processing units
of hardware accelerator module 156. In a particular embodiment, the
decode motion compensation module 204, deblocking filter module
222, and inverse intra prediction module 211 are implemented using
three separate filter vector processing units, one for each module.
In addition, the inverse transform module 276 and the inverse
quantization module 274 are implemented via two separate matrix
vector processing units, one for each module. In an embodiment of
the present invention the neighbor management module 218 is
implemented via software executed by processing module 152.
[0041] In operation, neighbor management module 218 generates
motion vector data, macroblock mode data and deblock strength data,
based on the motion vector differential data and the macroblock
header data. In an embodiment of the present invention, a data
structure, such as a linked list, array or one or more registers,
is used to associate and store neighbor data for each macroblock
of a processed picture. In particular, the neighbor management
module 218 stores the motion vector data for a group of macroblocks
that neighbor a current macroblock and generates the motion vector
data for the current macroblock based on both the macroblock mode
data and the motion vector data for the group of macroblocks that
neighbor the current macroblock. In addition, the neighbor
management module 218 calculates a motion vector magnitude and
adjusts the deblock strength data based on the motion vector
magnitude.
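As a rough illustration of this paragraph, the sketch below derives a motion vector for the current macroblock from its neighbors and a motion vector differential, then computes its magnitude. The text does not state which prediction rule is used; a component-wise median of three neighbors (left, top, top-right), in the style of H.264, is assumed here, and all function names are hypothetical.

```python
import math


def median3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]


def derive_motion_vector(mb_mode, mvd, neighbor_mvs):
    """Return the current macroblock's motion vector, or None for non-inter modes.

    mvd          -- motion vector differential (dx, dy) from the EDC data
    neighbor_mvs -- stored motion vectors of three neighbors (assumed:
                    left, top, top-right), each an (x, y) tuple
    """
    if mb_mode != "inter":
        return None
    left, top, top_right = neighbor_mvs
    pred_x = median3(left[0], top[0], top_right[0])
    pred_y = median3(left[1], top[1], top_right[1])
    return (pred_x + mvd[0], pred_y + mvd[1])


def motion_vector_magnitude(mv):
    """Euclidean magnitude of a motion vector."""
    return math.hypot(mv[0], mv[1])


mv = derive_motion_vector("inter", (1, -2), [(4, 0), (2, 6), (8, 3)])
magnitude = motion_vector_magnitude((3, 4))
```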
[0042] The decode motion compensation module 204 generates
inter-prediction data based on the motion vector data when the
macroblock mode data indicates an inter-prediction mode. The
inverse intra-prediction module 211 generates intra-prediction data
when the macroblock mode data indicates an intra-prediction mode.
The inverse quantization module 274 dequantizes run level data. The
dequantized run level data is inverse transformed, such as via an
inverse discrete cosine transform or other inverse transform via
inverse transform module 276 to generate residual data. The inverse
intra-prediction module 211 generates reconstructed picture data
based on the residual data and on the inter-prediction data when
the macroblock mode data indicates an inter-prediction mode and
based on the residual data and on the intra-prediction data when
the macroblock mode data indicates an intra-prediction mode.
[0043] The deblocking filter module 222 generates the decoded video
signal from the reconstructed picture data, based on the deblock
strength data. In operation, the deblocking filter module 222
operates to smooth horizontal and vertical edges of a block that
may correspond to exterior boundaries of a macroblock of a frame or
field of video signal 110 or edges that occur in the interior of a
macroblock. A boundary strength, which is determined based on
quantization parameters, adjacent macroblock type, etc., can
vary the amount of filtering to be performed. For example, the
H.264 standard defines two parameters, α and β, that are
used to determine the strength of filtering on a particular edge.
The parameter α is a boundary edge parameter applied to data
that includes macroblock boundaries. The parameter β is an
interior edge parameter applied to data that is within a macroblock
interior. In accordance with the present invention, motion vector
magnitude is used by neighbor management module 218 to generate
deblock strength data that adjusts the values for α and
β for deblocking filter module 222. For instance, when the
motion vector magnitude indicates large motion vectors, e.g.
magnitudes above a first magnitude threshold, a larger value of
α can be selected. Further, when the motion vector magnitude
indicates small motion vectors, e.g. magnitudes below the same or
other threshold, a smaller value of α can be selected.
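The threshold rule in this paragraph can be sketched as a small Python function. The thresholds, the step size and the function name are all hypothetical; the text says only that a larger or smaller filter parameter can be selected depending on motion vector magnitude.

```python
def adjust_alpha(base_alpha, mv_magnitude, high_threshold, low_threshold, step=1):
    """Raise alpha for large motion vectors, lower it for small ones.

    All numeric parameters are illustrative assumptions; the low and high
    thresholds may be the same value.
    """
    if mv_magnitude > high_threshold:
        return base_alpha + step
    if mv_magnitude < low_threshold:
        return max(0, base_alpha - step)
    return base_alpha


alpha_fast = adjust_alpha(10, mv_magnitude=20.0, high_threshold=16.0, low_threshold=4.0)
alpha_slow = adjust_alpha(10, mv_magnitude=1.0, high_threshold=16.0, low_threshold=4.0)
alpha_mid = adjust_alpha(10, mv_magnitude=8.0, high_threshold=16.0, low_threshold=4.0)
```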
[0044] FIG. 8 presents a block diagram representation of a decoding
process in accordance with an embodiment of the present invention.
In this embodiment, the neighbor management module 218 receives
macroblock header and motion vector differential data 230 from the
EDC data 146 via buffer 300. The neighbor management module 218
checks the macroblock (MB) mode from the MB header. In
inter-prediction mode, the neighbor management module 218
calculates motion vectors and also calculates deblock strength data
and passes this data along with other EDC data, such as run level
data 272 to one or more frame buffers, represented in the process
flow as buffers 302, 304, 308, 310 and 318 implemented via memory
module 154. The decode motion compensation module 204 generates
inter-prediction data based on the motion vectors and on reference
frames retrieved from the frame buffer and stores the results in
buffer 314, such as a ring buffer. In intra-prediction mode, the
inverse intra prediction module 211 generates intra-prediction
data.
[0045] The inverse quantization module 274 retrieves run level data
272 from buffer 304 and inverse quantizes the data with data from
the frame buffer 302 and generates de-quantized data that is stored
in buffer 306. Inverse transform module 276 inverse transforms the
de-quantized data based on the frame buffered data to generate
residual data that is stored in buffer 312. The residual data is
combined in inverse intra-prediction module 211 with either
intra-prediction data or inter-prediction data supplied in response
to the mode determination by neighbor management module 218, to
generate current reconstructed frames/fields that are buffered in
the buffer 316.
[0046] Deblocking filter module 222 applies deblocking filtering to
the reconstructed frames/fields in accordance with the deblock
strength data from neighbor management module 218 to generate
decoded video output in the form of filtered pictures 226 that are
buffered via buffer 320.
[0047] The buffers 306, 312, 314, 316, 318 and 320 can each be a
ring buffer implemented via memory module 154; however, other buffer
configurations are likewise possible.
[0048] FIG. 9 presents a block diagram representation of a matrix
vector processing unit 190 in accordance with another embodiment of
the present invention. In particular, matrix vector processing unit
190 includes a dedicated hardware block that performs parallel
matrix operations such as matrix addition, multiplication,
transposition, inversion and/or other matrix operations on an input
matrix 192 to generate an output matrix 194.
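For reference, three of the matrix operations named in this paragraph are shown below as plain Python functions. This is a sequential software model only; the hardware block is described as performing such operations in parallel, and the function names are chosen for this example.

```python
def mat_add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mat_transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def mat_mul(a, b):
    """Matrix product: each output element is a row of a dotted with a column of b."""
    b_cols = mat_transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


input_matrix = [[1, 2], [3, 4]]
other = [[5, 6], [7, 8]]
summed = mat_add(input_matrix, other)
product = mat_mul(input_matrix, other)
transposed = mat_transpose(input_matrix)
```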
[0049] The matrix vector processing unit 190 is configured via VPU
instructions 180 that include vector instructions, scalar
instructions and branching instructions. These VPU instructions 180
include configuration data and commands 170 that configure the
matrix VPU 190 in accordance with the selected video coding format
and command the matrix vector processing unit to perform the
corresponding functions such as all or part of an inverse discrete
cosine transform, inverse quantization or other matrix function of
the multi-format video decoder 150. The VPU instructions 180
further include vector and/or scalar data used in conjunction with
vector and scalar operations of the device.
[0050] FIG. 10 presents a block diagram representation of a filter
vector processing unit 195 in accordance with another embodiment of
the present invention. In particular, filter vector processing unit
195 includes a dedicated hardware block that performs parallel
filter operations such as an n-tap one-dimensional horizontal
filter, an n-tap one-dimensional vertical filter, or an n-tap
two-dimensional filter. The filter VPU 195 operates to filter input
data 196, such as a block of pixels, a row of pixels, a column of
pixels of a video picture or other data to generate filtered data
198.
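The horizontal and vertical n-tap filters mentioned above can be modeled with one shared one-dimensional routine. In this sketch the edge handling (replicating the nearest sample) and the lack of normalization or rounding are assumptions; actual codecs define these details per format, and the function names are hypothetical.

```python
def filter_1d(samples, taps):
    """Apply an n-tap filter centered on each sample.
    Out-of-range positions reuse the nearest edge sample (an assumption)."""
    half = len(taps) // 2
    last = len(samples) - 1
    out = []
    for i in range(len(samples)):
        acc = 0
        for t, coeff in enumerate(taps):
            j = min(max(i + t - half, 0), last)
            acc += coeff * samples[j]
        out.append(acc)
    return out


def filter_horizontal(block, taps):
    """Filter each row of a block of pixels."""
    return [filter_1d(row, taps) for row in block]


def filter_vertical(block, taps):
    """Filter each column of a block of pixels."""
    cols = [filter_1d(list(col), taps) for col in zip(*block)]
    return [list(row) for row in zip(*cols)]


row_result = filter_horizontal([[0, 4, 8]], [1, 2, 1])
col_result = filter_vertical([[0], [4], [8]], [1, 2, 1])
```

A separable two-dimensional filter can be formed in this model by applying `filter_horizontal` and then `filter_vertical` to the same block.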
[0051] The filter vector processing unit 195 is configured via VPU
instructions 181 that include vector instructions, scalar
instructions and branching instructions. These VPU instructions 181
include configuration data and commands 172 that configure the
filter VPU 195 in accordance with the selected video coding format
such as by programming the filter parameters, (e.g. the number of
taps, type of filter, and the particular filter coefficients) and
command the filter vector processing unit to perform the
corresponding functions such as all or part of the generation of
inter-prediction data, intra-prediction data and/or filtered
picture data of the multi-format video decoder 150. The VPU
instructions 181 further include vector and/or scalar data used in
conjunction with vector and scalar operations of the device.
[0052] FIG. 11 presents a block diagram representation of a VPU
instruction 180 or 181 in accordance with another embodiment of the
present invention. As previously discussed, the VPU instructions
include three portions, vector instruction 182, scalar instruction
184, and branching instruction 186. Through the use of these
instructions, a vector processing unit, such as matrix vector
processing unit 190 or filter vector processing unit 195 can be
configured/programmed to move blocks of data, to perform vector or
scalar operations on the data, to perform conditional or
unconditional branching, or to perform other logical or arithmetic
operations.
[0053] In an embodiment of the present invention, the vector
instruction 182 can include commands and data to perform multiple
simultaneous logical or arithmetic operations via a single
instruction. In an embodiment of the present invention, the vector
data can include data blocks of 32 bits or more and the matrix or
vector filter operations include any of the operations discussed in
conjunction with either matrix VPU 190 or filter VPU 195. The
scalar instruction 184 can include commands and data to perform
single scalar logical or arithmetic operations via a single
instruction. In an embodiment of the present invention, the scalar
data can include scalar data blocks of 32 bits or less or long
scalar blocks of more than 32 bits. Matrix or filter scalar
operations include mask creation, data masking, addressing
instructions, data move operations, flag calculations, etc.
Branching instructions include conditional or unconditional
branching instructions based on logical or arithmetic
conditions.
[0054] In an example of operation, the filter VPU 195 implements a
deblocking filter as part of deblocking filter module 222. In one
mode of operation, the filter VPU 195 executes filter VPU
instructions 181 in a similar fashion to a function or subroutine
call. For example, in an initial VPU instruction 181, the filter
VPU 195 can execute a data move command to configure a particular
n-tap deblocking filter, based on the selection of the particular
video coding format, by loading filter coefficients and other
configuration data to establish an initial filter configuration. In
subsequent VPU instructions 181, the deblock strength is retrieved
to optionally adjust the filter coefficients or otherwise adjust
the filter configuration to a current deblock strength. In
addition, input data 196 is retrieved, filtered and transferred to
a buffer in response to filter commands.
[0055] FIG. 12 presents a block diagram representation of a VPU 200
in accordance with an embodiment of the present invention. In
particular, vector processor unit 200 is a further example of any
of the vector processor units (VPU 1, VPU 2, . . . VPU N) of
hardware accelerator module 156, including matrix VPU 190 and
filter VPU 195 or other vector processor or a component of any of
the foregoing. VPU 200 includes a vector function module 202 that
generates vector function data 204 based on a vector function of a
first input vector V.sub.1 and a second input vector V.sub.2. A
selection module 210 selects each element of a vector output 216 as
either a corresponding element of the vector function data 204 or a
corresponding element of a third input vector V.sub.3. As shown,
VPU 200 includes a control register 210 for storing the selection
data 212, based on selection instruction 214 that is input to
configure the vector processing unit 200.
[0056] The vector function can include an arithmetic function, such
as a binary addition, subtraction, multiplication, division or
other arithmetic operator. In addition, the vector function can be
a logical function or any other function. For example, the vector
function can include a selection of the vector function data 204
as either the first input vector V.sub.1 or the second input vector
V.sub.2. Further, the third input vector V.sub.3 can be either the
first input vector V.sub.1 or the second input vector V.sub.2. In
this fashion, the vector output 216 can be made up of interspersed
elements of the first input vector V.sub.1 and a second input
vector V.sub.2. In an embodiment of the present invention, the
particular vector function of vector function module 202 can be
configured via function instruction 215. For example, the vector
function module 202 can include a plurality of different vector
functions that are selected via corresponding values of the
function instruction 215. The function instruction 215 can be a
scalar or vector instruction.
[0057] In operation, the selection module 210 selects each element
of the vector output 216 based on selection data 212. For example,
the selection data 212 indicates a subset of elements of the vector
output 216 that correspond to the vector function data 204 and
further another subset of elements of the vector output 216 that
correspond to the third input vector V.sub.3. In this fashion, the
selection data 212 indicates which of the elements of the third
input vector V.sub.3 will be modified to be the vector function
data 204 and which other elements of the third input vector V.sub.3
will be left unmodified. The selection instruction 214 can
be a vector instruction, such as a binary selection vector that
indicates, via a binary value for each element of the third input
vector V.sub.3, which elements of V.sub.3 will be modified and
which will be left unmodified. In an alternative embodiment, the
selection instruction 214 can be a scalar instruction, such as a
scalar value that corresponds to one of a plurality of sets of
selection data 212 that indicates which elements of V.sub.3 will be
modified and which will be left unmodified.
[0058] Consider the following example, where:
[0059] V.sub.1=(a.sub.1, a.sub.2, a.sub.3, . . . a.sub.n)
[0060] V.sub.2=(b.sub.1, b.sub.2, b.sub.3, . . . b.sub.n)
[0061] V.sub.3=(c.sub.1, c.sub.2, c.sub.3, . . . c.sub.n)
where, the vector function data 204, is represented as V.sub.f
[0062] V.sub.f=(d.sub.1, d.sub.2, d.sub.3, . . . d.sub.n)
where, the ith element d.sub.i is generated as
[0063] d.sub.i=f(a.sub.i, b.sub.i)
for i=(1, 2, 3, . . . n) and where f is the vector function.
Consider further that the selection data 212 is a vector as
follows:
[0064] S=(s.sub.1, s.sub.2, s.sub.3, . . . s.sub.n)
where s.sub.i=1, indicates a selection of the vector function data
204 in the ith position of the vector output 216 and s.sub.i=0,
indicates a selection of the third input vector V.sub.3 in the ith
position of the vector output 216. Representing the vector output
216 as
[0065] V.sub.o=(e.sub.1, e.sub.2, e.sub.3, . . . e.sub.n)
for i=(1, 2, 3, . . . n), the values of the vector output 216 can
be calculated as:
[0066] e.sub.i=d.sub.i, if s.sub.i=1, and
[0067] e.sub.i=c.sub.i, if s.sub.i=0
[0068] The vector processing unit 200 allows implementation of a
wide range of logical/arithmetic vector functions without branching
instructions. In effect, the conditional branching is implemented
as a hardware function or selection, without the need to implement
software branching, conditional statements, etc.
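The element-wise selection defined by the equations for d.sub.i and
e.sub.i above can be modeled in a few lines. The following is only a
software sketch of the behavior: the function name vpu_select and its
argument names are hypothetical, and the per-element conditional
expression stands in for what the specification describes as a
hardware selection.

```python
from typing import Callable, List, Sequence


def vpu_select(
    v1: Sequence[float],
    v2: Sequence[float],
    v3: Sequence[float],
    s: Sequence[int],
    f: Callable[[float, float], float],
) -> List[float]:
    """Return V_o where e_i = f(a_i, b_i) if s_i == 1, else c_i."""
    if not (len(v1) == len(v2) == len(v3) == len(s)):
        raise ValueError("all vectors must have the same length")
    # Vector function data V_f: d_i = f(a_i, b_i) for every position.
    d = [f(a, b) for a, b in zip(v1, v2)]
    # Selection: take d_i where s_i is 1, keep c_i where s_i is 0.
    return [d_i if s_i == 1 else c_i for d_i, c_i, s_i in zip(d, v3, s)]
```

For instance, choosing f to return its first argument and passing
V.sub.2 as the third input vector yields an output made up of
interspersed elements of V.sub.1 and V.sub.2, as described in
conjunction with FIG. 12.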
[0069] FIG. 13 presents a block diagram representation of a VPU 300
in accordance with an embodiment of the present invention. In
particular, vector processor unit 300 is a further example of any
of the vector processor units (VPU 1, VPU 2, . . . VPU N) of
hardware accelerator module 156, including matrix VPU 190 and
filter VPU 195 or other vector processor or a component of any of
the foregoing. As shown, VPU 300 includes a control register 310
for storing read/write instructions 314 of the operational
instructions of a program stored in memory to be executed to
generate a decoded video signal from EDC data. The matrix memory
304 can be a single memory device or a plurality of memory devices.
Such a memory device can include a random access memory, volatile
memory, non-volatile memory, static memory, dynamic memory, flash
memory, cache memory, and/or any device that stores digital
information.
[0070] The read/write instructions 314 include vector read
instructions that include vector read orientation data and vector
write instructions that include vector write orientation data that
are input to configure the vector processor unit 300 based on
command data 312. VPU 300 includes a matrix memory 304 that stores
matrix data corresponding to a plurality of rows and columns and
that generates vector read data 306 in a first read orientation
when the vector read orientation data has a first value and that
generates vector read data 306 in a second read orientation when
the vector read orientation data has a second value. The matrix
memory stores vector write data 302 in a first write orientation
when the vector write orientation data has a third value and
stores vector write data 302 in a second write orientation when the
vector write orientation data has a fourth value.
[0071] For example, read/write instructions 314 can be formatted in
accordance with the following table:
TABLE-US-00001
Read/write indicator | Orientation indicator | Command data
Read | Row | Read in row orientation
Read | Column | Read in column orientation
Write | Row | Write in row orientation
Write | Column | Write in column orientation
It should be noted that the values of the read/write indicator, the
orientation indicator and the command data 312 can be represented
by different digital values.
[0072] In operation, read/write instructions 314 are loaded in
control register 310. In response to command data 312 that
implements these instructions, matrix memory 304 reads vector read
data 306 from the matrix memory 304 in either column or row
orientation or writes vector write data 302 to the matrix memory 304
in either column or row orientation. By selection of the same
orientation for read and write operations, vector data can be
stored and retrieved in a traditional fashion. By selection of
different orientations for read and write operations, however,
vector data can be automatically transposed, without the need for
further data manipulation and further instructions.
[0073] It should be noted that, while vector processor unit 300 is
shown as a separate unit, in other embodiments the components of
vector processor unit 300 can be included as components of another
vector processor unit such as a matrix multiplication unit or a
unit that performs other matrix mathematical functions that employ
matrix transpositions as part of the input/output manipulation of
matrix data.
[0074] Examples of read and write operations for the matrix memory
304 are shown in conjunction with FIGS. 14-17 that follow.
[0075] FIG. 14 presents a graphical representation of a write
operation in accordance with an embodiment of the present
invention. In particular, an example write operation of matrix
memory 304 is shown. In this example, vector write data 302 is
represented by (x.sub.1, x.sub.2, . . . ) and is stored in row
orientation as the kth row as (x.sub.k1, x.sub.k2, . . .
x.sub.ki, . . . ). By repeating this operation for a plurality of
vector write data 302 and writing different rows, an entire matrix
can be stored.
[0076] FIG. 15 presents a graphical representation of a write
operation in accordance with an embodiment of the present
invention. In particular, another example write operation of matrix
memory 304 is shown. In this further example, vector write data 302
is also represented by (x.sub.1, x.sub.2, . . . ) but is stored in
column orientation as the ith column as (x.sub.1i, x.sub.2i, . .
. x.sub.ki, . . . ). By repeating this operation for a plurality of
vector write data 302 and writing different columns, an entire
matrix can be stored.
[0077] FIG. 16 presents a graphical representation of a read
operation in accordance with an embodiment of the present
invention. In particular, an example read operation of matrix
memory 304 is shown. In this example, vector read data 306 is
represented by (x.sub.1, x.sub.2, . . . ) and is retrieved in row
orientation as the kth row as (x.sub.k1, x.sub.k2, . . . x.sub.ki,
. . . ). By repeating this operation for a plurality of vector read
data 306 and reading different rows, an entire matrix can be
read.
[0078] FIG. 17 presents a graphical representation of a read
operation in accordance with an embodiment of the present
invention. In particular, another example read operation of matrix
memory 304 is shown. In this example, vector read data 306 is also
represented by (x.sub.1, x.sub.2, . . . ) and is retrieved in
column orientation as the ith column as (x.sub.1i, x.sub.2i, . . .
x.sub.ki, . . . ). By repeating this operation for a plurality of
vector read data 306 and reading different columns, an entire
matrix can be read.
[0079] As discussed in conjunction with FIG. 13, by selection of
the same orientation for read and write operations, vector data can
be stored and retrieved in a traditional fashion. By selection of
different orientations for read and write operations, however,
vector data can be automatically transposed, without the need for
further data manipulation and further instructions. For example,
writing data as shown in FIG. 14 and reading the data as shown in
FIG. 16 yields no transposition. However, writing data as shown in
FIG. 14 and reading the data as shown in FIG. 17 yields a
transposition of the matrix data.
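The transposition obtained by mixing orientations can be illustrated
with a small software model of the matrix memory. This is a sketch
under stated assumptions: the class name MatrixMemory, the string
values "row" and "column" used for the orientation data, and the
method signatures are hypothetical and are not taken from the
specification.

```python
from typing import List, Sequence


class MatrixMemory:
    """Toy model of a memory addressable by row or by column."""

    def __init__(self, rows: int, cols: int) -> None:
        self.rows = rows
        self.cols = cols
        self.cells = [[0] * cols for _ in range(rows)]

    def write(self, orientation: str, index: int, vector: Sequence[int]) -> None:
        """Store a vector as row `index` or as column `index`."""
        if orientation == "row":
            if len(vector) != self.cols:
                raise ValueError("row vector has the wrong length")
            self.cells[index] = list(vector)
        elif orientation == "column":
            if len(vector) != self.rows:
                raise ValueError("column vector has the wrong length")
            for k, value in enumerate(vector):
                self.cells[k][index] = value
        else:
            raise ValueError("orientation must be 'row' or 'column'")

    def read(self, orientation: str, index: int) -> List[int]:
        """Return row `index` or column `index` as a vector."""
        if orientation == "row":
            return list(self.cells[index])
        if orientation == "column":
            return [self.cells[k][index] for k in range(self.rows)]
        raise ValueError("orientation must be 'row' or 'column'")
```

Writing each vector in row orientation and then reading in column
orientation returns the columns of the stored matrix, which are the
rows of its transpose; reading back in row orientation returns the
vectors unchanged.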
[0080] FIG. 18 presents a block diagram representation of a VPU 325
in accordance with an embodiment of the present invention. In
particular, vector processor unit 325 is a further example of any
of the vector processor units (VPU 1, VPU 2, . . . VPU N) of
hardware accelerator module 156, including matrix VPU 190 and
filter VPU 195 or other vector processor or a component of any of
the foregoing. VPU 325 includes a matrix multiplier 320 that
generates output data 326 based on a matrix multiplication of input
data 322 and input data 324. As shown, VPU 325 includes a control
register 330 for storing a matrix instruction 334 that includes
matrix input configuration data 332 that configures the matrix
multiplier 320. In particular the matrix input configuration data
332 indicates the dimensionality of the input data 322 and the
input data 324, and by inference the dimensionality of the output
data 326.
[0081] In an embodiment of the present invention, different values
of the matrix input configuration data can correspond to input data
formatted as a 1.times.8 matrix, an 8.times.8 matrix, a 4.times.4
matrix or other dimensions. For example, matrix instructions 334
can be formatted in accordance with the following table:
TABLE-US-00002
Dimensions of input data 322 | Dimensions of input data 324 | Dimensions of output data 326
1.times.8 | 8.times.8 | 1.times.8
8.times.8 | 1.times.8 | 8.times.1
4.times.4 | 4.times.4 | 4.times.4
It should be noted that the dimensions of the input data 322, input
data 324 and output data 326 can be represented by different
digital values.
[0082] In operation, matrix instructions 334 are loaded in control
register 330. In response to matrix input configuration data 332
the matrix multiplier 320 multiplies the input data 322 by the
input data 324 to generate the output data 326. In an embodiment of
the present invention, the matrix multiplier includes a plurality
of multipliers and adders that are configured, based on the matrix
input configuration data 332 to perform the mathematical functions
associated with matrix multiplication.
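A software analogue of a multiplier whose operand dimensions are set
by configuration data might look as follows. It is a sketch only: the
numeric configuration codes 0 and 1, the MATRIX_CONFIGS table and the
function names are hypothetical, since the specification states only
that the dimensions can be represented by different digital values.
The example covers the 1.times.8 by 8.times.8 case and the 4.times.4
by 4.times.4 case.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Shape = Tuple[int, int]

# Hypothetical configuration codes mapped to (shape of first input,
# shape of second input). The output shape follows by inference.
MATRIX_CONFIGS: Dict[int, Tuple[Shape, Shape]] = {
    0: ((1, 8), (8, 8)),
    1: ((4, 4), (4, 4)),
}


def shape_of(m: Matrix) -> Shape:
    """Return (rows, columns) of a non-empty list-of-lists matrix."""
    return (len(m), len(m[0]))


def matrix_multiply(a: Matrix, b: Matrix, config: int) -> Matrix:
    """Multiply a by b after checking both against the configured shapes."""
    shape_a, shape_b = MATRIX_CONFIGS[config]
    if shape_of(a) != shape_a or shape_of(b) != shape_b:
        raise ValueError("input dimensions do not match the configuration")
    rows, inner = shape_a
    cols = shape_b[1]
    # Each output element is a sum of products, i.e. multipliers and adders.
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]
```

Keeping the shapes in a lookup table mirrors the idea that a single
instruction field selects how the same multipliers and adders are
arranged.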
[0083] FIG. 19 presents a block diagram representation of a video
distribution system 375 in accordance with an embodiment of the
present invention. In particular, video signal 110 is transmitted
from a video encoder via a transmission path 122 to a video decoder
102. The video decoder 102 operates to decode the video signal 110
for display on display device 12 or 14 or another display device.
In an embodiment of the present invention, video decoder 102 can be
implemented in a set-top box, digital video recorder, router or
home gateway. In the alternative, decoder 102 can optionally be
incorporated directly in the display device 12 or 14.
[0084] The transmission path 122 can include a wireless path that
operates in accordance with a wireless local area network protocol
such as an 802.11 protocol, a WIMAX protocol, a Bluetooth protocol,
etc. Further, the transmission path can include a wired path that
operates in accordance with a wired protocol such as a Universal
Serial Bus protocol, an Ethernet protocol or other high speed
protocol.
[0085] FIG. 20 presents a block diagram representation of a video
storage system 179 in accordance with an embodiment of the present
invention. In particular, device 11 is a set top box with built-in
digital video recorder functionality, a stand alone digital video
recorder, a DVD recorder/player or other device that stores the
video signal 110. In this configuration, device 11 can include
video decoder 102 that operates to decode the video signal 110 when
retrieved from storage to generate a processed video signal 112 in
a format that is suitable for display by video display device 12.
While these particular devices are illustrated, video storage
system 179 can include a hard drive, flash memory device, computer,
DVD burner, or any other device that is capable of generating,
storing, decoding, transcoding and/or displaying the video content
of video signal 110 in accordance with the methods and systems
described in conjunction with the features and functions of the
present invention as described herein.
[0086] FIG. 21 presents a flow diagram representation of a method
in accordance with an embodiment of the present invention. In
particular, a method is presented for use in conjunction with one
or more functions and features described in conjunction with FIGS.
1-20. In step 400, entropy decoded (EDC) data is generated from an
encoded video signal. In step 402, a decoded video signal is
generated from the EDC data, via a plurality of vector processor
units, in response to a plurality of operational instructions
including at least one vector read instruction that includes vector
read orientation data, wherein at least one of the plurality of
vector processor units operates by: storing matrix data in a matrix
memory corresponding to a plurality of rows and columns; generating
vector read data in a first read orientation when the vector read
orientation data has a first value; and generating vector read data
in a second read orientation when the vector read orientation data
has a second value.
[0087] In an embodiment of the present invention, the vector read
data in the first read orientation corresponds to matrix data in
one of the plurality of rows of the matrix memory, and the vector
read data in the second read orientation corresponds to matrix data
in one of the plurality of columns of the matrix memory.
[0088] The plurality of operational instructions can further
include at least one vector write instruction that includes vector
write orientation data, and wherein the at least one of the
plurality of vector processor units further operates by: storing
vector write data in a first write orientation when the vector
write orientation data has a third value; and storing vector write
data in a second write orientation when the vector write
orientation data has a fourth value. In an embodiment of the
present invention, the vector write data in the first write
orientation corresponds to matrix data in one of the plurality of
rows of the matrix memory, and the vector write data in the second
write orientation corresponds to matrix data in one of the
plurality of columns of the matrix memory.
[0089] FIG. 22 presents a flow diagram representation of a method
in accordance with an embodiment of the present invention. In
particular, a method is presented for use in conjunction with one
or more functions and features described in conjunction with FIGS.
1-21. In step 410, entropy decoded (EDC) data is generated from an
encoded video signal. In step 412, a decoded video signal is
generated from the EDC data, via a plurality of vector processor
units, in response to a plurality of operational instructions
including at least one matrix multiply instruction that includes
matrix input configuration data, wherein at least one of the
plurality of vector processor units operates by: generating output
data based on a multiplication of first input data and second input
data in accordance with the matrix input configuration data,
wherein the matrix input configuration data indicates the
dimensionality of the first input data and the second input
data.
[0090] In an embodiment of the present invention, the first input
data is formatted as a 1.times.8 matrix when the matrix input
configuration data has a first value, is formatted as an 8.times.8
matrix when the matrix input configuration data has a second value.
The second input data is formatted as a 1.times.8 matrix when the
matrix input configuration data has a first value and is formatted
as an 8.times.8 matrix when the matrix input configuration data has
a second value. The first input data and the second input data can
both be formatted as a 4.times.4 matrix when the matrix input
configuration data has another value.
[0091] While particular combinations of various functions and
features of the present invention have been expressly described
herein, other combinations of these features and functions are
possible. Such combinations are not limited by the particular
examples disclosed herein and are expressly incorporated within
the scope of the present invention.
[0092] As one of ordinary skill in the art will appreciate, the
term "substantially" or "approximately", as may be used herein,
provides an industry-accepted tolerance to its corresponding term
and/or relativity between items. Such an industry-accepted
tolerance ranges from less than one percent to twenty percent and
corresponds to, but is not limited to, component values, integrated
circuit process variations, temperature variations, rise and fall
times, and/or thermal noise. Such relativity between items ranges
from a difference of a few percent to magnitude differences. As one
of ordinary skill in the art will further appreciate, the term
"coupled", as may be used herein, includes direct coupling and
indirect coupling via another component, element, circuit, or
module where, for indirect coupling, the intervening component,
element, circuit, or module does not modify the information of a
signal but may adjust its current level, voltage level, and/or
power level. As one of ordinary skill in the art will also
appreciate, inferred coupling (i.e., where one element is coupled
to another element by inference) includes direct and indirect
coupling between two elements in the same manner as "coupled". As
one of ordinary skill in the art will further appreciate, the term
"compares favorably", as may be used herein, indicates that a
comparison between two or more elements, items, signals, etc.,
provides a desired relationship. For example, when the desired
relationship is that signal 1 has a greater magnitude than signal
2, a favorable comparison may be achieved when the magnitude of
signal 1 is greater than that of signal 2 or when the magnitude of
signal 2 is less than that of signal 1.
[0093] As the term module is used in the description of the various
embodiments of the present invention, a module includes a
functional block that is implemented in hardware, software, and/or
firmware that performs one or more functions such as the
processing of an input signal to produce an output signal. As used
herein, a module may contain submodules that themselves are
modules.
[0094] Thus, there has been described herein an apparatus and
method, as well as several embodiments including a preferred
embodiment, for implementing a video decoder. Various embodiments
of the present invention herein-described have features that
distinguish the present invention from the prior art.
[0095] It will be apparent to those skilled in the art that the
disclosed invention may be modified in numerous ways and may assume
many embodiments other than the preferred forms specifically set
out and described above. Accordingly, it is intended by the
appended claims to cover all modifications of the invention which
fall within the true spirit and scope of the invention.
* * * * *