U.S. patent number 6,847,686 [Application Number 09/988,878] was granted by the patent office on 2005-01-25 for video encoding device.
This patent grant is currently assigned to Broadcom Corporation. Invention is credited to Amir Morad, Leonid Yavits.
United States Patent 6,847,686
Morad, et al.
January 25, 2005
Video encoding device
Abstract
Video encoding device including a video input processor, for
receiving said video signal, a global controller, for controlling
the global operation of the video encoding device, a motion
estimation processor, a digital signal processor and a bit-stream
processor, wherein the global controller stores encoding commands
received from a host interface thereby programming the video input
processor, the motion estimation processor, the digital signal
processor and the bit-stream processor, the video input processor
receives and stores the video signal in an external memory unit,
the motion estimation processor retrieves the video signal from the
memory unit, generates motion analysis of the video signal, stores
the motion analysis in the memory unit and provides the motion
analysis to the digital signal processor, the digital signal
processor processes the video signal according to the motion
analysis, thereby producing an encoding commands sequence and
encoded data, the bit-stream processor produces an encoded video
signal according to the encoding command sequence and the encoded
data.
Inventors: Morad; Amir (Tel Aviv, IL), Yavits; Leonid (Haifa, IL)
Assignee: Broadcom Corporation (Irvine, CA)
Family ID: 11070893
Appl. No.: 09/988,878
Filed: November 19, 2001
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number | Issue Date
010859 | Jan 22, 1998 | 6385244 |
Foreign Application Priority Data
Current U.S. Class: 375/240.16; 375/E7.094; 375/E7.103
Current CPC Class: H04N 19/423 (20141101); H04N 19/53 (20141101); H04N 19/436 (20141101); H04N 19/43 (20141101)
Current International Class: G06T 9/00 (20060101); H04N 7/26 (20060101); H04N 007/12 ()
Field of Search: 348/394.1,402.1,403.1,412.1,430.1,404.1,699; 375/240.01,240.02,240.12,240.13,240.16; 382/232,234,238,253; 345/112; 704/501
Primary Examiner: Diep; Nhon
Attorney, Agent or Firm: McAndrews, Held & Malloy
Parent Case Text
This is a continuation of application Ser. No. 09/010,859, filed
Jan. 22, 1998, now U.S. Pat. No. 6,385,244. Each of these prior
applications is hereby incorporated by reference, in its entirety.
Claims
What is claimed is:
1. A digital signal processor for processing a multiple frame video
digital signal, comprising: a DSP controller, a plurality of
processing units connected to said DSP controller for processing
said multiple frame video digital signal; and at least one storage
unit, wherein each of said processing units is connected to at
least one of said at least one storage units, said DSP controller
controlling said plurality of processing units, wherein said DSP
controller, said plurality of processing units, and said at least
one storage unit are on a single chip.
2. The digital signal processor according to claim 1 wherein each
of said processing units is operative to access any storage address
of any of said at least one storage unit, connected thereto.
3. The digital signal processor according to claim 1 wherein each
of said processing units operates according to a different program
command.
4. The digital signal processor according to claim 1 wherein each
of said processing units operates on a different portion of data.
Description
FIELD OF THE INVENTION
The present invention relates to video encoding devices in general
and to single chip video encoding devices, in particular.
BACKGROUND OF THE INVENTION
Methods for encoding an audio-visual signal are known in the art.
According to these methods, a video signal is digitized, analyzed
and encoded in a compressed manner. These methods are implemented
in computer systems, either in software, hardware or a combined
software-hardware form.
Most hardware encoding systems consist of a set of semiconductor
circuits, which are arranged on a large circuit board. State of the
art encoding systems include a single semiconductor circuit, which
is based on a high power processor.
Reference is now made to FIG. 1, which is a schematic illustration
of a video encoding circuit, referenced 10, which is known in the
art.
Circuit 10 includes a motion estimation processor 12, a motion
estimation memory 14 connected to the motion estimation processor
12, a RISC processor 16 connected to the motion estimation
processor 12 and an image buffer 18, connected to RISC processor
16.
RISC processor 16 transfers a portion of video signal from image
buffer 18 to memory unit 14. Motion estimation processor 12
analyzes the motion of the video signal. Motion estimation
processor 12 utilizes memory unit 14 as a storage area for the
video signal portion which is currently processed by it. When the
motion estimation processor 12 has completed analyzing the motion of a
video signal portion, it transfers the results of the motion
estimation analysis to the RISC processor 16.
The RISC processor 16 performs all other processing and encoding
tasks which the video signal has to undergo, such as discrete
cosine transform (DCT), quantization, entropy encoding, bit-stream
production and the like. The RISC processor 16 utilizes the image
buffer 18 as a storage area for the video signal portion which is
currently processed by it, and as a temporary storage for its
computational purposes.
It will be appreciated by those skilled in the art that such
encoding systems have several disadvantages. For example, one
disadvantage of circuit 10 is that each of the processing units 12
and 16 has a separate storage area. Accordingly, each of the
processed portions of the video signal, such as ISO/IEC 13818
(MPEG-2) macro-blocks, has to be transferred to both memory unit
14 and image buffer 18. RISC processor 16 has to access image
buffer 18 for the same data, each time this data is required. Such
retrieval of large data blocks, many times, greatly increases data
traffic volume over the encoding system data transmission
lines.
Another disadvantage is that circuit 10 is able to execute all
processing and encoding tasks only in a serial manner, and is thereby
capable of processing only a single macro-block at a time, requiring
high operational processor frequencies. Circuit 10 receives a
macro-block, processes it and produces an encoded bit-stream.
Internally, the RISC processor 16 operates in the same manner.
Hence, as long as the RISC processor 16 has not completed
transmitting the encoded bit-stream of a selected macro-block, it
cannot receive the next macro-block.
It will be appreciated by those skilled in the art that the
operational frequency of circuit 10 has a direct effect on the
heat produced by it, thereby requiring large cooling elements as
well as massive cooling devices such as fans and the like.
It will be appreciated by those skilled in the art that such
circuit structure requires that input-output (I/O)
operations have to be performed extremely fast, thereby greatly
increasing the storage memory bandwidth requirements.
Another disadvantage of such systems is that all processing and
encoding procedures (excluding motion estimation) are executed by
the same RISC processor. In this case, the same circuit performs
various types of computations, which makes the utilization of the
processor's hardware resources very inefficient.
Methods for estimating motion in a video signal are known in the
art. According to these methods a frame is compared with previous
frames. The difference between the frames is used to estimate a
level of motion. These methods analyze a frame and map it, thereby
indicating areas in the frame which have no motion over previous frames
and areas in the frame which are assigned with a motion level.
According to one such method, each pixel in the search area is
analyzed. This method requires a vast number of estimation
operations and is thereby extremely resource consuming. This method
is also called a full exhaustive search.
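The cost of a full exhaustive search can be illustrated with a minimal sketch. The matching criterion used here, a sum of absolute differences (SAD), the function names and the frame layout (a list of pixel rows) are assumptions made for illustration only; the text above does not name a specific criterion.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n-by-n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, radius):
    """Evaluate every displacement within +/-radius; return (dx, dy, cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that fall outside the reference frame.
            if bx + dx < 0 or bx + dx + n > width:
                continue
            if by + dy < 0 or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

With a search radius r, up to (2r+1)^2 candidate positions are evaluated for every block, and each evaluation touches every pixel of the block, which is why this approach is so resource consuming.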
According to another method, known in the art, the search area is
scanned in a center weighted manner, which can be logarithmic, and
the like, whereby the center of the search area is scanned
thoroughly at full resolution and the rest of the search area is
scanned at lower resolution. Areas which are detected as having some
motion, in the low resolution search, are scanned again in full
resolution. This reduces the overall number of estimation
operations.
Reference is now made to FIG. 2, which is a schematic illustration
of a DSP processor, referenced 50, which is known in the art.
DSP processor 50 is a single instruction multiple data (SIMD)
type machine. It includes a plurality of identical processing units
(P.U.) 52, 56, 60, 64, 68 and 72, and a random access memory (RAM)
61. RAM 61 is divided into segments 54, 58, 62, 66, 70 and 74.
Each memory segment is exclusively assigned and connected to a
processing unit, whereby RAM segments 54, 58, 62, 66, 70 and
74 are assigned to and connected to processing units (P.U.) 52, 56,
60, 64, 68 and 72, respectively.
This structure has several disadvantages. One disadvantage of such
a machine is that the same operation is performed by all of the
processing units at the same time.
Another disadvantage of the SIMD machine is that the data is not
shared among the processing units. For example, processing unit 56
can access data contained in RAM segment 66 via processing unit 64
only. It cannot do so directly. It will be appreciated by those
skilled in the art that such a configuration is inefficient.
A further disadvantage is that individual operations that vary for
different data items cannot be efficiently performed by an SIMD
machine. The programming of such operations into the processing
units is very difficult. Such individual operations can only be
performed in a serial manner, while masking all irrelevant data,
resulting in shutting off most of the processing units. The
utilization of the hardware resources in an SIMD machine during
such programming operations is very low, and the performance of the
machine is dramatically decreased.
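The utilization loss caused by masking can be illustrated with a small model. This is a sketch under stated assumptions: the function name, the predicate/operation pairs and the utilization measure (active lane-slots divided by total lane-slots) are hypothetical and are not taken from any particular SIMD machine.

```python
def run_masked(lanes, branches):
    """Model data-dependent work on a lock-step machine: each branch is
    broadcast in its own pass, and lanes whose predicate is false are
    masked off for that pass. Returns (results, utilization)."""
    results = list(lanes)
    handled = [False] * len(lanes)
    active_slots = 0
    for predicate, operation in branches:
        for i, value in enumerate(lanes):
            if not handled[i] and predicate(value):
                results[i] = operation(value)
                handled[i] = True
                active_slots += 1
    total_slots = len(lanes) * len(branches)
    return results, active_slots / total_slots


# Six lanes (as in FIG. 2) and three mutually exclusive operations.
branches = [
    (lambda v: v % 3 == 0, lambda v: v * 10),
    (lambda v: v % 3 == 1, lambda v: v + 100),
    (lambda v: v % 3 == 2, lambda v: -v),
]
results, utilization = run_masked([1, 2, 3, 4, 5, 6], branches)
```

In this example every lane does useful work in only one of the three passes, so the modeled utilization is one third; with more distinct operations it drops further.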
Another disadvantage relates to the interconnection structure
between the processing units. It will be appreciated that a
processing unit within an SIMD machine is connected to a limited
number of neighboring processing units. Hence communication between
such a processing unit and a processing unit not connected thereto
is often a complex operation.
Bit-stream processing and generation, in a conventional encoding
circuit, is performed by a general purpose processor. Bit-stream
generation requires some specific operations, which cannot be
performed efficiently by a general purpose processor. In order to
perform such special operations, a general purpose processor uses a
small portion of its processing resources, while shutting off the rest
of them. Therefore, the disadvantage is that the resources of such a
processor are not utilized efficiently.
SUMMARY OF THE PRESENT INVENTION
It is an object of the present invention to provide a novel device
for encoding an audio-visual signal, which overcomes the
disadvantages of the prior art.
It is another object of the present invention to provide a novel
motion estimation processor.
It is a further object of the present invention to provide a novel
DSP for processing framed information.
In accordance with the present invention there is thus provided a
video encoding system including a video source providing a multiple
frame video signal, a memory unit, a compressed data interface, a
host interface and a video encoding device, connected to the video
source, the memory unit and the host interface. The video encoding
device includes a video input processor, for receiving the video
signal, a global controller, for controlling the global operation
of the video encoding device, connected to the video input
processor, a motion estimation processor, connected to the global
controller, a digital signal processor, connected to the global
controller and the motion estimation processor, and a bit-stream
processor connected to the digital signal processor and the global
controller and the compressed data interface.
The global controller stores encoding commands received from the
host interface thereby programming the video input processor, the
motion estimation processor, the digital signal processor and the
bit-stream processor.
The video input processor receives and stores the video signal in
the memory unit.
The motion estimation processor retrieves the video signal from the
memory unit, generates motion analysis of the video signal, stores
the motion analysis in the memory unit and provides the motion
analysis to the digital signal processor.
The digital signal processor processes the video signal according
to the motion analysis, thereby producing an encoding commands
sequence and encoded data.
The bit-stream processor produces an encoded video signal according
to the encoding command sequence and the encoded data.
According to another aspect of the invention at least one of the
video input processor, the motion estimation processor, the digital
signal processor and the bit-stream processor is directly connected
to the memory unit.
According to another aspect of the invention at least one of the
video input processor, the motion estimation processor, the digital
signal processor and the bit-stream processor is connected to the
memory unit via the global controller.
The motion estimation processor, the digital signal processor and
the bit-stream processor can operate in parallel, whereas the
motion estimation processor operates on a macro-block i, the
digital signal processor operates on a macro-block j and the
bit-stream processor operates on a macro-block k, wherein
i>j>k.
Furthermore, in accordance with a preferred embodiment of the
invention the entire video encoding device is constructed on a
monolithic semiconductor.
According to another aspect of the invention the motion estimation
processor includes a controller and a plurality of resolution
processors, connected to the controller. The plurality of
resolution processors analyze the development of the video signal
in time, thereby producing motion analysis. The controller controls
the plurality of resolution processors.
The plurality of resolution processors include at least one low
resolution processor, for producing low resolution motion analysis,
at least one full resolution processor, for producing full
resolution motion analysis and at least one hyper resolution
processor, for producing hyper resolution motion analysis.
At least one of the at least one low resolution processors reduces the
resolution of a selected frame before producing the low resolution
motion analysis and at least one of the at least one hyper resolution
processors enhances the resolution of a selected frame before
producing the hyper resolution motion analysis.
The plurality of resolution processors include at least one full
resolution processor, for producing full resolution motion
analysis. The one full resolution processor processes the selected
frame according to the low resolution motion analysis.
The plurality of resolution processors include at least one hyper
resolution processor, for producing hyper resolution motion
analysis, wherein at least one of the hyper resolution processors,
processes a selected frame according to the full resolution motion
analysis.
The digital signal processor includes a DSP controller, a plurality
of processing units, for processing the multiple frame video
digital signal, connected to the DSP controller, and at least one
storage unit. Each of the processing units is connected to at least
one of the storage units. The DSP controller controls the plurality
of processing units.
Each of the processing units is operative to access any storage
address of any of the storage units, connected thereto. Each of the
processing units is capable of operating according to a different
program command. Each of the processing units operates on a
different portion of data.
In accordance with a further aspect of the present invention, there
is provided a video camera including an optical assembly, a light
sensitive device, and the video encoding device, connected to the
light sensitive device.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be understood and appreciated more fully
from the following detailed description taken in conjunction with
the drawings in which:
FIG. 1 is a schematic illustration of an encoding circuit, known in
the art;
FIG. 2 is a schematic illustration of a DSP unit, which is known
in the art;
FIG. 3 is a schematic illustration of a video encoding device,
constructed and operative in accordance with a preferred embodiment
of the invention;
FIG. 4 is a schematic illustration of a video encoding device,
constructed and operative in accordance with another preferred
embodiment of the invention;
FIG. 5 is a schematic illustration in detail of the motion
estimation processor, of the video encoding device of FIG. 3,
constructed and operative in accordance with a further
preferred embodiment of the invention;
FIG. 6 is a schematic illustration of a video frame, including
three resolution representations of a portion of this frame;
FIG. 7 is a schematic illustration in detail of the digital signal
processor, of the video encoding device of FIG. 3, constructed and
operative in accordance with another preferred embodiment of the
invention;
FIG. 8 is a schematic illustration in detail of a digital signal
processor, constructed and operative in accordance with a further
preferred embodiment of the invention;
FIG. 9 is a schematic illustration in detail of the video input
processor, of the encoding device of FIG. 3, constructed and
operative in accordance with a further preferred embodiment of the
invention;
FIG. 10 is a schematic illustration in detail of the bit-stream
processor, of the encoding device of FIG. 3, constructed and
operative in accordance with another preferred embodiment of the
invention;
FIG. 11 is a schematic illustration in detail of the global
controller of the encoding device of FIG. 3, constructed and
operative in accordance with another preferred embodiment of the
invention; and
FIG. 12 is a schematic illustration in detail of the encoding
device of FIG. 3, incorporated in a video camera, constructed and
operative in accordance with a further preferred embodiment of the
invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
The present invention overcomes the disadvantages of the prior art
by providing a novel approach to video compression processing and a
novel structure for a device, according to this approach.
The device according to the invention is a massively parallel
digital video processor designed for the purpose of real-time
video encoding, like MPEG. This device can be incorporated in a
single chip, and installed in digital camcorders, recordable
digital video disk (DVD), PC and workstation multimedia,
educational and training systems, video conferencing, broadcast
equipment, security, content creation/authoring/video editing
equipment, and the like.
Reference is now made to FIG. 3, which is a schematic illustration
of a video encoding device, generally referenced 100, constructed
and operative in accordance with a preferred embodiment of the
invention.
Device 100 includes a video input processor 106, a motion
estimation processor 108, a digital signal processor 104, a
bit-stream processor 102 and a global controller 110.
The video input processor 106 is connected to the global controller
110. The motion estimation processor 108 is connected to the global
controller 110 and to the digital signal processor 104. The digital
signal processor 104 is connected to the global controller 110 and
to the bit-stream processor 102, which is also connected to the
global controller 110.
The bit-stream processor 102 is further connected to a compressed
data interface 128. The global controller 110 is further connected
to a host interface 120 and to a memory unit 122. The input of the
video input processor 106 is further connected to a digital video
source (not shown) via a digital video interface 124. The host
is typically a user interface which is operative to receive
commands, operational parameters, and the like, from a user or a
supervising system and also to provide to the user information
received from device 100.
Device 100 operates in two modes: a programming mode and an
operational mode. Device 100 is operative to run according to both
modes at the same time. In the programming mode, an external host
transfers the data and control parameters to the global controller
110, via the host interface 120.
The global controller 110 can transfer the data and control signals
to the video input processor 106, motion estimation processor 108,
digital signal processor 104 and bit-stream processor 102.
In the operational mode, the video input processor 106 captures
a motion video signal from an external video source via the digital
video interface 124. Video input processor 106 also performs
preprocessing of the video signal, such as spatial filtering, noise
reduction, image quality improvement, image size adjustment,
color format conversion, and the like, thereby producing
preprocessed video data.
Video input processor 106 accumulates the preprocessed video data
into data blocks and transfers them to the global controller 110.
Global controller 110 stores the data blocks in memory unit 122. In
the present example, the device operates under MPEG-2 video
compression standard. Hence, a data block represents an MPEG-2
macro-block, which is a sixteen by sixteen [16×16] matrix of
luminance pixels and two, four or eight, eight by eight [8×8]
matrices of chrominance pixels, as defined by the MPEG-2 standard.
A reference frame represents a picture which is compared against the
current picture during the motion estimation.
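The size of such a data block follows directly from this definition. The sketch below assumes that two, four and eight chrominance matrices correspond to the MPEG-2 4:2:0, 4:2:2 and 4:4:4 chroma formats respectively (format labels that the text above does not name), and the 720 by 480 frame size in the usage note is only an illustrative value.

```python
LUMA_SAMPLES = 16 * 16          # one 16x16 luminance matrix
CHROMA_BLOCK_SAMPLES = 8 * 8    # each chrominance matrix is 8x8

# Number of 8x8 chrominance matrices per macro-block (assumed labels).
CHROMA_BLOCKS = {"4:2:0": 2, "4:2:2": 4, "4:4:4": 8}


def macroblock_samples(chroma_format):
    """Total number of pixel samples carried by one macro-block."""
    return LUMA_SAMPLES + CHROMA_BLOCKS[chroma_format] * CHROMA_BLOCK_SAMPLES


def macroblocks_per_frame(width, height):
    """Number of 16x16 macro-blocks needed to cover a frame."""
    cols = (width + 15) // 16   # round up to whole macro-blocks
    rows = (height + 15) // 16
    return cols * rows
```

For instance, `macroblock_samples("4:2:0")` gives 384 samples, and a 720 by 480 frame is covered by `macroblocks_per_frame(720, 480)`, i.e. 1350 macro-blocks.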
The global controller 110 retrieves a current picture macro-block
and reference picture macro-blocks from the memory unit 122 and
loads them to the motion estimation processor 108. Motion
estimation processor 108 compares the current picture macro-block
with the respective reference frame macro-blocks, thereby producing
an estimation of the motion of the current picture macro-block.
The motion estimation processor 108 uses this estimation to remove
temporal redundancy of the video signal, as will be described in
detail hereinbelow. The motion estimation processor 108 transfers
the resulting motion estimation data to the global controller 110.
Motion estimation processor 108 also transfers the current picture
macro-block and the corresponding reference frame macro-blocks to
the digital signal processor 104.
Digital signal processor 104 (DSP) executes procedures which are
intended to remove the spatial redundancy of the video signal,
thereby producing a sequence of compression commands, as will be
described in detail hereinbelow. This sequence of compression
commands includes instructions as to which frame of the original
video signal is to be compressed into an I-frame, a B-frame or a
P-frame, and according to which reference frames.
Then, the digital signal processor 104 transfers the sequence of
compressed data to the bit-stream processor 102 and to the global
controller 110. The bit-stream processor 102 performs a series of
encoding procedures, such as entropy encoding, and the like, as
will be described in detail hereinbelow.
The bit-stream processor 102 compresses the data into MPEG-2
standard format data, in accordance with the sequence of
compression commands.
Then, the bit-stream processor 102 transfers the MPEG-2 standard
format data to the compressed data interface 128. It will be noted
that compressed data interface 128 can be connected to any data
receptacle element such as a storage unit or a communication
transmission line.
Global controller 110 controls and schedules the video input
processor 106, the motion estimation processor 108, the digital
signal processor 104 and the bit-stream processor 102.
Global controller 110 also governs the data transfer among the
motion estimation processor 108, digital signal processor 104 and
the bit-stream processor 102.
The global controller 110 also connects between the external host
and video input processor 106, motion estimation processor 108,
digital signal processor 104 and bit-stream processor 102. In the
operational mode, an external host can access the register file 408
(FIG. 11) of global controller 110 for read and/or write
operations.
According to one aspect of the invention, in operational mode,
macro-blocks are fed into device 100, in a horizontal or vertical
raster scan manner, from the top left macro-block through to the
right bottom macro-block, of a specified frame. Device 100
processes a number of successive macro-blocks, at the same time.
For example, while the bit-stream processor 102 processes the i-th
macro-block, digital signal processor 104 processes the (i+1)-th
macro-block and motion estimation processor 108 processes the
(i+2)-th through (i+4)-th macro-blocks.
According to another example, while the bit-stream processor 102
processes the i-th macro-block, digital signal processor 104
processes the k-th macro-block and motion estimation processor 108
processes the j-th through (j+m)-th macro-blocks, wherein i<k<j
and m≥1.
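The overlapped processing described in these examples can be sketched as a simple pipeline schedule. This is a simplified model, assuming one macro-block per stage per time slot (whereas the text above allows the motion estimation processor to hold several macro-blocks at once); the stage labels and the function name are illustrative only.

```python
STAGES = ("motion estimation", "digital signal processing", "bit-stream")


def pipeline_schedule(num_macroblocks):
    """Return, for each time slot, the macro-block index held by each
    stage (None while the pipeline is filling or draining)."""
    slots = []
    for t in range(num_macroblocks + len(STAGES) - 1):
        slot = {}
        for depth, stage in enumerate(STAGES):
            mb = t - depth  # later stages lag behind by their depth
            slot[stage] = mb if 0 <= mb < num_macroblocks else None
        slots.append(slot)
    return slots
```

In slot 2 of `pipeline_schedule(4)`, the motion estimation stage holds macro-block 2 while the digital signal processing stage holds macro-block 1 and the bit-stream stage holds macro-block 0, so each macro-block may stay in a stage for a full slot rather than a fraction of it.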
Device 100 overcomes a disadvantage of the prior art by using
memory unit 122 as a shared storage area which is accessible to all
of its internal units, via global controller 110. In the present
example, all access to memory unit 122 is provided via global
controller 110.
Reference is now made to FIG. 4, which is a schematic illustration
of a video encoding device, generally referenced 200, constructed
and operative in accordance with another preferred embodiment of
the invention.
Device 200 is generally similar to device 100 and includes a video
input processor 206, a motion estimation processor 208, a digital
signal processor 204, a bit-stream processor 202 and a global
controller 210.
Device 200 is also connected to a compressed data interface 228, a
memory unit 222, a digital video source (not shown) via a digital
video interface 224 and a host interface 220. In device 200, all of
the internal components are connected directly to memory unit
222.
Accordingly, video input processor 206, motion estimation processor
208, digital signal processor 204, bit-stream processor 202 and
global controller 210 can, each, access any storage address within
memory unit 222, directly, thereby performing any I/O
operation.
It will be noted that a shared memory structure according to the
invention can include a combination of the examples disclosed in
FIGS. 3 and 4, whereby some components are directly connected to
the memory unit and the rest are connected to the memory unit via a
mediating element, such as global controller 110.
Referring back to FIG. 3, it is noted that according to another
aspect of the invention, memory unit 122 can be partitioned into many
sub-areas, whereby each of the internal units of device 100 is
granted an access level which is selected from a list of access
levels such as read-write, read only, write only, no access and the
like. It will be appreciated by those skilled in the art that such a
structure provides tremendous flexibility, whereby the amount of memory
assigned to a selected internal unit can be increased or decreased
dynamically, in real-time.
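One way to picture such a partitioned memory is a table of address ranges, each carrying per-unit access levels. The following is a minimal sketch only; the class names, the unit labels and the address values are hypothetical, and the text above does not specify how the access levels are stored or enforced.

```python
from enum import Flag


class Access(Flag):
    NONE = 0
    READ = 1
    WRITE = 2
    READ_WRITE = 3


class PartitionTable:
    def __init__(self):
        self._regions = []  # (start, end, {unit name: Access})

    def add_region(self, start, end, rights):
        """Register the half-open address range [start, end)."""
        self._regions.append((start, end, dict(rights)))

    def allowed(self, unit, address, wanted):
        """True if `unit` holds the `wanted` access level at `address`."""
        for start, end, rights in self._regions:
            if start <= address < end:
                return wanted in rights.get(unit, Access.NONE)
        return False


table = PartitionTable()
table.add_region(0x0000, 0x4000, {
    "video_input": Access.WRITE,         # writes captured data blocks
    "motion_estimation": Access.READ,    # only reads them back
})
```

Growing or shrinking the memory assigned to a unit then amounts to replacing a region entry, which is consistent with the dynamic re-assignment mentioned above.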
According to another aspect of the present invention, device 100
performs different processing and encoding procedures in parallel,
by processing a number of successive macro-blocks simultaneously.
Hence, a selected macro-block is permitted to be present in the
device 100 for an extended period of time, in comparison with
circuit 10 (FIG. 1), thereby greatly reducing the operational
frequency of device 100, by a factor of at least five.
Reference is now made to FIGS. 5 and 6. FIG. 5 is a schematic
illustration in detail of motion estimation processor 108,
constructed and operative in accordance with a further preferred
embodiment of the invention. FIG. 6 is a schematic illustration of
a video frame, generally referenced 170, including three resolution
representations of a portion of this frame, generally referenced A,
B and C.
Motion estimation processor 108 includes a low resolution processor
150, a full resolution processor 152, a hyper resolution processor
154 and a micro-controller 158. The frame input of the motion
estimation processor 108 is connected to the inputs of the low
resolution processor 150, the full resolution processor 152 and the
hyper resolution processor 154.
The frame output of motion estimation processor 108 is connected to
the hyper resolution processor 154. The micro-controller 158 is
connected to the low resolution processor 150, the full resolution
processor 152, the hyper resolution processor 154 and the control
port of the motion estimation processor 108, which is connected to
global controller 110 (FIG. 3).
Referring now to FIG. 6, frame 170 is a [6×6] matrix which
includes a digital representation of a video frame F_i. The
currently scanned block is block B, which is a [4×4] matrix
from pixel B:(1,1) to pixel B:(4,4). Frame 170 is provided to
processors 150, 152 and 154 (FIG. 5).
Referring back to FIG. 5, motion estimation processor 108 can
operate in two modes: a programming mode and an operational mode.
The motion estimation processor can operate according to both modes
at the same time.
In the programming mode, the global controller 110 (FIG. 3)
provides control parameters as well as data parameters, to
micro-controller 158, via the control port.
Micro-controller 158 controls and synchronizes processors 150, 152
and 154. According to a preferred embodiment of the present
invention, processors 150, 152 and 154 operate either in a parallel
or in a serial manner.
In an example of parallel operation, the low resolution processor
150 processes the i-th macro-block, while the full resolution processor
152 processes the (i-1)-th macro-block, whereas the hyper resolution
processor 154 processes the (i-2)-th macro-block, all at the same
time.
In an example of serial operation, the low resolution processor 150
processes the i-th macro-block, while both full resolution processor
152 and the hyper resolution processor 154 process the (i+1)-th
macro-block in a serial manner.
Low resolution processor 150 operates as follows. The global
controller 110 loads the current picture macro-block and the
reference pictures data blocks into the low resolution processor
150. The low resolution processor 150 performs a resolution
reduction, resulting in decreasing the amount of image data. The
low resolution processor 150 can perform the resolution reduction
by different methods, like decimation, low pass filtering,
non-linear filtering, and the like.
Reference is now made to FIG. 6. In the present example, low
resolution processor 150 (FIG. 5) generates a low resolution block
A from block B. Block A is a [2×2] matrix, wherein pixel
A:(1,1) is a combined representation of pixels B:(1,1), B:(1,2),
B:(2,1) and B:(2,2), pixel A:(2,1) is a combined representation of
pixels B:(3,1), B:(3,2), B:(4,1) and B:(4,2), pixel A:(1,2) is a
combined representation of pixels B:(1,3), B:(1,4), B:(2,3) and
B:(2,4) and pixel A:(2,2) is a combined representation of pixels
B:(3,3), B:(3,4), B:(4,3) and B:(4,4).
It will be noted that such pixel combination can be performed in
many ways such as calculating the average value of the combined
pixels, selecting the dominant one, and the like.
The resolution reduction of the present example is at a ratio of
1:4. It will be noted that low resolution processor 150 can perform
a resolution reduction at any ratio desired.
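As an illustration of the averaging option mentioned above, the
following Python sketch reduces a block by combining each 2×2 group
of pixels into one. The function name reduce_resolution and its
factor parameter are hypothetical; the specification does not
prescribe a particular combination method.

```python
def reduce_resolution(block, factor=2):
    """Average each factor x factor tile of `block` into one pixel."""
    rows, cols = len(block), len(block[0])
    if rows % factor or cols % factor:
        raise ValueError("block dimensions must be multiples of factor")
    reduced = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Gather the pixels that are combined into one output pixel.
            tile = [block[r + dr][c + dc]
                    for dr in range(factor) for dc in range(factor)]
            out_row.append(sum(tile) / len(tile))
        reduced.append(out_row)
    return reduced
```

Applied to a 4×4 block B with the default factor, this yields a 2×2
block A, so that a quarter of the original pixel count remains.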
After the low resolution processor 150 completes the resolution
reduction, it performs a search procedure. The low resolution
processor 150 can perform different types of search, like full exhaustive
search, telescopic search, and the like, thereby producing low
resolution motion analysis. After the search is completed, the
global controller 110 (FIG. 3) reads the low resolution motion
analysis data from the low resolution processor 150, via the local
controller 158.
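A full exhaustive search can be sketched as follows. The matching
criterion used here, the sum of absolute differences (SAD), is an
assumption, since the specification does not name a cost function;
the names sad and full_search are likewise hypothetical.

```python
def sad(current, reference, top, left):
    """Sum of absolute differences between `current` and a reference window."""
    return sum(
        abs(current[r][c] - reference[top + r][left + c])
        for r in range(len(current))
        for c in range(len(current[0]))
    )


def full_search(current, reference, origin, search_range):
    """Test every displacement within +/- search_range around `origin`.

    Returns ((dy, dx), cost) for the best match lying fully inside
    `reference`, or None if no candidate fits. Ties keep the first
    candidate found.
    """
    height, width = len(current), len(current[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = origin[0] + dy, origin[1] + dx
            if top < 0 or left < 0:
                continue
            if top + height > len(reference) or left + width > len(reference[0]):
                continue
            cost = sad(current, reference, top, left)
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    return best
```

The same routine applies at any resolution; only the block that is
passed in and the search range differ.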
Full resolution processor 152 operates as follows: The global
controller 110 loads the current picture block (referenced B in
FIG. 6) and the reference pictures data block into the full
resolution processor 152, according to the low resolution motion
analysis. Then, the full resolution processor 152 performs a search
procedure.
The full resolution processor 152 can perform different types of
search, like full exhaustive search, telescopic search, and the
like, thereby producing full resolution motion analysis. After the
search is completed, the global controller 110 reads the full
resolution motion analysis data from the full resolution processor
152, via the local controller 158.
Full resolution processor 152 and hyper resolution processor 154
can have a shared storage area. This aspect of the invention
reduces the memory requirements of the encoding system.
Hyper resolution processor 154 operates as follows. The global
controller 110 loads the current picture macro-block and the
reference pictures data blocks into the hyper resolution processor
154. The hyper resolution processor 154 multiplies the resolution
of the image data, enabling a motion prediction with a resolution
higher than a single pixel step.
Hyper resolution processor 154 can perform the resolution
multiplication by different methods, like zero order interpolation,
first order interpolation, and the like.
With reference to FIG. 6, in the present example, hyper resolution
processor 154 (FIG. 5) generates a hyper resolution block C from
block B. Block C is a [9×9] matrix, wherein pixels C:(1,1),
C:(3,1), C:(5,1), C:(7,1), C:(1,3), C:(3,3), C:(5,3), C:(7,3),
C:(1,5), C:(3,5), C:(5,5), C:(7,5), C:(1,7), C:(3,7), C:(5,7) and
C:(7,7) are equal to pixels B:(1,1), B:(2,1), B:(3,1), B:(4,1),
B:(1,2), B:(2,2), B:(3,2), B:(4,2), B:(1,3), B:(2,3), B:(3,3),
B:(4,3), B:(1,4), B:(2,4), B:(3,4) and B:(4,4), respectively.
Each other pixel in block C can be an interpolation of selected
pixels of block B adjacent to its respective place therein. For
example, pixel C:(2,2) is an interpolation of pixels B:(1,1),
B:(1,2), B:(2,1) and B:(2,2). Pixel C:(2,1) is an interpolation of
pixels B:(1,1) and B:(2,1). Pixel C:(1,0) is an interpolation of
pixels B:(1,0) and B:(1,1).
The resolution enhancement of the present example is at a ratio of
4:1. It will be noted that hyper resolution processor 154 can
perform a resolution enhancement at any ratio desired.
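The first order interpolation option can be sketched in Python as
shown here. This sketch covers only the part of block C that lies
between original pixels (a 7×7 result for a 4×4 input); the outer
ring of the [9×9] block appears to need pixels from outside block B,
such as B:(1,0), and is omitted. The function name
upsample_first_order is hypothetical, and indices are zero-based,
so original pixels land on even positions rather than odd ones.

```python
def upsample_first_order(block):
    """Double the sampling density of `block` by first order interpolation.

    The output has size (2*rows - 1) x (2*cols - 1). Original pixels
    sit at even indices; every other pixel is the mean of its nearest
    original neighbours (two between adjacent pixels, four at the
    centre of a 2x2 group).
    """
    rows, cols = len(block), len(block[0])
    out = [[0.0] * (2 * cols - 1) for _ in range(2 * rows - 1)]
    for r in range(2 * rows - 1):
        for c in range(2 * cols - 1):
            # For even indices lo == hi, so the original pixel is reused.
            r_lo, r_hi = r // 2, (r + 1) // 2
            c_lo, c_hi = c // 2, (c + 1) // 2
            out[r][c] = (block[r_lo][c_lo] + block[r_lo][c_hi]
                         + block[r_hi][c_lo] + block[r_hi][c_hi]) / 4
    return out
```

A search carried out on the enlarged block then moves in steps of
half a pixel relative to the original image data.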
After the hyper resolution processor 154 completes the resolution
multiplication, it performs a search procedure. The hyper
resolution processor 154 can perform different types of search,
like full exhaustive search, telescopic search, and the like,
thereby producing hyper resolution motion analysis. After the
search is completed, the global controller 110 reads the hyper
resolution motion analysis data, from the hyper resolution
processor 154, via the local controller 158.
A motion estimation processor according to another embodiment of
the invention can include as many resolution processors as desired,
wherein some resolution processors are low resolution processors,
at a variety of low resolutions and other resolution processors are
hyper resolution processors, at a variety of hyper resolutions.
Reference is now made to FIG. 7, which is a schematic illustration
in detail of digital signal processor 104, of FIG. 3, constructed
and operative in accordance with another preferred embodiment of
the invention.
Digital signal processor 104 includes a master controller 250, a
random access memory (RAM) unit 270 and N processing units 252, 254, 256,
258, 260 and 262. Each processing unit 252, 254, 256, 258, 260 and
262 is connected to the master controller 250 and to the RAM
270.
The processing units 252, 254, 256, 258, 260 and 262 are further
connected to motion estimation processor 108 and to the global
controller 110 (FIG. 3), for retrieving macro block data
therefrom.
Digital signal processor 104 can operate in two modes: a
programming mode and an operational mode. Digital signal processor
104 can operate according to both modes at the same time. In the
programming mode, the global controller 110 (FIG. 3) transfers data
and control parameters to/from master controller 250. The master
controller 250 can independently program each processing unit 252,
254, 256, 258, 260 and 262.
In the operational mode, the master controller 250 and all
processing units 252, 254, 256, 258, 260 and 262, operate in
parallel. The motion estimation processor 108 (FIG. 3) transfers
the current macro-block and the reference macro-blocks data to the
processing units 252, 254, 256, 258, 260 and 262 of the digital
signal processor 104. In the present example, the motion estimation
processor 108 (FIG. 3) provides luminance macro-blocks and the
global controller 110 provides chrominance macro-blocks retrieved
from memory unit 122.
The global controller 110 transfers the appropriate data (like
motion vectors, macro-block type, prediction type, and the like) to
the master controller 250. The master controller 250 performs
special processing procedures such as rate control, DCT type
selection, macro-block type selection and the like.
The master controller 250 distributes control instructions to the
processing units 252, 254, 256, 258, 260 and 262, and receives
processed data from each of these processing units. Processing
units 252, 254, 256, 258, 260 and 262 perform processing procedures
on large data blocks, such as discrete cosine transform (DCT),
inverse discrete cosine transform (IDCT), quantization, inverse
quantization, frame type decisions, and the like.
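Two of the block procedures named above, DCT and quantization, can
be sketched in Python as follows. This is a direct, unoptimized form
for illustration; the uniform quantizer with a single step size is a
simplifying assumption, and the names dct_2d and quantize are
hypothetical.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (direct O(N^4) form)."""
    n = len(block)

    def alpha(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


def quantize(coeffs, step):
    """Uniform quantization: divide by `step` and round to the nearest integer."""
    return [[round(value / step) for value in row] for row in coeffs]
```

For a flat block all of the energy falls into the single coefficient
at position (0, 0), which is what makes the subsequent run-length
encoding effective.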
Each of these processing units processes different data blocks.
Each processing unit can access the data blocks associated with
other processing units, in RAM 270. All processing units can execute
different operations in parallel. The processing units transfer the
processed coefficient data to the bit-stream processor 102 (FIG.
3). The master controller 250 of the digital signal processor 104
transfers the appropriate data (like macro-block type, DCT type,
quantizer scale, etc.) to the global controller 110 (FIG. 3).
The present invention overcomes the disadvantages of the prior art
by configuring the master controller 250 so as to perform
individual processing tasks on some data items while the processing
units 252, 254, 256, 258, 260 and 262, simultaneously perform
massive processing tasks on large data blocks.
According to the present example, the master controller 250
temporarily assigns a storage area in RAM 270, to each of the
processing units 252, 254, 256, 258, 260 and 262.
Hence, each processing unit 252, 254, 256, 258, 260 and 262 can
access the data which is associated with the other processing
units. It will be appreciated by those skilled in the art that such
a structure greatly enhances the efficiency of processing and data
transfer operations in DSP 104. Such parallel access structure of
the processing units also allows very fast and efficient data
transfer to and from the digital signal processor 104.
Reference is now made to FIG. 8, which is a schematic illustration
in detail of a digital signal processor, generally referenced 144,
constructed and operative in accordance with a further preferred
embodiment of the invention.
Digital signal processor 144 is generally similar to digital signal
processor 104, with a slightly different memory structure. Digital
signal processor 144 includes a master controller 288, N processing
units 271, 274, 276, 280, 282 and 286 and N/2 random access memory
(RAM) units 272, 278 and 284.
Each RAM unit is connected to two processing units. Processing units
271 and 274 are connected to RAM unit 272. Processing units 276 and
280 are connected to RAM unit 278. Processing units 282 and 286 are
connected to RAM unit 284.
Each processing unit is able to access any address in the RAM unit
connected thereto.
According to FIGS. 7 and 8 it will be appreciated that the
invention is not limited to any shared memory structure between
processing units.
Reference is now made to FIG. 9, which is a schematic illustration
in detail of video input processor 106, of FIG. 3, constructed and
operative in accordance with a further preferred embodiment of the
invention.
Video input processor 106 includes a video capture unit 230, a
video preprocessor 232 and a temporary video storage 236. The
inputs of the video capture unit 230, the video preprocessor 232
and the temporary video storage 236 are connected to the video
input processor 106 (FIG. 3). The input of the video capture unit
230 is connected to video input processor 106. The video capture
unit 230 is connected to the video preprocessor 232. The video
preprocessor 232 is connected to the temporary video storage 236.
The output of the temporary video storage 236 is connected to the
global controller 110.
Video input processor 106 operates in two modes: programming and
operational. Video input processor 106 is operative to run
according to both modes at the same time. In the programming mode,
the global controller 110 (FIG. 3) loads data and control
parameters to the video capture unit 230, to the video preprocessor
232 and to the temporary video storage 236.
In the operational mode, the video capture unit 230 acquires the
input video signal. The video capture unit 230 is self synchronized
with the input video signal, according to its format (NTSC, PAL,
SECAM, and the like), programmable resolution (D1, SIF, QSIF, and
the like), and the like.
The video capture unit 230 also provides video synchronization
signals (like a new frame start, a new field start, etc.) to the
global controller 110.
The video preprocessor 232 performs a series of video processing
procedures to enhance the captured video signal. The video
processing procedures can include a color format conversion, size
reduction, noise reduction, edge sharpening, image quality
improvement, and the like.
The temporary video storage 236 accumulates the processed video
signal and provides a "data ready" signal to the global controller
110 (FIG. 3). The global controller 110 reads the accumulated image
data from the temporary video storage 236. The global controller
110 also provides control signals to the video input processor
106.
The use of the temporary video storage 236 makes it possible to
efficiently match the data rate of an external video signal to that
of the internal data transfer. The video input processor 106 can
accumulate the processed video signal at a real-time, variable rate,
whereas the
global controller 110 can transfer the accumulated data to the
memory unit 122 in a burst. This greatly reduces the memory
bandwidth requirements, and makes the usage of a memory unit 122
more efficient.
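The rate adaptation described above can be sketched as a small
buffer that accepts items one at a time and releases them in
fixed-size bursts. The class name BurstBuffer, its methods and the
burst_size parameter are hypothetical and serve only to illustrate
the "data ready" handshake.

```python
from collections import deque


class BurstBuffer:
    """Accumulate items at an arbitrary rate and release fixed-size bursts."""

    def __init__(self, burst_size):
        self.burst_size = burst_size
        self._items = deque()

    def write(self, item):
        """Store one item; return True when a full burst is ready."""
        self._items.append(item)
        return self.data_ready()

    def data_ready(self):
        """Report whether at least one complete burst is stored."""
        return len(self._items) >= self.burst_size

    def read_burst(self):
        """Remove and return one burst, oldest item first."""
        if not self.data_ready():
            raise RuntimeError("no complete burst available")
        return [self._items.popleft() for _ in range(self.burst_size)]
```

In this picture the writer plays the role of the video preprocessor
232 and the reader plays the role of the global controller 110,
which polls or is signalled via data_ready.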
Reference is now made to FIG. 10, which is a schematic illustration
in detail of bit-stream processor 102, of FIG. 3, constructed and
operative in accordance with a further preferred embodiment of the
invention.
Bit-stream processor 102 includes a local processing unit 356, an
entropy encoder 358, a temporary coefficient storage 360, a
bit-stream formatter 352, an output buffer 354 and a
micro-controller 350. The input of the bit-stream processor 102 is
connected to the input of the temporary coefficient storage 360 and
of the local processing unit 356. The temporary coefficient storage
360 is connected to the entropy encoder 358. The entropy encoder
358 is further connected to the local processing unit 356 and to
the bit-stream formatter 352. The local processing unit 356 is
further connected to the output of the bit-stream processor 102.
The bit-stream formatter 352 is connected to the output buffer 354.
The output of the output buffer 354 is connected to the output of
the bit-stream processor 102. The input of the micro-controller 350
is connected to the input of the bit-stream processor 102. The
micro-controller 350 of the bit-stream processor 102 is connected
to the local processing unit 356, the entropy encoder 358, temporary
coefficient storage 360, bit-stream formatter 352 and output buffer
354.
The bit-stream processor 102 can operate in two modes: programming
and operational. Bit-stream processor 102 is operative to run
according to both modes at the same time.
In the programming mode, the global controller 110 (FIG. 3) loads
the data and control parameters to micro-controller 350 and to
local processing unit 356. The digital signal processor 104 (FIG.
3) loads the processed coefficients to the temporary coefficient
storage 360.
In the operational mode, the entropy encoder 358 loads the data
from the local processing unit 356 registers and the temporary
coefficient storage 360. The entropy encoder 358 performs a series
of encoding procedures, like zigzag/alternate scan, run-length
encoding, variable length encoding of data, and the like, thereby
producing encoded data.
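The zigzag scan and run-length steps can be sketched in Python as
follows. Variable length encoding is left out, and the end-of-block
marker "EOB" as well as the function names are assumptions made for
this illustration.

```python
def zigzag_order(n):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals run bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def run_length_encode(values):
    """Encode a sequence as (zero_run, nonzero_value) pairs.

    Trailing zeros are dropped and replaced by a single "EOB" marker.
    """
    pairs = []
    run = 0
    for value in values:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")
    return pairs


def scan_and_encode(block):
    """Zigzag-scan a square coefficient block, then run-length encode it."""
    order = zigzag_order(len(block))
    return run_length_encode([block[r][c] for r, c in order])
```

Because quantized coefficients tend to be zero away from the top-left
corner, the zigzag order groups those zeros into long runs.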
The local processing unit 356 performs arithmetic and logical
operations required to support the entropy encoding. The local
processing unit 356 also provides a temporary storage for the data
loaded from the global controller 110 (FIG. 3) in the programming
mode.
The bit-stream formatter 352 reads the encoded data from the
entropy encoder 358 and formats it into a standard bit-stream. The
output buffer 354 provides a temporary storage to the bit-stream
data. The micro-controller 350 provides the control and
synchronization signals to the local processing unit 356, the
entropy encoder 358, the temporary coefficient storage 360, the
bit-stream formatter 352 and the output buffer 354.
The global controller 110 can put the bit-stream processor 102 into
programming or operational mode by loading an appropriate control
signal to the micro-controller 350 of the bit-stream processor
102.
Entropy encoding and computational operations in the bit-stream
processor 102 are performed in parallel, by operating entropy
encoder 358 and local processing unit 356 simultaneously. This allows a
very efficient utilization of the bit-stream processor 102
resources.
Reference is now made to FIG. 11, which is a schematic illustration
in detail of global controller 110, of FIG. 3, constructed and
operative in accordance with another preferred embodiment of the
invention.
Global controller 110 includes a memory controller 400, an exchange
unit 406, a register file 408, a processing unit 410, a host
storage 404 and a micro-controller 402. The input of the
micro-controller 402 is connected to the input of the global
controller 110. The micro-controller 402 is connected to the memory
controller 400, exchange unit 406, register file 408, processing
unit 410, host storage 404 and to the output of global controller 110. The
external storage interface of the global controller 110 is
connected to the memory controller 400. The input/output of the
memory controller 400 is connected to the input/output of the
global controller 110 respectively. The exchange unit 406 is
connected to the register file 408. The register file 408 is
connected to the processing unit 410 and to the host storage 404 of
the global controller 110. The host storage 404 is connected to the
host interface 120 (FIG. 3).
With further reference to FIG. 3, the global controller 110
schedules, synchronizes and controls motion estimation processor
108, digital signal processor 104, bit-stream processor 102 and
video input processor 106. The global controller 110 controls the
internal data transfer of device 100 and the external data transfer
from device 100 to external devices such as memory unit 122, an
external host, and the like. The global controller 110 can also
initialize and perform a variety of testing procedures on motion
estimation processor 108, digital signal processor 104, bit-stream
processor 102 and video input processor 106 and also memory unit
122. The global controller 110 is a massively parallel processor,
capable of simultaneous execution of computational operations,
internal data transfer and external data transfer.
The global controller 110 operates in two modes: programming and
operational. Global controller 110 is operative to run according to
both modes at the same time.
According to the programming mode, an external host loads the data
and control parameters to the host storage 404. The external host
instructs the global controller 110 to perform according to the
operational mode by loading a predetermined control signal into the
host storage 404.
According to the operational mode, the micro-controller 402 is
synchronized to an external video source by the video
synchronization signal which comes from the video input processor
106 (FIG. 3).
The micro-controller 402 provides control and synchronization
signals to the motion estimation processor 108, the digital signal
processor 104, the bit-stream processor 102 and the video input
processor 106. The micro-controller 402 can also instruct each of
these units to perform in accordance with a programming mode or an
operational mode.
According to the operational mode, the global controller 110 loads
a new frame from the video input processor 106 to the memory
controller 400. The memory controller 400 transfers this data to
the external storage. The memory controller 400 also reads the
current and reference frames macro-blocks from the external storage
and transfers them to the motion estimation processor 108 (FIG.
3).
The memory controller 400 provides the control signals, addresses
and the like to memory unit 122 (FIG. 3).
According to the operational mode, the exchange unit 406 of the
global controller 110 reads and writes different data items to and
from the motion estimation processor 108, the digital signal
processor 104, the bit-stream processor 102 and the video input
processor 106.
The exchange unit 406 transfers the data to and from the register
file 408. In order to support the simultaneous parallel processing
of multiple macro-blocks in device 100, register file 408 maintains
a memory structure such as a stack, which contains the sets of
parameters associated with each macro-block.
This memory structure can be divided into multiple stacks of
variable depth. The processing unit 410 can read the data from the
register file 408, perform various arithmetic and logical
operations, and store the processed data back into register file
408.
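One possible reading of this memory structure is a set of named
stacks, each with its own depth limit, holding one parameter set per
macro-block. The Python sketch below follows that reading; the class
name ParameterStacks, the stack names and the overflow behaviour are
assumptions, not details taken from the specification.

```python
class ParameterStacks:
    """Keep several independent stacks of macro-block parameter sets."""

    def __init__(self, depths):
        """`depths` maps a stack name to its maximum depth."""
        self._depths = dict(depths)
        self._stacks = {name: [] for name in depths}

    def push(self, name, params):
        """Store a parameter set on the named stack."""
        stack = self._stacks[name]
        if len(stack) >= self._depths[name]:
            raise OverflowError(f"stack {name!r} is full")
        stack.append(params)

    def pop(self, name):
        """Remove and return the most recently stored parameter set."""
        return self._stacks[name].pop()

    def depth(self, name):
        """Return the number of parameter sets currently stored."""
        return len(self._stacks[name])
```

Giving each stack its own depth reflects the statement that the
structure can be divided into multiple stacks of variable depth.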
The register file 408 can access the host storage 404 to retrieve
the data which an external host loaded into the host storage 404
during the programming mode.
The register file 408 can also transfer the data to the host
storage 404, such that an external host can access the data during
both programming and operational modes.
Reference is now made to FIG. 12, which is a schematic illustration
of encoding device 100, incorporated in a video camera, generally
referenced 500, constructed and operative in accordance with
another preferred embodiment of the invention.
Camera 500 includes an optical assembly 502, a charge coupled
device (CCD) 510, a host interface 504, a random access memory 506,
a communication interface 520, two storage units 508 and 516 and
encoding device 100.
Device 100 is connected to charge coupled device 510, host interface
504, a communication interface 520, random access memory 506 and
storage units 508 and 516.
Host interface 504 includes a display 514 and a keyboard 512 and
can be used to display the status of encoding device 100 as well as
to receive instructions from a user.
Storage unit 508 is a tape based storage device. Storage unit 516
is a disk based storage device, such as a magnetic hard drive, an
optical storage device, a magneto-optical storage device and the
like. It will be noted that other types of storage devices can also
be used for this purpose, like semiconductor based memory units
such as flash memory, RAM and the like.
CCD 510 converts light, arriving from the optical assembly 502,
representing an image, into an electrical signal. CCD 510 is
preferably a digital light sensitive device which can be replaced
by an analog light sensitive device, followed by an analog to
digital converter, for converting an analog video signal into a
digital video signal.
Then, CCD 510 provides the digital video signal to video input
processor 106, of encoding device 100. The encoding device 100
encodes the digital video signal, produces an encoded video signal
and provides it at the output of bit-stream processor 102. During
the encoding process, the encoding device 100 uses a random access
memory 506 as a temporary storage area for video data as well as
analysis data, produced by its inner components.
Encoding device 100 provides the encoded video signal to storage
devices 508 and 516, and to communication interface 520.
It will be appreciated that any of the storage units 508 and 516
as well as the communication interface 520 can provide a digital
video signal as input for device 100. In the present example,
storage device 516 is also connected to device 100 via the video
input processor 106.
It will be noted that the present invention provides an encoding
device which can easily be adapted to ISO/IEC 11172 (MPEG-1) as
well as other encoding standards which are similar to MPEG such as
ITU-T H.320, H.261 and H.263, as well as different motion JPEG
methods.
It will be appreciated by persons skilled in the art that the
present invention is not limited to what has been particularly
shown and described hereinabove. Rather the scope of the present
invention is defined by the claims which follow.
* * * * *