U.S. patent application number 09/902976, for video compression using adaptive selection of groups of frames, adaptive bit allocation, and adaptive replenishment, was filed with the patent office on July 11, 2001 and published on 2002-02-21.
This patent application is currently assigned to MediaFlow, LLC. The invention is credited to Jang, Seong H., Jayant, Nuggehally S., and Yoon, Janghyun.
Publication Number | 20020021756 |
Application Number | 09/902976 |
Family ID | 22810485 |
Publication Date | 2002-02-21 |
United States Patent Application 20020021756
Kind Code: A1
Jayant, Nuggehally S.; et al.
February 21, 2002
Video compression using adaptive selection of groups of frames, adaptive bit allocation, and adaptive replenishment
Abstract
The present invention provides video signal compression that
efficiently groups pictures in a video stream into variably-sized
groups of pictures (GOPs), thereby providing lower achievable
output signal bit rates and higher output signal quality. The video
signal compression maximizes the output signal quality by
appropriately allocating bits among individual pictures and GOPs in
the output signal. The video signal compression of the present
invention also applies compression methods that reduce noise in the
output signal, by utilizing a macroblock-based tunable conditional
replenishment technique. The conditional replenishment technique
exploits the similarities among images in the variably-sized GOPs
to further minimize output bit rate and maximize the output signal
quality. An analysis-by-synthesis method is also provided to select
a best asynchronous sampling method among various generated
candidate output streams.
Inventors: Jayant, Nuggehally S. (Alpharetta, GA); Jang, Seong H. (Doraville, GA); Yoon, Janghyun (Smyrna, GA)
Correspondence Address: KING & SPALDING, 191 PEACHTREE STREET, N.E., ATLANTA, GA 30303-1763, US
Assignee: MediaFlow, LLC. (4390 Candacraig, Alpharetta, GA)
Family ID: 22810485
Appl. No.: 09/902976
Filed: July 11, 2001
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60217301 | Jul 11, 2000 | |
Current U.S. Class: 375/240.16; 348/700; 348/E5.067; 375/240.12; 375/240.24; 375/E7.148; 375/E7.151; 375/E7.159; 375/E7.163; 375/E7.181
Current CPC Class: G06T 2207/10016 (20130101); H04N 19/152 (20141101); G06T 3/4092 (20130101); H04N 19/172 (20141101); G06T 7/12 (20170101); H04N 19/114 (20141101); H04N 19/527 (20141101); H04N 19/137 (20141101); G06T 9/00 (20130101); G06T 9/007 (20130101); G06T 3/403 (20130101); H04N 19/107 (20141101); H04N 5/147 (20130101); G06T 2207/20192 (20130101)
Class at Publication: 375/240.16; 348/700; 375/240.12; 375/240.24
International Class: H04N 007/12
Claims
What is claimed is:
1. A method for processing an input video stream comprising a
series of pictures, the method comprising the steps of: detecting a
first scene change between a first scene in the input video stream
and a second scene in the input video stream; and classifying a
first picture in the input video stream as a first intra-picture
(I-picture), wherein the first picture coincides with the first
scene change.
2. The method of claim 1, further comprising the steps of:
determining whether there are a predetermined number of pictures
between the first intra-picture and a second scene change;
classifying a second picture in the input video stream as a second
intra-picture, in response to a determination that the
predetermined number of pictures exist between the first
intra-picture and the second scene change, wherein the second
picture coincides with the predetermined number of pictures.
3. The method of claim 2, further comprising the steps of:
classifying a third picture in the input video stream as a third
intra-picture, wherein the third intra-picture coincides with the
second scene change.
4. The method of claim 1, wherein the step of detecting the first scene
change comprises the step of determining whether a change in a
motion vector in the first picture exceeds a predetermined motion
vector threshold.
5. A system for organizing a series of pictures in an input video
stream into at least one group of pictures (GOP), comprising: a
scene change detector operative to detect a scene change in the
series of pictures and to classify a first picture following the
scene change as a first intra-picture (I-picture) and to classify
at least one other picture following the scene change as a
predicted picture (P-picture) and to classify at least one second
picture as a bi-directionally predicted picture (B-picture); and a
bit allocation module operative to determine whether a first GOP
uses less than a predetermined target number of bits and further
operative to allocate an unneeded bit to a second GOP in response
to a determination that the first GOP uses less than the
predetermined target number of bits.
6. The system of claim 5, further comprising a bit rate controller
operative to compare a previous macroblock of a first picture to a
subsequent macroblock in a second picture and to determine that the
subsequent macroblock is different than the previous
macroblock.
7. The system of claim 6, wherein the bit rate controller is
further operative to determine a first criterion characterizing the
relationship between the previous macroblock and the subsequent
macroblock and to compare the first criterion to a first threshold
value.
8. The system of claim 7, further comprising a decoder operative to
represent the subsequent macroblock in an output video stream,
wherein the bit rate controller is further operative to instruct
the decoder to represent the subsequent macroblock in an identical
form as the previous macroblock, in response to a determination
that the first criterion is less than the first threshold
value.
9. The system of claim 7, wherein the bit rate controller is
further operative to instruct a decoder to represent the
subsequent macroblock in a non-identical form as the previous
macroblock, in response to a determination that the first criterion
is not less than the first threshold value.
10. An encoding system for compressing an input video stream having
a series of pictures, the encoding system comprising: a video
encoder operative to receive the input video stream and an input
control stream and to generate an encoded video stream; a picture
grouping module operative to receive the input video stream and to
generate at least one adaptive picture grouping for the pictures in
the encoded video stream; a bit allocation module operative to
receive the input video stream and to adaptively allocate bits
among the series of pictures and to adaptively allocate bits among
the adaptive picture groupings.
11. The encoding system of claim 10, wherein the adaptive grouping
comprises classifying the pictures in the input video stream as
intra-pictures (I-pictures), predicted-pictures (P-pictures), and
bidirectionally predicted pictures (B-pictures).
12. The encoding system of claim 10, further comprising a bit rate
controller operative to compare a previous macroblock of a first
picture to a subsequent macroblock in a second picture and to
determine that the subsequent macroblock is different than the
previous macroblock.
13. The encoding system of claim 12, wherein the bit rate
controller is further operative to determine a first criterion
characterizing the relationship between the previous macroblock and
the subsequent macroblock and to compare the first criterion to a
first threshold value and to instruct a decoder to represent the
subsequent macroblock in an identical form as the previous
macroblock, in response to a determination that the first criterion
is less than the first threshold value.
14. A method for selecting a video stream sampling technique, the
method comprising the steps of: encoding an input video stream
using a first sampling technique to generate a first encoded video
stream; encoding the input video stream using a second sampling
technique to generate a second encoded video stream; comparing at
least one characteristic of the first encoded video stream to at
least one characteristic of the second encoded video stream;
selecting the first encoded video stream as an output encoded video
stream, in response to a determination that the at least one
characteristic of the first encoded video stream is preferable to
the at least one characteristic of the second encoded video stream;
and selecting the second encoded video stream as an output encoded
video stream, in response to a determination that the at least one
characteristic of the second encoded video stream is preferable to
the at least one characteristic of the first encoded video
stream.
15. A method for adaptively grouping pictures in an input video
stream, the method comprising: creating a first group of pictures
(GOP); classifying a first picture in the input video stream as an
intra-picture (I-picture) and adding the first picture to the first
GOP; retrieving a second picture from the input video stream; making
a determination as to whether the second picture in the input video
stream coincides with a scene change; classifying the second
picture as an I-picture, in response to a determination that the
second picture in the input video stream coincides with a scene
change; and classifying the second picture as a non-I-picture and
adding the second picture to the first GOP, in response to a
determination that the second picture in the input video stream
does not coincide with a scene change.
16. The method of claim 15, further comprising the step of creating
a second GOP and adding the second picture to the second GOP, in
response to a determination that the second picture in the input
video stream coincides with a scene change.
17. The method of claim 16, wherein the first GOP and the second
GOP can contain different numbers of pictures.
18. The method of claim 15, wherein the non-I-picture is a
predicted picture (P-picture).
19. The method of claim 15, wherein the non-I-picture is a
bidirectionally predicted picture (B-picture).
20. The method of claim 15, wherein the determination that the
second picture in the input video stream coincides with a scene
change comprises making a determination that a motion vector
corresponding to the second picture has been changed.
Description
PRIORITY AND RELATED APPLICATIONS
[0001] The present application claims priority to provisional
patent application entitled, "Video Processing Method with General
and Specific Applications," filed on Jul. 11, 2000 and assigned
U.S. application Ser. No. 60/217,301. The present application is
also related to non-provisional application entitled, "Adaptive
Edge Detection and Enhancement for Image Processing," (attorney
docket number 07816-105003) filed on Jul. 11, 2001 and assigned
U.S. application Ser. No. ______; and non-provisional application
entitled, "System and
Method for Calculating an Optimum Display Size for a Visual
Object," (attorney docket number 07816-105002) filed on Jul. 11,
2001 and assigned U.S. application Ser. No. ______.
FIELD OF THE INVENTION
[0002] The present invention relates to the processing of a video
stream and more specifically relates to the improvement of video
stream compression by adaptively selecting a group of pictures
based on video stream content, by adaptively allocating bits to
generate a compressed video stream, and by adaptively replenishing
macroblocks.
BACKGROUND OF THE INVENTION
[0003] Recent advancements in communication technologies have
enabled the widespread distribution of data over communication
mediums such as the Internet and broadband cable systems. This
increased capability has led to increased demand for the
distribution of a diverse range of content over these communication
mediums. Whereas early uses of the Internet were often limited to
the distribution of raw data, more recent advances include the
distribution of HTML-based graphics and audio files.
[0004] More recent efforts have been made to distribute video media
over these communication mediums. However, because of the large
amount of data needed to represent a video presentation, the data
is typically compressed prior to distribution. Data compression is
a well-known means for conserving transmission resources when
transmitting large amounts of data or conserving storage resources
when storing large amounts of data. In short, data compression
involves minimizing or reducing the size of a data signal (e.g., a
data file) in order to yield a more compact digital representation
of that data signal. Because digital representations of audio and
video data signals tend to be very large, data compression is
virtually a necessary step in the process of widespread
distribution of digital representations of audio and video
signals.
[0005] Fortunately, video signals are typically well suited for
standard data compression techniques. Most video signals include
significant data redundancy. Within a single video frame (image),
there typically exists significant correlation among adjacent
portions of the frame, referred to as spatial correlation.
Similarly, adjacent video frames tend to include significant
correlation between corresponding image portions, referred to as
temporal correlation. Moreover, there is typically a considerable
amount of data in an uncompressed video signal that is irrelevant.
That is, the presence or absence of that data will not perceivably
affect the quality of the output video signal. Because video
signals often include large amounts of such redundant and
irrelevant data, video signals are typically compressed prior to
transmission and then decompressed again after transmission.
[0006] Generally, the distribution of a video signal involves a
transmission unit and a receiving unit. The transmission unit
receives a video signal as input, compresses the video signal, and
transmits the signal to the receiving unit. Compression of a
video signal is usually performed by an encoder. The encoder
typically reduces the data rate of the input video signal to a
level that is predetermined by the capacity of the transmission
medium. For example, for a typical video file transfer, the
required data rate can be reduced from about 30 megabits per second
to about 384 kilobits per second. The compression ratio is defined
as the ratio between the size of the input video signal and the
size of the compressed video signal. If the transmission medium is
capable of a high transmission rate, then a lower compression
ratio can be used. On the other hand, if the transmission medium
is capable of only a relatively low transmission rate, then a
higher compression ratio must be used.
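The compression ratio defined above can be checked numerically. The Python sketch below is illustrative only: `compression_ratio` is a hypothetical helper, and the rates are the approximate figures quoted in the example.

```python
def compression_ratio(input_bps: float, output_bps: float) -> float:
    """Ratio of the input (uncompressed) data rate to the compressed data rate."""
    if output_bps <= 0:
        raise ValueError("compressed data rate must be positive")
    return input_bps / output_bps


# Approximate figures from the example: ~30 Mbit/s reduced to ~384 kbit/s.
ratio = compression_ratio(30_000_000, 384_000)
print(f"about {ratio:.0f}:1")  # about 78:1

# A slower channel forces a higher ratio for the same input stream.
for channel_bps in (1_500_000, 384_000, 128_000):
    print(channel_bps, compression_ratio(30_000_000, channel_bps))
```

The loop makes the inverse relationship concrete: the lower the rate the transmission medium supports, the higher the compression ratio the encoder has to reach.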
[0007] After the receiving unit receives the compressed video
signal, the signal must be decompressed before it can be adequately
displayed. The decompression process is performed by a decoder. In
some applications, the decoder is used to decompress the compressed
video signal so that it is identical to the original input video
signal. This is referred to as lossless compression, because no
data is lost in the compression and decompression processes. The
majority of encoding and decoding applications, however, use lossy
compression, wherein some predefined amount of the original data is
irretrievably lost in the compression and expansion process. In
order to decompress the video stream to its original (pre-encoding)
data size, the lost data must be replaced by new data.
Unfortunately, lossy compression of video signals will almost
always result in the degradation of the output video signal when
displayed after decoding, because the new data is usually not
identical to the lost original data. Video signal degradation
typically manifests itself as a perceivable flaw in a displayed
video image. These flaws are typically referred to as noise.
Well-known kinds of video noise include blockiness, mosquito noise,
salt-and-pepper noise, and fuzzy edges. The data rate (or bit rate)
often determines the quality of the decoded video stream. A video
stream that was encoded with a high bit rate is generally a higher
quality video stream than one encoded at a lower bit rate.
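The trade-off between bit rate and quality can be illustrated with uniform scalar quantization, one of the simplest lossy operations. This is a toy sketch, not the encoder described in this application: the pel values are hypothetical, and peak signal-to-noise ratio (PSNR) is used as an assumed quality measure. A step size of 1 reproduces integer pels exactly (the lossless case); larger steps leave fewer distinct reconstruction levels, and hence need fewer bits, but the reconstruction error grows.

```python
import math


def quantize(samples, step):
    """Uniform scalar quantization: replace each sample by the nearest multiple of step."""
    return [step * round(s / step) for s in samples]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the sequences match exactly."""
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


row = [12, 15, 19, 24, 30, 37, 45, 54, 64, 75, 87, 100]  # hypothetical 8-bit pels
for step in (1, 4, 16):
    rec = quantize(row, step)
    print(f"step={step:2d} levels={len(set(rec)):2d} PSNR={psnr(row, rec):.1f} dB")
```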
[0008] Conventional methods of compressing video signals include
the partitioning of the video signal into groups of pictures.
Unfortunately, conventional compression techniques utilize
inefficient and arbitrarily simple methods of grouping pictures
that result in higher output signal bit rates and/or lower output
signal quality. Moreover, because these conventional techniques use
arbitrarily simple picture groupings, they do not provide the
opportunity to maximize the output signal quality by appropriately
allocating bits among pictures and picture groups in the output
signal. Finally, these compression techniques typically apply
compression methods that result in the propagation and
amplification of noise, especially in background portions of a video
picture.
[0009] Therefore, there is a need in the art for video signal
compression that efficiently groups pictures in a video stream and
provides for lower output signal bit rates and higher output signal
quality. The video signal compression also should maximize the
output signal quality by appropriately allocating bits among
pictures and picture groups in the output signal. In addition, the
video signal compression also should apply compression methods that
reduce noise in the output signal. Finally, the method should
enable the use of various sampling techniques and should enable the
selection of an output stream, based on the sampling technique
providing the best video stream.
SUMMARY OF THE INVENTION
[0010] The present invention provides video signal compression that
efficiently groups pictures in a video stream into variably-sized
groups of pictures (GOPs) thereby providing lower achievable output
signal bit rates and higher output signal quality. The video signal
compression maximizes the output signal quality by appropriately
allocating bits among pictures and picture groups in the output
signal. An adaptive method of bit allocation among picture groups
and within the pictures in those picture groups enables the
efficient allocation of bits, according to the relative sizes of
the picture groups. The video signal compression of the present
invention also applies compression methods that reduce noise in the
output signal, by utilizing a macroblock-based tunable conditional
replenishment technique. The conditional replenishment technique
exploits the similarities among images in the variably-sized GOPs
to further minimize output bit rate and maximize the output signal
quality. An analysis-by-synthesis method is also provided to select
a best asynchronous sampling method among candidate sampling
procedures.
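The macroblock-based, tunable conditional replenishment mentioned above can be sketched as follows, under stated assumptions: the comparison criterion is taken to be the mean absolute difference between co-located macroblocks (the text does not fix a criterion at this point), and the threshold is the tuning parameter. A macroblock whose criterion falls below the threshold is represented in the same form as the previous macroblock; otherwise new data is sent for it. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Block = List[List[int]]  # one macroblock as rows of pel values (e.g. 16x16)


def mean_abs_diff(a: Block, b: Block) -> float:
    """Average absolute pel difference between two co-located macroblocks."""
    count = sum(len(row) for row in a)
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / count


def conditional_replenish(prev: Sequence[Block], curr: Sequence[Block],
                          threshold: float) -> Tuple[List[Block], List[bool]]:
    """Return the macroblocks to display and, per macroblock, True when new
    data is sent (replenished) or False when the previous one is repeated."""
    output: List[Block] = []
    replenished: List[bool] = []
    for p, c in zip(prev, curr):
        if mean_abs_diff(p, c) < threshold:
            output.append(p)          # small change: repeat the previous macroblock
            replenished.append(False)
        else:
            output.append(c)          # large change: send the new macroblock
            replenished.append(True)
    return output, replenished


prev = [[[10, 10], [10, 10]], [[10, 10], [10, 10]]]  # two tiny 2x2 "macroblocks"
curr = [[[11, 10], [10, 10]], [[50, 50], [50, 50]]]
shown, flags = conditional_replenish(prev, curr, threshold=2.0)
print(flags)  # [False, True]
```

Raising the threshold repeats more macroblocks, which saves bits and keeps small frame-to-frame fluctuations (such as noise in static background) from being re-coded, at the cost of ignoring genuine small changes.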
[0011] In one aspect of the invention, a method is provided for
processing an input video stream comprising a series of pictures. A
first scene change is detected between a first scene in the input
video stream and a second scene in the input video stream. The
method classifies the first picture following the first scene
change as an intra-picture (I-picture).
[0012] In another aspect of the invention, the input stream
processing method determines whether there are a predetermined
number of pictures between the first I-picture and a second scene
change. A second picture in the input video stream is classified as
a second I-picture, where it is determined that the predetermined
number of pictures exist between the first intra-picture and the
second scene change, wherein the second picture coincides with the
predetermined number of pictures.
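The I-picture placement described in the two preceding paragraphs can be sketched as below. This is one reading rather than the definitive method: scene changes are assumed to be already detected (for example, by the motion-vector test of claim 4) and supplied as one boolean per picture, and the "predetermined number of pictures" is treated as a maximum GOP length, here called `max_gop` (a hypothetical name). Each I-picture starts a new GOP, so GOP sizes vary with the content.

```python
from typing import List, Sequence


def select_i_pictures(scene_change: Sequence[bool], max_gop: int) -> List[int]:
    """Return the indices of pictures to classify as I-pictures.

    An I-picture is placed at the first picture, at every scene change, and
    whenever max_gop pictures have accumulated since the last I-picture.
    """
    if max_gop < 1:
        raise ValueError("max_gop must be at least 1")
    i_pictures: List[int] = []
    gop_length = 0  # pictures in the current GOP so far
    for k, is_cut in enumerate(scene_change):
        if k == 0 or is_cut or gop_length >= max_gop:
            i_pictures.append(k)  # start a new GOP with an I-picture
            gop_length = 0
        gop_length += 1
    return i_pictures


def gop_sizes(i_pictures: Sequence[int], n_pictures: int) -> List[int]:
    """Number of pictures in each GOP, given where the I-pictures fall."""
    bounds = list(i_pictures) + [n_pictures]
    return [end - start for start, end in zip(bounds, bounds[1:])]


cuts = [False] * 12
cuts[3] = cuts[10] = True  # hypothetical scene changes at pictures 3 and 10
i_pics = select_i_pictures(cuts, max_gop=4)
print(i_pics, gop_sizes(i_pics, len(cuts)))  # [0, 3, 7, 10] [3, 4, 3, 2]
```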
[0013] In yet another aspect of the invention, a system is provided
for organizing a series of pictures in an input video stream into
at least one group of pictures (GOP). The system includes a picture
grouping module for detecting a scene change in the series of
pictures and for classifying a first picture following the scene
change as a first intra-picture (I-picture). The picture grouping
module also can classify at least one other picture following the
scene change as a predicted picture (P-picture) and can classify at
least one second picture as a bi-directionally predicted picture
(B-picture). The system also includes a bit allocation module for
determining whether a first GOP uses less than a predetermined
target number of bits and for allocating an unneeded
bit to a second GOP in response to a determination that the first
GOP uses less than the predetermined target number of bits.
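A minimal sketch of this carry-over rule, assuming that each GOP's base target is proportional to its number of pictures (in line with allocating bits according to the relative sizes of the picture groups) and that any bits a GOP leaves unused are added to the next GOP's target. The per-GOP "bits needed" figures stand in for an actual encoder and are hypothetical, as is the function name.

```python
from typing import List, Sequence, Tuple


def allocate_gop_bits(gop_sizes: Sequence[int], bits_per_picture: int,
                      bits_needed: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (target, used) for each GOP.

    A GOP's target is its share of the budget (pictures x bits_per_picture)
    plus whatever the previous GOP did not use. The toy encoder never exceeds
    its target, so the unused remainder is carried forward.
    """
    carry = 0
    plan: List[Tuple[int, int]] = []
    for size, needed in zip(gop_sizes, bits_needed):
        target = size * bits_per_picture + carry
        used = min(needed, target)  # stand-in for encoding the GOP under its target
        carry = target - used       # unneeded bits move to the next GOP
        plan.append((target, used))
    return plan


plan = allocate_gop_bits([3, 4, 3, 2], 10_000, [20_000, 45_000, 30_000, 30_000])
for target, used in plan:
    print(target, used)
```

Without the carry-over, the last GOP in this example would be held to 20,000 bits even though the earlier GOPs finished under budget; with it, that GOP receives 25,000.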
[0014] The various aspects of the present invention may be more
clearly understood and appreciated from a review of the following
detailed description of the disclosed embodiments and by reference
to the drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] FIG. 1 is a block diagram depicting an exemplary video
stream comprised of a series of video pictures.
[0016] FIG. 2 is a flowchart depicting an exemplary method for
coding, transmitting, and decoding a video stream.
[0017] FIG. 3 is a block diagram depicting a system for encoding a
video stream that is an exemplary embodiment of the present
invention.
[0018] FIG. 4 depicts a conventional decoding system for receiving
an encoded video stream and providing decoded video and audio
output.
[0019] FIG. 5 is a block diagram depicting an exemplary selection
of picture encoding modes in a GOP.
[0020] FIG. 6 is a block diagram depicting an exemplary timeline
comparing the occurrence of scene changes in a video stream with
alternative GOP size formats.
[0021] FIG. 7 is a flowchart depicting an exemplary method for
creating GOPs of varying sizes.
[0022] FIG. 8 is a graph depicting a typical relationship between
the bits generated by a conventional compression method and a
conventional group of pictures.
[0023] FIG. 9 is a series of block diagrams and graphs comparing
the generated bit graph of a conventional compression method with a
generated bit graph of an exemplary embodiment of the present
invention.
[0024] FIG. 10a is a flow chart depicting an exemplary method for
adaptively allocating bits among variable-sized groups of
pictures.
[0025] FIG. 10b is a flow chart depicting an exemplary method for
adaptively allocating bits among pictures within a GOP.
[0026] FIG. 11 is a simplified illustration depicting successive
pictures in an exemplary GOP divided into macroblocks.
[0027] FIG. 12 is a flowchart depicting an exemplary method for
performing conditional replenishment on a macroblock-basis.
[0028] FIG. 13 is a flowchart depicting an exemplary method for
generating and selecting between two sampling methods.
DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS
[0029] The present invention provides video signal compression that
efficiently groups pictures in a video stream into variably-sized
groups of pictures (GOPs) thereby providing lower achievable output
signal bit rates and higher output signal quality. The video signal
compression maximizes the output signal quality by appropriately
allocating bits among pictures and picture groups in the output
signal. An adaptive method of bit allocation among picture groups
and within the pictures in those picture groups enables the
efficient allocation of bits, according to the relative sizes of
the picture groups. The video signal compression of the present
invention also applies compression methods that reduce noise in the
output signal, by utilizing a macroblock-based tunable conditional
replenishment technique. The conditional replenishment technique
exploits the similarities among images in the variably-sized GOPs
to further minimize output bit rate and maximize the output signal
quality. An analysis-by-synthesis method is also provided to select
a best asynchronous sampling method among multiple non-uniform
and/or uniform sampling procedures.
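The analysis-by-synthesis selection (synthesize each candidate output, compare it with the input, keep the better one) can be sketched with a deliberately simplified model. All of the following are assumptions for illustration: "sampling" is taken to mean choosing which frames to keep, each frame is reduced to a single brightness value, both candidates keep the same number of frames, the decoder reconstructs a dropped frame by repeating the most recent kept frame, and the comparison characteristic is mean squared error. One candidate samples uniformly in time; the other samples non-uniformly, keeping the frames that change most from their predecessor. The function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sampler = Callable[[Sequence[float], int], List[int]]


def uniform_sampler(frames: Sequence[float], k: int) -> List[int]:
    """Keep k frames at evenly spaced positions, starting with frame 0."""
    step = len(frames) / k
    return sorted({int(i * step) for i in range(k)})


def change_sampler(frames: Sequence[float], k: int) -> List[int]:
    """Keep frame 0 plus the k-1 frames that differ most from their predecessor."""
    by_change = sorted(range(1, len(frames)),
                       key=lambda i: abs(frames[i] - frames[i - 1]), reverse=True)
    return sorted([0] + by_change[:k - 1])


def synthesize(frames: Sequence[float], kept: List[int]) -> List[float]:
    """Decoder-side reconstruction: hold the most recent kept frame."""
    kept_set = set(kept)
    out: List[float] = []
    last = frames[kept[0]]
    for i, value in enumerate(frames):
        if i in kept_set:
            last = value
        out.append(last)
    return out


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def select_sampling(frames: Sequence[float], k: int,
                    samplers: Dict[str, Sampler]) -> Tuple[str, Dict[str, float]]:
    """Synthesize each candidate, score it against the input, return the best."""
    scores = {name: mse(frames, synthesize(frames, s(frames, k)))
              for name, s in samplers.items()}
    return min(scores, key=scores.get), scores


frames = [0.0] * 5 + [100.0] * 7  # hypothetical scene change at frame 5
best, scores = select_sampling(frames, 2, {"uniform": uniform_sampler,
                                           "nonuniform": change_sampler})
print(best, scores)
```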
[0030] An Exemplary Operating Environment
[0031] FIG. 1 is a block diagram depicting an exemplary video
stream comprised of a series of video pictures. A video stream is
simply a collection of related images that have been connected in a
series to create the perception that objects in the image series
are moving. Because of the large number of separate images that are
required to produce a video stream, it is common that the series of
images will be digitized and compressed, so that the entire video
stream requires less space for transmission or storage. The process
of compressing such a digitized video stream is often referred to
as "encoding." Among other things, encoding a video stream
typically involves removing the irrelevant and/or redundant digital
data from the digitized video stream. Once the video stream has
been so compressed, a video stream must usually be decompressed
before it can be properly rendered or displayed.
[0032] The video stream 100 depicted in FIG. 1 includes six
separate images or pictures 102-112. Typically, a video stream is
displayed to a viewer at about 30 frames per second. Therefore, the
video stream 100 depicted in FIG. 1 would provide about 0.2 seconds
of playback at the typical display rate.
[0033] Generally, there is little noticeable change from one
picture in the series to the next. If a video stream were to be
stored or transmitted without compression, large amounts of
redundant data would be stored because of the significant video
data overlap from one frame to the next. For video stream storage,
the storage of such redundant data is consumptive of memory
resources. For video stream transmission, the transmission of such
redundant data significantly increases transmission time and may be
impossible at certain data transmission rates.
[0034] Video stream compression is one means for reducing the size
of a video stream. In short, video stream compression involves the
elimination of irrelevant and/or redundant video data from the
video stream. Moreover, many compression methods store only enough
video data on a frame-by-frame basis to represent the differences
between one frame and the next. For example, many compression
methods store an intra-picture (I-Picture) that includes all or
most of the video data for a particular frame/picture in a video
stream. Subsequent pictures can be represented by predicted
pictures (P-pictures) or by bi-directionally predicted pictures
(B-pictures). P-pictures are encoded using motion-compensated
prediction from a previous I-Picture or a previous P-Picture.
B-pictures are encoded using motion-compensated prediction from
either previous or subsequent I-pictures or P-pictures. B-pictures
are not used in the prediction of other B-pictures or other
P-pictures. Accordingly, I-pictures require the most video data
and can be compressed the least. P-pictures require less
video data than I-pictures and can be significantly compressed.
B-pictures require the least amount of video data and can be
compressed the most.
[0035] In the example of FIG. 1, the first picture 102 is an
I-Picture. Accordingly, much of the video data of the image of the
first picture 102 would be used to represent the first picture 102.
The second picture 104 may be a B-Picture and, thus, may be
represented in terms of video data differences with the I-Picture
102. Because the B-Picture 104 is bi-directionally predicted, it
may also be represented in terms of differences with the P-Picture
106. The P-Picture 106, in turn, is predicted in terms of
differences with the I-Picture 102. The P-Picture 106 is not
represented in terms of differences with the B-Picture 104.
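The prediction relationships described for FIG. 1 can be made explicit with a short sketch. Only pictures 102 (I), 104 (B), and 106 (P) are typed in the text; the full pattern "IBPBPB" for pictures 102-112 is a hypothetical example. The sketch lists the reference pictures of each picture and derives a coding order in which every I- or P-picture is coded before the B-pictures that depend on it.

```python
from typing import Dict, List, Tuple


def references(types: str) -> Dict[int, Tuple[int, ...]]:
    """Map each picture (display order) to the anchor pictures it is predicted from.

    I-pictures use no reference, P-pictures use the nearest earlier I/P-picture,
    and B-pictures use the nearest earlier and nearest later I/P-pictures when
    they exist. B-pictures are never used as references.
    """
    if not types or types[0] != "I":
        raise ValueError("this sketch assumes the sequence starts with an I-picture")
    anchors = [i for i, t in enumerate(types) if t in "IP"]
    refs: Dict[int, Tuple[int, ...]] = {}
    for i, t in enumerate(types):
        earlier = [a for a in anchors if a < i]
        later = [a for a in anchors if a > i]
        if t == "I":
            refs[i] = ()
        elif t == "P":
            refs[i] = (earlier[-1],)
        else:  # "B"
            refs[i] = tuple(earlier[-1:] + later[:1])
    return refs


def coding_order(types: str) -> List[int]:
    """Reorder so each anchor is coded before the B-pictures displayed ahead of it."""
    order: List[int] = []
    pending_b: List[int] = []
    for i, t in enumerate(types):
        if t == "B":
            pending_b.append(i)
        else:
            order.append(i)
            order.extend(pending_b)
            pending_b = []
    return order + pending_b


pattern = "IBPBPB"  # hypothetical types for pictures 102, 104, ..., 112
print(references(pattern))
print(coding_order(pattern))
```

The final B-picture has no later anchor inside this short sequence, so here it is predicted from the earlier anchor only; in a longer stream it would typically also reference the next I- or P-picture.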
[0036] Differences between video pictures are often predicted based
on calculated motion vectors. Motion vectors are well-known
mathematical representations of the movement and/or expected
movement of visual "objects" in a series of pictures in a video
stream. In order to track and predict the motion of objects,
pictures are divided into picture elements (pels). A pel may be a
video pixel or some other definable division of a picture. In any
event, object motion can be tracked by reference to corresponding
pels in a series of related video pictures.
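Motion vectors are commonly estimated by block matching. The sketch below is a generic full-search block matcher using the sum of absolute differences (SAD); it is a standard technique offered for illustration, not a method specified in this application. Frames are lists of rows of pel values, and the returned vector (dx, dy) points from the block in the current picture to its best match in the reference picture. The helper names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def sad(ref: Frame, cur: Frame, rx: int, ry: int, cx: int, cy: int, n: int) -> int:
    """Sum of absolute differences between two n-by-n blocks."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(n) for i in range(n))


def motion_vector(ref: Frame, cur: Frame, cx: int, cy: int,
                  n: int, search: int) -> Tuple[int, int]:
    """Search all displacements within +/-search pels; return the (dx, dy) with lowest SAD."""
    height, width = len(ref), len(ref[0])
    best: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block would fall outside the reference picture
            cost = sad(ref, cur, rx, ry, cx, cy, n)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def frame_with_square(x: int, y: int, size: int = 8) -> Frame:
    """Hypothetical test picture: a dark frame with a bright 2x2 object at (x, y)."""
    frame = [[0] * size for _ in range(size)]
    for j in range(2):
        for i in range(2):
            frame[y + j][x + i] = 255
    return frame


ref = frame_with_square(2, 2)  # object in the previous picture
cur = frame_with_square(3, 3)  # same object moved one pel right and one pel down
print(motion_vector(ref, cur, cx=3, cy=3, n=2, search=2))  # (-1, -1)
```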
[0037] Often, a video picture (or other digitized picture) is
encoded as a collection of blocks 116. Each block is typically an
8-by-8 square of pels. In addition, video pictures also are
commonly divided into macroblocks that usually contain 6 blocks (4
blocks for luminance and 2 blocks for chrominance signal). Those
skilled in the art will appreciate that the division of video
pictures into blocks and macroblocks is arbitrary, but helpful to
the creation of video compression standards. Moreover, the division
of pictures into such blocks enables the representation of
P-pictures and B-pictures in terms of other pictures in the video
stream. This block/macroblock-based representation facilitates
picture comparisons, based on corresponding portions of successive
pictures. As described above, this representation further
facilitates the compression of a video stream.
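The four-plus-two block structure above matches, for example, 4:2:0 chroma sampling, in which each chrominance plane has half the luminance resolution horizontally and vertically: a 16-by-16 luminance area is covered by four 8-by-8 luminance blocks plus one 8-by-8 block from each chrominance plane. The sketch assumes that sampling format; the function names and test picture are hypothetical.

```python
from typing import List

Plane = List[List[int]]


def split_blocks(plane: Plane, n: int) -> List[Plane]:
    """Split a plane into n-by-n blocks in raster order (left to right, top to bottom)."""
    height, width = len(plane), len(plane[0])
    if height % n or width % n:
        raise ValueError("plane dimensions must be multiples of the block size")
    return [[row[x:x + n] for row in plane[y:y + n]]
            for y in range(0, height, n) for x in range(0, width, n)]


def macroblock_blocks(y_plane: Plane, cb_plane: Plane, cr_plane: Plane,
                      mb_row: int, mb_col: int) -> List[Plane]:
    """Return the six 8x8 blocks of one 4:2:0 macroblock: four luminance, then Cb, then Cr."""
    ly, lx = 16 * mb_row, 16 * mb_col  # luminance origin of the macroblock
    cy, cx = 8 * mb_row, 8 * mb_col    # chrominance origin (half resolution)
    luma = [row[lx:lx + 16] for row in y_plane[ly:ly + 16]]
    cb = [row[cx:cx + 8] for row in cb_plane[cy:cy + 8]]
    cr = [row[cx:cx + 8] for row in cr_plane[cy:cy + 8]]
    return split_blocks(luma, 8) + [cb, cr]


# Hypothetical 32x32 picture (2x2 macroblocks) with 16x16 chrominance planes.
y_plane = [[100 * r + c for c in range(32)] for r in range(32)]
cb_plane = [[0] * 16 for _ in range(16)]
cr_plane = [[1] * 16 for _ in range(16)]
blocks = macroblock_blocks(y_plane, cb_plane, cr_plane, mb_row=1, mb_col=0)
print(len(blocks), len(split_blocks(y_plane, 16)))  # 6 4
```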
[0038] FIG. 2 is a flowchart depicting an exemplary method for
coding, transmitting, and decoding a video stream. One application
for which the described exemplary embodiment of the present
invention is particularly suited is that of video stream
processing. Because of the large number of separate images that are
required to produce a video stream, it is common that the series of
images will be digitized and compressed (encoded), so that the
entire video stream requires less space for transmission or
storage. Once the video stream has been so compressed, the video
stream must usually be decompressed before it can be properly
displayed. The flow chart of FIG. 2 depicts the steps that are
generally followed to encode, decode, and display a video
stream.
[0039] The method of FIG. 2 begins at start block 200 and proceeds
to step 202. At step 202, the input video stream is prepared for
encoding. Step 202 may be performed by an encoder or prior to
sending the video stream to an encoder. In any event, the video
stream can be modified to facilitate encoding. Indeed, various
exemplary embodiments of the present invention are directed to
various aspects of performing this step. The following Figures and
accompanying text describe those embodiments.
[0040] The method proceeds from step 202 to step 204. At step 204,
the input video stream is encoded. As described, the encoding
process involves, among other things, the compression of the
digitized data making up the input video stream. For the purposes
of this description, the terms "encoding" and "compression" are
used interchangeably. Once the video stream has been encoded, it
can be transmitted or stored in its compressed form. At step 206,
the encoded video bit stream is transmitted. Often this
transmission can be made over conventional broadcast
infrastructure, but could also be over broadband communication
resources and/or internet-based communication resources.
[0041] The method proceeds from step 206 to step 208. At step 208,
the received, encoded video stream is stored. As described above,
the compressed video stream is significantly smaller than the input
video stream. Accordingly, the storage of the received, encoded
video stream requires fewer memory resources than storage of the
input video stream would require. This storage step may be
performed, for example, by a computer receiving the encoded video
stream over the Internet. Those skilled in the art will appreciate
that step 208 could be performed by a variety of well-known means and
could even be eliminated from the method depicted in FIG. 2. For
example, in a real-time streaming video application, the video
stream is typically not stored prior to display.
[0042] The method proceeds from step 208 to step 210. At step 210,
the video stream is decoded. Decoding a video stream includes,
among other things, expanding (decompressing) the encoded video
stream to its original data size. That is, the encoded video stream
is expanded so that it is the same size as the input video stream.
The irrelevant and/or redundant video data that was removed in the
encoding process is replaced with new data. Various, well-known
algorithms are available for decoding an encoded video stream.
Unfortunately, these algorithms are typically unable to return the
encoded video stream to its original form without some image
degradation. Consequently, a decoded video stream is typically
filtered by a post-processing filter to reduce flaws (e.g., noise)
in the decoded video stream.
[0043] Once the video stream has been decoded, it is suitable for
displaying. The method of FIG. 2 proceeds from step 210 to step 212
and the enhanced video stream is displayed. The method then
proceeds to end block 214 and terminates.
[0044] An Exemplary Encoding System
[0045] FIG. 3 is a block diagram depicting a system for encoding a
video stream that is an exemplary embodiment of the present
invention. The encoding system 300 receives a video input signal
302 and an audio input signal 304. The video input 302 is typically
a series of digitized images that are linked together in series.
The audio input 304 is simply the audio signal that is associated
with the series of images making up the video input 302.
[0046] The video input 302 is first passed through a pre-processing
filter 306 that, among other things, filters noise from the video
input 302 to prepare the input video stream for encoding. The input
video stream is then passed to the video encoder 310. The video
encoder compresses the video signal by eliminating irrelevant
and/or redundant data from the input video signal. The video
encoder 310 may reduce the input video signal to a predetermined
size to match the transmission requirements of the encoding system
300. Alternatively, the video encoder 310 may simply be configured
to minimize the size of the encoded video signal. This
configuration might be used, for example, to maximize the storage
capacity of a storage medium (e.g., hard drive).
[0047] In a similar fashion, the audio input 304 is compressed by
the audio encoder 308. The encoded audio signal is then passed with
the encoded video signal to the video stream multiplexer 312. The
video stream multiplexer 312 combines the encoded audio signal and
the encoded video signal so that the signals can be separated and
played-back substantially simultaneously. After the encoded video
and encoded audio signals have been combined, the encoding system
outputs the combined signal as an encoded video stream 314. The
encoded video stream 314 is thus prepared for transmission,
storage, or other processing as needed by a particular application.
Often, the encoded video stream 314 will be transmitted to a
decoding system that will decode the encoded video stream 314 and
prepare it for subsequent display.
[0048] In an exemplary embodiment of the present invention, the
video input stream 302 can be further processed prior to encoding.
In addition to the pre-processing performed by the pre-processing
filter 306, the exemplary encoding system 300 can prepare the input
video stream 302 for encoding by generating a control signal for
the input video stream to facilitate compression. For example, a
rate controller 320 can be used to match the output bit rate of the
encoder to the capacity of the transmission channel or storage device.
Furthermore, the rate controller 320 can be used to control the
output video quality. For efficient rate control, the exemplary
encoding system 300 includes a picture grouping module 316, a bit
allocation module 318 and a bit rate controller 320.
[0049] The picture grouping module 316 can process a video input
stream by selecting and classifying I-pictures in the video stream.
The picture grouping module 316 can also select and classify
P-pictures in the video stream. As is discussed in more detail
below, the picture grouping module 316 can significantly improve
the quality of the encoded video stream. Conventional encoding
systems select I-pictures arbitrarily by adhering to fixed-size
picture groups. The exemplary encoding system 300 can adaptively
select I-pictures to maximize the encoded video stream quality.
[0050] The bit allocation module 318 can be used to enhance the
quality of the encoded video bit stream by adaptively allocating
bits among the groups of pictures defined by the picture grouping
module 316 and by allocating bits among the pictures within a given
group of pictures. Whereas conventional encoding systems often
allocate bits in an arbitrary manner, the bit allocation module 318 can
reallocate bits from the picture groups requiring less video data
to picture groups requiring more video data. Consequently, the
quality of the encoded video bit stream is enhanced by improving
the quality of the groups of pictures requiring more video data for
high quality representation.
[0051] The bit rate controller 320 uses an improved method of
conditional replenishment to further reduce the presence of noise
in an encoded video bit stream. Conditional replenishment is a
well-known aspect of video data compression. In conventional
encoding systems, a picture element or a picture block will be
encoded in a particular picture if the picture element or block has
changed when compared to a previous picture. Where the picture
element or block has not changed, the encoder will typically set a
flag or send an instruction to the decoder to simply replenish the
picture element or block with the corresponding picture element or
block from the previous picture. The bit rate controller 320 of an
exemplary embodiment of the present invention instead focuses on
macroblocks and may condition the replenishment of a macroblock on
the change of one or more picture elements and/or blocks within the
macroblock. Alternatively, the bit rate controller 320 may
condition the replenishment of a macroblock on a quantification of
the change within the macroblock (e.g., the average change of each
block) meeting a certain threshold requirement. In any event, the
objective of the bit rate controller 320 is to further reduce the
presence of noise in video data and to simplify the encoding of a
video stream.
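The macroblock-level decision described in [0051] can be illustrated with a minimal sketch. It is not the controller's actual logic: it assumes 16x16 luminance macroblocks made of four 8x8 blocks, measures each block's change as the mean absolute pixel difference from the previous picture, and replenishes the macroblock when the average of those four block changes falls below a hypothetical `threshold`. The function names are illustrative only.

```python
from typing import List

Block = List[List[int]]  # rows of luminance samples


def mean_abs_change(cur: Block, prev: Block, r0: int, c0: int, size: int) -> float:
    """Mean absolute pixel difference over a size x size block at (r0, c0)."""
    total = 0
    for r in range(r0, r0 + size):
        for c in range(c0, c0 + size):
            total += abs(cur[r][c] - prev[r][c])
    return total / (size * size)


def replenish_macroblock(cur_mb: Block, prev_mb: Block, threshold: float) -> bool:
    """Return True if the 16x16 macroblock should be replenished from the
    previous picture, i.e. the average change of its four 8x8 blocks is
    below `threshold` (a hypothetical tuning parameter)."""
    block_changes = [
        mean_abs_change(cur_mb, prev_mb, r0, c0, 8)
        for r0 in (0, 8)
        for c0 in (0, 8)
    ]
    return sum(block_changes) / len(block_changes) < threshold
```

A single pixel nudged by a few levels (typical of noise) leaves the average change far below a threshold of 1.0, so the macroblock is replenished rather than re-encoded.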
[0052] A Conventional Decoding System
[0053] FIG. 4 depicts a conventional decoding system for receiving
an encoded video stream and providing decoded video and audio
output. The decoding system 400 receives an encoded video stream
402 as input to a video stream demultiplexer 404. The video stream
demultiplexer separates the encoded video signal and the encoded
audio signal from the encoded video stream 402. The encoded video
signal is passed from the video stream demultiplexer 404 to the
video decoder 406. Similarly, the encoded audio signal is passed
from the video stream demultiplexer 404 to the audio decoder 410.
The video decoder 406 and the audio decoder 410 expand the video
signal and the audio signal to a size that is substantially
identical to the size of the video input and audio input described
above in connection with FIG. 3. Those skilled in the art will
appreciate that various well-known algorithms and processes exist
for decoding an encoded video and/or audio signal. It will also be
appreciated that most encoding and decoding processes are lossy, in
that some of the data in the original input signal is lost.
Accordingly, the video decoder 406 will reconstruct the video
signal with some signal degradation, which is often perceivable as
flaws in the output image.
[0054] The post-processing filter 408 is used to counteract noise
found in a decoded video signal that has been encoded and/or
decoded using a lossy process. Examples of well-known noise types
include mosquito noise, salt-and-pepper noise, and blockiness. The
conventional post-processing filter 408 includes well-known
algorithms to detect and counteract these and other known noise
problems. The post-processing filter 408 generates a filtered,
decoded video output 412. Similarly, the audio decoder 410
generates a decoded audio output 414. The video output 412 and the
audio output 414 may be fed to appropriate ports on a display
device, such as a television, or may be provided to some other
display means such as a software-based media playback component on
a computer. Alternatively, the video output 412 and the audio
output 414 may be stored for subsequent display.
[0055] As described above, the video decoder 406 decompresses or
expands the encoded video signal 402. While there are various
well-known methods for encoding and decoding a video signal, in all
of the methods, the decoder must be able to interpret the encoded
signal. The typical decoder is able to interpret the encoded signal
received from an encoder, as long as the encoded signal conforms to
an accepted video signal encoding standard, such as the well-known
MPEG-1 and MPEG-2 standards. In addition to raw video data, the
encoder typically encodes instructions to the decoder as to how the
raw video data should be interpreted and represented (i.e.,
displayed). For example, an encoded video stream may include
instructions that a subsequent video picture is identical to a
previous picture in a video stream. In this case, the encoded video
stream can be further compressed, because the encoder need not send
any raw video data for the subsequent video picture. When the
decoder receives the instruction, the decoder will simply represent
the subsequent picture using the same raw video data provided for
the previous picture. Those skilled in the art will appreciate that
such instructions can be provided in a variety of ways, including
setting a flag or bit within a data stream.
[0056] FIG. 5 is a block diagram depicting an exemplary selection
of picture encoding modes in a GOP. As described above in
connection with FIG. 1, the video stream can be described in terms
of I-pictures 502, B-pictures 504, and P-pictures 506. A video
stream can be represented by a series of groups of pictures (GOPs).
Each GOP begins with an I-Picture and includes one or more
P-pictures and/or B-pictures. As described above, the I-Picture
requires the most video data and is represented without reference
to any other picture in the video stream. The P-Picture 506 can be
represented in terms of differences with the I-Picture 502.
Likewise, the B-Picture 504 can be represented in terms of
differences with the I-Picture 502 and/or the P-Picture 506. In
conventional encoding methods, the size of the GOP 508 is
arbitrarily set to a specific number of pictures. Consequently,
during the encoding process, the first picture is classified as the
I-Picture and is followed by a collection of P-pictures and
B-pictures. When the predetermined number of pictures has been
collected into a GOP, a new GOP can be started. The new GOP is
started by identifying the next picture as an I-Picture.
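For comparison with the variable-sized GOPs described next, the conventional fixed-size assignment can be sketched as a simple counter. The GOP length `n` and the anchor spacing `m` (distance between successive I/P pictures) are assumptions chosen for illustration, and picture types are listed in display order.

```python
def fixed_gop_types(num_pictures: int, n: int = 15, m: int = 3) -> str:
    """Assign picture types for fixed-size GOPs of n pictures.

    Every n-th picture starts a new GOP as an I-picture, regardless of
    content; within a GOP, every m-th picture is a P-picture and the
    rest are B-pictures.
    """
    types = []
    for idx in range(num_pictures):
        pos = idx % n  # position inside the current GOP
        if pos == 0:
            types.append("I")
        elif pos % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)
```

For instance, `fixed_gop_types(7, n=6, m=3)` gives `"IBBPBBI"`: the seventh picture becomes an I-picture only because the counter wrapped.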
[0057] In an exemplary embodiment of the present invention, the
size of each GOP may be variable. In one embodiment, I-Frames
coincide with scene changes in the input video stream. As is well
known in the art, a scene change can be detected by significant
changes and/or structural breakdown of motion vectors from one
picture to the next. Once a scene change has been detected, the
picture following the scene change (i.e., first picture of the new
scene) may be classified as an I-Picture.
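The text detects scene changes from the behavior of motion vectors. The sketch below swaps in a simpler, commonly used stand-in: the mean absolute luminance difference between consecutive pictures, compared against a hypothetical threshold. It only illustrates how scene-change pictures could be flagged as I-picture candidates; it is not the detector of the embodiment.

```python
from typing import List, Sequence

Frame = List[List[int]]  # rows of luminance samples


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute luminance difference between two equally sized frames."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def scene_change_indices(frames: Sequence[Frame], threshold: float = 30.0) -> List[int]:
    """Indices of pictures that start a new scene (I-picture candidates).

    Picture k is flagged when its difference from picture k-1 exceeds
    `threshold`; the threshold value is an illustrative assumption.
    """
    return [k for k in range(1, len(frames))
            if mean_abs_diff(frames[k], frames[k - 1]) > threshold]
```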
[0058] FIG. 6 is a block diagram depicting an exemplary timeline
comparing the occurrence of scene changes in a video stream with
alternative GOP size formats. The video stream 600 is represented
as a series of four scenes. Scene changes occur at times 608, 610,
and 612. In a conventional encoding system, the GOP is set at a
constant number of frames, as depicted by GOP series 604. Notably,
the I-Frames in GOP format 604 occur at times 616, 618, 620, and
622. None of these times correspond with the times of the scene
changes in the video stream 600.
[0059] The variable GOP format 602 is an exemplary embodiment of
the present invention. Typically, the I-Frames of the variable GOP
format coincide with the scene changes in the video stream 600.
However, where a scene is sufficiently long, the variable GOP
format 602 will default to a constant GOP size and insert an
I-Picture as needed, as shown at time 606. Consequently, some GOPs
of the variable GOP format 602 will be shorter than the typical size
of constant GOP format 604. Other GOPs of the variable GOP format
602 (e.g., GOP 614) will be significantly longer than the typical
size of the constant GOP format 604.
[0060] A major objective of the variable GOP format 602 of an
exemplary embodiment of the present invention is to make I-pictures
coincide with scene changes. Because both I-pictures and
scene-change pictures require the most video data, making these
pictures coincide reduces the amount of data required to represent
an encoded video stream. Another major objective of the variable GOP
format 602 of an exemplary embodiment of the present invention is to
maximize the benefit of novel adaptive bit allocation and
conditional replenishment methods that are described in more detail
in connection with FIGS. 8-12.
[0061] An Exemplary Method for Generating Variably-sized Groups of
Pictures
[0062] FIG. 7 is a flowchart depicting an exemplary method for
creating GOPs of varying sizes. The method begins at start block
700 and proceeds to step 702. At step 702, the first GOP is created
and a first picture from an input video stream is retrieved. The
method proceeds to step 704, wherein the first picture is
classified as the I-Picture and is added to the first GOP.
[0063] The method proceeds from step 704 to decision block 706. At
decision block 706, a determination is made as to whether more
pictures exist in the input video stream. If a determination is
made that more pictures exist in the video stream, the method
branches to step 710. If, on the other hand, a determination is
made that no more pictures exist in the video stream, the method
branches to end block 708 and terminates.
[0064] At step 710, the next picture from the video stream is
retrieved. The method then proceeds to decision block 712. At
decision block 712, a determination is made as to whether the
predefined GOP picture limit has been reached. As described above
in connection with FIG. 6, in the case where a scene is longer than
the predefined GOP size, the method will create a new GOP rather
than allow the variable GOP to reach an indefinite size. If the
predefined GOP picture limit has been reached, the method branches
to step 716 and a new GOP is started. If, on the other hand, the
predefined GOP picture limit has not been reached, the method
branches to decision block 714.
[0065] At decision block 714, a determination is made as to whether
a scene change has been reached in the video stream. As described
above, a scene change can be detected by various well-known means.
If a scene change has been detected, the method branches to step
716 and a new GOP is started. If, on the other hand, a scene change
has not been reached, the method branches to step 718 and the
retrieved picture is added to the current GOP. The method proceeds
from step 718 to decision block 706 and proceeds as described
above.
[0066] Accordingly, pictures from an input video stream are added
to a GOP until either a scene change occurs or the predefined GOP
size is reached. Exemplary GOP sizes range from a minimum of 15
frames to a maximum of 60 frames. Those skilled in the art will
appreciate that GOPs of widely varying sizes could be used within
the scope of the present invention. As described above, the
objective of the exemplary method is to make scene changes coincide
with I-Frames so as to minimize the number of I-Frames and
scene-change frames stored in an encoded video stream.
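The FIG. 7 loop can be sketched as follows, assuming a caller-supplied predicate `is_scene_change(k)` (hypothetical; any of the well-known detectors could stand behind it) and the 60-picture maximum given above. As in the flowchart, only the maximum GOP size is enforced; the 15-picture minimum is not a step of FIG. 7 and is not applied here.

```python
from typing import Callable, List


def group_pictures(num_pictures: int,
                   is_scene_change: Callable[[int], bool],
                   max_gop: int = 60) -> List[List[int]]:
    """Split picture indices 0..num_pictures-1 into variable-sized GOPs.

    The first picture of every GOP is its I-picture (steps 702/704/716).
    """
    if num_pictures == 0:
        return []
    gops = [[0]]  # steps 702-704: first picture opens the first GOP
    for k in range(1, num_pictures):  # decision block 706 / step 710
        # decision blocks 712 (GOP limit) and 714 (scene change)
        if len(gops[-1]) >= max_gop or is_scene_change(k):
            gops.append([k])  # step 716: new GOP, picture k is its I-picture
        else:
            gops[-1].append(k)  # step 718: add picture to current GOP
    return gops
```

For example, `group_pictures(10, lambda k: k == 5, max_gop=4)` yields GOPs starting at pictures 0, 4, 5 and 9: one boundary forced by the size limit and one by the scene change.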
[0067] FIG. 8 is a graph depicting a typical relationship between
the bits generated by a conventional compression method and a
conventional group of pictures. The graph 800 is divided into three
groups of pictures (GOPs) 802, 804, 806. Each GOP 802, 804, 806
begins with an I-picture 808, 810, 812. As described above, most
conventional compression methods remove irrelevant, redundant,
and/or expendable bits from a video stream. This is done by
removing as much video data as possible from each picture in an
input video stream. In addition, conventional compression methods
encode pictures such that the content of the encoded pictures can
be predicted from previous and/or subsequent pictures in the
encoded video stream. Accordingly, much of the video data for such
predictable pictures can be eliminated from the encoded video
stream, thereby further reducing the size of (i.e., further
compressing) the encoded video stream. I-pictures 808, 810, 812,
however, are used to predict the video data content of other
pictures (e.g., B-pictures, P-pictures) and typically contain more
video data than other pictures in an encoded video stream.
[0068] Referring again to FIG. 8, it is apparent that for the
I-pictures 808, 810, 812 more bits are generated during the
compression process than for non-I-pictures 814, 816, 818. As
described above, conventional compression methods select pictures
in an input video stream as I-pictures in an arbitrary fashion,
based primarily on the number of pictures in a particular GOP. In
an exemplary embodiment of the present invention, I-pictures 808,
810, 812 can be selected to coincide with scene changes. Typically,
scene-change pictures and I-pictures require the compression
process to generate more bits than for non-scene change pictures or
for non-I-pictures. By classifying scene-change pictures as
I-pictures, an exemplary embodiment of the present invention
reduces the overall number of bits generated by the compression
process. Because a large number of bits must be stored with an
I-picture, regardless of the picture content, classifying
scene-change pictures as I-pictures simply capitalizes on this
feature to reduce the overall number of bits generated by the
compression process.
[0069] FIG. 9 is a series of block diagrams and graphs comparing
the generated bit graph of a conventional compression method with a
generated bit graph of an exemplary embodiment of the present
invention. An input video stream is represented as a block diagram
900 divided into scenes. As described above, a conventional
compression method divides groups of pictures on a fixed basis
(i.e., the same number of pictures per group). A fixed-sized GOP
structure is depicted as a block diagram 904. As described in
connection with FIG. 8, each GOP begins with an I-picture 910-916.
The fixed GOP Graph 908 has generated bit peaks that coincide with
the I-frames 910-916 of each of the fixed-sized GOPs in the block
diagram 904. In addition, the fixed-sized GOP graph 908 also
includes peaks coinciding with the scene changes between Scene 1
and Scene 2, between Scene 2 and Scene 3, and between Scene 3 and
Scene 4. Accordingly, the conventional, fixed-size GOP compression
method generates output bit peaks for both I-pictures and
scene-change pictures. Therefore, the bit budget for the remaining
P-pictures and B-pictures is decreased. The encoding quality of the
remaining P-pictures and B-pictures is, therefore, compromised or
degraded.
[0070] The variable size GOP graph 906, on the other hand, depicts
output bit peaks coinciding primarily with scene changes in the
input video stream 900. Accordingly, the variable-sized GOP
compression method of an exemplary embodiment of the present
invention reduces the number of output bit peaks in the encoded
video stream. More specifically, the variable-sized GOP compression
method minimizes the number of double output bit peaks. These
double peaks are present in the fixed-sized GOP graph 908 and are
created when scene changes occur within a GOP, instead of
coinciding with an I-picture of the GOP. As a result, the overall
number of output bits generated by the fixed-sized GOP compression
method is greater than the overall number of bits generated by the
variable-sized GOP compression method of an exemplary embodiment of
the present invention.
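The peak counting behind FIG. 9 can be reproduced with a toy model. It assumes, as the graphs suggest, that a generated-bit peak appears at every I-picture and at every scene-change picture, and that a scene change landing on an I-picture produces a single peak. The picture positions used below are illustrative values, not data from the figure.

```python
from typing import Iterable, Set


def count_bit_peaks(i_pictures: Iterable[int], scene_changes: Iterable[int]) -> int:
    """Number of distinct pictures that produce a generated-bit peak."""
    peaks: Set[int] = set(i_pictures) | set(scene_changes)
    return len(peaks)


def fixed_i_pictures(num_pictures: int, gop_size: int) -> range:
    """I-picture positions for conventional fixed-size GOPs."""
    return range(0, num_pictures, gop_size)
```

With 60 pictures, scene changes at pictures 20, 37 and 52, and fixed 15-picture GOPs, `count_bit_peaks(fixed_i_pictures(60, 15), [20, 37, 52])` gives 7 peaks; placing the I-pictures at 0, 20, 37 and 52 instead gives 4.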
[0071] Accordingly, the exemplary compression method results in a
smaller number of generated compression bits. This advantage
provides various benefits to an encoding/decoding process. First,
the resultant, smaller encoded video stream can be stored and/or
transmitted in its smaller state, thereby conserving system
resources. Alternatively, the encoding quality can be improved by
re-allocating bits from smaller GOPs to larger GOPs. This is
referred to as adaptive bit allocation, because the bits allocated
to a given GOP can be adapted to the GOP size, which varies
depending on the scene changes in the input video stream. This
benefit is described in more detail in connection with FIG. 10.
[0072] Exemplary Methods for Adaptive Bit Allocation
[0073] FIG. 10a is a flow chart depicting an exemplary method for
adaptively allocating bits among variable-sized groups of pictures
(GOPs). In an exemplary embodiment of the present invention, bits
can be allocated among the variable-sized GOPs. In addition, bits
may be allocated among the pictures within a single GOP. These
methods may be utilized individually or in concert to maximize the
image quality of a compressed video stream and of the pictures
within a GOP, while benefiting from the enhanced compression
processes of exemplary embodiments of the present invention.
[0074] The method of FIG. 10a begins at start block 1000 and
proceeds to step 1002. At step 1002, the target bit number of a
first GOP is determined. This step may be performed prior to
encoding a GOP. For example, after an input stream has been
segregated into GOPs, the GOPs may be stored in a buffer. Because
the GOPs in the buffer may have different sizes (i.e., contain
variable numbers of pictures), they also may have different numbers
of bits allocated thereto. The method of FIG. 10a provides a means
for adaptively allocating bits among GOPs, depending on the
relative sizes of the GOPs.
[0075] The method proceeds from step 1002 to step 1004. At step
1004, the number of bits actually generated for the pictures in the
GOP is determined. The method proceeds from step 1004 to decision
block 1006. At decision block 1006, a determination is made as to
whether the bit size of the first GOP is less than the target bit
number. If the GOP bit size is less than the target bit number, the
method branches to step 1010. If, on the other hand, the GOP bit size
is not less than the target bit number, the method branches to end
block 1016 and terminates.
[0076] At step 1010, the size and target bit number of a second GOP
are determined. The method proceeds from step 1010 to step 1014. At
step 1014, bits from the first GOP are allocated to the second GOP.
That is, bits that would otherwise be assigned to the first GOP are
reassigned to the second GOP, so that the quality of the second GOP
is enhanced. As described above, the picture quality of the encoded
video stream is directly related to the bit rate of the encoded
video stream. Accordingly, by reallocating bits between GOPs in a
video stream, an exemplary embodiment of the present invention can
maximize the quality of the GOPs having bit sizes larger than the
target size, while retaining the picture quality of GOPs having bit
sizes less than the target bit size. Conventional encoding methods
cap the bit size of any given GOP at the target bit size. Thus, for
GOPs having a larger bit size, the picture quality is reduced as
compared to those GOPs having smaller bit sizes.
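A minimal sketch of the FIG. 10a transfer (steps 1002-1014), assuming bit counts are plain integers and that the entire unused budget of the first GOP is added to the second GOP's target. The text does not say how much of the surplus is moved, so moving all of it is an assumption.

```python
from typing import Tuple


def transfer_surplus(target_first: int, generated_first: int,
                     target_second: int) -> Tuple[int, int]:
    """Return (surplus, new_target_second) per FIG. 10a.

    If the first GOP used fewer bits than its target (decision 1006),
    the difference is reassigned to the second GOP (step 1014);
    otherwise nothing changes (end block 1016).
    """
    if generated_first < target_first:
        surplus = target_first - generated_first
        return surplus, target_second + surplus
    return 0, target_second
```

For example, a first GOP that spends 800,000 of its 1,000,000 target bits raises a 1,200,000-bit target for the second GOP to 1,400,000.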
[0077] FIG. 10b is a flow chart depicting an exemplary method for
adaptively allocating bits among pictures within a GOP. In this
embodiment of the present invention, bits can be adaptively
allocated between pictures within a GOP. For a GOP containing
N-frames, N-1 bit values can be allocated to the non-I-picture
frames. The bit allocation can be based on a per-picture target bit
size. The bits may be allocated using the Root Mean Square (RMS) of
the difference between successive frames. Preferably, the
amount of bit allocation for the i-th picture in a GOP can be
calculated as follows:

$$T_p^{(i)} = R \times \frac{RMS(i)}{\sum_{l=1}^{N-1} RMS(l)}$$

[0078] where $T_p^{(i)}$ represents the target bit rate for the
current picture, R represents the target bit rate for the remaining
pictures in the GOP, and RMS(i) represents the RMS value of the
difference between the i-th picture and the (i-1)-th picture in the
GOP. After encoding each picture in the GOP, the target bit rate
for the remaining pictures in the GOP (R) can be updated by
subtracting the number of bits actually generated for each picture.
When the number of bits actually generated for all of the pictures
in the GOP is less than the target bit rate, the unused bits may be
made available for allocation to pictures in other GOPs. In this
embodiment of the present invention, bits can be allocated on a
picture-by-picture basis within a GOP, so as to maximize the picture
quality on a picture-by-picture basis.
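The two ingredients of the per-picture formula can be sketched directly. This assumes pictures are equally sized luminance arrays and that RMS(i) is taken over every pixel of the difference picture. The denominator is the sum over all N-1 non-I-pictures, as in the expression above. The even split for an all-zero denominator (a GOP of identical pictures) is an assumption the text does not address.

```python
import math
from typing import List, Sequence

Frame = List[List[int]]  # rows of luminance samples


def rms_difference(cur: Frame, prev: Frame) -> float:
    """RMS of the pixel-wise difference between two equally sized pictures."""
    sq = [(x - y) ** 2 for rc, rp in zip(cur, prev) for x, y in zip(rc, rp)]
    return math.sqrt(sum(sq) / len(sq))


def picture_target(R: float, rms: Sequence[float], i: int) -> float:
    """T_p^(i) = R * RMS(i) / sum_{l=1}^{N-1} RMS(l).

    `rms` holds RMS(1)..RMS(N-1) for the non-I-pictures of the GOP,
    stored 0-based, so rms[i - 1] is RMS(i).
    """
    total = sum(rms)
    if total == 0:  # identical pictures: assumed even split of R
        return R / len(rms)
    return R * rms[i - 1] / total
```

With R = 90,000 bits and RMS values 1, 2 and 3, the third picture receives 90,000 x 3/6 = 45,000 bits, because it differs most from its predecessor.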
[0079] Turning now to FIG. 10b, an exemplary method is depicted,
wherein bits are adaptively allocated among the pictures in a GOP.
The method of FIG. 10b may be implemented once the size (i.e.,
number of pictures) of a subject (current) GOP has been defined,
for example, by the Picture Grouping Module 316 described in
connection with FIG. 3. The method begins at start
block 1050 and proceeds to step 1052. At step 1052, the size of the
GOP is determined. This step may be performed by the Picture
Grouping Module 316 or the pictures in the GOP may simply be
re-counted. The method then proceeds to step 1054, wherein the
target bit number for the current GOP is determined. Typically, a
compression process is implemented for a particular application
wherein an overall bit rate is predetermined. Those skilled in the
art will appreciate that this overall bit rate may be used to
determine a bit rate on a per-picture basis.
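One straightforward way to turn a predetermined overall bit rate into a GOP target (step 1054) is sketched below. It assumes a constant frame rate and an equal per-picture share; the 1.5 Mbit/s and 30 pictures/s figures are illustrative only.

```python
def gop_target_bits(bit_rate_bps: float, frame_rate: float, gop_size: int) -> float:
    """Target bits for a GOP of `gop_size` pictures at a constant bit rate."""
    bits_per_picture = bit_rate_bps / frame_rate
    return bits_per_picture * gop_size


# 1.5 Mbit/s at 30 pictures/s gives 50,000 bits per picture, so a
# 45-picture GOP is budgeted 2,250,000 bits and a 15-picture GOP 750,000.
```

Because the variable GOPs differ in length, their targets differ proportionally, which is what lets surpluses from short GOPs be meaningful to longer ones.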
[0080] The method proceeds from step 1054 to step 1056. At step
1056, the Root Mean Square (RMS) of the difference between a
current picture and a previous picture is determined. Initially,
the current picture will be the first picture in the GOP. This step
can be performed using the formula described above. The method then
proceeds to step 1058, wherein the appropriate number of bits is
actually allocated to the current picture. The method then proceeds
to decision block 1060, wherein a determination is made as to
whether all of the pictures in the GOP have been encoded. If a
determination is made that all of the pictures in the GOP have been
encoded, the method branches to decision block 1062. If, on the
other hand, a determination is made that not all of the pictures in
the GOP have been encoded, the method branches to step 1068.
[0081] At step 1068, the current picture is incremented. That is,
the next picture in the GOP is identified for bit allocation
consideration. The method then proceeds to step 1056 and proceeds
as described above. Returning now to decision block 1062, a
determination is made as to whether the number of bits actually
generated by encoding all of the pictures in the GOP is less than
the target bit total for all of the pictures in the GOP. If the
number of bits actually generated by encoding the pictures in the
GOP is not less than the target bit total for all of the pictures
in the GOP, then the method branches to end block 1066 and
terminates. If, on the other hand, the number of bits actually
generated by encoding the pictures in the GOP is less than the target
bit total for all of the pictures in the GOP, then the method branches
to step 1064. At step 1064, the remaining (unused) bits are
made available to the next GOP (or some other subsequently
processed GOP) to be considered for bit allocation. The method
proceeds from step 1064 to end block 1066 and terminates.
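The FIG. 10b loop (steps 1052-1068) can be sketched as below, applying the RMS-proportional target from [0077]-[0078] literally, with R reduced after each picture. `encode_picture` is a hypothetical callback that encodes picture i against a target and returns the bits actually generated. The I-picture is assumed to be budgeted separately, so `gop_target` covers only the N-1 non-I-pictures.

```python
from typing import Callable, List, Sequence, Tuple


def allocate_gop(gop_target: float, rms: Sequence[float],
                 encode_picture: Callable[[int, float], float]) -> Tuple[List[float], float]:
    """Allocate bits to the non-I-pictures of one GOP.

    Returns the per-picture targets and the surplus made available to a
    later GOP. If every RMS value is zero, all targets are zero and the
    whole budget becomes surplus.
    """
    total_rms = sum(rms) or 1.0  # avoid division by zero for a static GOP
    remaining = gop_target  # R in the formula
    generated = 0.0
    targets: List[float] = []
    for i, r in enumerate(rms, start=1):  # steps 1056-1068
        t = remaining * r / total_rms  # T_p^(i)
        targets.append(t)
        bits = encode_picture(i, t)  # step 1058: bits actually spent
        generated += bits
        remaining -= bits  # update R after each picture
    # decision block 1062 / step 1064: hand unused bits to a later GOP
    surplus = gop_target - generated if generated < gop_target else 0.0
    return targets, surplus
```

For example, `allocate_gop(100.0, [1.0, 1.0], lambda i, t: t)`, with an encoder that hits each target exactly, yields targets of 50 and 25 bits and a 25-bit surplus for the next GOP.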
[0082] Accordingly, the method efficiently allocates bits among
pictures within a GOP. Where a surplus of bits exists, the method
can make those bits available for subsequent GOPs, for which such a
surplus does not exist. Because the GOP size is variable in
accordance with exemplary embodiments of the present invention,
this bit allocation method capitalizes on bit surpluses that are
created by using variable GOP sizes. The described bit allocation
methods can be used to significantly improve the output quality of
an encoding system by efficiently using bits that might otherwise
be imprudently allocated.
[0083] An Exemplary Method of Conditional Replenishment
[0084] Conditional replenishment is a well-known aspect of
conventional compression methods. Generally, conditional
replenishment refers to the elimination of redundant video data in
a condition wherein video data remains unchanged between successive
pictures in a GOP. More specifically, conditional replenishment is
a method of "re-using" (i.e., replenishing) previously encoded
video data to populate an area of a video image that is unchanged
from a previous video image. When possible, such replenishment
reduces the amount of new video data that must be encoded,
thereby reducing the output bit rate and increasing output
quality.
[0085] Because successive pictures within an exemplary
variable-sized GOP are typically members of the same scene in an
input video stream, the opportunity for conditional replenishment
is increased within a given GOP. Accordingly, the scene-oriented GOP
sizing of exemplary embodiments of the present invention enhances
the performance of conventional replenishment methods. In addition,
because of the similarity between successive pictures in a given
GOP, a novel variation of conditional replenishment is applied in
an exemplary embodiment of the present invention to further enhance
video stream compression.
[0086] FIG. 11 is a simplified illustration depicting successive
pictures in an exemplary GOP divided into macroblocks. Picture 1100
is divided into macroblocks 1102-1114. Likewise, picture 1150 is
divided into macroblocks 1152-1164. Although the image in picture
1100 is different than the image in picture 1150, only certain
macroblocks are different. Specifically, macroblocks 1102-1110 of
picture 1100 are different than macroblocks 1152-1160 of picture
1150. On the other hand, macroblocks 1112-1114 of picture 1100 are
identical to macroblocks 1162-1164 of picture 1150. Accordingly,
picture 1150 may be represented (i.e., encoded) as being identical
to picture 1100, except for changes to macroblocks 1152-1160.
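The comparison illustrated in FIG. 11 amounts to checking co-located macroblocks. The sketch below assumes frames stored as lists of pixel rows with dimensions that are multiples of 16. It returns the (row, column) indices of macroblocks that differ at all between two pictures; only those would need new video data.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of luminance samples
MB = 16  # macroblock size in pixels


def changed_macroblocks(cur: Frame, prev: Frame) -> List[Tuple[int, int]]:
    """(mb_row, mb_col) of every 16x16 macroblock that differs from `prev`."""
    changed = []
    for mb_r in range(len(cur) // MB):
        for mb_c in range(len(cur[0]) // MB):
            rows = range(mb_r * MB, (mb_r + 1) * MB)
            cols = slice(mb_c * MB, (mb_c + 1) * MB)
            if any(cur[r][cols] != prev[r][cols] for r in rows):
                changed.append((mb_r, mb_c))
    return changed
```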
[0087] When it is determined that a difference exists between
corresponding coded pixels in the macroblock, the differences can
be stored or transmitted in connection with the corresponding
picture. If, on the other hand, it is determined that no difference
exists between corresponding coded pixels, then a flag can be set
to indicate (or other instruction provided) that the pixel from the
previous picture can be used, thereby eliminating the need to store
additional information for the successive picture.
[0088] In conventional conditional replenishment, the replenishment
condition is determined by examining the results of the encoding
process. If the encoding results (quantized DCT coefficients) are
exactly the same for the macroblocks of the current frame and the
previous frame, replenishment is used. In an exemplary embodiment of the
present invention, on the other hand, conditional replenishment is
performed intelligently by the encoder, based on a calculation of
relevant criteria. Accordingly, if the encoder does not detect a
replenishment condition, any change detected between corresponding
macroblocks in successive pictures may be stored or transmitted. On
the other hand, when the encoder detects a replenishment condition,
then an instruction and/or flag can be used to indicate that the
macroblock should be replenished using the video data from the
previous picture.
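On the decoder side, a replenishment flag tells the decoder to copy the co-located macroblock from the previous reconstructed picture. The minimal sketch below assumes a hypothetical per-macroblock representation in which `None` marks a replenished macroblock and any other entry is a 16x16 block of already-decoded pixels.

```python
from typing import Dict, List, Optional, Tuple

Frame = List[List[int]]  # rows of luminance samples
MB = 16  # macroblock size in pixels


def reconstruct(prev: Frame,
                mbs: Dict[Tuple[int, int], Optional[Frame]]) -> Frame:
    """Build the current picture from per-macroblock data.

    mbs[(mb_r, mb_c)] is None when the replenish flag is set (copy from
    `prev`); otherwise it holds the decoded 16x16 pixels for that macroblock.
    """
    out = [row[:] for row in prev]  # start from a copy of the previous picture
    for (mb_r, mb_c), block in mbs.items():
        if block is None:
            continue  # replenished: keep the previous picture's pixels
        for dr in range(MB):
            r = mb_r * MB + dr
            out[r][mb_c * MB:(mb_c + 1) * MB] = block[dr]
    return out
```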
[0089] Advantageously, conditional replenishment on a macroblock
basis enables noise reduction in an encoded video stream. When an
encoded video stream is decoded, noise is commonly detectable in a
displayed video stream as a flickering or otherwise perceivable
image. Often, such noise is more perceivable when it occurs in a
background region (i.e., a region of substantially constant image
intensity). In an exemplary embodiment of the present invention,
conditional replenishment is processed on a macroblock basis,
utilizing two-part criteria and selectable thresholds for tuning
the criteria. As a result, slight differences resulting from
noise in a particular macroblock can be muted (i.e., filtered). The
first criterion can be used to measure the difference between an
original macroblock and the corresponding macroblock of the previous frame. This criterion, C1,
is given by the expression:

$$C_1 = \frac{1}{256}\sum_{i=1}^{16}\sum_{j=1}^{16}\bigl(\mathrm{org}(i,j) - \mathrm{prev}(i,j)\bigr)^2$$
[0090] where org(i,j) represents the pixel in row i and column j of
the original (subject) macroblock and prev(i,j) represents the pixel
at the same position in the original macroblock of the previous
frame.
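As a worked illustration (an assumed scenario, not from the source), suppose noise makes every pixel of an otherwise static macroblock differ from the previous frame by exactly ±d. C1 then reduces to the squared noise amplitude. With d = 3 the result, C1 = 9, is not below the exemplary Threshold 1 of 8 listed in paragraph [0093] for bit rates above 400 kbps, but it is below the value of 11 listed for 300-400 kbps.

```latex
C_1 = \frac{1}{256}\sum_{i=1}^{16}\sum_{j=1}^{16} (\pm d)^2
    = \frac{256\,d^2}{256} = d^2,
\qquad d = 3 \;\Rightarrow\; C_1 = 9 .
```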
[0091] The second criterion may be used to evaluate the effect of
the encoding and decoding process, by reference to the original macroblock. The second
criterion, C2, is given by the expression:

$$C_2 = \frac{1}{256}\sum_{i=1}^{16}\sum_{j=1}^{16}\bigl(\mathrm{org}(i,j) - \mathrm{coded}(i,j)\bigr)^2$$
[0092] where org(i,j) represents the pixel in row i and column j of
the original (subject) macroblock and coded(i,j) represents the pixel
at the same position in the decoded macroblock of the previous
frame. Criterion 1 measures the similarity of the corresponding
macroblocks of the current frame and the previous frame. Criterion 2
double-checks that similarity against the decoded macroblock.
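Both criteria are mean squared differences over a 16x16 macroblock, and differ only in the reference block. A minimal sketch, assuming integer luminance samples stored as nested lists. The function names are hypothetical.

```python
from typing import List, Tuple

Block = List[List[int]]  # one 16x16 macroblock of luminance samples


def mean_sq_diff(a: Block, b: Block) -> float:
    """(1/256) * sum over i, j of (a(i,j) - b(i,j))^2 for 16x16 blocks."""
    return sum((a[i][j] - b[i][j]) ** 2 for i in range(16) for j in range(16)) / 256.0


def replenishment_criteria(org: Block, prev: Block, coded: Block) -> Tuple[float, float]:
    """Return (C1, C2).
    C1: current original macroblock vs. co-located original of the previous frame.
    C2: current original macroblock vs. decoded macroblock of the previous frame."""
    return mean_sq_diff(org, prev), mean_sq_diff(org, coded)
```

For example, an original block of all 10s compared with a previous original of all 12s gives C1 = 4.0. A decoded block equal to the original except for one pixel that is off by 16 gives C2 = 256/256 = 1.0.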
[0093] In addition, threshold values may be selected for the two
criteria, to set the sensitivity of the conditional replenishment
process. Alternatively, the threshold may be automatically set such
that it is adaptive to a particular bit rate. The following table
provides an exemplary relationship between bit rate and Criterion 1
(C1) threshold values.
Table 1
Bit rate (kbps)        Threshold 1
greater than 400       8
300-400                11
200-300                13
110-200                14
less than 100          15
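Automatic selection of Threshold 1 from the table can be sketched as a simple lookup. The boundary handling is an assumption, because the table does not say which range a boundary value belongs to. Here each range includes its lower bound, and rates below 110 kbps, including the 100-110 kbps range that the table leaves open, are assigned the lowest-rate value of 15. The function name is hypothetical.

```python
def threshold1_for_bitrate(kbps: float) -> int:
    """Map a target bit rate in kbps to the exemplary Criterion 1 threshold.
    Assumed: lower bounds are inclusive; anything under 110 kbps maps to 15."""
    if kbps > 400:
        return 8
    if kbps >= 300:
        return 11
    if kbps >= 200:
        return 13
    if kbps >= 110:
        return 14
    return 15
```

The threshold rises as the bit rate falls, so more macroblocks pass the C1 test and qualify for replenishment when bits are scarce.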
[0094] Similarly, the threshold value for Criterion 2 may be set
manually or automatically (an exemplary value for Threshold 2 is
8). By applying the 2-part criteria in conjunction with the
threshold values, the macroblock-based conditional replenishment
method of an exemplary embodiment of the present invention can be used
and fine-tuned to reduce noise in a displayed video stream.
[0095] FIG. 12 is a flowchart depicting an exemplary method for
performing conditional replenishment on a macroblock basis. The
method of FIG. 12 begins at start block 1200 and proceeds to step
1202, wherein a first macroblock is compared to a second
macroblock. The method then proceeds to decision block 1204,
wherein a determination is made as to whether Criterion 1 (C1) is
less than Threshold 1. If at decision block 1204, a determination
is made that Criterion 1 is not less than Threshold 1, the method
branches to step 1210. At step 1210, a flag can be set (or an
instruction provided) indicating that the second macroblock should be encoded
using the data from the first macroblock, rather than simply
replenished. The method proceeds from step 1210 to end block 1212 and
terminates.
[0096] Returning now to decision block 1204, if a determination is
made that Criterion 1 is less than Threshold 1, the method
branches to decision block 1206. At decision block 1206, a
determination is made as to whether Criterion 2 is less than
Threshold 2. If a determination is made at decision block 1206 that
Criterion 2 is not less than Threshold 2, the method branches
to step 1210 and proceeds as described above. If, on the other hand,
a determination is made at decision block 1206 that Criterion 2 is
less than Threshold 2, the method branches to step 1208. At step
1208, the replenishment flag is set for the second macroblock. The
method proceeds from step 1208 to end block 1212 and ends.
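The two decision blocks of FIG. 12 amount to a short-circuit test on the two criteria. A minimal sketch, assuming C1 and C2 have already been computed as in paragraphs [0089]-[0092]. Threshold 1 would come from the bit-rate table (e.g., 11 at 300-400 kbps), and Threshold 2 defaults to the exemplary value of 8. The function name is hypothetical.

```python
def fig12_replenish(c1: float, c2: float, threshold1: float, threshold2: float = 8.0) -> bool:
    """Decision logic of FIG. 12.
    True  -> set the replenishment flag (step 1208).
    False -> encode the macroblock instead (step 1210)."""
    if not c1 < threshold1:   # decision block 1204
        return False
    if not c2 < threshold2:   # decision block 1206
        return False
    return True
```

Both comparisons are strict, as in the flowchart. A criterion that exactly equals its threshold therefore sends the macroblock to step 1210.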
[0097] Accordingly, the method of FIG. 12 uses selectable
criteria to reduce the encoding, decoding, and display of
noise. The replenishment of an exemplary embodiment of the present
invention thus can be used to filter noise from a displayed video
stream. Those skilled in the art will appreciate that various
criteria and/or threshold values may be used within the scope of the
described embodiments of the present invention.
[0098] An Exemplary Method for Selecting an Asynchronous Sampling
Technique
[0099] To maximize the quality of compressed video at a low bit
rate (e.g., less than 128 kbps), it may be useful to sample the
video at optimum points in time and space. Sampling is roughly
defined as the determination of which pictures in a video stream
will be encoded as I-pictures, B-pictures, and P-pictures.
Generally, optimum sampling can be non-uniform (asynchronous) in
one or both of the space and time domains. Various asynchronous
techniques are well known to those skilled in the art and can be
used to implement various embodiments of the present invention. In
an exemplary embodiment of the present invention, an
analysis-by-synthesis method of selecting an asynchronous sampling
technique is provided. In the exemplary analysis-by-synthesis
method, separately encoded candidate streams are generated using
various sampling methods. Once generated, the separate candidate
streams can be compared on virtually any basis to determine, for
example, which has the best bit rate and signal quality
characteristics. The best candidate stream can be selected and
designated as the output video stream. The selected sampling method
can be identified to the receiver (decoder) with a small overhead.
For example, by using a codebook or dictionary of 16 possible
sampling techniques, only 4 bits of overhead are needed to signify
the selection. The codebook could be either predetermined or
generated adaptively (and automatically) over time, based on
criteria including extrapolation from a recent history of optimum
sampling.
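The overhead figure follows from a fixed-length code: N codebook entries need ceil(log2 N) bits, so 16 entries need 4 bits. A minimal sketch, assuming a fixed-length binary index. The codebook contents and the function names are hypothetical.

```python
import math


def index_bits(codebook_size: int) -> int:
    """Bits needed to signal one entry of a fixed-length codebook."""
    return max(1, math.ceil(math.log2(codebook_size)))


def signal_choice(index: int, codebook_size: int) -> str:
    """Fixed-length binary code word identifying the chosen sampling method."""
    if not 0 <= index < codebook_size:
        raise ValueError("index out of range")
    return format(index, f"0{index_bits(codebook_size)}b")
```

With a 16-entry codebook, `signal_choice(5, 16)` yields `"0101"`, and the decoder recovers the method index with `int("0101", 2)`. An adaptively generated codebook could instead use variable-length codes for frequently chosen methods, but that is beyond this sketch.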
[0100] FIG. 13 is a flowchart depicting an exemplary method for
generating and selecting between two sampling methods. Those
skilled in the art will appreciate that any number of sampling
methods could be used and evaluated within the scope of the present
invention. It also will be appreciated that generating multiple
candidate streams adds encoding overhead (computation and delay), and
that the exemplary sampling selection method may therefore be more easily
applied to one-way communications (e.g., video streaming) than to
two-way communications (e.g., video teleconferencing).
[0101] The method of FIG. 13 begins at start block 1300 and
proceeds to step 1302. At step 1302, the input video stream is
encoded using a first sampling technique to produce a first candidate
stream. The method then proceeds to step 1304. At step 1304, the input
video stream is encoded using a second sampling technique to produce a
second candidate stream. The method then proceeds to step 1306,
wherein the encoded candidate video streams are compared. This
comparison could be based on various characteristics of the
candidate video streams, preferably perceptually meaningful
ones. An exemplary characteristic is the signal-to-noise ratio of each
encoded candidate video stream, as compared to the original
uncompressed signal.
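The source does not give a formula for the signal-to-noise ratio. A common definition, used here as an assumption, is the ratio of signal energy to reconstruction-error energy, expressed in decibels. PSNR, which uses the peak sample value instead of signal energy, is a frequent alternative for video. The sketch operates on a flattened sequence of samples from the decoded candidate, and the function name is hypothetical.

```python
import math
from typing import Sequence


def snr_db(original: Sequence[float], decoded: Sequence[float]) -> float:
    """SNR in dB: 10 * log10(signal energy / error energy)."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / noise)
```

For example, an original of `[10, 10, 10, 10]` decoded as `[11, 9, 10, 10]` has signal energy 400 and error energy 2, giving 10·log10(200), or about 23 dB.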
[0102] The method proceeds from step 1306 to decision block 1308.
At decision block 1308, a determination is made as to whether the
signal-to-noise ratio (SNR) for the first stream is higher than the
SNR for the second stream. If it is, the method branches to step
1310, at which the first stream is output. Otherwise, the method
branches to step 1312, at which the second stream is output.
Accordingly, candidate streams encoded using different
sampling techniques are compared, and the best stream is output, for
example, from an encoding system, together with the overhead
information that identifies the corresponding sampling method.
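FIG. 13 generalizes from two candidates to any number. The following sketch of the analysis-by-synthesis loop assumes a hypothetical interface in which each candidate runs one sampling method end to end (encode and decode) and returns the reconstruction. A caller-supplied `score` function, such as an SNR measure, compares each reconstruction with the original. All names are hypothetical, and a real encoder would also keep each candidate's bitstream for output.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
# Hypothetical: one sampling method run end to end (encode + decode).
Candidate = Callable[[Signal], Signal]
# Quality measure, higher is better (e.g., SNR against the original).
Score = Callable[[Sequence[float], Sequence[float]], float]


def select_sampling(original: Signal, candidates: List[Candidate], score: Score) -> Tuple[int, Signal]:
    """Analysis-by-synthesis: synthesize every candidate stream, score it
    against the original, and return (index of best method, its output).
    The index is what would be signalled to the decoder as overhead."""
    best_idx, best_score, best_out = -1, float("-inf"), []
    for idx, run in enumerate(candidates):
        out = run(original)
        s = score(original, out)
        # ">=" lets a later candidate win ties, mirroring decision block 1308,
        # which outputs the second stream unless the first is strictly higher.
        if s >= best_score:
            best_idx, best_score, best_out = idx, s, out
    return best_idx, best_out
```

Because the score is a parameter, the same loop can rank candidates by any perceptually meaningful measure, or by a combined rate-quality measure, without changing the selection logic.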
[0103] Although the present invention has been described in
connection with various exemplary embodiments, those of ordinary
skill in the art will understand that many modifications can be
made thereto within the scope of the claims that follow.
Accordingly, it is not intended that the scope of the invention in
any way be limited by the above description, but instead be
determined entirely by reference to the claims that follow.
* * * * *