U.S. patent application number 09/163655 was filed with the patent office on 2001-08-23 for multiple description transform coding of images using optimal transforms of arbitrary dimension.
Invention is credited to GOYAL, VIVEK K., KOVACEVIC, JELENA, VETTERLI, MARTIN.
Application Number | 20010016080 09/163655 |
Document ID | / |
Family ID | 46256111 |
Filed Date | 2001-08-23 |
United States Patent
Application |
20010016080 |
Kind Code |
A1 |
GOYAL, VIVEK K. ; et
al. |
August 23, 2001 |
MULTIPLE DESCRIPTION TRANSFORM CODING OF IMAGES USING OPTIMAL
TRANSFORMS OF ARBITRARY DIMENSION
Abstract
A multiple description (MD) joint source-channel (JSC) encoder
in accordance with the invention encodes n components of an image
signal for transmission over m channels of a communication medium.
In an illustrative embodiment which uses statistical redundancy
between the different descriptions of the image signal, the encoder
forms vectors from transform coefficients of the image signal
separated both in frequency and in space. The vectors may be formed
such that the spatial separation between the transform coefficients
is maximized. A correlating transform is then applied, followed by
entropy coding, grouping as a function of frequency, and
application of a cascade transform. In an illustrative embodiment
which uses deterministic redundancy between the different
descriptions of the image signal, the encoder may apply a linear
transform, followed by quantization, to generate the multiple
descriptions of the image signal. For example, vectors may be
formed from transform coefficients of the image signal so as to
include coefficients of like frequency separated in space. The
vectors are expanded by multiplication with a frame operator, and
then quantized using a step size which may be a function of
frequency.
Inventors: |
GOYAL, VIVEK K.; (HUDSON
COUNTY, NJ) ; KOVACEVIC, JELENA; (MANHATTAN COUNTY,
NJ) ; VETTERLI, MARTIN; (GRANDVAUX, SE) |
Correspondence
Address: |
RYAN & MASON
90 FOREST AVENUE
LOCUST VALLEY
NY
11560
|
Family ID: |
46256111 |
Appl. No.: |
09/163655 |
Filed: |
September 30, 1998 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
09163655 | Sep 30, 1998 | |
09030488 | Feb 25, 1998 | |
Current U.S.
Class: |
382/251 |
Current CPC
Class: |
H04S 1/00 20130101 |
Class at
Publication: |
382/251 |
International
Class: |
G06K 009/36; G06K
009/38; G06K 009/46 |
Claims
What is claimed is:
1. A method of processing an image signal for transmission,
comprising the steps of: encoding a plurality of components of the
image signal in a multiple description joint source-channel encoder
for transmission over a plurality of channels, wherein the encoding
step includes forming vectors from coefficients of the image signal
such that the coefficients associated with a given one of the
vectors are separated in at least one of frequency and space; and
transmitting the encoded components of the image signal.
2. The method of claim 1 wherein the image signal comprises one or
more vectors having uncorrelated components.
3. The method of claim 1 wherein the encoding step includes
generating a multiple description representation of the image
signal with statistical redundancy between the different
descriptions.
4. The method of claim 1 wherein the encoding step includes forming
vectors from transform coefficients of the image signal separated
both in frequency and in space.
5. The method of claim 4 wherein the vectors are formed such that
spatial separation between the transform coefficients in at least a
subset of the vectors is maximized.
6. The method of claim 4 wherein the encoding step further includes
the steps of: computing a transform of the image; quantizing
coefficients of the resulting transform; forming vectors of
transform coefficients separated in frequency and space; applying
correlating transforms to at least a subset of the vectors;
applying entropy coding to the transformed vectors; grouping the
coded vectors as a function of frequency; and applying a cascade
transform to at least a subset of the resulting groups.
7. The method of claim 1 wherein the encoding step includes
generating a multiple description representation of the image
signal with deterministic redundancy between the different
descriptions.
8. The method of claim 1 wherein the encoding step includes
applying a linear transform, followed by quantization, to generate
multiple descriptions of the image signal.
9. The method of claim 8 wherein the encoding step further includes
the steps of: computing a transform of the image signal; forming
vectors from coefficients of the resulting transform, wherein each
vector includes coefficients of like frequency, separated in space;
expanding the vectors by multiplication with a frame operator; and
quantizing the expanded vectors using a quantization step size
which is a function of frequency.
10. The method of claim 1 wherein the encoding step includes
encoding n components of the image signal for transmission over m
channels using a transform which is in the form of a cascade
structure of a plurality of transforms each having dimension less
than n×m.
11. An apparatus for encoding an image signal for transmission,
comprising: a multiple description joint source-channel encoder for
encoding a plurality of components of the image signal for
transmission over a plurality of channels, wherein the encoder
forms vectors from coefficients of the image signal such that the
coefficients associated with a given one of the vectors are
separated in at least one of frequency and space.
12. The apparatus of claim 11 wherein the image signal comprises
one or more vectors having uncorrelated components.
13. The apparatus of claim 11 wherein the encoder generates a
multiple description representation of the image signal with
statistical redundancy between the different descriptions.
14. The apparatus of claim 11 wherein the encoder forms vectors
from transform coefficients of the image signal separated both in
frequency and in space.
15. The apparatus of claim 14 wherein the vectors are formed such
that spatial separation between the transform coefficients in at
least a subset of the vectors is maximized.
16. The apparatus of claim 14 wherein the encoder is further
operative to compute a transform of the image; to quantize
coefficients of the resulting transform; to form vectors of
transform coefficients separated in frequency and space; to apply
correlating transforms to at least a subset of the vectors; to
apply entropy coding to the transformed vectors; to group the coded
vectors as a function of frequency; and to apply a cascade
transform to at least a subset of the resulting groups.
17. The apparatus of claim 11 wherein the encoder generates a
multiple description representation of the image signal with
deterministic redundancy between different descriptions.
18. The apparatus of claim 11 wherein the encoder applies a linear
transform, followed by quantization, to generate the multiple
descriptions of the image signal.
19. The apparatus of claim 18 wherein the encoder is further
operative to compute a transform of the image signal; to form
vectors from coefficients of the resulting transform, wherein each
vector includes coefficients of like frequency, separated in space;
to expand the vectors by multiplication with a frame operator; and
to quantize the expanded vectors using a quantization step size
which is a function of frequency.
20. The apparatus of claim 11 wherein the multiple description
joint source-channel encoder is operative to encode n components of
the signal for transmission over m channels using a transform which
is in the form of a cascade structure of a plurality of transforms
each having dimension less than n×m.
21. The apparatus of claim 11 wherein the multiple description
joint source-channel encoder further includes a series combination
of N multiple description encoders followed by an entropy coder,
wherein each of the N multiple description encoders includes a
parallel arrangement of M multiple description encoders.
22. The apparatus of claim 21 wherein each of the M multiple
description encoders implements one of: (i) a quantizer block
followed by a transform block, (ii) a transform block followed by a
quantizer block, (iii) a quantizer block with no transform block,
and (iv) an identity function.
Description
RELATED APPLICATION
[0001] The present application is a continuation-in-part of U.S.
patent application Ser. No. 09/030,488 filed Feb. 25, 1998 in the
name of inventors Vivek K. Goyal and Jelena Kovacevic and entitled
"Multiple Description Transform Coding Using Optimal Transforms of
Arbitrary Dimension."
FIELD OF THE INVENTION
[0002] The present invention relates generally to multiple
description transform coding (MDTC) of signals for transmission
over a network or other type of communication medium, and more
particularly to MDTC of images.
BACKGROUND OF THE INVENTION
[0003] Multiple description transform coding (MDTC) is a type of
joint source-channel coding (JSC) designed for transmission
channels which are subject to failure or "erasure." The objective
of MDTC is to ensure that a decoder which receives an arbitrary
subset of the channels can produce a useful reconstruction of the
original signal. One type of MDTC introduces correlation between
transmitted coefficients in a known, controlled manner so that lost
coefficients can be statistically estimated from received
coefficients. This correlation is used at the decoder at the
coefficient level, as opposed to the bit level, so it is
fundamentally different than techniques that use information about
the transmitted data to produce likelihood information for the
channel decoder. The latter is a common element in other types of
JSC coding systems, as shown, for example, in P. G. Sherwood and K.
Zeger, "Error Protection of Wavelet Coded Images Using Residual
Source Redundancy," Proc. of the 31st Asilomar Conference on
Signals, Systems and Computers, November 1997. Other types of MDTC
may be based on techniques such as frame expansions, as described
in V. K. Goyal et al., "Multiple Description Transform Coding:
Robustness to Erasures Using Tight Frame Expansions," In Proc. IEEE
Int. Symp. Inform. Theory, August 1998.
[0004] A known MDTC technique for coding pairs of independent
Gaussian random variables is described in M. T. Orchard et al.,
"Redundancy Rate-Distortion Analysis of Multiple Description Coding
Using Pairwise Correlating Transforms," Proc. IEEE Int. Conf. Image
Proc., Santa Barbara, Calif., October 1997. This MDTC technique
provides optimal 2×2 transforms for coding pairs of signals
for transmission over two channels. However, this technique as well
as other conventional techniques fail to provide optimal
generalized n×m transforms for coding any n signal components
for transmission over any m channels. In addition, conventional
transforms such as those in the M. T. Orchard et al. reference fail
to provide a sufficient number of degrees of freedom, and are
therefore unduly limited in terms of design flexibility. Moreover,
the optimality of the 2×2 transforms in the M. T. Orchard et
al. reference requires that the channel failures be independent and
have equal probabilities. The conventional techniques thus
generally do not provide optimal transforms for applications in
which, for example, channel failures either are dependent or have
unequal probabilities, or both. These and other drawbacks of
conventional MDTC prevent its effective implementation in many
important applications.
SUMMARY OF THE INVENTION
[0005] The invention provides MDTC techniques which can be used to
implement optimal or near-optimal n×m transforms for coding
any number n of signal components for transmission over any number
m of channels. A multiple description (MD) joint source-channel
(JSC) encoder in accordance with an illustrative embodiment of the
invention encodes n components of an image signal for transmission
over m channels of a communication medium, in applications in which
at least one of n and m may be greater than two, and in which the
failure probabilities of the m channels may be non-independent and
non-equivalent.
[0006] In accordance with one aspect of the invention, the MD JSC
encoder may be configured to provide statistical redundancy between
different descriptions of the image signal. For example, the
encoder may form vectors from discrete cosine transform (DCT)
coefficients of the image signal separated both in frequency and in
space. The vectors may be formed such that the spatial separation
between the DCT coefficients is maximized. A correlating transform
is applied to the resulting vectors, followed by entropy coding,
grouping of the coded vectors as a function of frequency, and
application of a cascade transform to each of the groups, in order
to generate the multiple descriptions of the image signal.
[0007] In accordance with another aspect of the invention, the MD
JSC encoder may be configured to provide deterministic redundancy
between different descriptions of the image signal. For example,
the encoder may form vectors from DCT coefficients of the image
signal so as to include coefficients of like frequency separated in
space. The vectors are expanded by multiplication with a frame
operator, and then quantized using a step size which may be a
function of frequency, in order to generate the multiple
descriptions of the image signal. In both the statistical
redundancy and deterministic redundancy embodiments noted above,
other types of linear transforms may be used in place of the
DCT.
[0008] An MD JSC encoder in accordance with the invention may
include a series combination of N "macro" MD encoders followed by
an entropy coder, and each of the N macro MD encoders includes a
parallel arrangement of M "micro" MD encoders. Each of the M micro
MD encoders implements one of: (i) a quantizer block followed by a
transform block, (ii) a transform block followed by a quantizer
block, (iii) a quantizer block with no transform block, and (iv) an
identity function. In addition, a given n×m transform
implemented by the MD JSC encoder may be in the form of a cascade
structure of several transforms each having dimension less than
n×m. This general MD JSC encoder structure allows the encoder
to implement any desired n×m transform while also minimizing
design complexity.
[0009] The MDTC techniques of the invention do not require
independent or equivalent channel failure probabilities. As a
result, the invention allows MDTC to be implemented effectively in
a much wider range of applications than has heretofore been
possible using conventional techniques. The MDTC techniques of the
invention are suitable for use in conjunction with signal
transmission over many different types of channels, including, for
example, lossy packet networks such as the Internet, wireless
networks, and broadband ATM networks.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] FIG. 1 shows an exemplary communication system in accordance
with the invention.
[0011] FIG. 2 shows a multiple description (MD) joint
source-channel (JSC) encoder in accordance with the invention.
[0012] FIG. 3 shows an exemplary macro MD encoder for use in the MD
JSC encoder of FIG. 2.
[0013] FIG. 4 shows an entropy encoder for use in the MD JSC
encoder of FIG. 2.
[0014] FIGS. 5A through 5D show exemplary micro MD encoders for use
in the macro MD encoder of FIG. 3.
[0015] FIGS. 6A, 6B and 6C show respective audio encoder, image
encoder and video encoder embodiments of the invention, each
including the MD JSC encoder of FIG. 2.
[0016] FIG. 7 illustrates an exemplary 4×4 cascade structure
which may be used in an MD JSC encoder in accordance with the
invention.
[0017] FIGS. 8 and 9 are flow diagrams illustrating exemplary image
encoding processes in accordance with the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0018] The invention will be illustrated below in conjunction with
exemplary MDTC systems. The techniques described may be applied to
transmission of a wide variety of different types of signals,
including data signals, speech signals, audio signals, image
signals, and video signals, in either compressed or uncompressed
formats. The term "channel" as used herein refers generally to any
type of communication medium for conveying a portion of an encoded
signal, and is intended to include a packet or a group of packets.
The term "packet" is intended to include any portion of an encoded
signal suitable for transmission as a unit over a network or other
type of communication medium. The term "linear transform" should be
understood to include a discrete cosine transform (DCT) as well as
any other type of linear transform. The term "vector" as used
herein is intended to include any grouping of coefficients or other
elements representative of at least a portion of a signal.
[0019] FIG. 1 shows a communication system 10 configured in
accordance with an illustrative embodiment of the invention. A
discrete-time signal is applied to a pre-processor 12. The
discrete-time signal may represent, for example, a data signal, a
speech signal, an audio signal, an image signal or a video signal,
as well as various combinations of these and other types of
signals. The operations performed by the pre-processor 12 will
generally vary depending upon the application. The output of the
pre-processor 12 is a source sequence $\{x_k\}$ which is applied to a
multiple description (MD) joint source-channel (JSC) encoder 14.
The encoder 14 encodes n different components of the source
sequence $\{x_k\}$ for transmission over m channels, using
transform, quantization and entropy coding operations. Each of the
m channels may represent, for example, a packet or a group of
packets. The m channels are passed through a network 15 or other
suitable communication medium to an MD JSC decoder 16. The decoder
16 reconstructs the original source sequence $\{x_k\}$ from the
received channels. The MD coding implemented in encoder 14 operates
to ensure optimal reconstruction of the source sequence in the
event that one or more of the m channels are lost in transmission
through the network 15. The output of the MD JSC decoder 16 is
further processed in a post processor 18 in order to generate a
reconstructed version of the original discrete-time signal.
[0020] FIG. 2 illustrates the MD JSC encoder 14 in greater detail.
The encoder 14 includes a series arrangement of N macro $MD_i$
encoders $MD_1, \ldots, MD_N$ corresponding to reference
designators 20-1, . . . 20-N. An output of the final macro MD
encoder 20-N is applied to an entropy coder 22. FIG. 3 shows the
structure of each of the macro $MD_i$ encoders 20-i. Each of the
macro $MD_i$ encoders 20-i receives as an input an r-tuple, where
r is an integer. Each of the elements of the r-tuple is applied to
one of M micro $MD_j$ encoders $MD_1, \ldots, MD_M$
corresponding to reference designators 30-1, . . . 30-M. The output
of each of the macro $MD_i$ encoders 20-i is an s-tuple, where s
is an integer greater than or equal to r.
[0021] FIG. 4 indicates that the entropy coder 22 of FIG. 2
receives an r-tuple as an input, and generates as outputs the m
channels for transmission over the network 15. In accordance with
the invention, the m channels may have any distribution of
dependent or independent failure probabilities. More specifically,
given that a channel i is in a state $S_i \in \{0, 1\}$, where
$S_i = 0$ indicates that the channel has failed while $S_i = 1$
indicates that the channel is working, the overall state S of the
system is given by the Cartesian product of the channel states
$S_i$ over the m channels, and the individual channel probabilities may be
configured so as to provide any probability distribution function
which can be defined on the overall state S.
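
The channel-state model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and all probability values are hypothetical and do not come from the specification. The sketch enumerates the overall states S and shows that both an independent and a dependent failure model are simply probability tables over those states.

```python
from itertools import product

def channel_states(m):
    """Return all 2**m overall states S = (S_1, ..., S_m), each S_i in {0, 1}."""
    return list(product((0, 1), repeat=m))

def independent_state_distribution(failure_probs):
    """Joint distribution over states when channel i fails independently
    with probability failure_probs[i]. S_i = 0 means the channel failed."""
    dist = {}
    for state in channel_states(len(failure_probs)):
        prob = 1.0
        for s_i, p_i in zip(state, failure_probs):
            prob *= (1.0 - p_i) if s_i == 1 else p_i
        dist[state] = prob
    return dist

# Independent channels with unequal failure probabilities (hypothetical values).
independent = independent_state_distribution([0.1, 0.3])

# Dependent failures: any non-negative table over the states that sums to one.
# Here the two channels tend to fail together (hypothetical values).
dependent = {(0, 0): 0.15, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.75}
```

Representing the model as a table keyed by the overall state keeps the dependent and independent cases interchangeable for any later averaging over states.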
[0022] FIGS. 5A through 5D illustrate a number of possible
embodiments for each of the micro $MD_j$ encoders 30-j. FIG. 5A
shows an embodiment in which a micro $MD_j$ encoder 30-j includes
a quantizer (Q) block 50 followed by a transform (T) block 51. The
Q block 50 receives an r-tuple as input and generates a
corresponding quantized r-tuple as an output. The T block 51
receives the r-tuple from the Q block 50, and generates a
transformed r-tuple as an output. FIG. 5B shows an embodiment in
which a micro $MD_j$ encoder 30-j includes a T block 52 followed
by a Q block 53. The T block 52 receives an r-tuple as input and
generates a corresponding transformed s-tuple as an output. The Q
block 53 receives the s-tuple from the T block 52, and generates a
quantized s-tuple as an output, where s is greater than or equal to
r. FIG. 5C shows an embodiment in which a micro $MD_j$ encoder
30-j includes only a Q block 54. The Q block 54 receives an r-tuple
as input and generates a quantized s-tuple as an output, where s is
greater than or equal to r. FIG. 5D shows another possible
embodiment, in which a micro $MD_j$ encoder 30-j does not include a Q
block or a T block but instead implements an identity function,
simply passing an r-tuple at its input through to its output. The
micro $MD_j$ encoders 30-j of FIG. 3 may each include a different
one of the structures shown in FIGS. 5A through 5D.
[0023] FIGS. 6A through 6C illustrate the manner in which the MD
JSC encoder 14 of FIG. 2 can be implemented in a variety of
different encoding applications. In each of the embodiments shown
in FIGS. 6A through 6C, the MD JSC encoder 14 is used to implement
the quantization, transform and entropy coding operations typically
associated with the corresponding encoding application. FIG. 6A
shows an audio coder 60 which includes an MD JSC encoder 14
configured to receive input from a conventional psychoacoustics
processor 61. FIG. 6B shows an image coder 62 which includes an MD
JSC encoder 14 configured to interact with an element 63 providing
preprocessing functions and perceptual table specifications. FIG.
6C shows a video coder 64 which includes first and second MD JSC
encoders 14-1 and 14-2. The encoder 14-1 receives input from a
conventional motion compensation element 66, while the second
encoder receives input from a conventional motion estimation
element 68. The encoders 14-1 and 14-2 are interconnected as shown.
It should be noted that these are only examples of applications of
an MD JSC encoder in accordance with the invention. It will be
apparent to those skilled in the art that numerous alternate
configurations may also be used, in audio, image, video and other
applications.
[0024] A general model for analyzing MDTC techniques in accordance
with the invention will now be described. Assume that a source
sequence $\{x_k\}$ is input to an MD JSC encoder, which outputs m
streams at rates $R_1, R_2, \ldots, R_m$. These streams are
transmitted on m separate channels. One version of the model may be
viewed as including many receivers, each of which receives a subset
of the channels and uses a decoding algorithm based on which
channels it receives. More specifically, there may be $2^m - 1$
receivers, one for each distinct subset of streams except for the
empty set, and each experiences some distortion. An equivalent
version of this model includes a single receiver for which each channel
may have failed or not failed, and the status of each channel is
known to the receiver's decoder but not to the encoder. Both versions
of the model provide reasonable approximations of behavior in a
lossy packet network. As previously noted, each channel may
correspond to a packet or a set of packets. Some packets may be
lost in transmission, but because of header information it is known
which packets are lost. An appropriate objective in a system which
can be characterized in this manner is to minimize a weighted sum
of the distortions subject to a constraint on a total rate R. For
m=2, this minimization problem is related to a problem from
information theory called the multiple description problem.
$D_0$, $D_1$ and $D_2$ denote the distortions when both
channels are received, only channel 1 is received, and only channel
2 is received, respectively. The multiple description problem
involves determining the achievable $(R_1, R_2, D_0, D_1, D_2)$-tuples.
A complete characterization for an
independent, identically-distributed (i.i.d.) Gaussian source and
squared-error distortion is described in L. Ozarow, "On a
source-coding problem with two channels and three receivers," Bell
Syst. Tech. J., 59 (8): 1417-1426, 1980. It should be noted that
the solution described in the L. Ozarow reference is
non-constructive, as are other achievability results from the
information theory literature.
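
As a small illustration of the receiver model described above, the following Python sketch lists one receiver per non-empty subset of channels and evaluates a weighted sum of their distortions. The function names, the weights and the distortion values are hypothetical placeholders chosen for the example; the specification does not prescribe them.

```python
from itertools import combinations

def receivers(m):
    """All non-empty subsets of channels 1..m: one receiver per subset."""
    channels = range(1, m + 1)
    return [subset
            for size in range(1, m + 1)
            for subset in combinations(channels, size)]

def weighted_distortion(distortion, weight, m):
    """Weighted sum of per-receiver distortions.
    Both arguments are dicts keyed by the tuple of received channels."""
    return sum(weight[s] * distortion[s] for s in receivers(m))

# Two-channel case: D_0 for (1, 2), D_1 for (1,), D_2 for (2,).
# All numbers below are hypothetical.
distortion = {(1, 2): 0.01, (1,): 0.20, (2,): 0.30}
weight = {(1, 2): 0.81, (1,): 0.09, (2,): 0.09}
objective = weighted_distortion(distortion, weight, 2)
```

An encoder design would then seek to minimize such an objective subject to the constraint on the total rate R.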
[0025] An MDTC coding structure for implementation in the MD JSC
encoder 14 of FIG. 2 in accordance with the invention will now be
described. In this illustrative embodiment, it will be assumed for
simplicity that the source sequence $\{x_k\}$ input to the encoder
is an i.i.d. sequence of zero-mean jointly Gaussian vectors with a
known correlation matrix $R_x = E[x_k x_k^T]$. The
vectors can be obtained by blocking a scalar Gaussian source. The
distortion will be measured in terms of mean-squared error (MSE).
Since the source in this example is jointly Gaussian, it can also
be assumed without loss of generality that the components are
independent. If the components are not independent, one can use a
Karhunen-Loeve transform of the source at the encoder and the
inverse at each decoder. This embodiment of the invention utilizes
the following steps for implementing MDTC of a given source vector
x:
[0026] 1. The source vector x is quantized using a uniform scalar
quantizer with stepsize $\Delta$: $x_{qi} = [x_i]_\Delta$, where
$[\cdot]_\Delta$ denotes rounding to the nearest multiple of
$\Delta$.
[0027] 2. The vector $x_q = [x_{q1}, x_{q2}, \ldots, x_{qn}]^T$
is transformed with an invertible, discrete
transform $\hat{T}: \Delta\mathbb{Z}^n \to \Delta\mathbb{Z}^n$,
$y = \hat{T}(x_q)$. The design and implementation of $\hat{T}$
are described in greater detail below.
[0028] 3. The components of y are independently entropy coded.
[0029] 4. If m<n, the components of y are grouped to be sent
over the m channels.
[0030] When all of the components of y are received, the
reconstruction process is to exactly invert the transform
$\hat{T}$ to get $\hat{x} = x_q$. The
distortion is the quantization error from Step 1 above. If some
components of y are lost, these components are estimated from the
received components using the statistical correlation introduced by
the transform $\hat{T}$. The estimate $\hat{x}$ is then
generated by inverting the transform as before.
[0031] Starting with a linear transform T with a determinant of
one, the first step in deriving a discrete version $\hat{T}$
is to factor T into "lifting" steps. This means that T is
factored into a product of lower and upper triangular matrices with
unit diagonals, $T = T_1 T_2 \cdots T_k$. The discrete
version of the transform is then given by:
$$\hat{T}(x_q) = \left[ T_1 \left[ T_2 \cdots \left[ T_k x_q \right]_\Delta \right]_\Delta \right]_\Delta. \tag{1}$$
[0032] The lifting structure ensures that the inverse of
$\hat{T}$ can be implemented by reversing the
calculations in (1):
$$\hat{T}^{-1}(y) = \left[ T_k^{-1} \cdots \left[ T_2^{-1} \left[ T_1^{-1} y \right]_\Delta \right]_\Delta \right]_\Delta.$$
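
The lifting construction of equation (1) and its inverse can be sketched for the 2×2 case as follows. This is a minimal illustration, not the implementation of the specification: the function names, the step representation and the example factors are assumptions. The sketch works on integer indices k = x/Δ, so every intermediate value is an exact multiple of Δ. The inverse subtracts the same rounded term that the forward step added, which agrees with rounding the product of the inverse lifting matrix and the vector except possibly at exact rounding ties.

```python
def quantize_index(x, delta):
    """Index k such that k * delta is a multiple of delta nearest to x."""
    return round(x / delta)

def lift_forward(k, steps):
    """Apply the discrete transform to a pair of integer indices [k1, k2].

    `steps` lists the lifting factors T_1, ..., T_k of T as (kind, coeff):
    ("upper", a) stands for [[1, a], [0, 1]] and ("lower", b) for
    [[1, 0], [b, 1]]. Because T = T_1 T_2 ... T_k, the last factor is
    applied first, with rounding after every step.
    """
    k1, k2 = k
    for kind, c in reversed(steps):
        if kind == "upper":
            k1 += round(c * k2)   # rounded lifting update, in units of delta
        else:
            k2 += round(c * k1)
    return [k1, k2]

def lift_inverse(k, steps):
    """Undo lift_forward: visit the factors in the opposite order and
    subtract exactly the rounded term that the forward pass added."""
    k1, k2 = k
    for kind, c in steps:
        if kind == "upper":
            k1 -= round(c * k2)
        else:
            k2 -= round(c * k1)
    return [k1, k2]

# Hypothetical example: T = [[1, 0], [0.5, 1]] @ [[1, -1], [0, 1]]
#                         = [[1, -1], [0.5, 0.5]], which has determinant one.
delta = 0.1
steps = [("lower", 0.5), ("upper", -1.0)]
k = [quantize_index(v, delta) for v in (1.23, -0.48)]
y = lift_forward(k, steps)
recovered = lift_inverse(y, steps)   # equals k exactly
```

Each lifting step changes only one component and leaves the other untouched, so the rounded correction can always be recomputed at the decoder. That is why the discrete transform is exactly invertible even though the underlying linear transform is nonorthogonal.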
[0033] The factorization of T is not unique. Different
factorizations yield different discrete transforms, except in the
limit as $\Delta$ approaches zero. The above-described coding
structure is a generalization of a 2×2 structure described in
the above-cited M. T. Orchard et al. reference. As previously
noted, this reference considered only a subset of the possible
2×2 transforms; namely, those implementable in two lifting
steps.
[0034] It is important to note that the illustrative embodiment of
the invention described above first quantizes and then applies a
discrete transform. If one were to instead apply a continuous
transform first and then quantize, the use of a nonorthogonal
transform could lead to non-cubic partition cells, which are
inherently suboptimal among the class of partition cells obtainable
with scalar quantization. See, for example, A. Gersho and R. M.
Gray, "Vector Quantization and Signal Compression," Kluwer Acad.
Pub., Boston, Mass., 1992. The above embodiment permits the use of
discrete transforms derived from nonorthogonal linear transforms,
resulting in improved performance.
[0035] An analysis of an exemplary MDTC system in accordance with
the invention will now be described. This analysis is based on a
number of fine quantization approximations which are generally
valid for small $\Delta$. First, it is assumed that the scalar
entropy of $y = \hat{T}([x]_\Delta)$ is the same as
that of $[Tx]_\Delta$. Second, it is assumed that the
correlation structure of y is unaffected by the quantization.
Finally, when at least one component of y is lost, it is assumed
that the distortion is dominated by the effect of the erasure, such
that quantization can be ignored. The variances of the components
of x are denoted by $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$
and the correlation matrix of x is denoted by
$R_x$, where $R_x = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$. Let
$R_y = T R_x T^T$. In the absence of quantization, $R_y$
would correspond to the correlation matrix of y. Under the
above-noted fine quantization approximations, $R_y$ will be used
in the estimation of rates and distortions.
[0036] The rate can be estimated as follows. Since the quantization
is fine, $y_i$ is approximately the same as
$[(Tx)_i]_\Delta$, i.e., a uniformly quantized Gaussian
random variable. If $y_i$ is treated as a Gaussian random
variable with power $\sigma_{yi}^2 = (R_y)_{ii}$
quantized with stepsize $\Delta$, the entropy of the quantized
coefficient is given by:
$$H(y_i) \approx \frac{1}{2}\log 2\pi e\,\sigma_{yi}^2 - \log\Delta = \frac{1}{2}\log\sigma_{yi}^2 + \frac{1}{2}\log 2\pi e - \log\Delta = \frac{1}{2}\log\sigma_{yi}^2 + k_\Delta,$$
[0037] where $k_\Delta \triangleq (\log 2\pi e)/2 - \log\Delta$ and
all logarithms are base two. Notice that $k_\Delta$ depends only
on $\Delta$. The total rate R can therefore be estimated as:
$$R = \sum_{i=1}^{n} H(y_i) = n k_\Delta + \frac{1}{2}\log\prod_{i=1}^{n}\sigma_{yi}^2. \tag{2}$$
[0038] The minimum rate occurs when
$\prod_{i=1}^{n}\sigma_{yi}^2 = \prod_{i=1}^{n}\sigma_i^2$,
and at this rate the components of y are
uncorrelated. It should be noted that T=I is not the only transform
which achieves the minimum rate. In fact, it will be shown below
that an arbitrary split of the total rate among the different
components of y is possible. This provides a justification for
using a total rate constraint in subsequent analysis.
[0039] The distortion will now be estimated, considering first the
average distortion due only to quantization. Since the quantization
noise is approximately uniform, the distortion is $\Delta^2/12$
for each component. Thus the distortion when no components are lost
is given by:
$$D_0 = \frac{n\Delta^2}{12} \tag{3}$$
[0040] and is independent of T.
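
The rate estimate of equation (2) and the no-erasure distortion of equation (3) can be evaluated numerically with a short Python sketch. The function names, the variances, the step size and the example transform are hypothetical choices made for illustration. The sketch assumes a diagonal correlation matrix for x, as in the analysis above, so only the diagonal of the correlation matrix of y is needed.

```python
import math

def k_delta(delta):
    """Constant k_Delta = (log2(2*pi*e)) / 2 - log2(Delta)."""
    return 0.5 * math.log2(2 * math.pi * math.e) - math.log2(delta)

def transformed_variances(T, variances):
    """Diagonal of R_y = T R_x T^T when R_x = diag(variances)."""
    return [sum(t * t * v for t, v in zip(row, variances)) for row in T]

def estimated_rate(T, variances, delta):
    """Equation (2): n * k_Delta + 0.5 * log2(product of sigma_yi^2), in bits."""
    sig_y = transformed_variances(T, variances)
    return len(sig_y) * k_delta(delta) + 0.5 * sum(math.log2(v) for v in sig_y)

def quantization_distortion(n, delta):
    """Equation (3): D_0 = n * Delta^2 / 12, independent of T."""
    return n * delta ** 2 / 12

variances = [1.0, 0.25]                    # hypothetical sigma_1^2, sigma_2^2
delta = 0.05                               # hypothetical step size
identity = [[1.0, 0.0], [0.0, 1.0]]
correlating = [[1.0, -1.0], [0.5, 0.5]]    # hypothetical T with determinant one

# Extra rate spent by the correlating transform relative to T = I.
redundancy = (estimated_rate(correlating, variances, delta)
              - estimated_rate(identity, variances, delta))
d0 = quantization_distortion(len(variances), delta)
```

In this example the correlating transform raises the product of the component variances above its minimum, so `redundancy` is positive. That added rate is the price paid for the correlation used to estimate lost components.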
[0041] The case when l>0 components are lost will now be
considered. It first must be determined how the reconstruction will
proceed. By renumbering the components if necessary, assume that
y.sub.1, y.sub.2, . . . y.sub.n-l are received and y.sub.n-l+1, . .
. y.sub.n are lost. First partition y into "received" and "not
received" portions as y=[y.sub.ry.sub.nr] where y.sub.r=[y.sub.1,
y.sub.2, . . . y.sub.n-l].sup.T and y.sub.nr=[y.sub.n-l+1, . . .
y.sub.n].sup.T. The minimum MSE estimate {circumflex over (x)} of x
given y.sub.r is E[x.vertline.y.sub.r], which has a simple closed
form because in this example x is a jointly Gaussian vector. Using
the linearity of the expectation operator gives the following
sequence of calculations:
$$\hat{x} = E[x \mid y_r] = E[T^{-1}Tx \mid y_r] = T^{-1}E[Tx \mid y_r]$$
[0042]
$$= T^{-1} E\left[\begin{bmatrix} y_r \\ y_{nr} \end{bmatrix} \,\middle|\, y_r\right] = T^{-1}\begin{bmatrix} y_r \\ E[y_{nr} \mid y_r] \end{bmatrix}. \qquad (4)$$
[0043] If the correlation matrix of y is partitioned in a way
compatible with the partition of y as:

$$R_y = T R_x T^T = \begin{bmatrix} R_1 & B \\ B^T & R_2 \end{bmatrix},$$

[0044] then it can be shown that the conditional signal
y.sub.nr.vertline.y.sub.r is Gaussian with mean
B.sup.TR.sub.1.sup.-1y.sub.r and correlation matrix A, defined as
R.sub.2-B.sup.TR.sub.1.sup.-1B. Thus,
E[y.sub.nr.vertline.y.sub.r]=B.sup.TR.sub.1.sup.-1y.sub.r, and
.eta., defined as y.sub.nr-E[y.sub.nr.vertline.y.sub.r], is Gaussian
with zero mean and correlation matrix A. The variable .eta. denotes
the error in predicting y.sub.nr from y.sub.r and hence is the
error caused by the erasure. However, because a nonorthogonal
transform has been used in this example, T.sup.-1 is used to return
to the original coordinates before computing the distortion.
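Substituting y.sub.nr-.eta. for E[y.sub.nr.vertline.y.sub.r] in (4)
above gives the following expression for {circumflex over (x)}:

$$\hat{x} = T^{-1}\begin{bmatrix} y_r \\ y_{nr} - \eta \end{bmatrix} = x + T^{-1}\begin{bmatrix} 0 \\ -\eta \end{bmatrix},$$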
[0045] such that .parallel.x-{circumflex over (x)}.parallel..sup.2 is
given by:

$$\left\| T^{-1}\begin{bmatrix} 0 \\ \eta \end{bmatrix} \right\|^2 = \eta^T U^T U \eta,$$
[0046] where U consists of the last l columns of T.sup.-1. The expected
value E[.parallel.x-{circumflex over (x)}.parallel..sup.2] is then given
by:

$$E\left[\|x - \hat{x}\|^2\right] = \sum_{i=1}^{l}\sum_{j=1}^{l} (U^T U)_{ij} A_{ij}. \qquad (5)$$
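As an illustration of equation (5), the following sketch evaluates the erasure distortion for the simplest case of n=2 components with a single erasure, where R.sub.1, B, R.sub.2 and A all reduce to scalars. It assumes independent source components with variances .sigma..sub.1.sup.2 and .sigma..sub.2.sup.2 and ignores quantization error. The function name and the example values are hypothetical.

```python
import math

def erasure_distortion_2x2(transform, sigma1, sigma2, lost):
    """Expected squared error of equation (5) when one of two components
    is lost (lost = 0 for y_1, lost = 1 for y_2)."""
    (a, b), (c, d) = transform
    det = a * d - b * c
    t_inv = [[d / det, -b / det], [-c / det, a / det]]
    # R_y = T R_x T^T with R_x = diag(sigma1^2, sigma2^2).
    s1, s2 = sigma1 ** 2, sigma2 ** 2
    cross = a * c * s1 + b * d * s2
    r_y = [[a * a * s1 + b * b * s2, cross],
           [cross, c * c * s1 + d * d * s2]]
    kept = 1 - lost
    # A = R_2 - B^T R_1^{-1} B, a scalar for a single erasure.
    big_a = r_y[lost][lost] - r_y[lost][kept] ** 2 / r_y[kept][kept]
    # U is the column of T^{-1} that multiplies the lost component.
    u = [t_inv[0][lost], t_inv[1][lost]]
    return (u[0] ** 2 + u[1] ** 2) * big_a

c45 = 1 / math.sqrt(2)
rotation = [[c45, c45], [-c45, c45]]
identity = [[1.0, 0.0], [0.0, 1.0]]
d_identity_lose_y1 = erasure_distortion_2x2(identity, 1.0, 0.5, lost=0)
d_rotation_lose_y2 = erasure_distortion_2x2(rotation, 1.0, 0.5, lost=1)
```

With the identity transform, losing y.sub.1 costs the full variance of x.sub.1, whereas the correlating rotation allows part of the lost component to be predicted from the received one.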
[0047] The distortion with l erasures is denoted by D.sub.l. To
determine D.sub.l, (5) above is averaged over all possible
combinations of erasures of l out of n components, weighted by
their probabilities if the probabilities are unequal. An
additional distortion criterion is a weighted sum {overscore (D)} of
the distortions incurred with different numbers of channels
available, where {overscore (D)} is given by:

$$\bar{D} = \sum_{l=1}^{n} \alpha_l D_l.$$
[0048] For a case in which each channel has a failure probability
of p and the channel failures are independent, the weighting

$$\alpha_l = \binom{n}{l} p^l (1-p)^{n-l}$$
[0049] makes the weighted sum {overscore (D)} the overall expected
MSE. Other choices of weighting could be used in alternative
embodiments. Consider an image coding example in which an image is
split over ten packets. One might want acceptable image quality as
long as eight or more packets are received. In this case, one could
set .alpha..sub.3=.alpha..sub.4= . . . =.alpha..sub.10=0.
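The weighting described above can be sketched as follows, assuming independent channel failures with a common probability p. The function names, the value p=0.1 and the D.sub.l values are hypothetical and serve only to exercise the computation.

```python
import math

def erasure_weights(n, p):
    """alpha_l for l = 0..n when each of n channels fails independently
    with probability p."""
    return [math.comb(n, l) * p ** l * (1 - p) ** (n - l) for l in range(n + 1)]

def weighted_distortion(weights, distortions):
    """Weighted sum of D_l over l = 1..n, as in the text (index 0 is skipped)."""
    return sum(w * d for w, d in zip(weights[1:], distortions[1:]))

n, p = 10, 0.1
weights = erasure_weights(n, p)
# Ten-packet example: ignore outcomes with three or more lost packets.
truncated = [w if l < 3 else 0.0 for l, w in enumerate(weights)]
# Hypothetical D_l values, used only to exercise the function.
distortions = [0.0] + [0.01 * l for l in range(1, n + 1)]
d_bar = weighted_distortion(truncated, distortions)
```

Setting the higher-order weights to zero, as in the ten-packet example, restricts the design criterion to the loss patterns for which acceptable quality is required.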
[0050] The above expressions may be used to determine optimal
transforms which minimize the weighted sum {overscore (D)} for a
given rate R. Analytical solutions to this minimization problem are
possible in many applications. For example, an analytical solution
is possible for the general case in which n=2 components are sent
over m=2 channels, where the channel failures have unequal
probabilities and may be dependent. Assume that the channel failure
probabilities in this general case are as given in the following
table.
TABLE 1
                           Channel 1
                           no failure                   failure
Channel 2    failure       1-p.sub.0-p.sub.1-p.sub.2    p.sub.1
             no failure    p.sub.2                      p.sub.0
[0051] If the transform T is given by:

$$T = \begin{bmatrix} a & b \\ c & d \end{bmatrix},$$
[0052] minimizing (2) over transforms with a determinant of one
gives a minimum possible rate of:
R*=2k.sub..DELTA.+log .sigma..sub.1.sigma..sub.2.
[0053] The difference .rho.=R-R* is referred to as the redundancy,
i.e., the price that is paid to reduce the distortion in the
presence of erasures. Applying the above expressions for rate and
distortion to this example, and assuming that
.sigma..sub.1>.sigma..sub.2, it can be shown that the optimal
transform will satisfy the following expression:

$$a = \frac{\sigma_2}{2c\sigma_1}\left[\sqrt{2^{2\rho}-1} + \sqrt{2^{2\rho}-1-4bc(bc+1)}\right].$$
[0054] The optimal value of bc is then given by:

$$(bc)_{\text{optimal}} = -\frac{1}{2} + \frac{1}{2}\left(\frac{p_1}{p_2}-1\right)\left[\left(\frac{p_1}{p_2}+1\right)^2 - 4\left(\frac{p_1}{p_2}\right)2^{-2\rho}\right]^{-1/2}.$$
[0055] The value of (bc).sub.optimal ranges from -1 to 0 as
p.sub.1/p.sub.2 ranges from 0 to .infin.. The limiting behavior can
be explained as follows: Suppose p.sub.1>>p.sub.2, i.e.,
channel 1 is much more reliable than channel 2. Since
(bc).sub.optimal approaches 0, ad must approach 1, and hence one
optimally sends x.sub.1 (the larger variance component) over
channel 1 (the more reliable channel) and vice-versa.
[0056] If p.sub.1=p.sub.2 in the above example, then
(bc).sub.optimal=-1/2, independent of .rho.. The optimal set of
transforms is then given by: a nonzero (but otherwise arbitrary),
c=-1/(2b), d=1/(2a) and

$$b = \pm\left(2^{\rho} - \sqrt{2^{2\rho}-1}\right)\frac{\sigma_1 a}{\sigma_2}.$$
[0057] Using a transform from this set gives:

$$D_1 = \frac{1}{2}\left(D_{1,1} + D_{1,2}\right) = \sigma_1^2 - \frac{1}{2\cdot 2^{\rho}\left(2^{\rho} - \sqrt{2^{2\rho}-1}\right)}\left(\sigma_1^2 - \sigma_2^2\right). \qquad (6)$$
[0058] For values of .sigma..sub.1=1 and .sigma..sub.2=0.5,
D.sub.1, as expected, starts at a maximum value of
(.sigma..sub.1.sup.2+.sigma..sub.2.sup.2)/2 at .rho.=0 and asymptotically
approaches a minimum value of .sigma..sub.2.sup.2 as .rho. increases. By combining
(2), (3) and (6), one can find the relationship between R, D.sub.0
and D.sub.1. It should be noted that the optimal set of transforms
given above for this example provides an "extra" degree of freedom,
after fixing .rho., that does not affect the .rho. vs. D.sub.1
performance. This extra degree of freedom can be used, for example,
to control the partitioning of the total rate between the channels,
or to simplify the implementation.
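The relationship between .rho. and D.sub.1 for equal failure probabilities can be checked numerically with the sketch below. It builds one member of the optimal set (taking the positive sign for b), computes the redundancy from the rate estimate of (2), and compares the average one-erasure distortion with the closed form of (6). The one-erasure expression used in the code is a specialization of (5) to a 2.times.2 transform with determinant one, derived for this sketch, and quantization error is ignored. All function names are hypothetical.

```python
import math

def optimal_transform(rho, sigma1, sigma2, a=1.0):
    """One member of the optimal set for p_1 = p_2 (the "+" sign for b)."""
    g = 2 ** rho - math.sqrt(2 ** (2 * rho) - 1)
    b = g * sigma1 * a / sigma2
    return [[a, b], [-1 / (2 * b), 1 / (2 * a)]]

def redundancy(transform, sigma1, sigma2):
    """rho = R - R*, using equation (2) and the minimum rate R*."""
    (a, b), (c, d) = transform
    var1 = (a * sigma1) ** 2 + (b * sigma2) ** 2
    var2 = (c * sigma1) ** 2 + (d * sigma2) ** 2
    return 0.5 * math.log2(var1 * var2 / (sigma1 * sigma2) ** 2)

def average_single_erasure_distortion(transform, sigma1, sigma2):
    """Average of the two one-erasure distortions from equation (5),
    specialized to a 2x2 transform with determinant one."""
    (a, b), (c, d) = transform
    s = (sigma1 * sigma2) ** 2
    lose_y2 = (a * a + b * b) * s / ((a * sigma1) ** 2 + (b * sigma2) ** 2)
    lose_y1 = (c * c + d * d) * s / ((c * sigma1) ** 2 + (d * sigma2) ** 2)
    return 0.5 * (lose_y1 + lose_y2)

def d1_closed_form(rho, sigma1, sigma2):
    """Equation (6)."""
    g = 2 ** rho - math.sqrt(2 ** (2 * rho) - 1)
    return sigma1 ** 2 - (sigma1 ** 2 - sigma2 ** 2) / (2 * 2 ** rho * g)

sigma1, sigma2 = 1.0, 0.5
results = []
for rho in (0.0, 0.5, 1.0, 2.0):
    t = optimal_transform(rho, sigma1, sigma2)
    results.append((rho,
                    redundancy(t, sigma1, sigma2),
                    average_single_erasure_distortion(t, sigma1, sigma2),
                    d1_closed_form(rho, sigma1, sigma2)))
```

Changing the free parameter a in `optimal_transform` leaves both the redundancy and D.sub.1 unchanged in this sketch, which is the extra degree of freedom noted above.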
[0059] Although the conventional 2.times.2 transforms described in
the above-cited M. T. Orchard et al. reference can be shown to fall
within the optimal set of transforms described herein when channel
failures are independent and equally likely, the conventional
transforms fail to provide the above-noted extra degree of freedom,
and are therefore unduly limited in terms of design flexibility.
Moreover, the conventional transforms in the M. T. Orchard et al.
reference do not provide channels with equal rate (or,
equivalently, equal power). The extra degree of freedom in the
above example can be used to ensure that the channels have equal
rate, i.e., that R.sub.1=R.sub.2, by implementing the transform
such that .vertline.a.vertline.=.vertline.c.vertline. and
.vertline.b.vertline.=.vertline.d.vertline.. This type of rate
equalization would generally not be possible using conventional
techniques without either rendering the resulting transform
suboptimal or introducing additional complexity, e.g., through the
use of multiplexing.
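A minimal sketch of the rate equalization described above follows. Within the optimal set for equal failure probabilities, the free parameter a is chosen so that .vertline.a.vertline.=.vertline.c.vertline. and .vertline.b.vertline.=.vertline.d.vertline.. The particular value of a used here is derived for this sketch from those two conditions and is not stated in the text, and the function names are hypothetical.

```python
import math

def equal_rate_transform(rho, sigma1, sigma2):
    """Member of the optimal set (p_1 = p_2) with |a| = |c| and |b| = |d|."""
    g = 2 ** rho - math.sqrt(2 ** (2 * rho) - 1)
    # Chosen so that |a| = 1 / (2 |b|), which makes |a| = |c| and |b| = |d|.
    a = math.sqrt(sigma2 / (2 * g * sigma1))
    b = g * sigma1 * a / sigma2
    return [[a, b], [-1 / (2 * b), 1 / (2 * a)]]

def output_variances(transform, sigma1, sigma2):
    """Variances of y_1 and y_2 for independent inputs."""
    return [(row[0] * sigma1) ** 2 + (row[1] * sigma2) ** 2 for row in transform]

t_eq = equal_rate_transform(1.0, 1.0, 0.5)
var_y1, var_y2 = output_variances(t_eq, 1.0, 0.5)
```

Because the two output variances coincide, the per-component entropy estimates, and hence the estimated channel rates, are equal.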
[0060] As previously noted, the invention may be applied to any
number of components and any number of channels. For example, the
above-described analysis of rate and distortion may be applied to
transmission of n=3 components over m=3 channels. Although it
becomes more complicated to obtain a closed form solution, various
simplifications can be made in order to obtain a near-optimal
solution. If it is assumed in this example that
.sigma..sub.1>.sigma..sub.2>.sigma..sub.3, and that the
channel failure probabilities are equal and small, a set of
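transforms that gives near-optimal performance is given by:

$$\begin{bmatrix} a & -\dfrac{\sqrt{3}\,\sigma_1 a}{\sigma_2} & -\dfrac{\sigma_2}{6\sqrt{3}\,\sigma_1^2 a^2} \\ 2a & 0 & \dfrac{\sigma_2}{6\sqrt{3}\,\sigma_1^2 a^2} \\ a & \dfrac{\sqrt{3}\,\sigma_1 a}{\sigma_2} & -\dfrac{\sigma_2}{6\sqrt{3}\,\sigma_1^2 a^2} \end{bmatrix}.$$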
[0061] Optimal or near-optimal transforms can be generated in a
similar manner for any desired number of components and number of
channels.
[0062] FIG. 7 illustrates one possible way in which the MDTC
techniques described above can be extended to an arbitrary number
of channels, while maintaining reasonable ease of transform design.
This 4.times.4 transform embodiment utilizes a cascade structure of
2.times.2 transforms, which simplifies the transform design, as
well as the encoding and decoding processes (both with and without
erasures), when compared to use of a general 4.times.4 transform.
In this embodiment, a 2.times.2 transform T.sub..alpha. is applied
to components x.sub.1 and x.sub.2, and a 2.times.2 transform
T.sub..beta. is applied to components x.sub.3 and x.sub.4. The
outputs of the transforms T.sub..alpha. and T.sub..beta. are routed
to inputs of two 2.times.2 transforms T.sub..gamma. as shown. The
outputs of the two 2.times.2 transforms T.sub..gamma. correspond to
the four channels y.sub.1 through y.sub.4. This type of cascade
structure can provide substantial performance improvements as
compared to the simple pairing of coefficients in conventional
techniques, which generally cannot be expected to be near optimal
for values of m larger than two. Moreover, the failure
probabilities of the channels y.sub.1 through y.sub.4 need not have
any particular distribution or relationship. FIGS. 2, 3, 4 and
5A-5D above illustrate more general extensions of the MDTC
techniques of the invention to any number of signal components and
channels.
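The cascade structure can be sketched as two stages of 2.times.2 transforms. The routing between the stages is shown only in FIG. 7 and is not spelled out in the text, so the sketch assumes that the first outputs of T.sub..alpha. and T.sub..beta. feed one T.sub..gamma. and the second outputs feed the other. The function names and the example matrix are hypothetical.

```python
def apply_2x2(t, u, v):
    """Apply a 2x2 transform t to the pair (u, v)."""
    return t[0][0] * u + t[0][1] * v, t[1][0] * u + t[1][1] * v

def cascade_4x4(t_alpha, t_beta, t_gamma, x):
    """Two-stage cascade of 2x2 transforms acting on a length-4 vector x."""
    u1, u2 = apply_2x2(t_alpha, x[0], x[1])
    v1, v2 = apply_2x2(t_beta, x[2], x[3])
    # Assumed routing: pair the first outputs, then the second outputs.
    y1, y2 = apply_2x2(t_gamma, u1, v1)
    y3, y4 = apply_2x2(t_gamma, u2, v2)
    return [y1, y2, y3, y4]

# Hypothetical determinant-one transform used for all three stages.
t = [[1.0, 0.5], [-1.0, 0.5]]
unit_vectors = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
columns = [cascade_4x4(t, t, t, e) for e in unit_vectors]
```

With this routing, each of the four outputs depends on all four inputs, even though only three 2.times.2 transforms need to be designed.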
[0063] Illustrative embodiments of the invention more particularly
directed to transmission of images will be described below with
reference to the flow diagrams of FIGS. 8 and 9. A conventional
technique for communicating an image over a network such as the
Internet is to use a progressive encoding system and to transmit
the coded image as a sequence of packets over a Transmission
Control Protocol (TCP) connection. When there are no packet losses,
the receiver can reconstruct the image as the packets arrive; but
when there is a packet loss, there is a large period of latency
while the transmitter determines that the packet must be
retransmitted and then retransmits the packet. The latency is due
to the fact that the application at the receiving end typically
uses the packets only after they have been put in the proper
sequence. The use of another transmission protocol generally does
not solve the problem: because of the progressive nature of the
encoding, the packets are useful only in the proper sequence. The
problem is more acute if there are stringent delay requirements,
e.g., for fast browsing, and in some cases retransmission may be
not just undesirable but impossible. The present invention
alleviates this latency problem by providing a communication system
that is robust to arbitrarily placed packet erasures and that can
reconstruct an image progressively from packets received in any
order.
[0064] The flow diagram of FIG. 8 illustrates an example of an MDTC
process particularly well suited for use with still images. In this
example, the process codes four channels using a technique which
operates on source vectors with uncorrelated components. In
accordance with the invention, a suitable approximation of this
condition can be obtained by forming vectors from discrete cosine
transform (DCT) coefficients separated both in frequency and in
space. It should be noted that the use of the DCT in the
embodiments of FIGS. 8 and 9 is by way of example only, and any
other suitable linear transform could also be used. In step 100 of
FIG. 8, an 8.times.8 block DCT of the image is computed. The DCT
coefficients are then uniformly quantized in step 102. In step 104,
vectors of length 4 are formed from DCT coefficients separated in
frequency and in space. The spatial separation is maximized, e.g.,
for 512.times.512 images, the samples that are grouped together are
spaced by 256 pixels horizontally and/or vertically. Correlating
transforms are then applied to each 4-tuple vector, as indicated in
step 106. Entropy encoding, such as, e.g., JPEG coding, is then
applied in step 108.
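The spatial part of the vector formation in step 104 can be sketched as follows for a 512.times.512 image with 8.times.8 blocks. Each group contains four blocks whose origins are 256 pixels apart horizontally and/or vertically. How frequencies are assigned within a vector is not specified above, so this sketch covers only the spatial grouping. The function name and block indexing are hypothetical.

```python
def spatial_groups(image_size=512, block_size=8):
    """Group the DCT blocks of a square image into sets of four blocks
    whose origins are image_size // 2 pixels apart."""
    blocks_per_side = image_size // block_size
    half = blocks_per_side // 2
    groups = []
    for r in range(half):
        for c in range(half):
            # (row, column) block indices of the four members of the group.
            groups.append([(r, c), (r, c + half),
                           (r + half, c), (r + half, c + half)])
    return groups

groups = spatial_groups()
```

Each block index appears in exactly one group, so every DCT block contributes to exactly one set of length-4 vectors.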
[0065] After the above steps 100-108 are performed, a determination
is made in step 110 as to which frequencies are to be grouped
together, and a cascade transform of the type illustrated in FIG.
7, i.e., an (.alpha., .beta., .gamma.)-tuple, is designed in step
112 for each group of frequencies. The operations in steps 110 and
112 can be based, e.g., on training data or other considerations.
It should be noted that, even in cases in which the source data is
characterized by, e.g., a Gaussian model, the transform parameters
should be numerically optimized. The embodiment illustrated in FIG.
8 may be implemented using one or more of the micro MD.sub.j
encoders 30-j of FIG. 5A, each of which includes a quantizer (Q)
block 50 followed by a transform (T) block 51. As previously noted,
the Q block 50 receives an r-tuple as input and generates a
corresponding quantized r-tuple as an output. The T block 51
receives the r-tuple from the Q block 50, and generates a
transformed r-tuple as an output.
[0066] In the embodiment of FIG. 8, the importance of the DC
coefficient may dictate allocating most of the redundancy to the
group containing the DC coefficient. In an alternative embodiment,
it may be assumed that the quantized DC coefficient is communicated
reliably through some other means, e.g., a separate channel. The
remaining coefficients are then separated, e.g., into those that are
placed in groups of four and those that are sent by one of the four
channels only. Because the optimal allocation of redundancy between
the groups is often difficult to determine, it may instead be
desirable to allocate approximately the same redundancy to each
group. The AC coefficients for each block are then sent over one of
the four channels. It can be shown that such an embodiment provides
a higher quality reconstructed image when one of four packets is
lost, at the expense of worse rate-distortion performance when
there are no packet losses. In addition, the expected number of
bits for each channel is approximately equal, which facilitates
packetization. This is in contrast to certain conventional
techniques in which one must multiplex channel bit streams in order
to produce packets of approximately the same size.
[0067] It should be noted that effects of factors such as coarse
quantization, dead zone, divergence from Gaussian, run length
coding and Huffman coding are not addressed in the above examples,
but could be addressed through, e.g., an expansive numerical
optimization. The encoding process could be further improved by,
e.g., using a perceptually tuned quantization matrix as suggested
by the JPEG standard, rather than the uniform quantization used for
simplicity in the above examples. Using perceptually tuned
quantization, one can design a system which, e.g., performs as well
as conventional systems when two or four of four packets arrive,
but which performs better when one or three packets arrive.
[0068] In the embodiment of FIG. 8, the redundancy in the source
representation is statistical, i.e., the distribution of one part
of the representation is reduced in variance by conditioning on
another part. Another possible technique for implementing MDTC of
images in accordance with the invention, illustrated in the flow
diagram of FIG. 9, uses a deterministic redundancy between
descriptions. Consider a conventional discrete block code which
represents k input symbols through a set of n output symbols such
that any k of the n can be used to recover the original k. One
possible example is a systematic (n, k) Reed-Solomon code over
GF(2.sup.m) with n=2.sup.m-1, as described in S. Lin and D. J.
Costello, "Error Control Coding: Fundamentals and Applications,"
Prentice-Hall, 1983. If the k input symbols are quantized transform
coefficients, the discrete block code may be a good way to
communicate a k-dimensional source over an erasure channel that
erases symbols with probability less than (n-k)/n. A problem with
this conventional approach is that except in the case that exactly
k of the n transmitted symbols are received, the channel has not
been used efficiently. When more than k symbols are received, those
in excess of k provide no information about the source vector; and
when less than k symbols are received, it is computationally
difficult to use more than just the systematic part of the
code.
[0069] An alternative to the above-described discrete block coding
involves using a linear transform from R.sup.k to R.sup.n, followed
by scalar quantization, to generate n descriptions of a
k-dimensional source. These n descriptions are such that a good
reconstruction can be computed from any k descriptions, but
descriptions beyond the kth are also useful, and reconstructions
from fewer than k descriptions are easy to compute.
[0070] Assume that we have a tight frame
.PHI.={.phi..sub.m}.sub.m=1.sup.n in R.sup.k with
.parallel..phi..sub.m.parallel.=1 for all m and that y=Fx, where F
is the frame operator associated with .PHI. as described in, for
example, V. K. Goyal, M. Vetterli and N. T. Thao, "Quantized
Overcomplete Expansions in R.sup.N: Analysis, Synthesis and
Algorithms," IEEE Trans. Inform. Th., 44 (1): 16-31, 1998, which is
incorporated by reference herein. This vector passes through the
scalar quantizer Q: {circumflex over (y)}=Q(y). The entropy-coded components of {circumflex over (y)} can each
be considered a description of x. For simplicity, it will be
assumed that Q is a uniform quantizer with step size .DELTA. and
that n<2k. If m.gtoreq.k of the components of {circumflex over (y)} are known to the
decoder, then x can be specified to within a cell with diameter
approximately equal to .DELTA. and thus is well approximated. Since
the constraints on x provided by each description are independent,
on average, the diameter is a non-increasing function of m. When
m<k components of {circumflex over (y)} are received, R.sup.k can be partitioned into
an m-dimensional subspace and a (k-m)-dimensional orthogonal
subspace, such that the component of x in the first subspace is
well specified. With a mild zero-mean condition on the component in
the latter space, a reasonable estimate of x is easily computed.
For any m, estimating x can be posed as a simple least-squares
problem, although for m.gtoreq.k, a better estimate may be found by
exploiting the boundedness of the quantization error, as described
in the above-cited V. Goyal et al. reference.
[0071] The flow diagram of FIG. 9 is an example of the
above-described deterministic redundancy approach, using a frame
alternative to a (10, 8) block code. For the 10.times.8 frame
operator F we use a matrix corresponding to a length-10 real
Discrete Fourier Transform (DFT) of a length-8 sequence. This
matrix can be constructed as F=[F.sup.(1) F.sup.(2)], where

$$F^{(1)}_{ij} = \frac{1}{2}\cos\frac{\pi(i-1)(2j-1)}{10} \quad\text{and}\quad F^{(2)}_{ij} = \frac{1}{2}\sin\frac{\pi(i-1)(2j-1)}{10}, \qquad 1 \le i \le 10,\ 1 \le j \le 4.$$
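The construction of the 10.times.8 frame operator can be sketched as below. The sketch assumes that the arguments of the cosine and sine include a factor of .pi., which is the choice that makes the rows unit-norm frame vectors and the frame tight. The function name is hypothetical.

```python
import math

def frame_operator():
    """10x8 frame operator F = [F1 F2]: four cosine columns followed by
    four sine columns, with rows indexed by i - 1 = 0..9."""
    rows = []
    for i in range(10):
        cos_part = [0.5 * math.cos(math.pi * i * (2 * j + 1) / 10) for j in range(4)]
        sin_part = [0.5 * math.sin(math.pi * i * (2 * j + 1) / 10) for j in range(4)]
        rows.append(cos_part + sin_part)
    return rows

F = frame_operator()
# Squared norm of each frame vector (each row of F).
row_norms = [sum(v * v for v in row) for row in F]
# Gram matrix F^T F; a tight frame gives a multiple of the identity.
gram = [[sum(row[p] * row[q] for row in F) for q in range(8)] for p in range(8)]
```

In this sketch F.sup.TF equals (10/8) times the identity, so when all ten descriptions arrive the least-squares reconstruction reduces to multiplying the received vector by (8/10)F.sup.T.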
[0072] In order to obtain the benefit of perceptual tuning, we
apply this technique to DCT coefficients and use quantization step
sizes as in a typical JPEG decoder. FIG. 9 illustrates the encoding
process. In step 120, an 8.times.8 block DCT of the image is
computed. In step 122, vectors of length 8 are then formed from DCT
coefficients of like frequency, separated in space. Each length 8
vector is expanded in step 124 by left-multiplication with the
frame operator F, and each length 10 vector is uniformly quantized
in step 126 with a step size depending on the frequency. The
encoding process illustrated in FIG. 9 can be implemented using,
e.g., one or more of the micro MD.sub.j encoders 30-j of FIG. 5B,
each of which includes a T block 52 followed by a Q block 53. The
T block 52 receives an r-tuple as input and generates a
corresponding transformed s-tuple as an output. The Q block 53
receives the s-tuple from the T block 52, and generates a quantized
s-tuple as an output, where s is greater than or equal to r.
[0073] The reconstruction for the above-described frame-based
process may follow a least-squares strategy. It can be shown that
the frame-based process of FIG. 9 provides better performance than
a corresponding systematic block code when less than eight packets
are received, and the performance degrades gracefully as the number
of lost packets increases. It should be noted, however, that the
process of FIG. 9 may not provide better performance than the
corresponding block code when all ten packets are received.
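A least-squares reconstruction of the kind mentioned above can be sketched as follows for a single length-8 vector. When at least eight quantized coefficients are received, the sketch solves the normal equations. When fewer are received, it returns the minimum-norm solution, which is one way (assumed here) of realizing the zero-mean condition on the unobserved subspace. The frame operator is rebuilt with the assumed factor of .pi., and the function names, test vector and step size are hypothetical.

```python
import math

def frame_operator():
    """10x8 frame operator with cosine and sine halves."""
    return [[0.5 * math.cos(math.pi * i * (2 * j + 1) / 10) for j in range(4)] +
            [0.5 * math.sin(math.pi * i * (2 * j + 1) / 10) for j in range(4)]
            for i in range(10)]

def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def reconstruct(frame, received):
    """Estimate x from received coefficients; `received` maps a row index
    of `frame` to the quantized value of that description."""
    idx = sorted(received)
    rows = [frame[i] for i in idx]
    y = [received[i] for i in idx]
    k = len(frame[0])
    if len(rows) >= k:
        # Normal equations: (F_r^T F_r) x = F_r^T y.
        gram = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
        rhs = [sum(r[p] * v for r, v in zip(rows, y)) for p in range(k)]
        return solve(gram, rhs)
    # Fewer than k descriptions: minimum-norm x = F_r^T (F_r F_r^T)^{-1} y.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]
    coeffs = solve(gram, y)
    return [sum(cf * r[p] for cf, r in zip(coeffs, rows)) for p in range(k)]

F = frame_operator()
x = [3.0, -1.5, 0.25, 2.0, -0.75, 1.0, 0.5, -2.25]
step = 0.01
y_hat = [step * round(sum(f * v for f, v in zip(row, x)) / step) for row in F]

all_ten = reconstruct(F, dict(enumerate(y_hat)))
eight = reconstruct(F, {i: y_hat[i] for i in range(10) if i not in (0, 5)})
six = reconstruct(F, {i: y_hat[i] for i in range(6)})
```

The estimate from six descriptions is consistent with the received coefficients but leaves the unobserved two-dimensional subspace at zero, so its error depends on how much of x lies in that subspace.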
[0074] The above-described embodiments of the invention are
intended to be illustrative only. For example, image
characteristics, e.g., resolution, block size, etc., coding
parameters, e.g., quantization, frame type, etc., and other aspects
of the examples of FIGS. 8 and 9 may be varied in alternative
embodiments of the invention. It should be noted that a
complementary decoder structure corresponding to the encoder
structure of FIGS. 2, 3, 4 and 5A-5D may be implemented in the MD
JSC decoder 16 of FIG. 1. Alternative embodiments of the invention
may utilize other coding structures and arrangements. Moreover, the
invention may be used for a wide variety of different types of
compressed and uncompressed signals, and in numerous coding
applications other than those described herein. These and numerous
other alternative embodiments within the scope of the following
claims will be apparent to those skilled in the art.
* * * * *