U.S. patent application number 09/283017, for a method for transferring and displaying compressed images, was published on 2001-09-06. The application is assigned to AMERICA ONLINE, INC. The invention is credited to JOHNSON, STEPHEN G.
United States Patent Application 20010019630
Kind Code: A1
JOHNSON, STEPHEN G.
September 6, 2001
METHOD FOR TRANSFERRING AND DISPLAYING COMPRESSED IMAGES
Abstract
An image of a resolution higher than is possible in a single
transmission over a finite bandwidth channel is obtained by
transferring a progressively-rendered, compressed image. Initially,
a low quality image is compressed and transmitted over a finite
bandwidth channel. Then, successively higher resolution image
information is compressed at a source and transmitted. The
successively higher resolution image information received at the
destination end is used to display a higher resolution image at the
destination end.
Inventors: JOHNSON, STEPHEN G. (NEWPORT BEACH, CA)
Correspondence Address:
John F. Hayden
Fish & Richardson P.C.
601 Thirteenth Street, NW
Washington, DC 20005, US
Assignee: AMERICA ONLINE, INC.
Family ID: 23055447
Appl. No.: 09/283017
Filed: March 31, 1999
Related U.S. Patent Documents

Application Number | Filing Date  | Patent Number
09283017           | Mar 31, 1999 |
08636170           | Apr 22, 1996 | 5892847
08276161           | Jul 14, 1994 |
Current U.S. Class: 382/232; 375/E7.04; 375/E7.088; 375/E7.09; 375/E7.14; 375/E7.166; 375/E7.193; 375/E7.209; 375/E7.226; 375/E7.232
Current CPC Class: H04N 19/60 20141101; H04N 19/80 20141101; H04N 19/63 20141101; H04N 19/124 20141101; H04N 19/126 20141101; H04N 19/30 20141101; H04N 19/186 20141101; H04N 19/94 20141101
Class at Publication: 382/232
International Class: G06K 009/36; G06K 009/46
Claims
What is claimed is:
1. A method of transferring a progressively-rendered, compressed
image, over a finite bandwidth channel, comprising: producing a
coarse quality compressed image at a source and transmitting said
coarse quality compressed image over a channel as a first part of a
transmission to a destination end; receiving the coarse quality
compressed image at a receiver at the destination end at a first
time and displaying an image based on said coarse quality
compressed image on a display system of the receiver when received
at said first time; creating additional information about the
image, at the source end, from which a standard quality image can
be displayed, said standard quality image being of a higher quality
than said coarse quality image, and sending compressed information
over said channel indicative of information for said standard
quality image, said sending said standard quality image information
occurring subsequent in time to said sending of all of said
information for said coarse quality image; receiving said standard
quality information at the receiver at a second time, subsequent to
the first time, and decompressing said standard quality image
information, to improve the quality of the image displayed on said
display system, and to display said standard quality image;
obtaining further information about the image beyond the
information in said standard quality image, to provide an enhanced
quality image, and compressing said information for said enhanced
quality image, said enhanced quality image having more image
details than said standard quality image; transmitting said
information for said enhanced quality image, at a time subsequent
to transmitting said information for said coarse quality image and
said standard quality image; and receiving said enhanced quality
image information at said receiver, at a third time subsequent to
said first and second times, and updating a display on said display
system to display the additional enhanced quality image.
2. A method as in claim 1, wherein said producing the coarse
quality image uses a different compression technique than said
creating additional information indicative of the standard quality
image.
3. A method as in claim 1, wherein said coarse quality image
includes information indicative of a miniature version of an
original image, and said displaying the coarse quality image
comprises interpolating said miniature to a size of the original
image and displaying said image.
4. A method as in claim 2, wherein said creating additional
information comprises determining a characteristic of the image,
determining which of a plurality of different compression
techniques will best compress the characteristic determined; and
compressing said image using the determined technique.
5. A method as in claim 4, further comprising determining a
plurality of areas in said image, and determining, for each area,
which of the plurality of different compression techniques will
optimize the compression ratio.
6. A method as in claim 5, further comprising interleaving and
channel encoding different portions of the compressed image.
7. A method as in claim 5, wherein said compression techniques
include vector quantization and discrete cosine transform.
8. A method as in claim 3, wherein said obtaining a miniature
comprises decimating along vertical and horizontal axes.
9. A method of transmitting and displaying a compressed image
comprising: first obtaining and sending a first layer of
information indicative of a compressed miniature image at a first
time; first receiving said first layer at said decoder end and
decompressing and displaying a first coarse image indicative
thereof; second obtaining and sending information indicative of a
compressed improved resolution image having more details than said
first coarse image, and transmitting said information at a second
time subsequent to said first time; and second receiving and
decompressing said improved resolution image information to provide
an updated display which improves the resolution of said first
coarse image.
10. A method as in claim 9, wherein said obtaining coarse
information comprises: transmitting information indicative of a
compressed miniature of the image; receiving the compressed
miniature of the image; interpolating the compressed miniature of
the image into a full sized image; and displaying the full sized
image.
11. A method as in claim 10, wherein the first coarse image is
compressed using a first compression technique and the second image
is compressed using a second compression technique which is
different from the first compression technique.
12. A method as in claim 11, further comprising determining which
of a plurality of different image compression techniques will most
efficiently code information indicative of said image.
13. A method as in claim 12, wherein said determining uses a fuzzy
logic technique.
14. A method as in claim 11, wherein said first obtaining comprises
decimating data on the image to form a reduced quality image,
fitting the decimated data to a first model which partially
restores source image detail lost by decimation, and calculating
reconstruction values from the fitting.
15. A method as in claim 14, further comprising using said
reconstruction values to interpolate the decimated data into a
full sized image while minimizing a mean squared error between
original image components and interpolated image components.
16. A method as in claim 11, wherein said first step comprises
forming miniature versions of the original source image for each of
a plurality of primary colors.
17. A method as in claim 9, wherein said first obtaining comprises
obtaining a miniature image, and further comprising analyzing the
miniature image to classify the image into one of a plurality of
classes indicative of which of a plurality of compression
techniques will best compress said image.
Description
BACKGROUND OF THE INVENTION
[0001] 1. Field of the Invention
[0002] This invention relates to the compression and decompression
of digital data and, more particularly, to the reduction in the
amount of digital data necessary to store and transmit images.
[0003] 2. Background of the Invention
[0004] Image compression systems are commonly used in computers to
reduce the storage space and transmittal times associated with
storing, transferring and retrieving images. Due to increased use
of images in computer applications, and the increase in the
transfer of images, a variety of image compression techniques have
attempted to solve the problems associated with the large amounts
of storage space (i.e., hard disks, tapes or other devices) needed
to store images.
[0005] Conventional devices store an image as a two-dimensional
array of picture elements, or pixels. The number of pixels
determines the resolution of an image. Typically the resolution is
measured by stating the number of horizontal and vertical pixels
contained in the two dimensional image array. For example, a 640 by
480 image has 640 pixels across and 480 from top to bottom to total
307,200 pixels.
[0006] While the number of pixels represents the image resolution,
the number of bits assigned to each pixel represents the number of
available intensity levels of each pixel. For example, if a pixel
is only assigned one bit, the pixel can represent a maximum of two
values. Thus the range of colors which can be assigned to that
pixel is limited to two (typically black and white). In color
images, the bits assigned to each pixel represent the intensity
values of the three primary colors of red, green and blue. In
present "true color" applications, each pixel is normally
represented by 24 bits where 8 bits are assigned to each primary
color allowing the encoding of 16.8 million
(2.sup.8.times.2.sup.8.times.- 2.sup.8) different colors.
[0007] Consequently, color images require large amounts of storage
capacity. For example, a typical color (24 bits per pixel) image
with a resolution of 640 by 480 requires approximately 922,000
bytes of storage. A larger 24-bit color image with a 2000 by 2000
pixel resolution requires approximately twelve million bytes of
storage. As a result, image-based applications such as interactive
shopping, multimedia products, electronic games and other
image-based presentations require large amounts of storage space to
display high quality color images.
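The storage arithmetic in the two paragraphs above can be checked with a short sketch; the helper function below is illustrative, not part of the patent:

```python
def uncompressed_bytes(width, height, bits_per_pixel=24):
    """Raw storage needed for an image at the given bit depth."""
    return width * height * bits_per_pixel // 8

# 24-bit "true color": 8 bits for each of red, green, and blue.
colors_24bit = 2 ** 8 * 2 ** 8 * 2 ** 8       # 16,777,216 (~16.8 million)
vga_bytes = uncompressed_bytes(640, 480)      # 921,600 (~922,000 bytes)
large_bytes = uncompressed_bytes(2000, 2000)  # 12,000,000 (~twelve million)
print(colors_24bit, vga_bytes, large_bytes)
```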
[0008] In order to reduce storage requirements, an image is
compressed (encoded) and stored as a smaller file which requires
less storage space. In order to retrieve and view the compressed
image, the compressed image file is expanded (decoded) to its
original size. The decoded (or "reconstructed") image is usually an
imperfect or "lossy" representation of the original image because
some information may be lost in the compression process. Normally,
the greater the amount of compression the greater the divergence
between the original image and the reconstructed image. The amount
of compression is often referred to as the compression ratio. The
compression ratio is the amount of storage space needed to store
the original (uncompressed) digitized image file divided by the
amount of storage space needed to store the corresponding
compressed image file.
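The compression-ratio definition above reduces to a one-line computation; the file sizes used here are illustrative examples, not measurements from the patent:

```python
def compression_ratio(original_bytes, compressed_bytes):
    """Uncompressed file size divided by compressed file size."""
    return original_bytes / compressed_bytes

# A 640x480 true-color image (921,600 bytes) compressed down to
# 46,080 bytes has a 20:1 compression ratio.
ratio = compression_ratio(921_600, 46_080)
print(ratio)  # 20.0
```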
[0009] By reducing the amount of storage space needed to store an
image, compression is also used to reduce the time needed to
transfer and communicate images to other locations. In order to
transfer an image, the data bits that represent the image are sent
via a data channel to another location. The sequence of transmitted
bytes is called the data stream. Generally, the image data is
encoded and the compressed image data stream is sent over a data
channel and when received, the compressed image data is decoded to
recreate the original image. Thus, compression speeds the
transmission of image files by reducing their size.
[0010] Several processes have been developed for compressing the
data required to represent an image. Generally, the processes rely
on two methods: 1) spatial or time domain compression, and 2)
frequency domain compression. In frequency domain compression, the
binary data representing each pixel in the space or time domain are
mapped into a new coordinate system in the frequency domain.
[0011] In general, the mathematical transforms, such as the
discrete cosine transform (DCT), are chosen so that the signal
energy of the original image is preserved, but the energy is
concentrated in a relatively few transform coefficients. Once
transformed, the data is compressed by quantization and encoding of
the transform coefficients.
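The energy-compaction property described above can be demonstrated with a small self-contained sketch using an orthonormal DCT-II. The 8-sample row of pixel values is an arbitrary smooth example, not data from the patent:

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a 1-D signal (preserves signal energy)."""
    N = len(x)
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                for n in range(N))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * s)
    return out

# A smooth 8-sample row: its energy spreads across all 8 samples in the
# spatial domain but concentrates in the low-frequency coefficients.
row = [100, 110, 120, 130, 140, 150, 160, 170]
coeffs = dct2(row)
energy = sum(c * c for c in coeffs)       # equals sum(p*p for p in row)
low = sum(c * c for c in coeffs[:2])      # first two coefficients only
print(low / energy)  # > 0.99: nearly all energy in two coefficients
```

Quantization can then spend most of its bits on those few large coefficients and coarsely quantize (or drop) the rest.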
[0012] Optimization of the process of compressing an image includes
increasing the compression ratio while maintaining the quality of
the original image, reducing the time to encode an image, and
reducing the time to decode a compressed image. In general, a
process that increases the compression ratio or decreases the time
to compress an image results in a loss of image quality. A process
that increases the compression ratio and maintains a high quality
image often results in longer encoding and decoding times.
Accordingly, it would be advantageous to increase the compression
ratio and reduce the time needed to encode and decode an image
while maintaining a high quality image.
[0013] It is well known that image encoders can be optimized for
specific image types. For example, different types of images may
include graphical, photographic, or typographic information or
combinations thereof. As discussed in more detail below, the
encoding of an image can be viewed as a multi-step process that
uses a variety of compression methods which include filters,
mathematical transformations, quantization techniques, etc. In
general each compression method will compress different image types
with varying comparative efficiency. These compression methods can
be selectively applied to optimize an encoder with respect to a
certain type of image. In addition to selectively applying various
compression methods, it is also possible to optimize an encoder by
varying the parameters (e.g., quantization tables) of a particular
compression method.
[0014] Broadly speaking, however, the prior art does not provide an
adaptive encoder that automatically decomposes a source image,
classifies its parts, and selects the optimal compression methods
and the optimal parameters of the selected compression methods
resulting in an optimized encoder that increases relative
compression rates.
[0015] Once an image is optimally compressed with an encoder, the
set of compressed data are stored in a file. The structure of the
compressed file is referred to as the file format. The file format
can be fairly simple and common, or the format can be quite complex
and include a particular sequence of compressed data or various
types of control instructions and codes.
[0016] The file format (the structure of the data in the file) is
especially important when compressed data in the file will be read
and processed sequentially and when the user desires to view or
transmit only part of a compressed image file. Accordingly, it
would be advantageous to provide a file format that "layers" the
compressed image components, arranging those of greatest visual
importance first, those of secondary visual importance second, and
so on. Layering the compressed file format in such a way allows the
first segment of the compressed image file to be decoded prior to
the remainder of the file being received or read by the decoder.
The decoder can display the first segment (layer) as a miniature
version of the entire image or can enlarge the miniature to display
a coarse or "splash" quality rendition of the original image. As
each successive file segment or layer is received, the decoder
enhances the quality of the displayed picture by selectively adding
detail and correcting pixel values.
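The layered decoding just described can be sketched structurally. The layer representation below (a mapping from pixel positions to corrected values) is a hypothetical illustration, not the patent's actual file format; it shows only the key property that each layer supplements, rather than replaces, what came before:

```python
def decode_progressively(layers):
    """Yield successively refined images as each layer arrives."""
    image = {}              # pixel position -> value, built layer by layer
    for layer in layers:    # earlier layers carry the most important data
        image.update(layer) # supplement and correct, never discard
        yield dict(image)

splash = {(0, 0): 128}                  # coarse "splash" core of the image
detail = {(0, 0): 131, (0, 1): 140}     # corrections plus new detail
stages = list(decode_progressively([splash, detail]))
print(stages[0], stages[1])
```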
[0017] Like the encoding process, the decoding of an image can be
viewed as a multi-step process that uses a variety of decoding
methods which include inverse mathematical transformations, inverse
quantization techniques, etc. Conventional decoders are designed to
have an inverse function relative to the encoding system. These
inverse decoding methods must match the encoding process used to
encode the image. In addition, where an encoder makes
content-sensitive adaptations to the compression algorithm, the
decoder must apply a matching content-sensitive decoding
process.
[0018] Generally, a decoder is designed to match a specific
encoding process. Prior art compression systems exist that allow
the decoder to adjust particular parameters, but the prior art
encoders must also transmit accompanying tables and other
information. In addition, many conventional decoders are limited to
specific decoding methods that do not accommodate content-sensitive
adaptations.
SUMMARY OF THE INVENTION
[0019] The problems outlined above are solved by the method and
apparatus of the present invention. That is, the computer-based
image compression system of the present invention includes a unique
encoder which compresses images and a unique decoder which
decompresses images. The unique compression system obtains high
compression ratios at all image quality levels while achieving
relatively quick encoding and decoding times.
[0020] A high compression ratio enables faster image transmission
and reduces the amount of storage space required to store an image.
When compared with conventional compression techniques, such as the
Joint Photographic Experts Group (JPEG), the present invention
significantly increases the compression ratio for color images
which, when decompressed, are of comparable quality to the JPEG
images. The exact improvement over JPEG will depend on image
content, resolution, and other factors.
[0021] Smaller image files translate into direct storage and
transmission time savings. In addition, the present invention
reduces the number of operations to encode and decode an image when
compared to JPEG and other compression methods of a similar nature.
Reducing the number of operations reduces the amount of time and
computing resources needed to encode and decode an image, and thus
improves computer system response times.
[0022] Furthermore, the image compression system of the present
invention optimizes the encoding process to accommodate different
image types. As explained below, the present invention uses fuzzy
logic techniques to automatically analyze and decompose a source
image, classify its components, select the optimal compression
method for each component, and determine the optimal
content-sensitive parameters of the selected compression methods.
The encoder does not need prior information regarding the type of
image or information regarding which compression methods to apply.
Thus, a user does not need to provide compression system
customization or need to set the parameters of the compression
methods.
[0023] The present invention is designed with the goal of providing
an image compression system that reliably compresses any type of
image with the highest achievable efficiency, while maintaining a
consistent range of viewing qualities. Automating the system's
adaptivity to varied image types allows for a minimum of human
intervention in the encoding process and results in a system where
the compression and decompression process are virtually transparent
to the users.
[0024] The encoder and decoder of the present invention contain a
library of encoding methods that are treated as a "toolbox." The
toolbox allows the encoder to selectively apply particular encoding
methods or tools that optimize the compression ratio for a
particular image component. The toolbox approach allows the encoder
to support many different encoding methods in one program, and
accommodates the invention of new encoding methods without
invalidating existing decoders. The toolbox approach thus allows
upgradeability for future improvements in compression methods and
adaptation to new technologies.
[0025] A further feature of the present invention is that the
encoder creates a file format that segments or "layers" the
compressed image. The layering of the compressed image allows the
decoder to display image file segments, beginning with the data at
the front of the file, in a coherent sequence which begins with the
decoding and display of the information that constitutes the core
of the image as defined by human perception. This core information
can appear as a good quality miniature of the image and/or as a
full sized "splash" or coarse quality version of the image. Both
the miniature and splash image enable the user to view the essence
of an image from a relatively small amount of encoded data. In
applications where the image file is being transmitted over a data
channel, such as a telephone line or limited bandwidth wireless
channel, display of the miniature and/or splash image occurs as
soon as the first segment or layer of the file is received. This
allows users to view the image quickly and to see detail being
added to the image as subsequent layers are received, decoded, and
added to the core image.
[0026] The decoder decompresses the miniature and the full sized
splash quality image from the same information. User specified
preferences and the application determine whether the miniature
and/or the full sized splash quality image are displayed for any
given image.
[0027] Whether the first layer is displayed as a miniature or a
splash quality full size image, the receipt of each successive
layer allows the decoder to add additional image detail and
sharpness. Information from the previous layer is supplemented, not
discarded, so that the image is built layer by layer. Thus a single
compressed file with a layered file format can store both a
thumbnail and a full size version of the image and can store the
full size version at various quality levels without storing any
redundant information.
[0028] The layered approach of the present invention allows the
transmission or decoding of only the part of the compressed file
which is necessary to display a desired image quality. Thus, a
single compressed file can generate a thumbnail and different
quality full size images without the need to recompress the file to
a smaller size and lesser quality, or store multiple files
compressed to different file sizes and quality levels.
[0029] This feature is particularly advantageous for on-line
service applications, such as shopping or other applications where
the user or the application developer may want several thumbnail
images downloaded and presented before the user chooses to receive
the entire full size, high quality image. In addition to conserving
the time and transmission costs associated with viewing a variety
of high quality images that may not be of interest, the user need
only subsequently download the remainder of each image file to view
the higher detail versions of the image.
[0030] The layered format also allows the storage of different
layers of the compressed data file separate from one another. Thus,
the core image data (miniature) can be stored locally (e.g., in
fast RAM memory for fast access), and the higher quality
"enhancement" layers can be stored remotely in lower cost bulk
storage.
[0031] A further feature of the layered file format of the present
invention allows the addition of other compressed data information.
The layered and segmented file format is extendable so that new
layers of compressed information such as sound, text and video can
be added to the compressed image data file. The extendable file
format allows the compression system to adapt to new image types
and to combine compressed image data with sound, text and
video.
[0032] Like the encoder, the decoder of the present invention
includes a toolbox of decoding methods. The decoding process can
begin with the decoder first determining the encoding methods used
to encode each data segment. The decoder determines the encoding
methods from instructions the encoder inserts into the compressed
data file.
[0033] Adding decoder instructions to the compressed image data
provides several advantages. A decoder that recognizes the
instructions can decode files from a variety of different encoders,
accommodate content-sensitive encoding methods, and adjust to user
specific needs. The decoder of the present invention also skips
parts of the data stream that contain data that are unnecessary for
a given rendition of the image, or ignores parts of the data stream
that are in an unknown format. The ability to ignore unknown
formats allows future file layers to be added while maintaining
compatibility with older decoders.
[0034] In a preferred embodiment of the present invention, the
encoder compresses an image using a first Reed Spline Filter, an
image classifier, a discrete cosine transform, a second and third
Reed Spline Filter, a differential pulse code modulator, an
enhancement analyzer, and an adaptive vector quantizer to generate
a plurality of data segments that contain the compressed image. The
plurality of data segments are further compressed with a channel
encoder.
[0035] The Reed Spline Filter includes a color space conversion
transform, a decimation step and a least mean squared error (LMSE)
spline fitting step. The output of the first Reed Spline Filter is
then analyzed to determine an image type for optimal compression.
The first Reed Spline Filter outputs three components which are
analyzed by the image classifier. The image classifier uses fuzzy
logic techniques to classify the image type. Once the image type is
determined, the first component is separated from the second and
third components and further compressed with an optimized discrete
cosine transform and an adaptive vector quantizer. The second and
third components are further compressed with a second and third
Reed Spline Filter, the adaptive vector quantizer, and a
differential pulse code modulator.
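The component routing described in the two paragraphs above can be sketched as a skeleton. Every stage here is a placeholder stub (the actual Reed Spline Filter, fuzzy-logic classifier, DCT, and quantizers are not reproduced); only the dataflow follows the description: the first component goes through the DCT and adaptive vector quantizer, while the second and third go through further spline filtering and DPCM:

```python
def reed_spline_filter(image):
    # Placeholder: color-convert, decimate, and spline-fit into three
    # components (stand-ins for the Y, U, and X miniatures).
    return image["Y"], image["U"], image["X"]

def encode(image, classify, dct_vq, spline_dpcm):
    y, u, x = reed_spline_filter(image)
    image_type = classify((y, u, x))      # fuzzy-logic classification step
    segments = [dct_vq(y, image_type)]    # component 1: DCT + AVQ
    segments += [spline_dpcm(c, image_type) for c in (u, x)]  # 2 and 3
    return segments

segs = encode({"Y": "y", "U": "u", "X": "x"},
              classify=lambda comps: "photo",
              dct_vq=lambda c, t: ("dct+avq", c, t),
              spline_dpcm=lambda c, t: ("spline+dpcm", c, t))
print(segs)
```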
[0036] The enhancement analyzer enhances areas of an image
determined to be the most visually important, such as text or
edges. The enhancement analyzer determines the visual priority of
pixel blocks. The pixel block dimensions typically correspond to
16.times.16 pixel blocks in the source image. In addition, the
enhancement analyzer prioritizes each pixel block so that the most
important enhancement information is placed in the earliest
enhancement layers so that it can be decoded first. The output of
the enhancement analyzer is compressed with the adaptive vector
quantizer.
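Prioritizing pixel blocks by visual importance, as the paragraph above describes, can be sketched as a ranking. The patent does not specify the importance measure; block variance is used here purely as a stand-in, on the rationale that text and edges raise local variance:

```python
def block_variance(block):
    """Variance of a flat list of pixel values (stand-in importance score)."""
    n = len(block)
    mean = sum(block) / n
    return sum((p - mean) ** 2 for p in block) / n

def prioritize(blocks):
    """Return block indices ordered most visually important first."""
    return sorted(range(len(blocks)),
                  key=lambda i: block_variance(blocks[i]),
                  reverse=True)

flat = [128] * 4                 # smooth region: low enhancement priority
edge = [0, 0, 255, 255]          # hard edge: high enhancement priority
print(prioritize([flat, edge]))  # edge block ranks first
```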
[0037] A user may set the encoder to compute a color palette
optimized to the color image. The color palette is combined with
the output of the discrete cosine transform, the adaptive vector
quantizer, the differential pulse code modulator, and the
enhancement analyzer to create a plurality of data segments. The
channel encoder then interleaves and compresses the plurality of
data segments.
BRIEF DESCRIPTION OF THE DRAWINGS
[0038] These and other aspects, advantages, and novel features of
the invention will become apparent upon reading the following
detailed description and upon reference to accompanying drawings in
which:
[0039] FIG. 1 is a block diagram of an image compression system
that encodes, transfers and decodes an image and includes a source
image, an encoder, a compressed file, a first storage device, a
data channel, a data stream, a decoder, a display, a second storage
device, and a printer;
[0040] FIG. 2 illustrates the multi-step decoding process and
includes the source image, the encoder, the compressed file, the
data channel, the data stream, the decoder, a thumbnail image, a
splash image, a panellized standard image, and the final
representation of the source image;
[0041] FIG. 3 is a block diagram of the encoder showing the four
stages of the encoding process;
[0042] FIG. 4 is a block diagram of the encoder showing a first
Reed Spline Filter, a color space conversion transform, a Y
miniature, a U miniature, an X miniature, an image classifier, an
optimized discrete cosine transform, a discrete cosine transform
residual calculator, an adaptive vector quantizer, a second and
third Reed Spline Filter, a Reed Spline residual calculator, a
differential pulse coder modulator, an enhancement analyzer, a high
resolution residual calculator, a palette selector, a plurality of
data segments and a channel encoder;
[0043] FIG. 5 is a block diagram of the image formatter;
[0044] FIG. 6 is a block diagram of the Reed Spline Filter;
[0045] FIG. 7 is a block diagram of the color space conversion
transform;
[0046] FIG. 8 is a block diagram of the image classifier;
[0047] FIG. 9 is a block diagram of the optimized discrete cosine
transform;
[0048] FIG. 10 is a block diagram of the DCT residual
calculator;
[0049] FIG. 11 is a block diagram of the adaptive vector
quantizer;
[0050] FIG. 12 is a block diagram of the second and third Reed
Spline Filters;
[0051] FIG. 13 is a block diagram of the Reed Spline residual
calculator;
[0052] FIG. 14 is a block diagram of the differential pulse code
modulator;
[0053] FIG. 15 is a block diagram of the enhancement analyzer;
[0054] FIG. 16 is a block diagram of the high resolution residual
calculator;
[0055] FIG. 17 is the block diagram of the palette selector;
[0056] FIG. 18 is the block diagram of the channel encoder;
[0057] FIG. 19 is a block diagram of the vector quantization
process;
[0058] FIGS. 20a and 20b show the segmented architecture of the
data stream;
[0059] FIG. 21 illustrates the normal segment;
[0060] FIGS. 22a, 22b, 22c and 22d illustrate the layering and
interleaving of the plurality of data segments;
[0061] FIG. 23 is a block diagram of the decoder of the present
invention;
[0062] FIG. 24 illustrates the multi-step decoding process and
includes a Ym miniature, a Um miniature, an Xm miniature, the
thumbnail miniature, the splash image and the standard image, and
the enhanced image;
[0063] FIG. 25 is a block diagram of the decoder and includes an
inverse Huffman encoder, an inverse DPCM, a dequantizer, a
combiner, an inverse DCT, a demultiplexer, and an adder;
[0064] FIG. 26 is a block diagram of the decoder and includes the
interpolator, interpolation factors, a scaler, scale factors, a
replicator, and an inverse color converter;
[0065] FIG. 27 is a block diagram of the decoder that includes the
inverse Huffman encoder, the combiner, the dequantizer, the inverse
DCT, a pattern matcher, the adder, the interpolator, and an
enhancement overlay builder;
[0066] FIG. 28 is block diagram of the scaler with an input to
output ratio of five-to-three in the one dimensional case;
[0067] FIG. 29 illustrates the process of bilinear
interpolation;
[0068] FIG. 30 is a block diagram of the process of optimizing the
compression methods with the image classifier, the enhancement
analyzer, the optimized DCT, the AVQ, and the channel encoder;
[0069] FIG. 31 is a block diagram of the image classifier;
[0070] FIG. 32 is a flow chart of the process of creating an
adaptive uniform DCT quantization table;
[0071] FIG. 33 illustrates a table of several examples showing the
mapping from input measurements to input sets to output sets;
[0072] FIG. 34 is a block diagram of image data compression;
[0073] FIG. 35 is a block diagram of a spline
decimation/interpolation filter;
[0074] FIG. 36 is a block diagram of an optimal spline filter;
[0075] FIG. 37 is a vector representation of the image, processed
image, and residual image;
[0076] FIG. 38 is a block diagram showing a basic optimization
block of the present invention;
[0077] FIG. 39 is a graphical illustration of a one-dimensional
bi-linear spline projection;
[0078] FIG. 40 is a schematic view showing periodic replication of
a two-dimensional image;
[0079] FIGS. 41a, 41b and 41c are perspective and plan views of a
two-dimensional planar spline basis;
[0080] FIG. 42 is a diagram showing representations of the
hexagonal tent function;
[0081] FIG. 43 is a flow diagram of compression and reconstruction
of image data;
[0082] FIG. 44 is a graphical representation of a normalized
frequency response of a one-dimensional bi-linear spline basis;
[0083] FIG. 45 is a graphical representation of a one-dimensional
eigenfilter frequency response;
[0084] FIG. 46 is a perspective view of a two-dimensional
eigenfilter frequency response;
[0085] FIG. 47 is a plot of standard error as a function of
frequency for a one-dimensional cosinusoidal image;
[0086] FIG. 48 is a plot of original and reconstructed
one-dimensional images and a plot of standard error;
[0087] FIG. 49 is a first two-dimensional image reconstruction for
different compression factors;
[0088] FIG. 50 is a second two-dimensional image reconstruction for
different compression factors;
[0089] FIG. 51 is a set of plots of standard error for
representative images 1 and 2;
[0090] FIG. 52 is a compressed two-dimensional miniature using the
optimized decomposition weights;
[0091] FIG. 53 is a block diagram of a preferred adaptive
compression scheme to which the method of the present invention is
particularly suited;
[0092] FIG. 54 is a block diagram showing a combined sublevel and
optimal-spline compression arrangement;
[0093] FIG. 55 is a block diagram showing a combined sublevel and
optimal-spline reconstruction arrangement;
[0094] FIG. 56 is a block diagram showing a multi-resolution
optimized interpolation arrangement; and
[0095] FIG. 57 is a block diagram showing an embodiment of the
optimizing process in the image domain.
DETAILED DESCRIPTION OF THE INVENTION
[0096] FIG. 1 illustrates a block diagram of an image compression
system that includes a source image 100, an encoder 102, a
compressed file 104, a first storage device 106, a communication
data channel 108, a decoder 110, a display 112, a second storage
device 114, and a printer 116. The source image 100 is represented
as a two-dimensional image array of picture elements, or pixels.
The number of pixels determines the resolution of the source image
100, which is typically measured by the number of horizontal and
vertical pixels contained in the two-dimensional image array.
[0097] Each pixel is assigned a number of bits that represent the
intensity level of the three primary colors: red, green, and blue.
In the preferred embodiment, the full-color source image 100 is
represented with 24 bits, where 8 bits are assigned to each primary
color. Thus, the total storage required for an uncompressed image
is computed as the number of pixels in the image times the number
of bits used to represent each pixel (referred to as bits per
pixel).
[0098] As discussed in more detail below, the encoder 102 uses
decimation, filtering, mathematical transforms, and quantization
techniques to concentrate the image into fewer data samples
representing the image with fewer bits per pixel than the original
format. Once the source image 100 is compressed with the encoder
102, the set of compressed data are assembled in the compressed
file 104. The compressed file 104 is stored in the first storage
device 106 or transmitted to another location via the data channel
108. If the compressed file 104 is transmitted to another location,
the data stored in the compressed file 104 is transmitted
sequentially via the data channel 108. The sequence of bits in the
compressed file 104 that are transmitted via the data channel 108
is referred to as a data stream 118.
[0099] The decoder 110 expands the compressed file 104 to the
original source image size. During the process of decoding the
compressed file 104, the decoder 110 displays the expanded source
image 100 on the display 112. In addition, the decoder 110 may
store the expanded compressed file 104 in the second storage device
114 or print the expanded compressed file 104 on the printer
116.
[0100] For example, if the source image 100 comprises a
640.times.480, 24-bit color image, the amount of memory needed to
store and display the source image 100 is approximately 922,000
bytes. In the preferred embodiment, the encoder 102 computes the
highest compression ratio for a given decoding quality and playback
model. The playback model allows a user to select the decoding mode
as is discussed in more detail below. The compressed data are then
assembled in the compressed file 104 for transmittal via the data
channel 108 or stored in the first storage device 106. For example,
at a 92-to-1 compression ratio, the 922,000 bytes that represent
the source image 100 are compressed into approximately 10,000
bytes. In addition, the encoder 102 arranges the compressed data
into layers in the compressed file 104.
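The storage arithmetic above can be sketched as follows (an illustration, not part of the patent itself): total storage is simply the number of pixels times the bits per pixel, and the compressed size follows from the stated 92-to-1 ratio.

```python
# Sketch of the storage arithmetic described above: total storage equals the
# number of pixels times the bits used to represent each pixel.
def uncompressed_bytes(width, height, bits_per_pixel):
    return width * height * bits_per_pixel // 8

size = uncompressed_bytes(640, 480, 24)   # 921,600 bytes, i.e. ~922,000
compressed = size // 92                   # ~10,000 bytes at 92-to-1
```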
[0101] Referring to FIG. 2, it can be seen that the layering of the
compressed file 104 allows the decoder 110 to display a thumbnail
image and progressively improving quality versions of the source
image 100 before the decoder 110 receives the entire compressed
file 104. The first data expanded by the decoder 110 can be viewed
as a thumbnail miniature 120 of the original image or as a coarse
quality "splash" image 122 with the same dimensions as the original
image. The splash image 122 is a result of interpolating the
thumbnail miniature to the dimensions of the original image. As the
decoder 110 continues to receive data from the data stream 118, the
decoder 110 creates a standard image 124 by decoding the second
layer of information and adding it to the splash image 122 data to
create a higher quality image. The encoder 102 can create a
user-specified number of layers in which each layer is decoded and
added to the displayed image as data is received. Upon receiving
the entire compressed file 104 via the data stream 118, the decoder
110 displays an enhanced image 105 that is the highest quality
reconstructed image that can be obtained from the compressed data
stream 118.
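The layered playback just described can be sketched as a loop that adds each decoded layer to the image displayed so far. The layer arrays below are invented stand-ins for the patent's actual data segments.

```python
import numpy as np

# Illustrative sketch of layered playback: each decoded layer is a correction
# added to the currently displayed image, yielding progressively better views.
def progressive_display(splash, layers):
    """Yield successively refined images: splash, standard, ..., enhanced."""
    image = splash.astype(np.int32)
    yield image.copy()
    for layer in layers:
        image += layer          # refine the displayed image in place
        yield image.copy()

splash = np.zeros((4, 4), dtype=np.int32)
layers = [np.ones((4, 4), dtype=np.int32), 2 * np.ones((4, 4), dtype=np.int32)]
frames = list(progressive_display(splash, layers))   # three quality levels
```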
[0102] FIG. 3 illustrates a block diagram of the encoder 102
constructed in accordance with the present invention. The encoder
102 compresses the source image 100 in four main stages. In a first
stage 126, the source image 100 is formatted, processed by a Reed
Spline Filter and color converted. In a second stage 128, the
encoder 102 classifies the source image 100 in blocks. In a third
stage 130, the encoder 102 selectively applies particular encoding
methods that optimize the compression ratio. Finally, the
compressed data are interleaved and channel encoded in a fourth
stage 132.
[0103] The encoder 102 contains a library of encoding methods that
are treated as a toolbox. The toolbox allows the encoder 102 to
selectively apply particular encoding methods that optimize the
compression ratio for a particular image type. In the preferred
embodiment, the encoder 102 includes at least one of the following:
an adaptive vector quantizer (AVQ 134), an optimized discrete
cosine transform (optimized DCT 136), a Reed Spline Filter 138
(RSF), a differential pulse code modulator (DPCM 140), a run length
encoder (RLE 142), and an enhancement analyzer 144.
[0104] FIG. 4 illustrates a more detailed block diagram of the
encoder 102. The first stage 126 of the encoder 102 includes a
formatter 146, a first Reed Spline Filter 148 and a color space
converter 150 which produces Y data 186, and U and X data 188. The
second stage 128 includes an image classifier 152. The third stage
includes an optimized discrete cosine transform and adaptive DCT
quantization (optimized DCT 136), a DCT residual calculator 154,
the adaptive vector quantizer (AVQ 134), a second and a third Reed
Spline Filter 156, a Reed Spline residual calculator 158, the
differential pulse code modulator (DPCM 140), a resource file 160,
the enhancement analyzer 144, a high resolution residual calculator
162, and a palette selector 164. The fourth stage includes a
plurality of data segments 166 and a channel encoder 168. The
output of the channel encoder 168 is stored in the compressed file
104.
[0105] The formatter 146, as shown in more detail in FIG. 5,
converts the source image 100 from its native format to a 24-bit
red, green and blue pixel array. For example, if the source image
100 is an 8-bit palletized image, the formatter converts the 8-bit
palletized image to a 24-bit red, green, and blue equivalent.
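The formatter's palette expansion can be sketched as a simple table lookup; the palette values below are invented for illustration.

```python
# Minimal sketch of the formatter's conversion: each 8-bit palette index is
# replaced by its 24-bit (R, G, B) palette entry.
def expand_palettized(indexes, palette):
    """Convert an 8-bit palletized image (list of indexes) to 24-bit RGB."""
    return [palette[i] for i in indexes]

palette = {0: (0, 0, 0), 1: (255, 0, 0), 2: (255, 255, 255)}
rgb = expand_palettized([2, 1, 0, 1], palette)
# rgb == [(255, 255, 255), (255, 0, 0), (0, 0, 0), (255, 0, 0)]
```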
[0106] The first Reed Spline Filter 148, illustrated in more detail
in FIG. 6, uses a two-step process to compress the formatted source
image 100. The two-step process comprises a decimation step
performed in block 170 and a spline fitting step performed in a
block 172. As explained in more detail below, the decimation step
in the block 170 decimates each color component of red, green, and
blue by a factor of two along the vertical and horizontal
dimensions using a Reed Spline decimation kernel. The decimation
factor is called "tau." The R_tau2' decimated data 174 corresponds
to the red component decimated by a factor of 2. The G_tau2'
decimated data 176 corresponds to the green component decimated by
a factor of 2. The B_tau2' decimated data 178 corresponds to the
blue component decimated by a factor of 2.
[0107] In the spline fitting step in block 172, the first Reed
Spline Filter 148 partially restores the source image detail lost
by the decimation in block 170. The spline fitting step in block
172 processes the R_tau2' decimated data 174, the G_tau2' decimated
data 176, and the B_tau2' decimated data 178 to calculate optimal
reconstruction weights.
[0108] As explained in more detail below, the decoder 110 will
interpolate the decimated data into a full sized image. In this
interpolation, the decoder 110 uses the reconstruction weights
which have been calculated by the Reed Spline Filter in such a way
as to minimize the mean squared error between the original image
components and the interpolated image components. Accordingly the
Reed Spline Filter 148 causes the interpolated image to match the
original image more closely and increases the overall sharpness of
the interpolated picture. In addition, reducing the error arising
from the decimation step in block 170 reduces the amount of data
needed to represent the residual image. The residual image is the
difference between the reconstructed image and the original
image.
[0109] The reconstruction weights output from the Reed Spline
Filter 148 form a "miniature" of the original source image 100 for
each primary color of red, green, and blue, wherein each red,
green, and blue miniature is one-quarter the resolution of the
original source image 100 when a tau of 2 is used.
[0110] More specifically, the preferred color space converter 150
transforms the R_tau2 miniature 180, the G_tau2 miniature 182 and
the B_tau2 miniature 184 output by the first Reed Spline Filter 148
into a different color coordinate system in which one component is
the luminance Y data 186 and the other two components are related
to the chrominance U and X data 188. The color space converter 150
transforms the RGB to the YUX color space according to the
following formulas:
Y = 0.29900R + 0.58700G + 0.11400B
U = -0.16870R - 0.33120G + 0.50000B
X = 0.50000R - 0.41869G - 0.08131B
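The transform can be sketched as follows. Note the chrominance signs are an assumption here: the rows are written so that a gray pixel (R = G = B) yields near-zero chrominance, as a chrominance transform requires, since signs are easily lost in printed formulas; the Y row matches the text exactly.

```python
# Illustrative sketch only. The Y row matches the text; the U and X rows are
# ASSUMED to carry the conventional negative signs so that a gray pixel
# (R = G = B) maps to near-zero chrominance.
def rgb_to_yux(r, g, b):
    y = 0.29900 * r + 0.58700 * g + 0.11400 * b
    u = -0.16870 * r - 0.33120 * g + 0.50000 * b
    x = 0.50000 * r - 0.41869 * g - 0.08131 * b
    return y, u, x

y, u, x = rgb_to_yux(128, 128, 128)   # mid-gray: y is 128, u and x near zero
```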
[0111] Referring to FIG. 6, it can be seen that an R_tau2 miniature
180 corresponds to a red miniature that is decimated and spline
fitted by a factor of 2. A G_tau2 miniature 182 corresponds to a green
miniature that is decimated and spline fitted by a factor of 2. A
B_tau2 miniature 184 corresponds to a blue miniature that is
decimated and spline fitted by a factor of 2.
[0112] FIG. 7 illustrates the color space converter 150 of FIG. 4.
As described above, the color space converter 150 transforms the
R_tau2 miniature 180, the G_tau2 miniature 182 and the B_tau2
miniature 184 into a Y_tau2 miniature 190, a U_tau2 miniature 192
and an X_tau2 miniature 194.
[0113] Referring to FIG. 8, it can be seen that the second stage
128 of the encoder 102 includes an image classifier 152 that
determines the image type by analyzing the Y_tau2 miniature 190,
the U_tau2 miniature 192 and the X_tau2 miniature 194. The image
classifier 152 uses a fuzzy logic rule base to classify an image
into one or more of its known classes. In the preferred embodiment,
these classes include gray scale, graphics, text, photographs, high
activity and low activity images. The image classifier 152 also
decomposes the source image 100 into block units and classifies
each block. Since the source image 100 includes a combination of
different image types, the image classifier 152 sub-divides the
source image 100 into distinct regions. The image classifier 152
then outputs the control script 196 that specifies the correct
compression methods for each region. The control script 196
specifies which compression methods to apply in the third stage
130, and specifies the channel encoding methods to apply in the
fourth stage 132.
[0114] As shown in FIG. 4, during the third stage 130, the encoder
102 uses the control script 196 to select the optimal compression
methods from its compression toolbox. The encoder 102 separates the
Y data 186 from the U and X data 188. Thus, the encoder 102
separates the Y_tau2 miniature 190 from the U_tau2 miniature 192
and the X_tau2 miniature 194, and passes the Y_tau2 miniature 190
to the optimized DCT 136, and passes the U_tau2 miniature 192 and
the X_tau2 miniature 194 to a second and third Reed Spline Filter
156.
[0115] As illustrated in FIG. 9, the optimized DCT 136 subdivides
the Y_tau2 miniature 190 into a set of 8.times.8 pixel blocks and
transforms each 8.times.8 pixel block into sixty-four DCT
coefficients 198. The DCT coefficients include the AC terms 200 and
the DC terms 201. The DCT coefficients 198 are analyzed by the
optimized DCT 136 to determine optimal quantization step sizes and
reconstruction values. The optimized DCT 136 stores the optimal
quantization step sizes (uniform or non-uniform) in a quantization
table Q 202 and outputs the reconstruction values to the CS data
segment 204. The optimized DCT 136 then quantizes the DCT
coefficients 198 according to the quantization table Q 202. Once
quantized, the optimized DCT 136 outputs the DCT quantized values
206 to the DCT data segment 208.
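The transform-and-quantize step above can be sketched as follows. This is an illustration, not the optimized DCT 136 itself: the flat step size of 16 is invented, whereas the patent derives optimal (possibly non-uniform) step sizes per image.

```python
import numpy as np

# Illustrative sketch: an orthonormal 8x8 DCT-II matrix transforms each block
# into 64 coefficients, which are rounded to quantizer steps; dequantizing and
# inverting recovers an approximation, and the difference is the residual.
N = 8
k = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0, :] = np.sqrt(1.0 / N)                  # DC basis row has its own scale

def dct2(block):
    return C @ block @ C.T                  # the 64 DCT coefficients

def quantize(coeffs, step=16.0):
    return np.round(coeffs / step).astype(int)

def dequantize(q, step=16.0):
    return q * step                         # reconstruction values

block = np.full((N, N), 128.0)              # a flat gray block
q = quantize(dct2(block))                   # only the DC term survives
residual = block - C.T @ dequantize(q) @ C  # information lost by quantization
```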
[0116] In order to preserve the image information lost by the
optimized DCT 136, the DCT residual calculator 154 (shown in FIG.
10) computes and compresses the DCT residual. The DCT residual
calculator 154 dequantizes in a dequantizer 209 the DCT quantized
values 206 stored in the DCT data segment 208 by multiplying the
reconstruction values in the CS data segment 204 with the DCT
quantized values 206. The DCT residual calculator 154 then
reconstructs the dequantized DCT components with an inverse DCT 210
to generate a reconstructed dY_tau2 miniature 211. The
reconstructed dY_tau2 miniature 211 is subtracted from the original
Y_tau2 miniature 190 to create an rY_tau2 residual 212.
[0117] Referring to FIG. 11, it can be seen that the rY_tau2
residual 212 is further compressed with the AVQ 134. The technique
of vector quantization is used to represent a block of information
as a single index that requires fewer bits of storage. As explained
in more detail below, the AVQ 134 maintains a group of commonly
occurring block patterns in a set of codebooks 214 stored in the
resource file 160. The index references a particular block pattern
within a particular codebook 214. The AVQ 134 compares the input
block with the block patterns in the set of codebooks 214. If a
block pattern in the set of codebooks 214 matches or closely
approximates the input block, the AVQ 134 replaces the input block
pattern with the index.
[0118] Thus, the AVQ 134 compresses the input block information
into a list of indexes. The indexes are decompressed by replacing
each index with the block pattern each index references in the set
of codebooks 214. The decoder 110, as explained in more detail
below, also has a set of the codebooks 214. During the decoding
process the decoder 110 uses the list of indexes to reference block
patterns stored in a particular codebook 214. The original source
cannot be precisely recovered from the compressed representation
since the indexed patterns in the codebook will not match the input
block exactly. The degree of loss will depend on how well the
codebook matches the input block.
[0119] As shown in FIG. 11, the AVQ 134 compresses the rY_tau2
residual 212, by sub-dividing the rY_tau2 residual 212 into
4.times.4 residual blocks and comparing the residual blocks with
codebook patterns as explained above. The AVQ 134 replaces the
residual blocks with the codebook indexes that minimize the squared
error. The AVQ 134 outputs the list of codebook indexes to the VQ1
data segment 224. Thus, the VQ1 data segment 224 is a list of
codebook indexes that identify block patterns in the codebook. As
explained in more detail below, the AVQ 134 of the preferred
embodiment also generates new codebook patterns that the AVQ 134
outputs to the set of codebooks 214. The added codebook patterns
are stored in the VQCB data segment 223.
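The vector-quantization round trip described above can be sketched as follows; the two-entry codebook is invented for illustration, and the real AVQ 134 uses much larger adapted codebooks.

```python
import numpy as np

# Minimal VQ sketch: each 16-element block is replaced by the index of its
# minimum-squared-error codebook pattern; decoding maps indexes back to
# patterns, so the recovery is lossy.
codebook = np.array([
    np.zeros(16),          # pattern 0: flat dark block
    np.full(16, 100.0),    # pattern 1: flat bright block
])

def vq_encode(block):
    errors = np.mean((codebook - block) ** 2, axis=1)  # squared error per pattern
    return int(np.argmin(errors))

def vq_decode(index):
    return codebook[index]

block = np.full(16, 90.0)
idx = vq_encode(block)        # pattern 1 is the closer match
approx = vq_decode(idx)       # lossy: approx does not equal block exactly
```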
[0120] FIG. 12 illustrates a block diagram of the second Reed
Spline Filter 225 and third Reed Spline Filter 227. Once the image
classifier 152 determines the particular image type, the U_tau2
miniature 192 and the X_tau2 miniature 194 are further decimated
and filtered by the second Reed Spline Filter 225. Like the first
Reed Spline Filter 148 shown in FIG. 6, the second Reed Spline
Filter 225 compresses the U_tau2 miniature 192 and the X_tau2
miniature 194 in a two-step process. First, the U_tau2 miniature
192 and the X_tau2 miniature 194 are vertically and horizontally
decimated by a factor of two. The decimated data are then spline
fitted to determine optimal reconstruction weights that will
minimize the mean square error of the reconstructed decimated
miniatures. Once complete, the second Reed Spline Filter 225
outputs the optimal reconstruction values to create a U_tau4
miniature 226 and an X_tau4 miniature 228.
[0121] The third Reed Spline Filter 227 decimates the U_tau4
miniature 226 and the X_tau4 miniature 228 vertically and
horizontally by a factor of four. The decimated image data are
again spline fitted to create a U_tau16 miniature 230 and an
X_tau16 miniature 232.
[0122] In FIG. 13 the Reed Spline residual calculator 158 preserves
the image information lost by the second Reed Spline Filter 225 and
the third Reed Spline Filter 227 by computing and compressing the
Reed Spline Filter residual. The Reed Spline residual calculator
158 reconstructs the U_tau4 miniature 226 and X_tau4 miniature 228
by interpolating the U_tau16 miniature 230 and the X_tau16
miniature 232. The interpolated U_tau16 miniature 230 is referred
to as a dU_tau4 miniature 234. The interpolated X_tau16 miniature
232 is referred to as a dX_tau4 miniature 236. The dU_tau4
miniature 234 and dX_tau4 miniature 236 are subtracted from the
actual U_tau4 miniature 226 and X_tau4 miniature 228 to create an
rU_tau4 residual 238 and an rX_tau4 residual 240.
[0123] As illustrated in FIG. 11, the rU_tau4 residual 238 and the
rX_tau4 residual 240 are further compressed with the AVQ 134. The
AVQ 134 subdivides the rU_tau4 residual 238 and the rX_tau4
residual 240 into 4.times.4 residual blocks. The residual blocks
are compared with blocks in the set of codebooks 214 to find the
codebook patterns that minimize the squared error. The AVQ 134
compresses the residual block by assigning an index that identifies
the corresponding block pattern in the set of codebooks 214. Once
complete, the AVQ 134 outputs the compressed residual as the VQ3
data segment 242 and the VQ4 data segment 244.
[0124] The U_tau16 miniature 230 and the X_tau16 miniature 232 are
also compressed with the DPCM 140 as shown in FIG. 14. The DPCM 140
outputs the low-detail color components as the URCA data segment
246 and the XRCA data segment 248. The URCA data segment 246 and
the XRCA data segment 248 form the low-detail color components that
the decoder 110 uses to create the color thumbnail miniature 120 if
this is included as a playback option in the compressed data stream
118.
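Differential pulse code modulation can be sketched as coding each sample as its difference from the previous one, which concentrates low-detail data near zero. The DPCM 140 may differ in its predictor and entropy-coding details; this shows only the core idea.

```python
# Minimal DPCM sketch: transmit differences from the previous sample,
# and reconstruct by accumulating them.
def dpcm_encode(samples):
    prev = 0
    out = []
    for s in samples:
        out.append(s - prev)   # the prediction difference
        prev = s
    return out

def dpcm_decode(diffs):
    prev = 0
    out = []
    for d in diffs:
        prev += d
        out.append(prev)
    return out

diffs = dpcm_encode([100, 102, 101, 101])   # [100, 2, -1, 0]
assert dpcm_decode(diffs) == [100, 102, 101, 101]
```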
[0125] FIG. 15 illustrates the enhancement analyzer 144 of the
preferred embodiment. The Y_tau2 miniature 190, the U_tau4
miniature 226, and the X_tau4 miniature 228 are analyzed to
determine an enhancement list 250 that specifies the visual
priority of every 16.times.16 image block. The enhancement analyzer
144 determines the visual priority of each 16.times.16 image block
by convolving the Y_tau2 miniature 190, the U_tau4 miniature 226,
and the X_tau4 miniature 228 and comparing the result of the
convolution to a threshold value E 252. The threshold value E 252
is user defined. The user can set the threshold value E 252 from
zero to 200. The threshold value E 252 determines how much
enhancement information the encoder 102 adds to the compressed file
104. Thus, setting the threshold value E 252 to zero will suppress
any image enhancement information.
[0126] If the result of convolving a particular 16.times.16 high
resolution block is greater than the threshold value E 252, the
16.times.16 high-resolution block is prioritized and added to the
enhancement list 250. Thus the enhancement list 250 identifies
which 16.times.16 blocks are coded and prioritizes how the
16.times.16 coded blocks are listed.
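The selection rule above can be sketched as follows. The patent convolves the miniatures to measure visual priority; the convolution kernel is not specified in the text, so a simple local-contrast measure stands in for it here.

```python
import numpy as np

# Hedged sketch of the enhancement analyzer's rule: compute an activity
# measure per 16x16 block, keep blocks whose measure exceeds the threshold E,
# and list them highest-priority first. The activity measure is a stand-in.
def enhancement_list(image, E):
    """Return (activity, (row, col)) for blocks exceeding E, highest first."""
    selected = []
    for r in range(0, image.shape[0], 16):
        for c in range(0, image.shape[1], 16):
            block = image[r:r + 16, c:c + 16]
            activity = float(block.max() - block.min())  # stand-in measure
            if activity > E:
                selected.append((activity, (r, c)))
    return sorted(selected, reverse=True)

image = np.zeros((32, 32))
image[0:16, 16:32] = np.arange(256).reshape(16, 16)   # one high-activity block
blocks = enhancement_list(image, E=50)                # only that block is listed
```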
[0127] The high resolution residual calculator 162, as shown in
FIG. 16, determines the high resolution residual for each
16.times.16 high resolution block identified in the enhancement
list 250. The high resolution residual calculator 162 translates
the VQ1 data segment 224 from the AVQ 134 into a reconstructed
rY_tau2 residual 212 by mapping the indexes in the VQ1 data segment
224 to the patterns in the codebook. The reconstructed rY_tau2
residual is added to the dY_tau2 miniature 254 (dequantized DCT
components). The result is interpolated by a factor of two in the
vertical and horizontal dimensions and is subtracted from the
original Y_tau2 190 miniature to form the high resolution
residual.
[0128] The high resolution residual calculator 162 then extracts
high resolution 16.times.16 blocks from the high resolution
residual according to the priorities in the enhancement list 250.
As will be explained in more detail below, the high resolution
residual calculator 162 outputs the highest priority blocks in the
first enhancement layer, the next-highest priority blocks in the
second enhancement layer, etc. The high resolution residual blocks
are referred to as the xr_Y residual 256.
[0129] The xr_Y residual 256 is further compressed with the AVQ
134. The AVQ 134 subdivides the xr_Y residual 256 into 4.times.4
residual blocks. The residual blocks are compared with blocks in
the codebook. If a residual block corresponds to a block pattern in
the codebook, the AVQ 134 compresses the 4.times.4 residual block
by assigning an index that identifies the corresponding block
pattern in the codebook. Once complete, the AVQ 134 outputs the
compressed high resolution residual to the VQ2 data segment
258.
[0130] FIG. 17 illustrates a block diagram of the palette selector
164. The palette selector 164 computes a "best-fit" 24-bit color
palette 260 for the decoder 110. The palette selector 164 is
optional and is user defined. The palette selector 164 computes the
color palette 260 from the Y_tau2 miniature 190, the U_tau2
miniature 192 and the X_tau2 miniature 194. The user can select a
number of palette entries N 262 to range from 0 to 255 entries. If
the user selects a zero, no palette is computed. If enabled, the
palette selector 164 adds the color palette 260 to a plurality of
data segments 166.
[0131] The channel encoder 168, as shown in FIG. 18, interleaves
and channel encodes the plurality of data segments 166. Based on
the user defined playback model 261, the plurality of data segments
166 are interleaved as follows: 1) as a single layer, single-pass
comprising the entire image, 2) as two layers comprising the
thumbnail miniature 120 and the remainder of the image 122 with
enhancement information interleaved into each data block (panel) in
the second layer, and 3) as multiple layers comprising the
thumbnail miniature 120, the standard image 124, the sharp image
105, and additional layers as specified by the user. For each
playback model an option exists to interleave the data for
panellized or non-panellized display. The user defined playback
model 261 is described in more detail below.
[0132] After interleaving the plurality of data segments 166, the
channel encoder 168 compresses the plurality of data segments 166
in response to the control script 196. In the preferred embodiment,
the channel encoder 168 compresses the plurality of data segments
166 with: 1) a Huffman encoding process that uses fixed tables, 2)
a Huffman process that uses adaptive tables, 3) a conventional LZ1
coding technique or 4) a run-length encoding process. The channel
encoder 168 chooses the optimal compression method based on the
image type identified in the control script 196.
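Of the four channel-encoding options, run-length encoding is the simplest to sketch: each run of repeated values becomes a (value, count) pair. This illustrates only option 4; the Huffman and LZ1 paths are not shown.

```python
# Minimal run-length encoding sketch: collapse each run of repeated values
# into a [value, count] pair, and expand pairs back on decode.
def rle_encode(data):
    runs = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return runs

def rle_decode(runs):
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

runs = rle_encode([7, 7, 7, 0, 0, 5])   # [[7, 3], [0, 2], [5, 1]]
assert rle_decode(runs) == [7, 7, 7, 0, 0, 5]
```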
[0133] The Adaptive Vector Quantizer
[0134] The preferred embodiment of the AVQ 134 is illustrated in
FIG. 19. More specifically, the AVQ 134 optimizes the vector
quantization techniques described above. The AVQ 134 sub-divides
the image data into a set of 4.times.4 pixel blocks 216. The
4.times.4 pixel blocks 216 each include sixteen (16) elements,
X.sub.1, X.sub.2, X.sub.3 . . . X.sub.16 218, ordered from the
upper left-hand corner, left to right along each row, to the
bottom right-hand corner.
[0135] The codebook 214 of the present invention comprises M
predetermined sixteen-element vectors, P.sub.1,P.sub.2,P.sub.3, . .
. P.sub.M 220, that correspond to common patterns found in the
population of images. The indexes I.sub.1,I.sub.2,I.sub.3, . . .
I.sub.M 222 refer respectively to the patterns
P.sub.1,P.sub.2,P.sub.3, . . . , P.sub.M 220.
[0136] Finding a best-fit pattern from the codebook 214 requires
comparing each input block with every pattern in the codebook 214
and selecting the index that corresponds to the pattern with the
minimum squared error summed over the 16 elements in the 4.times.4
block. The optimal code, C, for an input vector, X, is the index j
such that pattern P.sub.j satisfies:

\[ \sum_{i=0}^{15} \frac{(X_i - P_{ij})^2}{16} = \min_{P_k \in P} \sum_{i=0}^{15} \frac{(X_i - P_{ik})^2}{16} \]
[0137] where X.sub.i is the ith element of the input vector X, and
P.sub.ik is the ith element of the VQ pattern P.sub.k.
[0138] The comparison equation finds the best match by selecting
the minimum error term that results from comparing the input block
with the codebook patterns. In other words, the AVQ 134 calculates
the mean squared error term associated with each pattern in the
codebook 214 in order to determine which pattern in the codebook
214 has the minimum squared error (also referred to as the minimum
error). The error term is the mean square error produced by
subtracting the pattern element P.sub.ik from the input block
element X.sub.i, squaring the result and dividing by sixteen
(16).
[0139] The process of searching for a matching pattern in the
codebook 214 is time-consuming. The AVQ 134 of the preferred
embodiment accelerates the pattern matching process with a variety
of techniques.
[0140] First, in order to find the optimal codebook pattern, the
AVQ 134 compares each input block term X.sub.i to the corresponding
term in the codebook pattern P.sub.j being tested and calculates
the total squared error for the first codebook pattern. This value
is stored as the initial minimum error. For each of the other
patterns P.sub.j=P.sub.2,P.sub.3, . . . , P.sub.M, the AVQ 134
subtracts the X.sub.1 and P.sub.1j terms and squares the result.
The AVQ 134 compares the resulting squared error to the minimum
error. If the squared error value is less than the minimum error,
the AVQ 134 continues with the next input term X.sub.2 and computes
the squared error associated with X.sub.2 and P.sub.2j. The AVQ 134
adds the result to the squared error of the first two terms. The
AVQ 134 then compares the accumulated squared error for X.sub.1 and
X.sub.2 to the minimum error. If the accumulated squared error is
less than the minimum error the squared error calculation continues
until the AVQ 134 has evaluated all 16 terms.
[0141] If at any time in the comparison, the accumulated squared
error for the new pattern is greater than the minimum squared
error, the current pattern is immediately rejected and the AVQ 134
discontinues calculating the squared error for the remaining input
block terms for that pattern. If the total squared error for the
new pattern is less than the minimum error, the AVQ 134 replaces
the minimum error with the squared error from the new pattern
before making the comparisons for the remaining patterns.
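The partial-distance rejection described in the two paragraphs above can be sketched as follows; the early-accept threshold, frequency ordering, and amplitude-grouping refinements are omitted for brevity.

```python
# Sketch of the partial-distance search: accumulate the squared error term by
# term, and reject a candidate pattern as soon as its accumulated error
# exceeds the best (minimum) error found so far.
def best_pattern(block, codebook):
    """Return the index of the minimum-squared-error pattern, with early exit."""
    best_index = 0
    best_error = sum((x - p) ** 2 for x, p in zip(block, codebook[0]))
    for j in range(1, len(codebook)):
        acc = 0.0
        rejected = False
        for x, p in zip(block, codebook[j]):
            acc += (x - p) ** 2
            if acc > best_error:      # already worse than the best pattern:
                rejected = True       # stop accumulating and reject
                break
        if not rejected and acc < best_error:
            best_index, best_error = j, acc
    return best_index

codebook = [[0.0] * 16, [100.0] * 16, [90.0] * 16]
idx = best_pattern([88.0] * 16, codebook)   # pattern 2 is the closest match
```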
[0142] Also, if the accumulated squared error for a particular
codebook pattern is less than a pre-determined threshold, the
codebook pattern is immediately accepted and the AVQ 134 quits
testing other codebook patterns. Furthermore, the codebook patterns
in the present invention are ordered according to the frequency of
matches. Thus, the AVQ 134 begins by comparing the input block with
patterns in the codebook 214 that are most likely to match. Still
further, the codebook patterns are grouped by the sum of their
squared amplitudes. Thus the AVQ 134 selects a group of similar
codebook patterns by summing the squared amplitude of an input
block in order to determine which group of codebook patterns to
search.
[0143] Besides reducing the time it takes for the AVQ 134 to find
an optimal codebook pattern, the AVQ 134 includes a set of
codebooks 214 that are adapted to the input blocks (i.e., codebooks
214 that are optimized for input blocks that contain DCT residual
values, high resolution residual values, etc.). Finally, the AVQ
134 of the preferred embodiment adapts a codebook 214 to the
source image 100 by devising a set of new patterns to add to a
codebook 214.
[0144] Therefore, the AVQ 134 of the preferred embodiment has three
modes of operation: 1) the AVQ 134 uses a specified codebook 214,
2) the AVQ 134 selects the best-fit codebook 214, or 3) the AVQ 134
uses a combination of existing codebooks 214, and new patterns that
the AVQ 134 creates. If the AVQ 134 creates new patterns, the AVQ
134 stores the new patterns in the VQCB data segment 223.
[0145] The Compressed File Format
[0146] FIGS. 20a and 20b illustrate the segmented architecture of
the data stream 118 that results from transmitting the compressed
file 104. The segmented architecture of the compressed file 104 in
the preferred embodiment allows layering of the compressed image
data. Referring to FIG. 2, the layering of the compressed file 104
allows the decoder 110 to display the thumbnail miniature 120, the
splash image 122 and the standard image 124 before the entire
compressed file 104 is transferred. As the decoder 110 receives
each successive layer of components, the decoder 110 adds
additional detail to the displayed image.
[0147] In addition to layering the compressed data, the segmented
architecture allows the decoder 110 of the preferred embodiment: 1)
to move from one segment to the next in the stream without fully
decoding segments of data, 2) to skip parts of the data stream 118
that contain data that is unnecessary for a given rendition of the
image, 3) to ignore parts of the data stream 118 that are in an
unknown format, 4) to process the data in an order that is
configurable on the fly if the entire data stream 118 is stored
locally, and 5) to store different layers of the compressed file
104 separately from one another.
[0148] As shown in FIG. 20a, the byte arrangement of the data
stream 118 and the compressed file 104 includes a header segment
400 and a normal segment 402. The header segment 400 contains
header information, and the normal segment 402 contains data. The
header segment 400 is the first segment in the compressed file 104
and is the first segment transmitted with the data stream 118. In
the preferred embodiment, the header segment 400 is eight bytes
long.
[0149] As shown in FIG. 20b, the byte arrangement of the header
segment 400 includes a byte 0 406 and a byte 1 408 of the header
segment 400. Byte 0 406 and byte 1 408 of the header segment 400
identify the data stream 118. Byte 1 408 also indicates if the data
stream 118 contains image data (indicated by a "G") or if it
contains resource data (indicated by a "C"). Resource data includes
color lookup tables, font information, and vector quantization
tables.
[0150] Byte 2 410, byte 3 412, byte 4 414, byte 5 416, byte 6 418
and byte 7 420 of the header segment 400 specify which encoder 102
created the data stream 118. As new encoding methods are added to
the encoder 102, new versions of the encoder 102 will be sold and
distributed to decode the data encoded by the new methods. Thus, to
remain compatible with prior encoders 102, the decoder 110 needs to
identify which encoder 102 generated the compressed data. In the
preferred embodiment, byte 7 420 identifies the encoder 102 and
byte 2 410, byte 3 412, byte 4 414, byte 5 416, and byte 6 418 are
reserved for future enhancements to the encoder 102.
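For illustration, the eight-byte header layout above can be sketched as a small parser. Python is used here purely for exposition; the field roles follow the text, but the sample byte values, and the value of byte 0, are invented, since the patent does not specify them:

```python
def parse_header(segment: bytes) -> dict:
    """Parse the eight-byte header segment 400 (a sketch; field roles follow
    the text, but the value of byte 0 is not specified there)."""
    if len(segment) != 8:
        raise ValueError("the header segment 400 is eight bytes long")
    kind = {ord("G"): "image data", ord("C"): "resource data"}.get(segment[1])
    if kind is None:
        raise ValueError("byte 1 must be 'G' (image data) or 'C' (resource data)")
    return {
        "stream_id": segment[0:2],      # byte 0 406 and byte 1 408 identify the stream
        "content": kind,                # byte 1 408: "G" image data, "C" resource data
        "reserved": segment[2:7],       # bytes 2-6 reserved for future enhancements
        "encoder_version": segment[7],  # byte 7 420 identifies the encoder 102
    }

# Hypothetical stream: byte 1 is "G", byte 7 claims encoder version 3.
hdr = parse_header(b"\x00G\x00\x00\x00\x00\x00\x03")
```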
[0151] FIG. 21 illustrates the normal segment 402 as a sequence of
bytes that are logically separated into two sections: an identifier
section 422 and a data section 424. The identifier section 422
precedes the data section 424. The identifier section 422 specifies
the size of the normal segment 402, and identifies a segment type.
The data section 424 contains information about the source image
100.
[0152] The identifier section 422 is a sequence of one, two, or
three bytes that identifies the length of the normal segment 402
and the segment type. The segment type is an integer number that
specifies the method of data encoding. The compressed file 104
contains 256 possible segment types. The data in the normal segment
402 is formatted according to the segment type. In the preferred
embodiment, the normal segments 402 are optimally formatted for the
color palette, the Huffman bitstreams, the Huffman tables, the
image panels, the codebook information, the vector dequantization
tables, etc.
[0153] For example, the file format of the preferred embodiment
allows the use of different Huffman bitstreams such as an 8-bit
Huffman stream, a 10-bit Huffman stream, and a DCT Huffman stream.
The encoder 102 uses each Huffman bitstream to optimize the
compressed file 104 in response to different image types. The
identifier section 422 identifies which Huffman encoder was
used and the normal segment 402 contains the compressed data.
[0154] FIGS. 22a, 22b, 22c, and 22d illustrate the layering and
interleaving of the plurality of data segments 166 in the
compressed file 104 of the preferred embodiment. The plurality of
data segments 166 in the compressed file 104 are interleaved based
on the user defined playback model 261 as follows: 1) as a
single-pass, non-panellized image (FIG. 22a), 2) as a single-pass,
panellized image (FIG. 22b), 3) as two layers comprising the
thumbnail miniature 120, and the sharp image 125 (FIG. 22c) and 4)
as multiple layers comprising the thumbnail miniature 120, the
standard image 124, and the sharp image 125 (FIG. 22d).
[0155] Block diagram 426 in FIG. 22a shows the compressed file
format for the single-pass, non-panellized image. The compressed
file 104 begins with the header, the optional color palette and the
resource data such as the tables and Huffman encoding information.
The plurality of data segments 166 are not interleaved or layered.
Thus, the decoder 110 must receive the entire compressed file 104
before any part of the source image 100 can be displayed.
[0156] Block diagram 428 in FIG. 22b shows the compressed file 104
for the single-pass, panellized image. The plurality of data
segments 166 are interleaved panel-by-panel, so that all of the
segments for each panel are contiguously transmitted. The decoder
110 can expand and display a panel at a time until the entire
compressed file 104 is expanded.
[0157] Block diagram 430 in FIG. 22c shows the compressed file
format of the thumbnail miniature 120, the splash image 122 and the
final or sharp image 125. The plurality of data segments 166 are
interleaved panel-by-panel; the resolution components for the
thumbnail miniature 120 and splash image 122 exist in the first
layer, and the panels for the final image exist in the second layer.
The first layer includes selected portions of the plurality of data
segments 166 that are needed to decode the panels of the thumbnail
miniature 120 and splash image 122. Thus, the compressed file 104
only stores the low detail color components (the URCA data segment
246 and the XRCA data segment 248), the DC terms 201 and as many as
first five AC terms 200 in the first layer. The number of AC terms
200 depends on the user-selected quality of the thumbnail miniature
120.
[0158] The plurality of data segments 166 in the first layer are
also interleaved panel-by-panel to allow the thumbnail miniature
120 and splash image 122 to be decoded a panel at a time. The
second layer contains the remaining plurality of data segments 166
needed to expand the compressed file 104 into the final image. The
plurality of data segments 166 in the second layer are also
interleaved panel-by-panel.
[0159] Block 432 in FIG. 22d shows the compressed file format of
the thumbnail image 120, the splash image 122, the layered standard
image 124, and the sharp image 125. The thumbnail miniature 120 and
splash image 122 are arranged in the first layer as described
above. The remaining data segments 166 are layered at different
quality levels. The multi-layering is accomplished by layering and
interleaving panel information associated with the VQ2 data segment
258 (high resolution residual). The multiple layers allow the
display of all the panels at a particular level of detail before
decoding the panels in the next layer.
[0160] The Decoder
[0161] FIG. 23 illustrates the decoder 110 of the present
invention. The decoder 110 takes as input the compressed data
stream 118 and expands or decodes it into an image for viewing on
the display 112. As explained above, the compressed file 104 and
the transmitted data stream 118 include image components that are
layered with a plurality of panels 433. The decoder 110 expands the
plurality of panels 433 one at a time.
[0162] As illustrated in FIG. 24, the decoder 110 expands the
compressed file 104 in four steps. In a first step 434, the decoder
110 expands the first layer of image data in the compressed file
104 or the data stream 118 into a Ym miniature 436, a Um miniature
438, and an Xm miniature 440. In a second step 442, the decoder 110
uses the Ym miniature 436, the Um miniature 438, and the Xm
miniature 440 to generate the thumbnail miniature 120 and the
splash image 122. In a third step 444, the decoder 110 receives a
second layer of image data and generates the higher detail panels
445 needed to expand the thumbnail miniature 120 into a standard
image 124. In a fourth step 446, the decoder 110 receives a third
layer of image data to generate higher detail panels that enhance
the detail of the standard image in order to create an enhanced
image 105 that corresponds to the source image 100.
[0163] FIG. 25 illustrates the elements of the first step 434 in
which the decoder 110 expands the AC terms 200, the DC terms 201,
the URCA data segment 246, and the XRCA data segment 248 into the
Ym miniature 436, the Um miniature 438, and Xm miniature 440. The
first step 434 includes an inverse Huffman encoder 458, an inverse
DPCM 476, a dequantizer 450, a combiner 452, an inverse DCT 476, a
demultiplexer 454, and an adder 456.
[0164] The decoder 110 then separates the DC terms 201 and the AC
terms 200 from the URCA data segment 246 and the XRCA data segment
248. The inverse Huffman encoder 458 decompresses the first layer
of the data stream 118 which includes the AC terms 200, the URCA
data segment 246, and the XRCA data segment 248. The inverse DPCM
476 further expands the DC terms 201 to output DC terms 201'. The
dequantizer 450 further expands the AC terms 200 to output AC terms
200' by multiplying them with the quantization factors 478 in the
quantization table Q 202, yielding 8.times.8 DCT coefficient
blocks 482. The quantization table Q 202 is stored in the CS data
segment 204 (not shown).
[0165] The combiner 452 combines the output DC terms 201' with the
8.times.8 DCT coefficient blocks 482. The decoder 110 sets the
inverse DCT factor 480, and the inverse DCT 476 outputs the DCT
coefficient blocks 482 that correspond to the Ym miniature 436 that
is 1/256th the size of the original image.
[0166] The demultiplexer 454 separates the inverse Huffman encoded
URCA data segment 246 from the XRCA data segment 248. The inverse
DPCM 476 then expands the URCA data segment 246 and the XRCA data
segment 248 to generate the blocks that correspond to the Um
miniature 438 and the Xm miniature 440.
[0167] The adder 456 translates the blocks corresponding to the Um
miniature 438 and the Xm miniature 440 into blocks that correspond
to an Xm miniature 460.
[0168] FIG. 26 illustrates the second step 442 in which the decoder
110 expands the Ym miniature 436, the Um miniature 438, and the Xm
miniature 460. For this step, the decoder 110 further includes the
interpolator 462, which operates on the Ym miniature 436, the Um
miniature 438 and the Xm miniature 460. The interpolator 462 is
controlled by a Ym interpolation factor 484, a Um interpolation
factor 486, and an Xm interpolation factor 496. A scaler 466 is
controlled by a Ym scale factor 490, a Um scale factor 492, and an
Xm scale factor 494. The decoder 110 further includes the
replicator 464 and the inverse color converter 468. The
interpolator 462 uses a
linear interpolation process to enlarge the Ym miniature 436, the
Um miniature 438, and the Xm miniature 460 by one, two or four
times in both the horizontal and vertical directions.
[0169] The Ym interpolation factor 484, the Um interpolation factor
486, and the Xm interpolation factor 496 control the amount of
interpolation. The size of the source image 100 in the compressed
file 104 is fixed, thus the decoder 110 may need to enlarge or
reduce the expanded image before display. The decoder 110 sets the
Ym interpolation factor 484 to a power of 2 (i.e., 1, 2, 4, etc.)
in order to optimize the decoding process. However, in order to
display an expanded image at the proper size, the scaler 466 scales
the interpolated image to accommodate different display
formats.
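The power-of-two interpolation described above can be sketched as repeated 2x linear-interpolation passes. This is a simplified one-dimensional illustration; the decoder's interpolator 462 operates on two-dimensional image data:

```python
def interpolate_2x(row):
    """One 2x pass of linear interpolation (1-D sketch of interpolator 462):
    each new in-between sample is the average of its two neighbors, and the
    last sample is repeated at the end of the row."""
    out = []
    for i, v in enumerate(row):
        out.append(v)
        nxt = row[i + 1] if i + 1 < len(row) else v
        out.append((v + nxt) / 2)
    return out

# A 4x enlargement is two successive 2x passes, which is one reason the
# decoder sets the interpolation factors to powers of two.
row4x = interpolate_2x(interpolate_2x([0, 4]))
```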
[0170] The interpolator 462 also expands the Um miniature 438 and
the Xm miniature 440. Like the Ym interpolation factor 484, the
decoder 110 sets the Um interpolation factor 486 and the Xm
interpolation factor 496 to a power of two. The decoder 110 sets
the Ym interpolation factor 484, and the Um interpolation factor
486 so that the Um miniature 438 and Xm miniature 460 approximate
the size of the interpolated and scaled Ym miniature 436.
[0171] After interpolation, the scaler 466 enlarges or reduces the
interpolated Ym miniature based on the Ym scale factor 490. In the
preferred embodiment, the decoder 110 sets the Ym interpolation
factor 484 so that the interpolated Ym miniature 436 is nearly
twice the size of the thumbnail miniature 120. The decoder 110 then
sets the Ym scale factor 490 to reduce the interpolated Ym
miniature 436 to the display size of the thumbnail miniature 120.
The scaler 466 scales the Um miniature 438 and the Xm
miniature 460 with the Um scale factor 492 and the Xm scale factor
494. The decoder 110 sets the Xm scale factor 494 and the Um scale
factor 492 as necessary to scale the image to the display
size.
[0172] The inverse color converter 468 transforms the interpolated
and scaled miniatures into a red, green, and blue pixel array or a
palettized image as required by the display 112. When converting to
a palettized image, the inverse color converter 468 also dithers
the converted image. The decoder 110 displays the interpolated,
scaled and color converted miniatures as the thumbnail miniature
120.
[0173] In order to create the splash image 122, the decoder 110
expands the interpolated Ym miniature 436, the interpolated Um
miniature 438 and the interpolated Xm miniature 440 with a second
interpolation process that uses a Ym splash interpolation factor
498, a Um splash interpolation factor 500, and an Xm splash
interpolation factor 502. Like the thumbnail miniature 120, the
decoder 110 also sets the splash interpolation factors to a power
of two.
[0174] The interpolated data are then expanded with the replicator
464. The replicator 464 enlarges the interpolated data one or two
times by replicating the pixel information. The replicator 464
enlarges the interpolated data based on a Ym replication factor
504, a Um replication factor 506, and an Xm replication factor 508.
The decoder 110 sets the Ym replication factor 504, the Um
replication factor 506, and the Xm replication factor 508 so that
the replicated image is one-fourth of the display size.
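Pixel replication as performed by the replicator 464 can be sketched as follows (a simplified illustration; the exact traversal order and edge handling are assumptions):

```python
def replicate(rows, factor):
    """Replicator 464 sketch: enlarge pixel data by repeating each pixel
    `factor` times horizontally and each row `factor` times vertically."""
    out = []
    for row in rows:
        wide = [p for p in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out

big = replicate([[1, 2], [3, 4]], 2)  # doubles the image in both directions
```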
[0175] The inverse color converter 468 transforms the replicated
image data into red, green and blue image data. The replicator 464
then again replicates the red, green, and blue image data to match
the display size. The decoder 110 displays the resulting splash
image 122 on the display 112.
[0176] FIG. 27 illustrates the third step 444 in which the decoder
110 generates the higher detail panels to expand the thumbnail
miniature 120 into a standard image 124. FIG. 27 also illustrates
the fourth step 446 in which the decoder 110 generates higher
detail panels to enhance the detail of the standard image in
order to create an enhanced image 105 that corresponds to the
source image 100.
[0177] The decoding of the standard image 124 and the enhanced
image 105 requires the inverse Huffman encoder 458, the combiner
452, the dequantizer 450, the inverse DCT 476, a pattern matcher
524, the adder 456, the interpolator 462, and an edge overlay
builder 516. The decoder 110 adds additional detail to the
displayed image as the decoder 110 receives new layers of
compressed data. The additional layers include new panels of the
DCT data segment 208 (containing the remaining AC terms 200'), the
VQ1 data segment 224, the VQ2 data segment 258, the enhancement
location data segment 510, the VQ3 data segment 242, and the VQ4
data segment 244.
[0178] The decoder 110 builds upon the Ym miniature 436, the Um
miniature 438 and the Xm miniature 440 calculated for the thumbnail
miniature 120 by expanding the next layer of image detail. The next
layer contains a portion of the DCT data segment 208, the VQ1 data
segment 224, the VQ2 data segment 258, the enhancement location
data segment 510, the VQ3 data segment 242, and the VQ4 data
segment 244 that correspond to the standard image.
[0179] The inverse Huffman encoder 458 decompresses the DCT data
segment 208 and the VQ1 data segment 224 (the DCT residual). The
combiner 452 combines the DCT information from the inverse Huffman
encoder 458 with the AC terms 200 and the DC terms 201. The
dequantizer 450 reverses the quantization process by multiplying
the DCT quantized values 206 with the quantization factors 478. The
dequantizer obtains the correct quantization factors 478 from the
quantization table Q 202. The dequantizer outputs 8.times.8 DCT
coefficient blocks 482 to the inverse DCT 476. The inverse DCT 476
in turn, outputs the 8.times.8 DCT coefficient blocks 482 that
correspond to a Y image 509 that is 1/4th the size of the original
image.
[0180] The pattern matcher 524 replaces the DCT residual blocks 512
by finding an index to a matching pattern block in the codebook
214. The adder 456 adds the DCT residual blocks 512 to the DCT
coefficient blocks 482 on a pixel by pixel basis. The interpolator
462 interpolates the output of the adder 456 by a factor of four to
create a full size Y image 520. The interpolator 462 performs
bilinear interpolation to enlarge the Y image 520 horizontally and
vertically.
[0181] The inverse Huffman encoder 458 decompresses the VQ2 data
segment 258 (the high resolution residual) and the enhancement
location data segment 510. The pattern matcher 524 uses the
codebook indexes to retrieve the matching pattern blocks stored in
the codebook 214 to expand the VQ2 data segment 258 to create
16.times.16 high resolution residual blocks 514. An enhancement
overlay builder 516 inserts the 16.times.16 high resolution
residual blocks into a Y image overlay 518 at locations specified by
the enhancement location data segment 510. The Y image overlay 518 is the size of
the original image. The adder 456 adds the Y image overlay 518 to
the full sized Y image 520.
[0182] To calculate the full sized U image 522, the inverse Huffman
encoder 458 expands the VQ3 data segment 242. The pattern matcher
524 uses the codebook indexes to retrieve the matching pattern
blocks stored in the codebook 214 to expand the VQ3 data segment
242 into 4.times.4 rU_tau4 residual blocks 526. The interpolator
462 interpolates the Um miniature 438 by a factor of four and the
adder 456 adds the 4.times.4 rU_tau4 residual blocks 526 to the
interpolated Um miniature 438 in order to create a Um+r miniature
528. The interpolator 462 interpolates the Um+r miniature 528 by a
factor of four to create the full sized U image 522.
[0183] To calculate the full sized X image 530, the inverse Huffman
encoder 458 expands the VQ4 data segment 244. The pattern matcher
524 uses the codebook indexes to retrieve the matching pattern
blocks stored in the codebook 214 to expand the VQ4 data segment
244 into 4.times.4 rX_tau4 residual blocks. The decoder 110 then
translates the 4.times.4 rX_tau4 residual blocks 532 into 4.times.4
rV_tau4 residual blocks 534. The interpolator 462 interpolates the
Xm miniature 460 by a factor of four, and the adder 456 adds the
4.times.4 rV_tau4 residual blocks 534 to the interpolated Xm
miniature 460 in order to create a Xm+r miniature 536. The
interpolator 462 interpolates the Xm+r miniature 536 by a factor of
four to create the full sized X image 530.
[0184] The decoder stores the full sized Y image 520, the full
sized U image 522, and the full sized X image 530 in local memory.
The inverse color converter 468 then converts the full sized Y
image 520, the full sized U image 522, and the full sized X image
530 into a full sized red, green, and blue image. The panel is then
added to the displayed image. This process is completed for each
panel until the entire source image 100 is expanded.
[0185] In the fourth step 446, the decoder 110 receives the third image
layer and builds upon the full sized Y image 520, the full sized U
image 522, and the full sized X image 530 stored in local memory to
generate the enhanced image 105. The third image data layer
contains the remaining portion of the DCT data segment 208, the VQ1
data segment 224, the VQ2 data segment 258, the enhancement
location data segment 510, the VQ3 data segment 242, and the VQ4
data segment 244 that correspond to the enhanced image 105.
[0186] The decoder 110 repeats the process illustrated in FIG. 27
to generate a new full sized Y image 520, a new full sized U image
522, and a new full sized X image 530. The new full sized Y image
520 is added to the full sized Y image generated in the third step
444. The new full sized U image 522 is added to the full sized U
image 522 generated in the third step 444. The new full sized X
image 530 is added to the full sized X image generated in the third
step 444.
[0187] The inverse color converter 468 converts the full sized Y
image 520, the full sized U image 522, and the full sized X image
530 into a full sized red, green, and blue image. The panel is then
added to the displayed image. This process is completed for each
panel until the entire enhanced image 105 is expanded.
[0188] The inverse DCT 476 of the preferred embodiment is a
mathematical transformation for mapping data in the time (or
spatial) domain to the frequency domain, based on the "cosine"
kernel. The two dimensional version operates on a block of
8.times.8 elements.
[0189] Referring to FIG. 9, the compressed DCT coefficients 198 are
stored as DC terms 201 and AC terms 200. In the preferred
embodiment, the inverse DCT 476 as shown in FIGS. 25 and 27
combines the process of transformation and decimation in the
frequency and spatial domains (frequency and then spatial) into a
single operation in the frequency domain. The inverse DCT 476 of
the present invention provides at least a factor of 2 in
implementation efficiency and is utilized by the decoder 110 to
expand the thumbnail miniature 120 and splash image 122.
[0190] The inverse DCT 476 receives a sequence of DC terms 201 and
AC terms 200 which are frequency coefficients. The high frequency
terms are arbitrarily discarded at a predefined frequency to
prevent aliasing. The discarding of the high frequency terms is
equivalent to a low pass filter which passes everything below a
predefined frequency while attenuating all the high frequencies to
zero.
[0191] The equation for an inverse DCT is:

f_{y,x} := (1/4) SUM_{u} SUM_{v} C_u C_v F_{v,u} cos((2x+1)u(pi)/16) cos((2y+1)v(pi)/16)

[0192] where u := 0 . . . 7, v := 0 . . . 7, x := 0 . . . 7, and y := 0 . . . 7

[0193] C_u := 1/(square root of 2) if u = 0, and C_u := 1 if u is not 0 (C_v is defined likewise)
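The inverse DCT equation can be implemented directly as a quadruple loop. This is an unoptimized reference sketch, not the combined single-operation form the patent describes:

```python
import math

def idct_8x8(F):
    """Reference inverse DCT: f[y][x] = (1/4) * sum over u, v of
    C(u) * C(v) * F[v][u] * cos((2x+1)u*pi/16) * cos((2y+1)v*pi/16)."""
    C = lambda u: 1 / math.sqrt(2) if u == 0 else 1.0
    f = [[0.0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            s = 0.0
            for v in range(8):
                for u in range(8):
                    s += (C(u) * C(v) * F[v][u]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            f[y][x] = s / 4
    return f

# A block holding only a DC term decodes to a flat block (DC / 8 everywhere).
F = [[0.0] * 8 for _ in range(8)]
F[0][0] = 8.0
flat = idct_8x8(F)
```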
[0194] The inverse DCT 476 generates an 8.times.8 output matrix
that is decimated to a 4.times.4 matrix then to a 2.times.2 matrix.
The inverse DCT 476 then decimates the output matrix by subsampling
with a filter. After subsampling, an averaging filter smooths the
output. Smoothing is accomplished by using a running average of the
adjacent elements to form the output.
[0195] For example, for a 4.times.4 output matrix the 8.times.8
matrix from the inverse DCT 476 is sub-divided into sixteen
2.times.2 regions, and adjacent elements within each 2.times.2
region is averaged to form the output. Thus the sixteen regions
form a 4.times.4 matrix output.
[0196] For a 2.times.2 output matrix, the 8.times.8 matrix from the
inverse DCT 476 is sub-divided into four 4.times.4 regions. The
adjacent elements within each 4.times.4 matrix region are averaged
to form the output. Thus, the four regions form a 2.times.2 matrix
output.
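The region-averaging decimation described in the two preceding paragraphs can be sketched as follows (region=2 gives the sixteen 2x2 regions of the 4x4 case; region=4 gives the four 4x4 regions of the 2x2 case):

```python
def decimate(block, region):
    """Average adjacent elements within each region-by-region sub-block:
    region=2 turns an 8x8 inverse DCT output into a 4x4 matrix (sixteen
    2x2 regions); region=4 turns it into a 2x2 matrix (four 4x4 regions)."""
    n = len(block) // region
    return [[sum(block[i * region + di][j * region + dj]
                 for di in range(region) for dj in range(region))
             / (region * region)
             for j in range(n)]
            for i in range(n)]

eight = [[float(r)] * 8 for r in range(8)]  # row r holds the value r
four = decimate(eight, 2)
two = decimate(eight, 4)
```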
[0197] In addition, since most of the AC coefficients are zero, the
inverse DCT 476 is simplified by combining the inverse DCT
equations with the averaging and the decimation equations. Thus,
the creation of a 2.times.2 output matrix where a given X is an
8.times.8 input matrix that consists of DC terms 201 and AC terms
200 is stated formally as:

X :=
[ X_{0,0} X_{0,1} 0 0 0 0 0 0 ]
[ X_{1,0} X_{1,1} 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
(rows 2 through 7 are all zero)
[0198] All elements with i or j greater than 1 are set to zero. The
setting of the high frequency index to zero is equivalent to
filtering out the high frequency coefficients from the signal.
[0199] Assigning Y as the 2.times.2 output matrix, the decimated
output is thus equal to:

Y_{0,0} := X_{0,0} + k_1 X_{0,1} + k_1 X_{1,0} + k_2 X_{1,1}

Y_{0,1} := X_{0,0} - k_1 X_{0,1} + k_1 X_{1,0} - k_2 X_{1,1}

Y_{1,0} := X_{0,0} + k_1 X_{0,1} - k_1 X_{1,0} - k_2 X_{1,1}

Y_{1,1} := X_{0,0} - k_1 X_{0,1} - k_1 X_{1,0} + k_2 X_{1,1}

[0200] where

k_1 := (1/8)(c(1) + c(3) + c(5) + c(7)), with c(k) := cos(k(pi)/16)

k_2 := (k_1)^2
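The four decimated-output equations can be computed directly once k_1 and k_2 are evaluated. A sketch (the constants follow the text as written, so any overall normalization folded into other stages is preserved as-is):

```python
import math

def idct_2x2(X00, X01, X10, X11):
    """Combined inverse DCT and decimation to a 2x2 output using the four
    equations above; only the retained low-frequency terms contribute."""
    c = lambda k: math.cos(k * math.pi / 16)
    k1 = (c(1) + c(3) + c(5) + c(7)) / 8
    k2 = k1 * k1
    return (X00 + k1 * X01 + k1 * X10 + k2 * X11,
            X00 - k1 * X01 + k1 * X10 - k2 * X11,
            X00 + k1 * X01 - k1 * X10 - k2 * X11,
            X00 - k1 * X01 - k1 * X10 + k2 * X11)

# A DC-only input produces a flat 2x2 output.
flat = idct_2x2(1.0, 0.0, 0.0, 0.0)
```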
[0201] The creation of a 4.times.4 output matrix where a given X is
an 8.times.8 input matrix that consists of DC terms 201 and AC
terms 200 is stated formally as:

X :=
[ X_{0,0} X_{0,1} X_{0,2} X_{0,3} 0 0 0 0 ]
[ X_{1,0} X_{1,1} X_{1,2} X_{1,3} 0 0 0 0 ]
[ X_{2,0} X_{2,1} X_{2,2} X_{2,3} 0 0 0 0 ]
[ X_{3,0} X_{3,1} X_{3,2} X_{3,3} 0 0 0 0 ]
(rows 4 through 7 are all zero)

[0202] All elements with i or j greater than 3 are set to zero.

[0203] It is possible to implement the calculations as in the
2.times.2 case, where the two dimensional equation is decomposed
downward; however, performing the one dimensional approach twice
reduces complexity and decreases the calculation time. In the
preferred embodiment, the inverse DCT 476 computes an additional
one-dimensional row inverse DCT, and then a one-dimensional column
inverse DCT.
[0205] The equation for the one dimensional case is as follows
(1dout_x are the elements of the one dimensional output):

1dout_0 := in_0 + k_1 in_1 + k_2 in_2 + k_3 in_3

1dout_1 := in_0 + k_4 in_1 - k_2 in_2 - k_5 in_3

1dout_2 := in_0 - k_4 in_1 - k_2 in_2 + k_5 in_3

1dout_3 := in_0 - k_1 in_1 + k_2 in_2 - k_3 in_3

[0206] where

k_1 := (c(1) + c(3))/2, k_2 := (c(2) + c(6))/2, k_3 := (c(3) - c(7))/2,

k_4 := (c(5) + c(7))/2, k_5 := (c(5) + c(2))/2
[0207] where c(k) is defined as in the 2.times.2 output matrix.
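The one-dimensional stage can likewise be computed directly from the constants above. A sketch; in the preferred embodiment this routine would be applied along rows and then along columns of the retained 4.times.4 coefficients:

```python
import math

def idct_1d_4(in0, in1, in2, in3):
    """One-dimensional 4-point stage using the constants k_1..k_5 above;
    the full transform applies it along rows and then along columns."""
    c = lambda k: math.cos(k * math.pi / 16)
    k1 = (c(1) + c(3)) / 2
    k2 = (c(2) + c(6)) / 2
    k3 = (c(3) - c(7)) / 2
    k4 = (c(5) + c(7)) / 2
    k5 = (c(5) + c(2)) / 2
    return (in0 + k1 * in1 + k2 * in2 + k3 * in3,
            in0 + k4 * in1 - k2 * in2 - k5 * in3,
            in0 - k4 * in1 - k2 * in2 + k5 * in3,
            in0 - k1 * in1 + k2 * in2 - k3 * in3)

flat = idct_1d_4(1.0, 0.0, 0.0, 0.0)  # a DC-only input yields a flat output
```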
[0208] The scaler 466 of the preferred embodiment is also shown in
FIG. 27. More specifically, the scaler 466 utilizes a generalized
routine that scales the image up or down while reducing aliasing
and reconstruction noise. Scaling can be described as a combination
of decimation and interpolation. The decimation step consists of
downsampling and using an anti-aliasing filter; the interpolation
step consists of pixel filling using a reconstruction filter for
any scale factor that can be represented by a rational number P/Q,
where P and Q are integers associated with the interpolation and
decimation ratios.
[0209] The scaler 466 decimates the input data by dividing the
source image into the desired number of output pixels and then
radiometrically weights the input data to form the necessary
output. FIG. 28 illustrates the scaler 466 with an input to output
ratio of five-to-three in the one dimensional case. Input pixel
P.sub.1 538, pixel P.sub.2 540, pixel P.sub.3 542, pixel P.sub.4
544, and pixel P.sub.5 546 contain different data values. The
output pixel X.sub.1 548, pixel X.sub.2 550, and pixel X.sub.3 552
are computed as follows:
X.sub.1=P.sub.1+(P.sub.2) (0.67)
X.sub.2=(P.sub.2) (0.33)+P.sub.3+(P.sub.4) (0.33)
X.sub.3=(P.sub.4) (0.66)+P.sub.5
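The five-to-three weighting above can be written out directly. This sketch reproduces the weights exactly as printed in the text (including its 0.67 versus 0.66 rounding):

```python
def scale_5_to_3(p1, p2, p3, p4, p5):
    """Five-to-three radiometric weighting from FIG. 28, using the
    weights exactly as printed in the text."""
    x1 = p1 + 0.67 * p2
    x2 = 0.33 * p2 + p3 + 0.33 * p4
    x3 = 0.66 * p4 + p5
    return x1, x2, x3

x = scale_5_to_3(1.0, 1.0, 1.0, 1.0, 1.0)  # each output covers ~5/3 input pixels
```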
[0210] The decimated data is then filtered with a reconstruction
filter and an area average filter. The reconstruction filter
interpolates the input data by replicating the pixel data. The area
average filter then area averages by integrating the area covered
by the output pixel.
[0211] If the output ratio is less than 1 (i.e., interpolation is
necessary), the interpolator 462 utilizes bilinear interpolation.
FIG. 29 illustrates the operation of the bilinear interpolation.
Input pixel A 554, input pixel B 556, input pixel C 558, and input
pixel D 560, and reference point X 562 are interpolated to create
output 564. For this example, reference point X 562 is .alpha. to
the right of pixel A 554 and 1-.alpha. to the left of pixel C 558,
and reference point X 562 is .beta. down from pixel A 554 and
1-.beta. up from pixel B 556. Reference point X 562 is stated
formally as:

X = (1-.alpha.)*((1-.beta.)*A+.beta.*B)+.alpha.*((1-.beta.)*C+.beta.*D).
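The bilinear formula can be sketched directly, with alpha as the horizontal offset from pixel A toward pixel C and beta as the vertical offset from pixel A toward pixel B:

```python
def bilinear(A, B, C, D, alpha, beta):
    """Bilinear interpolation of reference point X from its four
    neighboring input pixels, per the equation above."""
    return ((1 - alpha) * ((1 - beta) * A + beta * B)
            + alpha * ((1 - beta) * C + beta * D))

# With A and B zero and C and D both 4, a point halfway across the two
# columns interpolates to 2 regardless of beta.
mid = bilinear(0.0, 0.0, 4.0, 4.0, 0.5, 0.25)
```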
[0212] The Image Classifier
[0213] The preferred embodiment of the image classifier 152 is
illustrated in FIG. 8. More specifically, the image classifier 152
uses fuzzy logic techniques to determine which compression methods
will optimize the compression of various regions of the source
image 100. The image classifier 152 adds intelligence to the
encoder 102 by providing the means to decide, based on statistical
characteristics of the image, what "tools" (combinations of
compression methods) will best compress the image.
[0214] The source image 100 may include a combination of different
image types. For example, a photograph could show a person framed
in a graphical border, wherein the person is wearing a shirt that
contains printed text. In order to optimize the compression ratio
for the regions of the image that contain different image types,
the image classifier 152 subdivides the source image 100 and then
outputs the control script 196 that specifies the correct
compression methods for each region. Thus, the image classifier 152
provides a customized, "most-efficient" compression ratio for
multiple image types.
[0215] The image classifier 152 uses fuzzy logic to infer the
correct compression steps from the image content. Image content is
inherently "fuzzy" and is not amenable to simple discrete
classification. Images will thus tend to belong to several
"classes." For example, a classification scheme might include one
class for textual images and a second class for photographic
images. Since an image may comprise a photograph of a person
wearing a shirt containing printed text, the image will belong to
both classes to varying degrees. Likewise, the same image may be
high contrast, "grainy," black and white and/or high activity.
[0216] Fuzzy logic is a set-theoretic approach to classification of
objects that assigns degrees of membership in a particular class.
In classical set theory, an object either belongs to a set or it
does not; membership is either 100% or 0%. In fuzzy set theory, an
object can be partly in one set and partly in another. The
fuzziness is of greater significance when the content must be
categorized for the purpose of applying appropriate compression
techniques. Relevant categories in image compression include
photographic, graphical, noisy, and high-energy. Clearly the
boundaries of these sets are not sharp. A scheme that matches
appropriate compression tools to image content must reliably
distinguish between content types that require different
compression techniques, and must also be able to judge how to blend
tools when types requiring different tools overlap.
[0217] FIG. 30 illustrates the optimization of the compression
process. The optimization process analyzes the input image 600 at
different levels. In the top level analysis 602 the image
classifier 152 decomposes the image into a plurality of subimages
604 (regions) of relatively homogeneous content as defined by a
classification map 606. The image classifier 152 then outputs the
control script 196 that specifies which compression methods or
"tools" to employ in compressing each region. The compression
methods are further optimized in the second level analysis 608 by
the enhancement analyzer 144 which determines which areas of an
image are the most visually important (for example, text and strong
luminance edges). The compression methods are then further
optimized in the third level analysis 610 with the optimized DCT
156, AVQ 134, and adaptive methods in the channel encoder 168. The
second level analysis 608 and the third level analysis 610
determine how to adapt parameters and tables to a particular
image.
[0218] The fuzzy logic image classifier 152 provides adaptive
"intelligent" branching to appropriate compression methods with a
high degree of computational simplicity. It is not feasible to
provide the encoder 102 with an exhaustive mapping of all possible
combinations of inherently non-linear, discontinuous,
multidimensional inputs (image measurements) onto desired control
scripts 196. The fuzzy logic image classifier 152 reduces such an
analysis.
[0219] Furthermore, the fuzzy logic image classifier 152 ensures
that the encoder 102 makes a smooth transition from one compression
method (as defined by the control script 196) to another
compression method. As image content becomes "more like" one class
than another, the fuzzy controller avoids the discrete switching
from one compression method to another compression method.
[0220] The fuzzy logic image classifier 152 receives the image data
and determines a set of image measurements which are mapped onto
one or more input sets. The image classifier 152 in turn maps the
input sets to corresponding output sets that identify which
compression methods to apply. The output sets are then blended
("defuzzified") to generate a control script 196. The process of
mapping the input image to a particular control script 196 thus
requires three sets of rules: 1) rules for mapping input
measurements onto input sets (e.g., degree of membership with the
"high activity" input set=F[average of AC coefficients 56-63]); 2)
rules for mapping input sets onto output sets (e.g., if graphical
image, use DCT quantization table 5); and 3) rules for
defuzzification that mediate among the memberships of several output
sets, i.e., how the membership of more than one output set should
be blended to generate a single control script 196 that controls
the compression process.
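The three rule sets above can be sketched in miniature. Every membership function, table name, and scaling constant below is an illustrative assumption, not the patent's actual rule base.

```python
# Minimal sketch of the three fuzzy rule stages: input measurement ->
# input set, input set -> output sets, and defuzzification. All
# memberships, table names, and constants are assumed for illustration.

def high_activity_membership(ac_coeffs_56_63):
    """Rule set 1: map an input measurement onto an input set.
    Membership rises linearly with the mean of high-frequency AC terms."""
    mean = sum(ac_coeffs_56_63) / len(ac_coeffs_56_63)
    return max(0.0, min(1.0, mean / 50.0))  # assumed scaling

def output_sets(graphical, photographic):
    """Rule set 2: map input-set memberships onto output sets (here, a
    preference weight for each of two hypothetical DCT tables)."""
    return {"table_5": graphical, "table_1": photographic}

def defuzzify(outputs):
    """Rule set 3: blend output-set memberships into a single control
    decision via a weighted average of assumed per-table scales."""
    scales = {"table_5": 2.0, "table_1": 1.0}  # assumed per-table scales
    total = sum(outputs.values())
    return sum(scales[k] * w for k, w in outputs.items()) / total

g = high_activity_membership([60, 55, 70, 40, 65, 50, 45, 80])
outs = output_sets(graphical=g, photographic=1.0 - g)
scale = defuzzify(outs)
print(round(scale, 3))
```

Because the memberships are continuous, a block that is only "somewhat graphical" receives a blended scale rather than a hard switch between tables, which is the smooth-transition behavior described above.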
[0221] Still further, the fuzzy logic rule base is easily
maintained. The rules are modular. Thus, the rules can be
understood, researched, and modified independently of one another.
In addition, the rule bases are easily modified allowing new rules
to make the image classifier 152 more sensitive to different types
of image content. Furthermore, the fuzzy logic rule base is
extendable to include additional image types specified by the user
or learned using neural network or genetic programming methods.
[0222] FIG. 31 illustrates a block diagram of the image classifier
152. In block 612 the image classifier 152 determines a set of
input measurements 614 that correspond to the source image 100. In
order to determine the input measurements 614, the image classifier
152 sub-divides the source image 100 into a plurality of blocks. To
conserve computations, the user can enable the image classifier 152
to select a random sample of the plurality of blocks to use as the
basis of the input measurements 614.
[0223] The image classifier 152 determines the set of input
measurements 614 from the plurality of blocks using a variety of
methods. The image classifier 152 calculates the mean, the
variance, and a histogram of all three color components. The image
classifier 152 performs a discrete cosine transform of the image
blocks to derive a set of DCT components wherein each DCT
coefficient is histogrammed to provide a frequency domain profile
of the input image. The image classifier 152 performs special
convolutions to gather information about edge content, texture
content, and the efficacy of the Reed Spline Filter. The image
classifier 152 derives spatial domain blocks and matches the
spatial domain blocks with a special VQ-like pattern list to
provide information about the types of activity contained in the
picture. Finally, the image classifier scans the image for common
and possibly localized features that bear on the compressibility of
the image (such as typed text or scanning artifacts).
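A minimal sketch of the per-block input measurements (mean, variance, and histogram), assuming 8.times.8 grayscale blocks; the DCT histogramming, convolutions, and VQ pattern matching described above are omitted for brevity, and the random-sampling option is shown as stated.

```python
# Sketch of per-block measurements and the optional random sampling of
# blocks to conserve computation. Block size and pixel range are assumed.

import random

def block_measurements(block):
    """Mean, variance, and 256-bin histogram of one pixel block."""
    n = len(block)
    mean = sum(block) / n
    var = sum((p - mean) ** 2 for p in block) / n
    hist = [0] * 256
    for p in block:
        hist[p] += 1
    return mean, var, hist

random.seed(0)
# 100 synthetic 8x8 blocks of 8-bit pixels.
blocks = [[random.randrange(256) for _ in range(64)] for _ in range(100)]
# Optional random sample of blocks, as the text describes.
sample = random.sample(blocks, 10)
stats = [block_measurements(b) for b in sample]
print(len(stats))
```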
[0224] In block 616 the image classifier 152 analyzes the input
measurements 614 generated in block 612 to determine the extent to
which the source image 100 belongs to one of the fuzzy input sets
618 within the input rule base 620. The input rule base 620
identifies the list of image types. In the preferred embodiment,
the image classifier 152 contains input sets 618 for the following
image types: scale, text, graphics, photographic, color depth,
degree of activity, and special features.
[0225] Membership in the activity input set and the scale image
input set are determined by the input measurements 614 for the DCT
coefficient histogram, the spatial statistics, and the
convolutions. Membership in the text image input set and the
graphic input set correspond to the input measurements 614 for a
linear combination of high frequency DCT coefficients and gaps in
the luminance histogram. The photographic input set is the
complement of the graphic input set.
[0226] The color depth input set includes four classifications:
gray scale images, 4-bit images, 8-bit images and 24-bit images.
The color depth input corresponds to the input measurements 614 for
the Y, U and X color components. A small dynamic range in the U and
X color components indicates that the picture is likely to be a
gray scale image, while gaps in the Y component histogram reveal
whether the image was once a palettized 4-bit or 8-bit image.
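The color-depth heuristics described above might be sketched as follows; the thresholds and the synthetic test data are assumptions for illustration, not values from the patent.

```python
# Illustrative color-depth heuristics: a small U/X dynamic range suggests
# a gray scale image, and gaps in the Y histogram suggest a formerly
# palettized image. Thresholds here are assumed.

def is_gray_scale(u_values, x_values, max_range=4):
    return (max(u_values) - min(u_values) <= max_range and
            max(x_values) - min(x_values) <= max_range)

def looks_palettized(y_values, min_gap_fraction=0.5):
    """Count empty bins between the lowest and highest occupied Y levels."""
    hist = [0] * 256
    for y in y_values:
        hist[y] += 1
    lo, hi = min(y_values), max(y_values)
    gaps = sum(1 for k in range(lo, hi + 1) if hist[k] == 0)
    return gaps / max(1, hi - lo) >= min_gap_fraction

# A synthetic 4-bit-style image: Y quantized to 16 evenly spaced levels.
y = [16 * (i % 16) for i in range(1000)]
u = [128] * 1000   # flat chrominance
x = [128] * 1000
print(is_gray_scale(u, x), looks_palettized(y))
```

A continuous-tone Y histogram has few or no gaps, so the same test correctly rejects photographic data.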
[0227] The special feature input set corresponds to the input
measurements 614 for the common or localized features that bear on
the compressibility of the image. Thus the special feature input
set identifies such artifacts as black borders caused by inaccurate
scanning and graphical titling on a photographic image.
[0228] In block 622 the image classifier 152 maps the input sets
618 onto output sets 624 according to the output rule base 626. The
image classifier 152 applies the output rule base 626 to map each
input set 618 onto membership of each fuzzy output set 624. The
output sets 624 determine, for example, how many CS terms are
stored in the CS data segment 204 and the optimization of the VQ1
data segment 224, the VQ2 data segment 258, the VQ3 data segment
242, the VQ4 data segment 244, and the number of VQ patterns to
use. The output sets also determine whether the encoder 102
performs an optimized DCT 136 and which quantization tables Q 202
to apply.
[0229] For the second Reed Spline Filter 225 and the third Reed
Spline Filter 227, the output sets 624 adjust the decimation factor
tau and the orientation of the kernal function. Finally, the output
sets determine whether the channel encoder 168 utilizes a fixed
Huffman encoder, an adaptive Huffman encoder, or an LZ1 encoder.
FIG. 33
illustrates several examples of mapping from input measurements 614
to input sets 618 to output sets 624.
[0230] Referring to FIG. 31, in block 626 the image classifier
constructs a classification map 628 based upon membership within
the output sets. The classification map 628 identifies independent
regions in the source image 100 that are independently compressed.
Thus the image classifier 152 identifies the regions of the image
that belong to compatible output sets 624. These are regions that
contain relatively homogeneous image content and call for one
method, or one set of complementary methods, to be applied to the
entire region.
[0231] In block 630 the image classifier 152 converts
(defuzzifies), based on the defuzzification rule base 632, the
membership of the fuzzy output sets 624 of each independent region
in order to generate the control script 196. The control script 196
contains instructions for which compression methods to perform and
what parameters, tables, and optimization levels to employ for a
particular region of the source image 100.
[0232] The Enhancement Analyzer
[0233] The preferred embodiment of the enhancement analyzer 144 is
illustrated in FIGS. 4, 15 and 30. More specifically, the
enhancement analyzer 144 examines the Y_tau2 miniature 190, the
U_tau2 miniature 192, and the X_tau4 miniature 228 to determine the
enhancement priority of image blocks that correspond to 16.times.16
blocks in the original source image 100. The enhancement analyzer
144 prioritizes the image blocks by 1) calculating the mean of the
Y_tau2 miniature 190, the U_tau2 miniature 192, and the X_tau4
miniature 228, and 2) testing every color block against a
normalized threshold value E 252 for the Y_tau2 miniature 190, the
U_tau2 miniature 192, and the X_tau4 miniature 228. Blocks that
exceed the threshold value E 252 are added to the enhancement list
250.
[0234] The enhancement analyzer 144 determines a threshold value
E.sub.Y for the Y_tau2 miniature 190, a threshold value E.sub.U for
the U_tau2 miniature 192, and a threshold value E.sub.X for the
X_tau4 miniature 228. Once the enhancement analyzer 144 computes
the threshold value E.sub.Y, the threshold value E.sub.U and the
threshold value E.sub.X, the enhancement analyzer 144 tests each
8.times.8 Y_tau2 block, each 4.times.4 U_tau4 block and each
4.times.4 X_tau4 block (each block corresponds to a 16.times.16
block in the source image 100) as follows:
[0235] Every pixel in the test block is convolved with the
following filter masks:
M.sub.1={-1,-2,-1,0,0,0,1,2,1}
M.sub.2={1,0,-1,2,0,-2,1,0,-1}
[0236] to compute two statistics S.sub.1 and S.sub.2.
[0237] Masks M.sub.1 and M.sub.2 are convolved with a three by
three block of pixels centered on the pixel being tested. The three
by three block of pixels is represented as:
x.sub.11x.sub.12x.sub.13
x.sub.21x.sub.22x.sub.23
x.sub.31x.sub.32x.sub.33
[0238] where the pixel x.sub.22 is the pixel being tested. Thus the
statistics are calculated with the following equations:
S.sub.1=-x.sub.11-2x.sub.12-x.sub.13+x.sub.31+2x.sub.32+x.sub.33
[0239]
S.sub.2=x.sub.11-x.sub.13+2x.sub.21-2x.sub.23+x.sub.31-x.sub.33
[0240] If S.sub.1 plus S.sub.2 is greater than the threshold value
E.sub.Y for a particular 8.times.8 Y_tau2 block, the enhancement
analyzer 144 adds the 8.times.8 Y_tau2 block to the enhancement
list 250. If S.sub.1 plus S.sub.2 is greater than the threshold
value E.sub.U for a particular 4.times.4 U_tau4 block, the
enhancement analyzer 144 adds the 4.times.4 U_tau4 block to the
enhancement list 250. If S.sub.1 plus S.sub.2 is greater than the
threshold value E.sub.X for a particular 4.times.4 X_tau4 block the
enhancement analyzer 144 adds the 4.times.4 X_tau4 block to the
enhancement list 250.
[0241] In addition to the enhancement list 250, the enhancement
analyzer 144 also uses the DCT coefficients 198 to identify
visually unimportant "texture" regions where the compression ratio
can be increased without significant loss to the image quality.
[0242] Optimized DCT
[0243] The preferred embodiment of the optimized DCT 136 is
illustrated in FIG. 9. More specifically, the optimized DCT 136
uses the quantization table Q 202 to assign the DCT coefficients
(DC terms 200 and AC terms 201) quantization step values. In
addition, the quantization step values in the quantization table Q
202 vary depending on the optimized DCT 136 operation mode. The
optimized DCT 136 operates in four DCT modes as follows: 1)
switched fixed uniform DCT quantization tables that correspond to
image classification, 2) optimal reconstruction values, 3) adaptive
uniform DCT quantization tables, and 4) adaptive non-uniform DCT
quantization tables.
[0244] The fixed DCT quantization tables are tuned to different
image types, including eight standard tables corresponding to
images differing along three dimensions: photographic versus
graphic, small-scale versus large-scale, and high-activity versus
low-activity. In the preferred embodiment, additional tables can be
added to the resource file 160 (not shown).
[0245] The control script 196 defines which standard table the
optimized DCT 136 uses in the fixed-table DCT mode. In the
fixed-table mode, the quantized value of each DCT coefficient is
obtained by linearly quantizing each DCT coefficient x.sub.i with
the quantization value q.sub.i in quantization table Q. The
mathematical relationship for the quantization procedure is:
[0246] for i=0, 1, . . . , 63
[0247] if x.sub.i>=0, c.sub.i=floor[(x.sub.i+q.sub.i/2)/q.sub.i]
[0248] if x.sub.i<0, c.sub.i=-floor[(-x.sub.i+q.sub.i/2)/q.sub.i]
[0249] Reconstruction is also linear unless reconstruction values
have been computed and stored in the CS data segment 204. Letting r
denote the dequantized DCT coefficients, the linear dequantization
formula is:
[0250] for i=0, 1, . . . , 63
r.sub.i=c.sub.i.multidot.q.sub.i
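The fixed-table quantization and linear dequantization can be written out as a short sketch. The rounding in the negative branch is an assumption (symmetric about zero), since the published formula for x.sub.i<0 is garbled.

```python
# Sketch of linear quantization to the nearest step and linear
# dequantization. Negative-branch rounding is assumed symmetric.

def quantize(x, q):
    """c_i = floor((|x_i| + q_i/2) / q_i), with the sign of x_i."""
    if x >= 0:
        return (x + q // 2) // q
    return -((-x + q // 2) // q)

def dequantize(c, q):
    """Linear reconstruction: r_i = c_i * q_i."""
    return c * q

q = 16
for x in (40, -40, 7, 0):
    c = quantize(x, q)
    print(x, c, dequantize(c, q))
```

A coefficient of 7 quantizes to level 0 with step 16, illustrating how small AC terms are discarded by coarse tables.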
[0251] In the fixed-table DCT mode, the optimized DCT 136 can also
compute the optimal reconstruction values stored in the CS data
segment 204. While the DC term 200 is always calculated linearly,
the CS reconstruction values represent the conditional expected
value of each quantized level of each AC term 201. The CS
reconstruction values are calculated for each AC term 201 by first
calculating an absolute value frequency histogram, H.sub.i for the
ith coefficient (for i=1, 2, . . . , 63) over all DCT blocks in the
source image, N, as follows:
[0252] for j=0, 1, . . . , N
[0253] H.sub.i(k)=frequency (abs(x.sub.ij)=k)
[0254] where x.sub.ij=the value of the ith coefficient in the jth
DCT block.
[0255] Second, the centroid of coefficient values is calculated
between each quantization step. The formula for the centroid of the
ith coefficient in the kth quantization interval is:
CS.sub.i(k)=.SIGMA..sub.j=kq-q/2.sup.kq+q/2 [j.multidot.H.sub.i(j)/T.sub.i(k)]
[0256] where
T.sub.i(k)=.SIGMA..sub.j=kq-q/2.sup.kq+q/2 H.sub.i(j)
[0257] This provides a non-linear mapping of quantized coefficients
onto reconstructed values as follows:
r.sub.i=CS.sub.i(c.sub.i) for i=1, 2, . . . , 63
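A sketch of the centroid (CS) reconstruction-value computation for one coefficient follows; the interval bounds and the fallback to linear reconstruction for empty intervals are assumptions made to keep the example self-contained.

```python
# Sketch: for each quantization interval, the reconstruction value is
# the centroid (conditional mean) of observed absolute coefficient
# values falling in that interval. Empty intervals fall back to linear
# reconstruction (an assumption).

def centroid_table(abs_values, q, n_levels):
    hist = {}
    for v in abs_values:
        hist[v] = hist.get(v, 0) + 1
    table = []
    for k in range(n_levels):
        lo, hi = k * q - q // 2, k * q + q // 2
        total = sum(hist.get(j, 0) for j in range(max(0, lo), hi + 1))
        if total == 0:
            table.append(k * q)          # linear fallback
        else:
            table.append(sum(j * hist.get(j, 0)
                             for j in range(max(0, lo), hi + 1)) / total)
    return table

# Coefficients clustered near the low edge of interval k=1 (q=16):
cs = centroid_table([9, 10, 10, 11, 22], q=16, n_levels=3)
print([round(v, 2) for v in cs])
```

For level 1 the centroid (12.4) is below the linear value (16), reflecting the usual skew of AC coefficient distributions toward zero, which is exactly why centroid reconstruction lowers the error.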
[0258] In the adaptive uniform DCT quantization mode, the image
classifier 152 outputs the control script 196 that directs the
optimized DCT 136 to adjust a given DCT uniform quantization table
Q 202 to provide more efficient compression while holding the
visual quality constant. This method adjusts the DCT quantization
step sizes such that the compressed bit rate (entropy) after
quantizing the DCT coefficients is minimized subject to the
constraint that the visually-weighted mean squared error arising
from the DCT quantization is held constant with respect to the base
quantization table and the user-supplied quantization parameter
L.
[0259] The optimized DCT 136 uses marginal analysis to adjust the
DCT quantization step sizes. A "marginal rate of transformation
(MRT)" is computed for each DCT coefficient. The MRT represents the
rate at which bits are "transformed" into (a reduction of) the
visually weighted mean squared error (VMSE). The MRT of a
coefficient is defined as the ratio of 1) the marginal change in
the encoded bit rate with respect to a quantization step value q to
2) the marginal change in the visual mean square error with respect
to the quantization step value q.
[0260] The MRT (bits/VMSE) ratio is calculated as follows:
[0261] MRT (bits/VMSE)=(.DELTA.bits/.DELTA.q)/(.DELTA.VMSE/.DELTA.q).
[0262] Decreasing the quantization step value q adds more bits to
the representation of the corresponding DCT coefficient, and adding
more bits to the representation of a DCT coefficient reduces the
VMSE. Since added bits are transformed into a reduction of the
VMSE, the MRT is generally negative.
[0263] The MRT is calculated for all of the DCT coefficients. The
adaptive method utilized by the optimized DCT 136 adjusts the
quantization step values q of the quantized table Q 202 by reducing
the quantization step value q corresponding to the maximum MRT and
increasing the quantization step value q corresponding to the
minimum MRT. The optimized DCT 136 repeats the process until the
MRT is equalized across all of the DCT coefficients while holding
the VMSE constant.
[0264] FIG. 32 shows a flow chart of the process of creating an
adaptive uniform DCT quantization table. In step 700 the optimized
DCT 136 computes the MRT values for all DCT coefficients i. In step
702 the optimized DCT 136 compares the MRT values. If the MRT
values are all equal, the optimized DCT 136 uses the resulting
quantization table Q 202. If the MRT values are not equal, the
optimized DCT 136 finds the minimum MRT value and the maximum MRT
value for the DCT coefficients i in step 706.
[0265] In step 708, the optimized DCT 136 increases the
quantization step value q.sub.low corresponding to the minimum MRT
value and decreases the quantization step value q.sub.high
associated with the maximum MRT value. Increasing q.sub.low reduces
the number of bits devoted to the corresponding DCT coefficient but
does not increase the VMSE appreciably. Reducing the quantization
step value q.sub.high increases the number of bits devoted to the
corresponding DCT coefficient and reduces the VMSE
significantly. The optimized DCT 136 offsets the adjustments for
the quantization step values q.sub.low and q.sub.high in order to
keep the VMSE constant.
[0266] The optimized DCT 136 returns to step 700, where the process
is repeated until all MRT values are equal. Once all of the
quantization step values q are determined the resulting
quantization table Q 202 is complete.
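One pass of the marginal analysis of FIG. 32 can be sketched with assumed rate and VMSE models (rate proportional to 1/q, VMSE proportional to w.multidot.q.sup.2); both models, the visual weights, and the unit step adjustment are illustrative, not the patent's.

```python
# One iteration of the marginal analysis: compute MRT per coefficient,
# then increase the step of the minimum-MRT coefficient and decrease
# the step of the maximum-MRT coefficient. Models are assumed.

def mrt(q, weight, dq=1.0):
    """(delta bits / delta q) over (delta VMSE / delta q)."""
    rate = lambda s: 100.0 / s          # assumed bits-vs-step model
    vmse = lambda s: weight * s * s     # assumed error-vs-step model
    return (rate(q + dq) - rate(q)) / (vmse(q + dq) - vmse(q))

steps = [8.0, 8.0, 8.0]                 # one step value per coefficient
weights = [4.0, 1.0, 0.25]              # assumed visual weights
mrts = [mrt(q, w) for q, w in zip(steps, weights)]
min_i = mrts.index(min(mrts))           # coefficient with minimum MRT
max_i = mrts.index(max(mrts))           # coefficient with maximum MRT
steps[min_i] += 1.0                     # coarser step: q_low adjustment
steps[max_i] -= 1.0                     # finer step: q_high adjustment
print(min_i, max_i, steps)
```

Repeating this pass until the MRT values agree yields the equalized table described in step 702 of the flow chart.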
[0267] The Reed Spline Filter
[0268] FIGS. 34-57 illustrate a preferred embodiment of the Reed
Spline Filter 138 which is advantageously used for the first,
second and third Reed Spline Filters 148, 225, and 227. The Reed
Spline Filter described in FIGS. 34-57 is presented in terms of a
generic image format. In particular, the image input data comprises
a Y image input, which corresponds, for example, to the red, green
and blue image data in the first Reed Spline Filter 148 in the
foregoing discussion. In like manner, the outputs of the Reed
Spline Filter 138, described as reconstruction values, should be
understood to correspond, for example, to the R_tau2 miniature 180,
the G_tau2 miniature 182 and the B_tau2 miniature 184 of the first
Reed Spline Filter 148.
[0269] The Reed Spline Filter is based on a least-mean-square (LMS)
error spline approach, which is extendable to N
dimensions. One- and two-dimensional image data compression
utilizing linear and planar splines, respectively, are shown to
have compact, closed-form optimal solutions for convenient,
effective compression. The computational efficiency of this new
method is of special interest, because the
compression/reconstruction algorithms proposed herein involve only
the Fast Fourier Transform (FFT) and inverse FFT types of
processors or other high-speed direct convolution algorithms. Thus,
the compression and reconstruction from the compressed image can be
extremely fast and realized in existing hardware and software. Even
with this high computational efficiency, good image quality is
obtained upon reconstruction. An important and practical
consequence of the disclosed method is the convenience and
versatility with which it is integrated into a variety of hybrid
digital data compression systems.
[0270] I. SPLINE FILTER OVERVIEW
[0271] The basic process of digital image coding entails
transforming a source image X into a "compressed" image Y such that
the signal energy of Y is concentrated into fewer elements than the
signal energy of X, with some provisions regarding error. As
depicted in FIG. 34, digital source image data 1002 represented by
an appropriate N-dimensional array X is supplied to compression
block 1004, whereupon image data X is transformed to compressed
data Y' via a first generalized process represented here as G
(X)=Y'. Compressed data may be stored or transmitted (process block
1006) to a "remote" reconstruction block 1008, whereupon a second
generalized process, G'(Y')=X', operates to transform compressed
data Y' into a reconstructed image X'.
[0272] G and G' are not necessarily processes of mutual inversion,
and the processes may not conserve the full information content of
image data X. Consequently, X' will, in general, differ from X, and
information is lost through the coding/reconstruction process. The
residual image or so-called residue is generated by supplying
compressed data Y' to a "local" reconstruction process 1005
followed by a difference process 1010 which computes the residue
.DELTA.X=X-X' 1012. Preferably, X and X' are sufficiently close, so
that the residue .DELTA.X 1012 is small and may be transmitted,
stored along with the compressed data Y', or discarded. Subsequent
to the remote reconstruction process 1008, the residue .DELTA.X
1012 and reconstructed image X' are supplied to adding process 1007
to generate a restored image X'+.DELTA.X=X" 1003.
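The coding loop of FIG. 34 can be sketched with toy stand-ins for G and G'; pairwise averaging and sample-and-hold reconstruction are assumptions chosen only to show how carrying the residue makes the restored image exact.

```python
# Sketch of FIG. 34: compress with G, reconstruct with G', compute the
# residue delta-X = X - X', and restore X'' = X' + delta-X exactly.
# G and G' here are toy stand-ins, not the patent's processes.

def g(x):                      # toy compression: pairwise means
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]

def g_prime(y):                # toy reconstruction: sample-and-hold
    return [v for v in y for _ in range(2)]

x = [3.0, 5.0, 8.0, 6.0, 1.0, 1.0]
y = g(x)                       # compressed data Y'
x_rec = g_prime(y)             # reconstructed image X'
residue = [a - b for a, b in zip(x, x_rec)]        # delta-X
x_restored = [a + d for a, d in zip(x_rec, residue)]
print(x_restored == x)
```

Because G and G' are not mutual inverses, X' differs from X; transmitting the (small) residue alongside Y' recovers X exactly, as the text describes.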
[0273] In practice, to reduce computational overhead associated
with large images during compression, a decimating or subsampling
process may be performed to reduce the number of samples.
Decimation is commonly characterized by a reduction factor .tau.
(tau), which indicates the ratio of image data elements to
compressed data elements. However, one skilled in the art will
appreciate that image data X must be filtered in conjunction with
decimation to avoid aliasing. As shown in FIG. 35, a low-pass input
filter may take the form of a pointwise convolution of image data X
with a suitable convolution filter 1014, preferably implemented
using a matrix filter kernel. A decimation process 1016 then
produces compressed data Y', which is substantially free of
aliasing prior to subsequent process steps. While the convolution
or decimation filter 1014 attenuates aliasing effects, it does so
by reducing the number of bits required to represent the signal. It
is "low-pass" in nature, reducing the information content of the
reconstructed image X'. Consequently, the residue .DELTA.X 1012
will be larger, and in part, will offset the compression attained
through decimation.
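A one-dimensional sketch of the filter-then-decimate pipeline of FIG. 35 follows, using a triangular (linear spline) kernel and periodic boundary handling; the kernel choice and the value of tau are illustrative.

```python
# Sketch of FIG. 35: convolve the data with a triangular low-pass
# kernel, then keep every tau-th sample. Kernel and tau are assumed;
# boundary handling is periodic, matching the text's periodicity.

def triangle_kernel(tau):
    """Linear spline F(m) = 1 - |m|/tau over -tau+1 .. tau-1."""
    return [1.0 - abs(m) / tau for m in range(-tau + 1, tau)]

def filter_and_decimate(x, tau):
    k = triangle_kernel(tau)
    half = tau - 1
    n = len(x)
    y = []
    for center in range(0, n, tau):          # decimation by tau
        acc = sum(k[half + m] * x[(center + m) % n]   # periodic data
                  for m in range(-half, half + 1))
        y.append(acc)
    return y

x = [float(t % 8) for t in range(32)]    # periodic ramp, period 8
y = filter_and_decimate(x, tau=4)
print(len(x), len(y))                    # tau:1 reduction in samples
```

Each output element is a correlation over 2.tau.-1 points, which is the same structure the one-dimensional derivation in Section II arrives at for the compressed data Y.sub.j.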
[0274] The present invention disclosed herein solves this problem
by providing a method of optimizing the compressed data such that
the mean-square-residue <.DELTA.X.sup.2> is minimized, where
"< >" shall herein denote an averaging process. As shown in
FIG. 36, compressed data Y', generated in a manner similar to that
shown in FIG. 35, is further processed by an optimization process
1018. Accordingly, the optimization process 1018 is dependent upon
the properties of convolution filter 1014 and is constrained such
that the variance of the mean-square-residue is zero,
.delta.<.DELTA.X.sup.2>=0. The disclosed method of filter
optimization "matches" the filter response to the image data,
thereby minimizing the residue. Since the decimation filter 1014 is
low-pass in nature, the optimization process 1018, in part,
compensates by effectively acting as a "self-tuned" high-pass
filter. A brief descriptive overview of the optimization procedure
is provided in the following sections.
[0275] A. Image Approximation by Spline Functions
[0276] As will become clear in the following detailed description,
the input decimation filter 1014 of FIG. 36 may be regarded as a
projection of an image data vector X onto a set of basis functions
that constitute shifted, but overlapping, spline functions
{.psi..sub.k(x)} such that
X.apprxeq.X'=.SIGMA..sub.k.chi..sub.k.psi..sub.k(x),
[0277] where X' is the reconstructed image vector and .chi..sub.k
is the decomposition weight. The image data vector X is thus
approximated by an array of preferably computationally simple,
continuous functions, such as lines or planes, allowing also an
efficient reconstruction of the original image.
[0278] According to the method, the basis functions need not be
orthogonal and are preferably chosen to overlap in order to provide
a continuous approximation to image data, thereby rendering a
non-diagonal basis correlation matrix:
A.sub.jk=.psi..sub.j(x).multidot..psi..sub.k(x).
[0279] This property is exploited by the method of the present
invention, since it allows the user to "adapt" the response of the
filter by the nature and degree of cross-correlation. Furthermore,
the basis of spline functions need not be complete in the sense of
spanning the space of all image data, but preferably generates a
close approximation to image X. It is known that the decomposition
of image vector X into components of differing spline basis
functions {.psi..sub.k(X) } is not unique. The method herein
disclosed optimizes the projection by adjusting the weights
.chi..sub.k such that the differential variations of the average
residue vanishes, .delta.<.DELTA.X.sup.2>=0, or equivalently
<.DELTA.X.sup.2>=min. In general, it will be expected that a
more complete basis set will provide a smaller residue and better
compression, which, however, requires greater computational
overhead and greater compression. Accordingly, it is preferable to
utilize a computationally simple basis set, which is easy to
manipulate in closed form and which renders a small residual image.
This residual image or residue .DELTA.X is preferably retained for
subsequent processing or reconstruction. In this respect there is a
compromise between computational complexity, compression, and the
magnitude of the residue.
[0280] In a schematic view, a set of spline basis functions
S'={.psi..sub.k} may be regarded as a subset of vectors in the
domain of possible image vectors S={X}, as depicted in FIG. 37. The
decomposition on projection of X onto components of S' is not
unique and may be accomplished in a number of ways. A preferable
criterion set forth in the present description is a
least-mean-square (LMS) error, which minimizes the overall
difference between the source image X and the reconstructed image
X'. Geometrically, the residual image .DELTA.X can be thought of as
a minimal vector in the sense that it is the shortest possible
vector connecting X to X'. That is, .DELTA.X might, for instance,
be orthogonal to the subspace S', as shown in FIG. 37. As it will
be elaborated in the next section, the projection of image vector X
onto S' is approximated by an expression of the form:
X.apprxeq.X'=.SIGMA..sub.k.chi..sub.k.psi..sub.k(x)
[0281] The "best" X' is determined by the constraint that
.DELTA.X=X-X' is minimized with respect to variations in the
weights .chi..sub.j:
.differential.<.DELTA.X.sup.2>/.differential..chi..sub.j=.differential./.differential..chi..sub.j<(X-.SIGMA..sub.k.chi..sub.k.psi..sub.k(x)).sup.2>=0,
[0282] which, by analogy to FIG. 37, describes an orthogonal
projection of X onto S'.
[0283] Generally, the above system of equations which determines
the optimal .chi..sub.k may be regarded as a linear transformation,
which maps X onto S' optimally, represented here by:
A(.chi..sub.k)=X*.psi..sub.k(x)
[0284] where A.sub.ij=.psi..sub.i*.psi..sub.j is a transformation
matrix having elements representing the correlation between basis
vectors .psi..sub.i and .psi..sub.j. The optimal weights
.chi..sub.k are determined by the inverse operation A.sup.-1:
.chi..sub.k=A.sup.-1(X*.psi..sub.k (x)),
[0285] rendering compression with the least residue. One skilled in
the art of LMS criteria will know how to express the processes
given here in the geometry of multiple dimensions. Hence, the
processes described herein are applicable to a variety of image
data types.
[0286] The present brief and general description has direct
processing counterparts depicted in FIG. 36. The operation
X*.psi..sub.k(x)
[0287] represents a convolution filtering process 1014, and
A.sup.-1(X*.psi..sub.k(x))
[0288] represents the optimizing process 1018.
[0289] In addition, as will be demonstrated in the following
sections, the inverse operation A.sup.-1 is equivalent to a
so-called inverse eigenfilter when taken over to the conjugate
image domain. Specifically,
DFT(.chi..sub.k)=(1/.lambda..sub.m)DFT(X*.psi..sub.k(x)),
[0290] where DFT is the familiar discrete Fourier transform (DFT)
and .lambda..sub.m are the eigenvalues of A. The equivalent
optimization block 1018, shown in FIG. 38, comprises three steps:
(1) a discrete Fourier transformation (DFT) 1020; (2) inverse
eigenfiltering 1022; and (3) an inverse discrete Fourier
transformation (DFT.sup.-1) 1024. The advantages of this
embodiment, in part, rely on the fast coding/reconstruction speed,
since only DFT and DFT.sup.-1 are the primary computations, where
now the optimization is a simple division. Greater elaboration on
the principles of the method is provided in Section II, where the
presently contemplated preferred embodiments are also derived as
closed form solutions for a one-dimensional linear spline basis and
two-dimensional planar spline bases. Section III provides an
operational description for the preferred method of compression and
reconstruction utilizing the optimal procedure disclosed in Section
II. Section IV discloses results of a reduction to practice of the
preferred embodiments applied to one- and two-dimensional images.
Finally, Section V discloses a preferred method of the filter
optimizing process implemented in the image domain.
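The three-step optimization block of FIG. 38 can be sketched for a circulant correlation matrix of the kind derived in Section II; a plain O(n.sup.2) DFT is used for clarity where a production version would use an FFT, and the alpha/beta values are assumed.

```python
# Sketch of FIG. 38's optimization block: DFT the correlation outputs,
# divide by the eigenvalues of the circulant matrix A (the "inverse
# eigenfilter"), then inverse-DFT to recover the optimal weights.
# Plain O(n^2) DFT for clarity; alpha/beta values are assumed.

import cmath

def dft(v):
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * m * t / n)
                for t in range(n)) for m in range(n)]

def idft(v):
    n = len(v)
    return [sum(v[m] * cmath.exp(2j * cmath.pi * m * t / n)
                for m in range(n)).real / n for t in range(n)]

def optimize(y, alpha, beta):
    """Solve A x = y for circulant A with first row [a, b, 0, ..., b]."""
    n = len(y)
    first_row = [alpha, beta] + [0.0] * (n - 3) + [beta]
    eigenvalues = dft(first_row)          # lambda_m of the circulant A
    yf = dft(y)
    return idft([yf[m] / eigenvalues[m] for m in range(n)])

# Verify against the defining system A x = y for a small case.
alpha, beta, n = 2.0, 0.5, 4
y = [1.0, 2.0, 3.0, 4.0]
x = optimize(y, alpha, beta)
residual = [alpha * x[j] + beta * (x[(j - 1) % n] + x[(j + 1) % n]) - y[j]
            for j in range(n)]
print(max(abs(r) for r in residual))
```

The division by eigenvalues is the "simple division" the text refers to: inverting the circulant A in the conjugate (frequency) domain costs one divide per frequency bin.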
[0291] II. IMAGE DATA COMPRESSION BY OPTIMAL SPLINE
INTERPOLATION
[0292] A. One-Dimensional Data Compression by LMS-Error Linear
Splines
[0293] For one-dimensional image data, bi-linear spline functions
are combined to approximate the image data with a resultant linear
interpolation, as shown in FIG. 39. The resultant closed-form
approximating and optimizing process has a significant advantage in
computational simplicity and speed.
[0294] Letting the decimation index .tau. and image sampling period
t be fixed, positive integers .tau., t=1,2, . . . , and letting
X(t) be a periodic sequence of data of period n.tau., where n is
also an integer, consider a periodic, linear spline 1014 of period
n.tau. of the type,
F(t)=F(t+n.tau.), (1)
[0295] where
F(t)=1-abs(t)/.tau. for abs(t)<=.tau., and F(t)=0 elsewhere in the period, (2)
[0296] as shown by the functions .psi..sub.k(t) 1014 of FIG.
39.
[0297] The family of shifted linear splines F(t) is defined as
follows:
.psi..sub.k(t)=F(t-k.tau.) for (k=0,1,2, . . . , (n-1)). (3)
[0298] One object of the present embodiment is to approximate X(t)
by the n-point sum:
S(t)=.SIGMA..sub.k=0.sup.n-1 X.sub.k.psi..sub.k(t), (4)
[0299] in a least-mean-squares fashion, where X.sub.0, . . . ,
X.sub.n-1 are n reconstruction weights. Observe that the two-point
sum in the interval 0<t<.tau. is:
X.sub.0.psi..sub.0(t)+X.sub.1.psi..sub.1(t)=X.sub.0(1-t/.tau.)+X.sub.1(t/.tau.)=X.sub.0+(X.sub.1-X.sub.0)t/.tau. (5)
[0300] Hence, S(t) 1030 in Equation 4 represents a linear
interpolation of the original waveform X(t) 1002, as shown in FIG.
39.
[0301] To find the "best" weights X.sub.0, . . . , X.sub.n-1, the
quantity
L(X.sub.0, X.sub.1, . . . , X.sub.n-1)=.SIGMA..sub.t=-.tau..sup.n.tau.[X(t)-.SIGMA..sub.k=0.sup.n-1 X.sub.k.psi..sub.k(t)].sup.2, (6)
is minimized, where the sum has been taken over one period plus
.tau. of the data.
[0302] L is minimized with respect to each X.sub.j by
differentiating as follows:
.differential.L/.differential.X.sub.j=-.SIGMA..sub.t=-.tau..sup.n.tau. 2[X(t)-.SIGMA..sub.k=0.sup.n-1 X.sub.k.psi..sub.k(t)].psi..sub.j(t)=-2[.SIGMA..sub.t=-.tau..sup.n.tau. X(t).psi..sub.j(t)-.SIGMA..sub.k=0.sup.n-1 X.sub.k.SIGMA..sub.t=-.tau..sup.n.tau..psi..sub.k(t).psi..sub.j(t)].ident.0. (7)
[0303] This leads to the system,
.SIGMA..sub.k=0.sup.n-1 A.sub.jk X.sub.k=Y.sub.j, (8)
[0304] of linear equations for X.sub.k, where
A.sub.jk=.SIGMA..sub.t=-.tau..sup.n.tau..psi..sub.j(t).psi..sub.k(t) for (j, k=0, 1, . . . , n-1) (9)
[0305] and
Y.sub.j=.SIGMA..sub.t=-.tau..sup.n.tau. X(t).psi..sub.j(t) for (j=0, 1, . . . , n-1) (10)
[0306] The term Y.sub.j in Equation 10 is reducible as follows:
Y.sub.j=.SIGMA..sub.t X(t)F(t-j.tau.)=.SIGMA..sub.t=(j-1).tau..sup.(j+1).tau. X(t)F(t-j.tau.). (11)
[0307] Letting (t-j.tau.)=m, then:
Y.sub.j=.SIGMA..sub.m=-.tau.+1.sup..tau.-1 X(m+j.tau.)F(m) for (j=0, 1, 2, . . . , n-1). (12)
[0308] The Y.sub.j's in Equation 12 represent the compressed data
to be transmitted or stored. Note that this encoding scheme
involves n correlation operations on only 2.tau.-1 points.
[0309] Since F(t) is assumed to be periodic with period n.tau., the
matrix form of A.sub.jk in Equation 9 can be reduced by
substituting Equation 3 into Equation 9 to obtain:
A.sub.jk=.SIGMA..sub.m=-.tau.+1.sup..tau.-1 F(m+(j-k).tau.)F(m)=.alpha. if (j-k).ident.0 mod n; =.beta. if (j-k).ident..+-.1 mod n; =0 otherwise, (13)
where .alpha.=.SIGMA..sub.m=-.tau.+1.sup..tau.-1 (F(m)).sup.2 and .beta.=.SIGMA..sub.m=-.tau.+1.sup..tau.-1 F(m+.tau.)F(m).
[0310] By Equation 13, A.sub.jk can be expressed also in circulant
form in the following manner:
A.sub.jk=a.sub.(k-j).sub.n, (14)
[0311] where (k-j).sub.n denotes (k-j) mod n, and
a.sub.0=.alpha., a.sub.1=.beta., a.sub.2=0, . . . , a.sub.n-2=0, a.sub.n-1=.beta. (15)
[0312] Therefore, A.sub.jk in Equations 14 and 15 has explicitly
the following equivalent circulant matrix representations:
[A.sub.jk] =
[ a.sub.0    a.sub.1    a.sub.2    . . .  a.sub.n-1 ]
[ a.sub.n-1  a.sub.0    a.sub.1    . . .  a.sub.n-2 ]
[ a.sub.n-2  a.sub.n-1  a.sub.0    . . .  a.sub.n-3 ]
[ . . . ]
[ a.sub.1    a.sub.2    a.sub.3    . . .  a.sub.0 ]
=
[ .alpha.  .beta.   0        . . .  .beta. ]
[ .beta.   .alpha.  .beta.   . . .  0 ]
[ 0        .beta.   .alpha.  . . .  0 ]
[ . . . ]
[ .beta.   0        0        . . .  .alpha. ] (16)
[0313] One skilled in the art of matrix and filter analysis will
appreciate that the periodic boundary conditions imposed on the
data lie outside the window of observation and may be defined in a
variety of ways. Nevertheless, periodic boundary conditions serve
to simplify the process implementation by ensuring that the
correlation matrix [A.sub.jk] has a calculable inverse. Thus, the
optimization process involves an inversion of [A.sub.jk], of which
the periodic boundary conditions and consequent circulant character
play a preferred role. It is also recognized that for certain
spline functions, symmetry rendered in the correlation matrix
allows inversion in the absence of periodic image boundary
conditions.
[0314] B. Two-Dimensional Data Compression by Planar Splines
[0315] For two-dimensional image data, multi-planar spline
functions are combined to approximate the image data with a
resultant planar interpolation. In FIG. 40, X(t.sub.1,t.sub.2) is a
doubly periodic array of image data (e.g., still image) of periods
n.sub.1.tau. and n.sub.2.tau., with respect to the integer
variables t.sub.1 and t.sub.2 where .tau. is a multiple of both
t.sub.1 and t.sub.2. The actual image 1002 to be compressed can be
viewed as being repeated periodically throughout the plane as shown
in FIG. 40. Each subimage of the extended picture is separated
by a border 1032 (or gutter) of zero intensity of width .tau.. This
border is one of several possible preferred "boundary conditions"
to achieve a doubly-periodic image.
[0316] Consider now a doubly periodic planar spline, F(t.sub.1,
t.sub.2) which has the form of a six-sided pyramid or tent,
centered at the origin and is repeated periodically with periods
n.sub.1.tau. and n.sub.2.tau. with respect to integer variables
t.sub.1 and t.sub.2, respectively. A perspective view of such a
planar spline function 1034 is shown in FIG. 41a and may
hereinafter be referred to as "hexagonal tent." Following the
one-dimensional case by analogy, letting:
$$\psi_{k_1k_2}(t_1,t_2) = F(t_1-k_1\tau,\; t_2-k_2\tau) \qquad (17)$$
[0317] for (k.sub.1=0,1, . . . , n.sub.1-1) and (k.sub.2=0,1, . . .
, n.sub.2-1), the "best" weights $X_{k_1k_2}$ are found such that:

$$L(X_{k_1k_2}) = \sum_{t_1,t_2=-\tau}^{n_1\tau,\;n_2\tau} \Big[X(t_1,t_2) - \sum_{k_1,k_2=0}^{n_1-1,\;n_2-1} X_{k_1k_2}\,\psi_{k_1k_2}(t_1,t_2)\Big]^2 \qquad (18)$$

[0318] is a minimum.
[0319] A condition for L to be a minimum is

$$\frac{\partial L}{\partial X_{j_1j_2}} = \sum_{t_1,t_2=-\tau}^{n_1\tau,\;n_2\tau} 2\Big[X(t_1,t_2)-\sum_{k_1,k_2=0}^{n_1-1,\;n_2-1}X_{k_1k_2}\,\psi_{k_1k_2}(t_1,t_2)\Big]\psi_{j_1j_2}(t_1,t_2) = 2\Big[\sum_{t_1,t_2=-\tau}^{n_1\tau,\;n_2\tau}X(t_1,t_2)\,\psi_{j_1j_2}(t_1,t_2) - \sum_{k_1,k_2=0}^{n_1-1,\;n_2-1}X_{k_1k_2}\sum_{t_1,t_2=-\tau}^{n_1\tau,\;n_2\tau}\psi_{j_1j_2}(t_1,t_2)\,\psi_{k_1k_2}(t_1,t_2)\Big] \equiv 0. \qquad (19)$$
[0320] The best coefficients $X_{k_1k_2}$ are the solution of the
2nd-order tensor equation

$$A_{j_1j_2k_1k_2}\,X_{k_1k_2} = Y_{j_1j_2}, \qquad (20)$$

[0321] where the summation is on k.sub.1 and k.sub.2,

$$A_{j_1j_2k_1k_2} = \sum_{t_1,t_2=-\tau}^{n_1\tau,\;n_2\tau} \psi_{j_1j_2}(t_1,t_2)\,\psi_{k_1k_2}(t_1,t_2) \qquad (21)$$

[0322] and

$$Y_{j_1j_2} = \sum_{t_1,t_2=-\tau}^{n_1\tau,\;n_2\tau} X(t_1,t_2)\,\psi_{j_1j_2}(t_1,t_2). \qquad (22)$$
[0323] With the visual aid of FIG. 41a, the tensor $Y_{j_1j_2}$
reduces as follows:

$$Y_{j_1j_2} = \sum_{t_1,t_2=-\tau}^{n_1\tau,\;n_2\tau} X(t_1,t_2)\,\psi_{j_1j_2}(t_1,t_2) = \sum_{t_1,t_2=-\tau}^{n_1\tau,\;n_2\tau} X(t_1,t_2)\,F(t_1-j_1\tau,\,t_2-j_2\tau) = \sum_{t_1=(j_1-1)\tau}^{(j_1+1)\tau}\;\sum_{t_2=(j_2-1)\tau}^{(j_2+1)\tau} X(t_1,t_2)\,F(t_1-j_1\tau,\,t_2-j_2\tau). \qquad (23)$$
[0324] Letting t.sub.k-j.sub.k.tau.=m.sub.k for k=1,2, then

$$Y_{j_1j_2} = \sum_{m_1,m_2=-\tau+1}^{\tau-1} X(m_1+j_1\tau,\; m_2+j_2\tau)\,F(m_1,m_2) \qquad (24)$$

[0325] for (j.sub.1=0,1, . . . , n.sub.1-1) and (j.sub.2=0,1, . . .
, n.sub.2-1), where F(m.sub.1,m.sub.2) is the doubly periodic,
six-sided pyramidal function shown in FIG. 41a. The tensor
transform in Equation 21 is treated in a similar fashion to obtain:

$$A_{j_1j_2k_1k_2} = \sum_{m_1,m_2=-\tau+1}^{\tau-1} F\big(m_1+(j_1-k_1)\tau,\; m_2+(j_2-k_2)\tau\big)\,F(m_1,m_2) = \begin{cases} \alpha & \text{if } (j_1-k_1)\equiv 0 \bmod n_1 \text{ and } (j_2-k_2)\equiv 0 \bmod n_2\\ \beta & \text{if } (j_1-k_1)\equiv \pm 1 \bmod n_1 \text{ and } (j_2-k_2)\equiv 0 \bmod n_2\\ \gamma & \text{if } (j_1-k_1)\equiv 0 \bmod n_1 \text{ and } (j_2-k_2)\equiv \pm 1 \bmod n_2\\ \xi & \text{if } (j_1-k_1)\equiv \pm 1 \bmod n_1 \text{ and } (j_2-k_2)\equiv \mp 1 \bmod n_2\\ \eta & \text{if } (j_1-k_1)\equiv \pm 1 \bmod n_1 \text{ and } (j_2-k_2)\equiv \pm 1 \bmod n_2\\ 0 & \text{otherwise.}\end{cases} \qquad (25)$$
[0326] The values of .alpha., .beta., .gamma., .xi., and .eta. depend on
.tau., and the shape and orientation of the hexagonal tent with
respect to the image domain, where for example m.sub.1 and m.sub.2
represent row and column indices. For greater flexibility in
tailoring the hexagonal tent function, it is possible to utilize
all parameters of the [A.sub.j1j2k1k2]. However, to minimize
calculational overhead it is preferable to employ symmetric
hexagons, disposed over the image domain with a bi-directional
period .tau.. Under these conditions, .beta.=.gamma.=.xi. and
.eta.=0, simplifying [A.sub.j1j2k1k2] considerably. Specifically,
the hexagonal tent depicted in FIG. 41a and having an orientation
depicted in FIG. 41b is described by the preferred case in which
.beta.=.gamma.=.xi. and .eta.=0. It will be appreciated that other
orientations and shapes of the hexagonal tent are possible, as
depicted, for example, in FIG. 41c. Combinations of hexagonal tents
are also possible and embody specific preferable attributes. For
example, a superposition of the hexagonal tents shown in FIG. 41b
and 41c effectively "symmetrizes" the compression process.
[0327] From Equation 25 above, A.sub.j1j2k1k2 can be expressed in
circulant form by the following expression:

$$A_{j_1j_2k_1k_2} = a_{(k_1-j_1)_{n_1},\,(k_2-j_2)_{n_2}}, \qquad (26)$$

[0328] where (k.sub.l-j.sub.l).sub.nl denotes (k.sub.l-j.sub.l) mod
n.sub.l for l=1,2, and

$$[a_{s_1s_2}] = \begin{bmatrix} a_{00} & a_{01} & a_{02} & \cdots & a_{0,n_2-1}\\ a_{10} & a_{11} & a_{12} & \cdots & a_{1,n_2-1}\\ a_{20} & a_{21} & a_{22} & \cdots & a_{2,n_2-1}\\ \vdots & & & & \vdots\\ a_{n_1-1,0} & a_{n_1-1,1} & a_{n_1-1,2} & \cdots & a_{n_1-1,n_2-1}\end{bmatrix} = \begin{bmatrix} \alpha & \gamma & 0 & \cdots & 0 & \gamma\\ \beta & \eta & 0 & \cdots & 0 & \xi\\ 0 & 0 & 0 & \cdots & 0 & 0\\ \vdots & & & & & \vdots\\ \beta & \xi & 0 & \cdots & 0 & \eta\end{bmatrix}, \qquad (27)$$

[0329] where (s.sub.1=0, 1, 2, . . . , n.sub.1-1) and (s.sub.2=0, 1,
2, . . . , n.sub.2-1). Note that when
[a.sub.s.sub..sub.1,.sub.s.sub..sub.2] is represented in matrix
form, it is "block circulant."
[0330] C. Compression-Reconstruction Algorithms
[0331] Because the objective is to apply the above-disclosed LMS
error linear spline interpolation techniques to image sequence
coding, it is advantageous to utilize the tensor formalism during
the course of the analysis in order to readily solve the linear
systems in equations 8 and 20. Here, the tensor summation
convention is used in the analysis for one and two dimensions. It
will be appreciated that such convention may readily apply to the
general case of N dimensions.
[0332] 1. Linear Transformation of Tensors
[0333] A linear transformation of a 1st-order tensor is written
as

$$Y_r = A_{rs}\,X_s \quad (\text{sum on } s), \qquad (28)$$

[0334] where A.sub.rs is a linear transformation, and
Y.sub.r, X.sub.s are 1st-order tensors. Similarly, a linear
transformation of a second order tensor is written as:

$$Y_{r_1r_2} = A_{r_1r_2s_1s_2}\,X_{s_1s_2} \quad (\text{sum on } s_1, s_2). \qquad (29)$$
[0335] The product or composition of linear transformations is
defined as follows. When the above Equation 29 holds, and

$$Z_{q_1q_2} = B_{q_1q_2r_1r_2}\,Y_{r_1r_2}, \qquad (30)$$

[0336] then

$$Z_{q_1q_2} = B_{q_1q_2r_1r_2}\,A_{r_1r_2s_1s_2}\,X_{s_1s_2}. \qquad (31)$$

[0337] Hence,

$$C_{q_1q_2s_1s_2} = B_{q_1q_2r_1r_2}\,A_{r_1r_2s_1s_2} \qquad (32)$$

[0338] is the composition or product of two linear
transformations.
[0339] 2. Circulant Transformation of 1st-Order Tensors
[0340] The tensor method for solving Equations 8 and 20 is
illustrated for the one-dimensional case below. Letting A.sub.rs
represent a circulant tensor of the form:

$$A_{rs} = a_{(s-r)\bmod n} \quad \text{for } (r,s=0,1,2,\ldots,n-1), \qquad (33)$$

[0341] and considering the n special 1st-order tensors

$$W_s^{(l)} \equiv (\omega^{l})^{s} \quad \text{for } (l=0,1,2,\ldots,n-1), \qquad (34)$$

[0342] where .omega. is the n-th root of unity, then

$$A_{rs}\,W_s^{(l)} = \lambda(l)\,W_r^{(l)}, \qquad (35)$$

[0343] where

$$\lambda(l) = \sum_{j=0}^{n-1} a_j\,(\omega^{l})^{j} \qquad (36)$$

[0344] are the distinct eigenvalues of A.sub.rs. The tensors
$W_s^{(l)}$ are orthogonal:

$$\sum_{s} W_s^{(l)}\,W_s^{(j)*} = \begin{cases} 0 & \text{for } l \neq j\\ n & \text{for } l = j.\end{cases} \qquad (37)$$
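The orthogonality relation of Equation 37 can be checked numerically with the standard library alone; the value n=8 is an arbitrary assumption for illustration.

```python
import cmath

n = 8
w = cmath.exp(2j * cmath.pi / n)          # primitive n-th root of unity

def inner(l, j):
    # sum over s of W_s^(l) * conj(W_s^(j)), the left side of Equation 37
    return sum(w ** (l * s) * (w ** (j * s)).conjugate() for s in range(n))

print(abs(inner(3, 3) - n) < 1e-9)   # l == j: the sum equals n
print(abs(inner(3, 5)) < 1e-9)       # l != j: the sum vanishes
```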
[0345] At this point it is convenient to normalize these tensors as
follows:

$$\Phi_s^{(l)} = \frac{1}{\sqrt{n}}\,W_s^{(l)} \quad \text{for } (l=0,1,2,\ldots,n-1). \qquad (38)$$

[0346] $\Phi_s^{(l)}$ evidently also satisfies the orthonormal
property, i.e.,

$$\Phi_s^{(l)}\,\Phi_s^{(j)*} = \delta_{lj}, \qquad (39)$$

[0347] where .delta..sub.lj is the Kronecker delta function and *
represents complex conjugation.

[0348] A linear transformation is formed by summing the n dyads
$\Phi_r^{(l)}\Phi_s^{(l)*}$ for l=0,1, . . . ,n-1, weighted by the
eigenvalues, as follows:

$$\tilde{A}_{rs} = \sum_{l=0}^{n-1} \lambda(l)\,\Phi_r^{(l)}\,\Phi_s^{(l)*}. \qquad (40)$$

[0349] Then

$$\tilde{A}_{rs}\,\Phi_s^{(j)} = \sum_{l=0}^{n-1} \lambda(l)\,\Phi_r^{(l)}\,\Phi_s^{(l)*}\,\Phi_s^{(j)} = \sum_{l=0}^{n-1} \lambda(l)\,\Phi_r^{(l)}\,\delta_{lj} = \lambda(j)\,\Phi_r^{(j)}. \qquad (41)$$
[0350] Since $\tilde{A}_{rs}$ has, by simple verification, the same
eigenvectors and eigenvalues as the transformation $A_{rs}$ has in
Equations 9 and 33, the transformations $\tilde{A}_{rs}$ and $A_{rs}$ are
equal.
[0351] 3. Inverse Transformation of 1st-Order Tensors.
[0352] The inverse transformation of A.sub.rs is shown next to be

$$A_{rs}^{-1} = \sum_{l=0}^{n-1} \frac{1}{\lambda(l)}\,\Phi_r^{(l)}\,\Phi_s^{(l)*}. \qquad (42)$$

[0353] This is proven easily, as shown below:

$$A_{rs}\,A_{st}^{-1} = \sum_{l=0}^{n-1}\sum_{l'=0}^{n-1} \frac{\lambda(l)}{\lambda(l')}\,\Phi_r^{(l)}\,\Phi_s^{(l)*}\,\Phi_s^{(l')}\,\Phi_t^{(l')*} = \sum_{l=0}^{n-1}\sum_{l'=0}^{n-1} \frac{\lambda(l)}{\lambda(l')}\,\delta_{ll'}\,\Phi_r^{(l)}\,\Phi_t^{(l')*} = \sum_{l=0}^{n-1} \Phi_r^{(l)}\,\Phi_t^{(l)*} = \sum_{l=0}^{n-1} \frac{1}{n}\,\omega^{(r-t)l} = \delta_{rt}. \qquad (43)$$
[0354] 4. Solving 1st-Order Tensor Equations
[0355] The solution of a 1st-order tensor equation
Y.sub.r=A.sub.rsX.sub.s is given by

$$A_{qr}^{-1}\,Y_r = A_{qr}^{-1}\,A_{rs}\,X_s = \delta_{qs}\,X_s = X_q, \qquad (44)$$

[0356] so that

$$X_r = A_{rs}^{-1}\,Y_s = \sum_{l=0}^{n-1} \frac{1}{\lambda(l)}\,\Phi_r^{(l)}\,\Phi_s^{(l)*}\,Y_s = \sum_{l=0}^{n-1} \Big[\frac{\Phi_s^{(l)*}\,Y_s}{\lambda(l)}\Big]\,\Phi_r^{(l)} = \sum_{l=0}^{n-1} \frac{1}{\lambda(l)}\Big[\frac{1}{n}\sum_{k=0}^{n-1} Y_k\,\omega^{-lk}\Big]\,\omega^{lr} = \mathrm{DFT}\Big[\frac{1}{\lambda(l)}\,\mathrm{DFT}^{-1}(Y_k)\Big], \qquad (45)$$

[0357] where DFT denotes the discrete Fourier transform and
DFT.sup.-1 denotes the inverse discrete Fourier transform.
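Equation 45 can be exercised with a small numerical sketch. NumPy's FFT sign and normalization conventions differ from the patent's DFT/DFT.sup.-1 pair, but the essential structure, pointwise division by the eigenvalues in the conjugate domain, is the same; the values n=8, .alpha.=1.5, .beta.=0.25 are illustrative assumptions.

```python
import numpy as np

n, alpha, beta = 8, 1.5, 0.25
a = np.zeros(n)
a[0], a[1], a[-1] = alpha, beta, beta      # Equation 15 pattern

# Circulant tensor A_rs = a_{(s - r) mod n} (Equation 33); symmetric here.
A = np.array([[a[(s - r) % n] for s in range(n)] for r in range(n)])

Y = np.random.default_rng(1).random(n)

# Eigenvalues are the DFT of the generating sequence a (Equation 36);
# the system A X = Y is solved by pointwise division in the conjugate
# domain, the analogue of Equation 45 in numpy's conventions.
lam = np.fft.fft(a)
X = np.fft.ifft(np.fft.fft(Y) / lam).real

print(np.allclose(A @ X, Y))
```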
[0358] An alternative view of the above solution method is derived
below for one dimension using standard matrix methods. A linear
transformation of a 1st-order tensor can be represented by a
matrix. For example, let A denote A.sub.rs in matrix form. If
A.sub.rs is a circulant transformation, then A is also a circulant
matrix. From matrix theory it is known that every circulant matrix
is diagonalized by the DFT matrix. If Q denotes the DFT matrix of
dimension (n.times.n), Q.sup.t the conjugate transpose of the DFT
matrix, and .LAMBDA. is defined to be the eigenmatrix of A,
then:

$$A = Q\,\Lambda\,Q^{t}. \qquad (46)$$
[0359] The solution to y=Ax is then

$$x = A^{-1}y = Q\,\Lambda^{-1}(Q^{t}y).$$
[0360] For the one-dimensional process described above, the
eigenvalues of the transformation operators are:

$$\lambda(l) = \sum_{j=0}^{n-1} a_j\,(\omega^{l})^{j} = \mathrm{DFT}(a_j), \qquad (47)$$

[0361] where a.sub.0=.alpha., a.sub.1=.beta., . . . , a.sub.n-2=0,
a.sub.n-1=.beta., and .omega..sup.n=1. Hence:

$$\lambda(l) = \alpha + \beta\,\omega^{l} + \beta\,\omega^{(n-1)l} = \alpha + \beta\big(\omega^{l} + \omega^{-l}\big). \qquad (48)$$
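Since $\omega^{l}+\omega^{-l}=2\cos(2\pi l/n)$, Equation 48 reduces to $\lambda(l)=\alpha+2\beta\cos(2\pi l/n)$, which can be verified directly against Equation 47. The values n=16, .alpha.=1.5, .beta.=0.25 below are arbitrary; because the generating sequence is real and symmetric, the DFT sign convention does not affect the result.

```python
import numpy as np

n, alpha, beta = 16, 1.5, 0.25
a = np.zeros(n)
a[0], a[1], a[-1] = alpha, beta, beta     # a_0, a_1, a_{n-1} of Equation 15

l = np.arange(n)
lam_dft = np.fft.fft(a).real                              # Equation 47
lam_cos = alpha + 2 * beta * np.cos(2 * np.pi * l / n)    # Equation 48
print(np.allclose(lam_dft, lam_cos))
```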
[0362] A direct extension of the 1st-order tensor concept to the
2nd-order tensor will be apparent to those skilled in the art. By
solving the 2nd-order tensor equations, the results are extended to
compress a 2-D image. FIG. 42 depicts three possible hexagonal tent
functions for two-dimensional image compression, with decimation
indices .tau.=2, 3, 4. The following table exemplifies the relevant
parameters for implementing the hexagonal tent functions:

  Decimation index (.tau.)         .tau. = 2           .tau. = 3                        .tau. = 4
  Compression ratio (.tau..sup.2)  4                   9                                16
  .alpha.                          a.sup.2 + 6b.sup.2  a.sup.2 + 6b.sup.2 + 12c.sup.2   a.sup.2 + 6b.sup.2 + 12c.sup.2 + 18d.sup.2
  .beta.                           b.sup.2             2(c.sup.2 + bc)                  2d.sup.2 + 2db + 4dc + c.sup.2
  gain                             a + 6b              a + 6b + 12c                     a + 6b + 12c + 18d
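The .tau.=2 column of the table can be checked by direct autocorrelation of a sampled hexagonal tent. The tent heights a and b, the grid size, and the orientation of the six neighbours (diagonal coupling along (1,-1), consistent with Equation 49) are assumptions made for illustration.

```python
import numpy as np

tau, a, b = 2, 1.0, 0.5          # assumed tent heights: centre a, ring b
N = 8 * tau

# Hexagonal tent samples for tau = 2 on a doubly periodic grid:
# centre value a with six unit-offset neighbours of value b.
F = np.zeros((N, N))
F[0, 0] = a
for d1, d2 in [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]:
    F[d1 % N, d2 % N] = b

# alpha and beta are periodic autocorrelations of the tent at lags
# (0, 0) and (tau, 0) respectively (Equation 25).
alpha = np.sum(F * F)
beta = np.sum(F * np.roll(F, (tau, 0), axis=(0, 1)))

print(np.isclose(alpha, a**2 + 6 * b**2))   # table entry, tau = 2
print(np.isclose(beta, b**2))               # table entry, tau = 2
```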
[0363] The algorithms for compressing and reconstructing a still
image are explained in the succeeding sections.
[0364] III. OVERVIEW OF CODING-RECONSTRUCTION SCHEME
[0365] A block diagram of the compression/reconstruction scheme is
shown in FIG. 43. The signal source 1002, which can have dimension
up to N, is first passed through a low-pass filter (LPF). This
low-pass filter is implemented by convolving (in a process block
1014) a chosen spline filter 1013 with the input source 1002. For
example, the normalized frequency response 1046 of a
one-dimensional linear spline is shown in FIG. 44. Referring again
to FIG. 43, it can be seen that immediately following the LPF, a
subsampling procedure is used to reduce the signal size 1016 by a
factor .tau.. The information contained in the subsampled source is
not optimized in the least-mean-square sense. Thus, an optimization
procedure is needed to obtain the best reconstruction weights. The
optimization process can be divided into three consecutive parts. A
DFT 1020 maps the non-optimized weights into the image conjugate
domain. Thereafter, an inverse eigenfilter process 1022 optimizes
the compressed data. The frequency response plots for some typical
eigenfilters and inverse eigenfilters are shown in FIG. 45 and 46.
After the inverse eigenfilter 1022, a DFT.sup.-1 process block 1024
maps its input back to the original image domain. When the
optimized weights are derived, reconstruction can proceed. The
reconstruction can be viewed as oversampling followed by a
reconstruction low-pass filter.
[0366] The embodiment of the optimized spline filter described
above may employ a DFT and DFT.sup.-1 type transform processes.
However, those skilled in the art of digital image processing will
appreciate that it is preferable to employ a Fast Fourier Transform
(FFT) and FFT.sup.-1 processes, which substantially reduce
computation overhead associated with conjugate transform
operations. Typically, such an improvement is given by the ratio of
computation steps required to transform a set of N elements:

$$\frac{\mathrm{FFT}}{\mathrm{DFT}} = \frac{(N/2)\log_2(N)}{N^2} = \frac{\log_2(N)}{2N},$$
[0367] which improves with the size of the image.
[0368] A. The Compression Method
[0369] The coding method is specified in the following steps:
[0370] 1. A suitable value of .tau. (an integer) is chosen. The
compression ratio is .tau..sup.2 for two-dimensional images.
[0371] 2. Equation 23 is applied to find $Y_{j_1j_2}$, which is the
compressed data to be transmitted or stored:

$$Y_{j_1j_2} = \sum_{t_1,t_2=-\tau}^{n_1\tau,\;n_2\tau} X(t_1,t_2)\,\psi_{j_1j_2}(t_1,t_2) = \sum_{t_1=(j_1-1)\tau}^{(j_1+1)\tau}\;\sum_{t_2=(j_2-1)\tau}^{(j_2+1)\tau} X(t_1,t_2)\,F(t_1-j_1\tau,\,t_2-j_2\tau).$$
[0372] B. The Reconstruction Method
[0373] The reconstruction method is shown below in the following
steps:
[0374] 1. Find the FFT.sup.-1 of Y.sub.j1,j2 (the compressed
data).
[0375] 2. The results of step 1 are divided by the eigenvalues
.lambda.(l,m) set forth below. The eigenvalues .lambda.(l,m) are
found by extending Equation 48 to the two-dimensional case to
obtain:

$$\lambda(l,m) = \alpha + \beta\big(\omega_1^{l} + \omega_1^{-l} + \omega_2^{m} + \omega_2^{-m} + \omega_1^{l}\omega_2^{-m} + \omega_1^{-l}\omega_2^{m}\big), \qquad (49)$$

[0376] where .omega..sub.1 is the n.sub.1-th root of unity and
.omega..sub.2 is the n.sub.2-th root of unity.
[0377] 3. The FFT of the results from step 2 is then taken. After
computing the FFT, X.sub.k.sub..sub.1.sub.k.sub..sub.2 (the
optimized weights) are obtained.
[0378] 4. The recovered or reconstructed image is:

$$S(t_1,t_2) = \sum_{k_1,k_2=0}^{n_1-1,\;n_2-1} X_{k_1k_2}\,\psi_{k_1k_2}(t_1,t_2). \qquad (50)$$
[0379] 5. Preferably, the residue is computed and retained with the
optimized weights:

$$\Delta X(t_1,t_2) = X(t_1,t_2) - S(t_1,t_2).$$
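The compression and reconstruction steps can be sketched end to end in NumPy for .tau.=2. The hexagonal tent heights (a=1, b=1/2), the grid sizes, and the use of circular FFT convolution in place of the explicit correlation sums are all illustrative assumptions; the final check exploits the least-mean-square property that correlating the reconstruction with the tents must reproduce the compressed data exactly.

```python
import numpy as np

tau, n1, n2 = 2, 16, 16           # weight grid; image is (n1*tau) x (n2*tau)
N1, N2 = n1 * tau, n2 * tau
a, b = 1.0, 0.5                   # assumed hexagonal tent heights

def hex_kernel(v0, v1, rows, cols):
    # centre value v0, six hexagonal neighbours v1, periodic placement
    K = np.zeros((rows, cols))
    K[0, 0] = v0
    for d1, d2 in [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]:
        K[d1 % rows, d2 % cols] = v1
    return K

def cconv2(u, v):
    # circular convolution; the kernels here are symmetric, so this
    # equals the correlations of Equations 23 and 50
    return np.fft.ifft2(np.fft.fft2(u) * np.fft.fft2(v)).real

F = hex_kernel(a, b, N1, N2)                  # interpolation tent F(m1, m2)
X_img = np.random.default_rng(2).random((N1, N2))

# Compression (Equation 24): correlate with the tent, decimate by tau.
Y = cconv2(X_img, F)[::tau, ::tau]

# Optimization (Equations 45 and 49): divide by the eigenvalues
# lambda(l, m), obtained here as the 2-D FFT of the kernel a_{s1 s2}
# with alpha = a^2 + 6 b^2 and beta = b^2 (table values for tau = 2).
A_ker = hex_kernel(a * a + 6 * b * b, b * b, n1, n2)
lam = np.fft.fft2(A_ker)
W = np.fft.ifft2(np.fft.fft2(Y) / lam).real   # optimized weights

# Reconstruction (Equation 50): oversample the weights, interpolate.
up = np.zeros((N1, N2))
up[::tau, ::tau] = W
S = cconv2(up, F)

# LMS check: correlating the reconstruction with the tents returns Y.
print(np.allclose(cconv2(S, F)[::tau, ::tau], Y))

residue = X_img - S               # step 5: Delta X
```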
[0380] Although the optimizing procedure outlined above appears to
be associated with an image reconstruction process, it may be
implemented at any stage between the aforementioned compression and
reconstruction. It is preferable to implement the optimizing
process immediately after the initial compression so as to minimize
the residual image. The preferred order has an advantage with
regard to storage, transmission and the incorporation of subsequent
image processes.
[0381] C. Response Considerations
[0382] The inverse eigenfilter in the conjugate domain is described
as follows:

$$H(i,j) = \frac{1}{\lambda(i,j)}, \qquad (51)$$
[0383] where .lambda.(i,j) can be considered as an estimation of
the frequency response of the combined decimation and interpolation
filters. The optimization process H(i,j) attempts to "undo" what is
done in the combined decimation/interpolation process. Thus, H(i,j)
tends to restore the original signal bandwidth. For example, for
.tau.=2, the decimation/interpolation combination is described as
having an impulse response resembling that of the following
3.times.3 kernel:

$$R = \begin{pmatrix} 0 & \beta & \beta\\ \beta & \alpha & \beta\\ \beta & \beta & 0\end{pmatrix}. \qquad (52)$$

[0384] Then, its conjugate domain counterpart,
$\lambda(i,j)\big|_{\alpha,\beta,N}$, will be

$$\lambda(i,j)\Big|_{\alpha,\beta,N} = \alpha + 2\beta\Big[\cos\Big(\frac{2\pi i}{N}\Big) + \cos\Big(\frac{2\pi j}{N}\Big) + \cos\Big(\frac{2\pi(i-j)}{N}\Big)\Big], \qquad (53)$$
[0385] where i,j are frequency indices and N represents the number
of frequency terms. Hence, the implementation accomplished in the
image conjugate domain is the conjugate equivalent of the inverse
of the above 3.times.3 kernel. This relationship will be utilized
more explicitly for the embodiment disclosed in Section V.
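The correspondence between the 3.times.3 kernel and Equation 53 can be confirmed numerically. The values of .alpha., .beta., and N, and the hexagonal orientation of the kernel (corner zeros on the main diagonal), are illustrative assumptions.

```python
import numpy as np

N, alpha, beta = 8, 2.5, 0.25     # assumed values for illustration

# Embed the 3x3 kernel R of Equation 52 on an N x N periodic grid:
# alpha at the origin, beta at the six hexagonal neighbour offsets.
K = np.zeros((N, N))
K[0, 0] = alpha
for d1, d2 in [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]:
    K[d1 % N, d2 % N] = beta

lam_fft = np.fft.fft2(K).real     # conjugate domain counterpart of R

i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
lam_53 = alpha + 2 * beta * (np.cos(2 * np.pi * i / N)
                             + np.cos(2 * np.pi * j / N)
                             + np.cos(2 * np.pi * (i - j) / N))  # Eq. 53
print(np.allclose(lam_fft, lam_53))
```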
[0386] IV. NUMERICAL SIMULATIONS
[0387] A. One-Dimensional Case
[0388] For a one-dimensional implementation, two types of signals
are demonstrated. A first test is a cosine signal which is useful
for observing the relationship between the standard error, the size
of .tau. and the signal frequency. The standard error is defined
herein to be the square root of the average squared residue:

$$\Big[\frac{1}{N}\sum_t \big(X(t)-S(t)\big)^2\Big]^{1/2}.$$
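A minimal implementation of this standard error, assuming (as in step 5 of the reconstruction method) that the residue is taken between the original signal X and its reconstruction S:

```python
import math

def standard_error(x, s):
    # square root of the average squared residue Delta X = X - S
    return math.sqrt(sum((xi - si) ** 2 for xi, si in zip(x, s)) / len(x))

print(standard_error([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))  # 0.0
print(standard_error([1.0, 2.0], [0.0, 2.0]))  # sqrt(1/2)
```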
[0389] A second one-dimensional signal is taken from one line of a
grey-scale still image, which is considered to be realistic data
for practical image compression.
[0390] FIG. 47 shows the plots of standard error versus frequency
of the cosine signal for different degrees of decimation .tau.
1056. The general trend is that as the input signal frequency
becomes higher, the standard error increases. In the low frequency
range, smaller values of .tau. yield a better performance. One
abnormal phenomenon exists for the .tau.=2 case and a normalized
input frequency of 0.25. For this particular situation, the linear
spline and the cosine signal at discrete grid points can match
perfectly so that the standard error is substantially equal to
0.
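The zero-error case can be reproduced with a one-dimensional sketch of the compress/optimize/reconstruct chain; the grid size and the FFT-based solution of the normal equations are assumptions following the conventions of the earlier examples, with .alpha.=1.5 and .beta.=0.25 for the linear tent at .tau.=2.

```python
import numpy as np

tau, n = 2, 16
N = n * tau
t = np.arange(N)
X = np.cos(2 * np.pi * 0.25 * t)     # normalized input frequency 0.25

F = np.zeros(N)
F[0], F[1], F[-1] = 1.0, 0.5, 0.5    # linear tent spline, tau = 2

def cconv(u, v):
    # circular convolution (equals correlation for the symmetric tent)
    return np.fft.ifft(np.fft.fft(u) * np.fft.fft(v)).real

Y = cconv(X, F)[::tau]               # compress (Equation 12)

a_ker = np.zeros(n)
a_ker[0], a_ker[1], a_ker[-1] = 1.5, 0.25, 0.25           # alpha, beta
W = np.fft.ifft(np.fft.fft(Y) / np.fft.fft(a_ker)).real   # optimize (Eq. 45)

up = np.zeros(N)
up[::tau] = W
S = cconv(up, F)                     # reconstruct

err = np.sqrt(np.mean((X - S) ** 2))
print(err < 1e-9)                    # the tent matches the cosine exactly
```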
[0391] Another test example comes from one line of realistic still
image data. FIG. 48a and 48b show the reconstructed signal waveform
1060 for .tau.=2 and .tau.=4, respectively, superimposed on the
original image data 1058. FIG. 48a shows a good quality of
reconstruction for .tau.=2. For .tau.=4, in FIG. 48b, some of the
high frequency components are lost due to the combined
decimation/interpolation procedure. FIG. 48c presents the error
plot 1062 for this particular test example. It will be appreciated
that the non-linear error accumulation versus decimation parameter
.tau. may be exploited to minimize the combination of optimized
weights and image residue.
[0392] B. Two-Dimensional Case
[0393] For the two-dimensional case, realistic still image data are
used as the test. FIG. 49 and 50 show the original and
reconstructed images for .tau.=2 and .tau.=4. For .tau.=2, the
reconstructed image 1066, 1072 is substantially similar to the
original. However, for .tau.=4, there are zig-zag patterns along
specific edges in the images, because the interpolation tracks the
high-frequency components less accurately.
As described earlier, substantially complete reconstruction is
achieved by retaining the minimized residue .DELTA.X and adding it
back to the approximated image. In the next section, several
methods are proposed for implementing this process. FIG. 51 shows
the error plots as functions of .tau. for both images.
[0394] An additional aspect of interest is to look at the optimized
weights directly. When these optimal weights are viewed in picture
form, high-quality miniatures 1080, 1082 of the original image are
obtained, as shown in FIG. 52. Hence, the present embodiment is a
very powerful and accurate method for creating a "thumbnail"
reproduction of the original image.
[0395] V. ALTERNATIVE EMBODIMENTS
[0396] Video compression is a major component of high-definition
television (HDTV). According to the present invention, video
compression is formulated as an equivalent three-dimensional
approximation problem, and is amenable to the technique of optimum
linear or, more generally, hyperplanar spline interpolation. The
main advantages of this approach are seen in its fast speed in
coding/reconstruction, its suitability in a VLSI hardware
implementation, and a variable compression ratio. A principal
advantage of the present invention is the versatility with which it
is incorporated into other compression systems. The invention can
serve as a "front-end" compression platform from which other signal
processes are applied. Moreover, the invention can be applied
iteratively, in multiple dimensions and in either the image or
image conjugate domain. The optimizing method can, for example, be
applied to a compressed image and further applied to a corresponding
compressed residual image. Due to the inherent low-pass filtering
nature of the interpolation process, some edges and other
high-frequency features may not be preserved in the reconstructed
images, although they are retained through the residue. To address
this problem, the following procedures are set forth:
[0397] Procedure (a)
[0398] Since the theoretical formulation, derivation, and
implementation of the disclosed compression method do not depend
strongly on the choice of the interpolation kernel function, other
kernel functions can be applied and their performances compared. So
far, due to its simplicity and excellent performance, only the
linear spline function has been applied. Higher-order splines, such
as the quadratic or cubic spline, could also be employed. Aside
from the polynomial spline functions, other more complicated
functional forms can be used.
[0399] Procedure (b)
[0400] Another way to improve the compression method is to apply
certain adaptive techniques. FIG. 53 illustrates such an adaptive
scheme. For a 2-D image 1002, the whole image can be divided into
subimages of smaller size 1084. Since different subimages have
different local features and statistics, different compression
schemes can be applied to these different subimages. An error
criterion is evaluated in a process step 1086. If the error is
below a certain threshold determined in a process step 1088, a
higher compression ratio is chosen for that subimage. If the error
goes above this threshold, then a lower compression ratio is chosen
in a step 1092 for that subimage. Both multi-kernel functions 1090
and multi-local-compression ratios provide good adaptive
modification.
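A hedged sketch of the adaptive scheme of FIG. 53 follows. The block size, the error threshold, the candidate decimation indices, and the crude decimate-and-hold error estimate are all stand-in assumptions, since the text leaves the error criterion and compression primitives open.

```python
import numpy as np

def block_error(block, tau):
    # crude stand-in for compress/reconstruct: decimate, then hold-interpolate
    approx = np.repeat(np.repeat(block[::tau, ::tau], tau, axis=0),
                       tau, axis=1)
    return np.sqrt(np.mean((block - approx) ** 2))

def choose_ratios(image, size=8, taus=(4, 2), threshold=0.05):
    # FIG. 53 scheme: for each subimage, keep the highest compression
    # ratio whose error stays below the threshold; otherwise fall back
    # to the lowest candidate decimation index.
    ratios = {}
    for i in range(0, image.shape[0], size):
        for j in range(0, image.shape[1], size):
            blk = image[i:i + size, j:j + size]
            tau = next((t for t in taus if block_error(blk, t) < threshold),
                       taus[-1])
            ratios[(i, j)] = tau
    return ratios

img = np.triu(np.ones((16, 16)))   # flat regions plus one diagonal edge
ratios = choose_ratios(img)
print(ratios)                      # flat subimages get tau=4, edges tau=2
```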
[0401] Procedure (c)
[0402] Subband coding techniques have been widely used in digital
speech coding. Recently, subband coding has also been applied to
digital image data compression. The basic approach of subband coding is to
split the signal into a set of frequency bands, and then to
compress each subband with an efficient compression algorithm which
matches the statistics of that band. The subband coding techniques
divide the whole frequency band into smaller frequency subbands.
Then, when these subbands are demodulated into the baseband, the
resulting equivalent bandwidths are greatly reduced. Since the
subbands have only low frequency components, one can use the above
described, linear or planar spline, data compression technique for
coding these data. A 16-band filter compression system is shown in
FIG. 54, and the corresponding reconstruction system in FIG. 55.
There are, of course, many ways to implement this filter bank, as
will be appreciated by those skilled in the art. For example, a
common method is to exploit the Quadrature Mirror Filter
structure.
[0403] VI. IMAGE DOMAIN IMPLEMENTATION
[0404] The embodiments described earlier utilize a spline filter
optimization process in the image conjugate domain using an FFT
processor or equivalent thereof. The present invention also
provides an equivalent image domain implementation of a spline
filter optimization process which presents distinct advantages with
regard to speed, memory and process application.
[0405] Referring back to Equation 45, it will be appreciated that
the transform processes DFT and DFT.sup.-1 may be subsumed into an
equivalent image domain convolution, shown here briefly:

$$X_j = \mathrm{DFT}\Big[\frac{1}{\lambda_m}\,\mathrm{DFT}^{-1}(Y_k)\Big] = \mathrm{DFT}\Big[\mathrm{DFT}^{-1}\Big[\mathrm{DFT}\Big(\frac{1}{\lambda_m}\Big)\Big]\,\mathrm{DFT}^{-1}(Y_k)\Big] \qquad (54)$$

[0406] If $\Omega = \mathrm{DFT}(1/\lambda_m)$, then:

$$X_j = \mathrm{DFT}\big[\mathrm{DFT}^{-1}(\Omega)\,\mathrm{DFT}^{-1}(Y_k)\big] = \Omega * Y_k.$$
[0407] Furthermore, with .lambda..sub.m=DFT(a.sub.j), the
optimization process may be completely carried over to an image
domain implementation knowing only the form of the input spline
filter function. The transform processes can be performed in
advance to generate the image domain equivalent of the inverse
eigenfilter. As shown in FIG. 57, the image domain spline optimizer
.OMEGA. operates on compressed image data Y' generated by a first
convolution process 1014 followed by a decimation process 1016, as
previously described. Off-line or perhaps adaptively, the tensor
transformation A (as shown for example in Equation 25 above) is
supplied to an FFT type processor 1032, which computes the
transformation eigenvalues .lambda.. The tensor of eigenvalues is
then inverted at process block 1034, followed by FFT.sup.-1 process
block 1036, generating the image domain tensor .OMEGA.. The tensor
.OMEGA. is supplied to a second convolution process 1038, whereupon
.OMEGA. is convolved with the non-optimized compressed image data
Y' to yield optimized compressed image data Y".
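The equivalence of the conjugate domain and image domain routes can be sketched as follows; the kernel values (hexagonal tent, .tau.=2) and grid size are illustrative assumptions.

```python
import numpy as np

n, a, b = 16, 1.0, 0.5
alpha, beta = a * a + 6 * b * b, b * b     # assumed tau = 2 values

A_ker = np.zeros((n, n))
A_ker[0, 0] = alpha
for d1, d2 in [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]:
    A_ker[d1 % n, d2 % n] = beta

# Precomputed image domain optimizer Omega (FIG. 57, blocks 1032-1036):
# eigenvalues of A, inverted, transformed back to the image domain.
Omega = np.fft.ifft2(1.0 / np.fft.fft2(A_ker)).real

Y = np.random.default_rng(3).random((n, n))   # non-optimized weights Y'

# Conjugate domain route: FFT, inverse eigenfilter, inverse FFT.
opt_conj = np.fft.ifft2(np.fft.fft2(Y) / np.fft.fft2(A_ker)).real
# Image domain route: a single circular convolution with Omega.
opt_img = np.fft.ifft2(np.fft.fft2(Y) * np.fft.fft2(Omega)).real

print(np.allclose(opt_conj, opt_img))
```

In practice .OMEGA. decays rapidly away from the origin, which is what permits truncating it to a small convolution kernel.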
[0408] In practice, there is a compromise between accuracy and
economy with regard to the specific form of .OMEGA.. The optimizer
tensor .OMEGA. should be of sufficient size for adequate
approximation of:

$$\mathrm{DFT}^{-1}\Big(\frac{1}{\mathrm{DFT}(A)}\Big).$$
[0409] On the other hand, the term .OMEGA. should be small enough
to be computationally tractable for the online convolution process
1038. It has been found that two-dimensional image compression
using the preferred hexagonal tent spline is adequately optimized
by a 5.times.5 matrix, and preferably a 7.times.7 matrix, for
example, with the following form:

$$\Omega = \begin{bmatrix} 0 & h & -g & g & e & e & g\\ h & f & e & d & c & d & e\\ -g & e & c & b & b & c & e\\ g & d & b & a & b & d & g\\ e & c & b & b & c & e & -g\\ e & d & c & d & e & f & h\\ g & e & e & g & -g & h & 0\end{bmatrix}.$$
[0410] Additionally, to reduce computational overhead, the smallest
elements (i.e., the elements near the perimeter) such as f, g, and
h may be set to zero with little noticeable effect in the
reconstruction.
[0411] The principal advantages of the present preferred embodiment
lie in the computational savings above and beyond those of the
previously described conjugate domain inverse eigenfilter process (FIG. 38,
1018). For example, a two-dimensional FFT process may typically
require about N.sup.2log.sub.2N complex operations or equivalently
6N.sup.2log.sub.2N multiplications. The total number of image
conjugate filter operations is of order 10N.sup.2log.sub.2N. On the
other hand, the presently described (7.times.7) kernel with 5
distinct operations per image element will require only 5N.sup.2
operations, lower by an important factor of log.sub.2N. Hence, even
for reasonably small images, there is significant improvement in
computation time.
[0412] Additionally, there is substantial reduction in buffer
demands because the image domain process 1038 requires only a
7.times.7 image block at a given time, in contrast to the conjugate
process which requires a full-frame buffer before processing. In
addition to the lower demands on computation with the image domain
process 1038, there is virtually no latency in transmission as the
process is done in pipeline. Finally, the "power of 2" constraints
desirable for efficient FFT processing are eliminated, allowing
convenient application to a wider range of image dimensions.
[0413] The above detailed description is intended to be exemplary
and not limiting. From this detailed description, taken in
conjunction with the appended drawings, the advantages of the
present invention will be readily understood by one who is skilled
in the relevant technology. The present apparatus and method
provide a unique encoder, compressed file format, and decoder which
compress images and decode compressed images. The unique
compression system increases the compression ratios for comparable
image quality while achieving relatively quick encoding and
decoding times, optimizes the encoding process to accommodate
different image types, selectively applies particular encoding
methods for a particular image type, layers the image quality
components in the compressed image, and generates a file format
that allows the addition of other compressed data information.
[0414] While the above detailed description has shown, described
and pointed out the fundamental novel features of the invention as
applied to various embodiments, it will be understood that various
omissions and substitutions and changes in the form and details of
the illustrated device may be made by those skilled in the art,
without departing from the spirit of the invention.
* * * * *