U.S. patent number RE32,580 [Application Number 06/909,319] was granted by the patent office on 1988-01-19 for digital speech coder.
This patent grant is currently assigned to American Telephone and Telegraph Company, AT&T Bell Laboratories. Invention is credited to Bishnu S. Atal, Joel R. Remde.
United States Patent RE32,580
Atal et al.
January 19, 1988
Digital speech coder
Abstract
An improved speech analysis and synthesis system wherein LPC
parameters and a modified residual signal for excitation are
transmitted: the excitation signal is the cross correlation of the
residual signal and the LPC-recreated original signal.
Inventors: Atal; Bishnu S. (New Providence, NJ), Remde; Joel R. (Elizabeth, NJ)
Assignee: American Telephone and Telegraph Company, AT&T Bell Laboratories (Murray Hill, NJ)
Family ID: 26985378
Appl. No.: 06/909,319
Filed: September 18, 1986
Related U.S. Patent Documents
Reissue of: Application Number 326371, filed Dec 1, 1981; Patent Number 04472832, issued Sep 18, 1984
Current U.S. Class: 704/219; 704/218
Current CPC Class: G10L 19/10 (20130101); G10L 19/08 (20130101)
Current International Class: G10L 005/00
Field of Search: 381/29-40, 49; 364/513.5
Primary Examiner: Kemeny; E. S. Matt
Attorney, Agent or Firm: Cubert; Jack S.
Claims
What is claimed is:
1. A method for processing a sequential pattern comprising the
steps of: partitioning said sequential pattern into successive time
intervals; generating a set of signals representative of the
sequential pattern of each time interval responsive to said time
interval sequential pattern; generating a signal corresponding to
the differences between said interval sequential pattern and the
interval representative signal set responsive to said interval
sequential pattern and said interval representative signals;
forming a first signal corresponding to the interval pattern
responsive to said interval pattern representative signals and said
interval differences representative signal; generating a second
interval corresponding signal responsive to said interval pattern
representative signals; generating a signal corresponding to the
differences between said first and second interval corresponding
signals; producing a third signal responsive to said interval
differences corresponding signal for altering said second signal to
reduce the interval differences corresponding signal; and utilizing
said third signal to construct a replica of said interval
sequential pattern.
2. A method for processing a speech pattern comprising the steps
of: partitioning the speech pattern into successive time intervals;
generating a set of signals representative of said speech pattern
of each time interval responsive to said interval speech pattern;
generating a signal representative of the differences between said
interval speech pattern and the interval speech pattern
representative signal set responsive to said interval speech
pattern and said interval speech pattern representative signals;
forming a first signal corresponding to the interval speech pattern
responsive to said interval speech pattern representative signals
and the interval differences representative signal; forming a
second interval corresponding signal responsive to the interval
speech pattern representative signals; generating a signal
corresponding to the differences between said first and second
interval corresponding signals; and producing a third signal
responsive to said interval differences corresponding signal for
altering said second signal to reduce the interval differences
corresponding signal.
3. A method for processing a speech pattern according to claim 2
wherein: said interval representative signal set generating step
comprises generating a set of speech parameter signals
representative of said interval speech pattern; said first interval
corresponding signal forming step comprises generating said first
interval corresponding signal responsive to said speech parameter
signals and said differences representative signal; and said second
interval corresponding signal forming step comprises generating
said second interval corresponding signal responsive to said
interval speech parameter signals.
4. A method for processing a speech pattern according to claim 3
wherein said speech parameter signal generating step comprises
generating a set of signals representative of the interval speech
spectrum.
5. A method for processing a speech pattern according to claim 4
wherein: said third signal producing step comprises generating a
coded signal having at least one element responsive to the interval
differences corresponding signal; and modifying said second
interval corresponding signal responsive to said coded signal
element.
6. A method for processing a speech pattern according to claim 5
wherein: said coded signal generating step comprises generating,
for a predetermined number of times, a coded signal element
responsive to said interval differences corresponding signal; and
modifying said second interval corresponding signal responsive to
said generated coded signal elements.
7. A method for processing a speech pattern according to claim 6
wherein: said differences corresponding signal generating step
comprises generating a signal representative of the correlation
between said first interval corresponding and second interval
corresponding signals.
8. A method for processing a speech pattern according to claim 5
wherein said differences corresponding signal generating step
comprises generating a signal representative of the mean squared
difference between said first and second interval corresponding
signals.
9. A method for processing a speech pattern according to claims 2,
3, or 4 further comprising the step of utilizing said third signal
to construct a replica of said interval speech pattern.
10. A sequential pattern processor comprising means for
partitioning a sequential pattern into successive time intervals;
means responsive to each time interval sequential pattern for
generating a set of signals representative of the sequential
pattern of said time interval; means responsive to said interval
sequential pattern and said interval representative signals for
generating a signal representative of the differences between said
interval sequential pattern and the interval representative signal
set; means responsive to said interval pattern representative
signals and said differences representative signal for forming a
first signal corresponding to the interval pattern; means
responsive to said interval pattern representative signals for
generating a second interval corresponding signal; means for
generating a signal corresponding to the differences between said
first and second interval corresponding signals; and means
responsive to said interval differences corresponding signal for
producing a third signal for altering said second signal to reduce
the interval differences corresponding signal; and means for
utilizing said third signal to construct a replica of said interval
sequential pattern.
11. A speech processor comprising means for partitioning a speech
pattern into successive time intervals; means responsive to each
interval speech pattern for generating a set of signals
representative of the speech pattern of said time interval; means
responsive to said interval speech pattern and said interval speech
pattern representative signals for generating a signal
representative of the differences between said interval speech
pattern and the interval representative signal set; means
responsive to said speech interval signals and said interval
differences representative signal for forming a first signal
corresponding to the interval speech pattern; means responsive to
said interval speech pattern representative signals for forming a
second interval corresponding signal; means for generating a signal
corresponding to the differences between said first and second
interval corresponding signals; and means responsive to said
interval differences corresponding signal for producing a third
signal for altering said second interval corresponding signal to
reduce the interval differences corresponding signal.
12. A speech processor according to claim 11 wherein: said speech
interval representative signal set generating means comprises means
for generating a set of signals representative of prescribed speech
parameters of said interval speech pattern; said first interval
corresponding signal forming means comprises means responsive to
said interval prescribed speech parameter signals and said
differences representative signal for generating said first
interval corresponding signal; said second interval corresponding
signal forming means comprises means responsive to said interval
prescribed speech parameter signals for generating the second
interval corresponding signal.
13. A speech processor according to claim 12 wherein said
prescribed speech parameter signal generating means comprises means
for generating a set of signals representative of the interval
speech pattern spectrum.
14. A speech processor according to claim 13 wherein: said third
signal producing means comprises means responsive to said interval
differences corresponding signal for generating a coded signal
having at least one element; and means responsive to said coded
signal elements for modifying said second interval corresponding
signal.
15. A speech processor according to claim 14 wherein: said coded
signal generating means comprises means operative N times to
produce an N element coded signal including means responsive to
said differences corresponding signal for generating coded signal
elements; and means responsive to the generated coded signal
elements for modifying said second interval corresponding
signal.
16. A speech processor according to claim 15 wherein: said interval
differences corresponding signal generating means comprises means
for generating a signal representative of the correlation between
said first and second interval corresponding signals.
17. A speech processor according to claim 15 wherein said interval
differences corresponding signal generating means comprises means
for generating a signal representative of the mean squared
difference between said first and second interval corresponding
signals.
18. A speech processor according to claims 11, 12, or 13 further
comprising the step of utilizing said third signal to construct a
replica of said interval speech pattern.
19. A method for encoding a speech pattern comprising the steps of:
partitioning a speech pattern into successive time frames;
generating for each frame a set of speech parameter signals
responsive to the frame speech pattern; generating a signal
representative of the differences between the frame speech pattern
and said speech parameter signal set responsive to said frame
speech pattern and said frame speech parameter signals; generating
a first signal corresponding to the frame speech pattern responsive
to said frame speech parameter signals and said differences
representative signal; generating a second frame corresponding
signal responsive to said frame speech parameter signals;
generating a signal corresponding to the differences between said
first and second interval corresponding signals; and producing a
coded signal responsive to said interval differences corresponding
signal for modifying said second interval corresponding signal to
reduce said interval differences corresponding signal.
20. A method for encoding a speech signal according to claim 19
further comprising combining said produced coded signal and said
speech parameter signals to form a coded signal representative of
the frame speech pattern.
21. A method for encoding a speech signal according to claim 19
wherein said speech parameter signal set generation comprises
generating a set of linear predictive parameter signals for the
frame responsive to said frame speech pattern; and said differences
representative signal generation comprises generating a predictive
residual signal responsive to said frame linear prediction
parameter signals and said frame speech pattern.
22. A method for encoding a speech signal according to claim 21
wherein said coded signal producing step comprises generating a
coded signal having at least one element responsive to said
differences corresponding signal; and modifying said frame second
signal responsive to said coded signal elements.
23. A method for encoding a speech pattern according to claim 21
wherein said signal producing step comprises generating a
multielement coded signal by successively generating a coded signal
element responsive to said differences corresponding signal and
modifying said second signal responsive to said coded signal
elements.
24. Apparatus for encoding a speech pattern comprising means for
partitioning a speech pattern into successive time frames; means
responsive to the frame speech pattern for generating for each
frame a set of speech parameter signals; means responsive to said
frame speech parameter signals and said frame speech pattern for
generating a signal representative of the differences between said
frame speech pattern and said frame speech parameter signal set;
means responsive to said frame speech parameter signals and said
differences representative signal for generating a first signal
corresponding to said frame speech pattern; means responsive to
said frame speech parameter signals for generating a second frame
corresponding signal; means for generating a signal corresponding
to the differences between said first and second frame
corresponding signals; and means responsive to said frame
differences corresponding signal for producing a third signal to
modify said second signal to reduce the frame differences
corresponding signal.
25. Apparatus for encoding a speech pattern according to claim 24
further comprising means for combining said produced coded signal
and said speech parameter signals to form a coded signal
representative of the frame speech pattern.
26. Apparatus for encoding a speech pattern according to claim 24
wherein said speech parameter signal generating means comprises
means responsive to said frame speech pattern for generating a set
of linear predictive parameter signals for the frame; said
differences representative signal generating means comprises means
responsive to said frame linear prediction parameter signals and
said frame speech pattern for generating a frame predictive
residual signal; said first signal generating means comprises means
responsive to said frame predictive parameter signals and said
frame predictive residual signal for forming said first frame
corresponding signal; and said second signal generating means
comprises means responsive to said frame linear predictive
parameter signals for forming said second frame corresponding
signal.
27. Apparatus for encoding a speech pattern according to claim 26
wherein said coded signal producing means comprises means
responsive to said difference corresponding signal for generating a
coded signal having at least one element; and means responsive to
said coded signal element for modifying said second signal.
28. Apparatus for encoding a speech pattern according to claim 26
wherein said coded signal producing means comprises means for
generating a multielement coded signal including means operative
successively for generating a coded signal element responsive to
said differences corresponding signal and for modifying said second
signal responsive to said coded signal elements.
29. A speech processor comprising means for partitioning a speech
pattern into successive time frames; means responsive to the speech
pattern of each frame for producing a set of predictive parameter
signals and a predictive residual signal; means responsive to said
frame predictive parameter and predictive residual signals for
generating a first signal corresponding to the frame speech
pattern; means responsive to said frame predictive parameter
signals for generating a second frame corresponding signal; means
responsive to said first and second frame corresponding signals for
producing a signal corresponding to the differences between said
first and second frame corresponding signals; means responsive to
said frame differences corresponding signal for generating a coded
excitation signal and for applying said coded excitation signal to
said second signal generating means to reduce the differences
corresponding signal.
30. A speech processor according to claim 29 further comprising
means responsive to said frame coded excitation signal and said
frame predictive parameter signals for constructing a replica of
said frame speech pattern.
31. A speech processor according to claim 29 or claim 30 wherein
said coded excitation signal generating means comprises means
operative successively to form a multielement coded signal
comprising means responsive to the differences corresponding signal
for forming an element of said multielement code and for modifying
said second signal responsive to said coded signal elements.
32. A method for processing a speech pattern according to claim 5,
6, 7, or 8 further comprising the step of utilizing said coded
signal to construct a replica of said interval speech pattern.
33. A speech processor according to claim 14, 15, 16, or 17 further
comprising means for utilizing said coded signal to construct a
replica of said interval speech pattern.
34. A speech processor for producing a speech message comprising:
means for receiving a sequence of speech message time interval
signals, each speech interval signal including a plurality of
spectral representative signals and an excitation representative
signal for said time interval; means jointly responsive to said
interval spectral representative signals and said interval
excitation representative signal for generating a speech pattern
corresponding to the speech message; said interval excitation
speech signal being formed by the steps of: partitioning a speech
message pattern into successive time intervals; generating a set of
signals representative of said speech message pattern for each time
interval responsive to said interval speech pattern; generating a
signal representative of the differences between said interval
speech pattern and said representative signal set responsive to
said interval speech pattern and said interval representative
signals; forming a first signal corresponding to the interval
speech message pattern responsive to said speech message pattern
interval representative signals and differences representative
signal; forming a second interval corresponding signal responsive
to said interval speech message pattern representative signals;
generating a signal corresponding to the differences between said
first and second interval corresponding signals; and producing a
third signal responsive to said interval differences corresponding
signal for altering said second interval corresponding signal to
reduce the interval differences corresponding signal, said third
signal being said interval excitation representative signal.
35. A speech processor according to claim 34 wherein said interval
differences corresponding signal generating step comprises
generating a signal representative of the correlation between said
first interval corresponding signal and said second interval
corresponding signal and said third signal producing step comprises
forming a coded signal responsive to said correlation
representative signal.
36. A speech processor according to claim 34 or 35 wherein said
speech message interval spectral representative signals are time
interval predictive parameter signals.
37. A method for producing a speech message comprising the steps
of: receiving a sequence of speech message interval signals, each
speech interval signal including a plurality of spectral
representative signals and an excitation representative signal; and
generating a speech pattern corresponding to the speech message
jointly responsive to said interval spectral representative signals
and said interval excitation representative signals; said interval
excitation speech signal being formed by the steps of: partitioning
a speech pattern into successive time intervals; generating a set
of signals representative of the spectrum of said speech pattern
for each time interval responsive to said interval speech pattern;
generating a signal representative of the differences between said
interval speech pattern and said interval speech pattern spectral
representative signal set responsive to said interval speech
pattern and said spectral representative signals; forming a first
signal corresponding to the interval speech pattern responsive to
said interval spectral representative signals and said differences
representative signal; forming a second interval corresponding
signal responsive to said speech pattern interval spectral
representative signals; generating a signal corresponding to the
differences between said first and second interval corresponding
signals; and producing a third signal responsive to said interval
differences corresponding signal for altering said second interval
corresponding signal to reduce the interval differences
corresponding signal, said third signal being said interval
excitation signal.
38. A method for producing a speech message according to claim 37
wherein said interval differences corresponding signal generating
step comprises generating a signal representative of the
correlation between said first signal and said second signal and
said third signal producing step comprises forming a prescribed
format signal responsive to said correlation representative
signal.
39. A method for producing a speech message according to claim 37
or 38 wherein said speech interval spectral representative signals
are speech interval predictive parameter signals.
.Iadd.40. Apparatus for producing a speech message comprising:
means for receiving a sequence of speech message signals for the
successive time intervals of the speech message, each time interval
speech message signal including a set of coded spectral
representative signals for the time interval portion of said speech
message and a plurality of pulse amplitude and location coded
signals representative of the differences between the time interval
portion of the speech message and the time interval portion of the
speech message formed from said spectral representative
signals;
means for converting the plurality of pulse amplitude and location
codes of said time interval into a signal representative of the
excitation of the time interval portion of said speech message;
and
means jointly responsive to said interval spectral representative
signals and said interval excitation representative signal for
generating a speech pattern corresponding to the speech message of
said time interval. .Iaddend.
.Iadd.41. Apparatus for producing a
speech message according to claim 40 wherein said converting means
comprises means responsive to said amplitude and location codes for
forming a sequence of pulses within said time interval
representative of the excitation of the speech message
portion of said time interval. .Iaddend.
.Iadd.42. A method for
producing a speech message comprising the steps of:
receiving a sequence of speech message signals for the successive
time interval portions of the speech message, each time interval
speech message signal including a set of coded spectral
representative signals for the time interval portion of said speech
message and a plurality of pulse amplitude and location coded
signals representative of the differences between the time interval
portion of the speech message and the time interval portion of the
speech message formed from said spectral representative
signals;
converting the plurality of pulse amplitude and location codes of
said time interval into a signal representative of the excitation
of the time interval portion of said speech message; and
generating a speech pattern corresponding to the speech message of
said time interval jointly responsive to said interval spectral
representative signals and said interval excitation representative
signal. .Iaddend.
.Iadd.43. A method for producing a speech message
according to claim 42 wherein said converting step comprises
forming a sequence of pulses within said time interval
representative of the excitation of the speech message portion of
said time interval responsive to said amplitude and location codes.
.Iaddend.
Description
Our invention relates to speech processing and more particularly to
digital speech coding arrangements.
Digital speech communication systems including voice storage and
voice response facilities utilize signal compression to reduce the
bit rate needed for storage and/or transmission. As is well known
in the art, a speech pattern contains redundancies that are not
essential to its apparent quality. Removal of redundant components
of the speech pattern significantly lowers the number of digital
codes required to construct a replica of the speech. The subjective
quality of the speech replica, however, is dependent on the
compression and coding techniques.
One well-known digital speech coding system, such as that disclosed in
U.S. Pat. No. 3,624,302 issued Nov. 30, 1971, includes linear
prediction analysis of an input speech signal. The speech signal is
partitioned into successive intervals and a set of parameters
representative of the interval speech is generated. The parameter
set includes linear prediction coefficient signals representative
of the spectral envelope of the speech in the interval, and pitch
and voicing signals corresponding to the speech excitation. These
parameter signals may be encoded at a much lower bit rate than the
speech signal waveform itself. A replica of the input speech signal
is formed from the parameter signal codes by synthesis. The
synthesizer arrangement generally comprises a model of the vocal
tract in which the excitation pulses are modified by the spectral
envelope-representative prediction coefficients in an all-pole
predictive filter.
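The pitch-excited synthesis just described can be sketched as follows. This is an illustration only, not the arrangement of U.S. Pat. No. 3,624,302: the function names, the sign convention $A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}$, and the predictor values in the example are assumptions.

```python
def all_pole_synthesis(excitation, a, memory=None):
    """Filter `excitation` through the all-pole filter 1/A(z).

    Assumed convention: A(z) = 1 - sum_{k=1}^{p} a_k z^{-k}, so every
    output sample is y(n) = x(n) + sum_k a_k * y(n - k).
    `memory` holds the p most recent past outputs, newest first
    (all zeros if omitted).
    """
    p = len(a)
    hist = list(memory) if memory is not None else [0.0] * p
    out = []
    for x in excitation:
        y = x + sum(a[k] * hist[k] for k in range(p))
        out.append(y)
        if p:
            hist = [y] + hist[:-1]  # newest output enters the history
    return out


def pitch_pulse_train(n_samples, period, gain=1.0):
    """Voiced excitation: a pulse of height `gain` every `period` samples."""
    return [gain if n % period == 0 else 0.0 for n in range(n_samples)]


# Hypothetical frame: 20 ms at 8 kHz (160 samples), 100 Hz pitch
# (period of 80 samples), and an arbitrary stable 2nd-order predictor.
frame = all_pole_synthesis(pitch_pulse_train(160, 80), [1.3, -0.6])
```

Every synthesized sample depends only on the pulse train and the $a_k$, which is why a wrong pitch period or voicing decision propagates directly into the replica.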
The foregoing pitch excited linear predictive coding is very
efficient. The produced speech replica, however, exhibits a
synthetic quality that is often difficult to understand. In
general, the low speech quality results from the lack of
correspondence between the speech pattern and the linear prediction
model used. Errors in the pitch code or errors in determining
whether a speech interval is voiced or unvoiced cause the speech
replica to sound disturbed or unnatural. Similar problems are also
evident in formant coding of speech. Alternative coding
arrangements in which the speech excitation is obtained from the
residual after prediction, e.g., ADPCM or APC, provide a marked
improvement because the excitation is not dependent upon an inexact
model. The excitation bit rate of these systems, however, is at
least an order of magnitude higher than that of the linear predictive
model. Attempts to lower the excitation bit rate in the residual
type systems have generally resulted in a substantial loss in
quality. It is an object of the invention to provide improved
speech coding of high quality at lower bit rates than residual
coding schemes.
BRIEF SUMMARY OF THE INVENTION
We have found that the foregoing residual encoding problems may be
solved by forming a pattern predictive of a pattern (e.g., a speech
pattern) to be encoded and comparing the pattern to be encoded with
the predictive pattern on a frame-by-frame basis. The differences
between the pattern to be encoded and the predictive pattern over
each frame are utilized to form a coded signal of a prescribed
format; this coded signal modifies the predictive pattern to
minimize the frame differences. The bit rate of the prescribed
format coded signal is selected so that the modified predictive
pattern approximates the speech pattern to a desired level
consistent with coding requirements.
The invention is directed to a sequential pattern processing
arrangement in which the sequential pattern is partitioned into
successive time intervals. In each time interval, a set of signals
representative of the interval sequential pattern and a signal
representative of the differences between the interval sequential
pattern and the interval representative signal set are generated. A
first signal corresponding to the interval pattern is formed
responsive to said interval pattern representative signals and said
interval differences representative signal and a second interval
corresponding signal is generated responsive to said interval
pattern representative signals. A signal corresponding to the
differences between the first and second interval corresponding
signals is formed and a third signal is produced responsive to said
interval differences corresponding signal that alters the second
signal to reduce the differences between said first and second
interval corresponding signals.
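For the linear predictive case used in the embodiment described below, the signals named in this paragraph might be written as follows. This is a sketch under assumptions: $s_n$ is the interval pattern, the $a_k$ ($k = 1, \ldots, p$) are the representative signals, $v_n$ is the third signal, and the squared-error form of the differences signal is one choice (the claims also name a correlation form). The last line supplies the step that explains why the first signal reproduces the pattern: subtracting the first line from the second shows that the first signal and the pattern obey the same recursion, so they coincide over the interval whenever the first filter's memory holds the past pattern samples.

```latex
\begin{aligned}
d_n       &= s_n - \sum_{k=1}^{p} a_k\, s_{n-k}           && \text{(differences signal)}\\
y^{(1)}_n &= d_n + \sum_{k=1}^{p} a_k\, y^{(1)}_{n-k}     && \text{(first signal)}\\
y^{(2)}_n &= v_n + \sum_{k=1}^{p} a_k\, y^{(2)}_{n-k}     && \text{(second signal, driven by } v_n\text{)}\\
E         &= \sum_{n \in \text{interval}} \bigl(y^{(1)}_n - y^{(2)}_n\bigr)^2 && \text{(differences corresponding signal)}\\
y^{(1)}_n - s_n &= \sum_{k=1}^{p} a_k \bigl(y^{(1)}_{n-k} - s_{n-k}\bigr) && \text{(so } y^{(1)}_n = s_n \text{ given matching memory)}
\end{aligned}
```

The third signal $v_n$ is then chosen to make $E$ small, so the second signal approaches the pattern itself.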
According to one aspect of the invention, a speech pattern is
partitioned into successive time intervals. In each interval, a set
of signals representative of the speech pattern in each time
interval and a signal representative of the differences between
said interval speech pattern and the interval speech pattern
representative signal set are generated. A first signal
corresponding to the interval speech pattern is formed responsive
to said interval speech representative signals and differences
representative signal and a second interval corresponding signal is
generated responsive to the interval speech pattern representative
signals. A signal corresponding to the differences between the
first and second interval corresponding signals is formed and a
third signal is produced responsive to the interval differences
corresponding signal that alters said second interval corresponding
signal to reduce the differences corresponding signal.
According to another aspect of the invention, the third signal is
utilized to construct a replica of the interval pattern.
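A minimal sketch of replica construction follows, assuming the third signal is conveyed as pulse amplitude and location codes, as in claims 40 to 43. The helper names, the filter sign convention $y(n) = x(n) + \sum_k a_k\, y(n-k)$, and the absence of any code quantization are assumptions made for illustration.

```python
def pulses_to_excitation(locations, amplitudes, frame_len):
    """Hypothetical decoding of pulse codes into one frame of excitation:
    each (location, amplitude) pair places a pulse; other samples are zero."""
    exc = [0.0] * frame_len
    for loc, amp in zip(locations, amplitudes):
        exc[loc] += amp
    return exc


def all_pole_synthesis(excitation, a, memory=None):
    """y(n) = x(n) + sum_k a_k y(n-k); `memory` = past outputs, newest first."""
    p = len(a)
    hist = list(memory) if memory is not None else [0.0] * p
    out = []
    for x in excitation:
        y = x + sum(a[k] * hist[k] for k in range(p))
        out.append(y)
        if p:
            hist = [y] + hist[:-1]
    return out


def decode_frame(locations, amplitudes, a, frame_len, memory=None):
    """Replica of one frame: pulse excitation filtered by the predictor."""
    exc = pulses_to_excitation(locations, amplitudes, frame_len)
    return all_pole_synthesis(exc, a, memory)
```

To keep the filter continuous across frames, the last p outputs of one frame, newest first (for example `list(reversed(prev[-p:]))`), can be passed as `memory` when decoding the next frame.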
In an embodiment of the invention, a set of predictive parameter
signals is generated for each time frame from a speech signal. A
prediction residual signal is formed responsive to the time frame
speech signal and the time frame predictive parameters. The
prediction residual signal is passed through a first predictive
filter to produce a first speech representative signal for the time
frame. A second speech representative signal is generated for the
time frame in a second predictive filter from the frame prediction
parameters. Responsive to the first speech representative and
second speech representative signals of the time frame, a coded
excitation signal is formed and applied to the second predictive
filter to minimize the perceptually weighted mean squared
difference between the frame first and second speech representative
signals. The coded excitation signal and the predictive parameter
signals are utilized to construct a replica of the time frame
speech pattern.
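The excitation forming step of this embodiment can be sketched as a greedy, pulse-by-pulse search. This is a simplified, hypothetical version, not the circuit of FIG. 2: it omits the perceptual weighting, assumes the first speech representative signal equals the frame speech (true when the first filter's memory holds past speech samples), treats the zero-excitation output of the second filter as its starting point, and never re-optimizes a pulse once placed. All names are illustrative.

```python
def all_pole_synthesis(excitation, a, memory=None):
    """y(n) = x(n) + sum_k a_k y(n-k); `memory` = past outputs, newest first."""
    p = len(a)
    hist = list(memory) if memory is not None else [0.0] * p
    out = []
    for x in excitation:
        y = x + sum(a[k] * hist[k] for k in range(p))
        out.append(y)
        if p:
            hist = [y] + hist[:-1]
    return out


def frame_target(speech_frame, a, memory):
    """First signal (taken here to be the frame speech itself) minus the
    output of the second filter when it receives no excitation."""
    ringing = all_pole_synthesis([0.0] * len(speech_frame), a, memory)
    return [s - r for s, r in zip(speech_frame, ringing)]


def multipulse_search(target, a, n_pulses):
    """Greedy, unweighted multipulse search over one frame.

    Each pass places the pulse whose filtered contribution removes the most
    squared error from what is left of `target`. At location m the optimal
    amplitude is c/e and the error removed is c*c/e, where c is the
    cross-correlation with the shifted impulse response and e its energy.
    """
    L = len(target)
    h = all_pole_synthesis([1.0] + [0.0] * (L - 1), a)  # impulse response
    remaining = list(target)
    locations, amplitudes = [], []
    for _ in range(n_pulses):
        best_score, best_m, best_g = -1.0, 0, 0.0
        for m in range(L):
            span = L - m  # the response is cut off at the frame boundary
            c = sum(remaining[m + i] * h[i] for i in range(span))
            e = sum(h[i] * h[i] for i in range(span))  # >= 1 since h[0] = 1
            if c * c / e > best_score:
                best_score, best_m, best_g = c * c / e, m, c / e
        locations.append(best_m)
        amplitudes.append(best_g)
        for i in range(L - best_m):
            remaining[best_m + i] -= best_g * h[i]  # subtract the pulse's effect
    return locations, amplitudes, remaining
```

The returned locations and amplitudes play the role of the coded excitation signal; filtering them through the second filter, starting from the same memory, reproduces the frame target minus `remaining`. Because each pulse uses its optimal amplitude, the remaining squared difference never increases from one pulse to the next.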
DESCRIPTION OF THE DRAWING
FIG. 1 depicts a block diagram of a speech processor circuit
illustrative of the invention;
FIG. 2 depicts a block diagram of an excitation signal forming
processor that may be used in the circuit of FIG. 1;
FIG. 3 shows a flow chart that illustrates the operation of the
excitation signal forming circuit of FIG. 1;
FIGS. 4 and 5 show flow charts that illustrate the operation of the
circuit of FIG. 2;
FIG. 6 shows a timing diagram that is illustrative of the operation
of the excitation signal forming circuit of FIG. 1 and of FIG. 2;
and
FIG. 7 shows waveforms illustrating the speech processing of the
invention.
DETAILED DESCRIPTION
FIG. 1 shows a general block diagram of a speech processor
illustrative of the invention. In FIG. 1, a speech pattern such as
a spoken message is received by microphone transducer 101. The
corresponding analog speech signal therefrom is bandlimited and
converted into a sequence of pulse samples in filter and sampler
circuit 113 of prediction analyzer 110. The filtering may be
arranged to remove frequency components of the speech signal above
4.0 kHz and the sampling may be at an 8.0 kHz rate as is well known
in the art. The timing of the samples is controlled by sample clock
CL from clock generator 103. Each sample from circuit 113 is
transformed into an amplitude representative digital code in
analog-to-digital converter 115.
The sequence of speech samples is supplied to predictive parameter
computer 119 which is operative, as is well known in the art, to
partition the speech signals into 10 to 20 ms intervals and to
generate a set of linear prediction coefficient signals
a_k, k = 1, 2, ..., p, representative of the predicted short-time
spectrum of the N ≫ p speech samples of each interval. The
speech samples from A/D converter 115 are delayed in delay 117 to
allow time for the formation of signals a_k. The delayed
samples are supplied to the input of prediction residual generator
118. The prediction residual generator, as is well known in the
art, is responsive to the delayed speech samples and the prediction
parameters a_k to form a signal corresponding to the difference
therebetween. The formation of the predictive parameters and the
prediction residual signal for each frame shown in predictive
analyzer 110 may be performed according to the arrangement
disclosed in U.S. Pat. No. 3,740,476 issued to B. S. Atal June 19,
1973 and assigned to the same assignee or in other arrangements
well known in the art.
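The patent leaves this analysis to known techniques. As a minimal sketch only, the Python below assumes the autocorrelation method with the Levinson-Durbin recursion (one common choice, not necessarily the arrangement of U.S. Pat. No. 3,740,476), the predictor convention s_n ≈ Σ a_k s_{n-k}, a frame that is not all zeros, and zero-valued samples before the frame. At 8 kHz a 10 to 20 ms interval holds N = 80 to 160 samples, so N ≫ p. All function names are hypothetical.

```python
def autocorrelation(x, p):
    """r[k] = sum_n x[n] * x[n + k] over one frame, for lags k = 0..p."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(p + 1)]


def levinson_durbin(r, p):
    """Predictor coefficients a_1..a_p (returned as a[0..p-1]) such that
    s[n] is approximated by sum_k a_k * s[n - k]. Assumes r[0] > 0."""
    a = [0.0] * (p + 1)  # a[0] is unused
    err = r[0]           # prediction error energy at order 0
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err    # reflection coefficient for order i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def prediction_residual(s, a):
    """d[n] = s[n] - sum_k a_k * s[n - k]; samples before the frame are taken as zero."""
    p = len(a)
    return [s[n] - sum(a[k - 1] * s[n - k] for k in range(1, p + 1) if n - k >= 0)
            for n in range(len(s))]


# Example: a decaying exponential is predicted almost perfectly by a_1 = 0.9.
frame = [0.9 ** n for n in range(200)]
a = levinson_durbin(autocorrelation(frame, 2), 2)
d = prediction_residual(frame, a)
```

For this frame the residual is essentially a single pulse at n = 0, which illustrates why the residual, rather than the speech itself, is the signal that remains to be coded.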
While the predictive parameter signals a_k form an efficient
representation of the short time speech spectrum, the residual
signal generally varies widely from interval to interval and
exhibits a high bit rate that is unsuitable for many applications.
In the pitch excited vocoder, only the peaks of the residual are
transmitted as pitch pulse codes. The resulting quality, however,
is generally poor. Waveform 701 of FIG. 7 illustrates a typical
speech pattern over two time frames. Waveform 703 shows the
predictive residual signal derived from the pattern of waveform 701
and the predictive parameters of the frames. As is readily seen,
waveform 703 is relatively complex so that encoding pitch pulses
corresponding to peaks therein does not provide an adequate
approximation of the predictive residual. In accordance with the
invention, excitation code processor 120 receives the residual
signal d_k and the prediction parameters a_k of the frame
and generates an interval excitation code which has a predetermined
number of bit positions. The resulting excitation code, shown in
waveform 705, exhibits a constant, relatively low bit rate. A
replica of the speech pattern of waveform 701 constructed from the
excitation code and the prediction parameters of the frames is
shown in waveform 707. As seen by a comparison of waveforms 701 and
707, higher quality speech characteristic of adaptive predictive
coding is obtained at much lower bit rates.
The prediction residual signal d_k and the predictive parameter
signals a_k for each successive frame are applied from circuit
110 to excitation signal forming circuit 120 at the beginning of
the succeeding frame. Circuit 120 is operative to produce a
multielement frame excitation code EC having a predetermined number
of bit positions for each frame. Each excitation code corresponds
to a sequence of I pulses, 1 ≤ i ≤ I, representative of the
excitation function of the frame. The amplitude β_i and
location m_i of each pulse within the frame are determined in
the excitation signal forming circuit so as to permit construction
of a replica of the frame speech signal from the excitation signal
and the predictive parameter signals of the frame. The β_i
and m_i signals are encoded in coder 131 and multiplexed with
the prediction parameter signals of the frame in multiplexer 135 to
provide a digital signal corresponding to the frame speech
pattern.
In excitation signal forming circuit 120, the predictive residual
signal d_k and the predictive parameter signals a_k of a
frame are supplied to filter 121 via gates 122 and 124,
respectively. At the beginning of each frame, frame clock signal FC
opens gates 122 and 124 whereby the d_k signals are supplied to
filter 121 and the a_k signals are applied to filters 121 and
123. Filter 121 is adapted to modify signal d_k so that the
spectrum of the quantizing error is concentrated in the
formant regions of the speech signal. As disclosed in U.S. Pat. No. 4,133,976
issued to B. S. Atal et al., Jan. 9, 1979 and assigned to the same
assignee, this filter arrangement is effective to mask the error in
the high signal energy portions of the spectrum.
The transfer function of filter 121 is expressed in z transform
notation as
$$H(z) = \frac{1}{1 - B(z)} \qquad (1)$$
where B(z) is controlled by the frame predictive parameters a_k.
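Reading the transfer function of Equation 1 as the all-pole filter H(z) = 1/(1 − B(z)) with B(z) = Σ_{k=1}^{p} b_k z^{-k} (an assumption; the coefficients b_k are only introduced later in the text), filters 121 and 123 can be sketched as the difference equation below. The sketch starts each call from zero filter memory, which the text does not specify; the function name is hypothetical.

```python
def all_pole_filter(x, b):
    """Difference equation for H(z) = 1 / (1 - B(z)), B(z) = sum_k b_k z^-k:

        out[n] = x[n] + sum_{k=1}^{p} b_k * out[n - k]

    b holds b_1..b_p; the filter starts from zero memory on each call.
    """
    out = []
    for n, xn in enumerate(x):
        acc = xn
        for k, bk in enumerate(b, start=1):
            if n - k >= 0:
                acc += bk * out[n - k]
        out.append(acc)
    return out


# Impulse response of a one-pole example with b_1 = 0.5.
print(all_pole_filter([1.0, 0.0, 0.0, 0.0], [0.5]))  # [1.0, 0.5, 0.25, 0.125]
```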
Predictive filter 123 receives the frame predictive parameter
signals from computer 119 and an artificial excitation signal EC
from excitation signal processor 127. Filter 123 has the transfer
function of Equation 1. Filter 121 forms a weighted frame speech
signal y responsive to the predictive residual d_k, while filter
123 generates a weighted artificial speech signal ŷ responsive to
the excitation signal from signal processor 127. Signals y and ŷ
are correlated in correlation processor 125, which generates a
signal E corresponding to the weighted difference therebetween.
Signal E is applied to signal processor 127 to adjust the
excitation signal EC so that the differences between the weighted
speech representative signal from filter 121 and the weighted
artificial speech representative signal from filter 123 are
reduced.
The excitation signal is a sequence of I pulses, 1 ≤ i ≤ I.
Each pulse has an amplitude β_i and a location m_i.
Processor 127 is adapted to successively form the β_i,
m_i signals which reduce the differences between the weighted
frame speech representative signal from filter 121 and the weighted
frame artificial speech representative signal from filter 123. The
weighted frame speech representative signal may be expressed as:
##EQU2## and the weighted artificial speech representative signal
of the frame may be expressed as ##EQU3## where h_n is the
impulse response of filter 121 or filter 123.
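Because filters 121 and 123 share the impulse response h_n, both signals are convolutions with h_n. Equations 2 and 3 are not reproduced in this text, so the forms below are illustrative assumptions: 0-based sample indices, h_j = 0 for j < 0, sums truncated to the N samples of the frame, and no filter memory carried in from the previous frame. Function names are hypothetical.

```python
def weighted_speech(d, h):
    """y_n = sum_{k=0}^{n} d_k h_{n-k}: the frame residual passed through filter 121."""
    return [sum(d[k] * h[n - k] for k in range(n + 1)) for n in range(len(d))]


def artificial_speech(pulses, h, n_samples):
    """yhat_n = sum_i beta_i h_{n-m_i}: pulses (beta_i, m_i) passed through filter 123."""
    yhat = [0.0] * n_samples
    for beta, m in pulses:
        for n in range(m, n_samples):
            yhat[n] += beta * h[n - m]
    return yhat


h = [1.0, 0.5, 0.25, 0.125]                        # illustrative h_n
print(artificial_speech([(2.0, 1)], h, 4))          # [0.0, 2.0, 1.0, 0.5]
print(weighted_speech([0.0, 2.0, 0.0, 0.0], h))     # [0.0, 2.0, 1.0, 0.5]
```

A residual that happens to be a single pulse produces the same output on both paths, which is the situation the excitation search tries to approach with a few pulses.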
The excitation signal formed in circuit 120 is a coded signal
having elements β_i, m_i, i = 1, 2, ..., I. Each
element represents a pulse in the time frame: β_i is the
amplitude of the pulse and m_i is the location of the pulse in
the frame. Correlation signal generator circuit 125 is operative to
successively generate a correlation signal for each element. Each
element may be located at any time q, 1 ≤ q ≤ Q, in the time
frame. Consequently, the correlation processor circuit forms Q
possible candidates for element i in accordance with Equation 4:
##EQU4## Excitation signal generator 127 receives the C_iq
signals from the correlation signal generator circuit, selects the
C_iq signal having the maximum absolute value, and forms the
i-th element of the coded signal ##EQU5## where q* is the
location of the correlation signal having the maximum absolute
value. The index i is incremented to i+1 and signal ŷ_n at the
output of predictive filter 123 is modified. The process in
accordance with Equations 4, 5 and 6 is repeated to form element
β_{i+1}, m_{i+1}. After the formation of element
β_I, m_I, the signal having elements β_1
m_1, β_2 m_2, ..., β_I m_I is
transferred to coder 131. As is well known in the art, coder 131 is
operative to quantize the β_i m_i elements and to form
a coded signal suitable for transmission to network 140.
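The sequential search just described can be sketched as follows. Equations 4 to 6 are not reproduced in this text, so the correlation C_iq = Σ_n (y_n − ŷ_{n,i-1}) h_{n-q} and the least-squares amplitude β_i = C_{iq*} / Σ_n h_{n-q*}^2 used here are assumptions, chosen to match the description of selecting the largest |C_iq| and then modifying ŷ_n at the output of filter 123. Indices are 0-based, h must hold at least N samples with h_0 ≠ 0, and the function name is hypothetical.

```python
def multipulse_correlation_search(y, h, num_pulses):
    """Choose (beta_i, m_i) one pulse at a time (assumed forms of Equations 4-6):
      C_iq   = sum_n (y_n - yhat_{n,i-1}) * h_{n-q}
      m_i    = q*, the q with the largest |C_iq|
      beta_i = C_iq* / sum_n h_{n-q*}**2   (least-squares amplitude at q*)
    y has N samples; h must have at least N samples."""
    N = len(y)
    yhat = [0.0] * N                          # output of filter 123, no pulses yet
    pulses = []
    for _ in range(num_pulses):
        target = [y[n] - yhat[n] for n in range(N)]
        best_q, best_c = 0, 0.0
        for q in range(N):                    # the candidate locations
            c = sum(target[n] * h[n - q] for n in range(q, N))
            if abs(c) > abs(best_c):
                best_q, best_c = q, c
        energy = sum(h[n - best_q] ** 2 for n in range(best_q, N))
        beta = best_c / energy
        pulses.append((beta, best_q))
        for n in range(best_q, N):            # modify yhat_n for the next pulse
            yhat[n] += beta * h[n - best_q]
    return pulses, yhat


h = [0.5 ** k for k in range(6)]                      # illustrative h_n, N = 6
y = [0.0, 0.0] + [3.0 * 0.5 ** k for k in range(4)]   # one pulse of 3.0 at n = 2
pulses, yhat = multipulse_correlation_search(y, h, 1)
print(pulses)  # [(3.0, 2)]
```

Selecting the location by |C_iq| alone, as the text describes, can differ from choosing the location that most reduces the error near the end of the frame, where Σ h² shrinks; the sketch follows the text.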
Each of filters 121 and 123 in FIG. 1 may comprise a transversal
filter of the type described in aforementioned U.S. Pat. No.
4,133,976. Each of processors 125 and 127 may comprise one of the
processor arrangements well known in the art adapted to perform the
processing required by Equations 4 and 6 such as the C.S.P., Inc.
Macro Arithmetic Processor System 100 or other processor
arrangements well known in the art. Processor 125 includes a
read-only memory which permanently stores programmed instructions
to control the C_iq signal formation in accordance with
Equation 4, and processor 127 includes a read-only memory which
permanently stores programmed instructions to select the β_i,
m_i signal elements according to Equation 6 as is well known in
the art. The program instructions in processor 125 are set forth in
FORTRAN language form in Appendix A and the program instructions in
processor 127 are listed in FORTRAN language form in Appendix
B.
FIG. 3 depicts a flow chart showing the operation of processors 125
and 127 for each time frame. Referring to FIG. 3, the h_k
impulse response signals are generated in box 305 responsive to the
frame predictive parameters for the transfer function of Equation
1. This occurs after receipt of the FC signal from clock 103 in
FIG. 1 as per wait box 303. The element index i and the excitation
pulse location index q are initially set to 1 in box 307. Upon
receipt of signals y_n and ŷ_{n,i-1} from predictive filters
121 and 123, respectively, signal C_iq is formed as per box 309. The location
index q is incremented in box 311 and the formation of the next
location C_iq signal is initiated.
After the C_iQ signal is formed for excitation signal element i
in processor 125, processor 127 is activated. The q index in
processor 127 is initially set to 1 in box 315 and the i index as
well as the C_iq signals formed in processor 125 are
transferred to processor 127. Signal C_iq*, which represents
the C_iq signal having the maximum absolute value, and its
location q* are set to zero in box 317. The absolute values of the
C_iq signals are compared to signal C_iq*, and the maximum
of these absolute values is stored as signal C_iq* in the loop
including boxes 319, 321, 323, and 325.
After the C_iQ signal from processor 125 has been processed,
box 327 is entered from box 325. The excitation code element
location m_i is set to q* and the magnitude of the excitation
code element β_i is generated in accordance with Equation
6. The β_i m_i element is output to predictive filter
123 as per box 328 and index i is incremented as per box 329. Upon
formation of the β_I m_I element of the frame, wait
box 303 is reentered from decision box 331. Processors 125 and 127
are then placed in wait states until the FC frame clock pulse of
the next frame.
The excitation code in processor 127 is also supplied to coder 131.
The coder is operative to transform the excitation code from
processor 127 into a form suitable for use in network 140. The
prediction parameter signals a_k for the frame are supplied to
an input of multiplexer 135 via delay 133 as prediction signals
a'_k. The excitation coded signal ECS from coder 131 is applied
to the other input of the multiplexer. The multiplexed excitation
and predictive parameter codes for the frame are then sent to
network 140.
Network 140 may be a communication system, the message store of a
voice storage arrangement, or apparatus adapted to store a complete
message or vocabulary of prescribed message units, e.g., words,
phonemes, etc., for use in speech synthesizers. Whatever the
message unit, the resulting sequence of frame codes from circuit
120 is forwarded via network 140 to speech synthesizer 150. The
synthesizer, in turn, utilizes the frame excitation codes from
circuit 120 as well as the frame predictive parameter codes to
construct a replica of the speech pattern.
Demultiplexer 152 in synthesizer 150 separates the excitation code
EC of a frame from the prediction parameters a_k thereof. The
excitation code, after being decoded into an excitation pulse
sequence in decoder 153, is applied to the excitation input of
speech synthesizer filter 154. The a_k codes are supplied to
the parameter inputs of filter 154. Filter 154 is operative in
response to the excitation and predictive parameter signals to form
a coded replica of the frame speech signal as is well known in the
art. D/A converter 156 is adapted to transform the coded replica
into an analog signal which is passed through low-pass filter 158
and transformed into a speech pattern by transducer 160.
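A minimal sketch of decoder 153 and synthesizer filter 154 follows, under stated assumptions: the decoded pulses are placed at sample m_i of an N-sample frame (0-based), and filter 154 is taken to be the all-pole predictive filter s_n = e_n + Σ a_k s_{n-k}, with a prediction order p of at least 1, whose memory carries over from frame to frame. The class and function names are hypothetical.

```python
def pulses_to_excitation(pulses, n_samples):
    """Decoder 153 (sketch): place each amplitude beta_i at sample m_i of the frame."""
    e = [0.0] * n_samples
    for beta, m in pulses:
        e[m] += beta
    return e


class LpcSynthesizer:
    """Filter 154 (sketch): s[n] = e[n] + sum_k a_k s[n-k], memory kept across frames."""

    def __init__(self, p):
        self.history = [0.0] * p  # history[0] = s[n-1], history[1] = s[n-2], ...

    def synthesize(self, e, a):
        out = []
        for en in e:
            sn = en + sum(ak * sk for ak, sk in zip(a, self.history))
            self.history = [sn] + self.history[:-1]
            out.append(sn)
        return out


syn = LpcSynthesizer(p=1)
print(syn.synthesize(pulses_to_excitation([(1.0, 0)], 3), [0.5]))  # [1.0, 0.5, 0.25]
print(syn.synthesize([0.0, 0.0], [0.5]))                            # [0.125, 0.0625]
```

Keeping the filter history between calls lets the response of one frame's pulses continue into the next frame instead of being cut off at the boundary; this is a design choice of the sketch rather than something the text specifies.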
An alternative arrangement to perform the excitation code formation
operations of circuit 120 may be based on the weighted mean
squared error between signals y_n and ŷ_n. This weighted
mean squared error upon forming β_i and m_i for the
i-th excitation signal pulse is ##EQU6## where h_n is the
n-th sample of the impulse response of H(z), m_j is the
location of the j-th pulse in the excitation code signal, and
β_j is the magnitude of the j-th pulse.
The pulse locations and amplitudes are generated sequentially. The
i-th element of the excitation is determined by minimizing
E_i in Equation 7. Equation 7 may be rewritten as ##EQU7## so
that the known excitation code elements preceding
β_i, m_i appear only in the first term.
As is well known, the value of β_i which minimizes E_i
can be determined by differentiating Equation 8 with respect to
β_i and setting
$$\frac{\partial E_i}{\partial \beta_i} = 0.$$
Consequently, the optimum value of β_i is ##EQU9## are the
autocorrelation coefficients of the predictive filter impulse
response signal h_k.
β_i in Equation 10 is a function of the pulse location and
is determined for each possible value thereof. The maximum of the
|β_i| values over the possible pulse
locations is then selected. After β_i and m_i values
are obtained, β_{i+1}, m_{i+1} values are generated by
solving Equation 10 in similar fashion. The first term of Equation
10, i.e., ##EQU10## corresponds to the speech representative signal
of the frame at the output of predictive filter 121. The second
term of Equation 10, i.e., ##EQU11## corresponds to the artificial
speech representative signal of the frame at the output of
predictive filter 123. β_i is the amplitude of an
excitation pulse at location m_i which minimizes the difference
between the first and second terms.
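For reference, the derivation described in the preceding paragraphs can be written out as below. Equations 7 to 11 are not reproduced in this text, so this is one consistent reading built from the stated definitions (y_n, h_n, β_j, m_j), assuming sums over all n with h_n = 0 for n < 0 and neglecting truncation at the frame edge, which makes R_hh symmetric:

```latex
\begin{aligned}
E_i &= \sum_{n}\Big[\,y_n - \sum_{j=1}^{i}\beta_j\,h_{n-m_j}\Big]^2 \\
    &= \sum_{n}\Big[\Big(y_n - \sum_{j=1}^{i-1}\beta_j\,h_{n-m_j}\Big) - \beta_i\,h_{n-m_i}\Big]^2 \\
\frac{\partial E_i}{\partial \beta_i}
    &= -2\sum_{n}\Big[\,y_n - \sum_{j=1}^{i-1}\beta_j\,h_{n-m_j} - \beta_i\,h_{n-m_i}\Big]h_{n-m_i} = 0 \\
\beta_i &= \frac{\sum_{n} y_n\,h_{n-m_i} \;-\; \sum_{j=1}^{i-1}\beta_j\,R_{hh}(m_j - m_i)}{R_{hh}(0)},
\qquad R_{hh}(k) = \sum_{n} h_n\,h_{n+k}
\end{aligned}
```

In this reading the first sum in the numerator is the cross-correlation of y_n with h_n, and the second equals Σ_n ŷ_{n,i-1} h_{n-m_i}, the correlation of the current output of filter 123 with h_n, which matches the two terms identified above.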
The data processing circuit depicted in FIG. 2 provides an
alternative arrangement to excitation signal forming circuit 120 of
FIG. 1. The circuit of FIG. 2 yields the excitation code for each
frame of the speech pattern in response to the frame prediction
residual signal d_k and the frame prediction parameter signals
a_k in accordance with Equation 10 and may comprise the
previously mentioned C.S.P., Inc. Macro Arithmetic Processor System
100 or other processor arrangements well known in the art.
Referring to FIG. 2, processor 210 receives the predictive
parameter signals a_k and the prediction residual signals
d_n of each successive frame of the speech pattern from circuit
110 via store 218. The processor is operative to form the
excitation code signal elements β_1 m_1, β_2
m_2, ..., β_I m_I under control of permanently
stored instructions in predictive filter subroutine read-only
memory 201 and excitation processing subroutine read-only memory
205. The predictive filter subroutine of ROM 201 is set forth in
Appendix C and the excitation processing subroutine in ROM 205 is
set forth in Appendix D.
Processor 210 comprises common bus 225, data memory 230, central
processor 240, arithmetic processor 250, controller interface 220
and input-output interface 260. As is well known in the art,
central processor 240 is adapted to control the sequence of
operations of the other units of processor 210 responsive to coded
instructions from controller 215. Arithmetic processor 250 is
adapted to perform the arithmetic processing on coded signals from
data memory 230 responsive to control signals from central
processor 240. Data memory 230 stores signals as directed by
central processor 240 and provides such signals to arithmetic
processor 250 and input-output interface 260. Controller interface
220 provides a communication link for the program instructions in
ROM 201 and ROM 205 to central processor 240 via controller 215,
and input-output interface 260 permits the d_k and a_k
signals to be supplied to data memory 230 and supplies output
signals β_i and m_i from the data memory to coder 131
in FIG. 1.
The operation of the circuit of FIG. 2 is illustrated in the filter
parameter processing flow chart of FIG. 4, the excitation code
processing flow chart of FIG. 5, and the timing chart of FIG. 6. At
the start of the speech signal, box 410 in FIG. 4 is entered via
box 405 and the frame count r is set to the first frame by a single
pulse ST from clock generator 103. FIG. 6 illustrates the operation
of the circuit of FIGS. 1 and 2 for two successive frames. Between
times t_0 and t_7 in the first frame, prediction analyzer
110 forms the speech pattern samples of frame r+2 as in waveform
605 under control of the sample clock pulses of waveform 601.
Analyzer 110 generates the a_k signals corresponding to frame
r+1 between times t_0 and t_3 and forms predictive residual
signal d_k between times t_3 and t_6 as indicated in
waveform 607. Signal FC (waveform 603) occurs between times t_0
and t_1. The signals d_k from residual signal generator 118
previously stored in store 218 during the preceding frame are
placed in data memory 230 via input-output interface 260 and common
bus 225 under control of central processor 240. As indicated in
operation box 415 of FIG. 4, these operations are responsive to
frame clock signal FC. The frame prediction parameter signals
a_k from prediction parameter computer 119 previously placed in
store 218 during the preceding frame are also inserted in memory
230 as per operation box 420. These operations occur between times
t_0 and t_1 in FIG. 6.
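The FIG. 6 description implies a three-stage pipeline: in each frame period the analyzer forms the samples of one frame, computes a_k and d_k for the frame before it, and the excitation processor codes the frame before that. The sketch below only tabulates that schedule; the stage labels and function names are hypothetical.

```python
def frames_in_period(r):
    """Frame handled by each stage during the period in which frame r is excitation-coded."""
    return {
        "samples (waveform 605)": r + 2,
        "a_k and d_k (waveform 607)": r + 1,
        "excitation code (waveform 609)": r,
    }


def stage_schedule(frame):
    """Period (indexed like r above) in which a given frame passes through each stage."""
    return {
        "samples (waveform 605)": frame - 2,
        "a_k and d_k (waveform 607)": frame - 1,
        "excitation code (waveform 609)": frame,
    }


if __name__ == "__main__":
    for r in range(3):
        print(r, frames_in_period(r))
```

Read this way, each frame is excitation-coded two frame periods after the period in which its samples are formed, while all three stages stay busy in every period.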
After insertion of the frame d_k and a_k signals into
memory 230, box 425 is entered and the predictive filter
coefficients b_k corresponding to the transfer function of
Equation 1 are generated in arithmetic processor 250 and placed in data memory
230. The order p is typically 16 and α is typically 0.85 for a sampling
rate of 8 kHz. The predictive filter impulse response signals
h_k ##EQU12## are then generated in arithmetic processor 250
and stored in data memory 230. When the h_k impulse response
signal is stored, box 435 is entered and the predictive filter
autocorrelation signals of Equation 11 are generated and
stored.
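Boxes 425 to 435 can be sketched as below. The expression for b_k is not reproduced in this text; the sketch assumes the bandwidth-expansion form b_k = α^k a_k, which fits the stated typical values p = 16 and α = 0.85 but remains an assumption, as do the finite impulse-response length and the 0-based autocorrelation R_k = Σ_n h_n h_{n+k} over the stored samples. Function names are hypothetical.

```python
def weighted_coefficients(a, alpha=0.85):
    """b_k = alpha**k * a_k for k = 1..p (assumed form of the box 425 step)."""
    return [alpha ** k * ak for k, ak in enumerate(a, start=1)]


def impulse_response(b, length):
    """h_0..h_{length-1} of H(z) = 1 / (1 - sum_k b_k z^-k), with h_0 = 1."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for k, bk in enumerate(b, start=1):
            if n - k >= 0:
                acc += bk * h[n - k]
        h.append(acc)
    return h


def autocorrelation(h, max_lag):
    """R_k = sum_n h_n * h_{n+k} over the stored samples of h, k = 0..max_lag."""
    return [sum(h[n] * h[n + k] for n in range(len(h) - k)) for k in range(max_lag + 1)]


b = weighted_coefficients([1.0, 1.0], alpha=0.5)   # [0.5, 0.25]
h = impulse_response(b, 4)                          # [1.0, 0.5, 0.5, 0.375]
R = autocorrelation(h, 2)                           # [1.640625, 0.9375, 0.6875]
```

Since h_k and R_k depend only on the frame's a_k, they are computed once per frame, before the pulse search starts.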
At time t_2 in FIG. 6, controller 215 disconnects ROM 201 from
interface 220 and connects excitation processing subroutine ROM 205
to the interface. The formation of the β_i, m_i
excitation pulse codes shown in the flow chart of FIG. 5 is then
initiated. Between times t_2 and t_4 in FIG. 6, the
excitation pulse sequence is formed. Excitation pulse index i is
initially set to 1 and pulse location index q is set to 1 in box
505. β_1 is set to zero in box 510 and operation box 515
is entered to determine β_iq = β_11. β_11
is the optimum excitation pulse at location q = 1 of the frame. The
absolute value of β_11 is then compared to the previously
stored β_1 in decision box 520. Since β_1 is
initially zero, the m_i code is set to q = 1 and the β_i
code is set to β_11 in box 525.
Location index q is then incremented in box 530 and box 515 is
entered via decision box 535 to generate signal β_12. The
loop including boxes 515, 520, 525, 530 and 535 is iterated for all
pulse location values 1 ≤ q ≤ Q. After the Q-th
iteration, the first excitation pulse amplitude β_1
= β_{iq*} and its location in the frame m_1 = q* are
stored in memory 230. In this manner, the first of the I excitation
pulses is determined. Referring to waveform 705 in FIG. 7, frame r
occurs between times t_0 and t_1. The excitation code for
the frame consists of 8 pulses. The first pulse of amplitude
β_1 and location m_1 occurs at time t_m1 in FIG. 7
as determined in the flow chart of FIG. 5 for index i = 1.
Index i is incremented to the succeeding excitation pulse in box
545 and operation box 515 is entered via box 550 and box 510. Upon
completion of each iteration of the loop between boxes 510 and 550,
the excitation signal is modified to further reduce the error E_i of
Equation 7. Upon completion of the second iteration, pulse
β_2 m_2 (time t_m2 in waveform 705) is formed.
Excitation pulses β_3 m_3 (time t_m3),
β_4 m_4 (time t_m4), β_5 m_5 (time
t_m5), β_6 m_6 (time t_m6), β_7
m_7 (time t_m7), and β_8 m_8 (time t_m8)
are then successively formed as index i is incremented.
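The FIG. 5 loop can be sketched with the autocorrelation form of Equation 10 as read from the surrounding description: for each candidate location q, β_iq = (φ_q − Σ_{j<i} β_j R_{|q − m_j|}) / R_0, where φ_q = Σ_n y_n h_{n-q}. That form, the 0-based indexing, and the use of R_{|k|} near the frame edge are assumptions; the box numbers in the comments follow the text only loosely, and the function name is hypothetical.

```python
def fig5_search(y, h, num_pulses):
    """Sequential pulse search with the autocorrelation form of Equation 10
    (assumed form). y: weighted speech of the frame (N samples);
    h: impulse response with at least N samples. Returns [(beta_i, m_i), ...]."""
    N = len(y)
    # Autocorrelation of h, R[k] = sum_n h[n] * h[n + k] (Equation 11, assumed form).
    R = [sum(h[n] * h[n + k] for n in range(len(h) - k)) for k in range(N)]
    # Cross-correlation of y with h: first term of Equation 10 for every location q.
    phi = [sum(y[n] * h[n - q] for n in range(q, N)) for q in range(N)]
    pulses = []
    for _ in range(num_pulses):                 # one pass per pulse index i
        best_beta, best_q = 0.0, 0              # cf. box 510: beta_i starts at zero
        for q in range(N):                      # cf. boxes 515-535: every location q
            second = sum(bj * R[abs(q - mj)] for bj, mj in pulses)
            beta_iq = (phi[q] - second) / R[0]
            if abs(beta_iq) > abs(best_beta):   # cf. boxes 520-525: keep the largest
                best_beta, best_q = beta_iq, q
        pulses.append((best_beta, best_q))
    return pulses


h = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]      # illustrative h_n, N = 8
y = [0.0, 2.0, 1.0, 0.5, 0.0, -1.0, -0.5, -0.25]   # pulses 2.0 at n=1, -1.0 at n=5
print(fig5_search(y, h, 2))  # [(2.0, 1), (-1.0, 5)]
```

Because φ_q and R_k are computed once per frame, each candidate evaluation in this sketch costs O(i) operations instead of a full correlation over the frame.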
After the I-th iteration (waveform 609 at t_4), box 555 is
entered from decision box 550 and the current frame excitation code
β_1 m_1, β_2 m_2, ..., β_I m_I
is generated therein. The frame index is incremented in box 560 and
the predictive filter operations of FIG. 4 for the next frame are
started in box 415 at time t_7 in FIG. 6. Upon the occurrence
of the FC clock signal for the next frame at t_7 in FIG. 6, the
speech pattern samples for frame r+3 are formed (waveform 605
between times t_7 and t_14), the a_k and d_k
signals are generated for frame r+2 (waveform 607 between times
t_7 and t_13), and the excitation code for frame r+1 is
produced (waveform 609 between times t_7 and t_12).
The frame excitation code from the processor of FIG. 2 is supplied
via input-output interface 260 to coder 131 in FIG. 1 as is well
known in the art. Coder 131 is operative, as previously mentioned, to
quantize and format the excitation code for application to network
140. The a_k prediction parameter signals for the frame are
applied to one input of multiplexer 135 through delay 133 so that
the frame excitation code from coder 131 may be appropriately
multiplexed therewith.
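The text treats coder 131 as conventional. Purely as a hypothetical illustration of how a predetermined number of bit positions per frame arises, the sketch sends each location as a ceil(log2 N)-bit index and each amplitude as a uniformly quantized fraction of the frame's peak |β_i|. The bit allocations, the quantizer, and the function names are assumptions, not values from the patent, and quantizing the peak gain itself is left out.

```python
import math


def frame_bits(num_pulses, n_samples, amp_bits, gain_bits):
    """Bits per frame for a hypothetical allocation: each pulse sends a location
    index (ceil(log2 N) bits) and an amplitude index; one gain field per frame."""
    loc_bits = math.ceil(math.log2(n_samples))
    return num_pulses * (loc_bits + amp_bits) + gain_bits


def quantize_amplitudes(betas, amp_bits):
    """Uniform mid-rise quantizer on beta_i / max|beta| over [-1, 1] (hypothetical).
    Returns the amplitude indices and the frame peak used as the gain."""
    peak = max(abs(b) for b in betas) or 1.0    # guard against an all-zero frame
    levels = 2 ** amp_bits
    step = 2.0 / levels
    idx = [min(levels - 1, int((b / peak + 1.0) / step)) for b in betas]
    return idx, peak


def dequantize_amplitudes(idx, peak, amp_bits):
    """Reconstruct each amplitude at the center of its quantizer cell."""
    step = 2.0 / 2 ** amp_bits
    return [peak * (-1.0 + (i + 0.5) * step) for i in idx]


print(frame_bits(8, 80, 3, 5))  # 85
idx, peak = quantize_amplitudes([2.0, -1.0, 0.5], 3)
print(idx, dequantize_amplitudes(idx, peak, 3))  # [7, 2, 5] [1.75, -0.75, 0.75]
```

With I = 8 pulses (as in waveform 705), N = 80 samples (10 ms at 8 kHz), and the hypothetical 3 amplitude bits and 5 gain bits, every frame uses 8 × (7 + 3) + 5 = 85 bits, or 8.5 kbit/s at 100 frames per second, independent of the speech content.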
The invention has been described with reference to particular
illustrative embodiments. It is apparent to those skilled in the
art that various modifications may be made without departing from
the scope and the spirit of the invention. For example, the
embodiments described herein have utilized linear predictive
parameters and a predictive residual. The linear predictive
parameters may be replaced by formant parameters or other speech
parameters well known in the art. The predictive filters are then
arranged to be responsive to the speech parameters that are
utilized and to the speech signal so that the excitation signal
formed in circuit 120 of FIG. 1 is used in combination with the
speech parameter signals to construct a replica of the speech
pattern of the frame in accordance with the invention. The encoding
arrangement of the invention may be extended to sequential patterns
such as biological and geological patterns to obtain efficient
representations thereof.