Method And Apparatus For Signal Bandwidth Compression Utilizing The Fourier Transform Of The Logarithm Of The Frequency Spectrum Magnitude

Manley, et al. August 1, 1972

Patent Grant 3681530

U.S. patent number 3,681,530 [Application Number 05/046,128] was granted by the patent office on 1972-08-01 for method and apparatus for signal bandwidth compression utilizing the Fourier transform of the logarithm of the frequency spectrum magnitude. This patent grant is currently assigned to GTE Sylvania Incorporated. Invention is credited to Harold J. Manley and Harry L. Shaffer.



Abstract

A bandwidth compression system such as a digital vocoder including an analysis section employs a transducer to convert an input speech wave into an electrical signal which is then digitized by an analog-to-digital converter. The digitized signal is directed through a spectrum device where the magnitudes of the frequency spectrum of the input speech wave are obtained. These magnitudes are then directed to a logging circuit to obtain the logarithm of the frequency spectrum magnitudes of the input speech signal. The logged magnitudes of the frequency spectrum are then directed to a computer where the discrete Fourier transform of the logged spectrum magnitudes is obtained to form the Fourier transform of the logarithm of the frequency spectrum magnitude (FTLSM) of the input speech signal. An encoding unit selects and encodes certain ones of the FTLSM coefficients for transmission to a remote terminal for analysis. The encoded signals include pitch data and vocal tract impulse data, both of which are derived from the FTLSM signals. The synthesis section of a vocoder terminal employs a decoding device which decodes the received data and separates it into pitch data and vocal tract impulse data. Connected to the decoding device is a computing device for computing the logarithm of the spectrum envelope of the vocal tract impulse response function using the discrete Fourier transform. The logged spectrum is directed through a delogging device to a fast Fourier transform (FFT) computer where the Fourier sine transform of the received spectrum signals (the impulse response) is obtained. A convolution unit then convolves the pitch data with the impulse response data to yield the desired synthesized speech signal.
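The analysis chain described above (spectrum magnitude, logarithm, second Fourier transform) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented hardware: the naive `dft` helper, the `floor` guard, the 1/N scaling, the synthetic test frame, and the search range 8 to 24 are all hypothetical choices made for the example.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stands in for the FFT computer)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def ftlsm(frame, floor=1e-12):
    """Fourier transform of the logarithm of the spectrum magnitude of one frame."""
    magnitudes = [abs(c) for c in dft(frame)]
    # Assumption: a small floor keeps log() finite for bins with no energy.
    logged = [math.log(max(m, floor)) for m in magnitudes]
    # The log magnitude of a real frame is real and even, so its transform is
    # real up to rounding error. The 1/N scaling is an arbitrary choice here.
    return [c.real / len(frame) for c in dft(logged)]


# Hypothetical test frame: one pulse every 16 samples, shaped by a short response.
n, period, response = 64, 16, [1.0, 0.5, 0.25]
frame = [0.0] * n
for start in range(0, n, period):
    for i, value in enumerate(response):
        frame[(start + i) % n] += value

coeffs = ftlsm(frame)
# Low-index coefficients describe the smooth spectrum envelope; a strong
# coefficient at a higher index marks the excitation period.
pitch_lag = max(range(8, 25), key=lambda q: coeffs[q])
print(pitch_lag)
```

In this sketch the excitation period shows up as the largest coefficient in the searched range, while the first few coefficients carry the envelope (vocal tract) information that the encoding unit would select.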


Inventors: Manley; Harold J. (Sudbury, MA), Shaffer; Harry L. (Lynnfield, MA)
Assignee: GTE Sylvania Incorporated (N/A)
Family ID: 21941776
Appl. No.: 05/046,128
Filed: June 15, 1970

Current U.S. Class: 704/203; 704/207; 704/224
Current CPC Class: G10L 19/02 (20130101)
Current International Class: G10L 19/00 (20060101); G10L 19/02 (20060101); G10l 001/02 (); G10l 001/08 ()
Field of Search: 179/15A, 15.55R; 324/77C, 77F

References Cited

U.S. Patent Documents
3448216 June 1969 Kelly
3566035 February 1971 Noll
3344349 September 1967 Schroeder
3403227 September 1968 Malm
3330910 July 1967 Flanagan
3493684 February 1970 Kelly
3471648 October 1969 Miller

Other References

Noll, Short-Time Spectrum and Cepstrum Techniques for Vocal Pitch Detection, J.A.S.A. 2/1964 p. 296-302.
Shively, A Digital Processor to Generate Spectra in Real Time, IEEE Trans. on Computers, 5/1968 p. 485-491.

Primary Examiner: Claffy; Kathleen H.
Assistant Examiner: Leaheey; Jon Bradford

Claims



1. A bandwidth compression system including an analysis section comprising:

means for generating electrical signals representing the Fourier transform of the logarithm of the magnitudes of the spectrum of an input signal, said input signal having excitation and impulse response information included therein;

first detection means coupled to said means for generating electrical signals and being operative to provide from said electrical signals an output signal representing the excitation information of said input signal; and

second detection means coupled to said means for generating electrical signals and being operative to separate out a predetermined portion of said electrical signals, said predetermined portion representing the

2. A processor according to claim 1 including a synthesis section comprising:

impulse response means coupled to said second detection means and being operative in response to the predetermined portion of said electrical signals to generate an output signal corresponding to the impulse response information;

excitation means coupled to said first detection means and being operative in response to the output signal from said first detection means to generate an excitation carrier signal; and

convolution means having input connections from said impulse response means and from said excitation means and being operative to convolve the output signals from said impulse response means and from said excitation means to

3. A digital vocoder including an analysis section comprising:

means for obtaining spectrum magnitude signals of an input speech signal having voicing and vocal tract information;

logging means coupled to said means for obtaining spectrum magnitude signals and being operative to generate output signals representing the logarithm of the spectrum magnitude of the input speech signal;

first Fourier transform means coupled to said logging means and being operative to generate output signals having magnitude and positions and representing the Fourier transform of the logarithm of spectrum magnitudes of the input speech signal;

pitch detection logic means coupled to said Fourier transform means and being operative to extract a pitch signal from the output signal of said first Fourier transform means, said pitch signal having a magnitude representing the voicing information of the input speech signal; and

selecting means coupled to said first Fourier transform means and being operative to select a predetermined number of the output signals of said first Fourier transform means, said predetermined number of output signals

4. A digital vocoder according to claim 3 including an encoding means coupled to said selecting means and being operative to quantize at a predetermined rate and scale by a predetermined factor each of the predetermined number of output signals of said Fourier transform means

5. A digital vocoder according to claim 3 including a synthesis section comprising:

second Fourier transform means being operative in response to the selected output signals of said first Fourier transform means to generate output signals representing the Fourier transform of said selected output signals of said first Fourier transform means;

delogging means coupled to said second Fourier transform means and being operative to generate output signals representing the antilogarithm of the output signals of said second Fourier transform means;

third Fourier transform means coupled to said delogging means and being operative to generate output signals representing the vocal tract information of the input speech signal;

pitch carrier generator coupled to said pitch detection logic means and being operative in response to said pitch signal to generate pitch carrier signals having predetermined rates; and

convolution unit coupled to said third Fourier transform means and to said pitch carrier generator and being operative to combine the output signals of said third Fourier transform means and the pitch carrier signals from said pitch carrier generator to thereby generate the synthesized version

6. A digital vocoder according to claim 3 wherein said means for obtaining the spectrum magnitude signals of an input speech signal includes:

transducer means being operative to convert said input signal into an electrical input speech signal;

an analog to digital converter connected to said transducer means and being operative to convert said electrical input speech signal into a digital speech signal;

computer means coupled to said analog to digital converter and being operative to generate real and imaginary signals representing the spectrum of the digital speech signal; and

a magnitude computation circuit connected to said computer means and being operative to combine in a predetermined manner said real and imaginary signals to generate the spectrum magnitude signals of said input speech

7. A digital vocoder according to claim 6 further including a normalization unit connected between said analog to digital converter means and said computer means and being operative to change the level of the input signals a predetermined factor to maintain the peak value of the digital speech signal to said computer means within a predetermined dynamic range.

8. A digital vocoder according to claim 6 further including a weighting function circuit connected between said analog to digital converter means and said computer means and being operative to weight the digital speech

9. A digital vocoder according to claim 3 wherein said pitch detection logic means includes:

selection means having an input connection from said first Fourier transform means and being operative to select the output signal of said first Fourier transform means having the largest magnitude;

first comparator means having an input connection from said selection means and a first and second output connection, said first comparator means being operative to compare the magnitude of the selected output signal of said selection means to a predetermined threshold level and to generate an output signal at said first output connection if the magnitude of said selected output signal exceeds the predetermined threshold level and to generate a predetermined output signal at said second output connection if the magnitude of said selected output signal is less than the predetermined threshold level; and

buffer storage means having a first input connection connected to the common juncture of said selection means and said first comparator means, a second input connection connected to the first output connection of said first comparator means and an output terminal and being operative to store the output signal from said selection means and to shift the stored signal to the output terminal upon the receipt of a signal from said first comparator means,

whereby an unvoiced speech signal is indicated when said first comparator means has an output signal at said second output connection and a voiced speech signal is indicated when the output signal of said first Fourier transform means is shifted to the output of said buffer storage means.
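As a minimal sketch of the decision recited in claim 9, the selection means and first comparator can be modeled as a peak pick followed by a threshold test. The function name `detect_pitch`, the search bounds `lo` and `hi`, the example values, and the treatment of a peak exactly equal to the threshold as unvoiced are assumptions made for illustration; the claim does not specify them.

```python
def detect_pitch(coeffs, lo, hi, threshold):
    """Return (voiced, lag); lag is None when the frame is judged unvoiced."""
    # Selection means: pick the coefficient with the largest magnitude
    # inside an assumed search range lo..hi (inclusive).
    lag = max(range(lo, hi + 1), key=lambda q: coeffs[q])
    # First comparator: voiced only if the selected peak exceeds the threshold.
    if coeffs[lag] > threshold:
        # Buffer storage: the selected value is released as the pitch signal.
        return True, lag
    return False, None


# Hypothetical coefficients: a large low-index value (envelope) lies outside
# the search range, and a peak at index 20 represents the pitch.
example = [0.0] * 40
example[5] = 9.0
example[20] = 3.0

voiced_result = detect_pitch(example, 10, 39, 1.0)
unvoiced_result = detect_pitch(example, 10, 39, 5.0)
print(voiced_result, unvoiced_result)
```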

10. A digital vocoder according to claim 9 further including means for determining voicing information having input connections connected to said means for obtaining spectrum magnitude signals and the first output connection of said first comparator means, a first output connection connected to the second input connection of said buffer storage means and a second output connection and being operative in response to the spectrum magnitude signals to provide an output at said first output connection when said spectrum magnitude signals include a voiced signal and to provide an output signal at said second output connection when said

11. A digital vocoder according to claim 10 wherein said means for determining voicing information comprises:

means connected to said means for obtaining spectrum magnitude signals for computing a first output signal representing the low-band energy of the spectrum magnitude signals and a second output signal representing the high-band energy of the spectrum magnitude signals;

means for combining said first output signal representing the low-band energy with said second output signal representing the high-band energy to form a composite signal representing the ratio of said first and second output signals;

second comparator means having an input connection coupled to said means for computing, an output connection, and a predetermined threshold level and being operative to generate an output signal at its output connection when the output signal representing the low-band energy is greater than its predetermined threshold level;

third comparator means having an input connection coupled to said means for combining, an output connection and a predetermined threshold level and being operative to generate an output signal at its output connection when said composite signal representing the ratio of said first and second output signals is greater than its predetermined threshold level; and

fourth comparator means having a first input connection coupled to the output connection of said second comparator means, a second input connection coupled to the output connection of said third comparator means and a first output connection coupled to said buffer storage means and a second output connection and being operative to generate a signal at its first output connection when two predetermined signals are received at its first and second input connections, respectively, and to generate a signal at its second output connection when only one predetermined signal is

12. A digital vocoder according to claim 7 further including a denormalizing unit coupled to said normalization unit and to said first Fourier transform means and being operative to alter the magnitude of the output signal of said first Fourier transform means in a predetermined manner related to the predetermined factor of said normalization unit.
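The reason a denormalizing step can act on a single output value can be checked numerically: multiplying the input by a gain adds the same constant to every logged spectrum sample, and the transform of a constant is confined to the zeroth coefficient. The sketch below assumes a base-2 logarithm and a 1/sqrt(N) transform scaling; the helper names, the test frame, and the gain are hypothetical, and any fixed-point scaling constants used by the actual circuit are not modeled.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def log2_spectrum_transform(frame):
    """Transform of log2|spectrum| with an assumed 1/sqrt(N) scaling."""
    logged = [math.log2(abs(c)) for c in dft(frame)]
    n = len(logged)
    return [c.real / math.sqrt(n) for c in dft(logged)]


frame = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical input frame
gain = 4.0                                         # hypothetical normalization factor

base = log2_spectrum_transform(frame)
scaled = log2_spectrum_transform([gain * s for s in frame])

# Only the zeroth coefficient moves, by sqrt(N) * log2(gain) under this scaling.
shift = math.sqrt(len(frame)) * math.log2(gain)
restored = [scaled[0] - shift] + scaled[1:]
print(max(abs(r - b) for r, b in zip(restored, base)))
```

Subtracting the known shift from the zeroth coefficient recovers the coefficients of the unnormalized frame to within rounding error.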

13. A digital vocoder according to claim 12 wherein said denormalizing unit is a computer capable of solving the equation

$C_o = C'_o - 16\sqrt{N}\,\log_2(G_N)$

where $C_o$ is the altered magnitude, $C'_o$ is the unaltered magnitude, $N$ is the selected predetermined number of output signals from said first Fourier transform means and $G_N$ is the predetermined factor

14. A digital vocoder according to claim 4 wherein said encoding means comprises:

scaling factor storage means operative to store a predetermined scaling factor for each of the predetermined number of output signals of said first Fourier transform means;

scaling means coupled to said scaling factor storage means and to selecting means and being operative to add each of the predetermined scaling factors to a separate one of the predetermined number of output signals of said first Fourier transform means to eliminate negative values in said predetermined number of output signals;

ratio storage means operative to store a predetermined ratio signal for each of the predetermined number of output signals of said first Fourier transform means; and

multiplier means coupled to said scaling means and said ratio storage means and being operative to multiply each of the scaled output signals of said scaling means by a corresponding ratio signal stored in said ratio storage means to thereby quantize each of the predetermined numbers of output

15. A digital vocoder according to claim 14 further including gating means coupled to said multiplier means and being operable to gate certain ones of said predetermined number of output signals of said first Fourier transform means at a first predetermined rate and to gate the remainder of the output signals of said first Fourier transform means at a second

16. A digital vocoder according to claim 5 wherein said second Fourier transform means is a Fourier transform computer means operable to solve the expression

where $V_n$ is the $n^{\text{th}}$ frequency sample of the selected output signals of said first Fourier transform means, $C_k$ is the $k^{\text{th}}$ sample of the selected output signals of said first Fourier transform

17. A digital vocoder according to claim 5 wherein said pitch carrier generator includes:

first means responsive to said pitch signal from said pitch detection logic means for generating a first predetermined pitch carrier signal when the magnitude of the pitch signal indicates a voiced signal;

second means responsive to said pitch signal from said pitch detection logic means for generating a second predetermined pitch carrier signal when the magnitude of the pitch signal indicates an unvoiced signal; and

gating means coupled to said first and second means for generating and being operative to gate a first predetermined pitch carrier signal to said convolution means when the magnitude of the pitch signal is less than a predetermined magnitude and to gate a second predetermined pitch carrier signal to said convolution means when the magnitude of the pitch signal is

18. A digital vocoder according to claim 17 wherein said first means for generating includes:

third means for generating signals, the magnitudes of which describe a predetermined function;

fourth means for generating signals, the magnitudes of which describe the slope of a line connecting the magnitudes of two successive pitch signals from said pitch detection means; and

first comparator means having input connections coupled to said third and fourth means for generating and an output connection coupled to said gating means and being operative to generate a first predetermined pulse when the signals from said fourth means for generating are equal to or greater than the magnitude of the signal from said third means for

19. A digital vocoder according to claim 18 including an inhibiting means responsive to said pitch signal from said pitch detection logic means to inhibit the second predetermined pitch carrier signal of said second means

20. A digital vocoder according to claim 18 wherein said third means for generating signals includes:

first storage counter means having a first input connection and an output connection and being operative to store a first predetermined signal, to add to said first predetermined signal a second predetermined signal appearing at said first input connection and to supply the resultant signal to said output connection;

slope means for generating a third predetermined signal; and

first summation means having a first input connection coupled to the output connection of said first storage counter means, a second input connection coupled to said slope means and an output connection coupled to said first input connection of said storage counter means and to said gating means of said pitch carrier generators, said first summation means being operative to add the resultant signal of said first storage counter means to the third predetermined signal from said slope means to form said second predetermined signal and to direct said second predetermined signal simultaneously to said gating means of said pitch carrier generator and to said first storage counter means to update said first predetermined signal

21. A digital vocoder according to claim 18 wherein said fourth means for generating signals includes:

means for computing a slope signal $m$ wherein $m = \dfrac{T_{p(n-1)} - T_{pn}}{T}$, where $T_{pn}$ is a first pitch signal received from said pitch detection logic at a first predetermined time, $T_{p(n-1)}$ is a second pitch signal received from said pitch detection logic at a second predetermined time and $T$ is the elapsed time between said first and second predetermined times;

second storage counter means having a first input connection and an output connection and being operative to store a first predetermined signal, to add to said first predetermined signal a second predetermined signal appearing at said first input connection and to supply the resultant signal to said output connection; and

second summation means having a first input connection coupled to the output connection of said second storage counter means, a second input connection coupled to said means for computing a slope signal and an output connection coupled to said first input connection of said second storage counter means and to said gating means of said pitch carrier generator, said second summation means being operative to add the resultant signal of said second storage counter means to the slope signal from said means for computing a slope signal to form said second predetermined signal and to direct said second predetermined signal to said gating means of said pitch carrier generator and to said second storage means to update said first predetermined signal stored therein.
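The slope computation and the storage-counter and summation loop of claim 21 can be sketched as a running accumulator. This is an illustration only: the function names, the example pitch values, the choice of counting the elapsed time in update steps, and the assumption that the counter is initialized with the first pitch signal are hypothetical and are not stated in the claim.

```python
def slope_signal(t_p_prev, t_p_cur, elapsed):
    """Slope m = (T_p(n-1) - T_pn) / T between two successive pitch signals."""
    return (t_p_prev - t_p_cur) / elapsed


def accumulate(start, increment, steps):
    """Storage counter plus summation: add the increment and feed the sum back."""
    value = start
    outputs = []
    for _ in range(steps):
        value = value + increment  # summation means
        outputs.append(value)      # result is stored and also sent onward
    return outputs


# Hypothetical pitch signals, with the elapsed time counted in update steps.
m = slope_signal(t_p_prev=80.0, t_p_cur=64.0, elapsed=4)
ramp = accumulate(start=64.0, increment=m, steps=4)
print(m, ramp)
```

Under these assumptions the accumulator moves in equal increments along the straight line joining the two pitch values, which is the behavior the claim attributes to the fourth means for generating signals.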

22. A digital vocoder according to claim 5 including a weighting circuit having an input connection coupled to said third Fourier transform means and an output connection coupled to said convolution means and being operative to apply weighting function signals to the output signals of said third Fourier transform means to thereby improve the quality of the

23. A digital vocoder according to claim 22 wherein the weighting circuit includes:

a masking circuit having an input connection coupled to said third Fourier transform means and being operative to select a predetermined number of the output signals of said third Fourier transform means;

weighting function storage means being operative to store a predetermined number of signals corresponding to the predetermined number of output signals selected by said masking circuit; and

multiplier means having input connections coupled to said masking circuit and to said weighting function storage means and an output connection coupled to said convolution means and being operative to multiply each of the predetermined number of output signals selected by said masking circuit by a different one of the predetermined number of signals stored in said weighting function storage means to thereby weight the vocal tract

24. A digital vocoder according to claim 5 wherein said convolution unit includes:

logic means having a first input connection coupled to said pitch carrier generator, a second input connection coupled to said third Fourier transform means, and first, second, third and fourth output connections, said logic means being operative in response to a first predetermined time period to provide a data path from said first and second input connections to said first and second output connections, respectively, and being operative in response to a second predetermined time period to provide a data path from said first and second input connections to said third and fourth output connections respectively;

first storage means having first and second input connections coupled respectively to said first and second output connections of said logic means and a plurality of output connections, said first storage means being operative to store the output signals representing the vocal tract information received from the third Fourier transform means via the data path established by said logic means during said first predetermined time period and to gate from a different one of said plurality of output connections a complete set of vocal tract signals upon the receipt of each signal from said pitch carrier generator during said first predetermined time period;

second storage means having first and second input connections coupled respectively to said third and fourth output connections of said logic means and a plurality of output connections, said second storage means being operative to store the output signals representing the vocal tract information received from the third Fourier transform means via the data path established by said logic means during said second predetermined time period and to gate from a different one of said plurality of output connections a complete set of vocal tract signals upon receipt of each signal from said pitch carrier generator during said second predetermined time period; and

summing means having a plurality of input connections each coupled to one of said plurality of output connections of said first and second storage means and being operative to add the vocal tract signals from said first and second storage means whereby a synthesized version of the input speech

25. A vocoder system for synthesizing a first speech signal and analyzing a second speech signal simultaneously, said first and second speech signals including voicing and vocal tract information, said digital vocoder comprising:

means for generating a pitch carrier signal from said first speech signal;

means for obtaining the frequency spectrum magnitude signals of said first speech signal;

means coupled to said means for obtaining the frequency spectrum magnitudes of said first speech signal for converting the frequency spectrum magnitudes into signals having a first predetermined symmetry;

means for obtaining the frequency spectrum magnitudes of a second speech signal;

means coupled to said means for obtaining the frequency spectrum magnitudes of said second speech signal for generating signals having a second predetermined symmetry and representing the logarithm of the frequency spectrum magnitudes of said second speech signal;

summing means coupled to said means for converting and to said means for generating and being operative to sum said signals having a first predetermined symmetry and said signals having a second predetermined symmetry to form a composite signal;

computing means having an input connection coupled to said summing means and first and second output connections, said computing means being operative to compute a first and second set of signals representing the complex Fourier transform of said composite signal, said first set of signals having said first predetermined symmetry and being directed to said first output connection and said second set of signals having said second predetermined symmetry and being directed to said second output connection;

convolution means coupled to said means for generating a pitch carrier signal and to said first output connection of said computing means and being operative to combine in a predetermined manner said pitch carrier signal and said first set of signals having said first predetermined symmetry to thereby generate a synthesized version of said first speech signal;

pitch detection means coupled to said second output connection of said computing means and being operative to extract the voicing information of said second speech signal from said second set of signals having said second predetermined symmetry; and

selection means coupled to said second output connection of said computing means and being operative to select a predetermined number of said set of signals having said second predetermined symmetry, said selected signals representing the vocal tract information of said second speech signal.
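One way to read the "first" and "second predetermined symmetry" of claim 25 is as even and odd symmetry, which lets a single complex transform serve two real sequences at once. The claim does not name the symmetries, so that reading is an assumption, as are the helper names and the example data in the sketch below. It relies on a standard property: the transform of a real even sequence is purely real, and the transform of a real odd sequence is purely imaginary.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def transform_pair(even_seq, odd_seq):
    """Transform a real even and a real odd sequence with one complex DFT."""
    composite = [e + o for e, o in zip(even_seq, odd_seq)]  # summing means
    spectrum = dft(composite)                                # computing means
    even_out = [complex(c.real, 0.0) for c in spectrum]  # transform of even_seq
    odd_out = [complex(0.0, c.imag) for c in spectrum]   # transform of odd_seq
    return even_out, odd_out


# Hypothetical data: a_even[t] == a_even[-t] and b_odd[t] == -b_odd[-t] (mod 8).
a_even = [3.0, 1.0, 2.0, 0.0, 5.0, 0.0, 2.0, 1.0]
b_odd = [0.0, 1.0, -2.0, 0.5, 0.0, -0.5, 2.0, -1.0]

even_out, odd_out = transform_pair(a_even, b_odd)
error_even = max(abs(x - y) for x, y in zip(even_out, dft(a_even)))
error_odd = max(abs(x - y) for x, y in zip(odd_out, dft(b_odd)))
print(error_even, error_odd)
```

Splitting the composite transform into its real and imaginary parts yields the two separate transforms, which corresponds to directing the two sets of signals to separate output connections.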

26. A vocoder system for synthesizing a first speech signal and analyzing a second speech signal simultaneously with said first and second speech signals including voicing and vocal tract information, said digital vocoder system comprising:

means for generating a pitch carrier signal from said first speech signal;

means for obtaining the frequency spectrum magnitudes of said first speech signal;

means coupled to said means for obtaining the frequency spectrum magnitude of said first speech signal for converting the frequency spectrum magnitudes into signals having a first predetermined symmetry;

computing means having first and second input ports and first, second, third and fourth output ports and being operable to compute simultaneously the Fourier transform of a set of first predetermined input signals at said first input ports, said set of first predetermined signals having a composite symmetry of said first and second predetermined symmetries and the Fourier transform of a set of second predetermined input signals at said second input port, said set of second predetermined signals having first and second predetermined symmetries and operable to direct to said first, second, third and fourth output ports respectively a first set of output signals representing the Fourier transform of the portion of the set of first predetermined input signals having the second predetermined symmetry, a second set of output signals representing the Fourier transform of the portion of the set of first predetermined input signals having the first predetermined symmetry, a third set of output signals having the first predetermined symmetry and representing the Fourier transform of the portion of the set of second predetermined input signals at said second input port and a fourth set of output signals representing the Fourier transform of the portion of the set of second predetermined input signals having the second predetermined symmetry;

sampling means having an output connection coupled to said first input port of said computing means and being operable to sample said second speech signal over a first predetermined time interval, said first and second sets of output signals of said computing means representing the spectrum of said sampled second input speech signal;

magnitude means coupled to said first and second output ports of said computing means and being operative to combine in a predetermined manner said first and second sets of output signals of said computing means to generate signals representing the frequency spectrum magnitudes of said second speech signal;

means coupled to said magnitude means for generating output signals having a second predetermined symmetry and representing the logarithm of the frequency spectrum magnitudes of said second speech signal;

summing means having input connections coupled to said means for converting and to said means for generating and an output connection coupled to said second input port of said computing means and being operative to sum said signals having a first predetermined symmetry with said signals having said second predetermined symmetry to form said set of second predetermined input signals,

whereby said third set of output signals of said computing means represents the vocal tract information of said first speech signal and said fourth set of output signals of said computing means is the Fourier transform of the logarithm of the spectrum magnitudes representing the voicing and vocal tract data of said second speech input signal;

pitch detection logic means coupled to the fourth output port of said computing means and being operative to extract a pitch signal from the fourth set of output signals of said computing means to thereby represent the voicing information of said second input speech signal;

selecting means coupled to the fourth output port of said computing means and being operative to select a predetermined number of the fourth set of output signals to represent the vocal tract information of said second input speech signal; and

convolution means coupled to said means for generating a pitch carrier signal from said first speech signal and to the third output port of said computing means and being operative to combine in a predetermined manner the pitch carrier signals with the third set of output signals of said

27. A vocoder system according to claim 26 wherein said means for generating a pitch carrier signal includes:

first means responsive to said first speech signal for generating a first predetermined pitch carrier signal when the magnitude of the pitch signal indicates a voiced signal;

second means responsive to said pitch signal from said pitch detection logic means for generating a second predetermined pitch carrier signal when the magnitude of the pitch signal indicates an unvoiced signal; and

gating means coupled to said first and second means for generating and being operative to gate a first predetermined pitch carrier signal to said convolution means when the magnitude of the pitch signal is less than a predetermined magnitude and to gate a second predetermined pitch carrier signal to said convolution means when the magnitude of the pitch signal is

28. A vocoder system according to claim 27 wherein said first means for generating includes:

third means for generating signals, the magnitudes of which describe a predetermined function;

fourth means for generating signals, the magnitudes of which describe the slope of a line connecting the magnitudes of the voiced information of two successive first input signals; and

first comparator means having input connections coupled to said third and fourth means for generating and an output connection coupled to said gating means of said means for generating a pitch carrier signal and being operative to generate a first predetermined pulse when the signals from said fourth means for generating are equal to or greater than the

29. A vocoder system according to claim 28 including an inhibiting means responsive to said voicing information of said first input speech signal to inhibit the second predetermined pitch carrier signal of said second means for generating when the voicing information exceeds a predetermined

30. A vocoder system according to claim 29 wherein said third means for generating signals includes:

first storage counter means having a first input connection and an output connection and being operative to store a first predetermined signal, to add to said first predetermined signal a second predetermined signal appearing at said first input connection and to supply the resultant signal to said output connection;

slope means for generating a third predetermined signal; and

first summation means having a first input connection coupled to the output connection of said first storage counter means, a second input connection coupled to said slope means and an output connection coupled to said first input connection of said storage counter means and to said gating means of said means for generating a carrier generator, said first summation means being operative to add the resultant signal of said first storage counter means to the third predetermined signal from said slope means to form said second predetermined signal and to direct said second predetermined signal simultaneously to said gating means of said means for generating a pitch carrier and to said first storage counter means to update said first

31. A vocoder system according to claim 30 wherein said fourth means for generating signals includes:

means for computing a slope signal m wherein $m = (T_{p(n-1)} - T_{pn})/T$, where $T_{pn}$ is a first voicing signal received from said first input speech signal at a first predetermined time, $T_{p(n-1)}$ is a voicing signal received from said first input speech signal at a second predetermined time and T is the elapsed time between said first and second predetermined times;

second storage counter means having a first input connection and an output connection and being operative to store a first predetermined signal, to add to said first predetermined signal a second predetermined signal appearing at said first input connection and to supply the resultant signal to said output connection; and

second summation means having a first input connection coupled to the output connection of said second storage counter means, a second input connection coupled to said means for computing a slope signal and an output connection coupled to said first input connection of said second storage counter means and to said gating means of said means for generating a pitch carrier signal, said second summation means being operative to add the resultant signal of said second storage counter means to the slope signal from said means for computing a slope signal to form said second predetermined signal and to direct said second predetermined signal to said gating means of said means for generating a pitch carrier signal and to said second storage counter means to update said first

32. A vocoder system according to claim 25 wherein said means for obtaining the frequency spectrum magnitude of said first speech signal includes:

Fourier transform computer means operable to solve the expression

where V.sub.n is the n.sup.th frequency sample of said first speech signal, C.sub.k is the k.sup.th sample of said first speech signal and k and R are predetermined limits of summation; and

delogging computer means operative to obtain the antilogarithm of said expression to yield the frequency spectrum magnitude of said first speech

33. A vocoder system according to claim 26 wherein:

said first and second input ports of said computing means are real and imaginary input ports respectively;

said first and second predetermined symmetries of said set of first predetermined input signals are even and odd symmetries respectively;

said set of first predetermined input signals includes 256 samples of said input speech signal;

said first set of output signals at said first output port of said computing means includes 128 samples having even symmetry and representing the Fourier transform of the even portion of the 256 samples at the real input port of said computing means;

said second set of output signals at said second output port of said computing means includes 128 samples representing the Fourier transform of the portion of the 256 input samples at said real input port having odd symmetry, said first and second sets of output signals representing, respectively, the real and imaginary parts of the frequency spectrum of the 256 samples of the second input speech signal at the real input port of said computing means;

said set of second predetermined input signals at said imaginary input port of said computing means includes 256 samples having even and odd symmetry associated therewith, said even symmetry portion representing the logarithm of the spectrum magnitudes of the second input speech signal and said odd symmetry portion representing the frequency spectrum of the first input speech signal;

said third set of output signals at the third output port of said computing means includes 128 samples having odd symmetry and representing the Fourier transform of the odd symmetry portion of 256 samples at said imaginary input port of said computing means, said 128 samples at said third output port of said computing means represents the vocal tract information of said first speech signal; and

said fourth set of output signals at the fourth output port of said computing means includes 128 samples having even symmetry and representing the Fourier transform of the logarithm of the spectrum magnitudes from which the vocal tract and the voicing information of the second input

34. A vocoder system according to claim 33 wherein said pitch detection logic means includes:

selection means having an input connection coupled to said fourth output port of said computing means and being operative to select the sample of said fourth set of output signals having the largest magnitude;

first comparator means having an input connection coupled to said selection means and a first and second output connection, said first comparator means being operative to compare the magnitude of the selected output signal of said selection means to a predetermined threshold level and to generate an output signal at said first output connection if the magnitude of said selected sample exceeds the predetermined threshold level and to generate a predetermined output signal at said second output connection if the magnitude of said selected sample is less than the predetermined threshold level; and

buffer storage means having a first input connection connected to the common juncture of said selection means and said first comparator means, a second input connection connected to the first output connection of said first comparator means and an output terminal and being operative to store the selected sample from said selection means and to shift the stored sample to said output terminal upon receipt of a signal from said first comparator means,

whereby an unvoiced second speech signal is indicated when said first comparator means has an output signal at said second output connection and a voiced signal is indicated when the fourth output signal of said computing means is shifted to the output of said buffer storage means.

35. A vocoder system according to claim 33 wherein said convolution unit includes:

logic means having a first input connection coupled to said means for generating a pitch carrier signal, a second input connection coupled to the third output port of said computing means and first, second, third and fourth output connections, said logic means being operative in response to a first predetermined time period to provide a data path from said first and second input connections to said first and second output connections, respectively, and being operative in response to a second predetermined time period to provide a data path from said first and second input connections to said third and fourth output connections respectively;

first storage means having first and second input connections coupled respectively to said first and second output connections of said logic means and a plurality of output connections, said first storage means being operative to store the output signals representing the vocal tract information received from the third output port of said computing means via the data path established by said logic means during said first predetermined time period and to gate from a different one of said plurality of output connections a complete set of vocal tract signals upon the receipt of each signal from said means for generating a pitch carrier signal during said first predetermined time period;

second storage means having first and second input connections coupled respectively to said third and fourth output connections of said logic means and a plurality of output connections, said second storage means being operative to store the output signals representing the vocal tract information received from the third output port of said computing means via the data path established by said logic means during said second predetermined time period and to gate from a different one of said plurality of output connections a complete set of vocal tract signals upon receipt of each signal from said pitch carrier generator during said second predetermined time period; and

summing means having a plurality of input connections each coupled to one of said plurality of output connections of said first and second storage means and being operative to add the vocal tract signals from said first and second storage means whereby a synthesized version of the first input

36. A method of compressing the bandwidth of an input signal having an excitation portion and an impulse response portion comprising the steps of:

generating a time variant electrical signal representing the Fourier transform of the logarithm of the spectrum magnitude of the input signal;

separating out a first time interval signal of said time variant electrical signal to represent the impulse response portion of the input signal; and

separating out a second time interval signal of said time variant electrical signal to represent the excitation portion of the input signal, said first and second time interval signals of said time variant

37. A method of simultaneously synthesizing a first speech signal and analyzing a second speech signal, said first and second speech signals including voicing and vocal tract data, said method comprising the steps of:

generating a pitch carrier signal from said first speech signal;

generating the frequency spectrum magnitude signals of said first speech signal;

converting the frequency spectrum magnitude signals into signals having a first predetermined symmetry;

generating the frequency spectrum magnitude signals of the second speech signal;

converting the frequency spectrum magnitude signals of the second speech signal into a series of signals having a second predetermined symmetry and representing the logarithm of the frequency spectrum magnitudes of said second speech signal;

combining the signals having the first predetermined symmetry with the series of signals having the second predetermined symmetry to generate a series of composite signals;

generating from said series of composite signals first and second sets of signals representing the complex Fourier transform of the composite signal, said first set of signals having said first predetermined symmetry and said second set of signals having said second predetermined symmetry;

combining the pitch carrier signal from said first speech signal and said first set of signals having said first predetermined symmetry to thereby generate a synthesized version of the first speech signal;

selecting a predetermined number of said second set of signals to represent the vocal tract data of said second input speech signal; and

selecting a predetermined number of the remaining signals of said second set of signals to represent the voicing information of said second input speech signal.
Description



BACKGROUND OF THE INVENTION

This invention relates to speech compression systems and in particular to digital vocoder systems.

It is well-known that the vocal tract, consisting of throat, mouth, tongue, lips, teeth and nasal passages, forms a time varying linear filter whose amplitude response versus frequency characteristic is responsible for practically all the information content in a speech signal. This filter is driven by energy sources, commonly known as "buzz" and "hiss" energy sources.

The term "buzz" is associated with the type of vocal source excitation function which exists when the vocal cords are oscillating at some quasi-periodic rate (called the pitch). Under this condition the chest cavity is supplying puffs of air to the vocal tract at the quasi-periodic rate at which the vocal cords are oscillating. The term "hiss" is associated with the type of vocal source excitation which exists when the vocal cords are not oscillating in a quasi-periodic manner but are always allowing air to pass through from the chest cavity and excite the vocal tract.

For voiced sounds, e.g., vowels, the excitation is from the buzz energy source. For unvoiced sounds, e.g., ss, sh, f and whispered speech, the excitation is from the hiss source. The information content is impressed upon the speech signal by the vocal tract acting essentially as a time varying distributed constant linear filter. Thus, to recreate speech which is both intelligible and natural sounding, it is necessary to use both the information describing the time varying spectral shape and the information describing the buzz and hiss energy sources. The latter information generally takes the form of measurements of the fundamental frequency of the buzz source as a function of time (pitch extraction). Information as to whether the excitation is buzz or hiss is used by the speech compression system. Combinations of buzz and hiss excitation are used to generate some sounds, but speech compression systems do not generally try to detect the combined excitation. A decision is usually made as to whether to use buzz or hiss excitation for this combined excitation in a speech compression system of this type.

Speech compression systems using spectral analysis are generally called vocoders. In existing speech compression systems, the spectrum data are transmitted by digitally encoding the logarithm of about 16 voltage spectrum amplitudes which are derived from a filter bank spectrum analyzer. This method is known to be inefficient because of the high correlations among the various spectrum amplitudes. Various techniques are now used to remove these correlations and therefore reduce the required data rate for a given transmission fidelity. One approach which produces some improvements is the use of a delta pulse code modulation scheme in which only the decibel differences in level between adjacent frequency channels are transmitted. Another scheme is to form weighted sums of the logged, digitized spectrum amplitudes, the weighting being arranged so that cross-correlations of the speech wave against a waveform derived from the input speech are markedly reduced.

Another type of vocoder is called the autocorrelation vocoder which derives its name from the fact that in the first step of the analysis process the autocorrelation function of the speech input is measured in terms of orthonormal functions. Just as the power spectrum of the speech input varies with time (as a talker articulates various sounds), so does the autocorrelation function. There is a one-to-one correspondence between the power spectrum and the autocorrelation function of the speech signal so that measuring one is equivalent to measuring the other. Mathematically, the power spectrum and the autocorrelation function are Fourier transform pairs. Thus, autocorrelation is simply an alternative method of measuring the short time energy spectrum of the speech signal. In an autocorrelation vocoder, the input signal is first applied to the inputs of a set of orthogonal filters. The filter output signals are multiplied by the input speech signal, and the product signal is then directed through low pass filters. The output signals from the low pass filter are the coefficients in an expansion of the power spectrum.

The power spectrum $P(f)$ of a speech signal is the product of the power spectrum of a pitch excitation, $V(f)$, and the magnitude squared $|H(f)|^2$ of a vocal tract transfer function $H(f)$.

$$P(f) = |H(f)|^2\, V(f) \tag{1}$$

As stated above, the autocorrelation function is the Fourier transform of $P(f)$ and is composed of the convolution of the transforms of $|H(f)|^2$ and $V(f)$. Practically, this means that the autocorrelation function repeats itself at multiples of the pitch period, and it is necessary to represent the vocal tract out to fairly large delay values (near one-half of a pitch period) in order to represent the speech spectrum with any fidelity. The overlap of successive autocorrelation functions due to convolution properties raises some doubt as to the validity of the values of the autocorrelation function alone as a measure of the vocal tract shape. While the autocorrelation vocoder obtains nearly independent spectral measurements, it does not solve the problem caused by confounding the spectral envelope (vocal tract) data with the excitation spectrum data, which results in higher order transmitted coefficients. Furthermore, this type of vocoder is basically an analog device yielding an output consisting of voltage spectrum values which are subsequently digitized.
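The Fourier transform pair relationship between the power spectrum and the autocorrelation function can be checked numerically. The following is a minimal sketch only, assuming a sampled frame, a circular (periodic) autocorrelation and NumPy's DFT conventions; the function names and the random test frame are hypothetical and are not taken from the patent.

```python
import numpy as np

def circular_autocorrelation_direct(x):
    """Circular autocorrelation r[m] = sum_k x[k] * x[(k + m) mod N]."""
    n = len(x)
    return np.array([np.sum(x * np.roll(x, -m)) for m in range(n)])

def circular_autocorrelation_via_spectrum(x):
    """Same quantity obtained as the inverse DFT of the power spectrum |X|^2."""
    power_spectrum = np.abs(np.fft.fft(x)) ** 2
    return np.fft.ifft(power_spectrum).real

# Hypothetical test frame (random samples standing in for a speech frame).
rng = np.random.default_rng(0)
frame = rng.standard_normal(64)

r_direct = circular_autocorrelation_direct(frame)
r_spectral = circular_autocorrelation_via_spectrum(frame)
max_abs_difference = float(np.max(np.abs(r_direct - r_spectral)))
```

The two results agree to within rounding error, which is the sense in which measuring the power spectrum is equivalent to measuring the autocorrelation function.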

SUMMARY OF THE INVENTION

Briefly, a bandwidth compression system according to the present invention includes a means for generating an electrical signal representing the Fourier transform of the logarithm of the spectrum magnitudes (FTLSM) of an input signal having excitation and impulse response information included therein. A first detection means, coupled to the means for generating the FTLSM electrical signal, is operative to separate out a first predetermined portion of the FTLSM electrical signal to represent the excitation information of the input signal. A second detection means, also coupled to the means for generating the FTLSM electrical signal, is operative to separate out a second predetermined portion of the FTLSM electrical signal to represent the impulse response information of the input signal. The bandwidth required to pass the combined first and second predetermined portions is less than the bandwidth required to pass the input signal.

The bandwidth compression system further includes a synthesis section comprising an impulse response means coupled to the second detection means and being operative in response to the predetermined number of the first set of predetermined signals to generate an output signal corresponding to the impulse response information. An excitation means, coupled to the first detection means, is operative in response to the output signal from the first detection means to generate an excitation carrier signal. A convolution means having input connections from the excitation means and the impulse response means is operative to convolve the output signals from the impulse response means and the excitation means to thereby synthesize the speech signal.

A second embodiment of a bandwidth compression system according to the present invention is operative to simultaneously synthesize a first speech signal, for example, one received from a remote terminal, and analyze a second speech signal, for example, one to be transmitted at reduced bandwidth to a remote terminal. The system includes means for generating a pitch carrier signal from the first speech signal and means for obtaining the frequency spectrum magnitude of the first speech signal. Coupled to the means for obtaining the frequency spectrum magnitude of the first speech signal is a means for converting the frequency spectrum magnitudes into signals having a first predetermined symmetry.

A computing means, having first and second input ports and first, second, third and fourth output ports, is operative to compute simultaneously the Fourier transform of a set of first predetermined input signals at the first input port, the set of first predetermined input signals having a composite symmetry of the first and second predetermined symmetries, and the Fourier transform of a set of second predetermined input signals at said second input port, the set of second predetermined signals having the first and second predetermined symmetries. The computing means directs to the first, second, third and fourth output ports, respectively, a first set of output signals representing the Fourier transform of the portion of the first predetermined input signals having the second predetermined symmetry, a second set of output signals representing the Fourier transform of the portion of the set of first predetermined input signals having the first predetermined symmetry, a third set of output signals having the first predetermined symmetry and representing the Fourier transform of the portion of the set of second predetermined input signals at said second input port; and a fourth set of output signals representing the Fourier transform of the portion of the second set of predetermined input signals having the second predetermined symmetry.

A sampling means coupled to the first input port of the computing means is operable to sample the second speech signal over a first predetermined time interval. The first and second sets of output signals of the computing means then represent the spectrum of the second input speech signal. A magnitude means, coupled to the first and second input ports of the computing means, is operative to combine in a predetermined manner the first and second sets of output signals of the computing means to generate signals representing the frequency spectrum magnitudes of the second speech signal.

Coupled to the magnitude means is a means for generating output signals having the second predetermined symmetry and representing the logarithm of the frequency spectrum magnitudes of the second speech signal. A summing means, having input connections coupled to the means for converting and to the means for generating and an output connection coupled to the second input port of the computing means, is operative to sum the signals of the first predetermined symmetry with the signals having the second predetermined symmetry to form the set of second predetermined input signals. The third set of output signals of the computing means then represents the vocal tract information of the first speech signal and the fourth set of output signals of the computing means is the FTLSM representing the voicing and vocal tract data of the second speech input signal.

A pitch detection logic means, coupled to the fourth output port of the computing means, is operative to extract a pitch signal from the fourth set of output signals to represent the voicing information of the second input speech signal. Also coupled to the fourth output port of the computing means is a selecting means operative to select a predetermined number of the fourth set of output signals to represent the vocal tract information of the second input speech signal. The pitch signal and the output signals of the selecting means represent the analyzed second input speech signal having a substantially compressed bandwidth. The pitch carrier signal and the third set of output signals of the computing means when convolved in a convolution means represent the synthesized version of the first input speech signal.

A method of compressing the bandwidth of an input signal having an excitation portion and an impulse response portion comprises the steps of generating a time variant electrical signal representing the FTLSM of the input signal, separating out a first time interval signal of the time variant electrical signal to represent the impulse response portion of the input signal and separating out a second time interval signal of said time variant electrical signal to represent the excitation portion of the input signal. The first and second time interval signals of the time variant signal represent the reduced bandwidth input signal.

BRIEF DESCRIPTION OF THE DRAWINGS

The construction and operation of the invention will be more fully understood from the following detailed description taken in conjunction with the accompanying drawings in which:

FIGS. 1A and 1B are a series of waveforms useful in explaining the concept of speech compression;

FIGS. 2A and 2B together represent a block diagram of an embodiment of an analysis section of a speech compression system in accordance with the present invention;

FIG. 3 is a series of waveforms useful in explaining the operation of the embodiment of FIGS. 2A and 2B;

FIG. 4 is a block diagram of an embodiment of a pitch detection logic unit employed in the embodiment of FIGS. 2A and 2B;

FIG. 5 is a block diagram of an energy ratio detector employed in the pitch detection logic unit of FIG. 4;

FIGS. 6 through 9B are a series of flow charts useful in implementing the functions of the pitch detection logic unit of FIGS. 2A and 2B on a programable computer;

FIGS. 10A and 10B together form a block diagram of an embodiment of a decoding device employed in the synthesis section of the speech compression system according to the present invention;

FIG. 11 is a block diagram of an embodiment of a weighting and averaging circuit employed in the synthesis section of the speech compression system according to the present invention;

FIG. 12 is a block diagram of an embodiment of a pitch carrier generator employed in the synthesis section of the speech compression system according to the present invention;

FIG. 13 is a series of waveforms useful in explaining the operation of the pitch carrier and a convolution means both of which are employed in the synthesis section of the speech compression system according to the present invention;

FIG. 14 is a block diagram of an embodiment of a convolution means employed in the synthesis section of the speech compression system according to the present invention; and

FIG. 15 is a block diagram of an embodiment of a gating circuit employed in the convolution means of FIG. 14.

DETAILED DESCRIPTION OF THE INVENTION

Mathematical Preliminaries

The vocal tract $h(\tau)$ (nasal and mouth cavities) is a time varying filter, and the vocal source $v(t)$ (chest cavity glottal source and pharynx cavity) is the oscillator function of the time varying filter. The output signal $s(t)$ then is the convolution of $v(t)$ and $h(\tau)$. It is well-known that convolution in the time domain corresponds to multiplication in the Fourier transform domain. The resultant signal $S(\omega)$ is equal to the product of $V(\omega)$ and $H(\omega)$, where $S(\omega)$, $V(\omega)$ and $H(\omega)$ are the Fourier transforms of $s(t)$, $v(t)$ and $h(\tau)$ respectively. Typical waveforms for $v(t)$, $s(t)$, $V(\omega)$, $H(\omega)$ and $S(\omega)$ are shown respectively in waveforms (a) through (e) of FIG. 1A.
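The correspondence between convolution in the time domain and multiplication in the transform domain can be illustrated on sampled signals. The sketch below is an illustration under stated assumptions: it uses circular convolution and the DFT in place of the continuous transform, and the pulse train, the decaying impulse response and all names are hypothetical.

```python
import numpy as np

def circular_convolve(v, h):
    """Direct circular convolution s[m] = sum_k v[k] * h[(m - k) mod N]."""
    n = len(v)
    return np.array([sum(v[k] * h[(m - k) % n] for k in range(n))
                     for m in range(n)])

N = 32
v = np.zeros(N)
v[::8] = 1.0               # hypothetical vocal source: one pulse every 8 samples
h = 0.7 ** np.arange(N)    # hypothetical vocal tract impulse response

s = circular_convolve(v, h)                   # time-domain convolution
S = np.fft.fft(s)                             # transform of the output signal
product = np.fft.fft(v) * np.fft.fft(h)       # V times H, sample by sample
max_abs_difference = float(np.max(np.abs(S - product)))
```

The transform of the convolved signal matches the product of the individual transforms to within rounding error.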

The mathematical basis for calculating the FTLSM of a speech signal will be discussed in conjunction with these waveforms. The vocal source output signal $v(t)$ is a quasi-periodic function with a period T for voiced sounds, with an output speech spectrum $S(\omega)$ being represented by harmonically related narrow bands of energy spaced 1/T apart. The spectrum envelope shape of waveform (e) is similar to the vocal tract transfer function $H(\omega)$ of waveform (d). Note that the spectrum $S(\omega)$ of the speech signal (waveform (e)) has a high frequency component represented by the narrow bands of harmonically spaced energy and a low frequency modulation in the form of a spectrum envelope shape.

A speech compression system which is based on obtaining vocal source information and vocal tract information could use the output speech spectrum if some convenient processing existed which would separate the envelope information from the fine structure information. One operation that can be used to separate the product of two functions is a logging operation, since the logarithm of a product is the sum of the logarithms of its factors. The resulting function after logging the spectrum has a slowly varying envelope and a fast ripple (1/T) riding on the slow envelope. The Fourier transform of the logged signal, $\log|V(\omega)|$, has a spike, the position T of which on the time axis is related to the reciprocal of the periodic component and is shown in waveform (f) of FIG. 1A. The FTLSM separates the components into two distinct time regions. The lower time region, x, represents the transform of $\log|H(\omega)|$ and the upper region has a peak which is related to the period T of the vocal source function.
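The separation into a low time region and a pitch spike can be sketched numerically. The example below is a toy illustration, not the patent's circuit: the decaying pulse train, the damped resonance, the pitch period of 32 samples, the search range starting at 16 samples and all names are assumptions made for the sketch, and the convolution is circular.

```python
import numpy as np

N = 256            # frame length, matching the example value used in the text
PITCH_PERIOD = 32  # hypothetical pitch period T, in samples

k = np.arange(N)
# Hypothetical excitation: a decaying pulse train, one pulse per pitch period.
v = np.zeros(N)
v[::PITCH_PERIOD] = 0.9 ** np.arange(N // PITCH_PERIOD)
# Hypothetical vocal tract impulse response: a single damped resonance.
h = (0.8 ** k) * np.cos(0.9 * k)

# Speech frame: excitation convolved (circularly) with the impulse response.
s = np.fft.ifft(np.fft.fft(v) * np.fft.fft(h)).real

def ftlsm(frame):
    """Fourier transform of the logarithm of the spectrum magnitude."""
    log_magnitude = np.log(np.abs(np.fft.fft(frame)))
    # The logged magnitude of a real frame is even, so its transform is real.
    return np.fft.fft(log_magnitude).real

c = ftlsm(s)

# Upper time region: the largest sample marks the pitch period.
search = np.arange(16, N // 2 + 1)
pitch_estimate = int(search[np.argmax(np.abs(c[search]))])

# Lower time region: the slowly varying envelope (vocal tract) information.
low_time_region = c[:16]
```

In this toy case the largest sample in the upper time region falls at the pitch period, while the first few samples carry the envelope contribution of the impulse response.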

In the discussion which follows, reference will be made many times to sampled functions defined at integer values of an independent variable (using K or n) ranging from 0 to a positive upper limit N-1. By way of example N will be 256 or $2^8$. In any case, it is of great practical advantage to have N defined by a relation of the form $N = n^k$ where both n and k are integers because this makes possible the use of the most efficient methods of calculating discrete Fourier transforms of sampled functions where the samples are labeled consecutively over the interval 0 to N-1. These computationally advantageous methods are generally referred to by the term "Fast Fourier Transform," or briefly the "FFT."

The sampled functions to which reference is made can be portrayed by graphs of the kind shown in waveforms (g), (h) and (i) of FIG. 1B (for purposes of illustration N=8).

In waveform (g), a very simple sampled function $f_K$ is shown. The numerical values of the samples (ordinates) are indicated by small circles. The function is defined only at discrete, integer values K = 0, 1, 2, . . . , N-1 of the abscissa. Thus, the function $f_K$ is an ordered set of N real numbers (an N-tuple) and such a function will often be referred to as a vector.

It will often be useful to break up or resolve functions like $f_K$ into their even and odd components about the point N/2. The even and odd parts of $f_K$ are thus defined as $(f_K + f_{N-K})/2$ and $(f_K - f_{N-K})/2$, respectively. $f_{N-K}$ is obtained simply by reversing the order of the samples of $f_K$ on the interval 0 to N. The even and odd parts of $f_K$ are plotted in waveforms (h) and (i), respectively, of FIG. 1B. A function is obviously the sum of its even and odd parts, e.g.,

$$f_K = \frac{f_K + f_{N-K}}{2} + \frac{f_K - f_{N-K}}{2} \tag{2}$$

The operation of calculating the even and odd parts of a function will be called even-odd separation.
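Even-odd separation can be sketched in a few lines. The sketch assumes that the sample at index N is taken to be the sample at index 0 (periodic indexing), which is one reading of "reversing the order of the samples on the interval 0 to N"; the function name and the eight sample values are hypothetical.

```python
import numpy as np

def even_odd_separation(f):
    """Split a sampled function into its even and odd parts about N/2."""
    f = np.asarray(f, dtype=float)
    n = len(f)
    # f_{N-K}: index N wraps around to index 0 (assumed periodic indexing).
    reversed_f = f[(-np.arange(n)) % n]
    even = (f + reversed_f) / 2
    odd = (f - reversed_f) / 2
    return even, odd

# Hypothetical sampled function with N = 8, as in the illustrated waveforms.
f = np.array([1.0, 4.0, 2.0, 0.0, 3.0, 5.0, 7.0, 6.0])
even, odd = even_odd_separation(f)
reconstructed = even + odd
```

Adding the two parts recovers the original samples, and the odd part is zero at K = 0 and K = N/2.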

For mathematical convenience, complex sampled functions will also be defined only for integer values of an independent variable K or n. A sampled complex function is of the form:

$$Z_K = X_K + jY_K \tag{3}$$

where, in general, the functions $X_K$ and $Y_K$ each have both an even and an odd part, which can be calculated in the same manner as the even and odd parts of the function $f_K$. The operator j is defined as

$$j = +\sqrt{-1} \tag{4}$$

and $X_K$ and $Y_K$ are real N-tuples similar to $f_K$. The functions $X_K$ and $Y_K$ are respectively referred to as the real and imaginary parts of $Z_K$. Thus, to store a complex N-tuple such as $Z_K$ for K = 0, 1, 2, . . . , N-1 in a digital machine requires N separate memory locations for the values of $X_K$ and another set of N memory locations for the values of $Y_K$.

As part of the invention, it will be necessary to calculate the discrete Fourier cosine transform (DFCT) and the discrete Fourier sine transform (DFST) of sampled functions. The DFCT of a function is itself a sampled function defined only for integer values of some independent variable (say n = 0, 1, 2, . . . , N-1). The DFCT of the function Y.sub.K is defined as

Note that since the cosine is itself an even function the DFCT of any function Y.sub.K depends only on the even part of Y.sub.K, i.e., the DFCT of an odd function is zero. It is easy to show that the DFCT of Y.sub.K is the same as the DFCT of its even part. Thus, ##SPC1##

so that the DFCT of the odd part of any function is zero.

Similarly the DFST of Y.sub.K is defined as

Since the sine is itself an odd function, the DFST of any function Y.sub.K depends only on the odd part of Y.sub.K, i.e., the DFST of an even function is zero. It is easy to show that the DFST of Y.sub.N.sub.-K is the negative of the DFST of Y.sub.K. Thus, ##SPC2##

so that the DFST of the even part of any function is zero.
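The two symmetry properties can be checked numerically. The sketch below assumes unnormalized transforms with kernels cos(2.pi.Kn/N) and sin(2.pi.Kn/N); the defining equations are not reproduced in this text, so any scale factor is an assumption, as are the function names and test data.

```python
import math


def dfct(y):
    """Discrete Fourier cosine transform (assumed unnormalized form)."""
    n_pts = len(y)
    return [sum(y[k] * math.cos(2 * math.pi * k * n / n_pts)
                for k in range(n_pts)) for n in range(n_pts)]


def dfst(y):
    """Discrete Fourier sine transform (assumed unnormalized form)."""
    n_pts = len(y)
    return [sum(y[k] * math.sin(2 * math.pi * k * n / n_pts)
                for k in range(n_pts)) for n in range(n_pts)]


def even_odd_parts(y):
    """Even and odd parts about N/2, with indices taken modulo N."""
    n_pts = len(y)
    rev = [y[(n_pts - k) % n_pts] for k in range(n_pts)]
    even = [(a + b) / 2 for a, b in zip(y, rev)]
    odd = [(a - b) / 2 for a, b in zip(y, rev)]
    return even, odd


y = [3.0, -1.0, 4.0, 1.5, -5.0, 9.0, 2.0, -6.0]   # hypothetical data
y_even, y_odd = even_odd_parts(y)

# Both of these should vanish to within rounding error.
largest_dfct_of_odd = max(abs(v) for v in dfct(y_odd))
largest_dfst_of_even = max(abs(v) for v in dfst(y_even))
```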

The FFT box or computer that will be described actually calculates the discrete Fourier transform (DFT) of a complex input vector, say Z.sub.K .sup.(1). The DFT is defined by Equation (13) with superscripts (1) and (2) denoting inputs and outputs respectively.

Z.sub.K.sup. (1) is the complex input to the FFT and Z.sub.n.sup. (2) is the complex output of the FFT. Using the well-known identity ##SPC3##

Substituting Equation 15 and Equation 3

$$Z_K^{(1)} = X_K^{(1)} + jY_K^{(1)} \qquad (3)$$

into Equation 13 yields the complex output in terms of the DFCT's and DFST's of X.sub.K.sup. (1) and Y.sub.K.sup. (1) : ##SPC4##

Inspection of Equation 16 shows the output Z.sub.n.sup. (2) of the FFT appears in two separate parts. Its real part R.sub.e [Z.sub.n.sup. (2) ] : ##SPC5##

Using the facts developed above, namely that only even functions of K have nonzero DFCT's and only odd functions of K have nonzero DFST's, we notice that R.sub.e [Z.sub.n.sup. (2) ] is the sum of the DFCT of the even part of the real input X.sub.K.sup. (1) plus the DFST of the odd part of the imaginary input Y.sub.K.sup. (1), i.e., ##SPC6##

Similarly I.sub.m [Z.sub.n.sup. (2) ] is the DFCT of the even part of the imaginary input Y.sub.K.sup. (1) minus the DFST of the odd part of the real input.
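A short numerical check of this decomposition follows. It assumes the DFT kernel exp(-j2.pi.Kn/N) with no scale factor, which is the sign convention consistent with the "plus" and "minus" in the two statements above; the helper names and input values are illustrative only.

```python
import cmath
import math


def dft(z):
    """Direct DFT with the assumed kernel exp(-j*2*pi*K*n/N)."""
    n_pts = len(z)
    return [sum(z[k] * cmath.exp(-2j * math.pi * k * n / n_pts)
                for k in range(n_pts)) for n in range(n_pts)]


def cos_transform(v):
    n_pts = len(v)
    return [sum(v[k] * math.cos(2 * math.pi * k * n / n_pts)
                for k in range(n_pts)) for n in range(n_pts)]


def sin_transform(v):
    n_pts = len(v)
    return [sum(v[k] * math.sin(2 * math.pi * k * n / n_pts)
                for k in range(n_pts)) for n in range(n_pts)]


x_in = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]   # hypothetical real input
y_in = [0.0, -1.0, 2.0, 2.5, -0.5, 1.0, 4.0, -3.0]  # hypothetical imaginary input

z_out = dft([complex(a, b) for a, b in zip(x_in, y_in)])

# Real output: DFCT of the real input plus DFST of the imaginary input.
real_expected = [c + s for c, s in zip(cos_transform(x_in), sin_transform(y_in))]
# Imaginary output: DFCT of the imaginary input minus DFST of the real input.
imag_expected = [c - s for c, s in zip(cos_transform(y_in), sin_transform(x_in))]
```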

At this point it is interesting to note that the complex output function of the FFT can be considered to be made up of four functions. The real output R.sub.e [Z.sub.n.sup. (2) ] is the sum of an even and odd function, and similarly the imaginary output also is a sum of an even and odd function.

If a procedure is defined to process one of the parts of the complex output (i.e., R.sub.e [Z.sub. n.sup.(2) ] or I.sub.m [Z.sub.n.sup. (2) ]) in such a way that one of the summation terms in the expression for R.sub.e [Z.sub.n.sup. (2) ] or I.sub.m [Z.sub.n.sup. (2) ] would drop out of the final result, then this procedure would in effect separate each of the parts of the complex output into subparts which exhibit the property of being either even functions or odd functions of K. An appropriate name for such a process might be an EVEN/ODD separator.

The development of this process will begin by first calculating the expressions for the time reversed R.sub.e [Z.sub.n.sup. (2) ], i.e., R.sub.e [Z.sup.(2).sub. N.sub.- n ]. The reversed function is written as ##SPC7##

Using familiar trigonometric identities to simplify the argument of the cosine function, it can be written

$$\cos[2\pi K(N-n)/N] = \cos(2\pi K - 2\pi Kn/N)$$

$$= \cos(2\pi K)\cos(2\pi Kn/N) + \sin(2\pi K)\sin(2\pi Kn/N)$$

$$= (1)\cos(2\pi Kn/N) + (0)\sin(2\pi Kn/N)$$

$$= \cos(2\pi Kn/N) \qquad (20)$$

Similarly it can be shown that

$$\sin[2\pi K(N-n)/N] = -\sin(2\pi Kn/N) \qquad (21)$$

Using Equations 20 and 21 in Equation 19 yields the following expression for the reversed real output: ##SPC8##

Adding together the functions R.sub.e [Z.sub.n.sup. (2) ] and R.sub.e [Z.sup.(2).sub.N.sub.-n ] and dividing by two yields the following equation: ##SPC9##

which is seen to be the even part of R.sub.e [Z.sub.n.sup. (2) ]. The odd part of R.sub.e [Z.sub.n.sup.(2) ] can be calculated by subtracting its time reversed function from R.sub.e [Z.sub.n.sup.(2) ] and dividing by two. ##SPC10##

To calculate the even and odd parts of imaginary output I.sub.m [Z.sub.n.sup.(2) ], the reversed quantity I.sub.m [Z.sup.(2).sub.N.sub.-n ] is also required. Using the identities in Equations 20 and 21 yields ##SPC11##

so that we may simply calculate the even part in a manner identical to that used for the even part of the real output of the FFT ##SPC12## and the odd part of the imaginary output of the FFT is calculated as ##SPC13##

Equations 23, 24, 26 and 27 show how two DFCT's and two DFST's are obtained by operations on the real and imaginary parts of the DFT output of the FFT box. In a later section of the application, the various even and odd parts of the real and imaginary inputs and outputs of the DFT and the corresponding FFT computer will be identified with the various time functions and discrete Fourier transforms dealt with in the speech processing system.
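The even/odd separation of the FFT output can be sketched as follows. This is an illustration under stated assumptions, not the circuit of the invention: the DFT kernel is taken as exp(-j2.pi.Kn/N) without scaling, index N is read as index 0, and all function names and data are hypothetical.

```python
import cmath
import math


def dft(z):
    n_pts = len(z)
    return [sum(z[k] * cmath.exp(-2j * math.pi * k * n / n_pts)
                for k in range(n_pts)) for n in range(n_pts)]


def even_odd_parts(v):
    """Half-sum and half-difference of v and its reversed copy."""
    n_pts = len(v)
    rev = [v[(n_pts - n) % n_pts] for n in range(n_pts)]
    return ([(a + b) / 2 for a, b in zip(v, rev)],
            [(a - b) / 2 for a, b in zip(v, rev)])


def separate_fft_output(z_out):
    """Return (even Re, odd Re, even Im, odd Im) of a complex DFT output."""
    re_even, re_odd = even_odd_parts([z.real for z in z_out])
    im_even, im_odd = even_odd_parts([z.imag for z in z_out])
    return re_even, re_odd, im_even, im_odd


def cos_transform(v):
    n_pts = len(v)
    return [sum(v[k] * math.cos(2 * math.pi * k * n / n_pts)
                for k in range(n_pts)) for n in range(n_pts)]


def sin_transform(v):
    n_pts = len(v)
    return [sum(v[k] * math.sin(2 * math.pi * k * n / n_pts)
                for k in range(n_pts)) for n in range(n_pts)]


x_in = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]   # hypothetical real input
y_in = [0.0, -1.0, 2.0, 2.5, -0.5, 1.0, 4.0, -3.0]  # hypothetical imaginary input
parts = separate_fft_output(dft([complex(a, b) for a, b in zip(x_in, y_in)]))
```

Under these assumptions the four returned vectors are, in order, the DFCT of the real input, the DFST of the imaginary input, the DFCT of the imaginary input, and the negative of the DFST of the real input, so a single complex transform yields two cosine and two sine transforms.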

System Description -- Part I

A digital vocoder system according to the present invention is shown in block diagram form in FIG. 2 and includes a transducer 10, such as a microphone, connected to an analog to digital (A/D) converter 12. The output of the A/D converter 12 is connected to a first buffer memory 14 and to a normalization unit 16 which includes an accumulator 18 and an inclusive OR matrix 20, such that blocks of samples from the A/D converter 12 are combined in the inclusive OR matrix 20. The inclusive OR of a block of samples equal in length to the updating number of samples used in the system is then processed to yield a normalizer gain factor for the samples in the buffer memory 14. While the inclusive OR matrix 20 is processing data, it also is supplying the normalization gain for the data being processed from the buffer memory 14. The data loaded into the accumulator 18 is shifted either right or left by the appropriate number of bits by the control signal defining the normalization gain derived from the inclusive OR matrix 20. Connected to the output of the normalization unit 16 is a weighting function circuit 22 which includes a digital multiplier circuit 24 having input connections from the normalization unit 16 and from a weighting function storage unit 26.

The output connection of the weighting function circuit 22 is connected to a real input R.sub.K.sup.(1) of an FFT computer means 30 such as the Sylvania Electric Products Inc. ACP-1 computer. The FFT computer means 30 includes an FFT section 32 and an even/odd (E/O) separator section 34, to be explained in detail hereinafter, and has first, second, third and fourth output terminals R.sub.n, I.sub.n, H.sub.n and C.sub.n, respectively. The first and second output terminals R.sub.n and I.sub.n are connected to a first magnitude approximation unit 38. An encoding unit 40 has input connections from the output of the normalization unit 16, the fourth output terminal C.sub.n of the FFT computer means 30 and from the first magnitude approximation unit 38 and is operative to detect a pitch signal and to encode the pitch and spectral signals for transmission. Also connected to the first magnitude approximation circuit 38 is a logging algorithm computer 44, the output of which is connected to the input of an even function generator 46. The output side of the even function generator 46 is connected through a summation circuit 48 to the imaginary part input terminal I.sub.K.sup.(1) of the FFT computer 30.

The system can be divided into two sections, analysis and synthesis sections. The units described thus far are employed in the analysis of an input speech waveform. It may be helpful at this point to describe, in conjunction with the waveform of FIG. 3, the operation of the analysis section of the vocoder system since many of the units employed in the synthesis section perform the reverse function of units in the analysis section.

An input sound wave is impressed on the transducer 10 and is converted to a continuous electrical signal shown as the solid envelope waveform (a) of FIG. 3. The continuous electrical signal is converted by the A/D converter 12 at the specified sampling rate. Blocks of 256 converted samples are stored in the buffer memory 14. The A/D converter 12 output samples are always processed by the normalizing unit 16, the purpose of which is to establish the appropriate normalizer gain so that the louder speech sounds have the N samples in their analysis intervals normalized to a fixed dynamic range, for example 6 db. For weak sounds, the normalizer gain factor will shift the N samples of the analysis interval to make the N samples appear to have more amplitude. This is done to keep the input sample level to the weighting function circuit 22 and consequently the real input R.sub.K.sup. (1) of the FFT computer means 30 at a high signal input level.

A constant scaling factor is applied to the N samples by the normalizing unit 16 before being directed to the weighting function circuit 22 where the data is effectively multiplied by a smooth weighting function. Both the normalizing unit 16 and the weighting function circuit 22 will be discussed in detail hereinafter. The sampled, normalized and weighted data is then directed to the real part input R.sub.K.sup. (1) of the FFT computer means 30.

The output signals of the FFT computer means 30 include three transforms: (1) the transform of the speech signal which includes the real and imaginary parts of the speech spectrum signal, (2) the transform of the received spectrum envelope which is the vocal tract impulse response H.sub.n and (3) transform of the logarithm of magnitude spectrum which is the FTLSM function C.sub.n. The transform of the received spectrum H.sub.n will be discussed in connection with the synthesis section of the vocoder system.

The real and imaginary parts of the speech spectrum signal are directed to the magnitude approximation circuit 38 where they are combined to obtain the magnitude of N/2 samples of the spectrum. The magnitude of each of the samples may be obtained by taking the square root of the sum of squares of the real and imaginary parts of that sample. To reduce the number of calculations, a magnitude approximation circuit 38 is employed in lieu of taking the square root of the sums of the squares of the real and imaginary parts. The output signal of the magnitude approximation circuit 38 is directed to the logging algorithm computer 44 where the N/2 samples of the spectrum magnitude are logged and directed to the even function generator 46 which converts the N/2 samples to an even function signal having N samples. The logged magnitude spectrum signal (an even function) is combined with a received signal (an odd function) from the synthesis section to form the imaginary input I.sub.K.sup.(1) of the FFT computer means 30.

The output signal from the magnitude approximation circuit 38 and the FTLSM signal C.sub.n from the FFT computer means 30 are directed to the encoding unit 40 where they are combined to extract pitch data as well as spectral envelope information for transmission to a receiving unit (not shown).

COMPONENT DESCRIPTION

FFT Computer Means 30

In this section is described the physical significance of the inputs of the FFT section 32 and the E/O separator section 34.

The FFT section 32 takes an N sample, complex input vector S.sub.K.sup.(1), and computes an N sample complex output vector S.sub.n.sup.(2) in accordance with the discrete complex FFT relation

where K = 0, 1, 2, 3, . . . , N-1

and n = 0, 1, 2, 3, . . . , N-1

In the present application, N is typically equal to 256 samples. Again, the inputs and outputs to the FFT section are denoted by superscripts, (1) for inputs and (2) for outputs.

The complex input vector S.sub.K.sup.(1) has a real part R.sub.K.sup.(1) and an imaginary part I.sub.K.sup.(1) so that

$$S_K^{(1)} = R_K^{(1)} + jI_K^{(1)} \qquad (29)$$

The real input vector, R.sub.K.sup.(1) , is contained in a set of N registers or storage locations (not shown).

The input signals R.sub.K.sup.(1) are the samples of the input speech waveform presently to be analyzed and transmitted. It is thus the sum of the even and the odd part of the N samples of the input speech waveform to be analyzed. The R.sub.K.sup.(1) signals will typically look like waveform (a) of FIG. 3 for a voiced speech signal waveform input and is, in general, neither a purely even nor an odd function of K.

Each sample of the R.sub.K.sup.(1) signal is stored in one of the N registers or storage locations, the K.sup.th sample R.sub.K.sup.(1) being in the K.sup.th location, K = 0, 1, 2, 3, . . . , N-1. Since the R.sub.K.sup.(1) signal has both an even and odd part, it will have both a nonzero discrete cosine transform R.sub.n as well as a nonzero discrete sine transform I.sub.n. The R.sub.n and I.sub.n are, respectively, the samples of the real and imaginary parts of the discrete FFT of the analyzed speech waveform R.sub.K.sup.(1). R.sub.n and I.sub.n are two of the outputs to be obtained from the FFT computer means 30.

I.sub.K.sup.(1), the imaginary input, is contained in another set of N registers or storage locations (not shown) with again the K.sup.th sample I.sub.K.sup.(1) being in the K.sup.th location, K = 0, 1, 2, 3, . . . , N-1. I.sub.K.sup.(1) is the sum of an even and odd function from the summation circuit 48. In this system the even part of I.sub.K.sup.(1) is defined as the logarithm of the spectrum magnitude of the input speech signal to be transmitted, which was spectrum analyzed during the immediately previous analysis operation of the FFT section 32.

The even part of I.sub.K.sup. (1) is 1/2(I.sub.K.sup.(1) + I.sup.(1).sub.N.sub.-K) and, for a voiced speech signal input, will typically look like the sampled function of waveform (b) in FIG. 3. Since 1/2(I.sub.K.sup.(1) + I.sup.(1).sub.N.sub.-K) or the logged spectrum magnitude of the signal to be transmitted is a purely even function of K, centered around K = N/2, it will have a nonzero discrete cosine transform C.sub.n and an identically zero discrete sine transform. Thus the Fourier cosine transform of the even part of I.sub.K.sup.(1) is C.sub.n. C.sub.n is another of the outputs obtained from the operation of the FFT section 32 and E/O separator section 34. (A typical C.sub.n function is shown in waveform (c) of FIG. 3.) The signal C.sub.n includes the samples of the cosine transform of the logarithm of the magnitude spectrum of the input speech signal. The function C.sub.n is therefore even and is referred to as the FTLSM of the input speech signal samples R.sub.K.sup.(1) for the previous analysis interval. The samples of C.sub.n for n = 0, 1, 2, 3, . . . , 20 are, for example, used in the encoding unit 40 as the spectrum envelope information to be transmitted. C.sub.o is the average value of the logged spectrum magnitude and is always the largest C.sub.n signal. For voiced speech signals, the signal C.sub.n will have a noticeable peak at a value of n = n.sub.p approximately equal to the number of waveform samples in one pitch period of the voiced sounds. Thus, n.sub.p is a measure of the pitch period of the signal to be transmitted and the value of n.sub.p is therefore measured and transmitted so that the receiver may use this information in order to synthesize a speech signal with the correct pitch.
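The pitch peak in C.sub.n can be demonstrated with a synthetic signal. The sketch below is only an illustration: the test signal (an impulse train with a hypothetical 40-sample period driving a one-pole decay that stands in for the vocal tract), the natural logarithm, the unnormalized cosine transform, and the search range 20 to 115 are all assumptions chosen for the example, and a direct DFT is used in place of an FFT for brevity.

```python
import cmath
import math

N = 256       # analysis block length used in the text
PITCH = 40    # hypothetical pitch period in samples

# Hypothetical voiced-like waveform: pulses every PITCH samples, each
# followed by a decaying response 0.6**m.
signal = [0.0] * N
for start in range(0, N, PITCH):
    for m in range(N - start):
        signal[start + m] += 0.6 ** m


def dft(x):
    n_pts = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * n / n_pts)
                for k in range(n_pts)) for n in range(n_pts)]


# Logged spectrum magnitude; for a real input it is already even about N/2.
log_magnitude = [math.log(abs(s) + 1e-12) for s in dft(signal)]


def ftlsm(n):
    """Cosine transform of the logged spectrum magnitude at index n."""
    return sum(log_magnitude[k] * math.cos(2 * math.pi * k * n / N)
               for k in range(N))


search_range = range(20, 116)
c_values = {n: ftlsm(n) for n in search_range}
n_p = max(search_range, key=lambda n: c_values[n])   # estimated pitch period
```

For this input the largest coefficient in the search range falls at the pulse spacing, so `n_p` equals `PITCH`.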

In this system, the odd part of the imaginary input I.sub.K.sup.(1) is defined as equal to a received spectrum magnitude function M.sub.K.sup.(r) (from the synthesis section) which has been arranged in reflected and inverted form so as to be an odd function of K centered about K = N/2, i.e.

$$M_K^{(r)} = \tfrac{1}{2}\left(I_K^{(1)} - I_{N-K}^{(1)}\right) \qquad (30)$$

For a voiced signal being received to be synthesized, this received spectrum magnitude will look like the sampled function in waveform (d) of FIG. 3. In waveform (d) of FIG. 3 is shown the plot of M.sub.K.sup.(r) where K represents frequency. The highest frequency in the synthesized speech signal corresponds to K = N/2 and the lowest to K = 0.

In terms of real frequency, the corresponding real frequencies involved are given by

$$f_K = rK/N \qquad (31)$$

where r is the sampling rate in samples per second. Since M.sub.K.sup.(r) is a purely odd function of K, it will have a nonzero discrete sine transform H.sub.n and an identically zero discrete cosine transform. Thus, the discrete Fourier transform of M.sub.K.sup.(r) is called H.sub.n. A typical H.sub.n is shown in waveform (c) of FIG. 3. H.sub.n is another of the outputs obtained from the operation of the FFT computer means 30. Since the H.sub.n are the samples of the discrete sine transform of the received spectrum magnitude, H.sub.n is an odd function of n. H.sub.n is the impulse response which is used in the synthesis of the received speech signal to be discussed hereinafter.
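Equation 31 is simple to evaluate. The sketch below uses N = 256 and, as an assumed example value, the 7,200 Hz sampling rate mentioned later in connection with the pitch detection logic; the function name is illustrative.

```python
def bin_frequency_hz(k, sampling_rate_hz, n_samples):
    """Real frequency of spectrum sample K per Equation 31: f_K = r*K/N."""
    return sampling_rate_hz * k / n_samples


RATE_HZ = 7200.0   # example sampling rate (assumed for this illustration)
N = 256

lowest = bin_frequency_hz(0, RATE_HZ, N)         # K = 0
highest = bin_frequency_hz(N // 2, RATE_HZ, N)   # K = N/2, half the sampling rate
spacing = bin_frequency_hz(1, RATE_HZ, N)        # spacing between adjacent samples
```

With these values the samples are spaced 28.125 Hz apart and K = N/2 corresponds to 3,600 Hz.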

Returning to the FFT Equation 28, it can be shown exactly how each of the inputs and outputs discussed above are obtained from the transform. As in Equations 13 and 16, the following identity is substituted:

also, Equation 29 is substituted into Equation (28) obtaining: ##SPC14##

At the end of the DFT or FFT operation, the output complex vector S.sub.n.sup.(2) appears in two sets of N registers or memory locations each. One set of N registers contains the real part of S.sub.n.sup.(2) given by

The other set of N registers contains the imaginary part of S.sub.n.sup.(2) given by

In each case the two sets of registers are numbered n = 0, 1, 2, 3, . . . , N-1.

The even and odd separator section operates on R.sub.n.sup.(2) to produce the even part of R.sub.n.sup. (2) ##SPC15##

The even and odd separator section 34 operates similarly on I.sub.n.sup.(2) to produce the even part of I.sub.n.sup. (2) ##SPC16##

Thus, there are basically four parts of the FFT computer means 30 output as given by Equations 36, 37, 38 and 39. Equations 36 and 39 give, respectively, the real and imaginary parts of the Fourier transform of the speech input waveform R.sub.K.sup.(1), which is to be further processed and for which information describing the spectrum magnitude envelope is to be transmitted. Equation 37 defines the transform H.sub.n of the received signal to be used by a synthesizer as the vocal tract impulse response, and Equation 38 represents the C.sub.n function (the FTLSM signal).

Logging Algorithm Computer

As stated hereinabove, the logging algorithm computer 44 must take the log of the frequency spectrum magnitudes from the magnitude approximation circuit 38. The logging algorithm computer can be any computer capable of solving the algorithm discussed hereinbelow (for example, the Sylvania Electric Products Inc. ACP-1 computer can be used). The log function can be approximated in several ways, one of which is by a set of n-1 inscribed straight lines, where n equals the number of bits in the binary integer word which describes the number to be logged. Thus if (as in the Sylvania Electric Products Inc. ACP-1 computer) one has an 11-bit magnitude to be logged, the log is approximated by ten inscribed straight lines.

The easiest way to explain the method is by way of an example. Suppose the integer to be logged is the number y, e.g.,

exponent v:  10   9   8   7   6   5   4   3   2   1   0

bit of y:     0   0   1   1   0   1   0   1   1   1   0

Furthermore, suppose, for example, that a 7-bit logarithm is desired. First, the position of the most significant "one" is found by a shift and test operation. Here this is the exponent = 8 position. The 4-bit binary code for this exponent is generated and shifted to the left three binary places. In the three empty binary places, the next three most significant bits of y are simply inserted. The resulting logarithm is 1 0 0 0 1 0 1, i.e.,

1 0 0 0 (binary code for 8) followed by 1 0 1 (next 3 bits of y)

The rationale behind the method is simple. The number to be logged in the example was

$$y = 2^{8}[1 + x]$$

where

$$0 \le x \le 1$$

Therefore

$$\log_2 y = 8 + \log_2[1 + x]$$

The method simply replaces log.sub.2 [1 + x] by x and codes the result in binary. In general, then one forms the approximation

$$\log_2 2^{v}[1 + x] \approx v + x$$

where v is the exponent of the position where the most significant "one" appears. In most engineering applications, there will be no point in taking more than about three bits for x since this gives 1 + x to within a factor of 1 + 1/8 at worst and this corresponds to an error of 0.5 db.
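The shift-and-insert procedure described above can be sketched in a few lines of Python. The function name and the handling of inputs smaller than 2.sup.3 (padding the fraction with zeros) are assumptions of this example; the text does not discuss such inputs.

```python
def approx_log2_code(y, fraction_bits=3):
    """Binary code for v + x, where y = 2**v * (1 + x).

    The exponent v of the most significant "one" forms the upper bits and
    the next `fraction_bits` bits of y are inserted below it.
    """
    if y <= 0:
        raise ValueError("y must be a positive integer")
    v = y.bit_length() - 1          # position of the most significant one
    below_msb = y - (1 << v)        # bits to the right of that one
    if v >= fraction_bits:
        fraction = below_msb >> (v - fraction_bits)
    else:
        # Assumed behavior for small inputs: pad with zeros on the right.
        fraction = below_msb << (fraction_bits - v)
    return (v << fraction_bits) | fraction


example = approx_log2_code(0b00110101110)   # the 11-bit y of the worked example
largest = approx_log2_code(0b11111111111)   # largest 11-bit magnitude
```

Here `example` is 0b1000101, matching the worked example, and `largest` is 0b1010111.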

The error obtained in the three least significant bits due to inscribed straight line approximation obtained by replacing log.sub.2 [1 + x] by x may be seen in Table I.

TABLE I

LOGGING ALGORITHM VS. TRUE LOG
__________________________________________________________________________
x        Code used          log.sub.2 [1+x]    Correct code
         (= code for x)
__________________________________________________________________________
0        0 0 0              0                  0 0 0
0.125    0 0 1              0.17               0 0 1
0.250    0 1 0              0.32               0 1 1   (code in error by one count)
0.375    0 1 1              0.46               1 0 0   (code in error by one count)
0.500    1 0 0              0.58               1 0 1   (code in error by one count)
0.625    1 0 1              0.70               1 1 0   (code in error by one count)
0.750    1 1 0              0.81               1 1 0
0.875    1 1 1              0.91               1 1 1
__________________________________________________________________________

It can be seen from the table that the largest error made is just one count. Since the largest 7-bit logarithm generated for an 11-bit magnitude input is

1 0 1 0 1 1 1 = 87 (the leading four bits 1 0 1 0 being the code for 10)

the 60 db dynamic range of this system is divided up into 60/87 = 0.7 db steps. Therefore an error of one count, as shown in the table, corresponds to an output error of 0.7 db. This error will occur on the average half the time and is additive to the worst case error of 0.5 db which could occur as a result of ignoring all bits beyond the third bit to the right of the most significant "one" in y. On the average, the error will be less than 1 db.
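The comparison in Table I and the step size quoted above can be recomputed as follows. This sketch assumes that the "correct code" is the true value of log.sub.2 [1 + x] rounded to the nearest eighth, which reproduces the tabulated codes; that rounding rule is inferred, not stated in the text.

```python
import math

FRACTION_BITS = 3
LEVELS = 1 << FRACTION_BITS   # eight codes for x

table_rows = []
for code_used in range(LEVELS):
    x = code_used / LEVELS
    true_log = math.log2(1 + x)
    # Assumed rule: correct code = true log rounded to the nearest 1/8.
    correct_code = round(true_log * LEVELS)
    table_rows.append((x, code_used, true_log, correct_code))

count_errors = [correct - used for _, used, _, correct in table_rows]

# Step size when a 60 db range is spanned by codes 0 through 87.
step_db = 60 / 87
```

`count_errors` shows a one-count error for x = 0.250 through 0.625 and none elsewhere, and `step_db` is about 0.69 db.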

Plots of the function FTLSM when the log of the spectrum magnitude was generated by the above algorithm were essentially indistinguishable from those generated by a full accuracy log routine.

Encoding Means 40

The encoding means 40 combines information derived from the pitch detection logic unit 60 with the FTLSM signal C.sub.n from the FFT computer means 30 to generate and encode the pitch and spectral information for transmission to some receiving terminal. The encoding unit 40 has two sections, the first of which is the pitch detection logic unit 60 which employs the spectral magnitude signal from the magnitude approximation circuit 38 and a second section which uses the FTLSM signal C.sub.n from the FFT computer means 30 to extract the pitch information, if any, from the speech signal. The remaining units of the encoding means 40 extract and encode the spectral envelope information from the FTLSM signal C.sub.n for transmission to a receiver. (As was stated hereinabove, a speech signal can be represented by its excitation function (pitch signal) and the vocal tract transfer function (spectral envelope information).)

The spectral information section of the encoding means 40 includes a denormalizing unit 62 having input connections from the C.sub.n terminal of the FFT computer means 30, and the normalizing unit 16, and having an output connection to a scaling unit 64. The number of C.sub.n terms processed by the encoding means 40 is some number K where K is less than the maximum value of n (i.e., 127). Typically K is some value in the range of 10 to 30 and is a function of the system sampling rate once the amount of real time that C.sub.n should define for some optimum representation of the spectrum magnitude envelope has been determined.

The scaling unit 64 includes a plurality of scaling registers 66, such as an accumulator, connected to a scaling storage device, such as a memory 68, and a digital multiplier circuit 70. The multiplier circuit 70 also has a connection from the scaling memory 68. The multiplier circuit 70 has an output connection to a gating means 74. The gating means 74 includes a storage device, such as the register 76, connected to a gating matrix 78 which has a second input from a counter 80. The output signal of the gating matrix 78 is the coded spectral envelope to be transmitted.

The coded spectral envelope signal is obtained from the sampled C.sub.n signal as follows. Assume the FTLSM data, as shown in waveform (c) of FIG. 3, is transferred to a storage register (not shown) in the denormalizing unit 62. In the instant invention, only a predetermined number of the N samples of the FTLSM signal C.sub.n are selected to characterize the spectrum of the analyzed speech signal. For example, at an input sampling rate of 6.4 KHz, the first 21 (K=20) C.sub.n coefficients are used to describe the magnitude of the spectrum envelope. The amount of time spanned by K coefficients of C.sub.n, at a given sampling rate (SR), is equal to the ratio K/SR.

To preserve information about the original input speech level in the transmitted data, the normalization gain factor supplied by the normalization unit 16 must be removed from C.sub.o, which is the n = 0 sample of the FTLSM function (C.sub.n). This is done by the denormalization unit 62. Since the application of the normalization gain affects all the speech samples in an analysis interval, this normalization gain only affects the spectrum amplitude and not the spectrum envelope shape. An increase in the input samples therefore only affects the average value of the log of the spectrum magnitude signal. This average value change is only reflected in the C.sub.n function as a corresponding change in amplitude of the DC component in the function C.sub.n, i.e., the C.sub.o term. The C.sub.o value that would have been calculated if no normalization gain had been included, written here as $\hat{C}_0$ to distinguish it from the normalized value, can be computed from the following relation

$$\hat{C}_0 = C_0 - 16\sqrt{N}\,\log_2(G_N) \qquad (40)$$

where G.sub.N is the value of normalization gain supplied by the normalization unit 16. The denormalizing unit 62 can be a small computer capable of solving Equation 40. In practice the FFT computer means 30 could perform the function of the denormalizing unit and transmit to the encoding means 40 a corrected value of the C.sub.o coefficient of the C.sub.n signal.
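Equation 40 amounts to a single subtraction. The sketch below takes the scale factor 16 times the square root of N as it is printed in Equation 40 and is otherwise hypothetical: the function name and the numerical inputs are chosen only for illustration.

```python
import math


def denormalize_c0(c0_normalized, normalization_gain, n_samples=256):
    """Remove the normalization gain from the n = 0 FTLSM coefficient.

    Follows Equation 40 as printed: subtract 16*sqrt(N)*log2(gain).
    """
    correction = 16 * math.sqrt(n_samples) * math.log2(normalization_gain)
    return c0_normalized - correction


# Hypothetical values: a gain of 4 corresponds to a left shift of two bits.
c0_corrected = denormalize_c0(1000.0, 4.0)
```

With N = 256 each doubling of the gain changes the coefficient by 256 units, so the example returns 488.0. A gain that is a power of two keeps the correction an exact integer, in keeping with the bit-shift normalization described earlier.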

To achieve as much accuracy as possible in the transmission of the lower 21 FTLSM coefficients C.sub.0 through C.sub.20, it is necessary to determine the peak-to-peak range of variation of each coefficient. Computer studies were performed to determine experimentally the range of variation for each of the selected FTLSM coefficients C.sub.n using fifteen test sentences as input speech signals. It was found during these tests that some of the 21 FTLSM coefficients do not vary over symmetric ranges. This fact implies that an average shape exists for the FTLSM. As a result, each coefficient can be defined as having a peak-to-peak range about some average level. Therefore, by adding an experimentally determined bias level (where the bias for each channel is fixed) to each coefficient, the range of variation can be made to exist primarily for positive values only. The value of the bias constant determines the probability with which each one of the C.sub.n samples to be quantized exceeds the allowable range for the particular C.sub.n coefficient. In the case of a C.sub.n coefficient which falls outside the peak-to-peak range allowable, the system will truncate that C.sub.n coefficient to the closest allowable value (i.e., the maximum positive or negative value).

The total positive range of variation of each coefficient (also a known experimental quantity) is employed in conjunction with the bias level to scale and quantize each FTLSM coefficient. In this regard, the quantized value of the j.sup.th FTLSM coefficient, C.sub.j, is computed by the relationship

where Q.sub.j is the size of the quantum step in the j.sup.th channel. For a peak-to-peak range for the j.sup.th channel given by P.sub.j and assuming b.sub.j bits are assigned to the j.sup.th channel, Q.sub.j is given by

$$Q_j = \frac{P_j}{2^{b_j} - 1} \qquad (42)$$

Table II presents typical values of the bias levels and peak-to-peak ranges for 21 FTLSM coefficients. Additional data in the table (discussed subsequently) is the bit assignment across the 21 FTLSM coefficients and the interleaved coefficients that are updated every other frame.

TABLE II

MAXIMUM VARIATION RANGE, BIAS INSERTION LEVEL AND CHANNEL BIT ASSIGNMENTS FOR TRANSMITTED FTLSM COEFFICIENTS
__________________________________________________________________________
Channel     Peak-to-peak     Insertion bias     Bit assignment     Update
number j    range            (BIAS).sub.j       b.sub.j
__________________________________________________________________________
0           1500             0                  5                  First five coefficients.
1           820              400                                   Updated every 20 msec
2           510              200                4
3           575              200
4           530              260                4
5           430              185                3                  Interleaved channels.
6           390              225                3                  Each set updated
7           390              225                3                  every 40 msec
8           450              330                3
9           320              165                3
10          300              185                3
11          225              120                2
12          245              140                2
13          225              120                2
14          245              120                2
15          225              120                2
16          205              100                2
17          205              100                2
18          185              100                2
19          205              100                2
20          205              100                2
__________________________________________________________________________
Notes:
Total number of channel bits               40
Bits used to specify interleaved set        1
Bits used for pitch word                    7
Total number of bits/frame                 48
__________________________________________________________________________
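The frame totals in the notes to Table II can be checked with a few lines of arithmetic. The sketch below assumes that one frame is sent every 20 msec, the update interval given for the first five coefficients; the resulting bit rate is derived here and is not a figure quoted in this passage.

```python
# Totals taken from the notes to Table II.
CHANNEL_BITS = 40
INTERLEAVE_SET_BITS = 1
PITCH_WORD_BITS = 7

# Assumption: one frame every 20 msec.
FRAME_PERIOD_MSEC = 20

bits_per_frame = CHANNEL_BITS + INTERLEAVE_SET_BITS + PITCH_WORD_BITS
bits_per_second = bits_per_frame * 1000 // FRAME_PERIOD_MSEC
```

Under that assumption the 48-bit frame corresponds to 2,400 bits per second.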

Twenty-one FTLSM C.sub.n coefficients (including the denormalized C.sub.o coefficient) are transferred to the scaling registers 66 where the scaling factors, Bias.sub.j, from the memory 68 are added to their respective C.sub.j coefficients to adjust the range of the numbers being quantized such that they go from zero to some positive maximum. The scaled C.sub.n coefficients, C.sub.0 through C.sub.20, are then directed to the multiplier 70 where each coefficient is multiplied by a separate predetermined ratio from the memory 68. (The combination of the scaling registers 66, memory 68 and multiplier 70 comprise in effect a quantizer circuit.) The predetermined ratios stored in the memory 68 are given by Equation 43.

$$Q(j) = \frac{2^{b_j} - 1}{P_j} \qquad (43)$$

which is seen to be the reciprocal of Equation 42. For example, the ratio of the 0 channel of Table II is equal to (2.sup.5 -1)/1500. By multiplying each FTLSM coefficient C.sub.0 through C.sub.20 by its respective ratio, the quantizer guarantees that the quantized value of each FTLSM coefficient never exceeds the highest number that its particular bit assignment can represent. For example, if a value of 150 were calculated for the C.sub.2 coefficient, the quantized value would be calculated in accordance with the equation

$$\hat{C}_2 = (C_2 + \mathrm{BIAS}_2) \cdot Q(2) \qquad \text{(43-A)}$$

$$= (150 + 200)\,\frac{2^{4} - 1}{510} = 350 \cdot \frac{15}{510} \approx 10$$

where $\hat{C}_2$ denotes the quantized value. The value 10 can be represented by a four bit binary number and therefore, since b.sub.2 = 4, the amplitude value of C.sub.2 (150) falls within the quantizer range and is representable as the integer value 10.
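The bias, scale and limit steps can be sketched as below. The function name is hypothetical, and two details are assumptions of this example rather than statements of the text: the scaled value is reduced to an integer by truncation, and out-of-range values are limited to the nearest representable code.

```python
def quantize_coefficient(c, bias, peak_to_peak, bits):
    """Quantize one FTLSM coefficient using the ratio of Equation 43."""
    ratio = (2 ** bits - 1) / peak_to_peak   # Equation 43
    level = int((c + bias) * ratio)          # truncation is an assumption
    top_code = 2 ** bits - 1
    return max(0, min(top_code, level))      # limit to the allowable range


# Channel 2 of Table II: peak-to-peak range 510, bias 200, 4 bits.
quantized_c2 = quantize_coefficient(150, bias=200, peak_to_peak=510, bits=4)
```

For the worked example, (150 + 200)(15/510) is about 10.3, which reduces to the integer 10.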

The denormalized, scaled and quantized C.sub.j coefficients are then directed to the storage register 76 of the gating means 74. It is appreciated that there are many well-known species of gating means 74 that could be employed, for example, those employed in telemetry systems for varying the number of times a particular channel is sampled. Still another way of implementing the gating means would be to have the data shifted out of register 76 under program control. Another technique (the one chosen for illustration in FIG. 2B) is to have a gating matrix 78, such as the type employed in time division telemetry systems, connected to the register 76. A second input to the gating matrix 78 would be provided by the counter 80 which would supply a gating pulse to the appropriate gate in the matrix to thereby gate through the C.sub.j data stored in register 76 in accordance with Table II (i.e., the interleaving operation on the upper C.sub.j values would be performed).

Pitch Detection Logic

One embodiment of the pitch detection logic is shown in block diagram form in FIG. 4 and includes a first storage register 90 having input connections from the first magnitude approximation circuit 38 and a gating circuit 92. A second magnitude approximation circuit 94 has an input connection from the first storage register 90 and first and second output connections to an energy ratio detector 96 (to be discussed in detail hereinafter). A second storage register 98 has input connections from the C.sub.n terminal of the FFT computer means 30 and a maximum selection circuit 100 and output connection to a comparator circuit 102, well-known in the art, and a buffer storage register 104. The comparator circuit 102 has first and second output connections, respectively, to the energy ratio detector 96 and to a third gating circuit, such as the OR circuit 106, the output of which is connected to a first flip-flop circuit 108.

A fourth gating circuit, such as the OR gate 110, has first and second inputs from the AND gate 104 and the flip-flop circuit 108, respectively, and an output connection to a third storage device, such as a second buffer memory 112. Connected to a second output of the energy ratio detector 96 is a second flip-flop circuit 114, the output of which is connected to the buffer storage register 104.

The pitch detection logic has as its inputs the n.sup.th interval of spectrum magnitudes (128 samples) and the n-1 interval FTLSM coefficients, each of which is stored in its respective registers 90 and 98. The low-band and high-band energy values (E.sub.L).sub.n and (E.sub.H).sub.n are computed directly from the spectrum magnitude samples by the second magnitude approximation computer 94. The gating circuit 92, in response to a control signal from a source not shown, gates two sets of a predetermined number of the spectrum magnitude samples into the magnitude approximation computer 94. To determine the low-band energy, samples 6 through 32 (covering approximately a total bandwidth of 170 to 900 Hz at 7,200 Hz sampling rate) are gated out of register 90 and are directed to the magnitude approximation computer 94. Similarly, to determine the high-band energy, samples 99 through 125 (covering approximately a total bandwidth of 2,800 to 3,500 Hz) are employed. The second magnitude approximation computer 94 can be any well-known computer capable of solving the following relation

E.sub.(L or H) = 3/4 max (Si) + 1/4.SIGMA.Si (44)

where i equals 6 through 32 for E.sub.L and 99 through 125 for E.sub.H, and the values Si are obtained as the output spectrum magnitude samples from the first magnitude approximation means 38. The results of solving the Equation 44 give two numerical values, one for low-band energy and one for high-band energy, both of which are directed to the energy ratio detector 96, to be discussed in detail hereinafter.
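By way of illustration only, the computation of Equation 44 over the two sample ranges given above may be sketched in Python as follows. The function names and the convention that list index i holds spectrum sample i are assumptions made for this sketch and are not taken from the specification.

```python
def band_measure(samples):
    """Equation 44: three-fourths of the largest sample plus one-fourth of the sum."""
    return 0.75 * max(samples) + 0.25 * sum(samples)


def band_energies(spectrum):
    """Return (E_L, E_H) for one analysis interval of 128 spectrum magnitudes.

    Assumption: spectrum[i] holds sample i, so samples 6 through 32 and
    99 through 125 correspond to the slices below.
    """
    low_band = spectrum[6:33]     # samples 6..32 inclusive
    high_band = spectrum[99:126]  # samples 99..125 inclusive
    return band_measure(low_band), band_measure(high_band)
```

Each band contains 27 samples, so a flat spectrum of unit magnitude gives 0.75 + 0.25 x 27 = 7.5 for both measures.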

The N/2 C.sub.n coefficients stored in the second storage device 98 are scanned by the maximum selection circuit 100 to determine the magnitude and position of the largest C.sub.n coefficient in a predetermined range (i.e., 20 .ltoreq. n .ltoreq. 115). The maximum selection circuit 100 can also be a two-stage comparator which initially stores the magnitude and position of the C.sub.20 value and thereafter compares the C.sub.20 magnitude sequentially with the remaining C.sub.n coefficients (in the range of n) until it finds a larger magnitude. When a larger magnitude is found, the new magnitude and position become the reference to test C.sub.n magnitudes against. This type of operation can also be programmed on a general purpose computer with well-known techniques; in fact, the FFT computer can perform the function.
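The sequential comparison described above may, for example, be programmed along the following lines. This is a minimal sketch: the function name is hypothetical, the coefficients are assumed to be held in a list indexed by n, and the stored values are compared directly (the specification does not state whether a sign is to be removed first).

```python
def find_peak(c, lo=20, hi=115):
    """Scan c[lo..hi] and return (magnitude, position) of the largest entry.

    The first coefficient in the range is the initial reference; each later
    coefficient replaces the reference only if it is strictly larger.
    """
    best_mag, best_pos = c[lo], lo
    for n in range(lo + 1, hi + 1):
        if c[n] > best_mag:
            best_mag, best_pos = c[n], n
    return best_mag, best_pos
```

Coefficients outside the search range are ignored even when they are larger than the peak found inside it.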

The magnitude of the largest C.sub.n coefficient found by the maximum selection circuit 100 is directed to the comparator circuit 102 where it is compared with a predetermined low threshold (LOWTHR) value (for example, 30) which can be determined empirically. If the magnitude of the largest of the C.sub.n coefficients is less than the predetermined LOWTHR value, an output signal is directed from the terminal 103 of the comparator circuit 102 through the first OR circuit 106 to cause the first flip-flop circuit 108 to change its state. The output signal from the first flip-flop circuit 108 is directed through the second OR circuit 110 to a particular memory location in the second buffer memory 112. If the peak C.sub.n sample in the search range for an analysis interval is less than the LOWTHR value, an unvoiced (UV) condition exists. (An unvoiced condition is the absence of a pitch excitation signal.)

If the magnitude of one of the C.sub.n coefficients is greater than the LOWTHR value for some analysis intervals, then an output signal from the terminal 105 is directed to the energy ratio detector 96 where a second test is made to distinguish voiced (V) from unvoiced (UV) sounds. To minimize voicing errors during certain types of sounds that yield a sufficiently large C.sub.n peak to pass the test of the comparator 102 but do not have the normal energy distribution of low-band and high-band energy or sufficient absolute low-band energy, the energy ratio detector 96 (to be discussed in detail hereinafter) checks the energy ratio of low-band energy to high-band energy and the absolute value of the low-band energies for three consecutive analysis intervals. If the energy ratio detector tests are not satisfied, then the unvoiced output signal appears at the output terminal 95 and is directed through the OR circuit 106 indicating an unvoiced sound.

If on the other hand the threshold values of energy ratio detector 96 are satisfied, an output signal is directed to the second flip-flop circuit 114 from terminal 97. The second flip-flop circuit 114 changes its state causing the signal representing magnitude and position of the largest C.sub.n coefficient to be passed by the AND gate 104 through the OR gate 110 to the second buffer memory 112. This particular sequence indicates a voiced condition, with the voiced pitch signal indicated by the position of the C.sub.n coefficient stored in the second buffer memory 112.
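The two-stage decision described in the preceding paragraphs may be summarized by the following sketch. The function name and the use of zero to denote an unvoiced interval are illustrative choices. The treatment of a peak exactly equal to LOWTHR as passing the first test is an assumption, since only the "less than" and "greater than" cases are described.

```python
LOWTHR = 30  # example threshold value given in the text


def voicing_decision(peak_mag, peak_pos, energy_tests_pass, low_thr=LOWTHR):
    """Return the pitch position for a voiced interval, or 0 for unvoiced.

    energy_tests_pass stands for the output of the energy ratio detector 96;
    it is consulted only when the peak passes the LOWTHR comparison.
    """
    if peak_mag < low_thr:
        return 0          # comparator 102: peak too small, unvoiced
    if not energy_tests_pass:
        return 0          # energy ratio detector 96 rejects voicing
    return peak_pos       # voiced: position of the peak is the pitch value
```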

Shown in FIG. 5 is a block diagram of one embodiment of an energy ratio detector that can be employed in the pitch detection logic of FIG. 4. In the normal operation of the pitch detection logic, a delay equivalent to two analysis intervals exists before the pitch detection logic will start transmitting nonzero pitch values assuming data put into the pitch detection logic was a voiced sound. This delay is due to the low-band, high-band energy ratio detector which requires three consecutive low-band, high-band energy values that must satisfy the threshold requirements before voicing can occur.

The energy ratio detector of FIG. 5 includes two channels, a low-band channel 120 and a high-band channel 122. The low-band channel 120 includes three storage means, for example, registers 124, 125 and 126, which store the magnitude of the low frequency energy for three successive analysis intervals, n, n-1 and n-2 respectively. (While three analysis intervals were chosen, it is obvious that more or less than three can be employed.) Connected to one output of respective low energy registers 124, 125 and 126 is a first set of comparators 128, 130 and 132, each of which has as a second input a connection from a first storage device 134. The output of each of the comparators 128, 130 and 132 is connected to the input side of a first gating means, for example, the AND gate 136.

The high-band energy channel 122 includes three registers 140, 142 and 144 which store the magnitude of the high-band energy for the successive analysis intervals n, n-1 and n-2 respectively. Three inverter circuits 146, 148 and 150 have input connections from respective registers 140, 142 and 144 and output connections to three multiplier circuits 152, 154 and 156, respectively. A second input connection to the multiplier circuits 152, 154 and 156 originates at respective low-band energy registers 124, 125 and 126. A second set of comparators 158, 160 and 162 has a first input connection from respective multiplier circuits 152, 154 and 156 and a second input connection from a second storage device 164. (While two storage devices are shown, it is obvious that only one may be employed.) A second gating means 166 has input connections from each of the second set of comparators 158, 160 and 162. A third comparator means 168, for example a two input AND gate, has input connections from each of the AND circuits 136 and 166 and first and second output connections to the OR circuit 106 and the flip-flop circuit 114, respectively, of the pitch detection logic.

In operation, the magnitudes of the low-band energy signal and the high-band energy signal for the n.sup.th analysis interval are directed to respective registers 124 and 140. It is to be appreciated that after each analysis interval n, the data in register 124 is shifted sequentially through the registers 125 and 126 and similarly the data in the high-band energy register is shifted sequentially through registers 142 and 144. For example, after the n.sup.th analysis, the data in register 124 is shifted into register 125, and the data in register 125 is shifted into register 126. (Note, for simplicity the connection lines between these registers are not shown.) The comparators 128, 130 and 132 compare the low-band energy value in their respective registers 124, 125 and 126 with a predetermined value 30 from the first storage device 134. When the magnitude of the signals from each register 124, 125 and 126 exceeds the predetermined value stored in the storage device 134 and a signal is received from the first comparator 102 of FIG. 4, the gating circuit 136 produces a first output signal, for example, a positive pulse. If one or all of the signals stored in the low energy registers 124, 125 and 126 is less than the predetermined reference (from the storage device 134), then a second predetermined output signal, for example, a negative pulse, is generated.

The object of the second channel 122 of the energy ratio detector 96 is to compare the ratio of the low-band and high-band energies with a predetermined ratio stored in the storage device 164. It has been found experimentally that for a proper voiced sound the ratio of the low-band and high-band energy exceeds a certain value, for example, four. To obtain the ratio, the magnitude of the high-band energy signals for the n, n-1 and n-2 intervals stored in registers 140, 142 and 144 are directed through respective inverters 146, 148 and 150 to one input of respective multipliers 152, 154 and 156. A second input signal to the multipliers 152, 154 and 156 originates from the respective low-band energy registers 124, 125 and 126, and the resultant products are directed to respective comparators 158, 160 and 162 where they are compared to the predetermined constant from the storage device 164.

If the products (ratios) from all the multiplier circuits exceed the predetermined number stored in the storage device 164, then the gating circuit 166 delivers a predetermined signal, for example, a positive pulse to the comparator circuit 168. If one or all of the products (ratios) is less than the predetermined value, then the gating circuit 166 delivers a negative pulse to the comparator 168. The comparator circuit 168 produces an output signal at terminal 169 only when both signals at its input terminals are positive and produces a signal at its output terminal 167 under all other conditions. The gating circuits 136 and 166 and the comparator circuit 168 are simple logic circuits and can be assembled by any person having ordinary skill in the art of designing logic circuits. (Another technique for performing the logic functions specified hereinabove is by programming a general purpose computer, in accordance with well-known techniques.)
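As one example of such programming, the two channels of FIG. 5 may be modeled as follows. This is a sketch under stated assumptions: the class and parameter names are hypothetical, the default thresholds are the example values 30 and four mentioned above, the ratio test is written as a multiplication to avoid dividing by a zero high-band value, and the enabling signal from the comparator 102 is omitted.

```python
from collections import deque


class EnergyRatioDetector:
    """Software model of the low-band and high-band channels of FIG. 5."""

    def __init__(self, min_low=30, min_ratio=4.0, intervals=3):
        self.min_low = min_low          # contents of storage device 134
        self.min_ratio = min_ratio      # contents of storage device 164
        # Shift registers: newest value at the left, oldest drops off the right.
        self.low = deque(maxlen=intervals)
        self.high = deque(maxlen=intervals)

    def update(self, e_low, e_high):
        """Shift in the energies for interval n; return True if voicing is allowed."""
        self.low.appendleft(e_low)
        self.high.appendleft(e_high)
        if len(self.low) < self.low.maxlen:
            return False                # fewer than three intervals stored so far
        low_ok = all(l > self.min_low for l in self.low)
        # E_L / E_H > min_ratio, rewritten as E_L > min_ratio * E_H
        ratio_ok = all(l > self.min_ratio * h
                       for l, h in zip(self.low, self.high))
        return low_ok and ratio_ok
```

Because all three stored intervals must pass both tests, a single failing interval suppresses voicing until it has been shifted out of the registers, which reproduces the two-interval delay noted above.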

In addition to implementing the pitch detection logic 60 of FIG. 2B by the embodiment of FIGS. 4 and 5, a special purpose computer, for example, the Sylvania Electric Products Inc. ACP-1 computer, can be programmed in accordance with the flow charts of FIGS. 6, 7, 8, 9A and 9B.

The block labeled "PITCH DETECTION LOGIC" in FIG. 2B includes the low-band, high-band unvoiced to voiced (UV-V) and voiced to unvoiced (V-UV) detectors and pitch logic. A block diagram of the overall pitch detection function is shown in FIGS. 6 through 9A. The symbols used inside the blocks shown in these figures are defined in Table III. Numerical values for the threshold levels K.sub.1, K.sub.2, K.sub.3 and K.sub.4 used in the UV-V and V-UV detection operations are given in Table IV.

TABLE III

GLOSSARY OF TERMS USED IN PITCH DETECTOR BLOCK DIAGRAMS (FIGS. 6 THROUGH 9A)

(E.sub.L).sub.n.sub.-2, (E.sub.L).sub.n.sub.-1, (E.sub.L).sub.n = Low-band energy value for (n-2), (n-1) and n analysis intervals

(E.sub.H).sub.n.sub.-2, (E.sub.H).sub.n.sub.-1, (E.sub.H).sub.n = High-band energy value for (n-2), (n-1) and n analysis intervals

q.sub.n.sub.-3 = Value of pitch actually transmitted three analysis intervals ago

q.sub.n.sub.-2 = Value of pitch that is transmitted during the n.sup.th analysis interval. (The pitch value (.tau..sub.1).sub.n.sub.-1 for the (n-1).sup.st interval is checked for tracking relative to q.sub.n.sub.-2 before deciding a value for q.sub.n.sub.-1.)

q.sub.n.sub.-1 = Value of pitch for (n-1).sup.st FTLSM. This value is being determined during the n.sup.th analysis interval and is transmitted during the (n+1) analysis interval.

(.tau..sub.1).sub.n.sub.-1 = Position of primary FTLSM peak in search range of (n-1).sup.st FTLSM

(.tau.'.sub.1).sub.n.sub.-1 = Position of peak that is tracking within .+-. 1 msec of previous pitch value (.tau..sub.1).sub.n.sub.-2

(P.sub.1).sub.n.sub.-1 = Value of FTLSM peak at (.tau..sub.1).sub.n.sub.-1

(P'.sub.1).sub.n.sub.-1 = Value of FTLSM peak at (.tau.'.sub.1).sub.n.sub.-1

FLAG = Computation variable

TABLE IV

THRESHOLD VALUES USED IN UV-V AND V-UV DETECTION OPERATIONS IN PITCH LOGIC OF FIGURES 6 THROUGH 9A

Threshold    K value using simple sum of low-band or high-band samples
K.sub.1      3000
K.sub.2      1000
K.sub.3      1000
K.sub.4      1000

A complete flow chart of the logical functions performed by the pitch detector in extracting fundamental pitch is presented in FIGS. 6 through 9B. An outline of the logical operations performed in each of the figures is given below.

FIG. 6: UV-V detection (Condition .phi.) and continue voicing condition (Condition .phi.')

FIG. 7: Initial pitch tracking following a UV-V boundary (Condition .phi. satisfied)

FIG. 8: Extension of voicing one analysis interval (Conditions .phi.' and .phi." not satisfied)

FIG. 9A: Test for voicing mode following failure to pitch track and reestablishment of pitch tracking

FIG. 9B: Pitch tracking logic during steady voicing (including negation of tracking branch)

The initial operations performed by the pitch logic are diagrammed in FIG. 6 and include the n.sup.th spectrum and (n-1) FTLSM functions (which appear simultaneously at the FFT output) directed to the pitch detector logic. As a first step in the process, the low-band and high-band energy values (E.sub.L).sub.n and (E.sub.H).sub.n are computed directly from spectrum magnitude samples that correspond to the n.sup.th analysis interval. The relations used to compute E.sub.L and E.sub.H are given in Equation 44. This relation gives values of E.sub.L and E.sub.H that are proportional to the square root of the energy contained in the low-band and high-band regions of the speech spectrum respectively. Following this operation, the (n-1) C.sub.n function is scanned to find the amplitude and position of the largest peak in the search range. This peak and its position are designated as (P.sub.1, .tau..sub.1).sub.n.sub.-1.

Energy values computed for the previous two intervals, namely (E.sub.L).sub.n.sub.-1, (E.sub.H).sub.n.sub.-1, (E.sub.L).sub.n.sub.-2 and (E.sub.H).sub.n.sub.-2, are retained in storage together with the position of the primary peak found for the (n-2).sup.nd C.sub.n function. This value is designated as (.tau..sub.1).sub.n.sub.-2. The only additional data that may be necessary occurs during the pitch tracking mode (FIG. 9B) and involves a search through the (n-1).sup.st C.sub.n function for a peak amplitude and its position in the vicinity of (.tau..sub.1).sub.n.sub.-2. This secondary peak and its position are designated as (P'.sub.1, .tau.'.sub.1).sub.n.sub.-1 in FIG. 9B.

As an aid to the following discussion, it is convenient to summarize the initial operations and subsequent branch functions performed by the logic of FIG. 6 in terms of the following five conditions and the branch functions that are performed corresponding to each condition:

Condition: Branch Function Performed

1. q.sub.n.sub.-3 = 0; Condition .phi. not satisfied: Continue unvoiced (FIG. 6)

2. q.sub.n.sub.-3 = 0; Condition .phi. satisfied: Initiate initial tracking mode (FIG. 7)

3. q.sub.n.sub.-3 .noteq. 0; Condition .phi.' not satisfied: Extend voicing mode (FIG. 8)

4. q.sub.n.sub.-3 .noteq. 0; Condition .phi.' satisfied; Flag = 0: Test for continued voicing (FIG. 9A)

5. q.sub.n.sub.-3 .noteq. 0; Condition .phi.' satisfied; Flag = 1: Steady voicing. Test for pitch tracking. (FIG. 9B)
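The selection among the five conditions may be expressed compactly in software. In the following sketch the outcomes of the threshold tests .phi. and .phi.' are supplied as Boolean inputs, since the tests themselves are defined in FIG. 6; the function name and the returned labels are illustrative only.

```python
def select_branch(q_prev, phi, phi_prime, flag):
    """Return the figure whose branch is actuated for the current interval.

    q_prev    -- q(n-3), the pitch value tagged for the earlier interval
    phi       -- True if Condition phi is satisfied
    phi_prime -- True if Condition phi' is satisfied
    flag      -- the Flag computation variable (0 or 1)
    """
    if q_prev == 0:
        # Conditions 1 and 2: previously unvoiced, only phi matters
        return "FIG. 7" if phi else "FIG. 6"
    if not phi_prime:
        return "FIG. 8"                 # Condition 3: extend voicing mode
    # Conditions 4 and 5: voicing continues, Flag selects the branch
    return "FIG. 9B" if flag == 1 else "FIG. 9A"
```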

Conditions .phi. and .phi.' are threshold tests involving the low-band and high-band energy measures and are defined in the block diagram of FIG. 6. In the present system, during the n.sup.th analysis interval, the following data is available:

1. Whether q.sub.n.sub.-3 is zero or not

2. q.sub.n.sub.-2 = (.tau..sub.1).sub.n.sub.-2 or (.tau.'.sub.1).sub.n.sub.-2 and (.tau..sub.1).sub.n.sub.-1

3. (E.sub.L).sub.n.sub.-2, (E.sub.H).sub.n.sub.-2, (E.sub.L).sub.n.sub.-1, (E.sub.H).sub.n.sub.-1, (E.sub.L).sub.n, (E.sub.H).sub.n

With reference to the n.sup.th analysis interval, q.sub.n.sub.-3 was the previous value of pitch actually transmitted, q.sub.n.sub.-2 is the value of pitch to be transmitted and (.tau..sub.1).sub.n.sub.-1 is the position of the largest peak in the (n-1) C.sub.n function. The value q.sub.n.sub.-1, which is transmitted during the (n+1) analysis interval, may or may not be equal to (.tau..sub.1).sub.n.sub.-1. This depends primarily upon whether or not (.tau..sub.1).sub.n.sub.-1 tracks q.sub.n.sub.-2 and is discussed subsequently with reference to Condition 5 listed above.

It should be noted that in the present system q.sub.n.sub.-3 = 0 does not always imply that the (n-3) analysis interval was called unvoiced and a zero value for pitch was actually transmitted. A nonzero value could actually have been transmitted but conditions were such that the (n-3) interval was tagged as unvoiced within one of the branch operations. This situation can occur in the branch operation shown in FIG. 9A when conditions .phi. and .phi.'" are not satisfied.

Referring to Conditions 1 through 5 listed above, it is seen that a different course of action is taken depending upon whether q.sub.n.sub.-3 was zero or not. In particular, Condition 1 results in continued unvoicing since the previous interval was unvoiced and the low-band and high-band energies, E.sub.L and E.sub.H, are too small and too large respectively to satisfy condition .phi..

Condition 2 generally occurs at a UV-V boundary where the previous interval was tagged as unvoiced (i.e., q.sub.n.sub.-3 = 0) but now condition .phi. is satisfied. When this occurs, an initial FTLSM peak tracking mode is initiated during the n analysis interval which is detailed in the block diagram of FIG. 7. The operations performed in this figure are self-explanatory.

Condition 3 generally occurs during the trailing off of voicing where the low-band energies are starting to diminish. In this regard, condition .phi.', which involves only two energy measures [(E.sub.L).sub.n and (E.sub.L).sub.n.sub.-1 ] and one energy ratio [(E.sub.L /E.sub.H).sub.n.sub.-1 ], is clearly an easier condition to satisfy during trailing off of voicing than is condition .phi.. This less stringent condition is incorporated to ensure that pitch tracking can continue well into the trailing off of voicing. When condition 3 is present, the logic branch shown in FIG. 8 is actuated. Voicing may or may not be extended one analysis interval at this point, depending upon whether or not a secondary condition (condition .phi.") involving the energy ratio (E.sub.L /E.sub.H).sub.n.sub.-1 is satisfied (see FIG. 8).

Condition 4 occurs during the voicing mode whenever pitch tracking of the primary FTLSM peak (P.sub.1, .tau..sub.1).sub.n.sub.-1 fails to occur and the amplitude P'.sub.1 of the tracking peak (P'.sub.1, .tau.'.sub.1).sub.n.sub.-1 is less than one-fourth of the primary peak P.sub.1. Under this condition, Flag is set to zero and Condition 4 results in actuating the logical branch shown in FIG. 9A. The most stringent condition .phi. is applied at this point to determine if the vocoder is in a steady voicing mode. Pitch tracking can be reestablished within this branch as indicated in FIG. 9A if both condition .phi. is satisfied and the FTLSM peak again tracks the peak that failed to track one analysis interval ago. Occasionally, two or more analysis intervals are required to reestablish the pitch tracking mode. When condition .phi. is not satisfied, assuming Condition 4 has occurred, a fourth energy condition test is performed (condition .phi.'" of FIG. 9A) that involves only the high-band energy measure for the n-2 analysis interval, (E.sub.H).sub.n.sub.-2. If this value is found to be larger than the preset threshold level, K.sub.4, in this case, the decision is made to call the (n-2) interval unvoiced and then to transmit q.sub.n.sub.-2 = 0. On the other hand, if the high-band energy measure (E.sub.H).sub.n.sub.-2 is less than K.sub.4, the decision is made that q.sub.n.sub.-2 was a bona fide pitch value and it is transmitted. In either event, q.sub.n.sub.-1 is set to zero whether or not condition .phi.'" was satisfied since condition .phi. itself was not satisfied. Setting q.sub.n.sub.-1 = 0 in this case, in effect, is to assume that a UV-V boundary has occurred due to the failure to satisfy condition .phi..

Condition 5 is referred to as the steady voicing mode. The branch that is actuated during this mode is shown in FIG. 9B. In normal, steady-state voicing, pitch tracking generally occurs normally, and the logical operations performed by the pitch detector are confined for the most part to this branch. The logic is straightforward with the possible exception of the negation of the pitch tracking subbranch. Negation of pitch tracking occurs when the largest peak (P.sub.1, .tau..sub.1).sub.n.sub.-1 fails to track (.tau..sub.1).sub.n.sub.-2 and the amplitude P'.sub.1 of the peak in the vicinity of .+-.1 msec of (.tau..sub.1).sub.n.sub.-2 fails to exceed one-fourth of the largest peak P.sub.1. When this occurs, Flag is set to zero and the value q.sub.n.sub.-2 is transmitted. Then q.sub.n.sub.-1 is set equal to (.tau..sub.1).sub.n.sub.-1 which is the position value of the largest peak in the data. Tracking of (.tau..sub.1).sub.n.sub.-1 relative to (.tau..sub.1).sub.n.sub.-2 has thus been negated at this point. Tracking can again be reestablished, however, during the next or a later analysis interval by means of the logic branch shown in FIG. 9A.
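The tracking test and its negation may be illustrated by the following sketch, which is not a transcription of FIG. 9B. Three assumptions are made and are marked in the code: one coefficient position is taken to correspond to one sample period at 7,200 Hz, so that .+-.1 msec is about seven positions; the primary peak is taken to track when it lies within that window of the previous pitch value; and, when only the secondary peak tracks and exceeds one-fourth of P.sub.1, its position is taken as the new pitch value.

```python
def peak_in(c, start, stop):
    """Largest value and its position for start <= n <= stop."""
    pos = max(range(start, stop + 1), key=lambda n: c[n])
    return c[pos], pos


def steady_voicing_step(c, tau_prev, lo=20, hi=115, window=7):
    """Return (q for interval n-1, Flag) given the previous pitch position.

    window=7 is an assumption: 1 msec at 7,200 Hz is about 7 positions.
    """
    p1, tau1 = peak_in(c, lo, hi)
    if abs(tau1 - tau_prev) <= window:
        return tau1, 1                  # primary peak tracks
    p1_alt, tau1_alt = peak_in(c, max(lo, tau_prev - window),
                               min(hi, tau_prev + window))
    if p1_alt > p1 / 4:
        return tau1_alt, 1              # assumed: follow the tracking peak
    return tau1, 0                      # tracking negated, Flag cleared
```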

Synthesis Section of Vocoder

The synthesis section of the vocoder system has basically three functions to perform: (1) to obtain the vocal tract response function from the FTLSM data, (2) to obtain the voicing data from the FTLSM and (3) to convolve the vocal tract response with voicing data to obtain the desired synthesized speech signal. One embodiment of a device for performing the above-recited functions is given in respective FIGS. 10A, 10B, 11, 12 and 14.

Referring to FIGS. 10A and 10B, the embodiment of the device to obtain the vocal tract response function from the received FTLSM data includes a decoding device 100 (to be discussed in detail hereinafter) which dequantizes, descales and interlaces one set of 13 received FTLSM coefficients C.sub.n with a set of eight interlaced values previously sent to form a set of 21 C.sub.n coefficients to be processed. Connected to the output of the decoding means 100 is a spectrum decoder means 102, for example, a DFT computer such as the Sylvania Electric Products Inc. ACP-1 computer. The output of the spectrum decoder means 102 is connected to a delogging computer 104, the output of which is connected to an unvoiced modifying unit 106. The unvoiced modifying unit 106 has a second input connection originating at the decoding device 100 and an output connection connected to an odd function generator 108. (The second input to the summation circuit 48 of FIG. 2A originates at the odd function generator 108.)

The input data to the decoding device 100 is in the same form as the output data of the processor unit 40 of FIG. 2B as calculated per Equation 41. The spectral envelope data is contained in the scaled and quantized C.sub.n coefficients received at the input to decoding device 100. The particular functions performed by the decoding device are then to dequantize, descale and to interleave two sets of 13 received coefficients into sets of 21 coefficients. (The details of the decoding device 100 will be discussed in detail hereinafter.) The C.sub.n coefficients (received FTLSM signals) thus obtained are applied to the spectrum data decoder 102 where they are Fourier transformed to obtain the spectral magnitude of the analyzed speech signal. Any well-known special purpose computer can be employed as the spectrum data decoder 102, such as the Sylvania Electric Products Inc. ACP-1 computer referred to hereinabove. The spectrum data decoder 102 must solve the equation

where

V.sub.n = n.sup.th frequency sample

C.sub.k = k.sup.th sample of the C.sub.n function

N = effective size of the transform

For example, n may run from 4 to 128 in steps of four, i.e., 32 steps of data points. Therefore the log .vertline.V.sub.n .vertline. will be evaluated at 32 discrete frequencies.
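To illustrate how a small number of low-order C.sub.n coefficients yields a smooth logged spectrum at 32 frequencies, the following sketch evaluates a cosine series at n = 4, 8, . . . 128. The cosine form, the factor of two on the terms for k greater than zero, and the value N = 256 are assumptions made for this illustration (they follow from treating the C.sub.n function as real and even) and are not taken from the specification.

```python
import math


def log_spectrum_envelope(c, n_transform=256, step=4, count=32):
    """Evaluate an assumed cosine series at n = step, 2*step, ..., count*step.

    c holds the low-order coefficients C_0 .. C_K (21 values in the text).
    Assumed form: C_0 + 2 * sum_{k>=1} C_k * cos(2*pi*n*k / N).
    """
    out = []
    for m in range(1, count + 1):
        n = m * step
        total = c[0] + 2.0 * sum(
            c[k] * math.cos(2.0 * math.pi * n * k / n_transform)
            for k in range(1, len(c))
        )
        out.append(total)
    return out
```

With only C.sub.0 nonzero the result is flat, and each additional coefficient adds one slowly varying cosine ripple across the band.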

The logged samples are then directed to a delogging computer 104, such as the Sylvania Electric Products Inc. ACP-1 computer, which in essence performs the inverse of the algorithm solved by the logging algorithm computer 44 of FIG. 2B and described in detail hereinabove. The output signal from delogging computer 104 includes 32 samples of spectral magnitudes of the received signal which are directed through the unvoiced modifying unit 106. The data will pass through the unvoiced modifying unit 106 unaltered if the data is a result of a voiced sound and will be modified in a random manner if the data results from an unvoiced sound. To be compatible with the imaginary input, the 32 delogged samples are directed through the odd function generator 108 where they are converted into 256 samples. The odd function generator inserts the 32 samples into every fourth position of a 256 word storage means 184. The original 32 samples, when placed into every fourth storage word of storage means 184, occupy the lower 128 storage locations. At the same time the samples are being inserted into every fourth position, the negatives of the sample values, obtained from the negating circuit 186, are inserted in corresponding positions so as to form an odd symmetric function about N/2 (128).
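The operation of the odd function generator may be sketched as follows. The sketch assumes that the 32 samples occupy positions 4, 8, . . . 128 and that each negated value is written at the position mirrored about 128. The sample at position 128 is its own mirror image, so it is simply left in place here; how that word is actually handled is not stated and is an assumption of this sketch.

```python
def odd_function(samples, size=256, step=4):
    """Spread samples into every fourth word and mirror their negatives about size/2."""
    words = [0.0] * size
    for k, value in enumerate(samples, start=1):
        pos = k * step              # 4, 8, ..., 128 for 32 samples
        words[pos] = value
        mirror = size - pos         # 252, 248, ..., 132
        if mirror != pos:           # position 128 has no distinct mirror
            words[mirror] = -value
    return words
```

All words not written remain zero, so the resulting 256-word sequence satisfies w[128 + d] = -w[128 - d] for every nonzero offset d.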

Decoding Device

The decoding device 100 includes first and second buffer storage devices 120 and 122 respectively. A multiplier circuit 124 has input connections from the buffer storage device 120 and from one section of a two section memory 126 and an output connection to a subtractor circuit 128 which is well-known in the art. A second input to the subtractor circuit 128 originates at a second section of two section memory 126, and the output is connected to an interlacing means 130 which converts the incoming 13 C.sub.n samples to 21 C.sub.n samples.

The interlacing means 130 includes four input gates, such as a high gate 132, a low gate 134, a negative gate 136 and a positive gate 138, connected to the output of the subtractor circuit 128. A second input connection to the negative gate 136 and the positive gate 138 is the output connection of the low gate 134, and the outputs from the respective gates are connected to a plurality of separate locations in a first storage device 140. The high gate 132 has a second input connection from a central source (not shown) and an output connection to a second storage device 142. The first and second storage devices 140 and 142 are connected together and represent the output connection of interlacing means 130 and the decoding device 100.

The buffer storage 122 (which stores the seven bit pitch information) is connected to the input of a seven input OR gate 150. The output of the OR gate 150 provides a second input to the unvoiced modifying unit 106. If there is a "one" in any of the bit locations of the buffer storage 122 (indicating a voiced sound), then a signal is directed to the unvoiced modifying unit 106 which passes the spectral data from the delogging computer 104 to the odd function generator 108 unmodified. However, if there is no "one" in any of the seven bit positions (indicating an unvoiced sound) of the buffer storage 122, then the unvoiced modifying unit 106 modifies the spectral data in accordance with a manner to be discussed hereinafter.
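The action of the seven input OR gate 150 on the unvoiced modifying unit 106 may be modeled as shown below. The function names are hypothetical, and the modification applied to unvoiced data is passed in as a caller-supplied function because its details are described separately.

```python
def is_voiced(pitch_word):
    """True if any of the seven pitch bits is a one (OR gate 150)."""
    return (pitch_word & 0x7F) != 0


def unvoiced_modifying_unit(spectrum, pitch_word, modify):
    """Pass voiced spectral data unaltered; otherwise apply modify()."""
    if is_voiced(pitch_word):
        return list(spectrum)
    return modify(spectrum)
```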

Recall that the output signal of the processor unit 40 is a 48 bit data word wherein seven bits represent pitch information (voiced or unvoiced information), 40 bits represent the vocal tract information (spectral information represented by the C.sub.n coefficients) and one bit indicates whether the even or odd set of the C.sub.n values is encoded in the present frame. The functions of the decoding device 100 are then to dequantize, descale and generate 21 coefficients from the 41 bits of the vocal tract information.

A descaling unit 125 includes the combination of the multiplier circuit 124, subtractor circuit 128 and the two section memory 126 and performs the reverse functions of the scaling unit 64 of FIG. 2B. For example, each received coefficient is multiplied in the multiplier circuit 124 by a predetermined ratio stored in a memory location in the memory 126. This predetermined ratio is the inverse of the ratio as defined by Equation 43. The output signals of the multiplier circuit 124 are the dequantized 13 C.sub.n coefficients. Recalling that the C.sub.n coefficients prior to transmission were scaled in the positive direction to insure that no negative values were transmitted, to reconstruct the true coefficient the analyzer section of the vocoder must remove the scaling factor. The subtractor circuit 128 subtracts the appropriate scaling factor (which is stored in the memory 126) from each dequantized coefficient received from the multiplier circuit 124. The details of the scaling factor are discussed hereinabove.
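The multiply and subtract sequence of the descaling unit 125 may be sketched as follows. The function name is hypothetical; the inverse ratios and scaling factors are supplied by the caller as stand-ins for the two sections of the memory 126.

```python
def descale(received, inverse_ratios, scaling_factors):
    """Dequantize and descale one frame of received coefficients.

    Each value is multiplied by its inverse ratio (multiplier circuit 124),
    then its scaling factor is subtracted (subtractor circuit 128).
    """
    if not (len(received) == len(inverse_ratios) == len(scaling_factors)):
        raise ValueError("one ratio and one scaling factor per coefficient")
    return [q * r - s
            for q, r, s in zip(received, inverse_ratios, scaling_factors)]
```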

The coefficients generated in the analyzer section of the vocoder were quantized independently in accordance with Table II such that sets of 13 coefficients C.sub.n were transmitted during each data interval as stated hereinabove. The first five coefficients C.sub.0 through C.sub.4 are transmitted every data interval while the higher even and odd numbered coefficients are sent during alternate data frames. For example, in one time interval, coefficients C.sub.0 through C.sub.4 and C.sub.6, C.sub.8 . . . C.sub.20 are sent and in the next data interval C.sub.0 through C.sub.4 and C.sub.5, C.sub.7 . . . C.sub.19 are sent. It has been empirically determined that approximately 3 msec of the C.sub.n function are required to adequately specify the spectrum envelope information. Therefore at the input to the spectrum decoder means 102, the even and odd sets of C.sub.n coefficients from two adjacent analysis intervals are used to specify the C.sub.n values for n = 5, 6, . . . 20, in conjunction with the C.sub.0 -C.sub.4 samples of the most recent frame.

The interlacing means 130 accomplishes this interlacing in the following way. The first five coefficients C.sub.0 through C.sub.4 represented by 21 bits are directed through the high gate 132 to the storage device 142. The control pulse from a source (while the control source is not shown, it is to be appreciated that control circuits to perform the functions recited herein are within the knowledge of one skilled in the art) is removed thereby closing the high gate 132, and a control pulse is applied to the low gate 134 to pass a single bit, the polarity (one or zero) of which activates the appropriate gate, the negative gate 136 or the positive gate 138. (The polarity of one of the 41 bits indicates whether the even numbered or odd numbered coefficients are being received in the present frame as indicated in Table II.)

For example, a one in the particular bit location is directed through the low gate 134 and opens the positive gate 138. The remaining 19 bits representing, for example, the even numbered C.sub.n coefficients are then written into predetermined memory locations of the storage device 140. Assuming there are 16 C.sub.n coefficients in the storage device 140 from the previous data intervals, the new even numbered C.sub.n coefficients write over the old even numbered coefficients previously stored. The five lower C.sub.n coefficients C.sub.0 through C.sub.4 stored in the storage device 142 are then combined with the 16 coefficients in the storage device 140 at point 141 to yield the desired 21 descaled, dequantized coefficients.
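The interlacing just described can be sketched in software as follows. This is a minimal illustration under stated assumptions, not a description of the gate-level circuit: the function name `interlace` and its arguments are hypothetical, the 16-value list stands in for the storage device 140 (holding C.sub.5 through C.sub.20), and the coefficients are assumed to be already dequantized and descaled.

```python
def interlace(store_140, low_five, high_eight, even_frame):
    """Merge one received frame into the running set of 21 coefficients.

    store_140  -- list of 16 values for n = 5..20, updated in place
    low_five   -- C0..C4 of the present frame (role of storage device 142)
    high_eight -- the eight even- or odd-numbered coefficients just received
    even_frame -- True if the frame carries C6, C8, ..., C20; False for C5, C7, ..., C19
    """
    if len(store_140) != 16 or len(low_five) != 5 or len(high_eight) != 8:
        raise ValueError("expected 16 stored, 5 low and 8 high coefficients")
    start = 6 if even_frame else 5
    for i, value in enumerate(high_eight):
        n = start + 2 * i          # coefficient index: 6, 8, ..., 20 or 5, 7, ..., 19
        store_140[n - 5] = value   # write over the old value of the same parity
    # Combination at point 141: C0..C4 followed by C5..C20.
    return list(low_five) + list(store_140)
```

Calling the function on two successive frames, one even and one odd, leaves all 16 higher coefficients populated, with each parity refreshed every other frame.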

Recalling that the C.sub.n function is the FTLSM of the input speech and that the log operation separates the source information from the vocal tract information, the function of the spectrum data decoder 102 is to generate the received logged spectrum envelope from the first 21 C.sub.n coefficients. The process of selecting the low delay values of the FTLSM is the equivalent of low-pass filtering the logged spectrum magnitude. The spectrum data decoder 102 implements Equation 45 which yields the log of the spectrum envelope. The logged data is then directed through a delogging computer which solves the logging algorithm (discussed hereinabove) in an inverse manner. Both Equation 45 and the delogging operation can be implemented on a computer such as the Sylvania Electric Products Inc. ACP-1 computer using well-known programming techniques.

Unvoiced Modifying Unit

During the synthesis of unvoiced sounds, the excitation function should ideally be the digital equivalent of a random noise impulse carrier in which the impulse areas are .+-.1 with equal probability and the frequency of the noise carrier is equal to the output sampling rate.

A convolution implemented system with a high data rate noise carrier would preserve the vocal tract impulse response spectrum shape; however, the processing time would be prohibitive in a real time digital processor. The prohibitive processing time is due to the fact that, for voicing, the minimum spacing between excitation pulses is approximately 3.3 milliseconds at a typical sampling rate of 7,200 Hz. The voiced excitation has pulses spaced approximately every 22 samples.

For the method of unvoiced carrier generation just described, there would be carrier pulses each sampling time or an increase of 22 times the amount of computation required for unvoiced synthesis vs. voiced synthesis. An alternative method of preserving the impulse response spectral magnitude shape would be to implement the high data rate time domain convolution with a frequency domain multiplication. In effect, the high data rate digital noise carrier pulses are replaced with, for example, 32 multiplications in the frequency domain.

This multiplication in the frequency domain is implemented by the unvoiced modifying unit 106. The output signal of the delogging computer means 104 is supplied as a first input signal to a digital multiplier 105. A second input to the digital multiplier 105 is the output connection of a gate 107, such as an AND gate, which has a first input connection coupled to the seven input OR gate 150 of FIG. 10A and a second input connection coupled to a random generator means 109, for example, the lower bit of the A/D converter 12 of FIG. 2A.

When the output signal of the OR gate 150 is comprised of all zeros (indicating an unvoiced sound) then the gate 107 is held open by the application of a logical "1" to its first input. Under this condition, the random sequence generator output signal is passed through the gate 107 and becomes the second input signal to the digital multiplier 105 which multiplies the spectral data from the delogging computer 104. For the case in which the pitch word is voiced, the application by the OR gate 150 of a logical "0" inhibits the gate 107 causing the gate 107 to supply a constant +1 value to the digital multiplier 105.
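The behavior of the unvoiced modifying unit 106 can be sketched as follows. This is an illustration under stated assumptions only: the function name `unvoiced_modify` is hypothetical, a software pseudo-random generator stands in for the random generator means 109, and the mapping of the random bit to a multiplier of +1 or -1 is assumed from the .+-.1 impulse areas mentioned above.

```python
import random


def unvoiced_modify(spectrum, pitch_word_bits, rng=None):
    """Multiply the delogged spectrum samples by +1 (voiced) or by random signs (unvoiced).

    spectrum        -- delogged spectrum samples (32 in the example of the text)
    pitch_word_bits -- the seven pitch bits; all zeros indicates an unvoiced sound
    rng             -- optional random.Random instance (assumed stand-in for means 109)
    """
    rng = rng or random.Random()
    voiced = any(pitch_word_bits)          # role of the seven input OR gate 150
    if voiced:
        return list(spectrum)              # gate 107 supplies a constant +1
    # Unvoiced: each sample keeps its magnitude but receives a random sign.
    return [s * rng.choice((1, -1)) for s in spectrum]
```

Because only the sign of each sample changes, the spectral magnitude shape is preserved in the unvoiced case, which is the stated purpose of the frequency domain multiplication.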

Odd Function Generator/Even Function Generator

The odd function generator 108 and the even function generator 46 (FIG. 2A) operate in a similar manner. Therefore, only the odd function generator 108 will be discussed in detail with the appropriate circuit modifications to obtain an even function generator noted.

The output signal from the unvoiced modifying unit 106 includes 32 samples of data. To be compatible with the other data being processed in the FFT computer means 30 of FIG. 2A, the output signals must be converted to 256 samples having odd symmetry about the 128.sup.th sample. (The even function generator 46 must generate an even function about the 128.sup.th sample.)

The odd function generator 108 includes any well-known sequential counter 180 having a plurality of output terminals which are sequentially activated. Each of the plurality of output terminals is connected to a separate one of a plurality of gates 182, for example, an AND gate. In the instant example, there are 32 terminals and gates. A second input connection to each of the gates 182 originates at the output of the unvoiced modifying unit 106. The output of each gate 182 is connected directly to a first predetermined location in a storage means such as the memory 184. The output of each gate 182, except the first and last gates, is connected to a second predetermined location in the memory 184 through a plurality of negating circuits 186. The negating circuits 186, which change the sign of the data and leave the magnitude unaltered, are omitted in the case of the even function generator 46. The reason will become apparent as the operation of the odd function generator 108 is explained.

The sequential counter 180 provides a gating pulse which sequentially activates each of the gates 182. Each of the 32 delogged spectrum samples is gated through its associated gate 182 to a predetermined location in the memory 184. For example, the first sample is stored in the 0.sup.th location and the 32.sup.nd sample is stored in the 128.sup.th location. The remaining 30 samples are distributed in steps of four between the 0.sup.th and the 128.sup.th memory location. To generate the odd function, each of the remaining 30 samples is also directed through its respective negating circuits 186 to a location in the memory 184 symmetrically related to its first storage location with location 128 as a reference. For example, the sample stored in the memory location number 3 would be negated and stored in location number 252, the sample stored in memory location 7 would be negated and stored in memory location number 249, etc. Thus, the desired 256 sampled odd symmetrical function is generated and directed to the imaginary input terminal I.sub.K.sup.(1) of the FFT computer means 30 via the summation circuit 48 (FIG. 2A) and is processed by the FFT computer means 30 in the manner described hereinabove. The FFT computer means 30 generates the discrete sine transform of the received spectrum magnitude which appears at the output terminal H.sub.n of the FFT computer means 30 as the impulse response of the received speech signal.
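The mirroring performed by the odd function generator 108 can be sketched as follows. This is a minimal illustration, not the described circuit: the function name `odd_extend` is hypothetical, the storage positions are supplied by the caller rather than fixed, and the mirror location is assumed to be the memory size minus the first storage location, which is one reading of "symmetrically related ... with location 128 as a reference". Passing `negate=False` models the even function generator 46, in which the negating circuits are omitted.

```python
def odd_extend(samples, positions, size=256, negate=True):
    """Place samples in a memory of `size` words and mirror them about size // 2.

    samples   -- values to store
    positions -- first storage location of each sample (assumed in 0..size // 2)
    negate    -- True for odd symmetry, False for even symmetry
    """
    if len(samples) != len(positions):
        raise ValueError("one position is required per sample")
    half = size // 2
    memory = [0.0] * size
    sign = -1.0 if negate else 1.0
    for value, k in zip(samples, positions):
        memory[k] = value
        # Samples at location 0 and at the point of symmetry are not mirrored,
        # matching the statement that the first and last gates are excepted.
        if 0 < k < half:
            memory[size - k] = sign * value   # assumed mirror rule
    return memory
```

With `size=16`, `positions=[0, 2, 4, 6, 8]` and `samples=[1, 2, 3, 4, 5]`, the value 2 stored at location 2 reappears as -2 at location 14.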

The vocal tract response data (as represented by the impulse response H.sub.n) is weighted, for example, Hanning weighted, to both limit its duration in time by a precisely controlled amount and to remove the effects of discontinuities at the tails of the response function. The effect of the Hanning weighting in the frequency domain is that of filtering the samples of the spectral envelope function to obtain a smooth spectral envelope. The weighting operation is performed by the weighting circuit 200 of FIG. 11 which includes a storage device such as the register 202 having a first input connection from the H.sub.n terminal of the FFT computer means 30 and a second input connection from a masking circuit 204 well-known in the art. A multiplier circuit 206 has a first input connection from the register 202, a second input from a weighting function storage device 208 and an output connection connected to an odd function generator 210.

In operation, 128 samples representing the impulse response are received at the register 202 from the H.sub.n terminal of the FFT computer means 30. The masking circuit 204 masks out a predetermined number of samples, for example, the last 78, allowing only the first 50 to be transferred to the multiplier circuit 206. (This in effect limits the time of the impulse response.) For each of the remaining 50 samples, there is stored in the weighting function storage device 208 a predetermined value by which the sample is to be multiplied by the multiplier 206. The predetermined values stored in the weighting function storage device 208 are in accordance with the well-known Hanning weighting function.
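The masking and weighting steps can be sketched as follows. This is an illustration under an assumption: the text specifies only "the well-known Hanning weighting function", so the particular one-sided (decaying half) raised-cosine form used here, which equals 1 at the first sample and tapers toward 0 at the last retained sample, is assumed. The function name `mask_and_weight` is hypothetical.

```python
import math


def mask_and_weight(impulse_response, keep=50):
    """Keep the first `keep` samples and apply a one-sided Hanning taper.

    impulse_response -- samples from the H_n terminal (128 in the example of the text)
    keep             -- number of samples passed by the masking circuit 204
    """
    kept = impulse_response[:keep]                       # masking circuit 204
    # Assumed contents of the weighting function storage device 208.
    weights = [0.5 * (1.0 + math.cos(math.pi * k / keep)) for k in range(keep)]
    return [h * w for h, w in zip(kept, weights)]        # multiplier circuit 206
```

Applied to a constant input, the output starts at the input value and decays smoothly, which removes the discontinuity that plain truncation after 50 samples would otherwise leave at the tail.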

The 50 weighted samples are then directed to the odd function generator 210 which operates in the same manner as the odd function generator 108 described hereinabove except that the point of symmetry is the 50th sample in the case of the odd function generator 210. The output signal of the weighting circuit then is 100 samples of Hanning weighted vocal tract data having an odd symmetry.

The vocal tract data could at this point be convolved with the pitch information to yield the desired speech information. However, the synthesized voice would contain a certain harshness. By obtaining the average impulse response for two data analysis intervals and convolving both the average impulse response and the FFT calculated impulse response with the pitch data during appropriate parts of the analysis interval, the transition in vocal tract information between analysis intervals is reduced thereby reducing the harshness of the synthesized voice. This averaging is particularly useful when the impulse responses generated by the system for adjacent analysis intervals are very different. This occurs at transitions between sounds when the spectrum magnitude of the speech signal is rapidly changing.

Averaging Circuit

An averaging circuit 212 of FIG. 11 for obtaining the average spectral data includes first, second and third storage devices 214, 216 and 218 respectively. While three storage devices are shown, it is to be appreciated that one storage device having different storage addresses could be used. The input connection to the averaging circuit 212 is connected to the first storage device 214 and to a summation circuit 220, the output of which is connected to a multiplier circuit 222. (Both digital summation circuits and multiplier circuits are well-known.) A second input connection to the summation circuit 220 originates at the second storage device 216 which has an input connection from the first storage device 214. The third storage device 218 has an input connection from the multiplier circuit 222.

In operation, the 100 weighted samples from the weighting circuit 200 are directed simultaneously to the summation circuit 220 and to the first storage device 214. The storage device 214 holds the 100 samples for one data analysis interval before shifting its contents into the second storage device 216 for use in the next interval. The present 100 samples (for the n.sup.th analysis interval) are added to the 100 samples from the previous or n-1 analysis interval stored in the second storage device 216. The summed output samples are directed through the multiplier circuit 222 where the samples are multiplied by a constant, for example one-half, to obtain the average vocal tract data for two data analysis intervals. These 100 samples of average vocal tract data are stored in the third storage device 218. The 100 samples of data stored in each of the second and third storage devices 216 and 218, respectively, represent the spectral data which is to be convolved with the excitation function data derived from the seven bits of pitch information during any analysis interval, as will be explained hereinbelow.
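The data flow through the three storage devices can be sketched as follows. This is a minimal software model, not the circuit: the class name `AveragingCircuit` and its attribute names are hypothetical, and the shift from device 214 to device 216 is assumed to take place at the arrival of each new set of samples.

```python
class AveragingCircuit:
    """Software model of storage devices 214, 216 and 218 with the 1/2 multiplier."""

    def __init__(self, length=100):
        self.dev_214 = [0.0] * length   # present interval
        self.dev_216 = [0.0] * length   # previous interval
        self.dev_218 = [0.0] * length   # average of present and previous

    def update(self, current):
        """Accept the samples of interval n; return (previous, average)."""
        self.dev_216 = self.dev_214                 # shift interval n-1 into 216
        self.dev_214 = list(current)                # hold interval n
        # Summation circuit 220 followed by multiplier circuit 222 (constant 1/2).
        self.dev_218 = [0.5 * (c + p) for c, p in zip(self.dev_214, self.dev_216)]
        return self.dev_216, self.dev_218
```

Each call therefore makes available both the response of the previous interval and the two-interval average, the two sets of data that are later convolved with the excitation function.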

Pitch Carrier Generator

The purpose of the pitch carrier generator is to generate from the seven bits of pitch data the proper excitation function. If the pitch data represents an unvoiced sound, then the pitch carrier generator generates an excitation function occurring at a fixed rate, for example, every 4.0 milliseconds. On the other hand, if the pitch information is voiced, the pitch carrier generator generates an excitation function, the instantaneous period of which is related to the value of the pitch signal represented by the seven bit pitch word.

An embodiment of a pitch carrier generator 229 according to the present invention is shown in FIG. 12 and includes a means 230 for obtaining voiced data samples based on the slope of a line as determined by two successive pitch words received during two successive data analysis intervals. Also included is a means 232 for generating a predetermined ramp function, for example, a 45.degree. ramp function. A comparator means 234 has input connections from the means 230 for obtaining voiced data samples and from the means 232 for generating a predetermined ramp function and has an output connection to a gating means such as the OR gate 236. A means 238 for generating an unvoiced pitch carrier provides a second input connection to the OR gate 236, the output of which is connected to a convolution circuit, to be discussed in detail hereinafter.

The waveforms (a), (b) and (c) of FIG. 13 will be useful in explaining the operation of the pitch carrier generator 229. Waveform (a) is a series of time marks where the time T between the timing marks n and n-1 is equivalent to one data analysis interval. In waveform (b) the end points 1 and 2 indicate the value of the pitch words at the n-1 interval and n interval, respectively, and are employed to calculate the slope of the line 3. The lines 4 of waveform (b) indicate the predetermined ramp function. The waveform (c) of FIG. 13 indicates the pitch carrier signals generated by the pitch carrier generator 229. While the waveforms are shown as solid lines, it is to be appreciated that there are in the instant example 100 discrete samples occurring in the time interval T.

In operation, the pitch word (assuming a voiced sound) for the n.sup.th data analysis interval is received at the input to the means 230 for obtaining voiced data samples from the buffer storage 122 of the decoding device 100. The means 230 determines the slope of the line connecting the magnitude of the pitch signal received at the n.sup.th (or present) data analysis interval with the magnitude of the pitch signal received in the previous data interval. The calculated slope value is sequentially added to each of the 100 samples and directed to the comparator means 234. Simultaneously, the means 232 for generating a 45.degree. ramp signal is supplying the comparator means 234 with samples of the ramp function signal. When the magnitude of the ramp function signal is equal to or greater than the magnitude of the output signal from the means 230, the comparator means 234 generates a fixed amplitude signal as shown in waveform (c) of FIG. 13. If the pitch word indicates an unvoiced sound (the seven bits of the pitch word are all zeros), the means 238 for generating an unvoiced pitch carrier generates a fixed amplitude signal at a fixed rate, for example, one pitch carrier every 4.0 milliseconds.

Means for Obtaining Voiced Data Samples

The input connection from the buffer storage device 122 of FIG. 10A is connected to a gating means such as the AND gate 240, to a slope calculator 242 and to a first pitch storage means such as a pitch storage register 244. A storage counter 246 has a first input connection from the AND gate 240 and an output connection to a gating means such as an AND gate 248, the output of which is connected to a summation circuit 250. The slope calculator 242 has a second input connection from a second pitch storage register 252 and an output connection to a third gating means such as the AND gate 254, the output of which is connected to a second input connection of the summation circuit 250. The output of the summation circuit 250 is connected simultaneously to the comparator means 234 and as a second input to the storage counter 246.

Originating at the system timing unit (not shown) are second input connections to the AND gates 240, 248 and 254. The timing signals, called enabling signals, include an enabling 0 signal occurring once every data analysis interval T, where T is the length (in seconds) of a data analysis interval, and an enabling 1 signal occurring once every sampling interval, where the sampling interval corresponds to the sampling rate of the A/D converter 12 of FIG. 2A. In the system described herein, the data analysis interval T is approximately 20 milliseconds and the sampling interval is approximately 140 microseconds.

The means for obtaining voiced data samples solves the equation

Y.sub.t = Y.sub.t.sub.-1 + m (46)

where Y.sub.t is a calculated value of a particular sample of the pitch signal, Y.sub.t.sub.-1 is the value of a previously calculated pitch signal and m is the slope of a line joining the two pitch signals received during the n.sup.th and n-1 data analysis intervals.

The operation of means 230 for obtaining voiced data samples is initiated by the arrival of the seven bit pitch word for the n.sup.th data analysis interval. The pitch word is directed simultaneously to the pitch storage register 244, the slope calculator 242 and through the AND gate 240 to the storage counter 246. The pitch word will be held in the pitch storage register 244 for one time period T (corresponding to one data analysis interval) before being transferred to the pitch storage register 252.

The slope calculator 242 takes two pitch words, one for the n.sup.th analysis interval and one for the n-1 data analysis interval received from the pitch storage register 252, and calculates the slope of a line connecting these two pitch words. The slope calculator can be any well-known computing means for solving the simple calculation

m = (T.sub.pn - T.sub.p(n.sub.-1))/T (47)

where T.sub.p(n.sub.-1) is the value of the pitch word for the n-1 data analysis interval and T.sub.pn is the value of the pitch word for the n.sup.th data analysis interval and T is the length of a data analysis interval. The output signal of the slope calculator is a constant over the interval T for which it was calculated.

Once every sampling interval, an enabling signal, enable 1, activates the AND gates 254 and 248 to thereby direct the slope signal m and the previously calculated pitch signal (stored in the storage counter 246) to the summation circuit 250 where the two signals are added in accordance with Equation 46. The summation output signal, Y.sub.t, is simultaneously directed to the storage counter 246 to become the Y.sub.t.sub.-1 signal for the next calculation and to the comparator means 234 for further processing.
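The recursion of Equation 46 can be sketched as follows. Two assumptions are made for the illustration and are not taken from the text: the slope m is expressed per sample (the difference of the two pitch words divided by the number of samples in the interval), and the recursion starts from the pitch word of interval n-1 so that the line ends on the pitch word of interval n. The function name `voiced_samples` is hypothetical.

```python
def voiced_samples(prev_pitch, curr_pitch, samples_per_interval=100):
    """Return the values Y_t along the line joining two successive pitch words.

    prev_pitch           -- pitch word value for interval n-1
    curr_pitch           -- pitch word value for interval n
    samples_per_interval -- number of samples in one data analysis interval
    """
    # Role of the slope calculator 242; constant over the interval (per-sample units assumed).
    m = (curr_pitch - prev_pitch) / samples_per_interval
    y = prev_pitch                        # assumed starting content of storage counter 246
    out = []
    for _ in range(samples_per_interval):
        y = y + m                         # Equation 46: Y_t = Y_(t-1) + m
        out.append(y)
    return out
```

For pitch words of 40 and 60 and 100 samples per interval, the values rise in steps of 0.2 and the last value is 60 to within rounding error.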

Means for Generating a Predetermined Ramp Function

The means 232 for generating a predetermined ramp function includes a storage counter 260 connected through a gating means such as an AND gate 262 to a summation circuit 264. A fixed slope source 266, for example, a 45.degree. slope source, is connected through a gating means such as the AND gate 268 to a second input connection of the summation circuit 264. The summation circuit 264 has an output connection to the storage counter 260 and to the comparator means 234.

The means 232 for generating a predetermined ramp function solves the equation

X.sub.t = X.sub.t.sub.-1 + m (48)

where X.sub.t is the calculated value of the ramp function at a particular time, X.sub.t.sub.-1 is the value of the ramp function calculated during the previous sampling interval and m is the slope value which is unity for a 45.degree. ramp. The means 232 recursively solves Equation 48 in the following way. Initially (when enable 0 occurs) the storage counter 260 starts counting from zero and for every sampling interval the Equation 48 is solved, placing a new ramp value in the storage counter 260 via the line from the summation circuit 264 to the storage counter 260. At each sample time, an enable 1 signal activates the AND gates 262 and 268 thereby directing a signal having the previously calculated value X.sub.t.sub.-1 of the ramp function from the storage counter 260 to the summation circuit 264 and directing the slope signal from the fixed slope source 266 to the summation circuit 264. The two signals are added together in accordance with Equation 48 to yield the desired solution.

Comparator Means

The comparator means 234 includes a digital comparator circuit 270, well-known in the art, having first and second input connections, respectively from the summation circuit 250 and the summation circuit 264 and first and second output connections, respectively, to a pulsing means such as a one shot multivibrator circuit 272 and the OR gate 236. The one shot multivibrator circuit 272 is connected to the storage counter 260 and to the common juncture of the comparator circuit 270 and the OR gate 236.

The input signal Y.sub.t from the summation circuit 250 corresponds to line 3 of waveform (b) of FIG. 13 taken over the interval T, and similarly the input signal X.sub.t corresponds to the line 4 in the same waveform and over the same time interval. (The data analysis interval T changes once every 20 milliseconds and the sampling interval occurs once every 140 microseconds.) It is to be appreciated that a value Y.sub.t corresponds to one increment of line 3 and a value X.sub.t corresponds to one increment of one of the lines 4, and that the values Y.sub.t and X.sub.t are calculated once every 140 microseconds over the interval T. Therefore, the comparator 234 makes a sample by sample comparison.

If the signal Y.sub.t is greater than the signal X.sub.t, then the comparator circuit 270 directs a zero from the second output connection to the OR circuit 236. If the signal X.sub.t is greater than or equal to the signal Y.sub.t, then the comparator circuit directs an output signal to the one shot multivibrator circuit 272 which generates and supplies a pulse output signal (see waveform (c) of FIG. 13) to the OR circuit 236. As can be seen in waveform (b) of FIG. 13, X.sub.t .gtoreq.Y.sub.t at the intersections of the lines 4 with the line 3 and therefore a pitch carrier signal, indicating a voiced sound, will be generated whenever the intersections occur. Thus the spacing between voiced pitch carrier signals is a function of the slope of line 3. For example, the greater the slope of line 3, the less frequent are the occurrences of the voiced pitch carrier pulses as seen by the spacing between times t.sub.5 and t.sub.6 as compared with the spacing between times t.sub.2 and t.sub.3.

It can be seen from waveform (b) of FIG. 13 that each time an intersection occurs (X.sub.t .gtoreq.Y.sub.t) then the ramp function for the next X.sub.t starts at zero. The reset to zero is implemented by the connection of one shot multivibrator circuit 272 to the storage counter 260. Upon receipt of a signal from the one shot multivibrator circuit 272, the storage counter 260 is reset to zero thus setting the initial value of the ramp function each time the signal X.sub.t .gtoreq. the signal Y.sub.t.
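The interplay of the ramp recursion of Equation 48, the comparison and the reset can be sketched as follows. This is an illustration under stated assumptions: the function name `voiced_pitch_carrier` is hypothetical, the Y.sub.t values are assumed to be expressed in the same units as the ramp (one unit per sampling interval for a unity slope), and the one shot multivibrator 272 is modeled as a single-sample pulse.

```python
def voiced_pitch_carrier(y_values, ramp_slope=1.0):
    """Return a list of 0/1 pulses, one per sampling interval.

    y_values   -- sample-by-sample values Y_t (line 3 of waveform (b))
    ramp_slope -- slope of the ramp; unity corresponds to a 45 degree ramp
    """
    x = 0.0                        # storage counter 260 starts from zero
    pulses = []
    for y in y_values:
        x = x + ramp_slope         # Equation 48: X_t = X_(t-1) + m
        if x >= y:                 # comparator circuit 270
            pulses.append(1)       # pulse from one shot multivibrator 272
            x = 0.0                # reset of storage counter 260
        else:
            pulses.append(0)
    return pulses
```

With a constant Y.sub.t of 5 the pulses fall every five samples, and with a constant Y.sub.t of 10 every ten samples, so larger pitch values give more widely spaced carrier pulses.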

Means for Generating an Unvoiced Pitch Carrier

The means for generating an unvoiced pitch carrier signal 238 includes a gating means such as an AND gate 280 having an output connection to the OR circuit 236 and a first input connection from a means operative to sense unvoiced pitch words such as the series combination of a seven bit OR gate 282 and a complement circuit 281 and a second input from a pulse generation circuit such as a free-running multivibrator circuit 284. It is to be appreciated that the number of inputs to the OR gate 282 corresponds to the number of bits in the pitch word and that the input connection to the OR gate 282 can originate anywhere in the vocoder system where the pitch word for the analysis interval of interest is stored, for example, from the pitch storage register 252 of means 230.

In operation the free-running multivibrator circuit 284 generates a constant amplitude pulse at a predetermined rate, for example, once every 4.0 milliseconds. The first input signal to the AND gate 280 from the complement circuit 281 maintains the AND gate 280 in an open condition in the absence of an input signal to the seven input OR gate 282. For example, in the unvoiced case, the seven bits in the pitch word (stored in pitch storage register 252) are all zeros, and the AND gate 280 directs the output signal from the free-running multivibrator circuit 284 to the OR gate 236. It should also be mentioned that for the unvoiced condition, the second input to the OR gate 236, which normally would come from the voiced pitch carrier generation portion of the pitch carrier generation unit 229, is inhibited. When a voiced pitch word is shifted into the pitch storage register 252 (a voiced pitch signal is characterized by a "1" in any of the seven bit positions), a signal is directed through one of the seven inputs of the seven input OR gate 282 to the complement circuit 281 where it is complemented and inhibits the AND gate 280 from passing any pulse from the free-running multivibrator 284.

Thus the output signal of the OR gate 236 is a fixed amplitude signal having either one of two repetition rate sequences. In the unvoiced case, the repetition rate is fixed, and in the voiced case, the repetition rate varies as a function of the slope of a line connecting two successive pitch words.
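The selection between the two repetition rate sequences can be sketched as follows. This is an illustration only: the function name `pitch_carrier` is hypothetical, and the default unvoiced period of 29 samples is an assumed rounding of 4.0 milliseconds at a sampling interval of about 140 microseconds.

```python
def pitch_carrier(pitch_word_bits, voiced_pulses, unvoiced_period=29):
    """Combine voiced and unvoiced carriers as the OR gate 236 does.

    pitch_word_bits -- the seven pitch bits; all zeros indicates an unvoiced sound
    voiced_pulses   -- 0/1 pulses from the comparator path, one per sampling interval
    unvoiced_period -- samples between free-running pulses (assumed value)
    """
    voiced = any(pitch_word_bits)              # seven input OR gate 282
    enable_unvoiced = not voiced               # complement circuit 281
    out = []
    for i, v in enumerate(voiced_pulses):
        # Free-running multivibrator 284: one pulse every `unvoiced_period` samples.
        free_running = 1 if (i + 1) % unvoiced_period == 0 else 0
        unvoiced_pulse = free_running if enable_unvoiced else 0   # AND gate 280
        voiced_pulse = v if voiced else 0      # voiced path inhibited when unvoiced
        out.append(1 if (voiced_pulse or unvoiced_pulse) else 0)  # OR gate 236
    return out
```

For an all-zero pitch word the output is a fixed-rate pulse train regardless of the comparator path; for any non-zero pitch word the output simply follows the voiced pulses.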

Convolution Unit

The synthesis of voiced sounds is accomplished by convolving the odd symmetric and discretely sampled time varying vocal tract response function with the digital equivalent of a unit area impulse carrier at a rate calculated in accordance with the pitch carrier generator unit 229 output signal. A convolution unit 300 according to the present invention is shown in FIG. 14 and includes a convolution means 302 for storing a predetermined vocal tract response signal having input connections from a logic means 304 and output connections to a summation circuit 306. Input connections to the logic means 304 include input connections from the pitch carrier generator 229 of FIG. 12 and the averaging means 212 of FIG. 11. (An input connection may come directly from the H.sub.n terminal of the FFT computer means 30 if the weighting circuit means 200 and/or the averaging circuit 212 are not employed.)

In operation the logic means 304 directs the vocal tract response signals from the averaging circuit 212 to predetermined storage locations in the convolution means 302. In response to each pitch carrier signal received from the pitch carrier generator 229, the logic means 304 selects a predetermined block of storage locations within the convolution means 302 from which a complete set of data samples, 100 in the instant example, are sequentially directed to the summation circuit 306 during the next 100 sample intervals. For each pitch carrier received, the convolution means 302 will supply sequentially 100 samples of the vocal tract response signal. If, for example, four pitch carrier signals are received at predetermined time intervals, the convolution means 302 will supply four sets of 100 samples of vocal tract data, each set of 100 samples starting at a time corresponding to the receipt of one of the pitch carrier signals.

This operation is shown in FIG. 13 by the waveforms (d), (e), (f), (g), (h) and (i). At each time t.sub.j (j = 1, 2, . . . . 6) a new scan of the appropriate impulse response is begun. The output signal corresponding to the output of summation circuit 306 is shown in waveform (k) of FIG. 13. For any specific time instant, only the discrete samples of the impulse response that are in the process of being scanned out are processed by summation circuit 306. For example, at time t.sub.4, the summation circuit would be summing together four samples. These samples would be the first sample of the impulse response starting to be scanned at time t.sub.4 and one sample each from the impulse response which began at times t.sub.1, t.sub.2 and t.sub.3. The 100 samples will therefore have periods of overlap and will be added together at the summation circuit 306 to form a composite signal which represents the desired synthesized speech signal. The composite signal may be directed through any well-known digital to analog converter and speaker, not shown, for listening.
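The overlapping and summing of scanned-out impulse responses can be sketched as follows. This is a simplified illustration: the function name `synthesize` is hypothetical, and a single impulse response is used for every pitch carrier, whereas the described system switches between the stored and the averaged response during the analysis interval.

```python
def synthesize(pulse_train, impulse_response):
    """Sum one copy of the impulse response starting at each pitch carrier pulse.

    pulse_train      -- 0/1 pitch carrier values, one per sampling interval
    impulse_response -- vocal tract response samples (100 in the example of the text)
    """
    out = [0.0] * (len(pulse_train) + len(impulse_response) - 1)
    for t, pulse in enumerate(pulse_train):
        if pulse:
            # A new scan begins at time t; its samples add to any scans still in progress,
            # which is the role of the summation circuit 306.
            for k, h in enumerate(impulse_response):
                out[t + k] += h
    return out
```

For example, `synthesize([1, 0, 1, 0], [1.0, 0.5, 0.25])` gives `[1.0, 0.5, 1.25, 0.5, 0.25, 0.0]`; the third output sample is the sum of the tail of the first scan and the start of the second.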

The logic means 304 includes first and second gating means such as first and second AND gates 320 and 322, respectively, each having a first input connection from pitch carrier generator 229. A second input connection to each of the first and second AND gates 320 and 322 originates at first and second output connections, respectively, of a timing means such as a flip-flop 324. A third and fourth gating means, such as the AND gates 326 and 328, have a first input connection from the AND gate 322. A second input connection to the AND gates 326 and 328 originates at the first and second output connections, respectively, of a signaling means such as the flip-flop 330 which is also connected to the first input connections, of a fifth and sixth gating means such as the AND gates 332 and 334. The second input connection to the fifth and sixth AND gates 332 and 334 originates at the output connection of the storage device 216 of the averaging circuit 212 shown in FIG. 11.

The convolution means 302 includes a plurality, for example, three, of storage means 350, 352 and 354. The first storage means 350 has input connections from the third and fifth AND gates 326 and 332, the second storage means 352 has input connections from the fourth and sixth AND gates 328 and 334, and the third storage means 354 has input connections from the first AND gate 320 and the storage device 218 of the averaging circuit 212 shown in FIG. 11.

Each of the storage means 350, 352 and 354 may be similar; therefore, only the details of the storage means 350 are shown in FIG. 14. Each storage means includes a plurality of storage registers, the number of which depends upon the maximum pitch carrier rate. For example, at a frame updating time of 20 msec and with averaging of impulse responses occurring every one-half frame (10 msec), any specific impulse response is used to represent the vocal tract for only one 10 msec interval. Since the maximum pitch rate for the system is approximately a 3 msec rate, then at most four pitch pulses can be generated by the pitch carrier generator during any 10 msec interval. On this basis, at most four storage areas per impulse response are required to satisfy the maximum pitch carrier rate. These four storage blocks are shown for a typical storage means 350 as storage registers 360, 361, 362 and 363 of FIG. 14, and each of the four storage registers has a common input connection from the AND gate 332 and a separate input connection from a gating circuit 346, to be discussed in detail hereinafter. The output connection from each of the four storage registers 360 through 363 is connected to the summation circuit 306.

The operation of the convolution circuit will be explained in conjunction with the waveforms of FIG. 13. Assume that at time t.sub.1 an enable 0 signal is directed to the flip-flop 324, the output signal of which opens the AND gate 322 and blocks the AND gate 320. Simultaneously the enable 0 signal sets the flip-flop 330 such that the output signal of the flip-flop 330 opens the AND gates 326 and 332 and blocks the AND gates 328 and 334. The impulse response H.sub.n (vocal tract response function data) is then passed by AND gate 332 from the storage device 216 of the averaging circuit 212 to the storage registers 360 through 363.

A pitch carrier signal received from the pitch carrier generator 229, for example at time t.sub.1, will be directed through the AND gates 322 and 326 (the AND gate 320 is closed) to the gating circuit 346. The gating circuit allows shift pulses at the sampling rate to be directed to the storage register 360. One sample of the impulse response function H.sub.n, stored in register 360, is shifted into the summation circuit with the receipt of each shift pulse. While the waveforms of FIG. 13 are shown as continuous waveforms, it is to be appreciated that they are composed of discrete samples, but because of the relative time periods of the data analysis interval and the sampling interval (20 milliseconds versus 140 microseconds), the signals appear as continuous waveforms.

At time t.sub.2, another pitch carrier signal, received from the pitch carrier generator 229, is directed through the AND gates 322 and 326 to the gating circuit 346 which opens a gate to the storage register 361 allowing the shift pulses occurring at the T rate to also access the H.sub.n data stored in the storage register 361. The summation circuit 306, receiving data samples from the two storage registers 360 and 361 every T seconds generates a composite signal, waveform (k) of FIG. 13, at its output terminal. As stated hereinabove, the composite signal generated at the output of the summation circuit 306 is the desired synthesized speech signal in digital form. Note that the four waveforms (d), (e), (f) and (g) are identical but start at times corresponding to the receipt of pitch carrier signals at t.sub.1, t.sub.2, t.sub.3 and t.sub.4.
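The summation of identical, time-shifted copies of H.sub.n can be illustrated in software. The following is a minimal sketch, assuming the pitch carrier is given as a list of sample indices at which pulses arrive; the name `overlap_add` and the three-sample response are illustrative only and are not taken from the specification.

```python
def overlap_add(impulse_response, pulse_starts, length):
    """Sum one copy of impulse_response starting at each pulse index.

    Each copy plays the role of one storage register being shifted out,
    and the running sum plays the role of the summation circuit.
    """
    out = [0.0] * length
    for start in pulse_starts:
        for n, h in enumerate(impulse_response):
            if start + n < length:
                out[start + n] += h
    return out


# Two pulses, two samples apart, with a short hypothetical response.
h = [1.0, 0.5, 0.25]
y = overlap_add(h, [0, 2], 6)
print(y)  # [1.0, 0.5, 1.25, 0.5, 0.25, 0.0]
```

The third output sample is the sum of the tail of the first copy and the head of the second, which corresponds to two storage registers supplying data to the summation circuit at the same time.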

At the time T/2, corresponding to the time of one-half the data analysis interval, the flip-flop circuit 324 changes state such that its output signal opens the AND gate 320 and closes the AND gate 322. Therefore, any pitch carrier signal occurring after the time T/2 is directed to the equivalent of the gating circuit 346 in the third storage means 354. For example, at times t.sub.5 and t.sub.6, pitch carrier signals received from the pitch carrier generator 229 are now directed through the gate 320 to the gating circuit of the third storage means 354 which transfers the shift pulses to the first and second storage registers, respectively, of the third storage means 354. The average impulse response data have waveforms (h) and (i) and are transferred to the summation circuit 306 where they are added to the impulse responses that have not been completely scanned as represented by the waveforms (d) through (g) to supply a digital composite signal, waveform (k), representing the synthesized speech.

Note that shortly after time t.sub.5 the impulse response that began to be scanned out of storage means 350 at time t.sub.1 is complete and storage register 360 no longer supplies data to the summation circuit 306. Similarly by time t.sub.6 the data in storage register 361 which was being supplied to the summation circuit 306 due to the pitch carrier generator 229 output at time t.sub.2 has also been completely scanned. Therefore, the convolution is actually a process of gating in and gating out appropriate storage register contents as a function of the pitch carrier generator output. The synthesis for unvoicing is almost identical to the voiced synthesis just described. The differences between the two modes include the following:

1. The pitch carrier generator 229, instead of supplying a pitch carrier which is a function of the speaker's pitch (i.e., during unvoiced sounds there does not exist a periodic excitation), arbitrarily supplies a pitch carrier at some fixed rate (4 msec for this example).

2. The spectrum of H.sub.n must previously have been multiplied by a random sequence of .+-.1 amplitude. This multiplication in the frequency domain is equivalent to convolving the impulse response H.sub.n with a noise carrier in the time domain.

3. The impulse response used in the convolution means 300 during the unvoiced excitation pitch carrier is randomly multiplied by .+-.1 before storage in its respective storage means 350, 352 or 354.

To perform the random multiply by .+-.1 of the impulse response, a control signal from the unvoiced pitch carrier signal means 238 is supplied to storage registers 371 and 372 of FIG. 14. These storage registers contain the constants +1 and -1. During normal voicing the multipliers supplied by storage registers 371 and 372 to the multiplier circuits 370 and 373 are always +1. During the unvoiced synthesis intervals the storage registers 371 and 372 randomly supply either +1 or -1 as the second input to the multiplier circuits. In this manner the H.sub.n generated for an unvoiced sound is convolved with a fixed rate sequence whose individual impulse values may be +1 or -1.
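The unvoiced mode described above may be sketched as the same summation of shifted responses, with each copy multiplied by a randomly chosen +1 or -1 and with pulses at a fixed spacing. This is an illustrative sketch only: the function name, the pulse spacing expressed in samples, and the use of Python's `random.Random` as the source of random signs are assumptions, not details of the specification.

```python
import random


def unvoiced_synthesis(impulse_response, pulse_spacing, length, rng):
    """Convolve impulse_response with a fixed-rate train of +1/-1 impulses.

    pulse_spacing is the fixed carrier interval in samples (hypothetical
    units); rng supplies the random sign applied to each stored copy.
    """
    out = [0.0] * length
    for start in range(0, length, pulse_spacing):
        sign = rng.choice((1.0, -1.0))  # random multiply before "storage"
        for n, h in enumerate(impulse_response):
            if start + n < length:
                out[start + n] += sign * h
    return out


rng = random.Random(0)  # seeded only to make the sketch repeatable
y = unvoiced_synthesis([1.0, 0.5], pulse_spacing=4, length=12, rng=rng)
print(y)
```

Setting every sign to +1 reduces this to the voiced case with a constant pitch period, which reflects the statement that the two modes are almost identical.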

One embodiment of a gating circuit 346 employed in the first, second and third storage means 350, 352 and 354 is shown in FIG. 15 and includes an input terminal 381 connected to a first flip-flop circuit 380 and to first, second and third gating means such as the AND gates 382, 384 and 386. The output connection of the first flip-flop 380 is connected to a fourth gating means such as the AND gate 388 and to a second input connection of the first AND gate 382. The output connection of the fourth AND gate 388 is connected to the first storage register 360 of the first storage means 350. A second flip-flop circuit 390 has an input connection from the first AND gate 382 and an output connection to the second AND gate 384 and to a fifth gating means such as the fifth AND gate 392, the output connection of which is connected to the second storage register 361.

A third flip-flop circuit 394 has an input connection from the second AND gate 384 and an output connection to the third AND gate 386 and to a sixth gating means such as the sixth AND gate 396, the output of which is connected to the third storage register 362. A fourth flip-flop circuit 398 has an input connection from the third AND gate 386 and an output connection to a seventh gating means such as the seventh AND gate 400, the output of which is connected to the fourth storage register 363. A second input connection to each of the fourth, fifth, sixth and seventh AND gates 388, 392, 396 and 400, respectively, originates at the source of shift pulses (not shown).

The operation of the gating circuit of FIG. 15 will be explained in conjunction with the waveforms of FIG. 13. At time t.sub.1, a carrier pulse, waveform (c), is received at terminal 381 which changes the state of the first flip-flop circuit 380 such that the first and fourth AND gates 382 and 388, which are normally closed, are opened. The fourth AND gate 388 immediately directs shift pulses to the first storage register 360 of FIG. 14 which in turn starts reading out the data corresponding to waveform (d). At time t.sub.2, a second pitch carrier signal is received at the input terminal 381 and is directed simultaneously to the first flip-flop circuit 380 and through the now open first AND gate 382 to the second flip-flop circuit 390 which in turn opens the normally closed second and fifth AND gates 384 and 392. The second pitch carrier signal has no effect on the first flip-flop circuit 380 as its state was changed by the first pitch carrier signal received at time t.sub.1. Both the fourth and fifth AND gates 388 and 392 are simultaneously directing shift pulses to the respective first and second storage registers 360 and 361 to generate the waveforms (d) and (e).

Similarly at times t.sub.3 and t.sub.4 the respective pitch carrier signals are directed to the associated second and third AND gates 384 and 386 to open the sixth and seventh AND gates 396 and 400 via the third and fourth flip-flop circuits 394 and 398 whereby the shift pulses are transferred to the respective third and fourth storage registers 362 and 363 of FIG. 14. Thus the gating circuit 346 not only provides shift pulses to an additional storage register upon the receipt of each carrier signal but also continues to provide the shift pulses to each activated storage register until the 100 samples of vocal tract data are passed to the summation circuit 306.
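The behavior of the flip-flop chain may be modeled in software as follows. This is a minimal behavioral sketch, not a circuit-level description: the class and method names are hypothetical, each pulse is assumed to see the flip-flop states that existed before its arrival, and the resetting of a flip-flop once its register has been fully scanned is omitted because it is not described in this passage.

```python
class GatingCircuit:
    """Behavioral model of a chain of flip-flops enabling storage registers."""

    def __init__(self, n_registers=4):
        # One flip-flop per storage register; False means the shift-pulse
        # gate for that register is closed.
        self.enabled = [False] * n_registers

    def pitch_pulse(self):
        """Set the first flip-flop that is not yet set.

        A pulse only reaches flip-flop i if all earlier flip-flops were
        already set, which is the role of the chained AND gates.
        Returns the index of the register enabled, or None if all are set.
        """
        for i, state in enumerate(self.enabled):
            if not state:
                self.enabled[i] = True
                return i
        return None

    def shift_pulse(self):
        """Return the indices of registers receiving this shift pulse."""
        return [i for i, state in enumerate(self.enabled) if state]


gate = GatingCircuit()
gate.pitch_pulse()          # pulse at t1 enables the first register
print(gate.shift_pulse())   # [0]
gate.pitch_pulse()          # pulse at t2 enables the second register
print(gate.shift_pulse())   # [0, 1]
```

In this model a second pitch pulse leaves the first flip-flop unchanged and sets only the second, so both registers receive shift pulses from then on.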

Synthesizer Summary

In summary, the synthesizer includes a decoding device 100 (FIG. 10A) which dequantizes, descales and converts 13 received C.sub.n coefficients into 21 C.sub.n coefficients. These 21 C.sub.n coefficients are then directed through a spectrum decoder 102 (FIG. 10B) where the logarithm of the spectrum envelope of the received vocal tract impulse response function is computed using the discrete Fourier transform (DFT). The resultant 32 samples of a logged signal are then delogged in the delogging computer 104. The 32 delogged samples are spaced along the frequency axis in every fourth frequency position in the lower 128 locations of a 256 word data area and converted into a signal having 256 samples of odd symmetry by the odd function generator 108. The function is made odd symmetric to use the remaining odd symmetric input of the imaginary input I.sub.k .sup.(1) of the FFT computer means 30 (FIG. 2A). This odd part of I.sub.k .sup.(1) is transformed by the FFT computer means 30 and is available at the output terminal H.sub.n of the FFT computer means 30. Since the 256 samples are the discrete sine transform of the received spectrum magnitude, H.sub.n is an odd function (see waveform (c) of FIG. 3) and is the impulse response which is to be used in synthesis of the received speech signal.
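The step from 32 delogged samples to an odd impulse response may be sketched as follows. This is an illustration under stated assumptions: the samples are placed at positions 0, 4, ..., 124 of a 256-word area, the entry at position 0 is treated as zero so that the sequence is exactly odd, and the sine sum is written without any particular sign or scaling convention. The function names and the test data are hypothetical. For a real odd-symmetric sequence the discrete Fourier transform reduces to this sine sum up to a constant factor.

```python
import math

N = 256  # size of the data area


def odd_spectrum(delogged):
    """Place samples at every fourth position and extend with odd symmetry."""
    x = [0.0] * N
    for i, value in enumerate(delogged):
        k = 4 * i
        if k == 0:
            continue  # assumption: an odd sequence must be zero at k = 0
        x[k] = value
        x[N - k] = -value  # odd symmetry about the origin (modulo N)
    return x


def sine_transform(x):
    """Direct discrete sine sum: h[n] = sum_k x[k] * sin(2*pi*k*n/N)."""
    n_pts = len(x)
    return [
        sum(x[k] * math.sin(2.0 * math.pi * k * n / n_pts) for k in range(n_pts))
        for n in range(n_pts)
    ]


delogged = [1.0 / (1 + i) for i in range(32)]  # hypothetical spectrum values
h = sine_transform(odd_spectrum(delogged))
# h is itself odd: h[N - n] equals -h[n] to within rounding error.
print(max(abs(h[n] + h[(N - n) % N]) for n in range(N)) < 1e-9)  # True
```

A direct sum is used here for clarity; a fast Fourier transform would produce the same values with far fewer operations.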

The 256 samples of impulse response data are then directed through a weighting circuit 200 (FIG. 11) where the data is Hanning weighted and the number of samples is reduced to 100. These 100 samples are chosen such that the remaining H.sub.n function is still odd symmetric. The 100 samples of weighted data are directed to an averaging circuit 212 where they are both stored and used in conjunction with the previous 100 samples to form 100 samples of the average impulse response. The previous 100 samples and the 100 average samples are directed to the convolution means 300 (FIG. 14).

The seven bits of received pitch data are directed to a pitch carrier generator 229 (FIG. 12) where the pitch data is converted into a fixed amplitude pitch carrier signal having a fixed rate in the case of unvoiced sounds and having a rate related to the slope of a line connecting two successive pitch words for the voiced sounds. The pitch carrier signals are then directed to the convolution means 300 (FIG. 14) where they are convolved with the 100 samples of the appropriate impulse response data to generate the desired synthesized speech.
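One possible reading of a rate "related to the slope of a line connecting two successive pitch words" is a pitch period that varies linearly across the frame. The following sketch assumes that reading; the function name, the use of pitch periods rather than raw pitch words, and the rule of sampling the interpolated period at each pulse time are all assumptions and are not taken from the specification.

```python
def pitch_pulse_times(period_start, period_end, frame_len):
    """Pulse times within one frame for a linearly varying pitch period.

    period_start and period_end are the pitch periods at the two ends of
    the frame, in the same time units as frame_len.
    """
    if period_start <= 0 or period_end <= 0:
        raise ValueError("pitch periods must be positive")
    times = []
    t = 0.0
    while t < frame_len:
        times.append(t)
        # Point on the straight line joining the two pitch values at time t.
        period = period_start + (period_end - period_start) * t / frame_len
        t += period
    return times


# Constant pitch: equal periods give evenly spaced pulses.
print(pitch_pulse_times(5.0, 5.0, 20.0))  # [0.0, 5.0, 10.0, 15.0]
```

When the two pitch values are equal the line has zero slope and the carrier is strictly periodic; otherwise the spacing between pulses widens or narrows gradually across the frame.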

What has been shown and described herein is considered a preferred embodiment of the present invention. It will be obvious to those skilled in the art that various modifications and changes may be made without departing from the invention as defined by the appended claims.

* * * * *

