Audio Response Apparatus Using Partial Autocorrelation Techniques Patent Grant Saito , et al. May 9, 1 [Nippon Telegraph and Telephone Public Corporation]

U.S. patent number 3,662,115 [Application Number 05/079,430] was granted by the patent office on 1972-05-09 for audio response apparatus using partial autocorrelation techniques. This patent grant is currently assigned to Nippon Telegraph and Telephone Public Corporation. Invention is credited to Fumitada Itakura, Tsunehiko Koike, Masaaki Nishikawa, Shuzo Saito.

United States Patent	3,662,115
Saito , et al.	May 9, 1972

AUDIO RESPONSE APPARATUS USING PARTIAL AUTOCORRELATION TECHNIQUES

Abstract

The audio response apparatus comprises means for storing speech parameters, namely partial autocorrelation coefficients between two closely adjacent time instants of a speech signal, which are derived by removing the redundant components from the actual speech signal levels at the two adjacent instants while taking into account the effect of the intermediate sample levels between them, and excitation source information determined from sampled values at remotely spaced time instants; a memory to store the speech parameters; readout means to read out from the memory the speech parameters designated by an electronic computer; and a speech synthesizer to reconstruct the speech signal from the output of the readout means. The synthesizer is built from high-speed logic elements and synthesizes multichannel audio outputs on a time-division basis.

Inventors:	Saito; Shuzo (Tokyo, JA), Itakura; Fumitada (Tokyo, JA), Nishikawa; Masaaki (Tokyo, JA), Koike; Tsunehiko (Tokyo, JA)
Assignee:	Nippon Telegraph and Telephone Public Corporation (Tokyo, JA)
Family ID:	26346352
Appl. No.:	05/079,430
Filed:	October 9, 1970

Foreign Application Priority Data


Feb 7, 1970 [JA]			45/10992
Feb 7, 1970 [JA]			45/10993

Current U.S. Class:	704/200; 708/318; 380/35; 704/E13.002
Current CPC Class:	G10L 13/02 (20130101)
Current International Class:	G06F 3/16 (20060101); G10L 1/00 (20060101); H04M 11/00 (20060101); H04M 3/00 (20060101); G06F 15/00 (20060101); G10L 001/00 (); H04M 011/00 ()
Field of Search:	179/1SA, 15.55; 324/77; 340/148, 152

References Cited

U.S. Patent Documents


3209074	September 1965	French
3069507	December 1962	David
3281789	October 1966	Willcox

Primary Examiner: Claffy; Kathleen H.
Assistant Examiner: Leaheey; Jon Bradford

Claims

What is claimed is:

1. An audio response apparatus comprising means for previously storing speech parameters required for answering, including partial autocorrelation coefficients between two closely adjacent time instants of a speech signal and excitation source information, said coefficients being determined by calculating, with respect to a plurality of sampling instants, partial autocorrelation coefficients of said two instants representing the correlation of the differences between the values predicted by the least squares method from the sampled values between said two instants and the actual values of the speech signal at said two instants, said excitation source information being obtained by determining the autocorrelation between remotely separated sampled values; an electronic computer for supplying a command signal for designating the speech parameters of a speech signal to be synthesized; means to read out said speech parameters designated by said command signal from said memory means; and a speech synthesizer responsive to the output from said read out means to synthesize a desired speech signal.

2. The audio response apparatus according to claim 1 which further includes a speech parameter extractor comprising an autocorrelation coefficient extractor having a plurality of cascade connected partial autocorrelation coefficient detector stages, each of said stages including a delay network connected to receive a speech signal, a correlation coefficient calculator for receiving the output from said delay network and for directly receiving said speech signal, a first multiplier connected to receive the output from said delay network and the output from said correlation coefficient calculator, a second multiplier connected to directly receive the output from said correlation coefficient calculator and said speech signal, a first adder for adding the output from said delay network and the output from said second multiplier, a second adder for adding the output from said first multiplier and said speech signal, and a quantizer to quantize the output from said correlation coefficient calculator to provide a partial autocorrelation coefficient between said two instants; an autocorrelator connected to one output terminal of the last detector stage of said extractor; and a maximum value selecting means for determining the period and amplitude of an excitation source signal from a group of outputs from said autocorrelator.

3. The audio response apparatus according to claim 1 wherein said speech signal synthesizer comprises a pulse generator and a white noise generator which are controlled by the fundamental pitch period of the speech, an amplitude controller connected to said generators and controlled by the fundamental amplitude information of the excitation source, and means for controlling the output from said amplitude controller in accordance with the partial autocorrelation coefficient designated by said electronic computer for reconstructing the speech signal by the correlation between a group of said correlation coefficients.

4. An audio response apparatus comprising means for deriving speech parameters from the partial autocorrelation coefficients and the excitation source information of each of a plurality of speech signals required for answering; memory means for storing said speech parameters; an electronic computer for sending a command signal designating the speech parameters for the respective output channels so as to send answers to a plurality of output channels; a plurality of read out means to read out the speech parameters designated by said electronic computer from said memory means; a single speech synthesizer connected to receive a plurality of sets of the speech parameters from said read out means on a time-division basis and to form, from the excitation source signals corresponding to the excitation source information and from said partial autocorrelation coefficients of the respective sets of speech parameters, a group of digital codes representing the respective designated speech waveforms; means to read out on a time-division basis the group of digital codes from said speech synthesizer and to convert said digital codes into pulse amplitude modulated signals; and timing gate means for distributing said modulated signals among a plurality of output channels.

5. The audio response apparatus according to claim 4 comprising cyclic memory means for storing speech parameters, including excitation source information and said partial autocorrelation coefficients, of a plurality of speech units of a predetermined constant length required to send an answer, in a plurality of cyclic store arrangements each divided into a plurality of frames; a parameter buffer memory for temporarily storing the speech parameters of the respective frames of a speech unit read out from said cyclic memory means; a speech synthesizer including purely digital logic means responsive to the speech units designated by the electronic computer and to be answered to a plurality of output channels, for correlating the excitation source signals corresponding to the speech information of the speech parameters selectively read out of said parameter buffer memory under the control of said partial autocorrelation coefficients, whereby to convert said speech parameters into a group of digital codes representing the waveforms of the respective designated speech signals; an output buffer memory for temporarily storing the group of said digital codes from said speech synthesizer; and means for converting said digital codes read out from said output buffer memory into analogue signals.

6. The audio response apparatus according to claim 4 which comprises means for successively storing, in vacant addresses of a memory, speech parameters each including excitation source information and a partial autocorrelation coefficient, regarding a plurality of speech units required for sending an answer of a predetermined length; a parameter buffer memory for temporarily storing the speech parameters of a speech unit read out from the address in said memory corresponding to the speech unit designated by said electronic computer and to be answered to a plurality of output channels; and means for successively reading out said speech parameters from said parameter buffer memory and applying said read out speech parameters to said speech synthesizer.

Description

BACKGROUND OF THE INVENTION

This invention relates to audio response apparatus that uses an electronic computer to provide various information services, and more particularly to a novel audio response apparatus in which the speech signals used for responses are stored in the form of speech parameters, which are read out on command from the electronic computer and reconstructed into speech by a synthesizer.

In prior-art apparatus of this type, the so-called compiling method of prerecorded speech has been used: speech segments (hereinafter termed "speech units"), for example word-length units, are stored in a memory and selected successively in a suitable order in response to commands from the electronic computer to reconstruct, or compile, a speech message. In this method the speech units are generally recorded directly as audio waveforms, usually on a low-speed analogue magnetic drum whose period of revolution equals the duration of one speech unit, so that each track holds one speech unit. With this construction, however, it is difficult to increase the capacity of the analogue magnetic drum, and hence to raise the number of recordable speech units beyond about 100 to 200.

To eliminate these problems of the compiling method, a method has been proposed in which, instead of the speech signals themselves, compressed-signal information is recorded and the speech signals are reconstructed by a speech synthesizer. One example of audio response apparatus built on this principle uses a channel vocoder (see, for example, R. H. Buron, IEEE Trans. AU-16, 1, 1968). With a channel vocoder, however, the quality of the audio output is poor, and an expensive speech synthesizer must be installed on each output channel.

SUMMARY OF THE INVENTION

It is an object of this invention to provide a novel audio response apparatus in which a speech signal is represented by a new parameter, termed "a partial autocorrelation coefficient", which is used to form a large number of speech units and thereby to produce speech output of excellent quality.

Another object of this invention is to provide a novel speech parameter extracting device for forming partial autocorrelation coefficients and excitation source information.

Another object of this invention is to provide an inexpensive cyclic memory device that stores the speech units as sets of parameters in the form of partial autocorrelation coefficients.

A further object of this invention is to provide a simple speech synthesizer comprising a plurality of cascade connected digital filters for reconstructing speech from a number of speech units selected from the memory device.

Still another object of this invention is to provide a novel audio response apparatus in which a single speech synthesizer can be used in common for a plurality of output channels.

According to this invention, there is provided means whereby the speech signal levels at two closely adjacent time instants are selected, the intermediate signal levels between these instants are used to predict the levels at the two instants by the least squares method, and the correlation between the differences of the predicted and actual levels, that is, the partial autocorrelation coefficient, is determined. Means is further provided to vary the time interval between the two instants so as to determine the partial autocorrelation coefficient for each new pair of instants; by repeating these operations a plurality of partial autocorrelation coefficients is obtained. Since these coefficients are closely related to the frequency spectrum envelope of the speech signal, speech can be synthesized from them together with excitation source information extracted from the speech signal, such as the fundamental frequency, its amplitude and the noise amplitude. More particularly, there is provided an excitation source generator controlled by the excitation source information, whose output is in turn controlled by the partial autocorrelation coefficients so as to reproduce the frequency spectrum envelope.

Further, in accordance with this invention, the parameter memory device for storing the partial autocorrelation coefficients can be an inexpensive memory of large capacity, and the digital speech synthesizer that reproduces the frequency spectrum envelope is constructed to be used on a time-division basis.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention can be more fully understood from the following detailed description when taken in conjunction with the accompanying drawings in which:

FIG. 1 is a schematic connection diagram to show the principle of the novel audio response apparatus;

FIG. 2 shows a speech signal curve for explaining the partial autocorrelation coefficient;

FIG. 3 shows a connection diagram of an extracting apparatus for extracting the partial autocorrelation coefficient and the excitation source information;

FIG. 4 is a diagram of apparatus for determining the correlation coefficient utilized in this invention;

FIG. 5 is a connection diagram of one example of an autocorrelation apparatus;

FIG. 6 is a connection diagram of one example of a speech synthesizer;

FIG. 7 is a connection diagram of the novel audio response apparatus in which the synthesizer is utilized in multiplex on the time division basis;

FIG. 8 shows a cyclic store arrangement of the speech parameters on a magnetic drum of the embodiment shown in FIG. 7;

FIG. 9 is a block diagram of the word synchronizer utilized in the embodiment shown in FIG. 7;

FIG. 10 is a time chart of control signals recorded on the magnetic drum;

FIGS. 11 and 12 show a block diagram and a diagram of the time relationship of the sequence control utilized in FIG. 7;

FIG. 13 shows a block diagram of the input control shown in FIG. 7;

FIG. 14 is a block diagram of a modified audio response apparatus embodying this invention, and

FIG. 15 is a block diagram of the input control shown in FIG. 14.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring now to FIG. 1 of the accompanying drawings, a request for an information service from a terminal telephone set 1 is connected to an electronic computer 3 through exchange equipment 2. Once this connection is established, electronic computer 3 is controlled from terminal telephone set 1, and its output is supplied to the audio response device 4 in the form of a code train of the speech units to be answered. Audio response device 4 holds in memory the partial autocorrelation coefficients and the excitation source information needed to synthesize the answer speech; these are read out in response to the output of electronic computer 3 to synthesize the speech. The synthesized speech signal is returned to terminal telephone set 1 via exchange equipment 2. As shown in FIG. 1, a speech parameter extractor 5 is connected to audio response device 4 to extract from speech the parameters to be stored in audio response device 4, namely the partial autocorrelation coefficients and the excitation source information. Extractor 5 can also be used, when desired, to check the speech parameters stored in audio response device 4 or to replace them with new ones.

The partial autocorrelation coefficient, one of the parameters used to synthesize speech according to this invention, is defined as follows. As shown in FIG. 2, suppose the speech signal is sampled at, for example, 8 kHz. The partial autocorrelation between the values of the speech signal at two relatively close sampling instants $t_0$ and $t_3$ is the correlation between the prediction errors $\Delta X_0 = X_0 - \hat{X}_0$ and $\Delta X_3 = X_3 - \hat{X}_3$, where $\hat{X}_0$ and $\hat{X}_3$ are the values predicted by the least squares method from the sampled values $X_1$ and $X_2$ lying between $t_0$ and $t_3$, and $X_0$ and $X_3$ are the actual sampled values. The interval between the two sampling instants is varied successively to T, 2T, 3T, 4T . . ., and the partial autocorrelation coefficient is determined for each interval. The partial autocorrelation coefficient is expressed by the following equation

$$K_n = \frac{E\{\Delta X_0\,\Delta X_n\}}{\sqrt{E\{(\Delta X_0)^2\}\,E\{(\Delta X_n)^2\}}} \qquad (1)$$

where nT is the interval between the two sampling instants and $E\{\cdot\}$ denotes the mean value.
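The definition can be checked numerically. The following Python sketch computes the coefficient for a lag of n samples directly from that definition: it predicts both end samples by least squares from the samples lying between them and then correlates the two prediction errors. It assumes zero-mean input and whole-signal sums in place of the mean E{·}; `solve_linear`, `ls_residuals`, `parcor_by_definition` and the test-signal generator `ar1_signal` are illustrative names, not part of the patent.

```python
import math
import random


def solve_linear(a, y):
    """Solve the square system a @ w = y by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [v] for row, v in zip(a, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= factor * m[c][j]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][j] * w[j] for j in range(r + 1, n))) / m[r][r]
    return w


def ls_residuals(targets, regressors):
    """Least-squares prediction errors of `targets` from the rows of `regressors`."""
    if not regressors[0]:                      # lag 1: no samples in between
        return list(targets)
    k = len(regressors[0])
    gram = [[sum(z[i] * z[j] for z in regressors) for j in range(k)] for i in range(k)]
    rhs = [sum(z[i] * t for z, t in zip(regressors, targets)) for i in range(k)]
    w = solve_linear(gram, rhs)
    return [t - sum(wi * zi for wi, zi in zip(w, z)) for z, t in zip(regressors, targets)]


def parcor_by_definition(x, n):
    """K_n: correlation of the least-squares errors of X_n and X_0,
    both predicted from the samples X_1 ... X_{n-1} lying between them."""
    rows = range(n, len(x))
    between = [[x[t - i] for i in range(1, n)] for t in rows]
    err_n = ls_residuals([x[t] for t in rows], between)        # Delta X_n
    err_0 = ls_residuals([x[t - n] for t in rows], between)    # Delta X_0
    num = sum(p * q for p, q in zip(err_n, err_0))
    den = math.sqrt(sum(p * p for p in err_n) * sum(q * q for q in err_0))
    return num / den


def ar1_signal(coef, n, seed=0):
    """Synthetic first-order autoregressive test signal (illustration only)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(coef * x[-1] + rng.gauss(0.0, 1.0))
    return x
```

For a first-order autoregressive signal, x[t] = 0.9 x[t-1] + noise, this should give K_1 close to 0.9 and K_2, K_3 close to zero: once the intermediate samples are accounted for, the end samples carry no further correlation. This brute-force route needs a least-squares solve per lag, which the recursion of equations (5) and (6) avoids.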

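Denoting the prediction errors $\Delta X_n$ and $\Delta X_0$ by means of a delay operator D, we have

$$\Delta X_n = X_n - \sum_{i=1}^{n-1} \alpha_i X_{n-i} = A_{n-1}(D)\,X_n \qquad (2)$$

$$\Delta X_0 = X_0 - \sum_{i=1}^{n-1} \beta_i X_{i} = B_{n-1}(D)\,X_n \qquad (3)$$

where $\alpha_i$ and $\beta_i$ are chosen to minimize $E\{(\Delta X_n)^2\}$ and $E\{(\Delta X_0)^2\}$, respectively, and D is the delay operator defined by $D^i X_n = X_{n-i}$, so that $A_{n-1}(D) = 1 - \sum_{i=1}^{n-1} \alpha_i D^i$ and $B_{n-1}(D) = D^n - \sum_{i=1}^{n-1} \beta_i D^{n-i}$. $A_{n-1}(D)$ and $B_{n-1}(D)$ are prediction error operators. The partial autocorrelation coefficient $K_n$ is then expressed as follows

$$K_n = \frac{E\{A_{n-1}(D)X_n \cdot B_{n-1}(D)X_n\}}{\sqrt{E\{(A_{n-1}(D)X_n)^2\}\,E\{(B_{n-1}(D)X_n)^2\}}} \qquad (4)$$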
It can be proved that the following equations hold among $A_n(D)$, $B_n(D)$ and $K_n$:

$$A_n(D) = A_{n-1}(D) - K_n B_{n-1}(D) \qquad (5)$$

$$B_n(D) = D\,[B_{n-1}(D) - K_n A_{n-1}(D)] \qquad (6)$$

Thus, once $A_{n-1}(D)$ and $B_{n-1}(D)$ have been determined, $K_n$, and hence $A_n(D)$ and $B_n(D)$, can be determined, and the partial autocorrelation coefficients of successive orders are obtained in this manner. Since these coefficients vary relatively slowly with time, it is sufficient to determine them once per period long enough to extract the necessary speech parameters while still preserving the character of the speech, for example every 15 milliseconds; the derived coefficients are then encoded and stored.
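Equations (5) and (6) can be exercised on the operators themselves. In this Python sketch an operator in D is a list of coefficients in ascending powers of D, and the recursion starts from A_0(D) = 1 and B_0(D) = D, which is an assumption consistent with there being no intermediate samples at the first stage. The prediction coefficients then follow as α_i = -(coefficient of D^i in A_n(D)). The function names are illustrative.

```python
def step_operators(a_prev, b_prev, k):
    """Apply equations (5) and (6) once.

    Operators are coefficient lists in powers of D: [c0, c1, ...] means
    c0 + c1*D + c2*D**2 + ...
    """
    size = max(len(a_prev), len(b_prev))
    a = a_prev + [0.0] * (size - len(a_prev))
    b = b_prev + [0.0] * (size - len(b_prev))
    a_new = [ai - k * bi for ai, bi in zip(a, b)]            # equation (5)
    b_new = [0.0] + [bi - k * ai for ai, bi in zip(a, b)]     # equation (6): leading 0.0 is the factor D
    return a_new, b_new


def prediction_error_operators(ks):
    """Build A_n(D) and B_n(D) from K_1 ... K_n, starting from A_0 = 1, B_0 = D."""
    a, b = [1.0], [0.0, 1.0]
    for k in ks:
        a, b = step_operators(a, b, k)
    return a, b


def apply_operator(coeffs, x):
    """Apply an operator in D to a sequence, treating samples before x[0] as zero."""
    return [sum(c * x[t - i] for i, c in enumerate(coeffs) if t - i >= 0)
            for t in range(len(x))]
```

With these starting operators the coefficients of B_n(D) come out as those of A_n(D) in reverse order, shifted by one power of D, which mirrors the equality of the forward and backward predictors for a stationary signal.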

FIG. 3 shows one example of the extractor 5 for extracting a plurality of partial autocorrelation coefficients and the excitation source information from a speech signal. The extractor shown in FIG. 3 comprises n partial autocorrelation coefficient detector stages 14a through 14n connected in cascade. Since the detector stages are identical, only stage 14a is described below. Each detector stage comprises a delay network 7 that delays the speech signal by one sampling interval T, a correlation coefficient calculator 8, multipliers 9 and 11, adders 10 and 12, and a quantizer 13. Input terminal 6 at the left of detector stage 14a receives the speech signal, and output terminal 15 of quantizer 13 delivers the partial autocorrelation coefficient quantized in each stage. The output of adder 12 of the final detector stage 14n is left open, whereas the output of adder 10 is connected to an autocorrelator 16. The outputs of autocorrelator 16 are supplied to a maximum value selector 17, which in turn is connected to quantizers 18 and 20.

In operation, the speech signal applied to input terminal 6 is divided into two portions. One portion is delayed by one sampling period T in delay network 7 and is then applied to adder 10 through correlation coefficient calculator 8 and multiplier 9. The other portion is supplied to adder 12 through correlation coefficient calculator 8 and multiplier 11. FIG. 4 shows one example of the circuit of correlation coefficient calculator 8, which comprises adders 22, squaring devices 23, adders 24, low pass filters 25 and a division (ratio) circuit 26.

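Denoting the two inputs of correlation coefficient calculator 8 by $B_{n-1}(D)X_n$ and $A_{n-1}(D)X_n$, the inputs to the two low pass filters 25 are, respectively,

$$[A_{n-1}(D)X_n + B_{n-1}(D)X_n]^2 - [A_{n-1}(D)X_n - B_{n-1}(D)X_n]^2 = 4\,A_{n-1}(D)X_n \cdot B_{n-1}(D)X_n \qquad (7)$$

$$[A_{n-1}(D)X_n + B_{n-1}(D)X_n]^2 + [A_{n-1}(D)X_n - B_{n-1}(D)X_n]^2 = 2\{(A_{n-1}(D)X_n)^2 + (B_{n-1}(D)X_n)^2\} \qquad (8)$$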

Low pass filters 25 take short-time mean values of these inputs. Since the mean values of $(A_{n-1}(D)X_n)^2$ and $(B_{n-1}(D)X_n)^2$ are approximately equal, their arithmetic mean is approximately equal to their geometric mean, and the following equation holds

$$\overline{2\{(A_{n-1}(D)X_n)^2 + (B_{n-1}(D)X_n)^2\}} \cong 4\left\{\overline{(A_{n-1}(D)X_n)^2}\cdot\overline{(B_{n-1}(D)X_n)^2}\right\}^{1/2} \qquad (9)$$

where the overbar denotes the short-time mean value.

Hence the ratio of the two low pass filter outputs, formed by ratio circuit 26, gives the value of $K_n$. The output of ratio circuit 26 is applied to multipliers 9 and 11, and multiplier 11 produces the predicted value $\hat{X}_1$ of $X_1$. Adder 10 provides the difference $X_1 - \hat{X}_1$ between the actual value $X_1$ and the predicted value $\hat{X}_1$. Similarly, multiplier 9 produces the predicted value $\hat{X}_0$ of $X_0$, and adder 12 provides the difference $X_0 - \hat{X}_0$. Part of the output of correlation coefficient calculator 8 is also supplied to quantizer 13, which produces the quantized partial autocorrelation coefficient at terminal 15.
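The arithmetic of FIG. 4 can be sketched as follows, assuming that adders 22 form the sum and difference of the two inputs and adders 24 form the difference and sum of their squares, which is the reading that yields the filter inputs given above. A plain frame average stands in for low pass filters 25. For comparison, `exact_k` computes the normalized correlation K_n directly; both function names are illustrative.

```python
import math


def ratio_circuit_k(a, b):
    """Estimate K_n the way FIG. 4 does: mean of 4ab over mean of 2(a^2 + b^2).

    a, b: equal-length frames of A_{n-1}(D)X_n and B_{n-1}(D)X_n.
    A frame average replaces low pass filters 25 (a simplification).
    """
    sums = [x + y for x, y in zip(a, b)]          # adders 22
    diffs = [x - y for x, y in zip(a, b)]
    sq_s = [s * s for s in sums]                   # squaring devices 23
    sq_d = [d * d for d in diffs]
    num = sum(p - q for p, q in zip(sq_s, sq_d)) / len(a)   # mean of 4ab
    den = sum(p + q for p, q in zip(sq_s, sq_d)) / len(a)   # mean of 2(a^2 + b^2)
    return num / den                               # ratio circuit 26


def exact_k(a, b):
    """Normalized correlation of the two prediction errors, for comparison."""
    ab = sum(x * y for x, y in zip(a, b))
    return ab / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
```

The two agree exactly when the inputs have equal power; otherwise the circuit's estimate is smaller in magnitude, because the arithmetic mean of the two powers exceeds their geometric mean. This is why the approximation in equation (9) rests on the two mean powers being nearly equal.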

Similar processing is performed by the detector stages following stage 14a. Adders 10 and 12 of the second detector stage 14b provide the differences $X_2 - \hat{X}_2$ and $X_0 - \hat{X}_0$, and, in the same manner, adders 10 and 12 of the last detector stage 14n provide the differences $X_p - \hat{X}_p$ and $X_0 - \hat{X}_0$, respectively. Here $X_p$ is the sampled value of the audio waveform at sampling time $t_p$, the pth sampling point after $t_0$, and $\hat{X}_p$ and $\hat{X}_0$ are the values at $t_p$ and $t_0$ predicted from the sampled values lying between $t_0$ and $t_p$. In this way, quantized partial autocorrelation coefficients for the intervals T, 2T, 3T . . . pT are produced at the output terminals of the quantizers 13 of the respective detector stages 14a, 14b . . . 14n. By the time the input speech signal has passed through the last of the cascade connected detector stages 14a, 14b . . . 14n, the correlation between closely adjacent samples, which corresponds to the formants of the speech, has been removed. The correlation corresponding to the fundamental frequency of the speech, however, is preserved. Consequently, when the output of adder 10 of the last detector stage 14n is applied to autocorrelator 16, its autocorrelation shows a significant peak at a delay equal to the fundamental period if the input speech is a voiced sound, and no peak if it is an unvoiced sound.
For a voiced sound, the output of autocorrelator 16 is supplied to the maximum value selector 17, and the fundamental pitch period of the speech is obtained by measuring the interval between adjacent maxima of the autocorrelation.
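A frame-based sketch of the cascade in FIG. 3 follows. Each pass of the loop plays the role of one detector stage: it delays the backward error by one sample, estimates K_m as the normalized correlation of the forward error with the delayed backward error, and forms the next pair of errors. Whole-frame sums replace the low pass filters, samples before the frame are taken as zero, and the quantizers are omitted; these are simplifications, and the function names are illustrative.

```python
import math
import random


def parcor_analyze(x, order):
    """Cascade of detector stages applied to one frame x.

    Returns (ks, residual): ks[m-1] is K_m, and residual is the forward
    prediction error leaving the last stage (the signal sent to the
    autocorrelator). Samples before the frame are taken as zero.
    """
    f = list(x)        # forward error, A_{m-1}(D) X
    b = list(x)        # backward error before the one-sample delay
    ks = []
    for _ in range(order):
        b_del = [0.0] + b[:-1]                         # one-sample delay
        fb = sum(p * q for p, q in zip(f, b_del))
        ff = sum(p * p for p in f)
        bb = sum(q * q for q in b_del)
        k = fb / math.sqrt(ff * bb) if ff > 0 and bb > 0 else 0.0
        ks.append(k)
        f, b = ([p - k * q for p, q in zip(f, b_del)],   # next forward error
                [q - k * p for p, q in zip(f, b_del)])   # next backward error
    return ks, f


def ar1_signal(coef, n, seed=0):
    """Synthetic first-order autoregressive test signal (illustration only)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(coef * x[-1] + rng.gauss(0.0, 1.0))
    return x
```

On a first-order autoregressive test signal, K_1 should come out near the generating coefficient, the higher-order coefficients near zero, and the residual energy well below the signal energy. That whitening is what leaves only the pitch structure for the autocorrelator to find.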

As illustrated in FIG. 5, autocorrelator 16 comprises, for example, a plurality of delay networks 27, a plurality of multipliers 28 and a plurality of low pass filters 29. The fundamental pitch period obtained by maximum value selector 17 is quantized by quantizer 18 and sent to output terminal 19.
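The autocorrelator and maximum value selector can be sketched as a search for the largest normalized autocorrelation of the final-stage residual over a range of candidate pitch lags. The lag range (20 to 160 samples, i.e. 400 down to 50 Hz at 8 kHz) and the voicing threshold of 0.3 are assumptions for illustration; the patent does not specify them. This sketch also picks the lag of the single largest peak rather than measuring the spacing between adjacent maxima, a common simplification. `detect_pitch` and `white_noise` are illustrative names.

```python
import random


def detect_pitch(residual, min_lag=20, max_lag=160, threshold=0.3):
    """Return the pitch lag in samples for a voiced frame, or None if unvoiced.

    The normalized autocorrelation r(lag) / r(0) of the residual is searched
    over [min_lag, max_lag]; a peak below `threshold` is treated as unvoiced.
    """
    r0 = sum(e * e for e in residual)
    if r0 == 0.0:
        return None
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(residual[t] * residual[t - lag]
                for t in range(lag, len(residual))) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag if best_r >= threshold else None


def white_noise(n, seed=0):
    """Gaussian white noise frame standing in for an unvoiced residual (illustration only)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]
```

A residual that is a pulse every 80 samples (100 Hz at 8 kHz) is reported with a lag of 80, while a white-noise residual has no autocorrelation peak above the threshold and is reported as unvoiced, matching the absence of a pitch output at terminal 19.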

When the input speech signal is an unvoiced sound, on the other hand, no fundamental pitch period appears at terminal 19, and this absence is used as the excitation source information that calls for a white noise signal.

The excitation source amplitude derived from the amplitude value of the input signal to the autocorrelator 16 is quantized by the quantizing circuit 20 and is then applied to output terminal 21.

In this manner, the partial autocorrelation coefficients and the excitation source information necessary for synthesizing speech are produced at output terminals 15, 19 and 21. Since the excitation information, like the partial autocorrelation coefficients, varies relatively slowly with time, it is sufficient to determine it every 15 milliseconds, for example; the derived information is then encoded and stored.

According to this invention, the partial autocorrelation coefficients of the speech, together with the fundamental pitch period and amplitude used as the excitation source information, all obtained by the operations described above, are stored in the audio response device 4 shown in FIG. 1. When audio response device 4 receives from electronic computer 3 a code train designating the speech to be synthesized, it sequentially selects, in accordance with that command, the partial autocorrelation coefficients and excitation source information stored beforehand in its memory, and thereby synthesizes the designated speech.

FIG. 6 is a block diagram of the device used to synthesize the designated speech. The device comprises a pulse generator 30 for voiced sounds, a white noise generator 31 for unvoiced sounds, and an amplitude controller 32. The operations of pulse generator 30, white noise generator 31 and amplitude controller 32 are controlled by signals applied to input terminals 33 and 34. Terminal 33 receives one item of the excitation source information previously stored in audio response device 4 and selected by electronic computer 3, namely the fundamental pitch period of the speech, so that pulse generator 30 produces an impulse train of unit power whose period equals the fundamental pitch period. During intervals in which no fundamental pitch period information is applied to control terminal 33 (that is, unvoiced intervals), white noise generator 31 provides a white noise output of unit power. Likewise, amplitude controller 32 receives the signal amplitude information, another item of the excitation source information, from control terminal 34 and controls the amplitude of the output signal accordingly.
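The excitation source of FIG. 6 can be sketched one frame at a time. Pulse heights of sqrt(P) for a pitch period of P samples give the impulse train unit mean power, matching the unit-power white noise; that scaling, the `phase` argument carried between frames, and the use of Gaussian noise are assumptions, not details taken from the patent. `excitation_frame` is an illustrative name.

```python
import math
import random


def excitation_frame(pitch_period, amplitude, n, phase=0, rng=None):
    """Pulse generator / white noise generator followed by the amplitude controller.

    pitch_period: fundamental period in samples, or None for an unvoiced frame.
    phase: position of the first sample within the pitch cycle (0 = pulse).
    Both sources have unit mean power before `amplitude` is applied.
    Returns (samples, phase for the next frame).
    """
    if pitch_period is None:
        rng = rng if rng is not None else random.Random()
        return [amplitude * rng.gauss(0.0, 1.0) for _ in range(n)], 0
    height = math.sqrt(pitch_period)       # one pulse per period -> unit mean power
    out = [amplitude * height if (phase + i) % pitch_period == 0 else 0.0
           for i in range(n)]
    return out, (phase + n) % pitch_period
```

Passing the returned phase into the next call keeps the pulse spacing continuous when consecutive voiced frames share the same period; for example, a 120-sample (15 ms) frame with a period of 50 samples hands a phase of 20 to the next frame.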

The output of amplitude controller 32 is applied to cascade connected digital filters 35n . . . 35b and 35a. The digital filters are identical, each comprising adders 36, 38 and 39, a delay network 40 and a multiplier 37. A partial autocorrelation coefficient, previously stored and selected by the electronic computer, is applied to multiplier 37 through a terminal 41. One terminal of delay network 40 of digital filter 35n is left open, whereas output terminal 42 of digital filter 35a delivers the synthesized speech output, part of which is fed back to adder 39 in digital filter 35a via a delay network 43. The digital filters 35n . . . 35b and 35a correspond one-to-one to the partial autocorrelation coefficient detector stages 14n . . . 14b and 14a shown in FIG. 3. Thus the partial autocorrelation coefficient (selected by the electronic computer) applied to input terminal 41 of digital filter 35n is the coefficient produced and stored by detector stage 14n of FIG. 3, and, likewise, the coefficient applied to the input terminal of digital filter 35a was produced by detector stage 14a. The transfer functions of the digital filters are therefore the inverses of those of the detector stages, so that each filter restores to the output of amplitude controller 32 the correlation between speech samples that the corresponding detector stage removed. As this signal passes successively through digital filters 35n . . . 35b and 35a, its frequency spectrum envelope progressively approaches that of the original speech.
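The inverse relationship between the digital filters and the detector stages can be illustrated with a lattice sketch. `lattice_synthesize` runs the stages from 35n down to 35a for each sample, and `lattice_analyze_fixed` is the matching analysis cascade with the same coefficients held fixed. Mapping the state variables onto delay networks 40 and 43 is an interpretation of FIG. 6, not a transcription of it, and both function names are illustrative.

```python
def lattice_synthesize(excitation, ks):
    """All-pole lattice: the cascade of digital filters 35n ... 35a.

    ks[m-1] is K_m. b_prev[m] holds the backward signal of stage m at the
    previous sample; b_prev[M] is computed but never read, which corresponds
    to the open terminal of the delay network in the first filter (35n).
    """
    order = len(ks)
    b_prev = [0.0] * (order + 1)
    out = []
    for e in excitation:
        f = e                                   # signal entering filter 35n
        for m in range(order, 0, -1):           # filters 35n, ..., 35b, 35a
            f = f + ks[m - 1] * b_prev[m - 1]   # undo f_m = f_{m-1} - K_m b_{m-1}
            b_prev[m] = b_prev[m - 1] - ks[m - 1] * f
        b_prev[0] = f                           # delayed output fed back (cf. delay network 43)
        out.append(f)
    return out


def lattice_analyze_fixed(x, ks):
    """Detector-stage cascade with fixed coefficients: the inverse of the above."""
    order = len(ks)
    b_prev = [0.0] * (order + 1)
    res = []
    for xt in x:
        f, b_cur = xt, [xt] + [0.0] * order
        for m in range(1, order + 1):
            f_in = f
            f = f_in - ks[m - 1] * b_prev[m - 1]
            b_cur[m] = b_prev[m - 1] - ks[m - 1] * f_in
        res.append(f)
        b_prev = b_cur
    return res
```

Feeding the analysis residual back through the synthesizer with the same coefficients reproduces the input up to rounding, which is the inverse property the paragraph relies on. With a single coefficient of 0.5, a unit impulse yields the decaying response 1, 0.5, 0.25, 0.125. In actual operation the residual is replaced by the pulse or noise source, so only the spectral envelope, not the exact waveform, is restored.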

Although the digital filters controlled by the partial autocorrelation coefficients are implemented with digital circuits in this embodiment of the speech synthesizer, they can clearly also be built from analogue circuits. With digital circuits made of high-speed elements, however, the speech synthesizer can be used on a time-division basis, which makes multiplexing of the answer speech easy, as described later.

To produce synthesized speech of excellent quality in accordance with this invention, a maximum time difference of about 8T for the partial autocorrelation coefficients (that is, eight coefficients) is sufficient. When the coefficient for each time interval is encoded into a five-bit code and extracted at a frame period of 15 milliseconds, the partial autocorrelation coefficients require 2667 bits per second. If the excitation source information is given at a rate of 15 bits every 15 milliseconds, the total amounts to 3667 bits per second. The term "frame period" as used here means the period at which speech parameters are stored in memory, and is to be distinguished from the sampling interval.

This information rate is about 1/15 of that required to store the speech waveform itself, so that synthesized speech of high quality can be obtained from control signals of small capacity. For a given memory capacity, the novel audio response apparatus can therefore synthesize about 15 times as many words as conventional apparatus.
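The capacity figures in the two preceding paragraphs can be reproduced with a few lines of arithmetic. The waveform word length needed for the "about 1/15" comparison is not stated in the text; 8 kHz sampling with 7-bit samples is assumed here because it is consistent with that ratio. The function names are illustrative.

```python
def parcor_bit_rate(num_coeffs=8, bits_per_coeff=5, excitation_bits=15, frame_ms=15):
    """Bits per second for the coefficients alone and for the whole parameter set."""
    coeff_bits = num_coeffs * bits_per_coeff                        # 40 bits per frame
    parcor_bps = coeff_bits * 1000 / frame_ms                       # about 2667 bit/s
    total_bps = (coeff_bits + excitation_bits) * 1000 / frame_ms    # about 3667 bit/s
    return parcor_bps, total_bps


def compression_ratio(total_bps, sample_rate=8000, pcm_bits=7):
    """Ratio of a PCM waveform's bit rate to the parameter bit rate.

    pcm_bits=7 is an assumption; the text does not give the waveform word length.
    """
    return sample_rate * pcm_bits / total_bps
```

Under that assumption the waveform needs 56 000 bit/s, roughly 15 times the 3667 bit/s of the parameter set.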

Each digital filter stage shown in FIG. 6 comprises one multiplier and three adders. When these are operated with a clock frequency of 10 MHz, the operation time per stage is approximately 1.8 microseconds. Assuming a maximum time difference of 8T for the partial autocorrelation coefficients, one sample of synthesized speech is formed in about 14.4 microseconds; but since each stage completes its operation every 1.8 microseconds, new excitation source input can be applied to the digital filters every 1.8 microseconds, and a synthesized speech output is likewise produced every 1.8 microseconds. The 14.4 microseconds therefore acts only as a pure delay in synthesizing each speech sample. Assuming a sampling frequency of 8 kHz for the synthesized speech (a sampling period of 125 microseconds), the synthesizer can thus be multiplexed over about 64 channels.
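The timing argument is one of pipelining: with every stage busy on a different sample, a new sample can enter every 1.8 microseconds even though each sample needs 14.4 microseconds to traverse the eight stages. The sketch below counts the issue slots in one 125-microsecond sampling period; it yields 69, and reading the patent's "about 64" as a convenient power of two within that budget is an assumption. Integer nanoseconds are used to avoid rounding surprises, and `multiplex_budget` is an illustrative name.

```python
def multiplex_budget(stage_ns=1800, stages=8, sample_rate_hz=8000):
    """Time-division budget for the pipelined synthesizer.

    Returns (latency_ns, slots): the pure delay through all stages and the
    number of stage-time issue slots available in one sampling period.
    """
    sample_period_ns = 1_000_000_000 // sample_rate_hz   # 125 000 ns at 8 kHz
    latency_ns = stages * stage_ns                       # 14 400 ns for 8 stages
    slots = sample_period_ns // stage_ns                 # one channel sample per slot
    return latency_ns, slots
```

Because the latency is far shorter than the sampling period, it adds a fixed delay per channel but does not limit how many channels can share the synthesizer; the stage time alone sets that limit.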

In the novel audio response apparatus, the fundamental pitch period may be extracted by any other well-known means. Further, although in the foregoing description the partial autocorrelation coefficient was obtained from sampled values of the audio waveform, it can also be determined by predicting the values at two closely adjacent instants from the signal between those instants and then taking the correlation of the differences between the actual values and the predicted values. Also, although a plurality of digital filter stages were connected in cascade in the foregoing embodiment, a single digital filter may clearly be used repeatedly to obtain the desired synthesized speech.

The audio response apparatus described hereinabove comprises a memory that stores the partial autocorrelation coefficients of a speech signal together with the fundamental pitch period and signal amplitude used as the excitation source information, and a speech synthesizer that, in response to a command from an electronic computer, selects the speech parameters stored in the memory and synthesizes speech from them. When the speech synthesizer is used in multiplex on a time division basis, the novel audio response apparatus can synthesize a plurality of different speech messages simultaneously and send them simultaneously to the respective output channels.
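A minimal sketch of synthesis from stored parameters is an all-pole lattice filter driven by a pitch-pulse excitation. It uses a conventional lattice recursion with the sign convention in which a single coefficient k gives y[n] = e[n] + k·y[n-1]. This is an assumption for illustration, not necessarily the one-multiplier, three-adder stage of FIG. 6, and the function names are hypothetical. A single stage routine is reused for every coefficient, in the spirit of the remark that one digital filter may be used repeatedly.

```python
def lattice_stage(f_in, b_delayed, k):
    """One all-pole lattice stage (stable for |k| < 1).
    Returns the forward signal for the next-lower stage and the new backward signal."""
    f_out = f_in + k * b_delayed
    b_out = b_delayed - k * f_out
    return f_out, b_out


def lattice_synthesize(ks, excitation):
    """Filter the excitation through a lattice with coefficients ks = [k1, ..., kp]."""
    p = len(ks)
    b_prev = [0.0] * p               # b_prev[m] holds the backward signal b_m[n-1]
    out = []
    for e in excitation:
        f = e
        b_new = [0.0] * p
        for m in range(p, 0, -1):    # the same stage routine serves every coefficient
            f, b_m = lattice_stage(f, b_prev[m - 1], ks[m - 1])
            if m < p:                # b_p is never needed
                b_new[m] = b_m
        b_new[0] = f                 # b_0[n] equals the output sample
        b_prev = b_new
        out.append(f)
    return out


def pulse_excitation(pitch_period, amplitude, length):
    """Voiced excitation: an impulse train with the given pitch period in samples."""
    return [amplitude if n % pitch_period == 0 else 0.0 for n in range(length)]


if __name__ == "__main__":
    # Hypothetical frame: 120 samples (15 ms at 8 kHz), 100 Hz pitch.
    frame = lattice_synthesize([0.5, 0.2], pulse_excitation(80, 1.0, 120))
    print(frame[:4])
```

In a time-division arrangement the same routine would simply be called for each channel in turn, each channel keeping its own `b_prev` state between samples.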

An improved audio response apparatus capable of sending a plurality of different speech messages to a number of output channels at the same time will now be described.

There are many types of memory device that can store speech parameters, such as magnetic core, magnetic drum and magnetic disc memories. Where several thousand words are to be stored, inexpensive, large-capacity magnetic drum or magnetic disc memories are preferred. For this reason, the following two embodiments of the audio response apparatus use magnetic drum memories to store the speech parameters and give answers simultaneously to 64 output channels.

FIG. 7 shows the connection diagram of one such embodiment, in which each speech unit, that is, the speech information of a word, is recorded on a magnetic drum in the form of speech parameters, arranged so that the speech parameters of a plurality of words are read out on a time division basis.

FIG. 8 shows a typical arrangement of the speech parameters on the magnetic drum. Referring first to this arrangement, one set of speech parameters is recorded in each block 73 shown in FIG. 8, and each block 73 contains the number of bits required to record one set of speech parameters. The left-hand numeral in each block designates the word number (speech unit number) and the right-hand numeral the frame number. Taking word "1" as an example, the speech parameters extracted at a frame period of 15 milliseconds are recorded in separate blocks "1, 1", "1, 2", . . . spaced 15 milliseconds apart, so that for a word of duration L seconds the last set of speech parameters is recorded in block "1, N", spaced approximately L seconds from block "1, 1". As shown in FIG. 8, $N = L/(15 \times 10^{-3})$, with L in seconds. The speech parameters of word "2" are recorded in blocks "2, 1", "2, 2", . . . "2, N" of the same cyclic store arrangement on the same magnetic drum, each displaced by one block from the corresponding block of word "1". Words up to word "M" are recorded in the same manner. In the same way, the speech parameters of further groups of words are recorded in the other cyclic store arrangements of the magnetic drum. The number M of words that can be recorded in multiplex in one cyclic store arrangement in this manner is limited by the frame period of 15 milliseconds and the bit rate of the magnetic drum.
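The interleaving of FIG. 8 amounts to a simple address rule: within each frame the blocks of words 1 to M are consecutive, and the next frame of each word lies M blocks further on. The sketch below assumes exactly that word-major ordering within each frame, as the text describes; the function names are hypothetical.

```python
FRAME_MS = 15   # frame period from the text
M = 480         # words per cyclic store arrangement (value derived in the text)


def block_index(word, frame, m=M):
    """0-based position of block "word, frame" (both numbered from 1, as in FIG. 8)."""
    return (frame - 1) * m + (word - 1)


def block_time_ms(word, frame, m=M, frame_ms=FRAME_MS):
    """Time from the start of the arrangement at which that block passes the head."""
    return block_index(word, frame, m) * frame_ms / m


def frames_per_word(duration_s, frame_ms=FRAME_MS):
    """N = L / 15 ms, rounded to a whole number of frames."""
    return round(duration_s * 1000 / frame_ms)
```

For example, successive frames of one word are exactly 15 ms apart (`block_time_ms(1, 2)` is 15.0), while blocks of neighbouring words in the same frame are only 15/480 ms apart.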

In the following description, it is assumed that the magnetic drum used for recording the speech parameters has a rotation period of 20 milliseconds, a bit rate of 2048 kHz, 40,960 bits per track and 800 tracks. It is further assumed that each block in the cyclic store arrangements contains 64 bits. (Although the block 73 could be 55 bits long, equal to the size of one set of speech parameters, 64 bits are chosen for convenience of description.) In this case M amounts to 480, and for a word length of about 2 seconds N would be about 133. With a word length of about 2 seconds it is impossible to record in a single track all the speech parameters that make up the cyclic store arrangement shown in FIG. 8. In such a case, therefore, the tracks are switched sequentially at each 20-millisecond revolution of the drum, so that a long cyclic store arrangement as shown in FIG. 8 is formed from a plurality of tracks. In other words, the speech parameters of a word of 2-second duration are recorded on 100 tracks that are switched sequentially. More precisely, for the tracks of a drum with a 20-millisecond rotation period to be switched sequentially while recording every 15 milliseconds, and for the arrangement to be perfectly cyclic, the duration of the word must be a common multiple of 20 milliseconds and 15 milliseconds, that is, a multiple of 60 milliseconds. For this reason the following description assumes that a word of 1.98-second duration is recorded on 99 tracks that are switched sequentially. In this case N in FIG. 8 equals 132. With 800 tracks, 8 cyclic store arrangements (FIG. 8) can be formed.
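The dimensioning in the preceding paragraph can be checked with integer arithmetic. The sketch below uses only the drum figures stated in the text (the names are illustrative) and needs Python 3.9 or later for `math.lcm`.

```python
import math

# Drum and coding figures from the text.
BIT_RATE_BPS = 2_048_000
ROTATION_MS = 20
TRACKS = 800
BLOCK_BITS = 64      # one 55-bit parameter set padded to 64 bits
FRAME_MS = 15
WORD_MS = 1980       # 1.98-second speech unit

bits_per_track = BIT_RATE_BPS * ROTATION_MS // 1000                # 40960
blocks_per_frame = BIT_RATE_BPS * FRAME_MS // 1000 // BLOCK_BITS   # M = 480
period_ms = math.lcm(ROTATION_MS, FRAME_MS)                        # 60 ms
assert WORD_MS % period_ms == 0, "word must span whole revolutions and whole frames"

tracks_per_arrangement = WORD_MS // ROTATION_MS   # 99
frames_per_word = WORD_MS // FRAME_MS             # N = 132
arrangements = TRACKS // tracks_per_arrangement   # 8 (792 tracks used)
total_words = blocks_per_frame * arrangements     # 3840

# Consistency check: the blocks on 99 tracks equal N frames of M words each.
assert tracks_per_arrangement * bits_per_track // BLOCK_BITS == frames_per_word * blocks_per_frame
```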
As described above, since the number M of words recorded in multiplex in one cyclic store arrangement (comprising 99 tracks) is 480, the speech parameters of a total of 480 × 8 = 3,840 words can be recorded in the eight cyclic store arrangements.

The speech parameters of each word are read out cyclically from left to right as viewed in FIG. 8 by reproducing circuits, one for each cyclic store arrangement. More particularly, in cyclic store arrangement 1 the first sets of speech parameters of words "1", "2", . . . "480" appear sequentially in the reproducing circuit within one frame period, that is, 15 milliseconds. The second sets of speech parameters of words "1", "2", . . . "480" then appear sequentially, and successive sets are reproduced in the same manner. Thus, for a word length of 1.98 seconds, one cycle of operation is completed when the 132nd set of speech parameters appears.

The embodiment shown in FIG. 7 comprises a magnetic drum storage 60, on which the speech parameters of the respective words are recorded in the cyclic store arrangement shown in FIG. 8, and track selection matrices (61-1) . . . (61-8), which switch the tracks of the magnetic drum storage 60 at each revolution so as to form 8 cyclic store arrangements, each with a period of 1.98 seconds. Each track selection matrix serves 99 tracks, and the outputs of the track selection matrices are supplied to serial-parallel converters (63-1) . . . (63-8), respectively, through read amplifiers (62-1) . . . (62-8), each including an appropriate pulse shaping circuit. The serial-parallel converters (63-1) . . . (63-8) convert the successively read-out speech parameters into sets of parallel signals (55 bits each), which are then written into parameter buffer memories (64-1) . . . (64-8), each capable of storing one set (55 bits) of speech parameters for each word in the respective cyclic store arrangement. Each parameter buffer memory includes a read-write control circuit and generally comprises two planes, so that one plane can be written while the other is read. The speech parameters selectively read out from the parameter buffer memories are supplied to the aforementioned digital speech synthesizer 65. The speech signals produced by the digital synthesizer 65 in PCM form are written into an output buffer memory 66, which holds them for each output channel during one frame period (15 milliseconds). Like the parameter buffer memories (64-1) . . . (64-8), the output buffer memory has two planes and a read-write control circuit. The output buffer memory 66 supplies a D-A converter 67 with the PCM code of one sample for each output channel in turn, and the D-A converter converts these PCM codes into PAM signals. The output of the D-A converter 67 is supplied through PAM gates (68-1), (68-2) . . . (68-64), one for each output channel, to low pass filters (69-1), (69-2) . . . (69-64), which convert it to a continuous speech wave. An input control 71 is connected to the electronic computer to receive information representing the word numbers of the words to be sent to each output channel. A sequence control 72 controls, on a time division basis, the flow of signals from the parameter buffer memories (64-1) . . . (64-8) to the PAM gates (68-1), (68-2) . . . (68-64) for each output channel. Further, a word synchronizer 70 issues transfer requests to the electronic computer and designates the write addresses in the parameter buffer memories (64-1) . . . (64-8). A magnetic drum read-write control is also required in addition to the word synchronizer 70, but it is not shown in FIG. 7.
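The two-plane memories mentioned above behave like what is now usually called a ping-pong (double) buffer. The class below is a hypothetical behavioral sketch of that idea, not a model of any specific memory hardware: writes go to one plane, reads come from the other, and the roles swap once per frame.

```python
class TwoPlaneBuffer:
    """Ping-pong buffer: one plane is written while the other is read."""

    def __init__(self, size):
        self.planes = [[None] * size, [None] * size]
        self.write_plane = 0          # plays the role of the plane-select flip-flop

    def write(self, addr, value):
        self.planes[self.write_plane][addr] = value

    def read(self, addr):
        return self.planes[1 - self.write_plane][addr]

    def swap(self):
        """Called once per frame, like the flip-flop toggling on a counter overflow."""
        self.write_plane ^= 1
```

Data written during one frame therefore becomes readable only after the next swap, which is what lets the drum fill the buffers for frame n+1 while frame n is being synthesized.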

The magnetic drum 60, track selection matrix (61-1) . . . (61-8), read amplifiers (62-1) . . . (62-8) and serial-parallel converters (63-1) . . . (63-8) shown in FIG. 7 may be conventional ones commonly used in digital electronic computers. Further, the parameter buffer memories (64-1) . . . (64-8) and the output buffer memory 66 may be magnetic core memories which are widely used in ordinary electronic computers as the main memories. Furthermore, the D-A converter 67, PAM gates (68-1), (68-2) . . . (68-64) and low pass filters (69-1), (69-2) . . . (69-64) may also be conventional ones commonly used in PCM transmission systems.

The details of the word synchronizer 70, the input control 71 and the sequence control 72 are as follows.

FIG. 9 shows one example of the construction of the word synchronizer 70. The two input signals TIMING and MARK shown on the left-hand side of FIG. 9 are control signals recorded on particular tracks of the magnetic drum storage 60; their time chart is shown in FIG. 10. As shown, the TIMING signal is generated once per complete revolution of the magnetic drum, whereas the MARK signal marks each block 73 corresponding to one set of speech parameters shown in FIG. 8. In the example of FIG. 8, each block contains 64 bits, and one set of speech parameters (55 bits) is recorded in each block. The further signal CLOCK, also shown in FIG. 10, indicates the bit position on the track of the magnetic drum; in this example it is a pulse sequence with a frequency of 2048 kHz. Since, as described above, the tracks must be switched in succession at each revolution of the magnetic drum, the circuit of FIG. 9 counts the TIMING signals with a 99-step counter 75, and a decoder 74 decodes the count to select the track to be read. The output of the decoder 74 is supplied in parallel to the track selection matrices (61-1) . . . (61-8). An overflow signal 78 from the counter 75 indicates that the period of 1.98 seconds has elapsed, and is therefore used to send a transfer request signal to the electronic computer. In response to this signal the electronic computer begins to transfer the information designating the words to be sent out on the respective output channels. Meanwhile, the MARK pulses are counted to indicate the addresses at which the speech parameters successively read out from the magnetic drum are written into the respective parameter buffer memories. Since, as described above, M equals 480 in the example of FIG. 8, the MARK pulses are counted by a 480-step counter 76, and the resulting counts indicate the write addresses of the respective parameter buffer memories. Further, since each parameter buffer memory has two planes, the plane to be written must be determined. For this purpose a flip-flop circuit 77 receives the overflow signal 79 from the 480-step counter 76; it reverses its output each time the counter 76 has counted 480 MARK pulses, that is, every 15 milliseconds, to indicate the plane into which the information is to be written.
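The counters of the word synchronizer can be modelled behaviorally in a few lines. The class and method names below are hypothetical; each method stands for one pulse arriving from the drum's control tracks, and the attributes mirror counter 75, counter 76, flip-flop 77 and the overflow signal 78.

```python
class WordSynchronizer:
    """Behavioral model of counters 75 and 76 and flip-flop 77 (illustrative only)."""

    TRACKS = 99             # counter 75: one step per drum revolution
    BLOCKS_PER_FRAME = 480  # counter 76: one step per 64-bit block (M)

    def __init__(self):
        self.track = 0              # decoded by decoder 74 to select the track
        self.block = 0              # write address in the parameter buffer memories
        self.plane = 0              # flip-flop 77: plane currently being written
        self.transfer_requests = 0  # overflow 78: requests sent to the computer

    def on_timing(self):
        """TIMING pulse, once per 20 ms revolution."""
        self.track += 1
        if self.track == self.TRACKS:       # 1.98 s elapsed
            self.track = 0
            self.transfer_requests += 1

    def on_mark(self):
        """MARK pulse, once per block; returns the write address for that block."""
        addr = self.block
        self.block += 1
        if self.block == self.BLOCKS_PER_FRAME:   # overflow 79, every 15 ms
            self.block = 0
            self.plane ^= 1
        return addr
```

Driving it with 99 TIMING pulses and 132 × 480 MARK pulses (one 1.98-second word) yields exactly one transfer request and an even number of plane swaps.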

One example of the construction of the sequence control 72 is illustrated in FIG. 11, and the time relationships between the various signals are shown in FIG. 12. The sequence control 72 is driven by the 2048 kHz clock signal of the magnetic drum. A 4-step counter 80 converts the clock signal into a signal 87 with a frequency of 512 kHz, and the signal 87 is supplied to a counting circuit comprising a 64-step counter 84 and a 120-step counter 85 connected in cascade, whose contents indicate the address of the output buffer memory 66 to be read at that time. This address is sent to the output buffer memory 66 to read the corresponding content, which is converted into an analogue signal by the D-A converter 67. At the same time a decoder 86 decodes the output of the 64-step counter 84 to produce gate signals (G-1), (G-2), . . . (G-64), which open the PAM gates (68-1), (68-2) . . . (68-64) of the output channels with the time relationships shown in FIG. 12. In this manner the signal read out from the output buffer memory 66 and converted to analogue form by the D-A converter 67 is sent to the output channel designated by the counter 84. The output signal 87 of the 4-step counter 80 is also supplied to a counting circuit comprising a 120-step counter 81 and a 64-step counter 82 connected in cascade, whose contents indicate the address of the output buffer memory 66 into which the PCM code currently synthesized by the synthesizer 65 is to be written. As shown in FIG. 12, the overflow signal 88 of the 120-step counter 81 is generated every 234 microseconds (120/512 kHz) and supplied to the following 64-step counter 82; the signal 88 is also used to start the input control 71. The overflow signal 89 of the 64-step counter 82, which is generated every (120 × 64)/512 kHz = 15 milliseconds, is supplied to the flip-flop circuit 83, whose binary output indicates which of the two planes of the output buffer memory 66 is to be written or read. The output signal 87 of the 4-step counter 80 is also sent to the synthesizer 65 so that it operates in synchronism with the writing and reading operations of the output buffer memory 66.
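The two counter chains differ only in which counter steps fastest: on the read side the channel number changes on every 512 kHz tick, while on the write side the sample number does. The sketch below models that addressing as pure functions of a tick count; the function names are hypothetical, and treating the non-written plane as the one being read is an assumption consistent with the two-plane description.

```python
CHANNELS = 64            # output channels
SAMPLES_PER_FRAME = 120  # 15 ms at 8 kHz
TICK_HZ = 512_000        # 2048 kHz drum clock divided by the 4-step counter 80
FRAME_TICKS = CHANNELS * SAMPLES_PER_FRAME   # 7680 ticks = one 15 ms frame


def read_address(tick):
    """Counters 84 (64 steps, fastest) and 85 (120 steps): (channel, sample) read out."""
    t = tick % FRAME_TICKS
    return t % CHANNELS, t // CHANNELS


def write_address(tick):
    """Counters 81 (120 steps, fastest) and 82 (64 steps): (channel, sample) written."""
    t = tick % FRAME_TICKS
    return t // SAMPLES_PER_FRAME, t % SAMPLES_PER_FRAME


def plane(tick):
    """Flip-flop 83: plane being written; assumed the other plane is being read."""
    return (tick // FRAME_TICKS) % 2


channel_rate_hz = TICK_HZ / CHANNELS           # output samples per second per channel
slot_us = SAMPLES_PER_FRAME / TICK_HZ * 1e6    # synthesizer time per channel (signal 88)
frame_ms = FRAME_TICKS / TICK_HZ * 1e3
```

Each channel thus receives 8,000 samples per second, and the synthesizer has one 234.375-microsecond slot per channel to produce that channel's 120 samples for the next frame.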

FIG. 13 shows one example of the construction of the input control 71. When a transfer request is sent to the electronic computer by the signal 78 from the word synchronizer 70, information designating the word numbers of the words to be sent to the respective output channels CH-1, CH-2 . . . CH-64 is transferred from the electronic computer and stored temporarily in registers (93-1), (93-2) . . . (93-64), one for each output channel. When the word length of 1.98 seconds has elapsed, the word synchronizer 70 sends a request signal to the electronic computer as described above; the signal 78 is also supplied to gates (92-1), (92-2) . . . (92-64) of the input control 71 as a gate signal, to transfer the contents of registers (93-1), (93-2) . . . (93-64) into registers (91-1), (91-2) . . . (91-64), respectively. As described above, the sequence control 72 provides a start signal 88 to the input control 71 every 234 microseconds, and these signals are counted by a 64-step counter 95. A decoder 94 decodes the content of counter 95 to produce gate signals (96-1), (96-2) . . . (96-64) for gate circuits (90-1), (90-2) . . . (90-64), respectively. By means of these gate signals, the contents of registers (91-1), (91-2) . . . (91-64) are transferred successively, at intervals of 234 microseconds, to the parameter buffer memories as read addresses. For a word length of 1.98 seconds, after the contents of registers (91-1), (91-2) . . . (91-64) have been sent to the parameter buffer memory 132 times (132 frames), the information designating the next words, which has been transferred from the electronic computer and held in registers (93-1), (93-2) . . . (93-64), is transferred to registers (91-1), (91-2) . . . (91-64), respectively, by the signal 78 generated by the word synchronizer 70 at that time. These operations are repeated in synchronism with the duration of the words.

Referring again to FIG. 7, the speech parameters read out from the magnetic drum 60 and converted into parallel form set by set are written, for the eight cyclic store arrangements, into the addresses of the corresponding parameter buffer memories (64-1) . . . (64-8), one address per set. Accordingly, each address in these memories (64-1) . . . (64-8) holds 55 bits, that is, one set of speech parameters. This operation is of course performed in parallel for the eight cyclic store arrangements, so that one set of speech parameters for each of the 3,840 words is written into the parameter buffer memories (64-1) . . . (64-8) in the manner described above. This writing operation is completed within one frame period of 15 milliseconds. A read-out cycle of the parameter buffer memories (64-1) . . . (64-8) for each output channel then begins. While this read-out cycle proceeds, the speech parameters for the next frame period read out from the magnetic drum 60 are written into the other plane of each of the two-plane parameter buffer memories (64-1) . . . (64-8). During the read-out cycle, the contents of registers (91-1), (91-2) . . . (91-64) of the input control 71 are transferred to the parameter buffer memories in the order of the output channels, under the control of signal 88 from the sequence control 72; the contents (one set of speech parameters) at these addresses of the parameter buffer memories (64-1) . . . (64-8) are read and sent to the speech synthesizer. On receiving them, the synthesizer 65 synthesizes, as described in detail above, the PCM speech codes to be produced in one frame period, for example 120 PCM samples. These synthesized codes are stored successively in the addresses of the output buffer memory 66 indicated by the 120-step counter 81, the 64-step counter 82 and the flip-flop 83 of the sequence control 72. Each address of the output buffer memory 66 comprises, for example, 8 bits, enough to store one PCM speech code. This operation takes a period corresponding to 1/64 of one frame (15 milliseconds), or 234 microseconds, so that during one frame period it is performed for all 64 output channels on a time division basis. Thus, 120 PCM samples for each output channel are written into the addresses of the output buffer memory corresponding to the respective output channels during one frame period, or 15 milliseconds. The contents of the output buffer memory 66 are read out on a time division basis, under the control of the sequence control 72, in synchronism with the gate signals (G-1), (G-2) . . . (G-64) of the PAM gates (68-1), (68-2) . . . (68-64) of the respective output channels. The read-out signals are converted into PAM signals by the D-A converter 67 and supplied to the output channels as a continuous speech wave through the corresponding low pass filters (69-1), (69-2) . . . (69-64).

The series of operations described above is repeated with the frame period of 15 milliseconds to provide, on each output channel, a speech wave lasting for the duration of the word. The word numbers of the words to be processed next have already been transferred from the electronic computer to registers (93-1), (93-2) . . . (93-64) of the input control 71, in response to the transfer request signal 78 from the word synchronizer 70, before processing of the next words begins. By repeating these operations with a period equal to the duration of a word (1.98 seconds, for example), compiled audio messages are sent to the respective output channels CH-1, CH-2 . . . CH-64.

Although in the example described above a magnetic drum was used as the memory for the speech parameters, it will be clear that any other type of memory may be used, so long as it can record the speech parameters in the form of a cyclic store arrangement.

Another embodiment of the audio response apparatus, which also uses a magnetic drum as the memory for storing speech parameters, will now be described. Unlike the first embodiment, in which the speech parameters extracted from the respective words were recorded at intervals on the tracks of the magnetic drum, in this embodiment the speech parameters of each word are recorded continuously, starting from a particular address. More particularly, each word is recorded continuously, without overlap, in 132 blocks (when the duration of each word is 1.98 seconds, for example), starting from a first address on the drum predetermined for that word. When the word numbers of the words to be sent to the output channels are transferred from the electronic computer, the speech parameters (132 sets each) of the words designated for the output channels are read out from the magnetic drum and stored in the parameter buffer memory. Thereafter, exactly as in the first embodiment, the speech signal is synthesized frame by frame, stored in the output buffer memory, and sent to the output channel as a continuous speech signal for the designated word through the D-A converter, the PAM gate and the low pass filter.

FIG. 14 shows another embodiment, comprising a magnetic drum 60 storing the sets of speech parameters of the respective words; a parameter buffer memory 98 for temporarily storing the speech parameters of the words selectively read out from the magnetic drum 60; an input control 100 for storing the information sent from the electronic computer to designate the word numbers and for sending the read-out address to the parameter buffer memory 98 at definite times; and a magnetic drum control 97, responsive to commands from the input control 100, for reading the contents of the magnetic drum 60 and writing them into the parameter buffer memory 98. The embodiment further comprises a synthesizer 65, including digital filters that synthesize the speech signal for each output channel on a time division basis, for synthesizing one frame (15 milliseconds, 120 samples) of speech from one set of speech parameters read out from the parameter buffer memory; an output buffer memory 66 for temporarily storing a group of PCM codes corresponding to the speech signal synthesized by the synthesizer 65; a D-A converter 67 for converting digital codes read out from the output buffer memory into analogue signals; PAM gates (68-1), (68-2) . . . (68-64) for distributing the analogue signals from the D-A converter 67 among the respective output channels CH-1, CH-2 . . . CH-64; low pass filters (69-1), (69-2) . . . (69-64) for converting the outputs of the respective PAM gates into continuous waveforms; and a sequence control 99 for controlling the various components described above.

Of these components, the magnetic drum 60, parameter buffer memory 98, output buffer memory 66, D-A converter 67, PAM gates (68-1) . . . (68-64) and low pass filters (69-1), (69-2) . . . (69-64) are again conventional ones widely used in electronic computers and PCM transmission systems. The magnetic drum control 97 is substantially identical to a conventional magnetic drum channel device. In a conventional computer, reading the magnetic drum by means of a magnetic drum channel device and storing the information read out in the main memory (which corresponds to the parameter buffer memory 98 of FIG. 14) requires some means of supplying the channel device with the drum read address, the number of words to be read and the write address in the main memory. With the magnetic drum control 97 of FIG. 14, however, the number of words to be read is constant (132 words for a speech unit 1.98 seconds long), being determined by the duration of the speech unit, and the write address of the parameter buffer memory varies regularly, so these values need not be designated by the input control 100. The sequence control 99 is substantially identical to the sequence control 72 of the first embodiment, except that it is driven by an independent clock signal, that is, one not synchronized with the revolution of the magnetic drum.

FIG. 15 shows the input control 100 in detail. Information sent from the electronic computer to designate the words to be sent to the respective output channels is stored in registers (104-1), (104-2) . . . (104-64), corresponding to the output channels CH-1, CH-2 . . . CH-64. This information is transferred to registers (102-1), (102-2) . . . (102-64) through gates (103-1), (103-2) . . . (103-64), which are opened by the overflow signal 111 generated by a 132-step counter 108 once per word duration (this signal also serves as the transfer request signal to the electronic computer). The information is then transferred to the magnetic drum control 97 in the order of registers (102-1), (102-2) . . . (102-64), to read the magnetic drum. The input control start signals 88, sent from the sequence control 99 at intervals of 234 microseconds, are counted by a 132-step counter 105 and a 64-step counter 106. A decoder 112 decodes the content of the 64-step counter 106 to produce gate signals (110-1), (110-2) . . . (110-64), which open gates (101-1), (101-2) . . . (101-64) at intervals of about 30 milliseconds so as to send the contents of registers (102-1), (102-2) . . . (102-64) in turn to the magnetic drum control 97. All parameters of the word designated by each of registers (102-1), (102-2) . . . (102-64) must be read out within the duration of the word (1.98 seconds) and stored in the parameter buffer memory 98. Because a magnetic drum generally has a relatively long access time, it takes up to about 25 milliseconds to pass the information designating a word from the input control 100 to the magnetic drum control 97 and to read all parameters of the word into the parameter buffer memory 98. Since in this case the gates (101-1), (101-2) . . . (101-64) are opened at intervals of about 30 milliseconds, there is sufficient time to read the magnetic drum 60.
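The "about 30 milliseconds" figure and the timing margin follow from the counter arrangement: counter 105 (132 steps) feeds counter 106 (64 steps), so each channel's gate opens once every 132 start signals. The sketch below works this out from the figures in the text; the names are illustrative.

```python
TICK_S = 120 / 512_000        # interval of start signals 88, about 234 us
CHANNELS = 64
FRAMES_PER_WORD = 132         # 1.98-second word / 15 ms frames
DRUM_ACCESS_MAX_S = 0.025     # worst-case drum access and read quoted in the text

# Counter 105 (132 steps) feeds counter 106 (64 steps): one gate opens every
# 132 start signals, and all 64 gates open once per word.
gate_interval_s = FRAMES_PER_WORD * TICK_S
word_period_s = CHANNELS * gate_interval_s
margin_s = gate_interval_s - DRUM_ACCESS_MAX_S

print(f"gate every {gate_interval_s * 1e3:.2f} ms, word period {word_period_s:.2f} s, "
      f"margin {margin_s * 1e3:.2f} ms")
```

The gate interval works out to 30.9375 ms, and 64 such intervals give exactly the 1.98-second word period, leaving roughly 6 ms of slack beyond the 25 ms worst-case drum read.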

The input control start signals 88 are also counted by a 64-step counter 107 and the 132-step counter 108, and the contents of these counters are sent to the parameter buffer memory 98 as the address to be read at that time. Since the writing of speech parameters from the magnetic drum 60 and the reading of the addresses designated by the input control proceed in parallel, the parameter buffer memory 98 has two planes, as in the first embodiment. A flip-flop 109, which reverses its output in response to the overflow signal 111 from the 132-step counter 108, selects one of these planes.

While the magnetic drum 60 is being read according to the contents of registers (102-1), (102-2) . . . (102-64), the information transferred from the electronic computer to designate the next words is received and stored in registers (104-1), (104-2) . . . (104-64). In this manner, information designating words is received successively from the electronic computer, and the different audio messages so designated are sent to the respective output channels. A magnetic disc storage may of course be substituted for the magnetic drum to store the speech parameters.

As described above, in the novel audio response apparatus speech signals are recorded as compressed information, using partial autocorrelation coefficients as parameters, so that a great many words can be stored and read out economically. In addition, since a digital speech synthesizer is used, a single synthesizer can be shared by many output channels, 64 for example, on a time division basis, which is extremely economical.

It is to be understood that the invention is by no means limited to the particular embodiments illustrated, and that many changes and alterations may be made within the spirit and scope of the invention as defined in the appended claims.

* * * * *