Audio message generating apparatus used for query-reply system Patent Grant Nakata et al. December 23, 1975 [Hitachi, Ltd.]

U.S. patent number 3,928,722 [Application Number 05/379,230] was granted by the patent office on 1975-12-23 for audio message generating apparatus used for query-reply system. This patent grant is currently assigned to Hitachi, Ltd. The invention is credited to Akira Ichikawa and Kazuo Nakata.

United States Patent	3,928,722
Nakata et al.	December 23, 1975

Abstract

An apparatus for generating audio messages for a query-and-reply system. The apparatus is intended especially to produce audio reply messages composed of fixed words and variable words using a memory device of relatively small capacity. It comprises three memories:

a magnetic drum, on which samples of the audio waveforms of the fixed words are recorded together with control signals designating the positions of the fixed and variable words;

a high-speed read-out core memory, in which the pitch-pattern information of the variable words is recorded; and

a core memory, in which the speech-segment information constituting the variable words is recorded.

In response to control signals from the signal processing device that composes the reply sentence, the sample signals on the magnetic drum are read out sequentially to produce the audio message corresponding to the fixed words. Each variable word detected on the magnetic drum is converted into a pitch-controlled voice on the basis of the information recorded in the two core memories. It is then inserted into the audio message composed of the fixed words.

Inventors:	Nakata; Kazuo (Kokubunji, JA), Ichikawa; Akira (Kokubunji, JA)
Assignee:	Hitachi, Ltd. (JA)
Family ID:	23496366
Appl. No.:	05/379,230
Filed:	July 16, 1973

Current U.S. Class:	340/10.31; 704/268; 340/10.42; 340/10.5; 360/12; 704/E13.002
Current CPC Class:	G10L 13/02 (20130101)
Current International Class:	G10L 13/00 (20060101); G10L 13/02 (20060101); H04Q 003/00 ()
Field of Search:	179/1.2MD, 1SM, 1SA, 1SB, 1VS; 340/152R, 172.5

References Cited

U.S. Patent Documents


3314051	April 1967	Willcox et al.
3466394	September 1969	French
3676595	July 1972	Dolansky et al.
3745264	July 1973	Emerson et al.
3749849	July 1973	Kolpek et al.

Primary Examiner: Shaw; Gareth D.
Assistant Examiner: Sachs; Michael C.
Attorney, Agent or Firm: Craig & Antonelli

Claims

What is claimed is:

1. An apparatus for generating a replying voice signal composed of at least one variable word and a plurality of fixed words forming a reply sentence comprising:

a. first memory means for recording a plurality of reply sentences each in the form of word signals corresponding to the voice waveform of each of the fixed words of the reply sentence and control signals having first and second values to designate the position of a fixed or variable word in an output reply sentence;

b. data processing means responsive to a control input for initiating reading out of a selected reply sentence from said first memory means, including means for converting said reply sentence signals into audio signals;

c. pitch pattern information memory means for recording pitch patterns of each variable word;

d. voice segment memory means for recording a series of voice segment waveforms associated with each variable word; and

e. control circuit means responsive to said control signals from said first memory means designating each variable word for reading out pitch patterns from said pitch pattern information memory means and voice segment waveforms from said voice segment memory means, including means for introducing the voice signal formed of the pitch pattern information and voice segment waveforms of the variable word amongst the word signals of the reply sentence composed of the fixed words applied to said converting means of said data processing means.

2. Apparatus according to claim 1 wherein the first memory means is comprised of a magnetic drum, and said pitch pattern information memory means and voice segment memory means are core memories capable of random access.

3. Apparatus according to claim 2 wherein said control circuit means includes a wave decoder connected to said data processing means to detect whether a fixed word or a variable word is to be selected and first gating means connected to said wave decoder to inhibit the transmission of fixed words from said magnetic drum when a variable word is designated.

4. Apparatus according to claim 3 wherein said control circuit means includes a buffer memory connected to said data processing means to store the control signals associated with a designated variable word and an address decoder connected to said buffer memory to decode the address of selected voice segment waveforms in said voice segment memory means.

5. Apparatus according to claim 4 wherein a first address counter is connected to the output of said address decoder and a readout control circuit is connected to said first address counter and said voice segment memory means to effect read-out of said selected voice segment waveforms.

6. Apparatus according to claim 5 wherein a pitch pattern decoder is connected to said buffer memory to receive selected pitch designations and a second address counter is connected to said pitch pattern decoder and said pitch pattern information memory means for controlling the read-out of stored pitch pattern information.

7. Apparatus according to claim 6 wherein said pitch pattern memory means includes a second read-out control circuit controlled by said second address counter, and further including a pitch counter responsive to the output of said second read-out control circuit for controlling said first address counter.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to an improvement in voice answer apparatus. It concerns especially audio message generating apparatus that produces a speech (voice) message corresponding to a reply sentence by combining prerecorded voice waveforms.

2. Description of the Prior Art

Systems are now under investigation in which a subscriber at a distant location calls a data processing center equipped with digital computers and similar equipment and makes an inquiry. The subscriber is then furnished with the corresponding information on the basis of the data recorded in those computers.

In particular, voice (audio) answer apparatus that provides the requested information in the form of an audio message is also under investigation.

However, the voice-producing devices now used in such answer apparatus are based on a so-called recorded-word compiling system. In this system, a number of words or phrases are recorded as human voice. They are then read out and combined in accordance with the read-out and control signals supplied from the signal processing device.

The recorded-word compiling system has two drawbacks. First, a large number of different words can be produced only with considerable difficulty, because of the large volume of information that must be stored. Second, a fully random-access memory such as a magnetic core cannot be used, because of the very large number of memory elements involved. As a result, a partially sequential access memory such as a drum must be used, and the reply sounds rather unnatural (nonspontaneous) because of the long access time per word.

The large volume of information required is attributable not only to the direct storage of voice waveforms but also to two further factors:

(1) A variable word is a word that changes with the content of the reply, for instance a date, an amount, a company name or an address. Even the sound of a single-digit figure must be recorded with several different pitches, that is, rising, flat and falling. This is because the intonation of a variable word differs according to its position in the reply sentence.

(2) The same word must be recorded two or three times at different positions on the recording drum in order to shorten the access time to the variable word.

When voice waveforms are recorded directly in word units, each recording carries its own fixed pitch pattern, tied to the time course of the utterance. The pitch therefore cannot be controlled, and the amount of information that must be stored for the variable words cannot be reduced.

SUMMARY OF THE INVENTION

Accordingly, the principal object of the present invention is to provide an economical and practical audio message generating apparatus.

Another object of the invention concerns audio (voice) reply apparatus that generates reply voices composed of fixed words and variable words. The object is to provide, for such apparatus, an audio message generating apparatus that produces reply-sentence voices of natural quality and has a large vocabulary in spite of its relatively simple construction.

A further object of the invention is to provide an audio message generating apparatus adapted for generating a large number of voices or sounds having different contents.

The apparatus according to the present invention generates reply voices composed of fixed words and variable words. The vocabulary of fixed words is recorded in a low-speed (slow) read-out memory. The vocabulary of variable words is recorded in a high-speed random-access memory as speech elements or segments, each substantially one pitch period of the variable word's voice in length. At read-out (speech synthesis) time, the reply is read out sequentially from the low-speed memory. When the position of a variable word is reached, a series of speech segments is read out from the high-speed memory. These segments are interposed between the fixed-word voices being read out from the low-speed memory.

Accordingly, the speech synthesis part of the audio message generating apparatus according to the present invention is composed of the following units:

a. a sequential memory (for instance, a magnetic drum) in which the voices corresponding to the fixed words of the reply sentences are recorded;

b. a voice (speech) segment memory capable of high-speed read-out (for instance, a core memory), in which the speech segments of the voices corresponding to the variable words are recorded, each segment being one pitch period long;

c. a pitch information memory in which the pitch patterns of the variable words in the reply sentences are recorded;

d. a control circuit that switches selectively between read-out from the low-speed memory and read-out from the high-speed memory, in response to a control signal from the signal processing circuit, such as a digital computer; and

e. a circuit that combines the voice signals read out from the two memories and converts the combined signals into speech.

With the above construction, the voices of the variable words can be produced by compiling and synthesizing pitch-unit segments, while the low-speed memory is still used effectively.

Consequently, the memory volume needed to generate the variable words can be reduced. Alternatively, for the same memory volume, the vocabulary can be increased. Moreover, because the variable words are compiled and synthesized from pitch-unit segments, the required memory volume is comparatively small, so a high-speed random-access memory can technically be used as the storage medium.

Accordingly, the apparatus of the present invention makes effective use of the low-speed memory and increases the number of kinds of reply sentences. It also shortens the access time for reading out the variable words sufficiently, giving the reply sentences a more natural (spontaneous) voice tone.

These and other objects and characteristics of the present invention will become more apparent by referring to the following description and the annexed drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram showing the construction of the query and reply system using the audio message generating apparatus of the present invention;

FIG. 2 is a schematic block diagram showing the construction of a preferred embodiment of the audio message generating apparatus according to the present invention;

FIGS. 3a through 3c show the arrangement of the bit signals recorded on one track of a magnetic drum; and

FIG. 4 is a waveform diagram showing schematically the audio waveform of a consonant sound part and a vowel sound part.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 is a schematic block diagram of the voice query-reply system using the audio message generating apparatus of the present invention. Although this system is known per se, it is explained briefly to facilitate understanding of the present invention.

A signal processing apparatus 3 is provided, containing an electronic computer and an information storage section 4. In response to external data requests, this apparatus supplies a digital output based on the information stored in storage section 4. An audio message generating apparatus 2 converts the digital output of apparatus 3 into a spoken reply sentence. An input and output conversion section 1 distributes the reply voice generated by apparatus 2 to the inquirer and passes the inquirer's questions to signal processing apparatus 3. Subscribers 5, the inquirers, are located at remote places and are connected to the system by telephone.

The above system provides an effective means for inquiries about up-to-date stock information, reservations of various kinds, banking information services and so forth. In such a system, the variable words change with the content of the reply, for example the date, an amount, a company name or an address, as defined above. The fixed words are the unchanging words of the reply sentence and depend on the nature of the inquiry.

FIG. 2 is a block diagram of one embodiment of the invention. It includes a slow-access memory device 7, such as a magnetic drum, on which the voices of a plurality of reply sentences are recorded in multiplexed form. The reply-sentence voices, except for the variable-word portions, are sampled. The sample amplitudes are then encoded and recorded by time-sharing multiple recording. This arrangement allows a number of spoken reply sentences to be produced at the same time.

The magnetic drum will now be described in detail. The drum rotates at 3,000 r.p.m. (one full revolution every 20 ms). It has 512 tracks divided into 16 zones to shorten the read-out time, so each zone has 32 tracks. There are accordingly 16 read-out means, such as magnetic heads 1-1, 1-2, 1-3, ..., 1-16.

The effective clock frequency is 1.92 MHz, so each track holds $1.92 \times 10^{6} \times 20 \times 10^{-3} = 38.4 \times 10^{3}$ bits. The audio waveform is sampled at 8 kHz, and each sample is coded in 7 bits and recorded together with one bit of control information.

The degree of time-sharing multiplexing on one track is $1.92 \times 10^{6} / (8 \times 8 \times 10^{3}) = 30$ sentences. The longest duration of one reply sentence depends on the number of sentence forms. With 30 sentence forms it is $20 \times 10^{-3} \times 512 \approx 10$ seconds, and with 60 sentence forms it is about 5 seconds.

The access time to an arbitrary reply sentence is at most $20 \times 10^{-3} \times 32 = 0.64$ second. Each track holds $8 \times 10^{3} \times 20 \times 10^{-3} = 160$ samples of the same reply sentence.
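The drum figures above all follow from a handful of stated parameters, and the short Python sketch below recomputes them. The function name `drum_figures` is hypothetical. The rule that the longest sentence length scales inversely with the number of sentence forms is an interpretation of the 30-form and 60-form examples, not a formula given in the text.

```python
# Figures for magnetic drum 7, using the parameters stated in the text.
RPM = 3000               # drum speed
TRACKS = 512             # total number of tracks
ZONES = 16               # number of zones
CLOCK_HZ = 1_920_000     # effective bit clock (1.92 MHz)
SAMPLE_RATE_HZ = 8000    # audio sampling rate
BITS_PER_SAMPLE = 8      # 1 control bit (c-0) + 7 value bits (c-1..c-7)


def drum_figures(sentence_forms: int = 30) -> dict:
    revolution_s = 60 / RPM                        # 0.02 s per revolution
    tracks_per_zone = TRACKS // ZONES              # 32 tracks per zone
    bits_per_track = CLOCK_HZ * revolution_s       # bits passing a head per revolution
    # Number of reply sentences interleaved sample by sample on one track.
    multiplex_degree = CLOCK_HZ / (BITS_PER_SAMPLE * SAMPLE_RATE_HZ)
    # Assumption: total track time (512 tracks x 20 ms x 30 slots) is shared
    # evenly among all sentence forms.
    max_sentence_s = revolution_s * TRACKS * multiplex_degree / sentence_forms
    # Worst-case access as given in the text: 32 revolutions.
    access_s = revolution_s * tracks_per_zone
    samples_per_track = SAMPLE_RATE_HZ * revolution_s  # per sentence, per track
    return {
        "revolution_s": revolution_s,
        "tracks_per_zone": tracks_per_zone,
        "bits_per_track": bits_per_track,
        "multiplex_degree": multiplex_degree,
        "max_sentence_s": max_sentence_s,
        "access_s": access_s,
        "samples_per_track": samples_per_track,
    }


for forms in (30, 60):
    f = drum_figures(forms)
    print(forms, round(f["max_sentence_s"], 2), round(f["access_s"], 2))
```

The exact longest-sentence value for 30 forms is 10.24 seconds, which the text rounds to 10 seconds.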

FIGS. 3a through 3c show how information is recorded on one track of the magnetic drum.

FIG. 3a shows a whole track, on which 160 sample signals (a-1, a-2, ..., a-160) are recorded for each voice. Reading out these 160 samples takes 20 ms, the time magnetic drum 7 needs for one full revolution.

FIG. 3b shows one of the 160 sample sections (a-3) enlarged. As stated above, the corresponding samples of the thirty different reply sentences are recorded in it by time-sharing multiple recording. The read-out time for all of these sections equals the sampling period of the voice signal, 0.125 ms.

FIG. 3c shows one section of the sample of FIG. 3b, section b-3.4, enlarged; it holds one sample of one voice signal.

As mentioned above, each sample consists of one control-information bit c-0 and seven bits c-1, c-2, ..., c-7, which represent the value of one sample of the audio waveform. The control bit indicates what is to be read out next: either control information calling for a voice (speech) segment, described later, or a waveform sample of a fixed word. For instance, this bit is set to "0" when a waveform sample is to be read out and to "1" when control information is to be read out. Any of the various known PCM coding forms may be used to produce the waveform sample values.

When control code 1 is detected in bit c-0, two bits of information are taken from bits c-6 and c-7. These bits specify the pitch-pattern type of the variable word to be inserted at the position of section b-3.4. The waveform signals that form the variable word are recorded in a separate high-speed memory.
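The per-sample control convention can be illustrated with a small decoder. This sketch rests on assumptions that the text does not state. Bit c-0 is taken as the most significant bit of an 8-bit word and c-7 as the least significant. The mapping of the 2-bit code in c-6 and c-7 onto the four pitch-pattern types named later (flat, rising, falling, figure) is hypothetical.

```python
from typing import NamedTuple, Optional

# Hypothetical assignment of the 2-bit code (c-6, c-7) to pitch-pattern types.
PITCH_TYPES = ("flat", "rising", "falling", "figure")


class DrumWord(NamedTuple):
    is_control: bool           # value of c-0
    sample: Optional[int]      # 7-bit waveform value when c-0 = 0
    pitch_type: Optional[str]  # pitch-pattern type when c-0 = 1


def decode_drum_word(word: int) -> DrumWord:
    """Split one 8-bit drum word, assuming c-0 is the MSB and c-7 the LSB."""
    if not 0 <= word <= 0xFF:
        raise ValueError("drum word must fit in 8 bits")
    c0 = word >> 7
    low7 = word & 0x7F            # bits c-1 .. c-7
    if c0 == 0:
        return DrumWord(False, low7, None)
    pitch_code = low7 & 0b11      # bits c-6 and c-7
    return DrumWord(True, None, PITCH_TYPES[pitch_code])
```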

As mentioned above, the audio waveform of each reply sentence, except for the variable words, is recorded on magnetic drum 7 as a time series of sample values, as shown in FIGS. 3a and 3b. At the position of each variable word in the reply sentence, a control signal indicating a variable word is recorded instead of a sample signal.

Returning to FIG. 2, two further memories are provided. The first is a pitch pattern memory 8, in which the pitch patterns of the variable words are recorded. The second is a voice element memory 9, a high-speed access memory such as a core memory. In it, the voice waveforms of the variable words are recorded divided into pitch units.

In general, the waveform of the human voice is composed of two parts, as shown typically in FIG. 4. Part d, for consonants, has no periodicity; part e, for vowels, has approximate periodicity. Each pitch period e1, e2, e3, etc. is determined by the vibration of the vocal cords. The length of this period determines how high or low the voice sounds, that is, its intonation.

This period is usually 20-30 ms or thereabouts. The consonant part d has little effect on how high or low the voice sounds.

In the device according to the present invention, the human voice corresponding to each variable word is divided into pitch-length pieces. The resulting segments are processed in a prescribed manner and sampled, and each segment is numbered and recorded in voice segment memory 9. The processing consists of artificially correcting the waveform of a segment cut from natural voice. For example, a forecasting waveform may be added to the tail of the segment waveform. Alternatively, the segment waveform may be passed through a differentiating unit to obtain a differentiated voice-segment waveform.

A voice waveform without periodicity, such as that of a consonant, is usually partitioned into sections of the average pitch length. The series of sample values of each section is numbered and recorded in voice segment memory 9.

A voice segment cut from the audio waveform of a variable word spoken by a human is generally shorter than 20 ms. A forecasting waveform is therefore added to the tail of the cut segment to make a 20 ms audio segment. This segment is sampled at 8 kHz, giving 160 sample signals per segment. These sample signals are encoded in seven bits, in the same way as the waveform on magnetic drum 7, and recorded one after another in voice segment memory 9.

The pitch pattern of a variable word, on the other hand, varies with the position of the word in the reply sentence (for instance, at the end of an interrogative sentence).
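A rough sketch of how variable-word segments might be prepared for voice segment memory 9 follows. It relies on several assumptions:

- Pitch boundaries are supplied as sample indices; the text does not say how they are found.
- Zero padding stands in for the unspecified forecasting waveform.
- A first difference stands in for the differentiating unit.
- The 7-bit code is plain linear two's-complement PCM, only one of the many possible coding forms.

```python
from typing import List, Sequence

SEGMENT_LEN = 160  # 20 ms at 8 kHz, as in the text


def quantize7(x: float) -> int:
    """Assumed linear 7-bit PCM: maps [-1.0, 1.0) onto -64..63."""
    return max(-64, min(63, round(x * 64)))


def make_segments(samples: Sequence[float], boundaries: Sequence[int],
                  differentiate: bool = False) -> List[List[int]]:
    """Cut `samples` at pitch boundaries and pad each piece to 160 samples.

    `boundaries` (hypothetical input) are the sample indices at which pitch
    periods start; samples before the first boundary are ignored. Zero padding
    stands in for the 'forecasting waveform' of the text."""
    segments = []
    edges = list(boundaries) + [len(samples)]
    for start, end in zip(edges, edges[1:]):
        piece = list(samples[start:end])
        if not piece:
            continue
        if differentiate:
            # First difference as a simple stand-in for the differentiating unit.
            piece = [b - a for a, b in zip([0.0] + piece[:-1], piece)]
        if len(piece) > SEGMENT_LEN:
            raise ValueError("pitch period longer than 20 ms")
        piece += [0.0] * (SEGMENT_LEN - len(piece))
        segments.append([quantize7(x) for x in piece])
    return segments
```

Every segment ends up exactly 160 samples long, so the segment memory can use a fixed stride of 160 addresses.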

Therefore, several types of pitch-pattern control information are recorded in pitch pattern memory 8 for each of the variable words recorded in the audio segment memory. The present embodiment uses four types: flat, rising, falling and figure type. The construction and operation of a unit that produces the reply voice using the above memories are explained below.

The following explanation is limited to a circuit that produces one reply voice. Nevertheless, the same audio message generating apparatus can clearly serve a query-and-reply system with many subscribers, by arranging units in parallel and using time-sharing multiple processing.

In FIG. 2, a signal processing section 10 (including computers) is installed at the center. It receives the questions (inquiries) and carries out the processing up to the selection of the reply sentence. Section 10 is not described in detail, as it forms no part of the present invention. Such circuits are known from a number of prior publications, for example U.S. Pat. No. 3,214,520, and require only slight modification. The control signal for converting the selected reply sentence into voice is applied to the audio message generating apparatus.

That is, for the fixed-word portion of the reply sentence, the control signal specifies the recording location on the magnetic drum. For a variable word, it specifies the pitch-pattern type and the recording location (address number) in the voice segment memory.

First, the signal selecting the zone that corresponds to the desired reply sentence is applied to one of the selective gate groups. The gates of this group are connected respectively to read-out means 11-1, 11-2, ..., 11-22.

When actuated, one of these gates opens every 0.125 ms, passing the 8 bits shown in FIG. 3c through OR gate 13. A decoding circuit 14 then extracts the leading bit (C-0 in FIG. 3c) from the 8 bits at the output of OR gate 13 and decides whether it is 0 or 1. Circuit 14 may be of a conventional type, as described for example in the text Digital Computer and Control Engineering, McGraw-Hill, 1960, pages 547-553. When the bit is 0, i.e., when the word is a sample value of a fixed word, a gate drive signal f is supplied to pass bit pulses C-1 to C-7.

Accordingly, as long as the C-0 bit, which arrives every 0.125 ms, is not 1, the sample from the magnetic drum passes through OR gate 16 to digital/analog converter 17. There it is converted into an analog sample waveform. Output distributor 18 then delivers it as the reply voice through one of the output circuits 19-1, 19-2, ..., 19-n, each of which includes a low-pass filter and an amplifier.

Suppose instead that decoding circuit 14 finds the C-0 bit to be 1, i.e., detects that a variable word, rather than a waveform sample value, is to be inserted at this point. Signal f is then set to 0, and gate circuit 15 is turned off to inhibit the passage of the signal. A signal g requesting the identity of the variable word is sent to signal processing circuit 10. At the same time, gate circuit 20 takes out the 2 bits, recorded in bits C-6 and C-7, that designate the pitch-pattern type.

Two items are transferred to buffer register 21: the 10 bits of control information identifying the variable word, from signal processing circuit 10, and the 2 bits indicating the pitch-pattern type, from gate 20. The 10 bits occupy the high-order positions and the 2 bits the low-order positions.

The control information therefore consists of 10 bits specifying the variable word and 2 bits designating one of four kinds of pitch pattern. The 10 bits are enough for $2^{10} = 1{,}024$, i.e. about 1,000, variable words.
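The 12-bit contents of buffer register 21 can be packed as sketched below. The names `pack_control` and `unpack_control` are illustrative only. The layout follows the text: the word number sits in the upper 10 bits and the pitch-pattern code in the lower 2.

```python
WORD_BITS = 10   # variable-word number (upper bits of buffer register 21)
PITCH_BITS = 2   # pitch-pattern type (lower bits)


def pack_control(word_no: int, pitch_code: int) -> int:
    """Combine a variable-word number and a pitch-pattern code into 12 bits."""
    if not 0 <= word_no < (1 << WORD_BITS):
        raise ValueError("word number must fit in 10 bits")
    if not 0 <= pitch_code < (1 << PITCH_BITS):
        raise ValueError("pitch code must fit in 2 bits")
    return (word_no << PITCH_BITS) | pitch_code


def unpack_control(register: int) -> tuple[int, int]:
    """Split the 12-bit register back into (word number, pitch code)."""
    return register >> PITCH_BITS, register & ((1 << PITCH_BITS) - 1)
```

Because the pitch code occupies the low-order bits, the four pitch variants of one word occupy four consecutive 12-bit codes. This layout suits a decoder that maps each code to a pitch-memory address.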

These twelve bits are decoded by pitch information address decoding circuit 22, which may also be of a type described in the aforementioned text, Digital Computer and Control Engineering. This circuit converts the 12-bit digital signal into a memory address. At that address is stored the pitch interval of the leading phonetic segment of the variable word to be read out. The address signal is registered in address counter 23 and then applied to read-out circuit 25 through gate circuit 24.

Read-out circuit 25 has a control circuit, such as a matrix circuit, and an amplifier. It reads out the pitch information stored at that address in the pitch pattern memory. The pitch information read out is registered in pitch counter 27 through gate circuit 26.

The contents of pitch counter 27 are decremented by one at each pulse of clock signal source 28 (8 kHz), so the counter reaches zero after the pitch interval set in it. The pitch period of the leading voice segment is thus timed. Pitch address counter 23 is then stepped to control the read-out of the pitch period of the next audio segment.

Gate circuits 24 and 26 allow the high-speed pitch information memory 8 to be shared among the channels. They open only for a predetermined time within the time slot allotted to their own channel and stay closed during the slots allotted to the other channels. The read-out of the pitch period information is repeated as long as voice segments of the variable word remain.

The waveform information of the variable word, on the other hand, is read out by the following circuits and operations. First, voice segment address decoding circuit 29 converts the 10-bit information transferred to buffer register 21 into an address number in the audio segment memory. At this address, the series of voice segments constituting the variable word is recorded. Circuit 29 again may be of a type described in the text Digital Computer and Control Engineering. The leading address is registered in segment address counter 30.

Counter 30 holds the higher-order part of the leading address of the segment to be read out. In this instance, that means the second and higher digits when the address is expressed in a base-160 system. Sample value address counter 31 holds the lowest digit of the base-160 address. Its contents are incremented by one at each pulse of the 8 kHz sample read-out clock 28 and supplied to voice segment read-out control circuit 33 through gate 32.

Voice segment read-out circuit 33 sequentially reads out from audio segment memory 9 the 7-bit sample values of the voice segment at the address held in counters 30 and 31. It applies these values to OR gate 16 through gate circuit 34.

Meanwhile, when the end of a pitch period is detected, sample value address counter 31 is reset to 0. At the same time, 160 is added to segment address counter 30 so as to move to the leading address of the next segment. Each segment occupies 160 samples, i.e. 20 ms at 8 kHz sampling.
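Together, pitch counter 27, segment address counter 30 and sample value address counter 31 play each stored 160-sample segment only for the length of its pitch period. The Python sketch below emulates that loop in software. It assumes one pitch value per stored segment, expressed in 8 kHz clock ticks, and a flat segment memory in which segment n starts at address 160n. The function name is hypothetical.

```python
from typing import List, Sequence

SEGMENT_LEN = 160  # samples per stored segment (20 ms at 8 kHz)


def read_variable_word(segment_memory: Sequence[int], first_segment: int,
                       pitch_periods: Sequence[int]) -> List[int]:
    """Emulate pitch counter 27 and address counters 30/31 in software.

    segment_memory: flat list of samples; segment n starts at address 160 * n.
    first_segment:  number of the variable word's leading segment.
    pitch_periods:  one pitch interval per segment, in 8 kHz clock ticks."""
    out: List[int] = []
    segment_base = first_segment * SEGMENT_LEN        # role of counter 30
    for period in pitch_periods:
        if not 0 < period <= SEGMENT_LEN:
            raise ValueError("pitch period must be 1..160 samples")
        pitch_counter = period                        # counter 27 loaded
        sample_index = 0                              # counter 31 reset to 0
        while pitch_counter > 0:                      # one iteration per 8 kHz tick
            out.append(segment_memory[segment_base + sample_index])
            sample_index += 1
            pitch_counter -= 1
        segment_base += SEGMENT_LEN                   # "160 is added to counter 30"
    return out
```

Each segment is cut off after `period` samples. Changing the pitch values therefore changes the pitch of the voice without altering the stored waveforms. A period longer than the natural cut segment reads into the waveform added at the segment's tail.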

Gate circuits 32 and 34 work in the same way as gate circuits 24 and 26, allowing the high-speed audio segment memory 9 to be shared among the channels. They open only for a fixed time within the slot allotted to their own channel and stay closed during the slots allotted to the other channels.

Accordingly, the read-out cycle time required of high-speed voice segment memory 9 and high-speed pitch information memory 8 is $1/(8\ \text{kHz} \times N)$, where $N$ is the number of channels.
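The cycle-time requirement and the slot-wise opening of gates 24, 26, 32 and 34 can be expressed in a few lines. The round-robin slot assignment is an assumption made for illustration. The figure of 16 channels is only an example value.

```python
SAMPLE_RATE_HZ = 8000


def memory_cycle_time_s(channels: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Read-out cycle time needed when `channels` channels share memories 8 and 9."""
    if channels < 1:
        raise ValueError("at least one channel is required")
    return 1.0 / (sample_rate_hz * channels)


def gate_open(channel: int, slot: int, channels: int) -> bool:
    """Assumed round-robin slots: a channel's gates open only in its own slot."""
    return slot % channels == channel


print(f"{memory_cycle_time_s(16) * 1e6:.4f} us per access for 16 channels")
```

With 16 channels, each shared memory must complete one access every $1/(8000 \times 16)$ s, or about 7.8 µs. In any given slot, exactly one channel's gates are open.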

This operation is thereafter repeated, pitch period by pitch period, as long as voice segments of the word remain.

Eventually the last voice segment of the designated variable word is read from high-speed segment memory 9, and the reproduction of the variable word is finished. A termination signal is then returned to signal processing section 10 through line 35. The sample values of the audio waveform of the following fixed word are then read out from low-speed memory 7.
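Overall, the changeover between drum read-out and variable-word synthesis can be summarized as a merge of two sample streams. This is a simplified software sketch, not the circuit itself, and it makes three simplifications:

- `next_variable_word` is a hypothetical callback. It stands in for signal g, signal processing section 10 and the segment read-out described above.
- Bit c-0 is assumed to be the most significant bit.
- The pauses at each changeover are ignored.

```python
from typing import Callable, Iterable, List


def assemble_reply(drum_words: Iterable[int],
                   next_variable_word: Callable[[int], List[int]]) -> List[int]:
    """Merge fixed-word samples from the drum with synthesized variable words.

    drum_words: 8-bit words of one reply sentence, c-0 assumed to be the MSB.
    next_variable_word: hypothetical callback that receives the 2-bit
        pitch-pattern code and returns the variable word's samples."""
    out: List[int] = []
    for word in drum_words:
        if word >> 7 == 0:
            # c-0 = 0: fixed-word sample; gate 15 passes c-1..c-7.
            out.append(word & 0x7F)
        else:
            # c-0 = 1: gate 15 inhibited; the variable word is inserted instead.
            out.extend(next_variable_word(word & 0b11))
    return out
```

For example, `assemble_reply([5, 6, 0b10000001, 7], lambda code: [100 + code, 101 + code])` yields the two fixed samples, then the two samples of the inserted variable word, then the last fixed sample.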

The changeover from a fixed-word portion to a variable-word portion involves read-out from the high-speed random-access memory, so access time poses no problem. Rather, a fixed pause (for instance, about 0.3 second) must be provided to allow processing for the read-out of the voice segments.

The changeover from a variable-word portion back to a fixed-word portion involves read-out from the low-speed, partially sequential memory. The pause can therefore be as long as $20\ \text{ms} \times 32 = 0.64$ second.

A pause of this length is in fact desirable. The problem is that the changeover may occasionally occur after only a very short pause. To avoid this, a fixed pause of at least about 0.3 second must be provided for this changeover, and a safe upper limit for the pause is about one second. With more complex forestalling (look-ahead) control, the changeover can be made with a constant pause longer than 0.64 second.

The sample signals supplied from OR gate 16 are converted into a pulse amplitude modulated signal by digital/analog converter 17. They are then sent to the prescribed answer channel through output control circuit 18.

The above explanation concerns an audio message generating apparatus with one channel. However, reply messages can be supplied simultaneously to a number of channels by time-sharing multiple processing. In that case, signal processing apparatus 10 and memory apparatus 7, 8 and 9 are used in common.

For the other circuits, the drawing shows only the elements 16-1, 16-2, ..., 16-16, corresponding to the above-mentioned 16. When multiple processing is used, therefore, the output of digital/analog converter 17 is a time-division-multiplexed pulse amplitude modulated (PAM) signal.

The present invention has been described above with reference to a preferred embodiment. It is clearly not limited to that embodiment, however. It is subject to a number of modifications according to the practical requirements of the answer system to which the audio message generating apparatus of the invention is applied.

For example, in the circuit for generating the variable words, the voice segment information can be reduced so as to make memory apparatus 9 smaller.

It is often observed that very similar pitch-unit waveforms repeat within the voice waveform of a word. A typical instance is the middle part of a stationary vowel.

These similar pitch waveforms can be replaced by repetitions of a single pitch waveform without loss of quality. The same holds even where the waveform is changing. If each pitch waveform is reused for one or two following pitch periods, the quality hardly deteriorates. The capacity of the voice segment memory can thus be reduced to between one-half and one-third of its original size.
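The memory saving from reusing pitch waveforms can be sketched as a thin-then-repeat scheme. The function names are hypothetical. In practice the designer would also have to decide which waveforms are similar enough to share, which the text does not specify.

```python
from typing import List, Sequence, TypeVar

W = TypeVar("W")  # any representation of one pitch waveform


def thin_pitch_waveforms(pitch_waveforms: Sequence[W], reuse: int) -> List[W]:
    """Keep every `reuse`-th pitch waveform (reuse = 2 or 3 in the text's example)."""
    if reuse < 1:
        raise ValueError("reuse must be at least 1")
    return list(pitch_waveforms[::reuse])


def expand_pitch_waveforms(stored: Sequence[W], reuse: int, count: int) -> List[W]:
    """Rebuild `count` pitch periods by playing each stored waveform `reuse` times."""
    return [stored[i // reuse] for i in range(count)]
```

With `reuse=2` only about half of the pitch waveforms are stored, and with `reuse=3` about a third. These ratios correspond to the one-half to one-third reduction mentioned above.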

In addition, the contents of high-speed segment memory 9 may be organized not in word units but in units suited to concatenation, such as phoneme chains and/or diphones. The control mechanism then becomes more complicated, but the vocabulary of variable words can be enlarged virtually without limit.

Take phoneme chains as an example. With six vowel units (the five vowels plus silence) and 20 kinds of consonants including contracted sounds, the total number of phoneme chains is $6 \times 20 \times 6 = 720$, with an average duration of 150 ms per unit. The necessary memory volume is therefore $8 \times 8 \times 10^{3} \times 150 \times 10^{-3} \times 720 \approx 6.9 \times 10^{6}$ bits.
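The phoneme-chain estimate can be checked with integer arithmetic. The constants are those stated in the text, including its 8 bits per sample. The duration is kept in milliseconds so that the count stays exact.

```python
VOWEL_UNITS = 6         # five vowels plus silence
CONSONANTS = 20         # including contracted sounds
BITS_PER_SAMPLE = 8
SAMPLE_RATE_HZ = 8000
UNIT_DURATION_MS = 150  # average duration of one phoneme chain


def phoneme_chain_memory_bits() -> tuple:
    """Return (number of chains, total bits) for the phoneme-chain example."""
    chains = VOWEL_UNITS * CONSONANTS * VOWEL_UNITS
    samples_per_chain = SAMPLE_RATE_HZ * UNIT_DURATION_MS // 1000
    bits = BITS_PER_SAMPLE * samples_per_chain * chains
    return chains, bits


chains, bits = phoneme_chain_memory_bits()
print(f"{chains} chains, {bits:,} bits (about {bits / 1e6:.1f} x 10^6)")
```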

As described above, the present invention provides an audio query-and-reply apparatus free of the defects inherent in answer apparatus of the conventional recorded-word compiling system. These defects include a limited vocabulary and an unnatural (nonspontaneous) voice resulting from long access times.

Further, a simple apparatus integrated with the recorded-word compiling technique can advantageously be used, without the need for a complex speech synthesis system.
