U.S. patent number 3,928,722 [Application Number 05/379,230] was granted by the patent office on 1975-12-23 for audio message generating apparatus used for query-reply system.
This patent grant is currently assigned to Hitachi, Ltd. Invention is credited to Akira Ichikawa, Kazuo Nakata.
United States Patent 3,928,722
Nakata, et al.
December 23, 1975
Audio message generating apparatus used for query-reply system
Abstract
An apparatus for generating the audio message used for a query
and reply system. This apparatus is intended especially to provide
the audio reply message composed of a fixed word and a variable
word by using a relatively small capacity memory device and
comprises a magnetic drum in which a sample of the audio waveform
of the fixed word and the control signals specifying the fixed word
are recorded, a high speed read-out core memory device in which the
pitch pattern information of the variable word is recorded, and a
core memory device in which speech segment information constituting
the variable word is recorded, the sample signals of the magnetic
drum being read out sequentially by the control signal making the
voice or sound of the reply sentence made by the signal processing
device to produce an audio message corresponding to the fixed word,
the variable word detected by said magnetic drum being turned into
a pitch-controlled sound or voice by relying upon the information
recorded in said two core memories, the variable word being
introduced into the audio message composed of the fixed word.
Inventors: Nakata; Kazuo (Kokubunji, JA), Ichikawa; Akira (Kokubunji, JA)
Assignee: Hitachi, Ltd. (JA)
Family ID: 23496366
Appl. No.: 05/379,230
Filed: July 16, 1973
Current U.S. Class: 340/10.31; 704/268; 340/10.42; 340/10.5; 360/12; 704/E13.002
Current CPC Class: G10L 13/02 (20130101)
Current International Class: G10L 13/00 (20060101); G10L 13/02 (20060101); H04Q 003/00 ()
Field of Search: ;179/1.2MD,1SM,1SA,1SB,1VS ;340/152R,172.5
Primary Examiner: Shaw; Gareth D.
Assistant Examiner: Sachs; Michael C.
Attorney, Agent or Firm: Craig & Antonelli
Claims
What is claimed is:
1. An apparatus for generating a replying voice signal composed of
at least one variable word and a plurality of fixed words forming a
reply sentence comprising:
a. first memory means for recording a plurality of reply sentences
each in the form of word signals corresponding to the voice
waveform of each of the fixed words of the reply sentence and
control signals having first and second values to designate the
position of a fixed or variable word in an output reply
sentence;
b. data processing means responsive to a control input for
initiating reading out of a selected reply sentence from said first
memory means, including means for converting said reply sentence
signals into audio signals;
c. pitch pattern information memory means for recording pitch
patterns of each variable word;
d. voice segment memory means for recording a series of voice
segment waveforms associated with each variable word; and
e. control circuit means responsive to said control signals from
said first memory means designating each variable word for reading
out pitch patterns from said pitch pattern information memory means
and voice segment waveforms from said voice segment memory means,
including means for introducing the voice signal formed of the
pitch pattern information and voice segment waveforms of the
variable word amongst the word signals of the reply sentence
composed of the fixed words applied to said converting means of
said data processing means.
2. Apparatus according to claim 1 wherein the first memory means is
comprised of a magnetic drum, and said pitch pattern information
memory means and voice segment memory means are core memories
capable of random access.
3. Apparatus according to claim 2 wherein said control circuit
means includes a wave decoder connected to said data processing
means to detect whether a fixed word or a variable word is to be
selected and first gating means connected to said wave decoder to
inhibit the transmission of fixed words from said magnetic drum
when a variable word is designated.
4. Apparatus according to claim 3 wherein said control circuit
means includes a buffer memory connected to said data processing
means to store the control signals associated with a designated
variable word and an address decoder connected to said buffer
memory to decode the address of selected voice segment waveforms in
said voice segment memory means.
5. Apparatus according to claim 4 wherein a first address counter
is connected to the output of said address decoder and a readout
control circuit is connected to said first address counter and said
voice segment memory means to effect read-out of said selected
voice segment waveforms.
6. Apparatus according to claim 5 wherein a pitch pattern decoder
is connected to said buffer memory to receive selected pitch
designations and a second address counter is connected to said
pitch pattern decoder and said pitch pattern information memory
means for controlling the read-out of stored pitch pattern
information.
7. Apparatus according to claim 6 wherein said pitch pattern memory
means includes a second read-out control circuit controlled by said
second address counter, and further including a pitch counter
responsive to the output of said second read-out control circuit
for controlling said first address counter.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to an improvement in voice answer apparatus,
especially in audio message generating apparatus adapted to produce
a speech (voice) message corresponding to a reply sentence by
combining prerecorded voice sound waves.
2. Description of the Prior Art
A system is now under investigation in which a subscriber in a
distant area who calls up and makes an inquiry of a data processing
center provided with digital computers and similar equipment may be
furnished with information corresponding to such inquiry on the
basis of the information recorded in said computers.
In particular, a voice or audio answer apparatus adapted to provide
information corresponding to the inquiry in the form of an audio
message is also under investigation.
However, the device for producing an audible sound or voice now in
use in the voice or audio answer apparatus is based on a so-called
recorded word compiling system wherein a number of words or phrases
are recorded in the form of human voice indications and these words
or phrases are read out and combined together by relying upon the
read-out and control signals supplied from the signal processing
device. The recorded word compiling system has the drawback that a
larger number of different words can be produced only with
considerable difficulty because of the large volume of necessary
information which must be stored and that a perfect random access
memory, such as a magnetic core, cannot be used because of the very
large number of memory elements involved. As a result, a partially
sequential access memory, such as a drum, must be used; and
therefore, the reply becomes rather nonspontaneous, because of the
prolonged access time per each word unit.
The large volume of the necessary contents may be attributable not
only to the direct memorization of voice or sound waveforms, but also to
such factors that (1) as far as the variable word (a word changing
with the contents of the reply, for instance, date, amount, the
name of the company, address and so forth) is concerned, a sound
for a single digit figure, for instance, must be recorded in a
plurality of different pitches, that is, in the rising pitch, flat
pitch and falling pitch, since the intonation of the variable word
is different according to the position of the variable word in the
reply sentence, and (2) the same word must be doubly and/or trebly
recorded at the different positions on a recording drum for
shortening the access time to the variable word.
Where the sound or voice waveform is directly recorded by word
units, the pitch cannot be controlled, because each recording
carries its own fixed pitch pattern for the time during which the
voice or sound is generated; it is therefore impossible to reduce
the information which must be stored for the variable word.
SUMMARY OF THE INVENTION
Accordingly, the principal object of the present invention is to
provide an economical and practical audio message generating
apparatus.
Another object of the invention is to provide, in the audio (voice
or sound) replying apparatus adapted to generate a reply voice or
sound composed of fixed words and variable words, an audio message
generating apparatus capable of generating a voice or sound with
natural quality for the reply sentence and having an abundant
vocabulary in spite of its simpler construction.
A further object of the invention is to provide an audio message
generating apparatus adapted for generating a large number of
voices or sounds having different contents.
The apparatus for generating reply voices or sounds composed of a
fixed word and variable words according to the present invention is
so devised and arranged that the vocabulary corresponding to the
fixed word is recorded in a low (slow) speed read out memory, while
that corresponding to the variable words is recorded in a high
speed memory by random access as speech elements or segments each
having a pitch length substantially equal to that of the voice or
sound of the variable word, and that, at the time of reading out
the voice or sound or at the time of speech synthesis, when the
position of the variable words in the reply voice or sound is read
out sequentially from the low (slow) speed memory, a series of the
speech elements or segments are read out from the high speed read
out memory and are interposed between the voices or sounds of fixed
words (speech) which are being read out from the low (slow) speed
memory.
Therefore, the speech synthesis part of the audio speech (message)
generating apparatus according to the present invention is composed
of the following units.
A sequential memory (for instance, a magnetic drum) for recording
voices or sounds corresponding to the fixed words in a reply speech
sentence, a voice or sound (speech) segment memory (for instance, a
core memory) capable of high-speed read-out and in which the speech
elements or segments of voices or sounds (speeches) corresponding
to the variable words, with the length of the elements equal to the
pitch of the voice or sound, are recorded, a pitch information
memory in which the pitch patterns of the variable words in the
reply speech sentence are recorded, a control circuit to make the
selective changeover between the read-out from the low (slow) speed
memory and that from the high speed memory by relying upon a
control signal from the signal processing circuit, such as a
digital computer, and a circuit for combining the voice or sound
(speech) signals read out from the above two memories and producing
the voice or sound (speech) by converting these combined
signals.
With the above-mentioned construction of the present invention, it
is possible to produce voices or sounds (speeches) by compilation
and synthesis of the pitch-unit elements or segments for the
variable words while effectively utilizing the low (slow) speed
memory.
On that account, it is possible to reduce memory volume necessary
for generating the variable words and to increase the vocabulary in
case the memory volume is equal. Moreover, by reason of the
compilation and synthesis of the pitch-unit elements, the necessary
memory volume can be comparatively reduced and the use of the
high-speed memory for random access as the memory medium is
technically possible. Accordingly, with the apparatus of the
present invention, not only can the low (slow) speed memory be
effectively used and the kinds of reply sentences be increased,
but the access time for reading out the variable words can also
be sufficiently reduced, improving the spontaneity of the voice
tone of the reply sentence.
These and other objects and characteristics of the present
invention will become more apparent by referring to the following
description and the annexed drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram showing the construction of the query
and reply system using the audio message generating apparatus of
the present invention;
FIG. 2 is a schematic block diagram showing the construction of a
preferred embodiment of the audio message generating apparatus
according to the present invention;
FIGS. 3a through 3c provide views showing the status of bit signals
recorded on one track of a magnetic drum; and
FIG. 4 is a waveform diagram showing schematically the audio
waveform of a consonant sound part and a vowel sound part.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
FIG. 1 is a schematic block diagram of the voice sound query-reply
system using the audio message generating apparatus of the present
invention. Although this system is known per se, a simple
explanation thereof will be provided to facilitate an understanding
of the present invention.
A signal processing apparatus 3 is provided containing an
electronic computer and an information storage section 4. This
processing apparatus supplies a digital output based on the
information stored in the storage section 4 in response to data
requests from outside. An audio message generating apparatus 2 is
provided for converting the digital output from apparatus 3 into a
reply sentence consisting of the voices or sounds. An input and
output conversion section 1 is provided for distributing the reply
voice or sound generated by the apparatus 2 to the inquirer and
connecting the questions of the inquirer to the above signal
processing section 3. Subscribers 5, namely the inquirers, are
located at remote places and are connected to the system by
telephone channels.
The above system provides an effective means to be used for the
inquiry into up-to-date stock information, various reservations,
banking information services and so forth, wherein the variable
words are changeable words, depending on the contents of the reply,
such as, for example, the date, an amount, the name of the company,
address, etc., as defined above, while the fixed words are the
unchanging words of the reply sentence, according to the nature of
the inquiry.
In FIG. 2, showing the block diagram of one embodiment in
accordance with this invention, a slow access memory device 7 is
provided, such as a magnetic drum with multiple recordings of the
voices and sounds of a plurality of reply sentences. The voices and
sounds of the reply sentences are arranged in the form of a sample,
except for the variable word portion, and the amplitude signals
arranged in the form of a sample are encoded and recorded by time
sharing multiple recording. This system can be used for producing a
number of reply sentences consisting of voices or sounds at a
time.
Next, the details of the magnetic drum will be explained. The drum
rotates at 3,000 r.p.m. (makes a full revolution in 20 m sec) and
has 512 tracks divided into 16 zones to shorten the read out time.
Thus, each zone has 32 tracks. Thus, there are 16 read-out means,
such as magnetic heads, 1-1, 1-2, 1-3, . . . 1-16. The effective
clock frequency is 1.92 MHz. It is assumed that
$1.92 \times 10^{6} \times 20 \times 10^{-3} = 38.4 \times 10^{3}$
bits are provided for each track, where the audio waveform is
sampled at 8 kHz, and one sample is coded by 7 bits and recorded
with control information of one bit.
The time sharing multiplexing degree on one track is
$1.92 \times 10^{6} / (8 \times 8 \times 10^{3}) = 30$ (kinds). The longest
time interval for one reply sentence is decided by the number of
sentence forms; with 30 kinds of sentence forms, it is
$20 \times 10^{-3} \times 512 \approx 10$ seconds, and with 60 kinds of
sentence forms, it is about 5 seconds.
An access time to an arbitrary one of these reply sentences is at most
$20 \times 10^{-3} \times 32 = 0.64$ second and the number of
samples of the same replying sentence on one track is
$8 \times 10^{3} \times 20 \times 10^{-3} = 160$ samples for each
track.
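The drum figures quoted above can be checked with a few lines of arithmetic. The following is only a sketch: the constant and variable names are illustrative and do not appear in the patent, while the numeric values are those stated in the description.

```python
# Drum parameters as stated in the description; all names are illustrative.
CLOCK_HZ = 1_920_000      # effective clock frequency (1.92 MHz)
REV_MS = 20               # one revolution at 3,000 r.p.m. takes 20 ms
SAMPLE_RATE_HZ = 8_000    # audio sampling rate
BITS_PER_SAMPLE = 8       # 7 amplitude bits plus 1 control bit
TRACKS = 512
ZONES = 16

tracks_per_zone = TRACKS // ZONES                                  # 32
bits_per_track = CLOCK_HZ * REV_MS // 1000                         # 38,400
multiplex_degree = CLOCK_HZ // (BITS_PER_SAMPLE * SAMPLE_RATE_HZ)  # 30
samples_per_track = SAMPLE_RATE_HZ * REV_MS // 1000                # 160
max_sentence_ms = REV_MS * TRACKS                                  # 10,240 ms
max_access_ms = REV_MS * tracks_per_zone                           # 640 ms
```

The sentence length of 10,240 ms corresponds to the "10 seconds" quoted in the text, and 640 ms to the 0.64 second access time.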
FIGS. 3a through 3c show the state of information recorded on one
track of the above magnetic drum. FIG. 3a shows the whole of one
track, indicating that 160 sample signals (a-1, a-2, . . . a-160)
are recorded for one voice or sound. The time needed for read-out
of these 160 samples is 20 ms, in which the magnetic drum 7 makes a
full revolution. FIG. 3b shows the magnified form of one sample
(a-3) of the 160 sections. As stated above, each sample signal of
the voices or sounds of reply sentences of thirty kinds is recorded
by time sharing multiple recording. The read-out time for all of
these sections is equal to the sampling period (frequency) of the
voice or sound and amounts to 0.125 ms. FIG. 3c shows the magnified
form of one section b-3.4 of the sample of FIG. 3b, indicating one
sample signal of one voice or sound.
As aforementioned, one sample is composed of one bit for control
information c-0 and seven bits c-1, c-2, . . . c-7 representative
of the value for one sample of the audio waveform. This one bit of
control information discriminates whether the signal to be read
out next is control information for reading out a voice (speech)
segment or element, to be described later, or a waveform sample
signal of fixed words. For instance, this bit is set to "0" when
the waveform sample is to be read out and to "1" when the control
information is to be read out. The various known PCM coding forms
may be used for producing sample values of the waveform.
When a control code 1 is detected in section c-0, two bits of
information specifying the pitch pattern form of the variable
word to be inserted at the position of the section b-3.4 are
detected from the sections c-6 and c-7. A
waveform signal to form the variable word is recorded in a separate
high speed recording apparatus.
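The 8-bit word format described above (control bit c-0, then either seven amplitude bits c-1 to c-7 or a pitch pattern type in c-6 and c-7) can be sketched as a small decoder. This is an illustration only: it assumes that c-0 is the most significant bit of the word and c-7 the least significant, an ordering the text does not state, and the names `DrumWord` and `decode_drum_word` are hypothetical.

```python
from typing import NamedTuple, Optional


class DrumWord(NamedTuple):
    is_control: bool               # True when c-0 is 1
    sample: Optional[int]          # 7-bit amplitude code (c-1..c-7) for a fixed word
    pitch_pattern: Optional[int]   # 2-bit pattern type (c-6, c-7) for a variable word


def decode_drum_word(word: int) -> DrumWord:
    """Split one 8-bit drum word; assumes c-0 is the MSB (an assumption)."""
    if not 0 <= word <= 0xFF:
        raise ValueError("expected an 8-bit word")
    control_bit = (word >> 7) & 1
    if control_bit == 0:
        # Waveform sample of a fixed word: keep the seven amplitude bits.
        return DrumWord(False, word & 0x7F, None)
    # Control word: the two lowest bits select one of four pitch patterns.
    return DrumWord(True, None, word & 0b11)
```

For instance, under this bit ordering `decode_drum_word(0b10000010)` is a control word selecting pitch pattern type 2, while `decode_drum_word(0b01010101)` is a fixed-word sample with amplitude code 85.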
As above mentioned, in the magnetic drum 7, the audio (voice or
sound) waveform of the reply sentence except that of variable
words, is recorded as a time series of sample values as shown in
FIGS. 3a and 3b, and in the position of the variable words of the
reply sentence, a control signal showing that it is the variable
word is recorded instead of a sample signal.
Returning again to FIG. 2, a pitch pattern memory device 8 is
provided in which the pitch patterns of the above variable word are
recorded and a voice element memory 9 is provided, for instance a
high speed access memory, such as a core memory, in which the voice
waveform of the above variable word is divided by pitch units and
recorded.
Generally, the waveform of the human voice or sound is, as
typically shown in FIG. 4, composed of the part d having no
periodicity for consonant sounds and the part e having an
approximate periodicity for vowel sounds, each pitch period
$e_1$, $e_2$, $e_3$, etc., being determined by the vibrations
of the vocal cords, the length of this period determining how high
or low the voice or sound is, that is, its intonation.
This period is usually 20-30 ms or thereabouts. The part d of
consonant sounds has little effect on how high or low the
voice or sound is. In the device according to the present invention,
the human voice or sound corresponding to the above variable word
is divided by pitch lengths, the divided sound elements being
processed in a certain mode and arranged into samples. Each element
is numbered and recorded in the voice segment memory 9. The
processing consists in artificially correcting the waveform of a
segment or element cut from natural voice and sound, for example,
adding a forecasting waveform to the tail part of the segment
waveform or supplying the waveform of the segment to a
differentiating unit to give a differentiated voice segment
waveform. The voice waveform having no periodicity, such as that of
the consonant sounds, is usually partitioned to the average pitch
time lengths and the series of sample values are numbered for each
section and recorded in the voice segment memory 9.
The length of the voice segment cut from the audio waveform of the
variable word of human voice is, in general, shorter than 20 m
sec., so that the forecasting waveform is added to the tail of the
cut segment to provide an audio segment of 20 m sec., which is
subjected to sampling at a rate of 8 kHz to provide 160 sample
signals per segment. These sample signals encoded in seven bits in
the same way as the waveform in the memory 7 are recorded one after
another in the voice segment memory 9. On the other hand, the pitch
pattern of the variable word is variable with the position of the
variable word in the reply sentence, for instance, the ending of an
interrogative sentence.
Therefore, in the pitch pattern memory 8, a plural number of types
(in the present embodiment, four types -- such as flat, rising,
falling, and figure type) of pitch pattern control information are
recorded for each of the variable words recorded in the above audio
segment memory. The construction and operation of the apparatus of
a unit for making the reply voice and sound by using the above
memory apparatus is explained below.
Although the following explanation is limited to a specified
circuit for making one reply voice and sound, it is apparent that
the same audio message generating apparatus can be used for a query
and reply system having many subscribers by using apparatus
arranged in parallel and a time sharing multiple processing.
In FIG. 2, a signal processing section 10 (including computers) is
installed at the center, in which section 10 questions (inquiries)
are received and the processing is conducted until the reply
sentence is selected. This section 10 is not described in detail as
it forms no part of the present invention, but such a circuit is
known from a number of prior publications, such as U.S. Pat. No.
3,214,520 and others which require slight modification. The control
signal for converting the reply sentence selected by this signal
processing circuit into voices and sounds is applied to the audio
message generating apparatus.
Namely, the control signal corresponding to the fixed word portion
of the reply sentence represents the recording place of the above
magnetic drum and that of the variable word is given by the pitch
pattern form and the recording place in the voice segment memory or
address number.
First, the signal for selecting a desired zone corresponding to one
reply sentence is applied to one of the selective gate groups. Each
gate is respectively connected to a read-out means 11-1, 11-2, . .
. 11-22.
When actuated, one of these gates is opened every 0.125 ms to pass
the 8 bits shown in FIG. 3c through OR gate 13. A decoding circuit
14, which may be of a conventional type, as described, for example,
in the text Digital Computer and Control Engineering, McGraw-Hill,
1960, pages 547 - 553, extracts the foremost bit (c-0 in FIG. 3c)
from the 8 bits at the output of OR gate 13 and decides whether it
is 0 or 1. When it is 0, i.e., when the word is a sample value of
the fixed word, a gate drive signal f is supplied to pass bit
pulses c-1 through c-7.
Accordingly, unless the c-0 signal which comes every 0.125 ms is
1, the sample from the magnetic drum passes through the OR gate 16
and is applied to the digital/analog converter 17 and converted
into an analog sample waveform to be delivered as the output in the
form of a reply voice and sound by the output distributor 18
through one of the output circuit groups (19-1, 19-2, . . . 19-n)
including a low pass filter and an amplifier.
Next, when the decoding circuit 14 decides that the c-0 signal is
1, i.e., detects that not a sample value of the waveform but a
variable word is to be inserted there, the signal f is set to 0 and
gate circuit 15 is turned off for inhibiting the passage of the
signal. The signal g, which demands that instructions as to the
identity of the variable word be issued, is sent to the above
signal processing circuit and simultaneously the signal of 2 bits
designating the type of the pitch pattern recorded in the c-6,
c-7 sections is taken out by the gate circuit 20.
The control information of 10 bits indicating the variable word
from the above signal processing circuit 10 and the information of
2 bits indicating the type of the pitch pattern from the above gate
20 are transferred to buffer register 21 with the ten bits
occupying the upper rank and the two bits occupying the lower
rank.
Therefore, the control information is composed of 10 bits (1,000
words of the variable word) prescribing the variable word and 2
bits designating four kinds of pitch patterns.
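The 12-bit layout of buffer register 21 (ten word-designating bits in the upper rank, two pitch-pattern bits in the lower rank) can be sketched as follows. The function names are hypothetical and serve only to illustrate the bit packing described above.

```python
WORD_BITS = 10    # identifies the variable word
PITCH_BITS = 2    # identifies one of four pitch pattern types


def pack_buffer_register(word_id: int, pitch_type: int) -> int:
    """Place the 10-bit word code in the upper rank and the 2-bit
    pitch pattern type in the lower rank of a 12-bit value."""
    if not 0 <= word_id < (1 << WORD_BITS):
        raise ValueError("word_id must fit in 10 bits")
    if not 0 <= pitch_type < (1 << PITCH_BITS):
        raise ValueError("pitch_type must fit in 2 bits")
    return (word_id << PITCH_BITS) | pitch_type


def unpack_buffer_register(value: int) -> tuple[int, int]:
    """Recover (word_id, pitch_type) from the 12-bit register value."""
    return value >> PITCH_BITS, value & ((1 << PITCH_BITS) - 1)
```

Ten bits give 1,024 distinct codes, which is consistent with the roughly 1,000 variable words mentioned in the text.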
These twelve bits of information are decoded by the pitch
information address decoding circuit 22, which also may be of a
type described in the aforementioned text, Digital Computer and
Control Engineering. This decoding circuit converts the
digital signal of said 12 bits into the address signal of the
memory in which the information of pitch interval of the leading
phonetic segment for the variable word to be read out is stored.
This address signal is registered in the address counter 23, and
thereafter it is applied to read-out circuit 25 through gate
circuit 24.
This read-out circuit 25 has a control circuit such as matrix
circuit and an amplifier, and reads out pitch information
registered in the address of pitch pattern memory. The pitch
information read out is registered in the pitch counter 27 through
gate circuit 26.
The contents of the pitch counter 27 are decremented one by one by
the clock signal from clock signal source 28 (8 kHz) during the
time interval of the information of pitch interval set in the
counter 27. The pitch period of the leading voice segment is thus
detected and the read-out of the pitch period of the next audio
segment is controlled by stepping of the pitch address counter
23.
The above-mentioned gate circuits 24 and 26 are intended for
multiple utilization of information from high speed pitch
information memory apparatus 8 at each circuit and operate so as to
be opened only for a predetermined time at the time allotted for
that channel and are closed at the time allotted for the other
channel. The read-out operation of the above-mentioned pitch period
(frequency) information is repeated as long as there is a voice
segment of a word indicating the variable word.
On the other hand, the read out of the waveform information of the
variable word is performed by the following circuit apparatus and
operation. First of all, 10-bit information transferred to buffer
register 21 is converted, by voice segment address decoding circuit
29, which again may be of a previously described circuit in the
text, Digital Computer and Control Engineering, into the address
number of the audio segment memory in which a series of voice
segments constituting the variable word are recorded, and the
leading address signal is registered in the segment address counter
30.
This counter 30 designates the rank of the leading address of the
sample value of the segment to be read out (in this instance, the
second and upper digits when the address is indicated by a 160 bit
system). Thereafter the contents of sample value address counter 31
(counter showing the order of the first digit of address indicated
by 160 bit system) are increased one by one by sample value read
out clock 28 (8 kHz) and supplied to voice segment read-out control
circuit 33 through gate 32.
The voice segment read-out circuit 33 reads out the sample value (7
bits) of voice segments designated by the sample value address from
the above-mentioned counters 30, 31 from audio segment memory 9
sequentially and adds the sample value to OR gate 16 through gate
circuit 34.
Meantime, when the end of a pitch period is detected, sample value
address counter 31 is reset to 0, and 160 is added to segment
address counter section 30 in order to transfer to the leading
number of the next segment (20 m sec. in 8 kHz sampling).
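The interplay of the pitch counter 27, the segment address counter 30 and the sample value address counter 31 can be sketched in software. This is a simplified model, not the circuit itself: it assumes that each pitch period is stored as a count of 8 kHz clock ticks, that segments sit back to back in a flat memory at a spacing of 160 samples, and that the list of pitch periods marks where the word ends. The function name is hypothetical.

```python
SEGMENT_LEN = 160  # samples stored per segment (20 ms at 8 kHz)


def read_variable_word(segment_memory, pitch_periods, start_address):
    """Read one variable word segment by segment.

    For each segment, only `period` samples are read out, so a shorter
    period raises the pitch and a longer one lowers it.
    """
    output = []
    segment_address = start_address       # models segment address counter 30
    for period in pitch_periods:          # successive reads from the pitch memory
        if not 0 < period <= SEGMENT_LEN:
            raise ValueError("pitch period must be 1..160 samples")
        pitch_counter = period            # models pitch counter 27
        sample_address = 0                # models sample value address counter 31
        while pitch_counter > 0:
            output.append(segment_memory[segment_address + sample_address])
            sample_address += 1           # stepped by the 8 kHz clock
            pitch_counter -= 1            # counted down by the same clock
        segment_address += SEGMENT_LEN    # move to the head of the next segment
    return output
```

With a memory holding the values 0 to 319 and pitch periods of 3 and 2 samples, the read-out yields the first three samples of segment 0 followed by the first two samples of segment 1.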
The above-mentioned gate circuits 32 and 34 are used for multiple
utilization of the high speed audio segment memory 9 in each
circuit, in the same way as the above-described gate circuits 24
and 26, and operate so as to be opened only for the constant time
period at the time allotted for that circuit and be closed for the
time period allotted for the other circuit.
Accordingly, the read-out cycle time for the high speed voice
segment memory 9 and high speed pitch information memory 8 required
is 1/(8 kHz x number of channels).
This operation is repeated thereafter with the same pitch period
(frequency) as long as there is a voice segment of the word.
When the last one of the voice segments of a variable word
designated from the high speed segment memory 9 is detected and the
reproduction of the variable word terminated, the signal of such
termination is returned to the signal processing section 10 through
line 35 and the sample value of the audio waveform of the following
fixed word is read out from low (slow) speed memory 7.
Since the changeover from the fixed word portion to the variable
word portion is part of the read out from the high speed random
access memory, there is not involved any problem in the access
time, but rather it is necessary to provide a fixed pause duration
(for instance about 0.3 second), to carry out processing for read
out of the audio segment.
The changeover from the variable word portion to the fixed word
portion is part of the changeover of the read out from the low
speed part sequential memory, and there is provided the longest
pause duration (20 ms .times. 32 = 0.64 second).
A pause duration of this extent is in fact desirable; the
problem is that a changeover may occasionally occur with a very
short pause. In order to avoid this, a fixed pause duration of at
least about 0.3 second must be provided for this changeover, and
the safe value for this pause duration is one second at the
maximum. In case of complex forestalling control, the changeover
can be made with a constant pause time longer than
0.64 second.
A sample signal supplied from the OR gate 16 is converted into a
pulse amplitude modulated signal by the above-mentioned
digital/analog converter 17 and sent out to the prescribed answer
channel through output control circuit 18.
Although the above is an explanation of the audio message
generating apparatus having one channel, a number of answer
voices and sounds can be simultaneously supplied to a number of
circuits by time-sharing multiple processing with common use of the
above-mentioned signal processing apparatus 10 and memory apparatus
7, 8, and 9.
In the drawing, for the other circuits, only the elements 16-1,
16-2, . . . 16-16 corresponding to the above-mentioned OR gate 16
are shown. Therefore, when
multiple processing is utilized, the output of the digital/analog
converter 17 is a pulse amplitude modulated (PAM) signal subjected
to a time-sharing multiplexing processing. In the above, the
present invention has been described by referring to a preferred
embodiment thereof, but it is apparent that the present invention
is not limited to such embodiment, but it is subject to a number of
modifications in accordance with the practical objects of the
answer system to which the audio message generating apparatus of
the invention is applied.
The voice segment information can be reduced in order to make
memory apparatus 9 small sized, especially a processing circuit for
generating the variable word.
It is often observed that very similar waveforms are repeated in
pitch units in the voice waveform of a word, a typical instance
being those at the middle part of a stationary vowel sound.
These similar pitch waveforms can be replaced by a repetition of
the same pitch waveform without deterioration in the quality. In
case of a changing pattern, when the pitch waveforms can be used
repeatedly at intervals of every pitch or every two pitches, the
quality is almost not deteriorated and the capacity of the voice
segment memory can thus be reduced by one-half to one-third.
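The saving from repeating pitch waveforms can be illustrated with a simple model in which only every second or every third segment is stored and each stored segment is played back that many times. This is a sketch under that assumption; the patent does not specify how repeated segments are selected or addressed, and the function names are hypothetical.

```python
def store_with_repetition(segments, repeat):
    """Keep one segment out of every `repeat` consecutive segments."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    return segments[::repeat]


def play_with_repetition(stored, repeat, total):
    """Rebuild a sequence of `total` segments by repeating each stored one."""
    expanded = [segment for segment in stored for _ in range(repeat)]
    return expanded[:total]
```

With `repeat=2` the stored volume is one-half of the original, and with `repeat=3` it is one-third, matching the range given in the text.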
If, in addition, the recording in the high speed segment memory 9
is not made according to the word unit but according to a unit
suitable for the connection of a phoneme chain and/or a diphone,
the control mechanism will become complicated, but the vocabulary
of the variable word can be enlarged without limitations.
Taking the phoneme chain as an example, with six vowel sounds
(five vowel sounds and the silent one) and 20 kinds of
consonant sounds including a contracted sound, the total number of
phoneme chains is $6 \times 20 \times 6 = 720$, with the average
length of time duration for one unit being 150 m sec. Thus, the
necessary memory information volume is
$8 \times 8 \times 10^{3} \times 150 \times 10^{-3} \times 720 \approx 7.0 \times 10^{6}$
bits.
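The phoneme chain estimate can be reproduced step by step. The sketch below uses the factors given in the text (8 bits per sample, 8 kHz sampling, 150 ms per unit); the variable names are illustrative only.

```python
VOWELS = 6             # five vowel sounds plus the silent one
CONSONANTS = 20        # including a contracted sound
BITS_PER_SAMPLE = 8
SAMPLE_RATE_HZ = 8_000
UNIT_MS = 150          # average duration of one phoneme chain

# vowel-consonant-vowel combinations
chain_count = VOWELS * CONSONANTS * VOWELS                        # 720
bits_per_chain = BITS_PER_SAMPLE * SAMPLE_RATE_HZ * UNIT_MS // 1000  # 9,600
total_bits = bits_per_chain * chain_count                         # 6,912,000
```

Evaluating the product as written gives 6,912,000 bits, that is, roughly $6.9 \times 10^{6}$ bits.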
As above described, the present invention provides an audio query
and reply apparatus free of such defects as shortage in vocabulary
inherent in the answer apparatus of the conventional recorded word
compilation system and nonspontaneousness in the sound or voice
resulting from prolonged access time.
Further, a simple apparatus unified with the recorded word
compiling technique can be advantageously used without need for a
complex speech synthesis system.
* * * * *