U.S. patent number 3,646,576 [Application Number 05/001,739] was granted by the patent office on 1972-02-29 for speech controlled phonetic typewriter.
Invention is credited to David Thurston Griggs a/k/a D. Turston Griggs.
United States Patent 3,646,576
February 29, 1972
SPEECH CONTROLLED PHONETIC TYPEWRITER
Abstract
To convert speech directly into print as it is being spoken, by
machine, is a goal that has been thwarted by two critical wants:
(1) a way to perform many complex selective operations with great
speed, and (2) a way to close the gap between continuous speech as
an unbroken sequence of sounds on the one hand, and the distinctly
separated words and spelling conventions of the printed language,
on the other. Recently, taking advantage of the electronic
computer's speed with multiplex programmed operations, acousticians
have sought to achieve more accurate detection and separation of
speech sounds. Such efforts have required programming and
availability of extensive computer facilities to approach one step
in the problem. The approach described here, however, detects and
analyzes speech sounds instantaneously without a computer,
converting the sounds by means of comparators, timers, filters, and
switching circuits, into a real-time electrical phonemic analog of
what is said; then it adds a special-purpose digital computer
component to process and match syllabic sequences of sounds in the
language. Thus, the computer element is smaller and is used not for
phonetic detection but simply to give an output as close as
possible to conventional printing as can be obtained by means of a
prestored vocabulary of 12,000 words. The readout can be printed by
a modern high-speed electric typewriter.
Inventors: David Thurston Griggs a/k/a D. Turston Griggs (Baltimore, MD)
Family ID: 21697598
Appl. No.: 05/001,739
Filed: January 9, 1970
Current U.S. Class: 704/275; 704/235
Current CPC Class: B41J 3/26 (20130101); G10L 15/00 (20130101)
Current International Class: B41J 3/26 (20060101); B41J 3/00 (20060101); G10L 15/00 (20060101); G10l 001/16 ()
Field of Search: 179/1SA, 1VS; 178/31; 35/35R
References Cited
Other References
Dolansky, On Certain Irregularities of Voiced-Speech Waveforms,
IEEE Transactions, Vol. Au-16, 3/68, p. 51-56.
Primary Examiner: Claffy; Kathleen H.
Assistant Examiner: Leaheey; Jon Bradford
Claims
What is claimed is:
1. Speech-to-writer apparatus comprising:
a detection and analysis transducer module receiving an oral and a
throat signal input and having sound separation means for
detecting, differentiating, processing and producing speech sound
signals according to at least the following stated categories: (1)
vowels and semivowels, (2) nasals, (3) unvoiced fricatives, (4)
voiced fricatives, (5) unvoiced stops, and (6) voiced stops, said
sound separation means (FIG. 2) having sensors responsive to the
oral signal and throat signal inputs,
a stops-or-silence means responsive to said sensors,
ratiometer circuits fed by said sensors and said stops-or-silence
means to produce an output signal, and
a plurality of logic gate means selectively responsive to outputs
from said sensors, said stops-or-silence means, said ratiometer
circuits, for singularly passing processed speech.
2. The invention according to claim 1 wherein said sound signals
produced by said transducer module are fed separately to a
transcriber module having a syllable storage and a word-vocabulary
storage.
3. The invention according to claim 1 wherein a printout signal
means for said transcriber module is connected to a typographic
unit producing a written output of either conventionally spelled
words or syllabic utterances.
4. The invention according to claim 1 wherein said sound separation
means (FIG. 2) has:
an oral sensor gate receiving an input of said oral signal and
producing digital outputs indicative of oral ON and oral OFF,
a throat sensor gate and low-pass filter receiving an input of
throat signal and producing digital outputs indicative of throat ON
and throat OFF signal,
differentiator means responsive to said oral sensor,
throat differentiator responsive to said throat sensor,
said stops-or-silence unit receiving inputs of said oral signal,
said oral ON and said oral OFF signals, said throat ON and OFF
signals, and said oral and throat differentiator outputs,
said ratiometer circuits receiving inputs from said oral and throat
sensor gates, said oral and throat differentiator outputs and
producing ratioed output signals, and
each said logic gate means receiving said oral signal, including an
unvoiced stop gate receiving outputs from said stop-or-silence
unit,
a voiced stop gate receiving outputs from said unvoiced stop gate
and said stop-or-silence unit,
an unvoiced fricative gate receiving said oral ON signals and said
throat OFF signal,
a voiced fricative gate receiving said throat ON signal and one of
said ratioed output signals, and
a vowel gate receiving outputs from said oral gate and one of said
ratioed output signals.
5. The invention according to claim 4 wherein (FIG. 4) said
unvoiced stop gate provides an output to a stop unit comprised of a
terminal for receiving signals indicative of an unvoiced stop that
is undifferentiable as to whether it is /p/, /t/ or /k/,
first, second and third band-pass filters for passing 1,800-2,200,
3,800-4,600 and 3,400-3,800 cycles, respectively, and receiving the
outputs of said unvoiced stop gate and said stops-or-silence unit
producing filter output signals, and
a first, second and third comparator, said first comparator
receiving output signals from said first and second filters, said
second comparator receiving output signals from said second and
third filters, and said third comparator receiving output signals
from said first and third filters.
6. The invention according to claim 1 wherein (FIG. 3) said
stops-or-silence means for voiced stops, unvoiced stops, and for
silence, comprises a comparator 146 receiving outputs from said
oral and throat differentiators and producing output when the
amplitude of said throat differentiator output is equal to or
greater than the amplitude of said oral differentiator output,
a switch means 126 receiving said produced output, said throat ON
signal, and an output from a second circuit indicating the presence
of a stop, said second circuit having
a timer 140 receiving inputs of said oral OFF and throat OFF
signals and producing a delay gate output,
a switch 141 receiving inputs of said oral differentiator and said
delay gate output producing complementary outputs identified as
silence signal without dt and stop with dt,
a timer gate (0.15 seconds max.) 134 receiving inputs of said oral
ON and OFF and oral differentiator signals, to pass a predetermined
delayed signal representing undifferentiated unvoiced stops,
a silence timer gate 130 receiving inputs of said silence signal
without dt and said oral ON signal producing a silence signal,
an unvoiced stop gate 142 receiving inputs from said throat OFF
signal and said stop with dt signal,
a timer (0.01) delay gate 132 receiving inputs of said oral signal
from said oral differentiator and from said second circuit means
producing mutually exclusive outputs of greater or less than
0.01-second delay duration,
a timer (0.06) 136 receiving inputs of said less than 0.01-second
delay duration and said oral signal, and also from the said oral
differentiator, producing an output direct to the unvoiced stops
transducers,
a delay switch (0.03) 144 receiving inputs of said more than
0.01-second delay duration, and said oral differentiator, and
producing outputs direct to both voiced and unvoiced stops
transducers.
7. The invention of claim 1 wherein said detection and analysis
transducer module comprises a nasal unit (FIG. 6) responsive to
oral and throat inputs, a plurality of filter comparators, one of
said comparators deriving an /l/ signal and one deriving an
undifferentiated nasal sound as a result of said added throat
input.
8. The invention of claim 1 wherein said detection and analysis
transducer module comprises a vowel detection unit (FIG. 7)
responsive to oral input to provide detection and differentiation
of single vowels from each other by ratio comparison of first and
second formant peaks to total signal strength deriving vowel
signals therefrom.
9. The invention of claim 8 wherein are means for deriving (FIG.
7A) second formant peak amplitude signals without reference to its
frequency which is accomplished by heterodyning.
10. The invention of claim 9 wherein said transducer module
operates substantially as well with oral input signals from either
a male or female voice.
11. The invention of claim 8 wherein said vowel detection unit
includes a transducer 340 receiving an oral input signal and a
signal indicative of the presence of an identified vowel, said
transducer producing an output signal indicative of an
undifferentiated vowel.
12. The invention of claim 11 wherein said transducer module
comprises a diphthong transducer (FIG. 8) that distinguishes
diphthong sound signals from single vowel sound signals (371-376)
by a differentiated oral signal applied to a memory, and a first
formant quotient signal and from a signal from the ratio comparison
of first formant peaks to total signal strengths (claim 9), said
memory acting upon a comparator to condition diphthong gates
(461-466) to pass a diphthong signal, said memory also acting on a
switch (434) to shunt nondiphthong sound signals to a transcriber
module.
13. The invention of claim 2 wherein said transcriber module (FIG.
9) comprises a phoneme sequence sensor and designator (490) for
receiving signals indicative of fricatives, stops, vowels,
diphthongs, nasals, and a silence signal;
a regrouper and storage means 494 of preselected phoneme groupings
responsive to the output of the designator,
a syllable retainer 510 having a storage capacity of syllables to
be actuated by the regrouper means output, and
a word vocabulary 516 cumulatively responsive to the consecutive
output of the syllable retainer for producing print out signals of
the longest word forms provided by spoken input.
14. The invention of claim 13 wherein said transcriber module in
response to receiving said silence signal produces a spacing
indication in the printout signal, indicating spacing between words
and punctuation, said printout signal providing either (a) dot
signals for a typographic unit, or (b) punctuation signals
responsive to specific voiced inputs, e.g., "comma,"
"semicolon."
15. The invention of claim 14 wherein the transcriber module (FIG.
9) comprises storage of phoneme sequences responsive to (a) an
indication of stress in the oral input signal, (b) output from the
regrouper means, and (c) the output of the syllable retainer, for
producing a printout signal for short words and symbols in capital
letters showing said stress.
16. The invention of claim 15 wherein the transducer module
produces printout signals for driving a Braille printing device.
Description
SPECIFICATION
The present invention relates to a mechanism which transcribes
human speech of the standard American variety from any adult
speaker, instantaneously and automatically into a typed printout
that normally consists, 90 percent or more, of words that are
spaced and appear in conventional spelling. The system comprises
three modules and employs dual inputs, as shown in FIG. 1.
One of the objects and advantages of the present invention is that
there is provided a device that transcribes automatically to give a
printed output which is over 90 percent in conventional spelling
with separations into words or syllabic units, instantaneously
derived from the spoken input of standard American English.
A further object of the present invention is to provide real-time
detection and analysis of speech sounds achieved by preswitching
sounds according to their six manners of production into separate
analytical circuits for each type, namely, vowels, nasals, unvoiced
fricatives, voiced fricatives, voiced stops and unvoiced stops.
A further object of the present invention is to better
differentiate one from another the voiced stops (plosives) and
nasals where two different kinds of vocal inputs are used.
Another object of the present invention is to provide means for
producing distinctions between voiced stops and unvoiced and
unreleased stops (plosives) which are detected independent of
voicing, when necessary, by means of rate of change of signal
strength and durational timing.
A further object of the invention is to provide specific detection
of speech sounds regardless of differences of pitch as between
different speakers. Detection of vowels based upon frequency
measurements alone is obviated because the centrum frequency and
peak amplitude for the first formant are detected and measured and
then correlated to centrum peak amplitude for the second formant,
with a ratio characteristic for each vowel sound, regardless of
pitch.
Another object of the invention is to provide signal indications
for undifferentiated stops, both voiced and unvoiced, for an
undifferentiated nasal and for an undifferentiated vowel, when
those occur, so that the most probable intended but slighted
phonemes may be interpolated in the processes of syllable and word
formation by the device.
A still further object of the invention is to provide a means of
single vowel detection so as to reduce phoneme storage, the
diphthongs being identified by a process based upon detection of
the simple vowels, but signaled out in the same manner.
Another object of the invention is to provide phonetic detection
processes which produce differentiation of 38 different phonetic
entities which serve as phonemes for the transcription process, and
two additional signals--(a) an indication of silence and (b) an
indication of syllabic stress.
A further object of the invention is to provide separation between
successive identical sounds where one terminates the first word and
the other starts the next word, this being accomplished by timing
the duration of each type of sound so that only one signal will
pass during the normal duration of a single occurrence of the
sound, but a second signal will be emitted when it is prolonged as
a bridge between successive words as a repeated sound.
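The duration-timing idea in the preceding paragraph can be sketched as follows. This is a minimal illustration and not the patent's circuit: the function name `signals_for_duration` and the example durations are assumptions, since no per-sound durations are given here.

```python
def signals_for_duration(duration_s: float, normal_duration_s: float) -> int:
    """Return how many phoneme signals a sustained sound should emit.

    One signal passes for a sound lasting up to its normal duration.
    A second signal is emitted when the sound is prolonged beyond that,
    as when it bridges two successive words (e.g. "bus stop").
    """
    if duration_s <= 0 or normal_duration_s <= 0:
        raise ValueError("durations must be positive")
    # Assumption: a single cutoff at the normal duration separates the
    # one-signal case from the two-signal case.
    return 1 if duration_s <= normal_duration_s else 2


# Hypothetical values: an /s/ with an assumed normal duration of 0.10 s.
single = signals_for_duration(0.08, 0.10)   # ordinary occurrence
bridged = signals_for_duration(0.18, 0.10)  # prolonged across a word boundary
```

The cutoff is applied per sound type, so each category of sound would carry its own assumed normal duration.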
Another object of the invention is to provide, by taking the vowel
as the basis, for the analysis of the input by syllables, and the
invention deals with up to 337 distinctive kinds of syllabic
sequences of different types of phonemes. By means of these
combinations, syllables are separated in connected speech, and
speech vocabulary is classified. Provision is made for
reconstitution of erroneous syllabic formulations, also
instantaneously. Certain nasals are accorded syllabic import where,
through their position, they supplant an articulated vowel.
A further object of the invention is to provide for the
accumulation of separated and identified syllables, which are
combined according to patterns arranged in a prestored vocabulary
so that words will be formed, the longest possible ones first, and
to provide a printout in conventional prestored spellings as a
result of this matching of incoming material with what has been
stored. This is both for ease of reading and to separate words for
printing from within the stream of connected speech.
Another object of the invention is to provide for a vocabulary of
12,000 or more words and syllables in storage, with about 1,150 of
these in a supplementary store so that short words of three or less
phonemes print out independently but only after they have been
useful as parts of longer stored words.
A further object is to provide, by means of coded designations for
stored syllables, variations of pronunciation permissible within
the language structure when these designations are matched to the
stored vocabulary words for printout.
Another object is to provide for compensation for omissions in the
stored vocabulary (mainly proper names) by printing a phonemic or
syllabic printout of incoming unstored verbal material in a close
approximation to conventional spelling, with stress indicated.
Another object is to provide for a printout of punctuation and of
numerals and also of letter designations for spelling from spoken
inputs by means of the syllable designations and vocabulary
matching processes.
A still further object of the invention is to provide for real-time
printout designations at a speed of 10 phonemes per second for the
entire vocabulary, except that printing of each word must delay
until completion of that word, the maximum time being approximately
2.5 seconds.
Another object is to provide for 88 printout signals which are
suitable for a conventional high-speed typewriter or printer
capable of handling 10 characters per second. The 88 printout
signals may also be adapted to drive a Braille printing device, as
well as a conventional printer.
Another object is to provide for the 88 printout signals characters
which provide capital letters for spelling or designation purposes,
numerals, punctuation and indications of phonemes where stress has
occurred as shown by vowels appearing in boldfaced type. A separate
spacing indication signal to the printing module after words also
is provided by the invention.
The design of the invention permits a choice of vocabulary
constituents or substitution of various ones, either as an entire
group or through individually altered circuits without requiring a
wholly different apparatus.
The invention provides at least two opportunities for recording
spoken material for subsequent delayed transcription--(1) a
dual-track recording by oral and throat microphones jointly on
tape, or (2) a recording of the output of the detected phonemes
from the transducer module on a single track.
These and other objects and advantages of the invention will become
apparent upon full consideration of the following detailed
description and accompanying drawings in which:
FIG. 1 is a block diagram of the three modules comprising the
invention according to the preferred and best mode of the invention
and showing that the transducer module employs two inputs;
FIG. 1A is a block diagram showing the manner in which FIGS. 1-9
are connected;
FIG. 2 is a block diagram of the sound separator apparatus or unit
of the transducer module;
FIG. 3 is a block diagram of the stops-or-silence detector;
FIGS. 4 through 8 show methods of detecting individual speech
sounds and their components based upon the concept of filtering at
various frequencies with necessary comparators, switches and
attending circuitry, and in particular FIG. 4 shows a block diagram
for detecting and processing stop speech sounds;
FIG. 4A shows a block diagram for processing undifferentiated
voiced stop speech sounds;
FIG. 5 shows a block diagram for detecting and processing fricative
sounds;
FIG. 6 shows a block diagram of a nasal unit for detecting and
processing nasal sounds;
FIG. 7 shows a block diagram of a vowel detection unit for
processing vowel sounds;
FIG. 7A shows a block and circuit diagram of a second formant
scanner unit for processing the second formant of vowel sounds;
FIG. 8 shows a circuit and block diagram of a diphthong transducer
unit for detecting and processing diphthong sounds; and
FIG. 9 is a block diagram of the transcriber module 20 of FIG. 1
according to the preferred and best mode of the invention.
Note also the following tables in the Appendix:
Table 1 is a chart in the specification of sequences of phonemes in
syllabic formations;
Table 2 is a chart or listing of the regrouping of phoneme
sequences from NO-GO syllables into new syllables; and
Table 3 is a chart in the specification showing the proposed type
font consisting of 88 figures or characters that are print-outs of
the typographic unit of FIG. 1. Also included with the 88 figures
is a spacing unit, totaling 89 signal inputs thereto.
Table 4 is a chart showing differentiation of vowels according to
formant peaks.
Referring now to the drawings, there is shown in FIG. 1 the
transducer module which accepts an oral signal which is received by
a microphone (not shown) that picks up the voice that is spoken by
an adult and is fed to the transducer module on oral signal line
12. Also provided to the transducer module is a signal derived from
a microphone positioned on one's throat above and to one side of
one's Adam's apple and in substantial contact with the skin
surface, and is fed to the transducer module on conductor 14.
As is also shown in FIG. 1, the transducer module 10 sorts and
detects the phonetic elements needed for subsequent transcription
which is accomplished in the circuitry and components of FIGS. 2-8
to be described below, and produces 38 real-time electrical output
signals on conductors 16 representing the phonemes that have been
detected and delineated, including diphthongs. Also within the
conductors 16 are two additional outputs, one indicating duration
of silence and one indicating stress, to be described below.
The second or transcriber module 20 of FIG. 1 is a modified digital
computer unit, more particularly described in connection with FIG.
9, and which receives the 38 phoneme output signals from conductors
16 together with the two other signals indicating durations of
silence and stress, respectively. The transcriber module divides
its phoneme input into 337 types or patterns of syllables and makes
words from a stored vocabulary of 12,000 or more words; and for
syllables not stored, it arranges a similar printout. The output is
appropriately coded so as to drive a high-speed typewriter or
similar printing device 26, and thus produces thereby its written
output. The printing device 26, also called a typographic unit, is
responsive to activating signals of 88 different characters
including a punctuation signal, and provides stress for isolated
syllables and provides spacing after each of the language units it
determines.
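The longest-word-first matching of accumulated syllables against a stored vocabulary can be sketched in software as a greedy longest-match search. This is only an illustration under stated assumptions: the function `transcribe`, the syllable codes, and the tiny vocabulary are hypothetical, and the sketch works on a finished list of syllables rather than in real time as the transcriber module does.

```python
def transcribe(syllables, vocabulary):
    """Greedy longest-match of syllable sequences against a vocabulary.

    `vocabulary` maps tuples of syllable codes to conventional spellings.
    Syllables that match no stored entry are passed through unchanged,
    standing in for the syllabic printout of unstored material.
    """
    max_len = max((len(key) for key in vocabulary), default=1)
    output = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            key = tuple(syllables[i:i + n])
            if key in vocabulary:
                output.append(vocabulary[key])
                i += n
                break
        else:
            # No stored word starts here: emit the syllable itself.
            output.append(syllables[i])
            i += 1
    return output


# Hypothetical three-entry vocabulary and syllable codes.
vocab = {
    ("type",): "type",
    ("type", "wri", "ter"): "typewriter",
    ("a",): "a",
}
words = transcribe(["a", "type", "wri", "ter", "zog"], vocab)
```

Trying the longest candidate first is what keeps "typewriter" from being printed as "type" followed by two stray syllables.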
Thus the third module is any suitable high-speed typewriter or
similar printing device which will accept the outputs of the
transcriber module 20 at speeds up to 10 characters per second with
type font modified to accord with the 88 outputs and space signals
totaling 89 signals, as necessary.
FIG. 1A shows the arrangement in which FIGS. 2-8 interconnect to
form the transducer module 10 of FIG. 1. In FIG. 2, there is shown
the sound separator 30 having the oral signal line 12 and the
throat signal line 14 connected to sensor elements 32,34
respectively. These sensor devices amplify the inputs applied
thereto for analysis as to kinds of speech sounds, so that they may
be shunted through or conveyed to different subsequent analytical
circuits according to the kinds of speech and their constituent
components, for detection of individual speech sounds in several
determined categories. From the
sensor device 32, an output is conveyed or coupled to a linear
amplifier or, what is called herein, a "VOGAD" 36, from whence it
is fed to a set of switches or gates 40,42,44,46,48,50; each of
which passes the oral input when appropriate for its category of
kind of speech sound being analyzed. These categories are six in
number and relate in the following manner to the switches
40-50.
Switch 40 - unvoiced stops
Switch 42 - voiced stops
Switch 44 - unvoiced fricative
Switch 46 - voiced fricative
Switch 48 - nasals
Switch 50 - vowels
By these divisions or separations of conventional electric analogs
of an oral input, there are derived signals from switches 40-50
that provide means of preswitching nasals, vowels and voiced
fricatives.
Signal means of opening each of the gates or switches 40-50
likewise is shown in FIG. 2 by means of a series of three
ratiometers 52,54,56, which act upon the ratios of amplitude of the
signal strength in line 58 from sensor 32 and from the signal
strength in line 60, as applied to the ratiometers 52-56. The
signal strength in lines 58,60 essentially conduct or pass signals
indicative of the strength of the oral and throat inputs
respectively, and an oral OFF-switch 62 coupled to line 58 provides
a ratio of change of strength signal to the ratiometer 54, while
the sensor 34 provides an output to a 700-c.p.s. low-pass filter 64
which provides an output to a throat signal differentiator 66 which
provides a rate of change of signal strength to the ratiometer
56.
In the ratiometer 52, there is an amplitude comparison of signals
of about 3 to 2 at the throat for indicating a vowel; a throat
input signal ratio of about 2 to 1 for providing the oral input
indicating a nasal sound; a ratio of approximately 1 to 1 which
characterizes a voiced fricative sound.
The ratiometer 52, when satisfied, activates a vowel gate over
conductor 52a in gate 50. The ratiometer 54, when satisfied,
activates the nasal gate 48 over line 54a. The ratiometer 56 which
provides an approximately 1-to-1 output comparison, when satisfied,
provides activation of the gate or switch 46 which provides a
voiced fricative indication.
During a rapid rate of change of oral input as shown by the sensors
32,34, the ratiometers 52,54,56 are cut off so that transitional
states that may be developed will not register inappropriate ratios
therein. A 700-Hertz low-pass filter 64 is connected to the sensor
34 so that only the lower frequencies which are present and
indicative of the signals and information found in the throat input
14 are used for detection in this process to develop rate of change
output to the throat differentiator 66, and ON-OFF indications over
conductors 70,72, respectively. The ON-OFF indications on
conductors 70,72 are used as inputs to the stops-or-silence
indicator 74 in FIG. 3. The ON output of filter 64 is provided as
an input 76 to the voiced fricative switch 46, and the OFF-signal
72 is used as an input to the unvoiced fricative switch 44.
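The ratio tests and the cut-off during rapid change described above can be sketched as a single comparison. Several points are assumptions: the sketch collapses the three ratiometers 52, 54, 56 into one function, takes the amplitude ratio as already computed (the text does not state unambiguously which input forms the numerator), and uses a hypothetical tolerance of 0.2 for "about".

```python
# Approximate target ratios taken from the description:
# about 3 to 2 (vowel), about 2 to 1 (nasal), about 1 to 1 (voiced fricative).
TARGET_RATIOS = {
    "vowel": 1.5,
    "nasal": 2.0,
    "voiced fricative": 1.0,
}


def classify_by_ratio(ratio, tolerance=0.2, changing_rapidly=False):
    """Return the category whose target ratio is matched, or None.

    `tolerance` is an assumed value; the source only says "about".
    When the input is changing rapidly the ratiometers are cut off,
    so no category is reported for transitional states.
    """
    if changing_rapidly:
        return None
    best = min(TARGET_RATIOS, key=lambda name: abs(TARGET_RATIOS[name] - ratio))
    if abs(TARGET_RATIOS[best] - ratio) <= tolerance:
        return best
    return None
```

For example, `classify_by_ratio(1.5)` selects the vowel category, while the same ratio with `changing_rapidly=True` selects nothing.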
The stops-or-silence indicator 74, which more particularly is shown
in detail in FIG. 3, has several inputs, namely, 70,72, as referred
to above, signal line 80 from throat differentiator 66, signal
conductor 82 from the oral differentiator 62, and oral OFF signal
on conductor 84, and oral ON signal on conductor 86, and an oral
input from VOGAD 36 on conductor 88.
A threshold adjustment means 90 is connected from the output of the
sensor 32 to a stress signal terminal 92.
There are seven general output signals from the stops-or-silence
indicator 74, three of which are applied to the unvoiced stops
switch 40, i.e., conductors 94,94,169, two of which are applied as
output signals to the voiced stop switch 42 over conductors 96, 96,
an output to a silence terminal 98, and an output from a switch 140
that shows probable presence of a stop, which output passes by
conductor 128 to close the unvoiced fricatives gate 44 in FIG. 5.
This prevents mistaken identification of an unvoiced stop as an
unvoiced fricative in the process shown in FIG. 5.
An output 169 used as an indication of an undifferentiated unvoiced
stop, hereinafter defined as "p, t, or k," is used also to pass
through the unvoiced stop switch 40 and then into the detection
circuit of FIG. 4A.
Shown in FIG. 2 is the output of the linear amplifier or voltage
VOGAD 36 to which input is applied over conductor 12. The output of
VOGAD 36 provides a signal over conductor 88, as described above,
to the stops-or-silence unit 74, and a further output to the
unvoiced stop gate 40 over conductor 110, an output also to the
voiced stop gate 42 over conductor 112, an output to the unvoiced
fricative switch 44 over conductor 114, an output to the voiced
fricative switch or gate 46 over conductor 116, an output to nasal
switch 48 over conductor 118, and an output to the vowel switch 50
over conductor 120. These outputs from VOGAD 36 are used in
conjunction with deriving the gated output of the switches 40-50.
The outputs of these switches 40-50 are applied to further circuit
units of the system, as shown by the output terminal extending
below each switch in FIG. 2. The output of unvoiced stop switch 40
is applied as an input to FIG. 4; voiced stop switch 42 to FIG. 4A;
unvoiced fricative switch 44 and voiced fricative switch 46 to FIG.
5; nasal switch 48 to FIG. 6; and vowel switch 50 to FIGS. 7 and
7A. The signals from the
VOGAD 36 are used to derive a measure of stress to be used in the
transcriber module of FIG. 9 through the above-described
arrangements. For the stress indication, threshold adjustment 90
sets the level above which stress is to be indicated. The output 92
is applied to FIG. 9, as is shown and will be described in detail
below.
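The routing of the six gated outputs to their downstream units can be summarized as a lookup table. This is an illustrative data structure only: the names `GATES` and `route` are assumptions, and the oral sample is a placeholder value.

```python
# Switch number -> (category of speech sound, figures receiving its output),
# as listed in the description of FIG. 2.
GATES = {
    40: ("unvoiced stops", ("FIG. 4",)),
    42: ("voiced stops", ("FIG. 4A",)),
    44: ("unvoiced fricatives", ("FIG. 5",)),
    46: ("voiced fricatives", ("FIG. 5",)),
    48: ("nasals", ("FIG. 6",)),
    50: ("vowels", ("FIG. 7", "FIG. 7A")),
}


def route(open_gate, oral_sample):
    """Return (figure, category, sample) for each unit fed by the open gate."""
    category, figures = GATES[open_gate]
    return [(figure, category, oral_sample) for figure in figures]


# Hypothetical sample value passed through the vowel switch 50.
vowel_routes = route(50, 0.3)
```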
Similar to the VOGAD unit 36 for the oral signal of conductor 12,
there is also a VOGAD-unit 122 connected to conductor 14 which
carries the throat input signal, and the output of VOGAD 122 is
applied as a gate signal to the nasal switch 48 over conductor
124.
FIG. 3, which has been described in part above, shows the
stops-or-silence detector 74 which receives inputs
70,80,82,84,72,86,88, and which emits six outputs as shown. The
purpose is to distinguish the true silence from various kinds of
stops or plosive sounds and to distinguish the different kinds from
each other when possible. However, when not possible, a signal for
undifferentiated unvoiced stops is produced in conductor 94. The
stops-or-silence detector is provided with voiced stops switch 126,
and the unvoiced stops switch 142. It also provides a direct oral
plosive input direct to the detection or transducer circuits for
the unvoiced stops over conductor 94, and for the voiced stops over
conductor 96. The two remaining outputs are indications of
undifferentiated unvoiced stops over conductor 169, and of silence
over conductor 98. The method of operation is essentially by means
of timers and delay circuits, such as silence timer 130, a
0.01-second timer 132, a timer 134, a 0.06-second timer 136, a
third timer of 0.04-second delay 140, and switches 141,142,144.
A silence of at least 0.04 second, followed by a rapid rate of
change of oral signal, is prerequisite to all regular stops.
Delay switch 140 detects this from inputs 72,84,82, releasing
signal 128 when these conditions occur. For voiced stops, a
comparator 146 uses inputs of the oral and throat rates by
conductor inputs 82,80 to establish whether there is greater than
1:1 ratio; and if so, and if the throat voltmeter or VOGAD 122 is
in its ON-condition 70 and there is a stop signal from delay switch
140 with 128, the voiced stops gate 42 is activated. The silence
output over conductor 98 depends on four inputs, that is, the oral
VOGAD or voltmeter OFF condition from conductor 84, the throat
voltmeter OFF condition from conductor 72, the absence of oral
voltmeter ON condition from conductor 86, and the absence of
indication of oral rate of change from oral differentiator circuit
62 from conductor 82. If the throat and oral input are OFF for 0.04
seconds, as determined by delay 140, and if there is no
rate-of-change signal, a switch 141 emits a signal which may
eventuate in a silence output over conductor 98; but with a rate of
signal change present, the switch 142 will emit a signal to open
the unvoiced stop gate over conductor 94. This separate switch 142
will open the gate only when there is determined to be a throat
voltmeter OFF indication received over conductor 72. To return to
the incipient silence signal from switch 141, a silence indication
will pass to produce an output on conductor 98 showing silence if
timer 130 is thus satisfied for 0.15 seconds; but if an oral
voltage cuts in sooner from conductor 86, no silence indication is
passed via conductor 98.
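The decision between silence, unvoiced stop and voiced stop described above can be sketched as a small decision function. It rests on several assumptions: the names `Reading` and `classify` are hypothetical, the 0.15-second silence timer is treated as total OFF time, and the comparator 146 test follows the wording of claim 6 (throat rate equal to or greater than oral rate).

```python
from dataclasses import dataclass

MIN_STOP_SILENCE_S = 0.04  # delay 140: silence required before any regular stop
SILENCE_TIMER_S = 0.15     # timer 130: OFF time needed for a silence output


@dataclass
class Reading:
    off_duration_s: float     # time both oral and throat inputs have been OFF
    oral_rate_change: bool    # rapid rate of change of oral signal present
    throat_on: bool           # throat voltmeter ON when the change arrives
    throat_rate: float = 0.0  # throat differentiator amplitude
    oral_rate: float = 0.0    # oral differentiator amplitude


def classify(reading: Reading) -> str:
    if reading.off_duration_s < MIN_STOP_SILENCE_S:
        return "none"  # prerequisite silence not yet met
    if reading.oral_rate_change:
        # Comparator 146 plus throat ON selects the voiced stops gate.
        if reading.throat_on and reading.throat_rate >= reading.oral_rate:
            return "voiced stop"
        # Switch 142 opens the unvoiced stop gate only with throat OFF.
        if not reading.throat_on:
            return "unvoiced stop"
        return "none"
    # No rate of change: incipient silence from switch 141.
    if reading.off_duration_s >= SILENCE_TIMER_S:
        return "silence"
    return "pending"  # oral voltage may still cut in and cancel the silence
```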
It is seen that thus far there are distinctions made between
different kinds of stops, as well as between the stops and silence
breaks in the speech which are assumed to be detectable upon a
clear and intelligible speech input to the system. However, there
are stops which are not sufficiently clear to be distinguished by
the foregoing method of analysis; for these a separate provision is
made to feed the input to the stops transducers in FIGS. 4 and 4A,
as follows. The oral input 88 is suspended and stored by a timer of
0.01 seconds (timer 132) which is triggered by the switch 141
through conductor 128. The timer also receives an oral rate of
change input 82. When there is a continuing rate of change during
that time, the oral input undergoes additional suspension and
storage up to 0.06 seconds as determined by an additional timer
136; and if during that succeeding 0.06-second interval, there is a
rapid rate of change of oral input, then the stored plosive or stop
oral input is supplied to the unvoiced stops transducers over
conductor 94. It is supplied from the retaining timers 132,136 of
the stops or silence detector rather than through the normal
opening of the gate. If there is no variation in the rate of change
during the 0.01 second of timer 132, the oral input signal instead
of passing to the 0.06 second timer 136 is switched to 0.03 second
timer 144 for retention. Then it is released to the transducers
either of voiced 96 or unvoiced 94 stops, depending upon the
reading of the associated comparator 144. This comparator has two
settings as to fast or slow rate of change during the 0.03-second
period, switching the stored input to unvoiced stops 94 to FIG. 4
with rapid change, or to voiced stops 96 to FIG. 4A with slow rate
of change.
The foregoing described programs or processes will not handle the
unvoiced stops which are not released and not distinct enough to be
analyzed from ordinary speech in the transducer circuits beyond the
gates. In order to determine the presence of such undifferentiated
stops, i.e., "p, t, or k," the oral voltage ON and OFF inputs and
oral rate of change readings are used in the timer 134 which
receives from conductor 84 the oral OFF signal, from conductor 86
the oral ON signal, and from conductor 82 the oral differential
signal. The timer 134 is set for 0.15 seconds and is a
maximum transducer, activated only when there has been a high rate
of change followed by zero change with oral voltage nil. If the
oral voltage recurs within 0.15 seconds, the associated transducer
in timer 134 emits a signal over conductor 169 for undifferentiated
unvoiced stops. However, if there is a silence longer than 0.15
seconds, the transducer in timer 134 is not activated and the
circuit output provides nothing.
FIG. 4 shows detection of individual unvoiced stops by stops
detector 150. There are two inputs to the stops detector 150,
namely, the unvoiced stops over conductor 88 processed by the stops
or silence detector of FIG. 3, and the direct input from the gate
40 in FIG. 2 applied over conductor 94. The transduced
undifferentiated unvoiced stops signal from the stops or silence
detector of FIG. 3 is carried over on conductor 169 as an
additional output of unvoiced stops gate 40. For detection of /p/,
filters 152,154 are used with the resulting voltages compared in a
comparator 156. Filter 152 passes 1,800-2,200 c.p.s., and filter
154 passes signals in the band 3,800-4,600 approximately, while the
comparator 156 provides an output when the ratio of filter 152
output to filter 154 output is greater than one. This means that a
ratio showing the output of
filter 152 to be greater than filter 154 output will give a
transduced output signal for the sound /p/ on conductor 164. For
detection of /k/, there are provided a filter 154 and filter 158,
filter 158 passing a band of frequencies between 3,400-3,800
c.p.s., so that the resulting outputs of filters 154,158 are
applied to a comparator 160 wherein the voltage amplitude ratio of
filter 154 to filter 158 is greater than 2, such that the ratio of
filter 154 and filter 158 must be more than 2:1 to give the
transducer output for the sound /k/ on conductor 166. For detection
of /t/, the voltages from filters 158, 152 are correspondingly
compared in comparator 162 wherein a ratio of output voltages of
more than 2:1 for the outputs of filters 158 and 152 provides
determination of the transducer signal for /t/ on conductor
168.
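The three ratio tests of FIG. 4 may be sketched as follows, merely as
an illustration. The function name and argument names are assumptions;
the arguments stand for the output voltages of filters 152, 154 and
158. Because the specification does not state a priority among the
three comparators, the sketch simply reports every test that is
satisfied.

```python
def detect_unvoiced_stops(v152, v154, v158):
    """Return the unvoiced stops whose ratio test is met (hypothetical helper).

    v152 -- output of filter 152 (1,800-2,200 c.p.s.)
    v154 -- output of filter 154 (3,800-4,600 c.p.s., approximately)
    v158 -- output of filter 158 (3,400-3,800 c.p.s.)
    """
    hits = []
    if v152 > 1.0 * v154:      # comparator 156: 152/154 greater than 1
        hits.append("p")
    if v154 > 2.0 * v158:      # comparator 160: 154/158 more than 2:1
        hits.append("k")
    if v158 > 2.0 * v152:      # comparator 162: 158/152 more than 2:1
        hits.append("t")
    return hits

print(detect_unvoiced_stops(3.0, 1.0, 1.0))
print(detect_unvoiced_stops(1.0, 5.0, 2.0))
```

The comparisons are written as multiplications rather than divisions
so that a zero filter output does not cause a division error.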
FIG. 4A illustrates how a similar process is arranged for detection
of the individual voiced stop sounds in a circuit called voiced
stop transducer 170. It has two inputs, namely, the oral input
shown in FIG. 2, passed through the voiced stops gate 42 and
provided on conductor 42a, together with a separate, delayed
input from the stops-or-silence detector of FIG. 3 over conductor
96.
The inputs to the voiced stop transducer 170 from conductors
42a and 96 are fed to an ON-OFF detector 172 used to show there is
an active input from either source over conductors 42a and 96. The
ON-OFF detector feeds an input to each of band-pass filters such as
filter 174 passing 1,100-1,700 c.p.s., filter 176 passing
1,700-2,000 c.p.s., and filter 178 passing 2,000-2,400 c.p.s. The
ON-OFF detector also provides a signal to gate or switch 180.
The output voltages passing filters 174,176,178 are applied to a
comparator 182 so that the voltage from filter 174 is compared
together with the combined voltage output from filters 176,178.
Comparator 182 thus compares the output of filter 174 to the sum of
outputs from filters 176,178, and if the ratio is greater than
unity, an output of comparator 182 is derived indicating the
sound /b/ on conductor 184. The output thus is transduced or
present only if the ratio is greater than 1:1.
The detection for /g/ by comparator 186 is developed by applying
the output of filter 178 for comparison with the sums of outputs of
filters 174,176, so that when this comparison or ratio is less than
unity, an output on conductor 190 provides the detection of /g/.
FIG. 4A also produces a signal for the sound /d/ which is detected
when there is a voltage output derived from filter 176 over
conductor 192 to a switch 194. Switch 194 produces the output in
conductor 196 when there is input over conductor 192 from filter
176 simultaneously compared with the absence of outputs from
comparators 182,186 over conductors 184,190, respectively. Thus the
sound /d/ is detected when there is the absence of detection of
sounds /b/ or /g/, respectively.
In order to determine the presence of an undifferentiated voiced
sound /b/, /d/ or /g/, outputs from conductors 184,190,196 are
applied to the gate or switch 180. In the presence of an output from
the ON-OFF detector 172, but in the absence of any specific filter
output over conductors 184,190,196, switch 180 produces or releases
a transducer signal indicating an undifferentiated voiced stop on
conductor 200.
FIG. 5 shows a fricative transducer module 202 for detection of
both voiced and unvoiced fricative sounds by means of a network
stage of five low-pass filters 210,212,214,216,218 coupled by
comparators 220,222,224,226,228. Low-pass filter 210 passes up to
10 k.; filter 212 passes up to 8 k.; filter 214 passes up to 5 k.;
filter 216 passes up to 2.5 k.; and filter 218 passes up to 1 k.
Input signal on conductor 88 from VOGAD 36 of FIG. 2 is applied to
the low-pass filter 210 and to two-way total signal switch 230
which processes about 96 percent of the total signal strength of
applied unvoiced fricative signal on conductor 44a from switch 44
(FIG. 2), or processes about 60 percent of the total input signal
on conductor 88 for voiced fricative signal on conductor 46a (FIG.
3). The voice input 88 is capable of interruption by switch 206
activated by the stops-or-silence detector in FIG. 3 through
conductor 128. This input therefore will be lacking when a stop is
present.
The two-way output of total signal switch 230 is applied to each of
the comparators 220,222,224,226,228. The two-way output is actually
realized by the conductor 46a for the voiced fricative signal being
applied to a voltage-responsive switch means 234, which upon
actuation thereof, mechanically activates a series of
single-pole-double-throw (SPDT) switches 240,242,244,246 from an
initial position (shown vertically) to an actuated position (shown
diagonally disposed) by lever means 248, schematically illustrated
in dotted lines.
Now having described the physical arrangements, it is shown in FIG.
5 that the oral input signals present on conductor 88 are passed
through the 10-k. low-pass filter 210 which output is applied to
the comparator 220 which compares the output of total switch 230,
i.e., either 96 percent of the total signal strength in case of
unvoiced fricative, or 60 percent of the total signal for a voiced
fricative. If the comparison of the 10-k. filter output/total
switch output is greater than 1:1, SPDT-switch 240 connected to the
output of comparator 220 passes an output signal for /unvoiced th/,
unless two-way switch 234 is activated; in such case, an output
signal /voiced th/ is produced, as shown.
The 10-k. low-pass filter 210 output is applied over conductor 250
to the 8-k. low-pass filter 212, which output is applied to the
comparator 222 for comparing the output of total switch 230, as
above described. If the comparison of the 8-k. filter output/total
switch output is greater than 1:1, SPDT switch 242 connected to the
output of comparator 222 passes an output signal /f/ unless two-way
switch 234 has been actuated; in such case, an output signal /v/ is
produced, as shown.
The 8-k. low-pass filter 212 output is applied over conductor 252
to the 5-k. low-pass filter 214, which output is applied in turn to
the comparator 224 for comparing the output of total switch 230, as
above described. If the comparison of the 5-k. filter output/total
switch output is greater than 1:1, SPDT-switch 244 connected to the
output of comparator 224 passes an output signal /s/ unless two-way
switch 234 has been activated; in such instance, an output signal
/z/ is passed, as shown.
The 5-k. low-pass filter 214 output is fed over conductor 254 to
the 2.5-k. low-pass filter 216 which output in turn is fed to the
comparator 226 for comparing the output of total switch 230, as
above described. If the comparison of the 2.5-k. filter
output/total switch output is greater than 1:1, SPDT-switch 246
connected to the output of comparator 226 passes an output signal
/sh/ unless two-way switch 234 has been actuated; in such instance,
an output signal /zh/ is passed from the switch 246.
The 2.5-k. low-pass filter 216 is fed over conductor 256 to the
1-k. low-pass filter 218, which output in turn is fed to the
comparator 228 for comparison with the total switch 230 output, as
has been described. If the comparison of the 1-k. filter
output/total switch output is less than 1:1, output conductor 250
passes a sound /h/. It is noted that the ratio is less than that of
1:1 showing that the 1-k. filter will have critically cut the
frequencies within its band range, therefore properly identifying
the proper /h/ speech sound.
Switch 234 is a mechanical gang switch activated by the voiced
fricatives gate 46 of FIG. 2, so as to switch the transducer output
from an unvoiced fricative to the voiced fricative sound that is in
the same bandwidth.
In general, therefore, the same process of comparison and switching
continues through the cascade of filters 210-218, except that with
the last one (1-k. filter 218) there is no further filter, and a
ratio of less than 1:1 when present will signal the presence of an
/h/.
FIG. 5 thus is seen to yield output signals indicative of
/θ/, /voiced th/, /f/, /v/, /s/, /z/, /sh/, /zh/ and /h/, a
total of nine speech sounds.
FIG. 6 illustrates the nasal detection 260 having oral input from
conductor 12 and throat signal input 14 (see FIG. 2). These signals
are coupled for reinforcement of the lower frequencies on conductor
262; FIG. 6 produces the detection of nasal sounds /m/, /n/, /ng/,
and undifferentiated nasal or /n/, and /l/.
The conductor 262 having reinforced input signals thereon is
applied in parallel to each of band-pass filters including filter
270 passing 700-1,200 c.p.s.; filter 272 passing 1,300-2,800;
filter 274 passing 700-1,000; filter 276 passing 1,400-2,100;
filter 278 passing 2,000-3,000; and filter 280 passing
1,600-2,000.
The outputs of filters 270 and 272 are applied to ratio or
amplitude comparator 282 which produces a signal on conductor 284
indicative of the /m/ sound when filter 270 output/filter 272
output is greater than unity.
The outputs of filters 270 and 272 are applied to ratio or
amplitude comparator 286 which produces a signal on conductor 288
indicative of the /n/ sound when filter 272 output/filter 270
output is greater than unity.
The outputs of filters 274 and 276 are fed to a ratio or amplitude
comparator 290, which produces a signal on conductor 292 indicative
of the /ng/ sound when filter 274 output/filter 276 output is
greater than unity.
The outputs of filters 278 and 280 are fed to a ratio comparator
294 which produces a signal on conductor 296 indicative of the /l/
sound when the filter 278 output/filter 280 output is greater than
unity.
The filter 274 output is fed to a gate or switch 300 to which also
is fed the output of comparator 282 on conductor 284, the output of
comparator 286 on conductor 288 and the output of comparator 290 on
conductor 292; such output of switch 300 is a signal, if any,
indicative of an undifferentiated nasal sound, i.e., /γ/ on
conductor 302. A voltage through filter 274 between 700 and 1,000
Hz. will yield an output unless outputs on conductors 284,288 or
292 show the presence of a specifically identified /m/, /n/ or
/ng/.
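The comparator and gate arrangement of FIG. 6 may be sketched as
follows, for illustration only. The function name, the argument names
and the returned labels are assumptions; the arguments stand for the
output voltages of filters 270 through 280. The sketch follows the
ratio tests exactly as worded above and reports every test satisfied.

```python
def detect_nasals(v270, v272, v274, v276, v278, v280):
    """Apply the ratio tests of FIG. 6 as worded in the text (hypothetical)."""
    m = v270 > v272            # comparator 282: 270/272 greater than unity
    n = v272 > v270            # comparator 286: 272/270 greater than unity
    ng = v274 > v276           # comparator 290: 274/276 greater than unity
    l = v278 > v280            # comparator 294: 278/280 greater than unity

    hits = []
    if m:
        hits.append("m")
    if n:
        hits.append("n")
    if ng:
        hits.append("ng")
    if l:
        hits.append("l")
    # Switch 300: filter 274 has an output but no specific nasal was found.
    if v274 > 0 and not (m or n or ng):
        hits.append("undifferentiated nasal")
    return hits

print(detect_nasals(2.0, 1.0, 0.5, 1.0, 0.0, 0.0))
print(detect_nasals(1.0, 1.0, 0.5, 1.0, 0.0, 0.0))
```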
The vowel detector 310 in FIG. 7 operates with only two inputs: the
oral input signal on conductor 50a in FIG. 2, and total oral signal
strength component on conductor 58 in FIG. 2. In summary, it
produces transduced output signals on conductors 311 representing 10
vowel sounds including /r/ and one undifferentiated vowel signal.
Eight of the vowel signals are supplied to the diphthong transducer
of FIG. 8, and three of them are supplied directly to the
transcriber module 20 of FIG. 9. In addition, a ratio signal for
peak amplitude of first formant as a function of total signal
strength 344 is passed to the diphthong transducer in FIG. 8.
The method used in differentiating vowel sounds is to compare the
peak (centrum) amplitudes of the first and second formants relative
to each other and to the total oral amplitude at the same time,
then to check this against the frequency of the first formant.
Detection of the first formant follows conventional methods by
using a bank of filters 312 at 20-Hz. intervals between 240 and 960
Hz. This enables determination of the frequency of the first
formant centrum. Detection of the second formant in FIG. 7A is done
differently--through scanning for the peak, without regard for the
exact frequency, by means of heterodyning.
The input signal from the vowel gate on conductor 50a of FIG. 2 is
diverted into two different circuits, one for each formant, and it
supplies a transducer 340 which indicates the presence of any kind
of vowel input whatever, described in detail below. The first
formant detector 312 receives the input voltage to a bank of 36
sequential centered filters of 20-cycles bandwidth in the range
from 240 to 960 Hz. The 36 filtered outputs are supplied over
individual conductors 314 to a comparator called a peak and centrum
discriminator 316, which reads the peak amplitude's values in
output 318 and determines whether the centrum lies within certain
particular bandwidth ranges: 400-700 Hz. passing over conductor
321; 340-460 Hz. passing over conductor 322; 260-400 Hz. passing
over conductor 323; 240-340 Hz. passing over conductor 324;
400-700 Hz. passing over conductor 325; 400-500 Hz. passing over
conductor 326; or 700-940 Hz. passing over conductor 327.
Determination of the location of the centrum in these bandwidths is
used to help differentiate the vowels when the peak amplitude
ratios are compared. The method of differentiation is shown in the
chart of Table 4 (see Appendix), where amplitude ratios are derived
from the Peterson-Barney studies.
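The arithmetic of the first formant filter bank may be checked with
the following Python sketch, offered only as an illustration. The
function names are assumptions, and the centrum is taken here to be
the center frequency of the filter with the largest output, which is
a simplification of the peak and centrum discriminator 316.

```python
# Distinct centrum ranges named in the text, in Hz.
CENTRUM_RANGES_HZ = [(400, 700), (340, 460), (260, 400),
                     (240, 340), (400, 500), (700, 940)]

def filter_bands(lo=240, hi=960, width=20):
    """Adjacent 20-Hz bands covering 240 to 960 Hz."""
    return [(f, f + width) for f in range(lo, hi, width)]

def centrum(amplitudes):
    """Return (center frequency, peak amplitude) of the strongest band."""
    bands = filter_bands()
    i = max(range(len(amplitudes)), key=lambda k: amplitudes[k])
    band_lo, band_hi = bands[i]
    return (band_lo + band_hi) / 2, amplitudes[i]

def matching_ranges(freq_hz):
    """List every named range that contains the centrum frequency."""
    return [r for r in CENTRUM_RANGES_HZ if r[0] <= freq_hz <= r[1]]

amps = [0.0] * 36
amps[10] = 5.0                      # strongest output in the 440-460 Hz band
freq, peak = centrum(amps)
print(len(filter_bands()), freq, peak, matching_ranges(freq))
```

The band count confirms that 20-Hz filters spanning 240 to 960 Hz
number 36, as stated. Several of the named ranges overlap, so a
single centrum may fall within more than one of them.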
From the input signal for total oral signal strength on conductor
58, a comparator 342 also receives the first formant's peak
amplitude reading from conductor 318 and compares these, producing
a ratio output on conductor 344 for the first formant quotient.
This is supplied to a final comparator-ratiometer 346 associated
with the transducer 348 for the vowel output signals 311.
For the second formant process in scanner 328, the input signal is
passed through a pair of bandwidth filters, a filter 332 passing
720-940 Hz. from oral input conductor 50a, and the other filter
350 passing 940-2,900 Hz. The voltage from filter 332, lying
below 940 Hz., is subject to close-off by a switch 330 which is
activated when the first formant's centrum has proved to be
located in that bandwidth, as indicated over conductor 327, because
in that case the second formant will peak above 940 Hz. The filtered
voltages from the paired filters 332,350 are supplied over
conductors 352,354 to the second formant scanner 328 which is shown
in detail in FIG. 7A. It produces an output of second formant peak
amplitude on conductor 358 and an indication when the second
formant's centrum was above 1,050 Hz. on conductor 356, both to be
described below. That indication on conductor 356 is supplied to
the vowel transducers 348 where it is needed for substantiative
discrimination between certain vowel phonemes as shown in Table
1.
A comparator-ratiometer 346 receives the second formant's peak
amplitude reading through connector 358 and also an input of the
total signal strength component on conductor 58. This ratio output
from ratiometer 346 is then supplied to a comparator 360 for
comparing the ratios of peak amplitude of the first and second
formants as quotients of the total signal strength. Preset ratios
according to Table 1, when met, will activate the vowel transducers
348 in combination with the inputs of information concerning first
formant bandwidth from conductors 321,322,323,324,325,326,327 and from
the second formant scanner 328. The resulting transduced output
signals 311 for presence of the various vowels to the diphthong
transducer in FIG. 8 are on conductors 371-378, and to the
transcriber module 20 of FIG. 9 over conductors 379,380, the latter
for sounds which do not appear in diphthongs.
Whenever a vowel is detected by the vowel transducers 348, a signal
is emitted over conductor 339 and passed to the undifferentiated
vowel transducer 340 so that it will not put out a signal on
conductor 341 for an undifferentiated vowel. However, whenever
there is a vowel input through the vowel gate over conductor 50a
without identification of a specific vowel, the transducer 340 will
supply its output direct to the transcriber module 20 of FIG. 9
over conductor 341.
FIG. 7A is a detail of the second formant scanner 328 whose purpose
is to detect and measure the peak amplitude of the second formant
regardless of the frequency of its peak and to determine whether or
not that peak lies in the range of 1,050 to 2,900 Hz. The method
used is to heterodyne the incoming signal so that peak-measuring
voltmeters can be used to read the peak. Input voltages in the
range from 720 to 2,900 Hz. are merged on conductor 390 as they
emanate over conductors 352,354 from the two filters 332,350
described above. The merged voltages are fed through
connector 390 to a mixer 392. The mixer 392 is activated from a
sweep oscillator 394 that linearly sweeps through the range from 10
kc. to 11.850 kc. The resulting signal on conductor 396 is passed
through two filters of 20-Hz. bandwidth each, one filter 398
centered on 8,950 Hz., and the other filter 399 on 9,280 Hz. The
filtered signal outputs of 398,399 then are separately supplied to
peak-measuring voltmeters 400,402. These voltmeters are activated
in synchronization with the sweep oscillator 394 by means of an
ON-OFF signal applied over conductor 404.
In order to obtain the higher of the two peak amplitudes thus
detected in peak voltmeters 400,402 as an indication of the peak
amplitude of the second formant, the respective signals are
supplied to a third voltmeter 406 the output of which supplies that
information on conductor 358. That information from peak-voltmeter
406 is also supplied to a comparator 409, also receiving output
from voltmeter 400, comparator 408 measuring whether the second
formant peak was that peak which was detected by the filter 398
centered at 8,950 Hz. through connector 358; so that if the peak
thus was in the range 1,050-2,900 Hz., the comparator passes that
information as its output 356.
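The frequency arithmetic of the heterodyne scanner may be verified
with the short Python sketch below. It assumes, which the text does
not state expressly, that each fixed filter selects the difference
product of the mixer, that is, the oscillator frequency minus the
input frequency. The names used are introduced for the sketch only.

```python
OSC_SWEEP_HZ = (10_000, 11_850)              # sweep oscillator 394
FILTER_CENTERS_HZ = {398: 8_950, 399: 9_280}

def input_band(filter_center_hz, sweep=OSC_SWEEP_HZ):
    """Input frequencies brought onto a fixed filter during one sweep.

    Assumes difference mixing: f_filter = f_oscillator - f_input.
    """
    sweep_lo, sweep_hi = sweep
    return (sweep_lo - filter_center_hz, sweep_hi - filter_center_hz)

for ref, center in FILTER_CENTERS_HZ.items():
    print(ref, input_band(center))
```

Under this assumption the filter centered on 8,950 Hz. responds to
inputs from 1,050 to 2,900 Hz., which agrees with the range given for
the indication on conductor 356, while the filter centered on 9,280
Hz. responds to inputs from 720 to 2,570 Hz.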
A diphthong transducer 420 of FIG. 1A and shown in detail in FIG.
8, processes single vowel outputs from the vowel detector 348 of
FIG. 7 to identify diphthongs when present, and it produces
distinctive electrical signals on conductors
421,422,423,424,425,426 for six such diphthongs. Since it also
relays the eight single-vowel signals 371-378 from the vowel
detector 348, it passes all vowel and diphthong output signals
detected by the transducer module 420 to the transcriber module
20.
The single-vowel signals 371-378 fed to the diphthong transducer
420 represent the basic simple vowel phonemes of American English
as continuous signals generally timed to last 0.2 seconds each
except for /U/ and /u/ which are transmitted in single, somewhat
shorter, pulses. The continuous signals allow retention of the
single-vowel signals during the time required to determine whether
or not they are used in a diphthong before releasing them. Since
/U/ and /u/ occur only terminally, such delay is not required for
them. Other inputs to the diphthong transducer 420 are a signal
from the vowel detector 310 in FIG. 7, through conductor 344 from
peak comparator 342 that gives the ratio of the first formant
signal strength to the total oral signal strength. That input (344)
is supplied to a memory unit 424 of 0.15 seconds which is activated
by the signal on conductor 426 from an OR circuit responsive to a
signal on any of 371-376 indicating the presence of any vowel input
except /U/ and /u/. The other input to the diphthong transducer 420
is the rate of change of oral signal on conductor 82 from FIG. 2.
It is supplied both to the memory unit 424 and to a comparator 430
of rates of change.
The presence or absence of a diphthong is determined in the
transducer 420 according to whether there is a steady rate of
change of signal strength when two vowels occur consecutively. The
memory 424 of 0.15 seconds determines the timing parameter for this
comparison in comparator 430; and if there is only a steady rate of
change, which characterizes a diphthong glide, the comparator 430
activates a ganged switch 434 which engages the diphthong circuits
by means of single-pole-double-throw switches
441,442,443,444,445,446,447,448. An unsteady rate of change from
comparator 430 indicates that no glide is present and the vowels
311 are intended to be separate. Consequently, unless the
comparator 430 is satisfied with a steady-state ratio, the switches
441-448 are normally closed (upwardly, as seen in FIG. 8) in a
state where the single, simple vowel signals will be transmitted
following the 0.15-second memory delay. Those output signals will
pass to the transcriber module 20 through connectors
451,452,453,454,455,456,457,458.
When the switch 434 is actuated (closed down, as seen in FIG. 8),
the following occurs. The /eh/ sound signal in conductor 373 is
applied to transducer 461 to combine with sounds /I/ or /i/ from
switch 446, so that a signal "A" is produced in conductor 421. /I/
and /i/ merge in conductor 470.
The /I/ or /i/ sounds from switch 446 also combine in transducer
462 with the signal / / from conductor 376 passing switch 442, to
produce /oi/ in conductor 422.
The / / sounds from switch 443, or / / in conductor 464 from switch
444, combine in transducer 463 with the signal /I/ or /i/ in
conductor 460 to produce "I" in conductor 423.
The / / sound from conductor 374 passing switch 444 merges with the
/ / signal through connector 474 as a dialectal substitution for /
/. / / or / / is applied to transducer 464 with the output from a
rate detector 475 to produce / / in conductor 424. The rate
detector 475 is responsive to the comparator 430 but detects that
the ratio change of the comparator increases in a period of 0.2
seconds.
The /I/ or /i/ sound signal in conductor 470 combines in transducer
465 with the merged /U/ or /u/ signals in conductor 476 to produce
an output for "U" in conductor 425.
The / / sound from switch 442 combines in transducer 466 with the
output from the rate detector 475 to produce /0/ on conductor
426.
The comparator 475 senses any increase within 0.2-second periods in
the ratio of the first formant peak amplitude to the total signal
amplitude in the interval between the beginning and end of a
diphthong. This is the interval involved in the action of the
memory unit 424. This comparator receives the ratio signal through
conductor 344 from the vowel detector of FIG. 7. Its timing is
activated by an output from the comparator 430. When there is an
increase of the first formant strength without a rapid rate of
change of oral signal, the comparator 475 passes an output 477 to
gates 464 and 466 to complete formation of diphthongs /ao/ and /0/
respectively.
The six diphthong output signals 421-426 supplement the eight
single-vowel signals 451-458 in providing the complete vowel and
diphthong indications from the transducer module 420 to the
transcriber module 20 in FIG. 9.
The transcriber module 480 in FIG. 1, and shown in detail in FIG.
9, receives (in general) electrical signals representing the
detected sounds (39 in number) plus a signal representing the
occurrence of silence and processes these, converting them to 89
output signals to drive the typing or printing mechanism 26 of FIG.
1. FIG. 9 shows this transcriber module 480. It has six input
channels--one for the silence signal on conductor 98 from the
stops-or-silence detector 74, FIG. 3; one for the stress indication
via conductor 92 from the threshold setting 90 of FIG. 2; and four
channels representing the categories of speech sounds, i.e., (1)
fricatives from switches 240-246 which come from transducer outputs
220-228, 261 representing the fricative sounds both voiced and
unvoiced in FIG. 5. Since none of these signals will occur
simultaneously, they are merged in the fricative input channel; (2)
stops from conductors 164,166,168,190,196,200 which come from
outputs of transducers 156,160,162,169,180,186,194 representing the
various stops, voiced and unvoiced and undifferentiated in FIGS. 4
and 4A, also merged; (3) vowels from conductors 371-380, 341,
421-426 which come from the outputs of transducers 340,348
representing single vowels and diphthongs and undifferentiated
vowels, in FIGS. 7 and 8, also merged; and (4) nasals from
conductors 284,286,290,300 which come from the outputs of
transducers 282,286,290,300 representing the nasal sounds and /l/
and undifferentiated nasals in FIG. 6, likewise merged.
Input pulse dividers or time choppers 491,492,493,494 are placed in
each channel and divide the pulse or signal input into a series of
approximately equal periods, respectively, for each of the
channels. The time choppers for fricatives and vowels are set to
divide the applied signal into a series of 0.2-second pulses for as
long as the signal is applied and, correspondingly, the time choppers
for stops and nasals are set to divide the applied signal into a
series of 0.1-second pulses for as long as the applied signal is present.
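The action of the time choppers may be illustrated by the following
Python sketch. The function name, the category labels and the use of
integer milliseconds are assumptions made for the sketch; a final
pulse shorter than the nominal period is kept here, which is one
possible reading of "approximately equal periods."

```python
# Nominal pulse length per channel, in milliseconds.
PERIOD_MS = {"fricative": 200, "vowel": 200, "stop": 100, "nasal": 100}

def chop(duration_ms, category):
    """Divide a signal lasting duration_ms into consecutive pulses.

    Returns a list of (start_ms, end_ms) pairs; the last pulse is
    truncated if the signal ends before a full period has elapsed.
    """
    period = PERIOD_MS[category]
    return [(start, min(start + period, duration_ms))
            for start in range(0, duration_ms, period)]

print(chop(500, "vowel"))
print(chop(250, "stop"))
```

A vowel signal lasting 0.5 second is thus divided into three pulses,
and a stop signal lasting 0.25 second likewise into three.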
The transcriber module 480 is essentially a multiphase digital
computer having a direct set program. It has two outputs: a spacing
signal to indicate times between words on conductor 482, and a
channel 484 for 88 distinctive signals for printed characters.
The input signals are applied first to a phoneme sequence sensor
and designator 490 to which the silence signal 98 also is applied
since its presence will signal grouping of inputs to constitute
syllables. The phoneme sequence sensor and designator 490 contains
a storage of about 337 syllabic combinations of the four classes of
sounds represented by its four channels. Basic to English, these
syllabic combinations are shown in Table 1 (see Appendix). When a
possible syllable has been identified, its speech-element signals
are passed as inputs over conduit 492 to the regrouper and storage
means 494 of preselected actual syllables for that particular
combination of classes of sounds among the 337 syllabic
combinations. Consequently, sequence indication on conduit 496
accompanies passing of the speech-element signals to facilitate the
search in storage means 494. The storage of actual syllables is
combined with a unit for regrouping sequences if there is no match.
The maximum capacity of this unit is 10 phonemes or not more than 1
second, whichever is first. If, in a particular sequence there is
no match with a stored syllable, the sequence indication is altered
accordingly as the last phoneme of the nonviable sequence is
dropped from it and the shorter sequence then tried. If it then
fits, that last phoneme becomes the first one in a new sequence,
then the last two, then the last three, etc. This shortening and
consequent regrouping of phonemes for the syllable involves
identification of a different sequence. Therefore, that information
is passed back to the phoneme sequence sensor and designator 490
through conductor 498 so that it can start identification of a new
sequence with the terminal elements rejected from the previous
sequence. Regrouping of phonemes starts either with that signal on
conductor 498, with a silence input applied on conductor 98, or
upon receipt of a signal from conductor 482 fed to designator 490
that shows completion of a syllable or word ready for printout. The
pattern used for regrouping is shown in Tables 2(a) and 2(b) (see
Appendix).
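The shortening and regrouping procedure may be sketched in Python as
follows, solely by way of illustration. The function name and the
small syllable store in the example are hypothetical and stand in for
the stored syllables of storage means 494. Passing a single unmatched
phoneme onward is likewise an assumption of the sketch.

```python
MAX_PHONEMES = 10   # maximum capacity of the regrouping unit

def regroup(phonemes, stored_syllables, max_len=MAX_PHONEMES):
    """Split a phoneme sequence into stored syllables.

    The longest candidate is tried first; on no match the last phoneme
    is dropped and the shorter sequence tried. Dropped phonemes begin
    the next sequence.
    """
    out = []
    i = 0
    while i < len(phonemes):
        n = min(max_len, len(phonemes) - i)
        while n > 0 and tuple(phonemes[i:i + n]) not in stored_syllables:
            n -= 1                      # drop the last phoneme and retry
        if n == 0:
            out.append((phonemes[i],))  # assumption: pass it on unmatched
            i += 1
        else:
            out.append(tuple(phonemes[i:i + n]))
            i += n                      # remainder starts a new sequence
    return out

# Hypothetical store for the example only.
store = {("k", "a", "t"), ("s", "i", "t")}
print(regroup(["k", "a", "t", "s", "i", "t"], store))
```

In the example the six-phoneme sequence fails to match at lengths six,
five and four, matches at length three, and the three rejected
terminal phonemes then form the second syllable.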
If the actual syllable identified is not one that is used in stored
words of the vocabulary of 12,000 or so words chosen, or if it
consists of only two or three elements (i.e., appears to be a
fragment or remnant), it is passed through conductor 500 to a
storage unit of two and three phoneme units 501 where it is either
matched with a stored short word or is converted to printout
signals in phonetic-phonemic form over conductor 532.
Upon identification of a syllabic sequence unit in designator 490
that could be part of a larger word in the vocabulary storage 516
of multisyllable words, the designation for that particular
syllable is signaled to the storage unit 494 of vocabulary words
through conductor 492 and, simultaneously, the phoneme sequence
units together are passed to a syllable retainer 510 through
conductor 512. The syllable retainer 510 has a storage capacity of
up to nine syllables which it holds either (1) until their release
as parts of a word that matches one in the vocabulary storage 516
through conductor 540, or (2) until the retainer 510 is saturated,
or (3) until there is a spacing input to indicate beginning of a
new verbal unit, as indicated on conductor 482. These latter two
releases through conductors 500 and 520 pass to the storage of two
and three phoneme units for short printouts. In the storage of two
and three phoneme units 501, sequences that do not match about
1,600 stored short words that will be printed conventionally over
conduit 532 will be printed in phonetic-phonemic manner, with
stress shown by uppercase or bold printing in response to a stress
signal on conductor 92 coming from FIG. 2.
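The three release conditions of the syllable retainer 510 may be
sketched as follows, for illustration only. The class, its method
names and the route labels are assumptions. In particular, releasing
on the first vocabulary match is a simplification, since the text does
not say how a word that is also the beginning of a longer word is
handled.

```python
class SyllableRetainer:
    """Hypothetical model of syllable retainer 510."""
    CAPACITY = 9    # holds up to nine syllables

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary   # set of syllable tuples (storage 516)
        self.held = []

    def _release(self, route):
        out = (route, tuple(self.held))
        self.held = []
        return out

    def push(self, syllable):
        """Add a syllable; return a release, or None if still holding."""
        self.held.append(syllable)
        if tuple(self.held) in self.vocabulary:
            return self._release("word")     # (1) matches a stored word
        if len(self.held) >= self.CAPACITY:
            return self._release("short")    # (2) retainer saturated
        return None

    def spacing(self):
        """(3) Spacing input: release whatever is held to short storage."""
        return self._release("short") if self.held else None

r = SyllableRetainer({("ty", "pist")})
print(r.push("ty"), r.push("pist"))
```

Releases labeled "short" in this sketch correspond to those passed to
the storage of two and three phoneme units 501.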
An additional input to the storage of two and three phoneme units
501 is provision for supplying a period or dot whenever the silence
input reaches 1 second cumulatively. This is accomplished by a
timer 522 of one second operating on the silence signal 98 which
will automatically signal what appear to be sentence endings or
long pauses on line 524 to the storage of two and three phoneme
units 501. This same storage 501 will also convert the spoken names
of letters of the alphabet into the corresponding capital letters.
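
The behavior of the one-second timer 522 can be sketched in Python as
below. The sketch is discrete-time and hypothetical throughout (class
name, `tick` method, sampling interval); it also assumes that the
accumulated silence is cleared when speech resumes and that only one
period is signaled per pause, neither of which the text states
explicitly.

```python
class SilenceTimer:
    """Signals a period once accumulated silence reaches one second."""

    THRESHOLD = 1.0  # seconds of silence, per timer 522

    def __init__(self):
        self.accumulated = 0.0
        self.fired = False

    def tick(self, silent, dt):
        """Advance by dt seconds; return True when a period is due."""
        if not silent:
            # Assumption: any speech clears the accumulated silence.
            self.accumulated = 0.0
            self.fired = False
            return False
        self.accumulated += dt
        if self.accumulated >= self.THRESHOLD and not self.fired:
            self.fired = True  # assumption: one period per pause
            return True
        return False
```

Feeding the timer quarter-second silent intervals would signal the
period on the fourth interval and stay quiet for the rest of the pause.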
The storage of word vocabulary 516 contains about 10,600 words,
each stored as the proper sequence of syllable designations drawn
from the 337 used in this system. For each such stored
word, there is a corresponding coded printout signal which is
activated through the output 530 to the printing unit 26, whenever
there is an input of syllables that matches a stored word. In the
case of punctuation, the coding translates from the verbal name to
the punctuational printout designation. Upon release of a printout
signal for any word, the storage of word vocabulary unit 516 emits
a spacing signal through conductor 482 which goes both to the
syllable retainer 510 and to the phoneme sequence sensor and
designator 490 as a signal for start of a new verbal unit. This
spacing signal, of course, likewise goes to the typographic or
printing unit 26 at the end of each word. Since printout signals
will not emanate simultaneously from both the word vocabulary
storage 516 and the storage of two and three phoneme units 501,
their outputs 530 and 532 pass through the same connector 484 to
the printing unit 26.
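
As a rough Python sketch of this lookup, the following assumes
integer syllable designations, a two-entry sample vocabulary, and the
hypothetical names `vocabulary_printout` and `shared_connector`; none
of these come from the specification. It shows a matched word being
followed by a spacing signal, a spoken punctuation name being
translated to its mark, and the two storages sharing one output path
because they never emit at the same time.

```python
SPACE = "<space>"  # stands in for the spacing signal on conductor 482

# Hypothetical sample entries keyed by syllable-designation sequences.
WORD_VOCABULARY = {
    (101, 202): "writer",
    (33, 7): ",",          # spoken "comma" mapped to its printed mark
}


def vocabulary_printout(syllables):
    """Return the printout signal plus a spacing signal on a match,
    or an empty list when no stored word matches."""
    entry = WORD_VOCABULARY.get(tuple(syllables))
    if entry is None:
        return []
    return [entry, SPACE]


def shared_connector(word_signals, short_signals):
    """Merge the two storage outputs onto one path to the printer.

    The two storages are not expected to emit simultaneously, so a
    simultaneous emission is treated here as an error.
    """
    if word_signals and short_signals:
        raise ValueError("both storages emitted at once")
    return word_signals or short_signals
```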
Table 3 (see Appendix) shows 88 printout characters and phonemic
printout symbols (with phonemic equivalents in parentheses) for use
in the printing unit of FIG. 1. The 88 symbols are those which will
be activated by 88 distinctive signals through connector 484. The
89th signal is for spacing which is applied over conductor 482.
Additional embodiments of the invention in this specification will
occur to others and therefore it is intended that the true spirit
of the invention be limited only by the appended claims and not by
the embodiment described hereinabove. Accordingly, reference should
be made to the following claims in determining the true spirit of
the invention.