U.S. patent number 3,723,667 [Application Number 05/214,615] was granted by the patent office on 1973-03-27 for apparatus for speech compression.
This patent grant is currently assigned to PKM Corporation. The invention is credited to William C. Mortimore and John H. Park, Jr.
United States Patent 3,723,667
Park, Jr., et al.
March 27, 1973
APPARATUS FOR SPEECH COMPRESSION
Abstract
Means for recording and selectively deleting portions of normal
speech sound which includes a recorder for receiving and recording
speech signals from an input, with a drive means being provided for
the recorder, and with a power supply being provided for the drive
means. A speech detector is coupled to the power supply for the
drive means and is arranged to energize the drive means only in
response to the presence of a speech signal in the input. A vowel
detector is provided and is coupled to the drive means power supply
for detecting the initiation and continuing presence of vowel
sounds in speech signals. The vowel detector is adapted to
regularly and periodically interrupt the drive means power supply
for certain predetermined time intervals in response to the
initiation and continued presence of vowel sounds in the input.
Inventors: Park, Jr.; John H. (St. Paul, MN), Mortimore; William C. (Minneapolis, MN)
Assignee: PKM Corporation (St. Paul, MN)
Family ID: 22799773
Appl. No.: 05/214,615
Filed: January 3, 1972
Current U.S. Class: 369/47.55; G9B/27.026; G9B/27.009; G9B/20.001; G9B/15.021; 704/E21.017; 360/8; 704/254; 369/60.01
Current CPC Class: G10L 21/04 (20130101); G11B 27/22 (20130101); G11B 27/029 (20130101); G11B 15/18 (20130101); G11B 20/00007 (20130101)
Current International Class: G10L 21/04 (20060101); G11B 27/029 (20060101); G10L 21/00 (20060101); G11B 20/00 (20060101); G11B 27/22 (20060101); G11B 27/022 (20060101); G11B 27/19 (20060101); G11B 15/18 (20060101); G11b 019/20 (); H04b 001/66 ()
Field of Search: 179/1.1VC,1VC,1SA,15.55R,15.55T
References Cited
U.S. Patent Documents
Primary Examiner: Cardillo, Jr.; Raymond F.
Claims
We claim:
1. Means for recording and selectively deleting portions of normal
speech sound comprising:
a. input means, recording means for receiving and recording speech
signals from said input means, drive means for said recording
means, and a power supply delivering energy to said drive
means;
b. speech detector means coupled to said drive means power supply
for detecting the presence of a speech signal in said input means
and for energizing said drive means power supply only in response
to the presence of a speech signal therein;
c. vowel detector means coupled to said drive means power supply
for detecting the initiation and continuing presence of vowel
sounds in speech signals in said input means, said vowel detector
means being adapted to regularly and periodically interrupt said
drive means power supply for certain predetermined time intervals
in response to the initiation and continued presence of vowel
sounds in said input, with means being provided for periodically
chopping said power supply into a plurality of substantially
regularly spaced apart power pulses having predetermined time
duration, with said periodic chopping of drive means power supply
commencing after a certain predetermined time interval following
initial detection of vowel presence and continuing during the
presence of vowel sounds in said input.
2. The speech compression means as defined in claim 1 being
particularly characterized in that filter means are provided in the
speech input for passing signals of between about 250 Hz and 6,000
Hz.
3. The speech compression means as defined in claim 1 being
particularly characterized in that said periodic chopping of drive
means power supply provides for power pulses of about 60
milliseconds followed by an idle period of about 30
milliseconds.
4. The speech compression means as defined in claim 1 being
particularly characterized in that said recording means has a start
up time capability of less than about 10 milliseconds.
5. The speech compression means as defined in claim 1 being
particularly characterized in that said speech detector means
continues to energize said drive means power supply for a
predetermined period of time greater than approximately 10
milliseconds following the termination of each speech signal.
6. The recording means as defined in claim 1 being particularly
characterized in that filter means are provided for speech
detection, the filter being adapted to pass signals of modest
amplitude at frequencies less than about 1,000 Hz, with the
amplitude increasing substantially uniformly until an input
frequency of about 8,000 Hz is reached.
7. The recording means as defined in claim 6 being particularly
characterized in that said increase is at a level of about 24
db./octave at frequencies from between 1,000 Hz and 8,000 Hz.
8. The recording means as defined in claim 1 being particularly
characterized in that vowel detector means are provided in the
speech input for passing signals having a frequency of between
about 250 Hz and 1,200 Hz.
9. The speech compression means as defined in claim 1 being
particularly characterized in that control means are provided for
controllably adjusting the extent of compression.
10. Means for recording and selectively modifying portions of
normal speech sound comprising:
a. input means, recording means for receiving and recording speech
signals from said input means, drive means for said recording
means, and a power supply delivering energy to said drive
means;
b. speech detector means coupled to said drive means power supply
for detecting the presence of a speech signal in said input means
and for energizing said drive means power supply only in response
to the presence of a speech signal therein;
c. vowel detector means coupled to said drive means power supply
for detecting the initiation and continuing presence of vowel
sounds in speech signals in said input means, said vowel detector
means being adapted to regularly and periodically interrupt said
drive means power supply for certain predetermined time intervals
in response to the initiation and continued presence of vowel
sounds in said input, with means being provided for periodically
chopping said power supply into a plurality of substantially
regularly spaced apart power pulses having predetermined time
duration, with said periodic chopping of drive means power supply
commencing after a certain predetermined time interval following
initial detection of vowel presence and continuing during the
presence of vowel sounds in said input; and
d. means for selectively continuing the energization of said drive
means for predetermined periods of time upon detection of
termination of the presence of a speech signal in said speech
detector means.
11. The speech compression means as defined in claim 10 being
particularly characterized in that said recording means includes
first and second serially coupled recording means with drive means
for each of said recording means, means for continuing the
energization of said second recording means upon each occurrence of
the termination of the presence of a speech signal in said first
recording means.
Description
BACKGROUND OF THE INVENTION
The present invention relates generally to a means for recording
and compressing speech sound, and more particularly to a method and
apparatus for recording and selectively deleting pauses as well as
certain portions of normal speech sound from the recording. It has
been found that controlled and selective deletion of certain
portions of normal speech renders the recorded message highly
intelligible, even when compressed to a time of less than one-half
of the actual speech.
Studies have indicated that the normal human ear and brain are
rarely, if ever, overtaxed when listening to human speech at normal
rates. Furthermore, studies have indicated that a normal listener
is able to understand and comprehend speech even when delivered at
a rate at least 3 times as rapid as natural speech. Accordingly, in
recording lectures, business memoranda, or the like, much time can
be saved by compressing the speech in terms of time, without
deleting any significant portions of the spoken words, or
detracting from the intelligibility.
In the past, speech compression has been accomplished by means of
systematic or periodic deletion of certain portions of the spoken
message. Such a device is described in an article by Fairbanks, et
al., "Method for Time or Frequency Compression-Expansion of
Speech," Transactions of the I.R.E., PG on Audio, Vol. AU-2, No. 1,
January-February, 1954, pages 7-11, and achieves a time compression
of the speech input by periodically discarding a fixed segment of
the input and bringing the ends of the retained input together to
make a continuous, time-shortened signal. If the length of the
retained segment is sufficiently long with respect to the
fundamental pitch period of the voice, then the voice will retain
most of its natural quality. The length of deleted segment must be
sufficiently long with respect to the retained segment so as to
effect the desired or required time compression, but not so long so
as to obscure the important transitional elements or consonants in
speech which are normally of short duration. Inasmuch as the
technique or practice of bringing the ends of the retained segments
together results in an apparent low-range frequency of the voice,
the input medium must either be played in at a faster than normal
rate or, in the alternative, the output must be arranged to be played
back after processing at an increased rate. The device described by
Fairbanks et al. attains the necessary frequency shifting by
utilizing a rotating head assembly.
Other devices utilizing similar techniques may employ tapped delay
lines in which the input is provided from taps which are being
sampled at a suitable rate to receive the desired shift and bring
the ends of the retained segments together.
Those speech compression devices which utilize systematic or
periodic deletion of input suffer from a number of disadvantages.
For example, those mechanical devices which utilize rotating head
assemblies require careful adjustment and maintenance, and are
considered complex and expensive. Mechanical delay lines, which
have been utilized in the past, are sensitive to mechanical shock.
Electronic type delay lines have also been utilized. Furthermore,
the extent of time compression which can be derived from systematic
deletion is limited to no less than about 60 percent of the
original time, since if additional compression is undertaken, the
portion retained is such that many of the transitional elements of
the sound are either blurred or deleted, thereby reducing
intelligibility.
The time compression which is obtained from systematic deletion is
frequently unnatural when compared with the normal human production
of rapid speech. Studies have shown that the normal speaker, when
attempting to speak more rapidly, will initially shorten the pauses
between phonemes by bringing spoken sounds up more closely together
without shortening the spoken sounds proportionally. It has been
further found that the shortening that does occur when the speaker
is attempting to speak at a more rapid rate takes place in the
voiced or vowel-like sounds. It is believed that the transitional
elements, particularly unvoiced consonants, cannot be appreciably
shortened in duration since manipulation of the vocal apparatus is
more intricate and involved for these sounds than for the longer
vowel sounds. Accordingly, rapid human speech is characterized by
shortened or minimal pauses along with shortened vowel-like sounds
in the speech. To remain reasonably intelligible, transitional
elements including the unvoiced consonants are shortened only very
slightly, if at all.
It follows, therefore, that there is no reasonable relationship
between the normal or natural reactions of a speaker attempting to
speak at a more rapid rate, and the technique of systematic
deletion. It is appreciated, of course, that systematic deletion
produces a result in which the pauses in the speech appear to be
unnaturally long, and the consonants unnaturally short, the
combined effect of which renders the compressed speech somewhat
unintelligible.
SUMMARY OF THE INVENTION
In order to carry out the method of time compression in accordance
with the present invention, a tape recording device is employed
utilizing the speech input from a microphone, phonograph, tape
recorder, or other similar structures which function in real-time,
with a time-compressed reproduction being produced which may be
played back on any standard playing apparatus. Essentially, the
structure includes a recording means for receiving and recording
speech signals from an input, with drive means being provided for
the recording means, and with a power supply being coupled to the
drive means. Selective deletion of portions of the speech sound is
accomplished by a substantial elimination of pauses, as well as a
means for eliminating periodic portions of vowel sounds. The
apparatus of the present invention permits compression of speech to
be undertaken to a substantial degree, with intelligible results
being obtained with a compression providing a resultant play-back
time of less than about 30 percent of the original speech time.
In addition to compression of speech, the apparatus of the present
invention provides a means for expanding recorded speech as well.
Prior techniques included the use of a slow play-back with a
resulting frequency shift, but changes in pitch make the speech
unintelligible if slow rates are employed. While systematic
repetition of short segments of recorded speech may be utilized to
preserve the pitch, the character of such a recording is diminished
because of apparent breaks in the speech provided at arbitrary
points. The apparatus of the present invention may function by
selectively inserting additional pauses where pauses will normally
occur, and thus allow play-back at the recorded time or greater,
resulting in minimal, if any, loss in intelligibility.
Therefore, it is a primary object of the present invention to
provide an improved speech compression apparatus which functions on
the elimination or drastic shortening of pauses, coupled with the
deletion of certain portions of vowel or vowel-like sounds.
It is a further object of the present invention to provide an
improved apparatus for modifying speech timing including selective
speech compression and expansion which is simple in construction,
rugged, and relatively inexpensive.
It is yet a further object of the present invention to provide an
improved speech compression apparatus which functions on the basis
of shortening or eliminating pauses, and deleting certain portions
of vowels or vowel-like sounds, this speech compression being
accomplished with very little loss of intelligibility.
Other and further objects of the present invention will become
apparent to those skilled in the art upon a study of the following
specification, appended claims, and accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating the fundamental components
utilized in a speech compressor apparatus prepared pursuant to the
present invention;
FIG. 2 is a characteristic plot of frequency versus relative
amplitude for the pre-filter structure utilized in connection with
the present invention;
FIG. 3 is a plot of the frequency versus the relative amplitude for
the spectrum shaping apparatus of the speech detector portion of
the apparatus;
FIG. 4 is a plot of the frequency versus relative amplitude for the
spectrum shaping for the vowel detector;
FIG. 5 is a schematic diagram of a speech detector system which may
be employed in the apparatus utilized to practice the present
invention, and capable of delivering a response curve similar to
that shown in FIG. 3;
FIG. 6 is a schematic diagram of the vowel detector which may be
utilized to achieve the resultant curve shown in FIG. 4;
FIG. 7 is a typical timing diagram showing how speech compression
is achieved through a combination of pause deletion and vowel
shortening;
FIG. 8 is a block diagram of a speech expansion structure which may
be utilized in connection with the apparatus of the present
invention;
FIG. 9 is a timing diagram showing how speech expansion is attained
through the expander system shown in FIG. 8;
FIG. 10 is a schematic diagram of a vowel chopper which may be
employed in connection with the present invention;
FIG. 11 is a schematic diagram illustrating the pause indicator
which may be utilized in connection with the apparatus of the
present invention;
FIG. 12 is a compression (or expansion) meter which may be utilized
in connection with the apparatus of the present invention, and
particularly for achieving an adjustable compression (or expansion)
with a visual indication of the extent of compression; and
FIG. 13 is a schematic diagram of a portion of the speech expander
concept illustrated in FIGS. 8 and 9.
DESCRIPTION OF THE PREFERRED EMBODIMENT
Attention is now directed to FIG. 1 of the drawings wherein the
speech compressor apparatus fabricated pursuant to the present
invention is illustrated in block diagram form. The system includes
an input 20 which delivers a speech signal to a preamplifier 21.
The preamplified signal then passes to the pre-filter 22, and
thence to a vowel detector system 23 and a speech detector 24. The
speech detector is, in turn, coupled to tape transport 25, so as to
interrupt flow of power to the tape transport upon occurrence of a
pause in the speech. The output of vowel detector 23 is delivered
to vowel chopper 26, and ultimately to tape transport 25 where the
power supply for the tape transport is controllably regulated by
vowel chopper 26.
As is indicated in the drawing of FIG. 1, the minimum pause to be
retained may be adjustably pre-set in the speech detector. Also,
the amount of vowel compression may be adjustably set in vowel
chopper 26. A pause indicator, either visual or audible, such as is
illustrated in FIG. 1 at 27 and 28 may also be employed if desired.
Also, a visual indication of the compression occurring in the
speech signal is provided as indicated at 29.
With continued attention being directed to FIG. 1 of the drawings,
the record electronic section 30 represents a bias oscillator,
record amplifier and record driver. The purpose of this portion of
the system is to supply the appropriate electrical signal to the
record and erase heads of a tape recorder, when a tape recorder is
being employed. Such electronic systems are well known in the
industry and are commercially available. The tape transport 25 is
indicated as having a fast start/stop capability. This transport
incorporates a read/write head, erase head, as well as drive means
for moving the tape across the heads. In addition, a power supply
is provided for the drive means, with the power supply being
actuated electrically for starting and stopping the tape. For a
structure to be fully compatible with the various objects and
methods utilized in the present invention, the tape start-up time
from full stop to full speed should be no greater than about 40
milliseconds for pause shortening operations, and no greater than
about 20 milliseconds for vowel shortening. Start-up times of about
30 milliseconds and 10 milliseconds respectively are preferred.
Furthermore, the stop time from full speed to full stop must be
substantially the same. Tape transports having such start/stop
capabilities are commercially available, and are widely used in the
electronic data processing industry.
As can be appreciated, an important feature of the present
invention is the generation of a control signal for the power
supply to control the drive means for the recording mechanisms. As
indicated, this signal is based on pause elimination and vowel
shortening.
As is indicated in FIG. 1, speech signals are recorded by way of
the tape transport whenever the control signal is on. Such a signal
exists whenever an appropriate voltage or current level is
available to place the transport in the operational mode. When a
speech signal is not present, the control signal will not be
present and the transport will not be moving the tape. When a
speech signal is detected and it is not a vowel sound, then the
transport is operative and tape is being carried across the record
head. When a speech signal is present and it is ascertained that it
is a vowel sound, then a first predetermined portion of the sound
is recorded, and thereafter the sound is recorded on a periodic,
cyclic, or "chopped" basis. For example, a vowel sound is recorded
for the first t.sub.1 seconds, and for the next t.sub.2 seconds,
the sound is not being recorded. Thereafter, if the vowel sound is
continuing, the next t.sub.1 seconds are recorded, followed by a
period of t.sub.2 seconds of no recording. This cycle continues
until the speech sound becomes a non-vowel in which case it is
fully recorded, or, in the alternative, until the speech signal is
no longer present in which case the power supply is interrupted and
the transport stops.
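The recording rule just described may be sketched in a few lines of Python. This is only an idealized illustration and not the circuit of the invention: the function name `control_signal` and its parameters are hypothetical, the default values of 60 and 30 milliseconds for t.sub.1 and t.sub.2 are the example values given later in this description, and transport start and stop times are ignored.

```python
def control_signal(speech_present, vowel_present, vowel_elapsed_ms,
                   t1_ms=60.0, t2_ms=30.0):
    """Return True when the transport should run (record), False otherwise.

    vowel_elapsed_ms is the time since the current vowel sound was first
    detected. All names and defaults here are illustrative assumptions.
    """
    if not speech_present:
        return False   # pause: power interrupted, transport stopped
    if not vowel_present:
        return True    # non-vowel speech is recorded in full
    # Vowel: record for t1, skip for t2, repeating while the vowel continues.
    phase = vowel_elapsed_ms % (t1_ms + t2_ms)
    return phase < t1_ms
```

Under these assumptions, a vowel that has lasted 70 milliseconds falls in the non-recording interval, while one that has lasted 95 milliseconds has entered the next recording interval.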
With continued attention being directed to FIG. 1, the input
signal is derived from a microphone, tape head, phonograph, radio or
other transducer providing an electrical signal representing the
speech sound. This signal is initially amplified in the
preamplifier 21 to bring it up to standard levels, such as, for
example, a peak at 0 VU at the record head. In order to reduce
noise and other unwanted signals which have a frequency spectrum
falling outside of the voice spectrum, the signal is preferably
filtered. It has been found that the filter utilized should have
the characteristics shown in the diagram of FIG. 2, with
frequencies below about 250 Hz being reduced to eliminate hum and
rumble, and to insure that the envelope detector does not follow
the natural pitch-period resonance of certain speakers.
Furthermore, frequencies substantially above approximately 6,000 Hz
are reduced or eliminated in order to minimize the effects of hiss
and background room noise. This filtered signal is then passed into
the vowel detector and speech detector, as indicated.
Attention is now directed to FIG. 5 of the drawings wherein a
typical speech detector system is illustrated. This detector
includes components for accomplishing three basic functions:
spectrum shaping, envelope detection, and threshold detection.
Spectrum shaping is necessary in order that low energy speech
sounds, which are necessary for good intelligibility, are weighted
the same as the high energy vowel sounds. The weighting shown in FIG. 3 of the
drawings has been found to provide a nearly flat spectrum at the
output of the spectrum shaping circuit for most speakers. After
spectrum shaping, the resulting signal is detected as indicated.
Capacitor 35 charges rapidly when speech energy is present, and
when the voltage reaches a threshold (about 2 volts for the circuit
shown), the output signal goes to a logical level indicating speech
being present. When a pause occurs, transistor 36 is turned off,
and the charge on capacitor 35 discharges through variable resistor
37. When the voltage falls below a second threshold, in this case
about 0.7 volts, the output signal immediately drops to a level
indicating speech being absent. The time required
to reach this threshold determines the length of the pause that is
retained, and accordingly adjustment of variable resistor 37 may be
utilized to control this time. In the circuit illustrated in FIG.
5, it is simple to utilize times as short as 10 milliseconds or
less, or times as long as tens of seconds or even longer.
When a signal is again present, capacitor 35 charges and an output
is indicated.
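The two-threshold behavior described above may be modeled, under stated assumptions, as a simple hysteresis detector. The sketch below is not the circuit of FIG. 5. It assumes that capacitor 35 charges instantaneously to a hypothetical peak voltage `v_peak` (5 volts is an arbitrary choice, not a value from this description) and that it discharges exponentially with a single time constant `rc_ms` set by variable resistor 37. Only the 2 volt and 0.7 volt thresholds come from the text.

```python
import math

def hold_time_ms(rc_ms, v_peak, v_off=0.7):
    """Time for an assumed exponential RC discharge to fall from v_peak to v_off."""
    return rc_ms * math.log(v_peak / v_off)

def detect_speech(energy_present, dt_ms, rc_ms, v_peak=5.0, v_on=2.0, v_off=0.7):
    """Simulate the detector output over boolean samples spaced dt_ms apart."""
    v = 0.0
    out = False
    result = []
    decay = math.exp(-dt_ms / rc_ms)
    for active in energy_present:
        if active:
            v = v_peak      # assumption: charging is effectively instantaneous
        else:
            v *= decay      # assumption: discharge through resistor 37 is exponential
        if not out and v >= v_on:
            out = True      # upper threshold reached: speech present
        elif out and v < v_off:
            out = False     # lower threshold crossed: speech absent
        result.append(out)
    return result
```

In this model the retained pause is proportional to the time constant, which is consistent with the statement that adjusting variable resistor 37 controls the length of pause retained.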
Attention is now directed to FIG. 6 of the drawings wherein the
schematic illustration of the vowel detector is shown. It is known
that vowel sounds have their primary energy (first formants)
between about 250 and 800 Hz. Most consonants have their primary
energy in frequencies above approximately 1,000 Hz. Accordingly,
voice signals are filtered by the vowel spectrum selector, the
circuit shown in FIG. 6. This filter has the characteristic as is
indicated in FIG. 4, and provides energy in the area of between
about 250 Hz and 800 Hz. The output of this filter will provide
consonant sounds having voltage levels that are 30 db or lower in
intensity than vowel sounds. The envelope detector and threshold
device operate similarly to the speech detector discussed above,
with one important difference being that when a vowel sound ends,
the circuit is designed so that the no-vowel level appears at the
output within less than about 20 milliseconds delay. It is, of
course, necessary to retain a portion of the vowel sound, hence the
output of the vowel detector goes to the vowel chopper shown in
FIG. 10. The purpose of this circuit of FIG. 10 is to produce an
output level for the power supply to the drive means for a period
of t.sub.1 seconds, and interrupt this power for the next
succeeding t.sub.2 seconds alternately as is illustrated in FIG. 7
until the vowel sounds terminate. When the vowel sounds terminate,
the output again returns to a level indicating no vowels present.
This function insures that consonants occurring immediately after a
vowel sound are not lost. The system illustrated in FIG. 10
consists of two one-shot multivibrators and several logic gates.
The time constant R.sub.1 C.sub.1 in the first one-shot
multivibrator determines the time period for t.sub.2, and the time
constant R.sub.2 C.sub.2 in the second one-shot multivibrator
determines the time period t.sub.1. The percent of the vowel sound
that is deleted is, of course, equal to 100 × t.sub.2 /(t.sub.1 +
t.sub.2). The time t.sub.1 should be chosen to contain
at least several cycles of the lowest resonant voice sound
anticipated for the device, this frequency typically being on the
order of 100 Hz, and accordingly having a period of 10
milliseconds. Hence t.sub.1 should be at least about 30
milliseconds. On the other hand, t.sub.1 should be smaller than the
shortest vowel sounds so some shortening will, in fact, occur. In
general, vowel sounds are seldom shorter than about 80 milliseconds
for most speakers. Thus, the time t.sub.2 is selected in
conjunction with t.sub.1 in order to obtain the desired vowel
shortening. With t.sub.1 equal to 60 milliseconds, and t.sub.2
equal to 30 milliseconds, good voice quality is readily maintained.
Increased shortening may be obtained by increasing t.sub.2 or
decreasing t.sub.1 within the bounds discussed hereinabove. The
input to the vowel detector is combined with the resulting chopping
wave in a NAND circuit as shown so that the output of the circuit
will be on when a vowel sound is absent.
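The arithmetic of the chopping cycle may be illustrated as follows. The function names are hypothetical, and the model is idealized in that it assumes recording begins at vowel onset and ignores the start and stop times of the transport.

```python
def vowel_deleted_percent(t1_ms, t2_ms):
    """Percent of a long vowel removed by a t1-on / t2-off chopping cycle."""
    return 100.0 * t2_ms / (t1_ms + t2_ms)

def recorded_vowel_ms(vowel_ms, t1_ms=60.0, t2_ms=30.0):
    """Length of vowel actually recorded under ideal chopping."""
    period = t1_ms + t2_ms
    full_cycles, remainder = divmod(vowel_ms, period)
    # Each full cycle keeps t1; a partial cycle keeps at most t1.
    return full_cycles * t1_ms + min(remainder, t1_ms)
```

With t.sub.1 equal to 60 milliseconds and t.sub.2 equal to 30 milliseconds, about 33 percent of a sustained vowel is deleted. In this model a 200 millisecond vowel is recorded as 140 milliseconds, whereas a vowel shorter than t.sub.1 is not shortened at all, which is why t.sub.1 should be smaller than the shortest vowel sounds.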
With attention again being directed to FIG. 1, it is noted that the
output of the vowel chopper and the speech detector are arranged in
AND configuration to form the control signal to the drive means
power supply. This is illustrated in FIG. 7. Thus, the control
signal is off whenever speech is absent or during the time t.sub.2
when vowels are present in the speech signal. This control signal
activates the tape recorder so that the signal derived from the
vowel detector and its chopper element, together with the speech
detector, are utilized to activate the recorder and interrupt or
stop the recorder as appropriate. It will be appreciated that any
style of recorder may be utilized, including magnetic tape, wire,
disc, or the like, the primary requirement being that it have a
capability of starting and stopping rapidly, as indicated
hereinabove. The signal level to the system is set at the
preamplifier 21, as indicated, so that the recorder peaks are
approximately at 0 VU, as a standard practice. The preamplifier is,
of course, a standard type structure which is commercially
available. The level set into the controller determines the signal
levels that will activate the speech and vowel detectors. Thus,
when the background noise is of low volume (40 db below 0 VU), this
level can be set so that signals as low as 30 db below 0 VU
activate the speech and vowel detectors. When the background noise
increases so as to achieve a level of about 20 db below 0 VU, this
level must be set so that such noise does not trigger the speech
and vowel detectors, for example, the arrangement being such that
only signals at 15 db below 0 VU or greater will trigger the speech
and vowel detectors.
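The AND configuration of the speech detector and vowel chopper outputs may be illustrated by stepping through a labeled timeline at one millisecond resolution. The sketch below is a simplified model only: the segment labels, the function name `compress_timeline`, and the sample durations are hypothetical, and the retained minimum pause and the transport start and stop times are ignored.

```python
def compress_timeline(segments, t1_ms=60, t2_ms=30):
    """Return (original_ms, recorded_ms) for a list of (kind, duration_ms).

    kind is one of 'pause', 'consonant', or 'vowel' (illustrative labels).
    """
    original = recorded = 0
    for kind, duration in segments:
        for elapsed in range(duration):
            speech_on = kind != "pause"   # speech detector output
            # Vowel chopper output: on for t1, off for t2 during vowels.
            chopper_on = kind != "vowel" or (elapsed % (t1_ms + t2_ms)) < t1_ms
            if speech_on and chopper_on:  # AND configuration
                recorded += 1
            original += 1
    return original, recorded

segments = [("consonant", 50), ("vowel", 200), ("pause", 300),
            ("consonant", 40), ("vowel", 90)]
original_ms, recorded_ms = compress_timeline(segments)
```

For this hypothetical utterance, 680 milliseconds of input yield 290 milliseconds of recorded tape, most of the saving arising from the deleted pause.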
In order to facilitate the setting of the level control and the
pause length control, it is, of course, desirable to have visual
and audible signals to indicate times when the speech detector
output is off. A technique to accomplish such an arrangement is
illustrated in FIG. 11. As can be seen from the schematic of FIG.
11, the light driver is activated to light a lamp when the speech
indicator is off. Also, an audible tone may be generated utilizing
the oscillator as illustrated. It will be appreciated that any form
of oscillator will suffice for generating an audible tone. When no
signal is present at the output of the speech detector, the
oscillator will be activated so as to generate the audible tone at
this time. The resulting tone is available by way of a speaker or
head phone to the operator. This arrangement is illustrated in FIG.
1, wherein this indicated function is added to the incoming voice
signal, and hence played through the monitor speaker or head phone.
At this time, the operator may simultaneously monitor what is being
recorded and the indication of what portions are to be deleted due
to the function of the speech detector and its effect on the power
supply for the drive means.
Another feature of the present invention is the use of the
structure for speech expansion. FIG. 9 illustrates a timing diagram
showing the method of approach for speech expansion. The speech
signal which is being played from a recorded medium is monitored
utilizing the speech detector, and when speech is absent, as
indicated by the detector, a control signal is generated which
stops the play-back of the recorded signal for a period of time
t.sub.3, whereupon play-back resumes. The play-back continues until
the speech detector goes from a speech indication level to a speech
absence level, whereupon the process is repeated. One way of
realizing this method of speech expansion pursuant to the present
invention is shown in the block diagram of FIG. 8. In this
embodiment, a tape transport is in the play-back mode and the
signal to be expanded is recorded on a magnetic tape. The tape head
picks up the recorded speech signal and on the one hand it is
passed through the usual play-back electronics, and presented to
the listener by way of a speaker or head phone. On the other hand,
it is also played into the speech detector described in detail
hereinabove, whereupon the output of the speech detector is on when
speech is present, and off when speech is absent. When this output
signal ceases, a one-shot multivibrator is triggered which produces
a control signal. Normally the output of this one-shot
multivibrator indicates that the transport is in the operational
mode. When the speech detector output goes from an indication of
speech to no-speech, the one-shot multivibrator is triggered and
the control signal is lost, with the transport stopping for a
period of t.sub.3 seconds, after which the transport resumes normal
play-back until the speech detector output again falls to a
no-speech level, whereupon the process is repeated.
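The effect of the expansion cycle on play-back time may be estimated with a short calculation. The sketch below is an idealized model with hypothetical names: it assumes the recording has already been divided into alternating speech and silence segments, and it ignores the delay of the speech detector and the start and stop times of the transport.

```python
def expanded_duration_ms(segments, t3_ms):
    """Total play-back time for a list of (is_speech, duration_ms) segments."""
    total = 0
    previous_was_speech = False
    for is_speech, duration in segments:
        if previous_was_speech and not is_speech:
            # The one-shot fires on each speech-to-no-speech transition,
            # halting the transport for t3 before play-back resumes.
            total += t3_ms
        total += duration
        previous_was_speech = is_speech
    return total
```

For example, a hypothetical recording of 1,050 milliseconds containing two speech-to-silence transitions plays back in 1,450 milliseconds when t.sub.3 is 200 milliseconds.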
A possible method of generating the interval of time of t.sub.3
seconds is shown in FIG. 13. Two methods are provided for adjusting
the amount of expansion. The first of these is by changing the time
constant R.sub.1 C.sub.1 in FIG. 13, thus changing the time
t.sub.3. It is appreciated that with this circuit, one is able to
vary t.sub.3 from as low as about 20 milliseconds to as long as
several seconds or more. Of course, the longer t.sub.3, the more
the speech is expanded. The second method of varying the amount of
expansion is simply by adjusting the minimum pause before the
speech detector indicates a condition of no speech. This is
accomplished by adjusting R.sub.1 C.sub.1 of FIG. 5. If this time
constant is sufficiently long, then short pauses will not be
detected and hence not expanded, and the amount of expansion will
be decreased. If even the very shortest pauses are to be detected,
R.sub.1 C.sub.1 of FIG. 5 will be very small, and in this case
there will be a greater amount of expansion.
As has been indicated, the drive means and power supply for the
recording means are standard and conventional in the art.
Obviously, battery or AC driven units may be employed. The pause
elimination and vowel shortening occurs by means of controlling the
current flow from the power supply to the drive means.
* * * * *