U.S. patent number 3,644,674 [Application Number 04/837,699] was granted by the patent office on 1972-02-22 for ambient noise suppressor.
This patent grant is currently assigned to Bell Telephone Laboratories, Incorporated. Invention is credited to Olga M. M. Mitchell, Carolyn Ross, Robert L. Wallace, Jr.
United States Patent |
3,644,674 |
Mitchell , et al. |
February 22, 1972 |
AMBIENT NOISE SUPPRESSOR
Abstract
Signals from a desired source, such as a person speaking, are
enhanced relative to unwanted ambient sound by a speech processor
that includes an array of microphones arranged at equal distances
from the desired source. The unwanted sound, being "off-center,"
arrives nonconcurrently at the individual microphones. The
processor continuously arranges the instantaneous microphone
outputs in order of their relative energy contained, and selects as
its output some one of the microphone outputs that is intermediate
in the instantaneous ranking.
Inventors: |
Mitchell; Olga M. M. (Summit,
NJ), Ross; Carolyn (Berkeley Heights, NJ), Wallace, Jr.;
Robert L. (Warren Township, Somerset County, NJ) |
Assignee: |
Bell Telephone Laboratories,
Incorporated (Murray Hill, NJ)
|
Family
ID: |
25275171 |
Appl.
No.: |
04/837,699 |
Filed: |
June 30, 1969 |
Current U.S.
Class: |
379/392; 379/416;
381/92 |
Current CPC
Class: |
H04R
3/005 (20130101); H04M 9/001 (20130101); G10K
2210/1082 (20130101); G10K 2210/3045 (20130101); G10K
2210/111 (20130101); G10K 2210/108 (20130101); G10K
2210/3012 (20130101) |
Current International
Class: |
G10K
11/178 (20060101); H04M 9/00 (20060101); G10K
11/00 (20060101); H04R 3/00 (20060101); H04b
015/00 () |
Field of
Search: |
;179/1.8,1P
;325/476,475,474,473,472,304 ;324/77A,77E ;328/115-117 |
Primary Examiner: Claffy; Kathleen H.
Assistant Examiner: Leaheey; Jon Bradford
Claims
What is claimed is:
1. A circuit for enhancing a desired signal in the presence of an
undesired signal comprising:
first and second channels mutually arranged to receive
simultaneously said desired signal and to receive an undesired
signal nonconcurrently;
first means for deriving a signal representing the value of the
amplitude difference between the signals respectively present in
said first and second channels, thus canceling in the derived
signal the components of said desired signal while leaving
substantially intact the components of said undesired signal;
second means for full-wave rectifying the components of said
undesired signal; and
third means for adding the output of said second means to the
signals respectively present in said first and second channels, the
output of said third means consisting of the desired signal
substantially undistorted and the undesired signal half-wave
rectified.
2. A signal processor for enhancing a desired signal in the
presence of an undesired signal comprising:
first and second circuits each constructed in accordance with the
circuit in claim 1 and each having an output from said third means
in which the desired signal components are substantially equal in
amplitude;
means for deriving a signal representing the value of the amplitude
difference between the two said third means outputs, thus canceling
in the just-derived signal the components of desired signal while
leaving substantially intact the components of half-wave-rectified
undesired signal;
means for full-wave rectifying the half-wave-rectified components
of said undesired signal but in a sense opposite to that employed
with said second means; and
means for adding the output of said last-named full-wave rectifying
means to each of the third means outputs, the resulting sum
consisting of the desired signal substantially undistorted and the
undesired signal substantially eliminated.
3. The circuit in accordance with claim 1, wherein said undesired
signal is impulsive in nature.
4. A signal processor in accordance with claim 2, wherein said
undesired signal is impulse-type noise.
5. In a distant-talking two-way telephone system, apparatus for
increasing the signal-to-noise ratio comprising:
a first and a second microphone each separated from the same noise
source by unequal distances;
first means for placing in phase desired signals received from an
information source by each said microphone;
means for deriving a signal representative of the value of the
amplitude difference between the output signals of the respective
said microphones, the components of said desired signals combining
to substantially cancel each other, while the noise components
remain substantially intact;
means for full-wave rectifying the noise components; and
means for additionally combining said derived signal with the signals
received by said first and said second microphones, to produce an
output consisting of the desired signal substantially undistorted
and the noise signal half-wave rectified.
6. A system for processing speech, comprising:
first and second circuits each constructed in accordance with the
apparatus described in claim 5, each circuit having an output
consisting of the said desired signal substantially undistorted and
the noise signal half-wave rectified;
means for deriving a signal representing the value of the amplitude
difference between said outputs of said first and second circuits,
thus canceling in the just-derived signal the components of said
desired signal while leaving substantially intact the noise
components;
means for full-wave rectifying the noise components but in a sense
opposite to the rectifying means of claim 5; and
means for adding the resulting full-wave-rectified signal to said
outputs of said first and second circuits, the sum consisting of
the undistorted desired signal and the noise components
substantially eliminated.
7. A signal processor for discriminating against the
instantaneously strongest of several unequal input signals in a
microphone array comprising:
a pair of first stages, each first stage connected to a pair of
microphonic inputs;
means for selecting as the instantaneous output of each said first
stage the greater of the stage's two instantaneous inputs;
a second stage including means connecting the first stage's
instantaneous outputs thereinto; and
means for selecting as the instantaneous output of said second
stage the lesser of its two said inputs, whereby the processor
output is never the strongest instantaneous input to said
microphone array.
Description
FIELD OF THE INVENTION
This invention pertains broadly to the field of signal processing
and in particular relates to signal discrimination techniques. As a
principal object, the invention seeks to improve the
intelligibility, at the receiving end of an electrical
communications path, of a desired signal which was acoustically
originated in the presence of noise.
BACKGROUND OF THE INVENTION
In telephony and elsewhere, numerous situations arise in which
desired acoustical signals require enhancement relative to some
unwanted signals. The desired speech signals in "hands-free"
telephony for example, are often generated in an environment that
includes other speech as well as typewriter clatter, chair scraping
and many other background noises impulsive in character. The
randomness of the noise source in relation to the relatively
stationary talker and microphone locations complicates the
problem's general solution, as does the reverberance of most
conference rooms and offices.
The present invention in one aspect is a scheme for canceling
impulse-type noise or, if the noise is of a continuous nature such
as speech, for rendering it relatively unintelligible. In the
latter respect, the invention draws uniquely upon the capability of
the human ear to disregard unintelligible signals.
SUMMARY OF THE INVENTION
In its broadest aspect, the invention in essence is that the
instantaneous outputs of a plurality of arrayed microphones, for
example four, are first correlated with respect to the signal from
a desired source and then arranged or ranked by a processor in an
algebraically ascending order of amplitude value. As its
instantaneous output, the processor, in accordance with one aspect
of the invention, selects some one of the inputs or some averaged
combination of inputs which at that instant occupies a desired
location in the referred-to ranking.
In one embodiment, the processor is arranged to select an output
which in the ranking is intermediate in amplitude value; but the
processor never selects either the algebraically greatest or
smallest output. The processor so programmed discriminates against
a signal which is stronger at one input than at the others.
In a specific case, the desired signal source may be a person
speaking from a location equidistant from each of the four
microphones, or "on-center." An off-center impulse signal such as
the striking of a typewriter key, if of duration such that the
signal impinges sequentially on successive microphones without
overlap, will appear nonconcurrently at each microphone as a peak.
Since the signal from the desired source is the same at all of the
microphones, the output of the microphone that at any instant
contains a noise component will be algebraically either the largest
or the smallest of the outputs. But because the processor never
selects as its output the algebraically greatest or smallest input
from the microphones, the output of each successive microphone on
which the impulse noise instantaneously impinges is never selected
as the output of the processor.
The invention, its further objects, features and advantages are
further delineated in the more detailed descriptions which
follow.
THE DRAWING
FIG. 1 is a waveform diagram of the first stage process;
FIG. 2 is a functional block diagram of the first stage
process;
FIG. 3 is a schematic diagram of one microphone arrangement
practicing the invention;
FIG. 4 is a functional block diagram depicting the two-stage
process;
FIG. 5 is a waveform diagram of the two-stage process;
FIG. 6 is a functional block diagram depicting an equivalent
two-stage process; and
FIG. 7 is a diagram of one circuit for achieving the process of
FIG. 6.
DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT
A principal aspect of the inventive signal enhancement process is
illustrated in the waveform diagram of FIG. 1, taken in conjunction
with the process block diagram of FIG. 2. The environment depicted
in FIG. 3 exemplifies the condition of a speech source or talker 1
speaking in the presence of a noise source 2 which, for example,
may be the impulselike clacking of a typewriter. The talker 1 is
located equidistant from each of a plurality of microphones $M_1$,
$M_2$, $M_3$, $M_4$. His speech signals, denoted S, therefore
reach each microphone simultaneously. For unequal microphone
distances, an appropriate delay function such as delay 3 in FIG. 2
is employed to place the desired signals S in phase at all
microphones. The undesired signal is denoted $N_1$, $N_2$,
$N_3$, or $N_4$, in accordance with which microphone it
impinges on; and is located "off-center" or at unequal distances
from the microphones.
Consider first the processing (in what is hereinafter called the
first stage) by the microphone pair $M_1$, $M_2$ of a
simultaneous burst from speech source 1 and noise source 2, each
burst consisting of one cycle. As depicted in FIG. 1, and assuming
equal amplitudes, the signal S reaches microphones $M_1$ and
$M_2$ in phase or simultaneously. In the instance of this
illustration, the noise signal $N_1$ and the noise signal $N_2$
thereafter reach the respective microphones at different times.
In the first stage, in accordance with this aspect of the
invention, the absolute value of the difference between the $M_1$
and $M_2$ microphone outputs is added to the sum of the outputs
of microphones $M_1$ and $M_2$. This process is accomplished by
first subtracting the signals $S+N_1$ and $S+N_2$ in difference
amplifier 4, which cancels the desired signal S altogether. The
remaining signal, $N_1 - N_2$, is full-wave rectified in
rectifier 5, producing the absolute value of the difference signal,
or $|N_1 - N_2|$. The latter then is added in adder 6 to the
microphone output signals $S+N_1$ and $S+N_2$, and the sum is
attenuated by a factor of two, in attenuator 10a, yielding the
first stage output signal:
$$E_A = S + \tfrac{1}{2}\left(N_1 + N_2 + |N_1 - N_2|\right) \tag{1}$$
For the special case in which signals $N_1$ and $N_2$ do not
overlap in time, the output of the first stage is
$S + N_1^+ + N_2^+$ (where $N_1^+$ and $N_2^+$ are the
positive-going portions of $N_1$ and $N_2$). This term is seen
to consist of the undistorted desired signal and the unwanted
signal half-wave rectified.
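The first-stage arithmetic can be checked numerically. The following is a minimal sketch, not the patented circuit: it assumes sampled signals held in Python lists, and the sample values and the helper names `first_stage` and `positive_part` are hypothetical, chosen only for illustration. It applies Equation 1 sample by sample to a noise burst that reaches the two microphones without overlap, and compares the result with the desired signal plus the half-wave-rectified noise.

```python
def first_stage(e1, e2):
    """One sample of the first stage: (e1 + e2 + |e1 - e2|) / 2."""
    difference = e1 - e2               # difference amplifier 4: S cancels here
    rectified = abs(difference)        # full-wave rectifier 5
    return 0.5 * (e1 + e2 + rectified)  # adder 6 followed by attenuator 10a


def positive_part(x):
    """Half-wave rectification: keep only the positive-going portion."""
    return x if x > 0 else 0.0


# Hypothetical samples: one cycle of S, and a one-cycle noise burst that
# reaches microphone 1 two samples before microphone 2 (no overlap in time).
S = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
N1 = [0.0, 0.75, -0.75, 0.0, 0.0, 0.0, 0.0, 0.0]
N2 = [0.0, 0.0, 0.0, 0.75, -0.75, 0.0, 0.0, 0.0]

# First-stage output per Equation 1, applied to the inputs S+N1 and S+N2.
E_A = [first_stage(s + n1, s + n2) for s, n1, n2 in zip(S, N1, N2)]

# Predicted output for the nonoverlapping case: S plus half-wave-rectified noise.
expected = [s + positive_part(n1) + positive_part(n2)
            for s, n1, n2 in zip(S, N1, N2)]

print(E_A)
print(expected)
```

Under these assumed values the two printed lists agree, and only the positive half of each noise burst survives the first stage.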
So long as the talker 1 remains substantially on-center, the first
stage output consists of undistorted desired speech and rectified
noise. If the off-center signal is speech, the result of
rectification is to distort it greatly and thus render it
unintelligible. Rectified speech contains many more high harmonics
than unrectified speech; and accordingly use of a low-pass filter
(not shown) on the output signal of the first stage can improve the
signal-to-noise ratio.
The two-stage process employing the process steps described above
is, pursuant to the invention, used with four microphone outputs to
both attenuate and distort an unwanted signal while reproducing the
wanted speech signal undistorted. FIG. 3 shows the third and fourth
microphones $M_3$ and $M_4$, the outputs of which are processed
exactly in the first-stage fashion described with respect to the
outputs of microphones $M_1$ and $M_2$. FIG. 4 depicts the
block diagram of the two-stage process including first-stage
processor A and first-stage processor B. Delay functions 3a, 3b and
3c are used. FIG. 5 shows exemplary waveforms. The second-stage
process is identical in steps to the first stages except that the
sign of the full-wave rectifier is opposite to that of the first
stages.
Accordingly, the output $S+N_3$ of microphone $M_3$ and the
output $S+N_4$ of microphone $M_4$ are processed in first-stage
processor B, whose output is expressed:
$$E_B = S + \tfrac{1}{2}\left(N_3 + N_4 + |N_3 - N_4|\right) \tag{2}$$
If signals $N_3$ and $N_4$ do not overlap in time,
$E_B = S + N_3^+ + N_4^+$. This term is subtracted in
difference amplifier 7 from the signal $S + N_1^+ + N_2^+$,
which is the output of first-stage processor A. This
operation cancels the wanted signal S. The resulting signal then is
full-wave rectified in rectifier 8, which yields the signal
$-|N_1^+ + N_2^+ - N_3^+ - N_4^+|$. This latter term is the
absolute value of the difference between the outputs of the two
first stages, taken with a negative sign. To this term is added, in
adder 9, the aforementioned respective two outputs of the
first-stage processes A and B. This sum then is attenuated by a
factor of two in attenuator 10b, and the result is passed to the
output of the second stage.
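The complete two-stage chain can be sketched in the same way. This is an illustrative sketch under stated assumptions, not the circuit of FIG. 4: the signals are sampled lists, the burst values are hypothetical, and the names `first_stage`, `second_stage`, `two_stage` and `shifted` are introduced here only for the example. The noise burst is arranged to reach the four microphones one after another without overlap.

```python
def first_stage(e1, e2):
    """First stage (processor A or B): positive-sense rectifier."""
    return 0.5 * (e1 + e2 + abs(e1 - e2))


def second_stage(ea, eb):
    """Second stage: same steps, rectifier of opposite sense."""
    return 0.5 * (ea + eb - abs(ea - eb))


def two_stage(e1, e2, e3, e4):
    """Four-input processor: two first stages feeding one second stage."""
    return second_stage(first_stage(e1, e2), first_stage(e3, e4))


def shifted(burst, start, length):
    """Place a burst at sample index `start` in an otherwise silent signal."""
    signal = [0.0] * length
    signal[start:start + len(burst)] = burst
    return signal


# Hypothetical desired signal and a one-cycle noise burst that arrives at
# microphones 1..4 at sample offsets 0, 2, 4 and 6 (never overlapping).
S = [0.0, 0.25, 0.5, 0.25, 0.0, -0.25, -0.5, -0.25, 0.0, 0.25]
burst = [0.75, -0.75]
noise = [shifted(burst, 2 * k, len(S)) for k in range(4)]

output = [two_stage(*(s + n for n in ns)) for s, *ns in zip(S, *noise)]

print(output == S)  # the nonoverlapping burst is removed in this example
```

With these assumed inputs the output reproduces S exactly: the positive half of each burst is removed by the second stage and the negative half by the first stages.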
It is seen in FIG. 5 that as a result of the foregoing processing,
cancellation of the unwanted noise signal N is achieved through
cancellation of its manifestations $N_1$, $N_2$, $N_3$, and
$N_4$. The cancellation is complete if, as in the example, the
unwanted signals from the two first stages do not overlap in time.
However, the unwanted signals can overlap the desired signal since
the initial subtractive operation of each stage results in
elimination of the wanted signal in any case.
The two-stage process just described was found to produce
undistorted speech output when the talker was on-center. For a
talker located off-center, his processed output was distorted as
well as attenuated, and because of the distortion the subjective
suppression of the unwanted off-center source was even greater than
that due to the amplitude attenuation alone.
A test was conducted to compare the suppression observed for speech
with that for noise. A static source was substituted for the noise
source N in FIG. 3. The static consisted of 0.5 ms pulses of
negative polarity occurring randomly in time, with a repetition
frequency of approximately 50 per second. Under these
conditions, the probability of overlap is only about 2 percent. The
peak amplitude used was about the same as that present in the
speech used. Since the signal was only of one polarity, only one
stage of processing, such as first stage A, was required. The static
source was connected to a conventional acoustic delay line and
output taps at 1 and 4 ms. were connected to the two inputs of
first stage A. For negligible overlap, the static was entirely
eliminated at the output of the processor.
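The static test can be imitated in a few lines. The sketch below is an assumption-laden illustration rather than a reproduction of the experiment: the 8 kHz sample rate, the fixed (not random) pulse start times, and the helper names `pulse_train` and `delayed` are all hypothetical. It feeds negative 0.5 ms pulses through taps at 1 ms and 4 ms into a single first stage, and then shows what happens when two pulses are spaced so that they overlap at the two taps.

```python
RATE = 8000                    # samples per second (assumed for illustration)
PULSE = RATE * 5 // 10000      # 0.5 ms pulse width, in samples
TAP_1 = RATE * 1 // 1000       # delay-line tap at 1 ms, in samples
TAP_2 = RATE * 4 // 1000       # delay-line tap at 4 ms, in samples


def pulse_train(starts, length):
    """Negative-polarity unit pulses beginning at the given sample indices."""
    signal = [0.0] * length
    for start in starts:
        for i in range(start, min(start + PULSE, length)):
            signal[i] = -1.0
    return signal


def delayed(signal, delay):
    """Delay a signal by `delay` samples, keeping its length."""
    return [0.0] * delay + signal[:len(signal) - delay]


def first_stage(e1, e2):
    return 0.5 * (e1 + e2 + abs(e1 - e2))


def process(starts, length=480):
    static = pulse_train(starts, length)
    tap_a = delayed(static, TAP_1)
    tap_b = delayed(static, TAP_2)
    return [first_stage(a, b) for a, b in zip(tap_a, tap_b)]


# Widely spaced pulses never coincide at the two taps.
output = process(starts=[0, 160, 320])

# Two pulses spaced by exactly the tap difference (3 ms) do coincide.
overlap_output = process(starts=[0, TAP_2 - TAP_1])

print(max(abs(v) for v in output))  # no residue when pulses do not overlap
print(min(overlap_output))          # residue appears when pulses overlap
```

In the nonoverlapping run every output sample is zero, while the deliberately overlapping pair leaks through at full amplitude, which is the situation the quoted overlap probability refers to.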
The large suppression observed for an unwanted speech signal
implies that overlap in time of the unwanted speech manifestations
$N_1$, $N_2$, $N_3$, $N_4$ is not a problem. Consider again
the output of the first stage of processing as in first stage A:
$E_A = S + \tfrac{1}{2}(N_1 + N_2 + |N_1 - N_2|)$. The
contribution to the output from the unwanted source is
$N = \tfrac{1}{2}(N_1 + N_2 + |N_1 - N_2|)$. For
nonoverlapping signals, N is positive. For overlapping signals,
there are four cases: $N_1$ and $N_2$ both positive; $N_1$
positive and $N_2$ negative; $N_1$ negative and $N_2$
positive; and $N_1$ and $N_2$ both negative. Inspection of the
expression for N shows that N must be positive if $N_1$ and
$N_2$ are both positive, and negative if $N_1$ and $N_2$ are
both negative. However, in the two cases where $N_1$ and $N_2$
are of opposite sign,
$|N_1 - N_2| = |N_1| + |N_2| > |N_1 + N_2|$. Hence N is positive
in these two cases. Thus even if the unwanted signal overlaps at
the two inputs, N has the same sign at the output as in the
nonoverlapping case in three of the four cases, or about 75
percent of the time. This result, along with the fact that
speech is nonoverlapping to a large extent at the processor inputs,
particularly for voiced sounds, is believed to account for the
large observed suppression of the unwanted signal.
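The four sign cases can be enumerated directly. This is a small illustrative sketch with hypothetical magnitudes; the function name `noise_term` is introduced here, and the 75 percent figure follows only if the four sign combinations are treated as equally likely, which is an assumption of the sketch.

```python
def noise_term(n1, n2):
    """Unwanted contribution to the first-stage output: (n1 + n2 + |n1 - n2|) / 2."""
    return 0.5 * (n1 + n2 + abs(n1 - n2))


# One representative (hypothetical) pair of values for each sign combination.
cases = {
    "both positive": (0.5, 0.25),
    "N1 positive, N2 negative": (0.5, -0.25),
    "N1 negative, N2 positive": (-0.5, 0.25),
    "both negative": (-0.5, -0.25),
}

is_positive = {name: noise_term(n1, n2) > 0 for name, (n1, n2) in cases.items()}
positive_fraction = sum(is_positive.values()) / len(is_positive)

for name, flag in is_positive.items():
    print(f"{name}: N {'positive' if flag else 'negative'}")
print(positive_fraction)
```

Only the case in which both manifestations are negative yields a negative contribution; the other three leave N positive, as in the nonoverlapping case.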
The foregoing discussion has assumed that the wanted signal S was
of equal amplitude at all of the inputs of the signal processor, so
that complete cancellation of the wanted signal would result from
the first subtraction in each stage. If the signals are not of the
same amplitude, part of the signal will be distorted by the
subsequent full-wave rectification, and this distortion will be
added to the output signal. This problem can be overcome by
adjustment of the gains in the four channels to make the signal
amplitudes the same.
The output of the processor of FIG. 4 seems at first to be a
complicated function of all four of the inputs $S+N_1$,
$S+N_2$, $S+N_3$, and $S+N_4$. It has been realized, however,
that if only the instantaneous response of the FIG. 4 two-stage
processor to instantaneous inputs is considered, then the
instantaneous processor output is always exactly equal to some one
of the inputs (or, if the factor-of-two attenuators 10a and 10b
are omitted, to four times that input). The two-stage processor of
FIG. 4, in effect, looks at only one input at a time and ignores
all the others.
This result can be seen by inspecting the output of any one of the
stages in the process. The inputs to the two-stage processor in
terms of the signals received by microphones $M_1$, $M_2$,
$M_3$, $M_4$ can be expressed:
$$S + N_1 = E_1 \tag{3}$$
$$S + N_2 = E_2 \tag{4}$$
$$S + N_3 = E_3 \tag{5}$$
$$S + N_4 = E_4 \tag{6}$$
The inputs to the first stage A are $E_1$ and $E_2$. The output
of this stage then is:
$$E_A = \tfrac{1}{2}\left(E_1 + E_2 + |E_1 - E_2|\right) \tag{7}$$
It can be easily seen that $E_A$ is equal to the greater of its
two inputs $E_1$ and $E_2$. Similarly $E_B$, the output of
the first stage B, is equal to the greater of its two inputs
$E_3$ and $E_4$.
The inputs to the second stage are $E_A$ and $E_B$ and the
output is:
$$E = \tfrac{1}{2}\left(E_A + E_B - |E_A - E_B|\right) \tag{8}$$
It is evident that E, the output of the second stage, is equal to
the lesser of its two inputs $E_A$ and $E_B$.
Each of the stages, then, selects as output either the maximum or
the minimum of its two inputs depending on whether the sign of the
rectifier used is positive or negative.
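The maximum and minimum properties of Equations 7 and 8 follow from a two-case expansion of the absolute value. The short derivation below is supplied as a worked check and uses only the symbols already defined in those equations.

```latex
\begin{align*}
\text{If } E_1 \ge E_2:&\quad |E_1 - E_2| = E_1 - E_2
  \;\Rightarrow\; E_A = \tfrac{1}{2}\,(E_1 + E_2 + E_1 - E_2) = E_1 ,\\
\text{If } E_1 < E_2:&\quad |E_1 - E_2| = E_2 - E_1
  \;\Rightarrow\; E_A = \tfrac{1}{2}\,(E_1 + E_2 + E_2 - E_1) = E_2 ,\\
\text{hence}&\quad E_A = \max(E_1, E_2).\\[6pt]
\text{If } E_A \ge E_B:&\quad |E_A - E_B| = E_A - E_B
  \;\Rightarrow\; E = \tfrac{1}{2}\,(E_A + E_B - E_A + E_B) = E_B ,\\
\text{If } E_A < E_B:&\quad |E_A - E_B| = E_B - E_A
  \;\Rightarrow\; E = \tfrac{1}{2}\,(E_A + E_B - E_B + E_A) = E_A ,\\
\text{hence}&\quad E = \min(E_A, E_B).
\end{align*}
```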
A study of Equations 7 and 8 will reveal that the input which the
processor selects as output at any particular time is the
algebraically lesser of two quantities, $E_A$ and $E_B$, where
$E_A$ is the greater of the inputs $E_1$ and $E_2$ to
first-stage processor A and $E_B$ is the greater of the inputs
$E_3$ and $E_4$ to first-stage processor B.
Assuming that no two inputs are equal, if the two greatest inputs
occur in the same first stage processor, then the input selected is
the third greatest. Otherwise the selected input is the second
greatest. Significantly, neither the greatest nor the smallest
input is ever chosen. If the inputs are uncorrelated gaussian
signals of equal standard deviation, the two-stage processor
selects the greater of the "middle two" inputs two-thirds of the
time and the lesser of the two, one-third of the time.
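The ranking claim can be verified by exhaustive enumeration. The sketch below assumes only that the four inputs are distinct and that every ordering of them is equally likely (which holds for independent, identically distributed continuous inputs); the function name `two_stage` and the use of the integers 1 to 4 as stand-in input values are choices made for this illustration.

```python
from itertools import permutations


def two_stage(e1, e2, e3, e4):
    """Lesser of (greater of e1, e2) and (greater of e3, e4)."""
    return min(max(e1, e2), max(e3, e4))


# Use the values 1..4 so that each value directly encodes its own ranking.
counts = {1: 0, 2: 0, 3: 0, 4: 0}      # key: rank, where rank 1 is the greatest
for values in permutations([1, 2, 3, 4]):
    selected = two_stage(*values)
    rank = 5 - selected                # value 4 has rank 1, value 1 has rank 4
    counts[rank] += 1

total = sum(counts.values())
for rank in sorted(counts):
    print(rank, counts[rank], counts[rank] / total)
```

Of the 24 orderings, 16 select the second greatest input and 8 the third greatest; the greatest and the smallest are never selected, in agreement with the two-thirds and one-third proportions stated above.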
The processor thus is effective in discriminating against a signal
which at the instant is much stronger at one microphone than at the
others. If the signal from a noise source is "off-center" as the
noise source 2 in FIG. 3, and is of sufficient magnitude when it
reaches a given one of the microphones to cause that microphone's
output to be instantaneously the greatest or least (algebraically)
of the four outputs, that microphone output is not chosen. Rather,
one of the "middle two" is chosen.
The FIG. 4 block diagram of the two-stage processor is functionally
equivalent to the simplified block diagram of FIG. 6 where $E_1$,
$E_2$ constitute one input pair and $E_3$, $E_4$ constitute
the second. The first and second stages are replaced by maximum and
minimum operations respectively. Again, the output E is given by
Equations 7 and 8 and is equal to the lesser of two quantities,
$E_A$ and $E_B$, where $E_A$ is the greater of $E_1$ and
$E_2$; and $E_B$ is the greater of $E_3$ and $E_4$.
A circuit that will quite simply perform the desired two-stage
process is schematically illustrated in FIG. 7, which functionally
follows the FIG. 6 diagram. The circuit is basically three sets of
diode OR gates. Diodes $D_1$ and $D_2$ are connected
respectively to the inputs $E_1$ and $E_2$. Diodes $D_3$ and
$D_4$ are connected respectively to the inputs $E_3$ and
$E_4$. Forward current bias $I_1$ is applied to the diodes
$D_1$, $D_2$ and $D_3$, $D_4$. The two diode gates are
arranged conventionally to pass only the greater of their
respective inputs. The latter, termed again $E_A$ and $E_B$,
are respective inputs to the third gate consisting of diodes
$D_A$ and $D_B$ which are conventionally biased by current
sources $I_1$ and $I_2$ to pass the lesser of $E_A$ and
$E_B$. The biasing current $I_1 \approx 2 I_2$. The
instantaneous output E of the FIG. 7 processor is identical to the
output of the FIG. 4 processor.
To recapitulate, the four-input, two-stage processor thus far
described completely eliminates impulsive noise provided that the
length of the impulse is short enough and that the location of the
noise source relative to the microphone array is such that only one
microphone at a time is excited. When the interfering noise is
speech from an off-center location, the process both distorts and
attenuates it, thus greatly reducing its intelligibility. The
processing is also effective in discriminating against a source of
sound which is much closer to one of the four microphones than it
is to the others.
It has, however, been further realized that the described processor
is actually one member of a general class of processors. Each class
member performs the ranking of the instantaneous (algebraic) values
of the microphone signals in ascending order. However, the rule for
selecting as the processor output some one or some combination of
the instantaneous inputs, differs from species to species.
For example, in the processing scheme described by Equations 7 and 8, when
the "on-center" talker is silent, the processor still will select
either the second or third greatest of the four inputs. During
these periods it may be desirable for the processor to select the
microphone output that is closest to zero. The undesired noise or
talkers then would be at a minimum during pauses of the wanted
talker. This process has been found to distort very severely any
off-center sound source and render off-center speech entirely
unintelligible.
A further processor is envisioned which, instead of sometimes
selecting the second largest input and otherwise selecting the
third largest, always selects one or the other of these inputs.
A still further case is the processor which computes the average of
the "middle two" of four unequal inputs. Advantageously the median
of five inputs can be selected and a further improvement is
realized by averaging the "middle three" of five inputs.
In terms of eliminating impulse-type noise, an advantage is gained
by using the median of five inputs, since then an arbitrary amount
of noise can be added at any two microphones with no effect on the
output. If the median of seven microphone outputs is used, noise at
any three of the microphones is eliminated.
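The robustness of a median selection can be illustrated with a few hypothetical samples. This sketch is not a circuit description: it assumes instantaneous samples in which the desired signal S is identical at every microphone, and the function name `median_processor` and all numeric values are invented for the example.

```python
from statistics import median


def median_processor(inputs):
    """Return the median of an odd number of instantaneous microphone samples."""
    return median(inputs)


S = 0.5  # hypothetical instantaneous value of the desired signal

# Five microphones, arbitrary noise added at two of them.
five_two_hit = [S + 100.0, S, S - 37.5, S, S]

# Five microphones, noise at three of them: the median is no longer clean.
five_three_hit = [S + 1.0, S + 2.0, S + 3.0, S, S]

# Seven microphones, noise at three of them.
seven_three_hit = [S + 9.0, S, S - 4.0, S, S + 2.5, S, S]

print(median_processor(five_two_hit))     # equals S
print(median_processor(five_three_hit))   # differs from S
print(median_processor(seven_three_hit))  # equals S
```

The median of five tolerates noise at any two inputs and the median of seven at any three, because in each case a majority of the inputs still carries the uncorrupted value.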
The processor of FIG. 4 can eliminate impulsive off-center noise
provided the impulse does not embrace more than one-third of the
microphone array at any instant; and provided further that the duty
cycle of the impulsive noise is not more than one-fourth.
Processing of the median output of, for example, five microphones
allows a longer impulse and a longer duty cycle. It is obvious that
the inventive signal-enhancing techniques described are applicable
in any environment where a desired signal is sought to be received
in the presence of an undesired signal or of noise. Such fields of
application include underwater sound detection, mobile radio
telephony, medicoacoustics and others.
The spirit of the invention is embraced in the scope of the claims
to follow.
* * * * *