U.S. patent application number 11/348434 was filed with the patent office on 2006-02-06 and published on 2006-07-06, for methods and apparatus for echo cancellation using an adaptive lattice based non-linear processor.
The invention is credited to John O. DellaMorte, Jr. and Oguz Tanrikulu.
Publication Number | 20060149542 |
Application Number | 11/348434 |
Family ID | 26819465 |
Publication Date | 2006-07-06 |
United States Patent Application 20060149542
Kind Code: A1
Tanrikulu; Oguz; et al.
July 6, 2006
Methods and apparatus for echo cancellation using an adaptive lattice based non-linear processor
Abstract
A non-linear processor for managing a speech signal is
disclosed. The non-linear processor includes an adaptive lattice
filter that generates a prediction error from a residual echo of
the speech signal, a signum non-linearity unit that generates an
excitation component from the prediction error, an excitation
signal mixer that generates an excitation signal from the
excitation component and a noise component, an inverse lattice
filter that generates a modified residual echo from the excitation
signal, and a gain adjuster that adjusts a power of the modified
residual echo.
Inventors: Tanrikulu; Oguz; (Wellesley, MA); DellaMorte; John O. JR.; (Sandwich, MA)
Correspondence Address:
TELLABS OPERATIONS, INC.
LEGAL DEPARTMENT
1415 WEST DIEHL ROAD
NAPERVILLE, IL 60563
US
Filed: February 6, 2006
Related U.S. Patent Documents
| Application Number | Filing Date | Patent Number |
| 10121437 | Apr 11, 2002 | 7050545 |
| 11348434 | Feb 6, 2006 | |
| 60283321 | Apr 12, 2001 | |
Current U.S. Class: 704/233
Current CPC Class: H04M 9/082 20130101
Class at Publication: 704/233
International Class: G10L 15/20 20060101; G10L015/20
Claims
1. A method for managing a speech signal, comprising: generating an
excitation component for the speech signal; spectrally shaping a
noise component for the speech signal; and mixing the excitation
component with the noise component.
2. The method of claim 1, further comprising generating a
prediction error signal from the speech signal.
3. The method of claim 2, wherein generating the prediction error
signal comprises passing the speech signal through an adaptive
lattice filter.
4. The method of claim 3, wherein passing the speech signal through
the adaptive lattice filter comprises employing a gradient adaptive
lattice algorithm.
5. The method of claim 1, further comprising adjusting a power of
the speech signal.
6. The method of claim 5, wherein the power of the speech signal is
adjusted according to a background noise power measurement of the
speech signal.
7. The method of claim 1, wherein spectrally shaping the noise
component comprises shaping the noise component so that a spectral
envelope of the noise component tracks a background noise
signal.
8. The method of claim 1, wherein the noise component is a white
noise source.
9. The method of claim 1, wherein mixing the excitation component
with the noise component further comprises mixing the excitation
component with the noise component at proportions based on an echo
return loss enhancement measurement.
10. The method of claim 1, wherein mixing the excitation component
with the noise component further comprises mixing the excitation
component with the noise component at proportions based on
double-talk detection.
11. The method of claim 1, wherein mixing the excitation component
with the noise component further comprises mixing the excitation
component with the noise component at proportions based on
voice-activity detection.
12. An apparatus for managing a speech signal, comprising: a
double-talk detector that detects double-talk on the speech signal;
a non-linear processor that center-clips a residual echo from the
speech signal in response to a presence of double-talk on the
speech signal; an inverse filter that filters a noise source whose
spectral envelope tracks that of a background noise signal to make
filtered noise; and a noise injector that injects the filtered
noise onto the speech signal.
13. The apparatus of claim 12, wherein the inverse filter is an
inverse lattice filter.
14. The apparatus of claim 12, further comprising a gain adjuster
that adjusts a power of the speech signal.
15. The apparatus of claim 14, wherein the gain adjuster adjusts
the power of the speech signal according to a background noise
power measurement of the speech signal.
16. The apparatus of claim 12, wherein the noise source is a
stationary noise source.
17. The apparatus of claim 12, wherein the noise source is white
noise.
18. A computer-readable medium having stored thereon sequences of
instructions, the sequences of instructions including instructions
which, when executed by a processor, cause the processor to
perform: detecting presence of double-talk on a speech signal;
center-clipping a residual echo from the speech signal; filtering a
noise source whose spectral envelope tracks that of a background
noise signal to make filtered noise; and injecting the filtered
noise onto the speech signal.
19. The computer-readable medium of claim 18, further comprising
adjusting a power of the speech signal.
20. The computer-readable medium of claim 19, wherein the power is
adjusted according to a background noise power measurement of the
speech signal.
Description
RELATED APPLICATIONS
[0001] This patent application is a continuation of and claims
priority to the co-pending non-provisional patent application
having the assigned Ser. No. 10/121,437, filed on Apr. 11, 2002,
which claims priority to the provisional patent application having
the assigned Ser. No. 60/283,321 filed on Apr. 12, 2001, entitled
"BETTER SIGNAL TRANSPARENCY THROUGH ADAPTIVE LATTICE BASED
NON-LINEAR PROCESSOR (AL-NLP) FOR ECHO CANCELLATION".
FIELD OF THE INVENTION
[0002] The present invention relates to the field of speech signal
management. More specifically, the present invention relates to
echo cancellation in speech signal management.
BACKGROUND OF THE INVENTION
[0003] Adaptive methods such as adaptive filters are employed in
echo cancellers because they can adjust to fluctuations in incoming
speech signals. However, adaptive echo cancellers (ECs) may fail to
remove echoes of speech signals completely because of fundamental
limitations in convergence speed and steady-state performance. This
is often the case for the normalized least mean square (NLMS) based
methods widely used in the industry. To compensate for these
problems, non-linear processors (NLPs) may be added after the
adaptive filter to further process the residual echo signal.
[0004] NLPs use a non-linearity such as a center-clipper shown in
FIG. 1 to remove a residual echo signal. However, in removing a
residual echo signal, too much energy may be suppressed. For
example, in FIG. 1, a center-clipper may remove the input signal
that falls into hatched region 101. To replace some of the
suppressed energy, different approaches may be taken. One approach
is to use a noise injection scheme to replace the suppressed
energy. This may involve simple injection of spectrally shaped or
spectrally matched noise.
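The center-clipping non-linearity of FIG. 1, and the energy it suppresses, can be illustrated with a minimal Python sketch. The function names, the sample values, the threshold, and the hard-zero form of the clipper are illustrative assumptions. Some clippers instead shrink the surviving samples toward zero.

```python
def center_clip(x, threshold):
    """Hard center-clipper: zero every sample with |x| <= threshold.

    Samples outside the dead zone pass unchanged. This hard-zero form is an
    assumption; other clippers also shrink the surviving samples.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return [0.0 if abs(v) <= threshold else v for v in x]


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


residual = [0.02, -0.05, 0.4, -0.01, 0.03, -0.6]   # made-up residual samples
clipped = center_clip(residual, threshold=0.1)
print(clipped)                                    # [0.0, 0.0, 0.4, 0.0, 0.0, -0.6]
print(energy(residual) - energy(clipped))         # energy suppressed by clipping
```

Every low-level sample inside the dead zone is removed, including any quiet background signal mixed with the echo. That lost energy is what a noise injection scheme tries to replace.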
[0005] While schemes based on the above ideas may be implemented in
practice and improve on prior approaches, their performance leaves
room for improvement given the increasing demand for voice quality.
FIG. 2 illustrates one such scheme: an echo canceller (EC) that
includes an adaptive filter 201 and a non-linear processor (NLP)
202. Consider the EC operating in a condition where the send-in
(Sin) port at line 203 of the EC carries a low-level coherent
background signal such as music or people chatting. The portions of
the signal that contain only this background signal typically pass
through the NLP practically unaltered. However, when echo and
background signals are simultaneously present at the Sin port, the
NLP, while removing the residual echo, may also remove the
background signal. Because the noise injection scheme in the NLP
may not restore the background signal to an acceptable level, the
background may sound inconsistent.
[0006] Because many linear processing methods do not completely
separate the residual echo from the background signal, there is
room to improve on them by further removing the residual echo while
preserving the integrity of the background signal. Thus, a method
and apparatus for more effectively managing speech signals that may
include both a residual echo and a background signal is needed.
SUMMARY OF THE INVENTION
[0007] A non-linear processor for managing a speech signal is
disclosed. The non-linear processor includes an adaptive lattice
filter that generates a prediction error from a residual echo of
the speech signal, a signum non-linearity unit that generates an
excitation component from the prediction error, an excitation
signal mixer that generates an excitation signal from the
excitation component and a noise component, an inverse lattice
filter that generates a modified residual echo from the excitation
signal, and a gain adjuster that adjusts a power of the modified
residual echo.
[0008] A method of processing a residual echo of a speech signal is
disclosed. The method includes generating a prediction error from
the residual echo, generating an excitation signal from the
prediction error and a noise component, and generating a modified
residual echo from the excitation signal.
[0009] A method of processing a speech signal is disclosed. The
method includes center-clipping a residual echo from the speech
signal in response to detecting a presence of double-talk on the
speech signal, injecting onto the speech signal white noise that
matches a spectral envelope of a background signal of the speech
signal, and adjusting a power of the speech signal.
DESCRIPTION OF THE DRAWINGS
[0010] The present invention is illustrated by way of example and
not by way of limitation in the figures of the accompanying
drawings, in which like references indicate similar elements and in
which:
[0011] FIG. 1 is a graph illustrating center-clipping
non-linearity;
[0012] FIG. 2 is a block diagram of an echo canceller (EC) that
includes an adaptive filter and a non-linear processor (NLP);
[0013] FIG. 3 is a block diagram illustrating whitening of a speech
signal s(n), according to an exemplary embodiment of the present
invention;
[0014] FIG. 4 is a block diagram of an adaptive lattice filter
according to an exemplary embodiment of the present invention;
[0015] FIG. 5 is a block diagram illustrating reconstruction of a
speech signal s(n), according to an exemplary embodiment of the
present invention;
[0016] FIG. 6 is a block diagram illustrating reconstruction of a
speech signal s(n) using a signum non-linearity unit, according to
an exemplary embodiment of the present invention;
[0017] FIG. 7 is a block diagram illustrating reconstruction and
modification of a speech signal s(n) after it has passed through a
signum non-linearity unit, according to an exemplary embodiment of
the present invention;
[0018] FIG. 8 is a block diagram implementing an exemplary
embodiment of the present invention;
[0019] FIG. 9 is a flow chart illustrating a method for managing
speech signals, according to an exemplary embodiment of the present
invention.
DETAILED DESCRIPTION
[0020] FIG. 3 illustrates a concept of an exemplary embodiment of
the present invention. It is a block diagram showing whitening of a
speech signal s(n). In FIG. 3, input speech signal s(n) at line 301
is filtered by filter L(z) 302, and s'(n) at line 303 is a
prediction error signal that is essentially whiter than the input.
The input speech signal s(n) can be described by the equation

$$s(n) = r_e(n) + w(n),$$

where $r_e(n)$ is the residual echo after cancellation and $w(n)$ is
the correlated near-end background noise signal; s(n) is the sum of
the two. The optimal filter L(z) is computed during portions of the
signal that contain only background noise. The prediction error
signal s'(n) is not completely white in general; how much it is
whitened depends on the order of L(z). An optimal filter L(z)
essentially models the spectral envelope of its input signal.
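The whitening effect of a prediction-error filter can be checked numerically. This is a small sketch that makes two illustrative assumptions. First, a synthetic first-order autoregressive signal stands in for s(n). Second, a fixed, known first-order predictor L(z) = 1 - a z^-1 stands in for the adapted filter. The normalized lag-1 autocorrelation is close to a before filtering and close to zero after it.

```python
import random


def ar1_signal(n, a, seed=0):
    """Coloured test signal x(n) = a*x(n-1) + e(n), with e(n) white Gaussian."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = a * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


def prediction_error(x, a):
    """Apply the first-order predictor L(z) = 1 - a z^-1 (zero initial state)."""
    return [x[i] - a * x[i - 1] if i > 0 else x[0] for i in range(len(x))]


def lag1_correlation(x):
    """Normalised lag-1 autocorrelation; close to 0 for a white sequence."""
    num = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    return num / sum(v * v for v in x)


s = ar1_signal(20000, a=0.9)
s_prime = prediction_error(s, a=0.9)
print(lag1_correlation(s), lag1_correlation(s_prime))  # strongly correlated vs. near zero
```

A real speech or background signal needs a higher-order L(z). With a too-low order, the residual correlation left in s'(n) is larger, as the paragraph above notes.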
[0021] To obtain the optimal coefficients for filter L(z), an
exemplary embodiment of the present invention employs the Gradient
Adaptive Lattice (GAL) method, known to one ordinarily skilled in
the art and described in S. Haykin, Adaptive Filter Theory,
Prentice-Hall, 1991, to adapt the lattice coefficients. This method
minimizes the prediction error of an adaptive lattice filter by
stochastic gradient descent. An exemplary adaptive lattice filter
L(z) is shown in FIG. 4. The update equations governing the
exemplary GAL method are:

$$
\begin{aligned}
f_m(n) &= f_{m-1}(n) + k_m(n)\, b_{m-1}(n-1) \\
\sigma_{m-1}^2(n) &= \lambda\, \sigma_{m-1}^2(n-1) + f_{m-1}(n)^2 \\
k_m(n+1) &= k_m(n) - \frac{\mu}{\sigma_{m-1}^2(n)}\, b_{m-1}(n-1)\, f_m(n) \\
b_m(n) &= b_{m-1}(n-1) + k_m(n)\, f_{m-1}(n)
\end{aligned}
$$

where $f_m(n)$ and $b_m(n)$ are respectively the forward and the
backward prediction errors associated with the $m$-th stage, and
$\mu$ is the adaptation step size. Note that
$f_0(n) = b_0(n) = s(n)$, $0 < \lambda < 1$, and the lattice
coefficients satisfy $|k_m(n)| < 1$ for $m = 1, \ldots, P$, where
$P$ is the filter order. This method is a variant of the GAL method
in the sense that the cost function minimized is the forward
prediction error power only.
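The four update equations above translate almost line by line into code. Below is a minimal Python sketch. The step size mu, the forgetting factor lam, the power floor that avoids division by zero, and the clamp k_max that enforces |k_m(n)| < 1 are illustrative assumptions, not values from the text. It is exercised on a synthetic first-order autoregressive signal, for which the first reflection coefficient should approach -0.9.

```python
import random


class GradientAdaptiveLattice:
    """Forward-error-only GAL predictor following the update equations above.

    mu, lam, the power floor and the coefficient clamp k_max are illustrative
    assumptions; the text only requires 0 < lam < 1 and |k_m(n)| < 1.
    """

    def __init__(self, order, mu=0.5, lam=0.99, power_floor=1e-6, k_max=0.999):
        self.order = order
        self.mu = mu
        self.lam = lam
        self.power_floor = power_floor
        self.k_max = k_max
        self.k = [0.0] * order                # k_1..k_P stored at index 0..P-1
        self.sigma2 = [power_floor] * order   # sigma^2_{m-1}(n) for stages 1..P
        self.b_prev = [0.0] * (order + 1)     # b_0(n-1)..b_P(n-1)

    def step(self, s, adapt=True):
        """Filter one sample s(n); return the forward error f_P(n)."""
        f = s                                 # f_0(n) = s(n)
        b_cur = [0.0] * (self.order + 1)
        b_cur[0] = s                          # b_0(n) = s(n)
        for m in range(1, self.order + 1):
            km = self.k[m - 1]                # k_m(n)
            b_del = self.b_prev[m - 1]        # b_{m-1}(n-1)
            f_next = f + km * b_del           # f_m(n)
            b_cur[m] = b_del + km * f         # b_m(n)
            if adapt:                         # adapt=False models a frozen GAL
                p = self.lam * self.sigma2[m - 1] + f * f
                self.sigma2[m - 1] = max(self.power_floor, p)
                km -= self.mu / self.sigma2[m - 1] * b_del * f_next  # k_m(n+1)
                self.k[m - 1] = max(-self.k_max, min(self.k_max, km))
            f = f_next
        self.b_prev = b_cur
        return f


rng = random.Random(1)
x, prev = [], 0.0
for _ in range(20000):
    prev = 0.9 * prev + rng.gauss(0.0, 1.0)   # synthetic AR(1) "background"
    x.append(prev)

gal = GradientAdaptiveLattice(order=2)
err = [gal.step(v) for v in x]
print([round(k, 3) for k in gal.k])          # k_1 should approach -0.9; k_2 stays small
```

Because sigma^2 accumulates power without a (1 - lam) factor, the effective adaptation rate is roughly mu * (1 - lam) per sample once the power estimate has settled.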
[0022] FIG. 5 illustrates the concept that if filter L(z) is a
linear filter with all of its zeros constrained inside the unit
circle, the signal s(n) can be synthesized back: the inverse filter
1/L(z) then has all of its poles inside the unit circle and is
stable, which the lattice constraint $|k_m(n)| < 1$ guarantees. In
FIG. 5, a prediction error signal s'(n) at line 501 is passed
through an inverse filter 1/L(z) 502 to reconstruct a speech signal
s(n) at line 503. However, the filtering illustrated by FIG. 5 is
not the final objective, because it would synthesize back the entire
residual echo as well as the background signal.
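That the inverse filter restores s(n) exactly, residual echo included, can be seen in a short sketch. It uses an FIR lattice L(z) and its all-pole inverse 1/L(z) with the same fixed coefficients. The coefficient values and the random test signal are arbitrary assumptions chosen only to satisfy |k_m| < 1.

```python
import random


def lattice_analysis(s, k):
    """FIR lattice L(z) with fixed coefficients k (k[m-1] = k_m): s(n) -> f_P(n)."""
    order = len(k)
    b_prev = [0.0] * (order + 1)
    out = []
    for x in s:
        f = x
        b_cur = [x] + [0.0] * order
        for m in range(1, order + 1):
            f_next = f + k[m - 1] * b_prev[m - 1]        # f_m(n)
            b_cur[m] = b_prev[m - 1] + k[m - 1] * f      # b_m(n)
            f = f_next
        b_prev = b_cur
        out.append(f)
    return out


def lattice_synthesis(e, k):
    """All-pole lattice 1/L(z): the excitation e(n) plays the role of f_P(n)."""
    order = len(k)
    b_prev = [0.0] * (order + 1)
    out = []
    for x in e:
        f = x
        b_cur = [0.0] * (order + 1)
        for m in range(order, 0, -1):
            f = f - k[m - 1] * b_prev[m - 1]             # f_{m-1}(n)
            b_cur[m] = b_prev[m - 1] + k[m - 1] * f      # b_m(n)
        b_cur[0] = f                                     # b_0(n) = f_0(n)
        b_prev = b_cur
        out.append(f)                                    # synthesized sample
    return out


rng = random.Random(7)
k = [-0.8, 0.3, -0.2]                      # arbitrary coefficients with |k_m| < 1
s = [rng.gauss(0.0, 1.0) for _ in range(1000)]
s_prime = lattice_analysis(s, k)           # prediction error s'(n)
s_rec = lattice_synthesis(s_prime, k)      # 1/L(z) driven by s'(n)
print(max(abs(a - b) for a, b in zip(s, s_rec)))  # reconstruction error
```

The reconstruction error is at the level of floating-point rounding. This is why driving 1/L(z) with the unmodified s'(n) would bring the residual echo back along with the background.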
[0023] One way to avoid synthesizing back the entire residual echo
along with the background signal is to drive the inverse filter
1/L(z) with white noise instead of the prediction error signal
s'(n). However, the output signal is then unlikely to sound like the
true background signal, because the correlation that filter L(z)
cannot whiten is still present in prediction error signal s'(n). A
white excitation signal carries no such correlation, so a signal
with correlation properties similar to those of s'(n) is needed as
the excitation component of the inverse filter 1/L(z).
[0024] However, constructing a signal with correlation properties
similar to those of the prediction error signal s'(n) is
computationally intensive. The present invention provides an
alternative in which the prediction error signal s'(n) is passed
through a signum non-linearity unit. Embodiments of the present
invention may use a specific non-linearity such as a "signum
function" or a variant thereof. An exemplary embodiment of the
present invention may define a signum function as:

$$
s'_e(n) = \begin{cases} 1, & s'(n) \ge 0 \\ -1, & s'(n) < 0 \end{cases}
$$

Thus, as illustrated in FIG. 6, after the prediction error signal
s'(n) at line 601 is passed through a signum non-linearity unit 602,
the output signal $s'_e(n)$ at line 603 becomes the excitation
component that is in turn passed through an inverse filter 1/L(z).
Because the signum function keeps only the sign of each sample and
discards its amplitude, every output sample has unit magnitude
whether it came from a loud residual echo or from the quiet
background, while the zero-crossing pattern of s'(n) is retained. By
using a signum non-linearity unit, the amplitude of the residual
echo in the prediction error signal s'(n) is therefore essentially
forced to the background level. This decreases the intelligibility
of the residual echo after the signal is reconstructed or
re-synthesized.
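The amplitude-forcing property of the signum non-linearity is easy to check. This sketch uses two synthetic segments, which are assumptions for illustration: a quiet "background" segment and a loud "residual echo" segment with very different levels. After the signum function both segments have exactly unit RMS level.

```python
import math
import random


def signum(x):
    """Signum non-linearity as defined above: +1 if x >= 0, otherwise -1."""
    return [1.0 if v >= 0 else -1.0 for v in x]


def rms(x):
    """Root-mean-square level of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


rng = random.Random(3)
background = [rng.gauss(0.0, 0.1) for _ in range(500)]   # quiet background segment
echo_burst = [rng.gauss(0.0, 2.0) for _ in range(500)]   # loud residual-echo segment
print(rms(background), rms(echo_burst))                  # roughly 0.1 and 2.0
print(rms(signum(background)), rms(signum(echo_burst)))  # 1.0 1.0
```

The unit-level output is later rescaled by the gain adjuster, so the fixed +/-1 amplitude is not a problem in itself.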
[0025] FIG. 7 is a block diagram illustrating an exemplary
embodiment of the steps of reconstructing and modifying a speech
signal s(n) after it passes through a signum non-linearity unit.
First, at line 701, excitation component $s'_e(n)$ is passed through
an excitation signal mixer 702. At the excitation signal mixer,
inputs 703, such as an echo return loss enhancement (ERLE)
measurement of the EC, double-talk detection from a double-talk
detector (DTD), and voice activity detection from a voice activity
detector (VAD), are used to determine how to mix excitation
component $s'_e(n)$ with a noise component 704 to generate an
excitation signal. One ordinarily skilled in the art will understand
how these inputs are used to determine the mixing. Next, the
excitation signal is passed through an inverse filter 1/L(z) 705 to
generate a modified residual echo. Finally, the modified residual
echo is passed through a gain adjuster 706 that adjusts a power of
the modified residual echo according to a background noise power
measurement 707. One ordinarily skilled in the art will understand
how the gain adjuster adjusts this power according to a background
noise power measurement. The end result is the reconstructed signal
$\tilde{s}(n)$ at line 708.
[0026] As illustrated in FIG. 7, an exemplary embodiment of the
present invention may employ an excitation signal mixer 702 to help
manage the residual echo in the excitation component $s'_e(n)$
produced by a signum non-linearity unit. Even though a signum
non-linearity unit removes almost the entire echo, some audible
residual may remain under certain conditions. The mixing performed
by an excitation signal mixer may therefore help mask such residual
components under noise. During initial convergence of an EC, the
residual echo is much higher than the background signal. Under this
condition, a signum non-linearity unit mainly captures the harmonic
properties of the residual echo, and when an inverse lattice filter
is applied, the synthesized signal may contain coherent portions of
residual echo. To prevent this, the excitation signal mixer computes
optimal mixing by adding a noise component, such as a white noise
source 704 that is first passed through an inverse lattice filter to
shape its spectrum, to the excitation component $s'_e(n)$ in the
right proportion. For instance, when the ERLE is high, the output of
the signum non-linearity unit has no audible residual echo and
contains the harmonic nature of the true background noise, so the
optimal mixing favors the output of the signum non-linearity unit.
When the ERLE is low, the output of the signum non-linearity unit
may have audible residual echo. In this case, the optimal mixing
uses more of the spectrally shaped white noise and less of the
signum non-linearity unit output.
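The ERLE-dependent mixing described above can be sketched as a weight delta that sets the share of the signum output, with 1 - delta going to the shaped noise. The text does not specify how ERLE maps to the mixing proportion. The linear ramp between two breakpoints (10 dB and 30 dB) used below is a hypothetical choice. The shaped noise is assumed to be pre-scaled to a level comparable to the unit-magnitude signum output.

```python
def mixing_weight(erle_db, low_db=10.0, high_db=30.0):
    """Hypothetical ERLE-to-delta map (delta = share of the signum output).

    Below low_db only shaped noise is used; above high_db only the signum
    output; linear in between. The breakpoints are illustrative assumptions.
    """
    if high_db <= low_db:
        raise ValueError("high_db must exceed low_db")
    t = (erle_db - low_db) / (high_db - low_db)
    return min(1.0, max(0.0, t))


def mix_excitation(sgn_out, shaped_noise, delta):
    """delta * sgn[s'(n)] + (1 - delta) * q(n), sample by sample."""
    if len(sgn_out) != len(shaped_noise):
        raise ValueError("inputs must have equal length")
    return [delta * a + (1.0 - delta) * q for a, q in zip(sgn_out, shaped_noise)]


for erle in (5.0, 20.0, 35.0):
    print(erle, mixing_weight(erle))  # low ERLE -> 0.0, mid -> 0.5, high -> 1.0
```

A smooth ramp is used here so that the excitation does not switch abruptly as the ERLE estimate fluctuates around a single threshold. Other monotone mappings would fit the description equally well.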
[0027] As illustrated in FIG. 7, an exemplary embodiment of the
present invention may employ a gain adjuster 706 to help ensure
that reconstructed signal $\tilde{s}(n)$ at line 708 is synthesized
at the correct level. The gain adjuster makes use of a background
noise power $\sigma_w^2$ measurement 707. An exemplary embodiment
may calculate the background noise power $\sigma_w^2$ when the Sin
port of the EC receives only background noise and no speech signal,
according to the first-order recursive average

$$\sigma_w^2(n) = \beta\,\sigma_w^2(n-1) + (1-\beta)\,w^2(n), \qquad 0 \ll \beta < 1,$$

whose effective memory is roughly $1/(1-\beta)$ samples.
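The recursive noise power estimate above, together with one possible gain adjuster, can be sketched as follows. The default beta = 0.995 is an illustrative choice. The text does not define G[.], so the block-power-matching gain below is a hypothetical stand-in: it scales a block so that its mean power equals the measured $\sigma_w^2$.

```python
import math


class NoisePowerTracker:
    """sigma_w^2(n) = beta * sigma_w^2(n-1) + (1 - beta) * w(n)^2.

    Call update() only on samples judged to be background noise alone.
    The default beta is an illustrative choice satisfying 0 << beta < 1.
    """

    def __init__(self, beta=0.995, initial=0.0):
        if not 0.0 < beta < 1.0:
            raise ValueError("beta must lie in (0, 1)")
        self.beta = beta
        self.power = initial

    def update(self, w):
        self.power = self.beta * self.power + (1.0 - self.beta) * w * w
        return self.power


def gain_adjust(block, target_power, eps=1e-12):
    """Hypothetical G[.]: scale a block so its mean power matches target_power."""
    p = sum(v * v for v in block) / len(block)
    g = math.sqrt(target_power / (p + eps))
    return [g * v for v in block]


tracker = NoisePowerTracker()
for _ in range(5000):
    tracker.update(0.5)                     # steady background of amplitude 0.5
print(tracker.power)                        # converges towards 0.25
out = gain_adjust([1.0, -1.0, 1.0, -1.0], tracker.power)
print(out)                                  # roughly [0.5, -0.5, 0.5, -0.5]
```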
[0028] FIG. 8 is a block diagram of modules implementing an
exemplary embodiment of the present invention. FIG. 8 illustrates
the concepts discussed thus far in taking an input speech signal
s(n), filtering it with an adaptive lattice filter, and
reconstructing and modifying it to yield a reconstructed signal
$\tilde{s}(n)$.
[0029] FIG. 9 is a flow chart illustrating a method for managing
speech signals according to an embodiment of the present invention.
Some of the steps illustrated in this figure may be performed in an
order other than that which is described. It should be appreciated
that not all of the steps described are required to be performed,
that additional steps may be added, and that some of the illustrated
steps may be substituted with other steps. This flow chart
illustrates how an exemplary embodiment of the present invention may
make use of double-talk detection and voice activity detection.
[0030] At step 902, double-talk in the incoming speech signal is
detected. If double-talk exists, step 905 is executed: an exemplary
embodiment of the present invention switches to a traditional NLP
with center-clipping and injects spectrally matched noise by passing
white noise through an inverse filter 1/L(z). The equation governing
this action is:

$$\tilde{s}(n) = \mathcal{C}[s(n)] + G\!\left[\frac{1}{L(z)}\,\epsilon(n)\right]$$

where $\mathcal{C}[\cdot]$ denotes a center-clipper, $\epsilon(n)$ is
a stationary noise source whose spectral envelope tracks that of the
background noise signal $w(n)$, and $G[\cdot]$ is a gain adjust
function. Alternatively, other embodiments of the present invention
may pass other types of noise through the inverse filter 1/L(z).
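The double-talk branch (step 905) can be sketched per frame as a center-clipped input plus gain-adjusted, lattice-shaped noise. Several parts of this sketch are illustrative assumptions: the hard-zero clipper, the white Gaussian noise used for $\epsilon(n)$, the fixed lattice coefficients, the clip threshold, and a gain that matches the injected noise to the measured background power over the frame.

```python
import math
import random


def center_clip(x, threshold):
    """Hard center-clipper C[.]: zero samples with |x| <= threshold (assumed form)."""
    return [0.0 if abs(v) <= threshold else v for v in x]


def lattice_synthesis(e, k):
    """All-pole lattice 1/L(z) with fixed coefficients k (k[m-1] = k_m)."""
    order = len(k)
    b_prev = [0.0] * (order + 1)
    out = []
    for x in e:
        f = x
        b_cur = [0.0] * (order + 1)
        for m in range(order, 0, -1):
            f -= k[m - 1] * b_prev[m - 1]                # f_{m-1}(n)
            b_cur[m] = b_prev[m - 1] + k[m - 1] * f      # b_m(n)
        b_cur[0] = f
        b_prev = b_cur
        out.append(f)
    return out


def double_talk_output(s, k, noise_power, clip_threshold, rng):
    """Block version of s~(n) = C[s(n)] + G[(1/L(z)) eps(n)].

    eps(n) is drawn as white Gaussian noise, and G[.] scales the shaped noise
    to the measured background power; both are illustrative assumptions.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in s]
    shaped = lattice_synthesis(eps, k)
    p = sum(v * v for v in shaped) / len(shaped)
    g = math.sqrt(noise_power / p) if p > 0 else 0.0
    clipped = center_clip(s, clip_threshold)
    return [c + g * q for c, q in zip(clipped, shaped)]


rng = random.Random(5)
frame = [0.01 * rng.gauss(0.0, 1.0) for _ in range(160)]   # quiet frame
frame[40] = 0.8                                             # one strong near-end sample
y = double_talk_output(frame, k=[-0.7, 0.2], noise_power=1e-4,
                       clip_threshold=0.05, rng=rng)
print(len(y), y[40])
```

Strong near-end samples survive the clipper, while low-level residual echo is replaced by noise at the background level.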
[0031] If at step 902 double-talk is not detected, an exemplary
embodiment executes step 903, where voice activity is detected. If
there is no voice activity, step 907 is executed, where a background
noise power measurement $\sigma_w^2$ of the speech signal is taken.
Next, step 908 is executed, where the coefficients of the adaptive
lattice filter are adjusted by updating the GAL algorithm. Finally,
step 909 is executed, and the input speech signal is left unchanged
by setting the reconstructed signal $\tilde{s}(n)$ to the input
speech signal s(n). Even though the input speech signal s(n) is
unchanged in this case, the background noise power $\sigma_w^2$ is
still measured and the GAL algorithm is still updated for later use,
such as when voice activity is detected on the incoming speech
signal.
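The branching of FIG. 9 can be summarized as a small per-frame decision function. This is a sketch under the assumption that double-talk detection and voice activity detection each yield one boolean per frame. The text does not say whether the noise estimate or the GAL is updated during double-talk, so the sketch assumes both are held.

```python
from dataclasses import dataclass
from enum import Enum


class Branch(Enum):
    CLIP_AND_INJECT = "step 905"                 # double-talk present
    MEASURE_AND_PASS = "steps 907-909"           # no double-talk, no voice activity
    LATTICE_NLP = "steps 910, 904, 906"          # no double-talk, voice activity


@dataclass(frozen=True)
class FrameDecision:
    branch: Branch
    update_noise_power: bool   # step 907
    adapt_lattice: bool        # step 908; frozen at step 910


def decide(double_talk: bool, voice_active: bool) -> FrameDecision:
    """Route a frame through the FIG. 9 decisions (DTD checked first, then VAD).

    The text does not say whether the noise estimate or the GAL is updated
    during double-talk; this sketch assumes both are held.
    """
    if double_talk:
        return FrameDecision(Branch.CLIP_AND_INJECT, False, False)
    if not voice_active:
        return FrameDecision(Branch.MEASURE_AND_PASS, True, True)
    return FrameDecision(Branch.LATTICE_NLP, False, False)


for dt, vad in ((True, True), (False, False), (False, True)):
    print(dt, vad, decide(dt, vad).branch.value)
```

Learning of both the noise power and the lattice coefficients happens only in the branch where the input is passed through unchanged. This matches the description of steps 907 to 909.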
[0032] If at step 902 double-talk is not detected and at step 903
voice activity is detected, an exemplary embodiment executes step
910, where the GAL algorithm is frozen (not updated), so the
coefficients of the adaptive lattice filter are not adjusted. Next,
step 904 is executed, where optimal mixing is computed using an ERLE
measurement 901 as input. Finally, step 906 is executed, which
includes the actions: (1) generating a prediction error s'(n) by
passing a residual echo of the speech signal s(n) through an
adaptive lattice filter L(z); (2) generating an excitation component
$s'_e(n)$ by passing the prediction error s'(n) through a signum
non-linearity unit; (3) mixing the excitation component $s'_e(n)$
with a stationary random noise signal q(n), where q(n) may be
obtained by passing a white noise source v(n) through an inverse
filter 1/L(z); (4) modifying the residual echo by passing the mixed
excitation signal through an inverse filter 1/L(z); and (5) further
modifying the residual echo by passing the output of the inverse
filter through a gain adjuster that adjusts a power of the modified
residual echo using a background noise power measurement. One
ordinarily skilled in the art will understand how a gain adjuster
adjusts a power of the modified residual echo using a background
noise power measurement. The equations governing these actions are:

$$
\begin{aligned}
s'(n) &= L(z)\, s(n) \\
q(n) &= \frac{1}{L(z)}\, v(n) \\
s'_e(n) &= \delta\, \mathrm{sgn}[s'(n)] + (1-\delta)\, q(n) \\
\tilde{s}(n) &= G\!\left[\frac{1}{L(z)}\, s'_e(n)\right]
\end{aligned}
$$

where $\delta$ is the mixing proportion computed at step 904 and
$G[\cdot]$ is a gain adjust function.
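The five actions of step 906 can be combined into one per-frame sketch with the lattice frozen at fixed coefficients. This sketch makes three illustrative assumptions, none of which is specified in the text. First, v(n) is white Gaussian noise. Second, q(n) is rescaled to unit power so that it is comparable with the +/-1 signum output. Third, G[.] matches the mean power of each output frame to the measured background power.

```python
import math
import random


def lattice_analysis(s, k):
    """L(z): fixed-coefficient FIR lattice, s(n) -> forward error f_P(n)."""
    order = len(k)
    b_prev = [0.0] * (order + 1)
    out = []
    for x in s:
        f, b_cur = x, [x] + [0.0] * order
        for m in range(1, order + 1):
            f_next = f + k[m - 1] * b_prev[m - 1]        # f_m(n)
            b_cur[m] = b_prev[m - 1] + k[m - 1] * f      # b_m(n)
            f = f_next
        b_prev = b_cur
        out.append(f)
    return out


def lattice_synthesis(e, k):
    """1/L(z): all-pole lattice driven by the excitation e(n)."""
    order = len(k)
    b_prev = [0.0] * (order + 1)
    out = []
    for x in e:
        f, b_cur = x, [0.0] * (order + 1)
        for m in range(order, 0, -1):
            f -= k[m - 1] * b_prev[m - 1]                # f_{m-1}(n)
            b_cur[m] = b_prev[m - 1] + k[m - 1] * f      # b_m(n)
        b_cur[0] = f
        b_prev = b_cur
        out.append(f)
    return out


def al_nlp_frame(s, k, delta, noise_power, rng):
    """One frame of step 906 with the lattice frozen at coefficients k.

    v(n) is white Gaussian, q(n) is rescaled to unit power so it is comparable
    with the +/-1 signum output, and G[.] matches the output block power to
    noise_power; all three choices are illustrative assumptions.
    """
    s_prime = lattice_analysis(s, k)                             # s'(n) = L(z) s(n)
    q = lattice_synthesis([rng.gauss(0.0, 1.0) for _ in s], k)   # q(n) = v(n)/L(z)
    q_rms = math.sqrt(sum(v * v for v in q) / len(q)) or 1.0     # guard all-zero q
    sgn = [1.0 if v >= 0 else -1.0 for v in s_prime]             # sgn[s'(n)]
    excitation = [delta * a + (1.0 - delta) * b / q_rms for a, b in zip(sgn, q)]
    y = lattice_synthesis(excitation, k)                         # (1/L(z)) s'_e(n)
    p = sum(v * v for v in y) / len(y)
    g = math.sqrt(noise_power / p) if p > 0 else 0.0             # G[.]
    return [g * v for v in y]


rng = random.Random(11)
k = [-0.6, 0.25]
frame = [rng.gauss(0.0, 1.0) for _ in range(160)]
out = al_nlp_frame(frame, k, delta=0.8, noise_power=1e-4, rng=random.Random(0))
print(sum(v * v for v in out) / len(out))    # matches noise_power (block-wise)
```

With delta = 1, scaling the input frame by any positive factor leaves the output unchanged, because only the signs of s'(n) reach the inverse filter. This is the amplitude-forcing property described for the signum stage.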
[0033] In the foregoing description, the invention is described
with reference to specific example embodiments thereof. It will,
however, be evident that various modifications and changes may be
made thereto, in software, hardware or any combination thereof,
without departing from the broader spirit and scope of the present
invention. The specification and drawings are accordingly to be
regarded in an illustrative rather than in a restrictive sense.