U.S. patent application number 15/388323 was filed with the patent office on 2016-12-22 and published on 2017-12-21 as publication number 2017/0365271 for automatic speech recognition de-reverberation.
The applicants listed for this patent are Adam Kupryjanow, Lukasz Kurylo, Piotr Lasota, and Przemyslaw Maziewski. The invention is credited to Adam Kupryjanow, Lukasz Kurylo, Piotr Lasota, and Przemyslaw Maziewski.
Application Number: 15/388323
Publication Number: 20170365271
Family ID: 60659998
Publication Date: 2017-12-21

United States Patent Application 20170365271
Kind Code: A1
Kupryjanow; Adam; et al.
December 21, 2017
AUTOMATIC SPEECH RECOGNITION DE-REVERBERATION
Abstract
System and techniques for automatic speech recognition
de-reverberation are described herein. A portion of an audio stream
may be obtained. Here, the portion of the audio stream is a proper
subset of the audio stream. A filter may be created by applying
Generalized Weighted Prediction Error (GWPE) to the portion of the
audio stream. The filter may be applied to the audio stream to
remove reverberation. The filtered version of the audio stream may
then be provided to an audio stream consumer.
Inventors: Kupryjanow; Adam (Gdansk, PL); Maziewski; Przemyslaw (Gdansk, PL); Kurylo; Lukasz (Gdansk, PL); Lasota; Piotr (Gdansk, PL)

Applicant:

Name | City | State | Country | Type
Kupryjanow; Adam | Gdansk | | PL |
Maziewski; Przemyslaw | Gdansk | | PL |
Kurylo; Lukasz | Gdansk | | PL |
Lasota; Piotr | Gdansk | | PL |

Family ID: 60659998
Appl. No.: 15/388323
Filed: December 22, 2016
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
62350507 | Jun 15, 2016 |
Current U.S. Class: 1/1
Current CPC Class: G10L 21/0216 20130101; H04R 31/006 20130101; G10L 15/20 20130101; G10L 15/22 20130101; G10L 21/0364 20130101; G10L 2021/02082 20130101; H04R 2201/401 20130101; G10L 15/30 20130101; G10L 25/21 20130101; H04R 2201/403 20130101; G10L 21/034 20130101; G10L 25/51 20130101; H04R 2420/07 20130101; G10L 2021/02166 20130101; H04R 1/04 20130101; H04R 3/005 20130101; H04R 1/2876 20130101; G10L 21/0232 20130101; H04R 1/406 20130101
International Class: G10L 21/0216 20130101 G10L021/0216
Claims
1. A system for automatic speech recognition de-reverberation, the
system comprising: a sampler to obtain a portion of an audio
stream, the portion of the audio stream being a proper subset of
the audio stream; a signal processor to create a filter by applying
Generalized Weighted Prediction Error (GWPE) to the portion of the
audio stream; a multiplexer to apply the filter to the audio stream
to remove reverberation; and an interlink to provide a filtered
version of the audio stream to an audio stream consumer.
2. The system of claim 1, wherein the signal processor is in a
first pipeline to create the filter and the multiplexer is in a
second pipeline to apply the filter, the first and second pipelines
arranged to execute in parallel.
3. The system of claim 1, wherein, to obtain the portion of the
audio stream, the sampler buffers the audio stream for a fixed time
period.
4. The system of claim 3, wherein the signal processor includes a
loop to repetitively create the filter with subsequent fixed time
periods.
5. The system of claim 3, wherein the fixed time period is an audio
frame.
6. The system of claim 1, wherein, to create the filter, the signal
processor combines a current GWPE application to the audio stream
with a previously created filter.
7. The system of claim 6, wherein, to combine the current GWPE
application to the audio stream with a previously created filter,
the signal processor adds the current GWPE application as a first
term to the previously created filter as a second term.
8. The system of claim 7, wherein, to combine the current GWPE
application to the audio stream with a previously created filter,
the signal processor applies a first scaling factor to the first
term and a second scaling factor to the second term prior to the
adding.
9. At least one machine readable medium including instructions for
automatic speech recognition de-reverberation, the instructions,
when executed by a machine, cause the machine to perform operations
comprising: obtaining a portion of an audio stream, the portion of
the audio stream being a proper subset of the audio stream;
creating a filter by applying Generalized Weighted Prediction Error
(GWPE) to the portion of the audio stream; applying the filter to
the audio stream to remove reverberation; and providing a filtered
version of the audio stream to an audio stream consumer.
10. The at least one machine readable medium of claim 9, wherein
creating the filter occurs in a first pipeline and applying the
filter occurs in a second pipeline, the first and second pipelines
executing in parallel on a device.
11. The at least one machine readable medium of claim 9, wherein
obtaining the portion of the audio stream includes buffering the
audio stream for a fixed time period.
12. The at least one machine readable medium of claim 11, wherein
the operations include repeating creating the filter with a
subsequent fixed time period.
13. The at least one machine readable medium of claim 11, wherein
the fixed time period is an audio frame.
14. The at least one machine readable medium of claim 9, wherein
creating the filter includes combining a current GWPE application
to the audio stream with a previously created filter.
15. The at least one machine readable medium of claim 14, wherein
combining the current GWPE application to the audio stream with a
previously created filter includes adding the current GWPE
application as a first term to the previously created filter as a
second term.
16. The at least one machine readable medium of claim 15, wherein
combining the current GWPE application to the audio stream with a
previously created filter includes applying a first scaling factor
to the first term and a second scaling factor to the second term
prior to the adding.
17. A method for automatic speech recognition de-reverberation, the
method comprising: obtaining a portion of an audio stream, the
portion of the audio stream being a proper subset of the audio
stream; creating a filter by applying Generalized Weighted
Prediction Error (GWPE) to the portion of the audio stream;
applying the filter to the audio stream to remove reverberation;
and providing a filtered version of the audio stream to an audio
stream consumer.
18. The method of claim 17, wherein creating the filter occurs in a
first pipeline and applying the filter occurs in a second pipeline,
the first and second pipelines executing in parallel on a
device.
19. The method of claim 17, wherein obtaining the portion of the
audio stream includes buffering the audio stream for a fixed time
period.
20. The method of claim 19, comprising repeating creating the
filter with a subsequent fixed time period.
21. The method of claim 19, wherein the fixed time period is an
audio frame.
22. The method of claim 17, wherein creating the filter includes
combining a current GWPE application to the audio stream with a
previously created filter.
23. The method of claim 22, wherein combining the current GWPE
application to the audio stream with a previously created filter
includes adding the current GWPE application as a first term to the
previously created filter as a second term.
24. The method of claim 23, wherein combining the current GWPE
application to the audio stream with a previously created filter
includes applying a first scaling factor to the first term and a
second scaling factor to the second term prior to the adding.
Description
CLAIM OF PRIORITY
[0001] This patent application claims the benefit of priority,
under 35 U.S.C. § 119, to U.S. Provisional Application Ser. No.
62/350,507, titled "FAR FIELD AUTOMATIC SPEECH RECOGNITION" and
filed on Jun. 15, 2016, the entirety of which is hereby
incorporated by reference herein.
TECHNICAL FIELD
[0002] Embodiments described herein generally relate to automatic
speech recognition (ASR) and more specifically to ASR
de-reverberation.
BACKGROUND
[0003] ASR involves a machine-based collection of techniques to
understand human languages. ASR is interdisciplinary, often
involving microphone, analog to digital conversion, frequency
processing, database, and artificial intelligence technologies to
convert the spoken word into textual or machine readable
representations of not only what was said (e.g., a transcript) but also
what was meant (e.g., semantic understanding) by a human speaker.
Far field ASR involves techniques to decrease a word error rate
(WER) in utterances made at a greater distance from a microphone, or
microphone array, than is traditionally accounted for in ASR
processing pipelines. Such distance often decreases the signal to
noise ratio (SNR) and thus increases WER in traditional ASR
systems. As used herein, far field ASR involves distances of more than
half a meter from the microphone.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] In the drawings, which are not necessarily drawn to scale,
like numerals may describe similar components in different views.
Like numerals having different letter suffixes may represent
different instances of similar components. The drawings illustrate
generally, by way of example, but not by way of limitation, various
embodiments discussed in the present document.
[0005] FIG. 1 is an example of a smart home gateway housing,
according to an embodiment.
[0006] FIG. 2 is a block diagram of an example high accuracy (HA)
real-time de-reverberation device, according to an embodiment.
[0007] FIG. 3 is a block diagram of an example low latency (LL)
real-time de-reverberation device, according to an embodiment.
[0008] FIG. 4 is a block diagram of an example low complexity and
high accuracy (LH) real-time de-reverberation device, according to
an embodiment.
[0009] FIG. 5 is an example of a method for automatic speech
recognition de-reverberation, according to an embodiment.
[0010] FIG. 6 is a block diagram illustrating an example of a
machine upon which one or more embodiments may be implemented.
DETAILED DESCRIPTION
[0011] Embodiments and examples herein generally describe a number
of systems, devices, and techniques for ASR de-reverberation. It is
understood, however, that the systems, devices, and techniques are
examples illustrating the underlying concepts.
[0012] Automatic speech recognition (ASR) often performs poorly if
the user (or other sound source) is changing positions in the
so-called far field (e.g., between one and five meters away from
the microphone array). For example, for distances changing from 0.5
to 5 m the ASR word error rate (WER) for traditional techniques
often grows from 4% to 18.5%. Due to issues of environmental
acoustic conditions (e.g., reverberation or noise) that are
exacerbated in far field ASR, near field (e.g., distances less than
one half meter) ASR processing techniques perform poorly in the far
field. Often far field conditions cause signals captured by
microphones to be reverberant. In fact, when dealing with the far
field ASR, reverberations may become the main parasitic factor
decreasing ASR performance.
[0013] Reverb removal is not a trivial task because its
characteristics depend on many factors, such as reverberation time
(RT), distance between the sound source (e.g., user) and the
microphone array, microphone array/device position, among other
factors. Accordingly, even if a reverberation characteristic is
known at one point in a room, for example, the characteristics may
not be helpful to remove reverberation when a signal is captured in
a different location (even if the location changed only a few
millimeters).
[0014] In a reverberant scenario, many near-field pre-processing
techniques that are used for acceptable ASR performance start to
fail. What is needed is a de-reverberation technique to facilitate
far field ASR performance improvements, such as, helping
beam-formers to work properly in the reverberant conditions.
[0015] A framework for reducing reverberation is the generalized
weighted prediction error (GWPE) de-reverberation technique.
Although GWPE may lead to large WER improvements, it is expensive
(e.g., in time, power, or device complexity) to compute. In tests,
GWPE's real-time factor (RTF) is greater than 2.8 for a modern
Intel® i7 processor on an eight-microphone array. Here, the RTF
is a ratio between the required time of computation and computed
signal duration. Thus, when the RTF is higher than 1.0, the
measured processing will not be real-time and the computation will
take longer than the signal lasts. Because GWPE does not have an
RTF less than or equal to one on most consumer devices, GWPE
generally cannot be used in real-time solutions.
[0016] To address the issues with GWPE in real-time processing,
several examples of systems and techniques are presented herein.
Examples are able to run in real-time with, for example, an eight
microphone array (such as that illustrated in FIG. 1). Some of
these techniques improve WER at the five meter distance from 17.2%
to 9.3% in clean conditions (e.g., without noises) and from 51.5%
to 33.2% in noisy conditions. Removing reverberation helps improve
ASR engine performance, beam-former techniques, and estimation of
a signal's source position, thus generally increasing the performance
of ASR systems.
[0017] The devices or systems described below allow processing of
a signal of eight or fewer channels in real-time (e.g., RTF<=1.0).
Further, in an example, some optimizations in the GWPE technique
result in reduced computational complexity, allowing wider use of
GWPE in a variety of devices with different computational
capabilities.
[0018] FIG. 1 is an example of a device 100 including a smart home
gateway housing 105, according to an embodiment. As illustrated,
the circles atop the housing are lumens 110 behind which are housed
microphones (as illustrated there are eight microphones). The
dashed lines illustrate microphones in a linear arrangement 115 as
well as in a circular arrangement 120. The device 100 includes a
sampler 125, a signal processor 130, a multiplexer 140, and an
interlink 145. In an example, the device includes a data store 135
to provide one or more buffers. All of these components are
implemented in electronic hardware, such as that described below
(e.g., circuits).
[0019] The sampler 125 is arranged to obtain a portion of an audio
stream. Here, the portion of the audio stream is a proper subset
of the audio stream. To obtain the portion, the sampler 125 may
receive the portion or may retrieve the portion from, for example,
a microphone. In an example, obtaining the portion of the audio
stream includes buffering the audio stream for a fixed time period.
In an example, the fixed time period (i.e., the buffer length) is a
second. In an example, the fixed time period is an audio frame. In
an example, the audio frame length is thirty two milliseconds. The
sampler 125 thus operates to obtain discrete audio samples.
[0020] The signal processor 130 is arranged to create a filter by
applying GWPE to the portion of the audio stream. In an example,
the signal processor 130 is arranged to calculate the frequency
domain data by applying an overlap and add (OLA) procedure combined
with a Fast Fourier Transform (FFT) prior to creating the filter. A
description of GWPE and its application in a number of examples is
provided below with respect to FIGS. 2-4. Thus, instead of
operating on an entire utterance, GWPE is applied only to the portion
obtained by the sampler 125 (e.g., a buffered signal). Reducing the
size of the buffered segment of the signal reduces the
computational complexity of traditional GWPE while still
maintaining enough de-reverberation performance to benefit ASR
processing. In an example, creating the filter occurs in a first
pipeline and applying the filter occurs in a second pipeline. In an
example, the first and second pipelines are arranged to execute in
parallel on the device 100. Parallel execution here means actual
simultaneous (e.g., not sequential as in a single core processor)
execution of the pipelines. In an example, the signal processor 130
is also arranged to repeatedly create and apply the filter with a
subsequent fixed time period (e.g., buffered portions of the signal).
Thus, the signal processor 130 creates and applies filters for
respective samples obtained by the sampler 125.
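As a minimal sketch of the buffering and analysis just described (not the patented implementation), the following Python assumes a 16 kHz sample rate, 32 ms frames with an 8 ms shift, a Hann analysis window, and a (channels, samples) input layout; the function name and array shapes are illustrative assumptions.

```python
import numpy as np

def analyze_portion(buffered, sample_rate=16000, frame_ms=32, hop_ms=8):
    """Split a buffered (channels, samples) portion into per-frame complex spectra."""
    frame_len = sample_rate * frame_ms // 1000          # 512 samples per 32 ms frame
    hop = sample_rate * hop_ms // 1000                  # 128 samples per 8 ms shift
    window = np.hanning(frame_len)
    n_frames = 1 + (buffered.shape[1] - frame_len) // hop
    bins = frame_len // 2 + 1                           # 257 frequency bins for a 512-sample frame
    spectra = np.empty((n_frames, bins, buffered.shape[0]), dtype=complex)
    for t in range(n_frames):
        frame = buffered[:, t * hop : t * hop + frame_len] * window  # window each channel
        spectra[t] = np.fft.rfft(frame, axis=1).T                    # (bins, channels)
    return spectra
```

Each buffered portion obtained by the sampler would be analyzed in this way before the GWPE filter is estimated on it.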
[0021] In an example, creating the filter includes the signal
processor 130 to combine a current GWPE application to the audio
stream with a previously created filter. Here, GWPE application
refers to the result of applying GWPE to the audio stream. In an
example, combining the current GWPE application to the audio stream
with a previously created filter includes adding the current GWPE
application as a first term to the previously created filter as a
second term. In an example, combining the current GWPE application
to the audio stream with a previously created filter includes
applying a first scaling factor to the first term and a second
scaling factor to the second term prior to the adding. In an
example, the second scaling factor is between zero and one. In an
example, the first scaling factor is one minus the second scaling
factor.
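A minimal sketch of this combining step, assuming the filter is stored as a complex array and that the second scaling factor `alpha` lies between zero and one (the exact value and array layout are illustrative assumptions):

```python
import numpy as np

def combine_filters(current_gwpe_filter, previous_filter, alpha=0.95):
    """Blend the filter from the current GWPE application with the previous filter.

    `alpha` scales the previously created filter (second term); `1 - alpha`
    scales the current GWPE application (first term), matching the
    complementary scaling factors described above.
    """
    if previous_filter is None:              # nothing to blend on the first buffer
        return np.asarray(current_gwpe_filter)
    return (1.0 - alpha) * np.asarray(current_gwpe_filter) + alpha * np.asarray(previous_filter)
```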
[0022] The data store 135 provides a buffer to introduce a delay to
the audio stream prior to applying the filter. This delay permits
the processing buffer to fill with the required amount of signal.
In an example, the delay is higher than 40 milliseconds and depends
inter alia on the processor compute power.
[0023] The multiplexer 140 is arranged to apply the filter to the
audio stream to remove reverberation. Thus, the multiplexer 140
accepts both the filter and the complex audio spectrum as inputs.
The multiplexer 140 then applies the filter to the audio stream,
calculates the inverse FFT (IFFT), and performs the OLA procedure to
produce a signal with some (or all) of the reverberations removed.
This signal may be referred to as a clean audio signal, filtered
audio signal, or the like.
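A minimal sketch of the multiplexer's role under the same assumptions as the earlier analysis sketch: a per-bin prediction filter (laid out here as taps*channels by channels per bin, with the tap count and delay as illustrative assumptions) is applied to the complex spectra, and the result is resynthesized with an IFFT and overlap-add; window normalization is omitted for brevity.

```python
import numpy as np

def remove_reverb(spectra, filters, delay=2, taps=4):
    """spectra: (frames, bins, channels); filters: (bins, taps*channels, channels)."""
    frames, n_bins, channels = spectra.shape
    clean = spectra.copy()
    for t in range(delay + taps - 1, frames):
        # stack the delayed past frames used to predict the late reverberation
        past = spectra[t - delay - taps + 1 : t - delay + 1][::-1]   # (taps, bins, channels)
        past = past.transpose(1, 0, 2).reshape(n_bins, -1)           # (bins, taps*channels)
        predicted = np.einsum('bk,bkc->bc', past, np.conj(filters))  # predicted late reverberation
        clean[t] = spectra[t] - predicted                            # subtract it from the current frame
    return clean

def synthesize(clean, frame_len=512, hop=128):
    """IFFT each frame and overlap-add back into a (channels, samples) signal."""
    frames, n_bins, channels = clean.shape
    out = np.zeros((channels, (frames - 1) * hop + frame_len))
    for t in range(frames):
        frame = np.fft.irfft(clean[t].T, n=frame_len)                # (channels, frame_len)
        out[:, t * hop : t * hop + frame_len] += frame
    return out
```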
[0024] The interlink 145 is arranged to provide a filtered version
of the audio stream to an audio stream consumer. Thus, the
interlink 145 interfaces the present pipeline to other processing
as part of a far field, or other, ASR system. Although far field
ASR may greatly benefit from the device 100, the device 100 may
also be helpful in other ASR situations, such as near field, to
reduce reverberations.
[0025] FIGS. 2-4 illustrate several examples of configurations for
a variety of the operations described above. These examples
describe changes to a traditional GWPE application to meet the
real-time processing used by many current devices while still
maintaining a high degree of reverberation removal for ASR
processing applications.
[0026] GWPE was originally designed in such a way that it required
a whole recording of an utterance to properly design the
de-reverberation filter. Thus, GWPE was not appropriate for
real-time signal processing because processing must begin prior to
an utterance's completion, denying the knowledge contained in the
end of the recording that was used in the traditional GWPE
implementation. To address the limitations of GWPE, three examples
are described below. The first example is here labeled high
accuracy (HA), the second example is here labeled low latency (LL),
and the third example is here labeled low-complexity with high accuracy
(LH). The first and third examples (HA and LH) estimate the
de-reverberation filter in parallel to the channel processing.
However, the role of the main thread is different between HA and
LH. LL performs all calculations in a single (e.g., main) thread
but uses estimates of the signal statistics instead of using the
signal's actual (e.g., real) values.
[0027] To illustrate how these three examples improve on GWPE, the
GWPE technique is introduced and then modifications to GWPE in each
of the examples are explained.
[0028] GWPE operates in the frequency domain. GWPE operates on a
signal with a sample rate of 16 kHz and uses 32 ms frames with an 8
ms shift. GWPE treats every frequency bin independently. Thus, GWPE
processes 257 independent frequency bins. As used herein, a
frequency bin is represented by l and the number of the frame (e.g.,
the index of the frame in a sample) by t. GWPE takes M channels as
input and provides M channels of output. GWPE is a blind
de-reverberation technique because all of the statistics needed by
GWPE are obtained directly from the input signal. The GWPE
de-reverberation operation is defined by the following
equation:
$$\hat{X}_l(t) = Y_l(t) - \sum_{\tau=\Delta}^{\Delta+K_l-1} \hat{G}^*_l(\tau)\, Y_l(t-\tau), \quad \forall t \in T \qquad (1)$$
where: [0029] $\hat{X}_l(t)$ is an estimate of a dry signal (e.g., without reverberations) for the time frame t from timespan T and frequency bin l; [0030] $Y_l(t)$ is the signal captured by the microphone array; [0031] $\hat{G}^*_l(\tau)$ is an estimate of the de-reverberation filter; [0032] $K_l$ is a filter length; [0033] $\Delta$ is a time delay; and [0034] $\hat{X}_l(t)$, $Y_l(t)$, and $\hat{G}^*_l(\tau)$ are vectors of length M.
[0035] $\hat{G}^*_l(\tau)$ may be estimated using the following: [0036] 1. initialization: $\hat{G}^*_l(\tau) = 0$ for all $\tau$ with $\Delta \le \tau \le (\Delta + K_l - 1)$. [0037] 2. de-reverberation using equation (1). [0038] 3. spatial correlation matrix estimation:

$$\Psi_l(t) = E\left(\hat{X}_l(t)\,\hat{X}^*_l(t)\right), \quad \forall t \in T \qquad (2)$$

[0039] 4. weighted correlation matrix/vector calculation:

$$\hat{R}_l = \sum_{t \in T} \psi_l(t-\Delta)\, \Psi_l(t)^{-1}\, \psi^*_l(t-\Delta) \qquad (3)$$

$$\hat{r}_l = \sum_{t \in T} \psi_l(t-\Delta)\, \Psi_l(t)^{-1}\, Y_l(t) \qquad (4)$$

[0040] 5. filter update:

$$\hat{G}_l = \hat{R}_l^{-1}\, \hat{r}_l \qquad (5)$$

[0041] 6. convergence check
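The following is a minimal sketch of one iteration loop of this estimation for a single frequency bin, with the tap count, delay, iteration count, and regularization chosen as illustrative assumptions; for brevity the spatial correlation matrix of equation (2) is approximated by the per-frame signal power, a common simplification, so this is a sketch of the procedure rather than the exact patented computation.

```python
import numpy as np

def estimate_filter_one_bin(Y, taps=4, delay=2, iterations=3, eps=1e-8):
    """Y: (frames, channels) complex observations for one bin -> (taps*channels, channels) filter."""
    frames, channels = Y.shape
    G = np.zeros((taps * channels, channels), dtype=complex)         # step 1: initialization
    psi = np.zeros((frames, taps * channels), dtype=complex)         # stacked delayed observations
    for t in range(delay + taps - 1, frames):
        psi[t] = Y[t - delay - taps + 1 : t - delay + 1][::-1].reshape(-1)
    for _ in range(iterations):                                      # step 6: fixed iteration count
        X = Y - psi @ np.conj(G)                                     # step 2: de-reverberation, equation (1)
        power = np.maximum(np.mean(np.abs(X) ** 2, axis=1), eps)     # step 3: per-frame power (simplified)
        weighted = psi.conj().T * (1.0 / power)                      # weight each frame's contribution
        R = weighted @ psi                                           # step 4: equation (3) analogue
        r = weighted @ Y                                             #         equation (4) analogue
        G = np.linalg.solve(R + eps * np.eye(R.shape[0]), r)         # step 5: equation (5)
    return G
```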
[0042] The complexity of the above technique is high. Given an
eight channel input signal, the filter length is four. With 257
frequency bins, every filter update involves solving 257 linear
equation systems, each of size 32 by 32 (the matrix R) and 32 by 8
(the vector r). From the perspective of signal quality, it is
important to update the filter frequently enough to keep up with
(e.g., align to) reverberation characteristic changes.
[0043] GWPE updates the filter for an entire utterance. Thus, GWPE
is inappropriate for real-time solutions due to:
[0044] the latency it will introduce; [0045] misalignment between
the input signal and the filter in a case when the user moves
while speaking the utterance; and [0046] resource requirements
related to the memory needed to buffer the signal.
[0047] FIG. 2 is a block diagram of an HA real-time
de-reverberation device 205, according to an embodiment. To address
some of the GWPE issues noted above, HA de-reverberation uses a
short time span for filter estimation. For example, the time for
estimation (T_est) may be equal to one second (1000 ms). Here, the
filter is updated every T_est ms.
[0048] The elements of the HA device 205 include a main thread 215
and a parallel thread 210. The main thread 215 includes a storage
device 235, a signal delay buffer 240, and a switcher 245. The
parallel thread 210 includes a signal statistics calculator 220, a
filter calculator 225, and a reverberation removal block 230. These
components are all implemented in electronic hardware (e.g.,
circuits). The operations of the components proceed as follows:
[0049] 1. The main thread 215 buffers the input signal in the
storage device 235 and delays the samples (signal delay buffer 240) at the beginning of
processing. [0050] 2. When the buffer is filled (e.g., when 1000 ms
of data has been collected), the GWPE procedure is performed in the
parallel thread 210 using the technique described above, including
calculating the signal statistics 220 and calculating the filter
225. [0051] 3. When the GWPE procedure is finished, the
reverberation removal block 230 provides a signal with the
reverberation removed to the switcher 245 to output. The filter
update (e.g., blocks 220 and 225) may then be started for a new
data chunk.
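A minimal sketch of this threading layout, assuming hypothetical helpers estimate_gwpe_filter and apply_filter and a simple queue hand-off between the main thread (blocks 235/240/245) and the parallel filter-estimation thread (blocks 220/225/230); the structure, not an exact implementation, is what is illustrated.

```python
import queue
import threading

chunk_queue = queue.Queue()      # filled T_est buffers handed to the parallel thread
latest_filter = None             # most recently estimated de-reverberation filter
filter_lock = threading.Lock()

def filter_estimation_worker():
    """Parallel thread: recompute the GWPE filter for each filled chunk."""
    global latest_filter
    while True:
        chunk = chunk_queue.get()                     # blocks until ~1000 ms of data is ready
        new_filter = estimate_gwpe_filter(chunk)      # hypothetical GWPE routine (blocks 220/225)
        with filter_lock:
            latest_filter = new_filter                # handed to the switcher/output path

def main_thread(delayed_chunks):
    """Main thread: buffer, delay, and output using the latest available filter."""
    threading.Thread(target=filter_estimation_worker, daemon=True).start()
    for chunk in delayed_chunks:                      # each chunk is a buffered, delayed T_est block
        chunk_queue.put(chunk)
        with filter_lock:
            current = latest_filter
        yield chunk if current is None else apply_filter(chunk, current)   # hypothetical helper
```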
[0052] Thus, the processing proceeds with the latest (e.g., last)
GWPE de-reverberation without imposing a delay for the entire
utterance. An advantage of this technique includes its high
accuracy of reverberation reduction. In an example, the delay
introduced is:
$$\Delta T = (1 + RTF)\, T_{est} \qquad (6)$$
where $T_{est}$ is the time span used for the filter estimation and
RTF is obtained from the GWPE implementation on the sample. In
experimental results, the RTF approached 0.3 for this example.
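For instance, with T_est equal to 1000 ms and the experimentally observed RTF of about 0.3, equation (6) gives ΔT = (1 + 0.3) × 1000 ms = 1300 ms of introduced delay.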
[0053] FIG. 3 is a block diagram of an example LL real-time
de-reverberation device 305, according to an embodiment. This
additional example is here called the low-latency real-time
de-reverberation. It uses an estimate of a weighted correlation
matrix and vector ($\hat{R}_l$ and $\hat{r}_l$) and does not wait
for filter convergence. The
device 305 includes a single thread 310 rather than the parallel
threads described with respect to the HA real-time
de-reverberation device 205 described above. The single thread
includes a signal statistics estimation block 315 rather than the
signal statistics calculation block described above, but otherwise
also includes a filter calculator 320 and reverberation removal
block 325. All of illustrated processing elements are implemented
in electronic hardware. Operations of the device 305 proceed as
follows: [0054] 1. initialization: $\hat{G}_l(\tau) = 0$ for all $\tau$ values with $\Delta \le \tau \le \Delta + K_l - 1$. [0055] 2. De-reverberation using equation (1) (e.g., block 325). [0056] 3. Spatial correlation matrix estimation using equation (2) (e.g., block 315). [0057] 4. Weighted correlation matrix/vector calculation (e.g., block 320):

$$\hat{R}_l(t) = \begin{cases} \sum_{\bar{t} \in T} \psi_l(\bar{t}-\Delta)\, \Psi_l(\bar{t})^{-1}\, \psi^*_l(\bar{t}-\Delta), & \text{for } t = T_{min} \\ \alpha\, \hat{R}_l(t-1) + (1-\alpha) \sum_{\bar{t} \in T} \psi_l(\bar{t}-\Delta)\, \Psi_l(\bar{t})^{-1}\, \psi^*_l(\bar{t}-\Delta), & \text{for } t > T_{min} \end{cases} \qquad (7)$$

$$\hat{r}_l(t) = \begin{cases} \sum_{\bar{t} \in T} \psi_l(\bar{t}-\Delta)\, \Psi_l(\bar{t})^{-1}\, Y_l(\bar{t}), & \text{for } t = T_{min} \\ \alpha\, \hat{r}_l(t-1) + (1-\alpha) \sum_{\bar{t} \in T} \psi_l(\bar{t}-\Delta)\, \Psi_l(\bar{t})^{-1}\, Y_l(\bar{t}), & \text{for } t > T_{min} \end{cases} \qquad (8)$$

[0058] 5. Filter update (e.g., block 320):

$$\hat{G}_l = \hat{R}_l^{-1}\, \hat{r}_l, \quad \text{for } t \ge T_{min}$$

where $\alpha$ is a smoothing factor (typically in a range from 0.9
to 0.999) and $T_{min}$ is an initialization time span (typically in a
range from 300 to 500 milliseconds (ms)). This technique may
introduce delay caused by the fast Fourier transform (FFT) and the
overlap-add (OLA) procedures. For example, the delay is 40 ms when
the frame size equals 32 ms and the frame shift is 8 ms. An
additional benefit of this example includes lower memory
requirements because calculating new values of $\hat{R}_l$ and
$\hat{r}_l$ requires only the last T signal frames to be buffered,
whereas the HA example may use T_est+T
frames to be buffered. Additionally, because the filter is updated
for every new frame, the de-reverberation filter's precision may be
better than in the HA example. Experimentally, a WER for
clean speech of 10.5% for the HA example was achieved and a WER of
9.3% for the LL example was achieved.
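A minimal per-bin sketch of this low-latency update, again using the per-frame power simplification and treating alpha, the tap count, and the array shapes as illustrative assumptions; the running statistics are smoothed as in equations (7) and (8) and the filter is re-solved for every frame once T_min has elapsed.

```python
import numpy as np

class LowLatencyBinFilter:
    """Per-frequency-bin LL de-reverberation state (sketch, not the patented code)."""

    def __init__(self, taps, channels, alpha=0.99, eps=1e-8):
        k = taps * channels
        self.R = np.zeros((k, k), dtype=complex)        # running weighted correlation matrix
        self.r = np.zeros((k, channels), dtype=complex) # running weighted correlation vector
        self.G = np.zeros((k, channels), dtype=complex) # current de-reverberation filter
        self.alpha, self.eps = alpha, eps

    def update(self, psi_t, y_t, past_t_min):
        """psi_t: stacked delayed frames (k,); y_t: current frame (channels,)."""
        x_t = y_t - np.conj(self.G).T @ psi_t                       # de-reverberate with the current filter
        w = 1.0 / max(np.mean(np.abs(x_t) ** 2), self.eps)          # per-frame power weighting (simplified)
        R_new = w * np.outer(psi_t.conj(), psi_t)
        r_new = w * np.outer(psi_t.conj(), y_t)
        if not past_t_min:                                          # t = T_min: take the fresh statistics
            self.R, self.r = R_new, r_new
        else:                                                       # t > T_min: exponential smoothing
            self.R = self.alpha * self.R + (1.0 - self.alpha) * R_new
            self.r = self.alpha * self.r + (1.0 - self.alpha) * r_new
        self.G = np.linalg.solve(self.R + self.eps * np.eye(self.R.shape[0]), self.r)
        return x_t
```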
[0059] FIG. 4 is a block diagram of an example LH real-time
de-reverberation device 405, according to an embodiment. This
example uses a foundation similar to the threading illustrated above
for the HA example (FIG. 2), but uses the LL
de-reverberation procedure albeit in a parallel thread. The delay
introduced by this example may be set arbitrarily depending on the
available computing power. Thus, the device 405 includes a main
thread 430 and a parallel filter estimation thread 410. The main
thread 430 includes a signal estimation block 435 which buffers
results in a storage device 425. The main thread includes another
storage area 445 (which may be the same physical device as storage
device 425 in an example) to buffer the audio signal for use by the
reverberation removal block 420 of the parallel thread 410.
Additionally, the main thread 430 may include a delay 440 block to
delay the audio signal into the switcher 450 to allow the filter to
be processed prior to outputting the filtered signal. The parallel
thread 410 includes a filter calculator 415, similar to that of
the LL example, which operates on the signal statistic estimates to
produce the de-reverberation filter. This then is provided to the
reverberation removal block 420 to perform the filtering.
[0060] The higher $\Delta T$ is set (e.g., the greater the time-frame
sample), the lower the compute complexity becomes, because the
delay will often define how frequently the filter is updated. Thus,
for $\Delta T = 0$, this example will be equivalent in compute
complexity to the LL example described with respect to FIG. 3.
[0061] FIG. 5 is an example of a method 500 for automatic speech
recognition de-reverberation, according to an embodiment. The
operations of the method 500 are implemented and executed upon
electronic hardware, such as that described above and below (e.g.,
circuits).
[0062] At operation 505, a portion of an audio stream is obtained.
In an example, the portion of the audio stream is a proper subset
of the audio stream. In an example, obtaining the portion of the
audio stream includes buffering the audio stream for a fixed time
period. In an example, the fixed time period is a second. In an
example, the fixed time period is an audio frame. In an example,
the audio frame length is thirty two milliseconds. In an example,
the method 500 may be extended to include repeatedly creating the
filter with a subsequent fixed time period.
[0063] At operation 510, a filter is created by applying GWPE to
the portion of the audio stream. In an example, creating the filter
occurs in a first pipeline and applying the filter occurs in a
second pipeline. In an example, the first and second pipelines
execute in parallel on a device.
[0064] In an example, creating the filter includes combining a
current GWPE application to the audio stream with a previously
created filter. In an example, combining the current GWPE
application to the audio stream with a previously created filter
includes adding the current GWPE application as a first term to the
previously created filter as a second term. In an example,
combining the current GWPE application to the audio stream with a
previously created filter includes applying a first scaling factor
to the first term and a second scaling factor to the second term
prior to the adding. In an example, the second scaling factor is
between zero and one. In an example, the first scaling factor is
one minus the second scaling factor.
[0065] At operation 515, the filter is applied to the audio stream
to remove reverberation from the audio stream to produce a filtered
version of the audio stream.
[0066] At operation 520, the filtered version of the audio stream
is provided to an audio stream consumer.
[0067] The operations of the method 500 may be optionally extended
to include introducing a delay to the audio stream prior to
applying the filter. In an example, the delay is 40
milliseconds.
[0068] FIG. 6 illustrates a block diagram of an example machine 600
upon which any one or more of the techniques (e.g., methodologies)
discussed herein may perform. In alternative embodiments, the
machine 600 may operate as a standalone device or may be connected
(e.g., networked) to other machines. In a networked deployment, the
machine 600 may operate in the capacity of a server machine, a
client machine, or both in server-client network environments. In
an example, the machine 600 may act as a peer machine in
peer-to-peer (P2P) (or other distributed) network environment. The
machine 600 may be a personal computer (PC), a tablet PC, a set-top
box (STB), a personal digital assistant (PDA), a mobile telephone,
a web appliance, a network router, switch or bridge, or any machine
capable of executing instructions (sequential or otherwise) that
specify actions to be taken by that machine. Further, while only a
single machine is illustrated, the term "machine" shall also be
taken to include any collection of machines that individually or
jointly execute a set (or multiple sets) of instructions to perform
any one or more of the methodologies discussed herein, such as
cloud computing, software as a service (SaaS), other computer
cluster configurations.
[0069] Examples, as described herein, may include, or may operate
by, logic or a number of components, or mechanisms. Circuitry is a
collection of circuits implemented in tangible entities that
include hardware (e.g., simple circuits, gates, logic, etc.).
Circuitry membership may be flexible over time and underlying
hardware variability. Circuitries include members that may, alone
or in combination, perform specified operations when operating. In
an example, hardware of the circuitry may be immutably designed to
carry out a specific operation (e.g., hardwired). In an example,
the hardware of the circuitry may include variably connected
physical components (e.g., execution units, transistors, simple
circuits, etc.) including a computer readable medium physically
modified (e.g., magnetically, electrically, moveable placement of
invariant massed particles, etc.) to encode instructions of the
specific operation. In connecting the physical components, the
underlying electrical properties of a hardware constituent are
changed, for example, from an insulator to a conductor or vice
versa. The instructions enable embedded hardware (e.g., the
execution units or a loading mechanism) to create members of the
circuitry in hardware via the variable connections to carry out
portions of the specific operation when in operation. Accordingly,
the computer readable medium is communicatively coupled to the
other components of the circuitry when the device is operating. In
an example, any of the physical components may be used in more than
one member of more than one circuitry. For example, under
operation, execution units may be used in a first circuit of a
first circuitry at one point in time and reused by a second circuit
in the first circuitry, or by a third circuit in a second circuitry
at a different time.
[0070] Machine (e.g., computer system) 600 may include a hardware
processor 602 (e.g., a central processing unit (CPU), a graphics
processing unit (GPU), a hardware processor core, or any
combination thereof), a main memory 604 and a static memory 606,
some or all of which may communicate with each other via an
interlink (e.g., bus) 608. The machine 600 may further include a
display unit 610, an alphanumeric input device 612 (e.g., a
keyboard), and a user interface (UI) navigation device 614 (e.g., a
mouse). In an example, the display unit 610, input device 612 and
UI navigation device 614 may be a touch screen display. The machine
600 may additionally include a storage device (e.g., drive unit)
616, a signal generation device 618 (e.g., a speaker), a network
interface device 620, and one or more sensors 621, such as a global
positioning system (GPS) sensor, compass, accelerometer, or other
sensor. The machine 600 may include an output controller 628, such
as a serial (e.g., universal serial bus (USB)), parallel, or other
wired or wireless (e.g., infrared (IR), near field communication
(NFC), etc.) connection to communicate or control one or more
peripheral devices (e.g., a printer, card reader, etc.).
[0071] The storage device 616 may include a machine readable medium
622 on which is stored one or more sets of data structures or
instructions 624 (e.g., software) embodying or utilized by any one
or more of the techniques or functions described herein. The
instructions 624 may also reside, completely or at least partially,
within the main memory 604, within static memory 606, or within the
hardware processor 602 during execution thereof by the machine 600.
In an example, one or any combination of the hardware processor
602, the main memory 604, the static memory 606, or the storage
device 616 may constitute machine readable media.
[0072] While the machine readable medium 622 is illustrated as a
single medium, the term "machine readable medium" may include a
single medium or multiple media (e.g., a centralized or distributed
database, and/or associated caches and servers) configured to store
the one or more instructions 624.
[0073] The term "machine readable medium" may include any medium
that is capable of storing, encoding, or carrying instructions for
execution by the machine 600 and that cause the machine 600 to
perform any one or more of the techniques of the present
disclosure, or that is capable of storing, encoding or carrying
data structures used by or associated with such instructions.
Non-limiting machine readable medium examples may include
solid-state memories, and optical and magnetic media. In an
example, a massed machine readable medium comprises a machine
readable medium with a plurality of particles having invariant
(e.g., rest) mass. Accordingly, massed machine-readable media are
not transitory propagating signals. Specific examples of massed
machine readable media may include: non-volatile memory, such as
semiconductor memory devices (e.g., Electrically Programmable
Read-Only Memory (EPROM), Electrically Erasable Programmable
Read-Only Memory (EEPROM)) and flash memory devices; magnetic
disks, such as internal hard disks and removable disks;
magneto-optical disks; and CD-ROM and DVD-ROM disks.
[0074] The instructions 624 may further be transmitted or received
over a communications network 626 using a transmission medium via
the network interface device 620 utilizing any one of a number of
transfer protocols (e.g., frame relay, internet protocol (IP),
transmission control protocol (TCP), user datagram protocol (UDP),
hypertext transfer protocol (HTTP), etc.). Example communication
networks may include a local area network (LAN), a wide area
network (WAN), a packet data network (e.g., the Internet), mobile
telephone networks (e.g., cellular networks), Plain Old Telephone
(POTS) networks, and wireless data networks (e.g., Institute of
Electrical and Electronics Engineers (IEEE) 802.11 family of
standards known as Wi-Fi.RTM., IEEE 802.16 family of standards
known as WiMax.RTM.), IEEE 802.15.4 family of standards,
peer-to-peer (P2P) networks, among others. In an example, the
network interface device 620 may include one or more physical jacks
(e.g., Ethernet, coaxial, or phone jacks) or one or more antennas
to connect to the communications network 626. In an example, the
network interface device 620 may include a plurality of antennas to
wirelessly communicate using at least one of single-input
multiple-output (SIMO), multiple-input multiple-output (MIMO), or
multiple-input single-output (MISO) techniques. The term
"transmission medium" shall be taken to include any intangible
medium that is capable of storing, encoding or carrying
instructions for execution by the machine 600, and includes digital
or analog communications signals or other intangible medium to
facilitate communication of such software.
Additional Notes & Examples
[0075] Example 1 is a system for automatic speech recognition
de-reverberation, the system comprising: a sampler to obtain a
portion of an audio stream, the portion of the audio stream being a
proper subset of the audio stream; a signal processor to create a
filter by applying Generalized Weighted Prediction Error (GWPE) to
the portion of the audio stream; a multiplexer to apply the filter
to the audio stream to remove reverberation; and an interlink to
provide a filtered version of the audio stream to an audio stream
consumer.
[0076] In Example 2, the subject matter of Example 1 optionally
includes wherein the signal processor is in a first pipeline to create the
filter and the multiplexer is in a second pipeline to apply the
filter, the first and second pipelines arranged to execute in
parallel.
[0077] In Example 3, the subject matter of any one or more of
Examples 1-2 optionally include wherein, to obtain the portion of
the audio stream, the sampler buffers the audio stream for a fixed
time period.
[0078] In Example 4, the subject matter of Example 3 optionally
includes wherein the fixed time period is a second.
[0079] In Example 5, the subject matter of any one or more of
Examples 3-4 optionally include wherein the signal processor
includes a loop to repetitively create the filter with subsequent
fixed time periods.
[0080] In Example 6, the subject matter of any one or more of
Examples 3-5 optionally include wherein the fixed time period is an
audio frame.
[0081] In Example 7, the subject matter of Example 6 optionally
includes wherein the audio frame length is thirty two
milliseconds.
[0082] In Example 8, the subject matter of any one or more of
Examples 1-7 optionally include wherein, to create the filter, the
signal processor combines a current GWPE application to the audio
stream with a previously created filter.
[0083] In Example 9, the subject matter of Example 8 optionally
includes wherein, to combine the current GWPE application to the
audio stream with a previously created filter, the signal processor
adds the current GWPE application as a first term to the previously
created filter as a second term.
[0084] In Example 10, the subject matter of Example 9 optionally
includes wherein, to combine the current GWPE application to the
audio stream with a previously created filter, the signal processor
applies a first scaling factor to the first term and a second
scaling factor to the second term prior to the adding.
[0085] In Example 11, the subject matter of Example 10 optionally
includes wherein the second scaling factor is between zero and one
and wherein the first scaling factor is one minus the second
scaling factor.
[0086] In Example 12, the subject matter of any one or more of
Examples 1-11 optionally include a buffer to introduce a delay to
the audio stream prior to applying the filter.
[0087] In Example 13, the subject matter of Example 12 optionally
includes wherein the delay is eight milliseconds.
[0088] Example 14 is at least one machine readable medium including
instructions for automatic speech recognition de-reverberation, the
instructions, when executed by a machine, cause the machine to
perform operations comprising: obtaining a portion of an audio
stream, the portion of the audio stream being a proper subset of
the audio stream; creating a filter by applying Generalized
Weighted Prediction Error (GWPE) to the portion of the audio
stream; applying the filter to the audio stream to remove
reverberation; and providing a filtered version of the audio stream
to an audio stream consumer.
[0089] In Example 15, the subject matter of Example 14 optionally
includes wherein creating the filter occurs in a first pipeline and
applying the filter occurs in a second pipeline, the first and
second pipelines executing in parallel on a device.
[0090] In Example 16, the subject matter of any one or more of
Examples 14-15 optionally include wherein obtaining the portion of
the audio stream includes buffering the audio stream for a fixed
time period.
[0091] In Example 17, the subject matter of Example 16 optionally
includes wherein the fixed time period is a second.
[0092] In Example 18, the subject matter of any one or more of
Examples 16-17 optionally include wherein the operations include
repeating creating the filter with a subsequent fixed time
period.
[0093] In Example 19, the subject matter of any one or more of
Examples 16-18 optionally include wherein the fixed time period is
an audio frame.
[0094] In Example 20, the subject matter of Example 19 optionally
includes wherein the audio frame length is thirty two
milliseconds.
[0095] In Example 21, the subject matter of any one or more of
Examples 14-20 optionally include wherein creating the filter
includes combining a current GWPE application to the audio stream
with a previously created filter.
[0096] In Example 22, the subject matter of Example 21 optionally
includes wherein combining the current GWPE application to the
audio stream with a previously created filter includes adding the
current GWPE application as a first term to the previously created
filter as a second term.
[0097] In Example 23, the subject matter of Example 22 optionally
includes wherein combining the current GWPE application to the
audio stream with a previously created filter includes applying a
first scaling factor to the first term and a second scaling factor
to the second term prior to the adding.
[0098] In Example 24, the subject matter of Example 23 optionally
includes wherein the second scaling factor is between zero and one
and wherein the first scaling factor is one minus the second
scaling factor.
[0099] In Example 25, the subject matter of any one or more of
Examples 14-24 optionally include introducing a delay to the audio
stream prior to applying the filter.
[0100] In Example 26, the subject matter of Example 25 optionally
includes wherein the delay is eight milliseconds.
[0101] Example 27 is a device for automatic speech recognition
de-reverberation, the device comprising: means for obtaining a
portion of an audio stream, the portion of the audio stream being a
proper subset of the audio stream; means for creating a filter by
applying Generalized Weighted Prediction Error (GWPE) to the
portion of the audio stream; means for applying the filter to the
audio stream to remove reverberation; and means for providing a
filtered version of the audio stream to an audio stream
consumer.
[0102] In Example 28, the subject matter of Example 27 optionally
includes wherein the means for creating the filter occurs in a
first pipeline and the means for applying the filter occurs in a
second pipeline, the first and second pipelines executing in
parallel on the device.
[0103] In Example 29, the subject matter of any one or more of
Examples 27-28 optionally include wherein the means for obtaining
the portion of the audio stream includes means for buffering the
audio stream for a fixed time period.
[0104] In Example 30, the subject matter of Example 29 optionally
includes wherein the fixed time period is a second.
[0105] In Example 31, the subject matter of any one or more of
Examples 29-30 optionally include means for repeating creating the
filter with a subsequent fixed time period.
[0106] In Example 32, the subject matter of any one or more of
Examples 29-31 optionally include wherein the fixed time period is
an audio frame.
[0107] In Example 33, the subject matter of Example 32 optionally
includes wherein the audio frame length is thirty two
milliseconds.
[0108] In Example 34, the subject matter of any one or more of
Examples 27-33 optionally include wherein the means for creating
the filter includes means for combining a current GWPE application
to the audio stream with a previously created filter.
[0109] In Example 35, the subject matter of Example 34 optionally
includes wherein the means for combining the current GWPE
application to the audio stream with a previously created filter
includes means for adding the current GWPE application as a first
term to the previously created filter as a second term.
[0110] In Example 36, the subject matter of Example 35 optionally
includes wherein the means for combining the current GWPE
application to the audio stream with a previously created filter
includes means for applying a first scaling factor to the first
term and a second scaling factor to the second term prior to the
adding.
[0111] In Example 37, the subject matter of Example 36 optionally
includes wherein the second scaling factor is between zero and one
and wherein the first scaling factor is one minus the second
scaling factor.
[0112] In Example 38, the subject matter of any one or more of
Examples 27-37 optionally include wherein the means for applying
the filter to the audio stream includes means for introducing a
delay to the audio stream prior to applying the filter.
[0113] In Example 39, the subject matter of Example 38 optionally
includes wherein the delay is eight milliseconds.
[0114] Example 40 is a method for automatic speech recognition
de-reverberation, the method comprising: obtaining a portion of an
audio stream; the portion of the audio stream being a proper subset
of the audio stream; creating a filter by applying Generalized
Weighted Prediction Error (GWPE) to the portion of the audio
stream; applying the filter to the audio stream to remove
reverberation; and providing a filtered version of the audio stream
to an audio stream consumer.
[0115] In Example 41, the subject matter of Example 40 optionally
includes wherein creating the filter occurs in a first pipeline and
applying the filter occurs in a second pipeline; the first and
second pipelines executing in parallel on a device.
[0116] In Example 42, the subject matter of any one or more of
Examples 40-41 optionally include wherein obtaining the portion of
the audio stream includes buffering the audio stream for a fixed
time period.
[0117] In Example 43, the subject matter of Example 42 optionally
includes wherein the fixed time period is a second.
[0118] In Example 44, the subject matter of any one or more of
Examples 42-43 optionally include repeating creating the filter
with a subsequent fixed time period.
[0119] In Example 45, the subject matter of any one or more of
Examples 42-44 optionally include wherein the fixed time period is
an audio frame.
[0120] In Example 46, the subject matter of Example 45 optionally
includes wherein the audio frame length is thirty two
milliseconds.
[0121] In Example 47, the subject matter of any one or more of
Examples 40-46 optionally include wherein creating the filter
includes combining a current GWPE application to the audio stream
with a previously created filter.
[0122] In Example 48, the subject matter of Example 47 optionally
includes wherein combining the current GWPE application to the
audio stream with a previously created filter includes adding the
current GWPE application as a first term to the previously created
filter as a second term.
[0123] In Example 49, the subject matter of Example 48 optionally
includes wherein combining the current GWPE application to the
audio stream with a previously created filter includes applying a
first scaling factor to the first term and a second scaling factor
to the second term prior to the adding.
[0124] In Example 50, the subject matter of Example 49 optionally
includes wherein the second scaling factor is between zero and one
and wherein the first scaling factor is one minus the second
scaling factor.
[0125] In Example 51, the subject matter of any one or more of
Examples 40-50 optionally include introducing a delay to the audio
stream prior to applying the filter.
[0126] In Example 52, the subject matter of Example 51 optionally
includes wherein the delay is eight milliseconds.
[0127] Example 53 is a system comprising means to perform any of
the methods 40-52.
[0128] Example 54 is at least one machine readable medium including
instructions that, when executed by a machine, cause the machine to
perform any of methods 40-52.
[0129] Example 55 is at least one machine readable medium including
instructions for de-reverberation of an audio signal, the
instructions, when executed by a machine, causing the machine to
perform operations comprising: performing Generalized Weighted
Prediction Error (GWPE) in a first pipeline; and performing signal
processing in a second pipeline, the second pipeline and first
pipeline executing in parallel, the second pipeline applying the
output of the first pipeline to remove reverberation in an audio
signal processed by the second pipeline.
[0130] In Example 56, the subject matter of Example 55 optionally
includes buffering the audio signal in a buffer; providing contents
of the buffer every second to the first pipeline; and clearing the
buffer after providing the contents.
[0131] In Example 57, the subject matter of any one or more of
Examples 55-56 optionally include wherein the first pipeline
includes iteratively: calculating signal statistics; calculating a
de-reverb filter; and applying the de-reverb filter to remove
reverberation.
[0132] Example 58 is a method for de-reverberation of an audio
signal, the method comprising: performing Generalized Weighted
Prediction Error (GWPE) in a first pipeline; and performing signal
processing in a second pipeline, the second pipeline and first
pipeline executing in parallel, the second pipeline applying the
output of the first pipeline to remove reverberation in an audio
signal processed by the second pipeline.
[0133] In Example 59, the subject matter of Example 58 optionally
includes buffering the audio signal in a buffer; providing contents
of the buffer every second to the first pipeline; and clearing the
buffer after providing the contents.
[0134] In Example 60, the subject matter of any one or more of
Examples 58-59 optionally include wherein the first pipeline
iteratively includes: calculating signal statistics; calculating a
de-reverb filter; and applying the de-reverb filter to remove
reverberation.
[0135] Example 61 is a system comprising means to perform any of
the methods 58-60.
[0136] Example 62 is at least one machine readable medium including
instructions that, when executed by a machine, cause the machine to
perform any of methods 58-60.
[0137] Example 63 is a system for de-reverberation of an audio
signal, the system comprising: means for performing Generalized
Weighted Prediction Error (GWPE) in a first pipeline; and means for
performing signal processing in a second pipeline, the second
pipeline and first pipeline executing in parallel, the second
pipeline applying the output of the first pipeline to remove
reverberation in an audio signal processed by the second
pipeline.
[0138] In Example 64, the subject matter of Example 63 optionally
includes means for buffering the audio signal in a buffer; means
for providing contents of the buffer every second to the first
pipeline; and means for clearing the buffer after providing the
contents.
[0139] In Example 65, the subject matter of any one or more of
Examples 63-64 optionally include wherein the first pipeline
includes means for iteratively: calculating signal statistics;
calculating a de-reverb filter; and applying the de-reverb filter
to remove reverberation.
[0140] Example 66 is at least one machine readable medium including
instructions for de-reverberation of an audio signal, the
instructions, when executed by a machine, causing the machine to
perform operations comprising: estimating signal statistics for an
audio signal; performing Generalized Weighted Prediction Error
(GWPE) using the estimated signal statistics; estimating a spatial
correlation matrix; creating weighted matrix and vector inputs from
the spatial correlation matrix estimation; and updating a de-reverb
filter with the weighted matrix and vector.
[0141] In Example 67, the subject matter of Example 66 optionally
includes wherein the operations are performed inline to other audio signal processing.
[0142] In Example 68, the subject matter of any one or more of
Examples 66-67 optionally include wherein only one signal frame is
buffered at a time to input into the operations.
[0143] Example 69 is a method for de-reverberation of an audio
signal, the method comprising: estimating signal statistics for an
audio signal; performing Generalized Weighted Prediction Error
(GWPE) using the estimated signal statistics; estimating a spatial
correlation matrix; creating weighted matrix and vector inputs from
the spatial correlation matrix estimation; and updating a de-reverb
filter with the weighted matrix and vector.
[0144] In Example 70, the subject matter of Example 69 optionally
includes wherein the operations are performed inline to other audio signal processing.
[0145] In Example 71, the subject matter of any one or more of
Examples 69-70 optionally include wherein only one signal frame is
buffered at a time to input into the operations.
[0146] Example 72 is a system comprising means to perform any of
the methods 69-71.
[0147] Example 73 is at least one machine readable medium including
instructions that, when executed by a machine, cause the machine to
perform any of methods 69-71.
[0148] Example 74 is a system for de-reverberation of an audio
signal, the system comprising: means for estimating signal
statistics for an audio signal; means for performing Generalized
Weighted Prediction Error (GWPE) using the estimated signal
statistics; means for estimating a spatial correlation matrix;
means for creating weighted matrix and vector inputs from the
spatial correlation matrix estimation; and means for updating a
de-reverb filter with the weighted matrix and vector.
[0149] In Example 75, the subject matter of Example 74 optionally
includes wherein the operations are performed inline to other audio
signal processing.
[0150] In Example 76, the subject matter of any one or more of
Examples 74-75 optionally include wherein only one signal frame is
buffered at a time to input into the operations.
[0151] Example 77 is at least one machine readable medium including
instructions for de-reverberation of an audio signal, the
instructions, when executed by a machine, causing the machine to
perform operations comprising: performing de-reverb filter updating
in a first pipeline; and performing signal processing in a second
pipeline, the second pipeline and first pipeline executing in
parallel, the second pipeline applying the output of the first
pipeline to remove reverberation in an audio signal processed by
the second pipeline.
[0152] In Example 78, the subject matter of Example 77 optionally
includes wherein the de-reverb filter updating includes: estimating
signal statistics for the audio signal; performing Generalized
Weighted Prediction Error (GWPE) using the estimated signal
statistics; estimating a spatial correlation matrix; creating
weighted matrix and vector inputs from the spatial correlation
matrix estimation; and updating a de-reverb filter with the
weighted matrix and vector.
[0153] In Example 79, the subject matter of any one or more of
Examples 77-78 optionally include wherein only one signal frame is
buffered at a time to input into the first pipeline.
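As one hedged reading of Examples 77-79, the second pipeline may apply the most recent filter to each incoming frame as it arrives, so that only a single new frame is ever buffered while a short tap history is retained for the prediction. The generator below is a sketch under the same WPE-style assumptions as the previous example; get_current_filter is a hypothetical hook onto the first pipeline's latest output and is not named in the application.

    # Sketch only: streaming, frame-by-frame application of the latest
    # de-reverb filter in the second pipeline.  One new STFT frame is
    # processed at a time; a short history supplies the delayed taps.
    import numpy as np

    def dereverb_stream(frames, get_current_filter, delay: int = 3,
                        order: int = 10):
        """frames: iterable of (channels,) complex STFT vectors for one bin."""
        history = []                               # recent observed frames only
        for y_t in frames:
            G = get_current_filter()               # latest output of pipeline 1
            d_t = y_t
            n = len(history)
            if n >= delay + order:
                taps = history[n - delay - order + 1:n - delay + 1]
                x_t = np.concatenate(taps[::-1])   # most recent delayed tap first
                d_t = y_t - G.conj().T @ x_t       # subtract predicted reverberation
            history.append(y_t)                    # keep the observed frame
            history = history[-(delay + order):]   # bounded history, no growing buffer
            yield d_t

This is only one possible realization; the examples above require no more than that the two pipelines execute in parallel and that the second pipeline apply the first pipeline's output.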
[0154] Example 80 is a method for de-reverberation of an audio
signal, the method comprising: performing de-reverb filter updating
in a first pipeline; and performing signal processing in a second
pipeline, the second pipeline and first pipeline executing in
parallel, the second pipeline applying the output of the first
pipeline to remove reverberation in an audio signal processed by
the second pipeline.
[0155] In Example 81, the subject matter of Example 80 optionally
includes wherein the de-reverb filter updating includes: estimating
signal statistics for the audio signal; performing Generalized
Weighted Prediction Error (GWPE) using the estimated signal
statistics; estimating a spatial correlation matrix; creating
weighted matrix and vector inputs from the spatial correlation
matrix estimation; and updating a de-reverb filter with the
weighted matrix and vector.
[0156] In Example 82, the subject matter of any one or more of
Examples 80-81 optionally include wherein only one signal frame is
buffered at a time to input into the first pipeline.
[0157] Example 83 is a system comprising means to perform any of
the methods 80-82.
[0158] Example 84 is at least one machine readable medium including
instructions that, when executed by a machine, cause the machine to
perform any of methods 80-82.
[0159] Example 85 is a system for de-reverberation of an audio
signal, the system comprising: means for performing de-reverb
filter updating in a first pipeline; and means for performing
signal processing in a second pipeline, the second pipeline and
first pipeline executing in parallel, the second pipeline applying
the output of the first pipeline to remove reverberation in an
audio signal processed by the second pipeline.
[0160] In Example 86, the subject matter of Example 85 optionally
includes wherein the de-reverb filter updating includes: means for
estimating signal statistics for the audio signal; means for
performing Generalized Weighted Prediction Error (GWPE) using the
estimated signal statistics; means for estimating a spatial
correlation matrix; means for creating weighted matrix and vector
inputs from the spatial correlation matrix estimation; and means
for updating a de-reverb filter with the weighted matrix and
vector.
[0161] In Example 87, the subject matter of any one or more of
Examples 85-86 optionally include wherein only one signal frame is
buffered at a time to input into the first pipeline.
[0162] The above detailed description includes references to the
accompanying drawings, which form a part of the detailed
description. The drawings show, by way of illustration, specific
embodiments that may be practiced. These embodiments are also
referred to herein as "examples." Such examples may include
elements in addition to those shown or described. However, the
present inventors also contemplate examples in which only those
elements shown or described are provided. Moreover, the present
inventors also contemplate examples using any combination or
permutation of those elements shown or described (or one or more
aspects thereof), either with respect to a particular example (or
one or more aspects thereof), or with respect to other examples (or
one or more aspects thereof) shown or described herein.
[0163] All publications, patents, and patent documents referred to
in this document are incorporated by reference herein in their
entirety, as though individually incorporated by reference. In the
event of inconsistent usages between this document and those
documents so incorporated by reference, the usage in the
incorporated reference(s) should be considered supplementary to
that of this document; for irreconcilable inconsistencies, the
usage in this document controls.
[0164] In this document, the terms "a" or "an" are used, as is
common in patent documents, to include one or more than one,
independent of any other instances or usages of "at least one" or
"one or more." In this document, the term "or" is used to refer to
a nonexclusive or, such that "A or B" includes "A but not B," "B
but not A," and "A and B," unless otherwise indicated. In the
appended claims, the terms "including" and "in which" are used as
the plain-English equivalents of the respective terms "comprising"
and "wherein." Also, in the following claims, the terms "including"
and "comprising" are open-ended, that is, a system, device,
article, or process that includes elements in addition to those
listed after such a term in a claim is still deemed to fall within
the scope of that claim. Moreover, in the following claims, the
terms "first," "second," and "third," etc. are used merely as
labels, and are not intended to impose numerical requirements on
their objects.
[0165] The above description is intended to be illustrative, and
not restrictive. For example, the above-described examples (or one
or more aspects thereof) may be used in combination with each
other. Other embodiments may be used, such as by one of ordinary
skill in the art upon reviewing the above description. The Abstract
is to allow the reader to quickly ascertain the nature of the
technical disclosure and is submitted with the understanding that
it will not be used to interpret or limit the scope or meaning of
the claims. Also, in the above Detailed Description, various
features may be grouped together to streamline the disclosure. This
should not be interpreted as intending that an unclaimed disclosed
feature is essential to any claim. Rather, inventive subject matter
may lie in less than all features of a particular disclosed
embodiment. Thus, the following claims are hereby incorporated into
the Detailed Description, with each claim standing on its own as a
separate embodiment. The scope of the embodiments should be
determined with reference to the appended claims, along with the
full scope of equivalents to which such claims are entitled.
* * * * *