U.S. patent application number 13/955186 was filed with the patent office on 2013-07-31 and published on 2014-09-18 for apparatus and method for power efficient signal conditioning for a voice recognition system.
This patent application is currently assigned to Motorola Mobility LLC. The applicant listed for this patent is Motorola Mobility LLC. Invention is credited to Kevin J. Bastry, Joel A. Clark, Plamen A. Ivanov, Mark A. Jasiuk, Tenkasi V. Ramabadran, Jincheng Wu.
Application Number | 20140278393 13/955186 |
Document ID | / |
Family ID | 51531813 |
Filed Date | 2013-07-31 |
United States Patent
Application |
20140278393 |
Kind Code |
A1 |
Ivanov; Plamen A.; et al. |
September 18, 2014 |
Apparatus and Method for Power Efficient Signal Conditioning for a
Voice Recognition System
Abstract
A disclosed method includes monitoring an audio signal energy
level while having a plurality of signal processing components
deactivated and activating at least one signal processing component
in response to a detected change in the audio signal energy level.
The method may include activating and running a voice activity
detector on the audio signal in response to the detected change
where the voice activity detector is the at least one signal
processing component. The method may further include activating and
running the noise suppressor only if a noise estimator determines
that noise suppression is required. The method may activate and
run a noise type classifier to determine the noise type based on
information received from the noise estimator and may select a
noise suppressor algorithm, from a group of available noise
suppressor algorithms, where the selected noise suppressor
algorithm is the most power consumption efficient.
Inventors: |
Ivanov; Plamen A.; (Schaumburg, IL) ;
Bastry; Kevin J.; (Milwaukee, WI) ;
Clark; Joel A.; (Woodridge, IL) ;
Jasiuk; Mark A.; (Chicago, IL) ;
Ramabadran; Tenkasi V.; (Oswego, IL) ;
Wu; Jincheng; (Naperville, IL) |
|
Applicant: |
Name | City | State | Country | Type |
Motorola Mobility LLC | Libertyville | IL | US | |
Assignee: |
Motorola Mobility LLC
Libertyville
IL
|
Family ID: |
51531813 |
Appl. No.: |
13/955186 |
Filed: |
July 31, 2013 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61827797 | May 28, 2013 | |
61776793 | Mar 12, 2013 | |
61798097 | Mar 15, 2013 | |
Current U.S.
Class: |
704/233 |
Current CPC
Class: |
G10L 2021/02161
20130101; G10L 15/20 20130101; G10L 21/0216 20130101; G10L 21/0364
20130101; G10L 25/21 20130101; G10L 15/28 20130101 |
Class at
Publication: |
704/233 |
International
Class: |
G10L 15/20 20060101
G10L015/20 |
Claims
1. A method comprising: monitoring an audio signal energy level
with a plurality of signal processing components deactivated; and
activating at least one signal processing component of the
plurality of signal processing components in response to a detected
change in the audio signal energy level.
2. The method of claim 1, wherein activating at least one signal
processing component comprises: activating and running a voice
activity detector on the audio signal in response to the detected
change in the audio energy level, the voice activity detector being
one of the plurality of signal processing components.
3. The method of claim 2, further comprising: activating and
running a noise estimator in response to voice being detected in
the audio signal by the voice activity detector.
4. The method of claim 3, further comprising: determining, by the
noise estimator, that noise suppression is required for the audio
signal; and activating and running a noise suppressor on the audio
signal in response to the noise estimator determination.
5. The method of claim 3, further comprising: activating and
running a noise type classifier to determine a noise type based on
information received from the noise estimator; and selecting a
noise suppressor algorithm based on the determined noise type.
6. The method of claim 3, further comprising: determining, by the
noise estimator, that noise suppression is not required for the
audio signal; and performing voice recognition on the audio signal
without activating a noise suppressor.
7. The method of claim 1, further comprising: activating at least
one additional microphone to receive the audio signal in response
to the detected change in the audio signal energy level.
8. The method of claim 7, further comprising: deactivating the at
least one additional microphone and returning to a single
microphone configuration in response to voice not being detected in
the audio signal by a voice activity detector or a second detected
change in the audio signal energy level.
9. The method of claim 1, further comprising: calculating, by an
energy estimator, a long term energy baseline and a short term
deviation wherein monitoring the audio signal energy level is
performed by the energy estimator.
10. The method of claim 9, further comprising: buffering the audio
signal in response to a detected short term deviation.
11. An apparatus comprising: a noise suppressor; a voice activity
detector; and an energy estimator, operatively coupled to the voice
activity detector, the energy estimator operative to monitor an
audio signal energy level with at least the noise suppressor and
the voice activity detector deactivated and to activate at least
the voice activity detector in response to a detected change in the
audio signal energy level.
12. The apparatus of claim 11, further comprising: a noise
estimator, operatively coupled to the voice activity detector,
wherein the voice activity detector is operative to activate the
noise estimator in response to voice being detected in the audio
signal.
13. The apparatus of claim 12, further comprising: a buffer,
operatively coupled to the energy estimator, the buffer operative
to receive a buffer control signal from the energy estimator and to
buffer the audio signal in response to the buffer control signal,
the energy estimator operative to send the buffer control signal in
response to the detected change in the audio signal energy
level.
14. The apparatus of claim 13, further comprising: a switch,
operatively coupled to the noise suppressor and to the noise
estimator to receive a switch control signal, the switch operative
to change over between a noise suppressed audio signal output from
the noise suppressor and a buffered audio signal output from the
buffer, according to the switch control signal; and wherein the
noise estimator is operative to send the switch control signal.
15. The apparatus of claim 14, further comprising: a noise
suppressor algorithms selector, operatively coupled to the noise
estimator and to the noise suppressor, the noise suppressor
algorithms selector operative to activate and run the noise
suppressor in response to a noise estimator control signal sent
when the noise estimator determines that noise suppression is
required.
16. The apparatus of claim 15, further comprising: a noise type
classifier, operatively coupled to the noise estimator and to the
noise suppressor algorithms selector, the noise type classifier
operative to activate and run in response to a control signal from
the noise estimator, and operative to determine noise type based on
information received from the noise estimator.
17. The apparatus of claim 16, wherein the noise suppressor
algorithms selector is further operative to select a noise
suppressor algorithm based on the noise type determined by the
noise type classifier.
18. The apparatus of claim 14, where the noise estimator is further
operative to determine that noise suppression is not required and
send the switch control signal to change over from the noise
suppressed audio signal output from the noise suppressor to the
buffered audio signal output from the buffer.
19. The apparatus of claim 11, further comprising: a plurality of
microphones; and microphone configuration logic operative to turn
each microphone on or off; and wherein the energy estimator is
further operative to control the microphone configuration logic to
turn on at least one additional microphone in response to the
detected change in the audio signal energy level.
20. The apparatus of claim 19, wherein the voice activity detector
is operative to deactivate the at least one additional microphone
and return to a single microphone configuration in response to
voice not being detected in the audio signal by the voice activity
detector.
21. The apparatus of claim 14, further comprising: voice command
recognition logic, having an input operatively coupled to the
switch.
22. The apparatus of claim 21, further comprising: a transceiver
having an input operatively coupled to the switch.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority to U.S. Provisional
Patent Application No. 61/827,797, filed May 28, 2013, entitled
"APPARATUS AND METHOD FOR POWER EFFICIENT SIGNAL CONDITIONING IN A
VOICE RECOGNITION SYSTEM," and further claims priority to U.S.
Provisional Patent Application No. 61/798,097, filed Mar. 15, 2013,
entitled "VOICE RECOGNITION FOR A MOBILE DEVICE," and further
claims priority to U.S. Provisional Pat. App. No. 61/776,793, filed
Mar. 12, 2013, entitled "VOICE RECOGNITION FOR A MOBILE DEVICE,"
all of which are assigned to the same assignee as the present
application, and all of which are hereby incorporated by reference
herein in their entirety.
FIELD OF THE DISCLOSURE
[0002] The present disclosure relates generally to voice signal
processing and more particularly to voice signal processing for
voice recognition systems.
BACKGROUND
[0003] Mobile devices such as, but not limited to, mobile phones,
smart phones, personal digital assistants (PDAs), tablets, laptops
or other electronic devices, etc., increasingly include voice
recognition systems to provide hands free voice control of the
devices. Although voice recognition technologies have been
improving, accurate voice recognition remains a technical
challenge.
[0004] A particular challenge when implementing voice recognition
systems on mobile devices is that, as the mobile device moves or is
positioned in certain ways, the acoustic environment of the mobile
device changes accordingly thereby changing the sound perceived by
the mobile device's voice recognition system. Voice sound that may
be recognized by the voice recognition system under one acoustic
environment may be unrecognizable under certain changed conditions
due to mobile device motion or positioning. Various other
conditions in the surrounding environment can add noise, echo or
cause other acoustically undesirable conditions that also adversely
impact the voice recognition system.
[0005] More specifically, the mobile device acoustic environment
impacts the operation of signal processing components such as
microphone arrays, noise suppressors, echo cancellation systems and
signal conditioning that is used to improve voice recognition
performance. Such signal processing operations for voice
recognition improvement are not power efficient and increase the
drain on battery power. Because users expect voice recognition
systems to be available as needed, various voice recognition system
programs, processes or services may be required to run continuously
resulting in further increased power consumption.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] FIG. 1 is a schematic block diagram of an apparatus in
accordance with the embodiments.
[0007] FIG. 2 is a flow chart providing an example method of
operation of the apparatus of FIG. 1 in accordance with various
embodiments.
[0008] FIG. 3 is a flow chart showing a method of operation related
to voice signal detection in accordance with various
embodiments.
[0009] FIG. 4 is a flow chart showing a method of operation related
to selection of signal processing in accordance with various
embodiments.
[0010] FIG. 5 is a flow chart showing a method of operation in
accordance with various embodiments.
[0011] FIG. 6 is a flow chart showing a method of operation in
accordance with various embodiments.
DETAILED DESCRIPTION
[0012] Briefly, the disclosed embodiments detect when conditions
require the use of accurate, and thus less power efficient, signal
processing to assist in voice recognition. Such power intensive
signal processing is turned off or otherwise disabled to conserve
battery power for as long as possible. The disclosed embodiments
achieve a progressive increase of accuracy by running more
computationally efficient signal processing on fewer resources and
making determinations of when to invoke more sophisticated signal
processing based on detected changes of conditions. More
particularly, based on information obtained from signal
observations, decisions may be made to power-off hardware that is
not needed. In other words, when conditions improve from the
standpoint of voice recognition performance, the amount of signal
processing is ramped down which results in more efficient use of
resources and decreased battery power consumption.
[0013] Among other advantages of the disclosed embodiments, power
consumption is minimized by optimizing voice recognition system
operation in every software and hardware layer, including switching
off non-essential hardware, running power efficient signal
processing and relying on accurate, less power efficient signal
processing only when needed to accommodate acoustic environment
conditions.
[0014] A disclosed method of operation includes monitoring an audio
signal energy level with a plurality of signal processing
components deactivated, and activating at least one signal
processing component of the plurality of signal processing
components in response to a detected change in the audio signal
energy level. The method may further include activating and running
a voice activity detector on the audio signal in response to the
detected change in the audio energy level where the voice activity
detector is one of the signal processing components that is
otherwise kept deactivated. In one embodiment, a method of
operation includes monitoring an audio signal energy level while
having a noise suppressor deactivated to conserve battery power,
buffering the audio signal in response to a detected change or
increase in the audio energy level, activating and running a voice
activity detector on the audio signal in response to the detected
change or increase in the audio energy level and activating and
running a noise estimator in response to voice being detected in
the audio signal by the voice activity detector. In some
embodiments, the method may further include activating and running
the noise suppressor only if the noise estimator determines that
noise suppression is required. In some embodiments, the method may
further include activating and running a noise type classifier to
determine the noise type based on information received from the
noise estimator and selecting a noise suppressor algorithm, from a
group of available noise suppressor algorithms, based on the noise
type. The selected noise suppressor algorithm may also be selected
based on power consumption efficiency for the noise type. The
method may further include determining, by the noise estimator,
that noise suppression is not required, and performing voice
recognition on the buffered audio signal without activating the
noise suppressor.
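The staged activation described in this paragraph can be sketched roughly as follows. This is only an illustrative sketch, not the disclosed implementation: the function name `run_cascade`, the component labels, and the use of three boolean inputs to stand in for the decisions of the energy estimator, voice activity detector and noise estimator are all assumptions made for the example.

```python
from typing import List


def run_cascade(energy_changed: bool, voice_detected: bool,
                suppression_required: bool) -> List[str]:
    """Return the components activated for one audio event, in order.

    Hypothetical sketch: each boolean stands in for a decision that the
    text attributes to the energy estimator, the voice activity detector
    and the noise estimator respectively.
    """
    active: List[str] = []
    if not energy_changed:
        # No change in energy level: every component stays deactivated.
        return active
    # A detected change starts buffering and wakes the voice activity detector.
    active += ["buffer", "voice_activity_detector"]
    if not voice_detected:
        return active
    # Voice present: only now is the noise estimator activated.
    active.append("noise_estimator")
    if suppression_required:
        # The costlier stages run only when the noise estimator asks for them.
        active += ["noise_type_classifier", "noise_suppressor"]
    active.append("voice_recognition")
    return active
```

For instance, `run_cascade(True, True, False)` activates the buffer, the voice activity detector, the noise estimator and voice recognition, but leaves the noise type classifier and the noise suppressor off.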
[0015] The method may also include applying gain to the buffered
audio signal prior to performing voice recognition. The method may
include activating additional microphones to receive audio in
response to the detected increase in the audio energy level. The
method of operation may deactivate the additional microphones and
return to a single microphone configuration in response to voice
not being detected in the audio signal by the voice activity
detector. The energy estimator calculates a long term energy
baseline and a short term deviation from it, and monitors the audio
signal energy level while having a noise suppressor, or other
signal processing components, deactivated to conserve battery
power. The method of operation may include buffering the audio
signal in response to a detected short term deviation.
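One possible way to track a long term energy baseline and a short term deviation is with two exponential moving averages, as in the sketch below. The disclosure does not specify the estimator's arithmetic, so the class name `EnergyEstimator`, the smoothing factors, and the ratio threshold are assumptions chosen purely for illustration.

```python
class EnergyEstimator:
    """Hypothetical energy estimator using two exponential moving averages."""

    def __init__(self, long_alpha=0.01, short_alpha=0.5, threshold_ratio=4.0):
        self.long_alpha = long_alpha            # slow tracking: long term baseline
        self.short_alpha = short_alpha          # fast tracking: short term energy
        self.threshold_ratio = threshold_ratio  # assumed deviation threshold
        self.baseline = None
        self.short_term = None

    def update(self, frame):
        """Consume one non-empty frame of samples; return True on a deviation."""
        energy = sum(s * s for s in frame) / len(frame)
        if self.baseline is None:
            # First frame initializes both estimates.
            self.baseline = energy
            self.short_term = energy
            return False
        self.short_term += self.short_alpha * (energy - self.short_term)
        deviated = self.short_term > self.threshold_ratio * max(self.baseline, 1e-12)
        if not deviated:
            # Only adapt the baseline while the signal looks like background.
            self.baseline += self.long_alpha * (energy - self.baseline)
        return deviated
```

In this sketch, a `True` return value would correspond to the point at which the audio signal starts being buffered and the voice activity detector is activated.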
[0016] A disclosed apparatus in one embodiment includes a noise
suppressor, a voice activity detector and an energy estimator that
is operatively coupled to the voice activity detector. The energy
estimator is operative to monitor an audio signal energy level with
at least the noise suppressor and the voice activity detector
deactivated. Upon detecting a change in the audio signal energy
level, the energy estimator is operative to activate at
least the voice activity detector in response to the detected
change. In one embodiment, a disclosed apparatus includes voice
recognition logic, a noise suppressor operatively coupled to the
voice recognition logic, an energy estimator operative to monitor
an audio signal energy level while the noise suppressor is
deactivated to conserve battery power, and a voice activity
detector operatively coupled to the energy estimator. The voice
activity detector is operative to activate in response to a first
activation control signal from the energy estimator. A noise
estimator is operatively coupled to the voice activity detector.
The noise estimator is operative to activate in response to a
second activation control signal from the voice activity
detector.
[0017] In various embodiments, the apparatus may include a buffer
that is operatively coupled to the voice recognition logic and the
energy estimator. The buffer is operative to receive a control
signal from the energy estimator and to buffer the audio signal in
response to the control signal. The energy estimator may be further
operative to send the first activation control signal to the voice
activity detector in response to a detected change or increase in
the audio signal energy level. The voice activity detector is
operative to send the second activation control signal to the noise
estimator in response to detecting voice in the audio signal.
[0018] In various embodiments, the apparatus may include a switch
that is operatively coupled to the voice recognition logic, the
noise suppressor and the noise estimator. The noise estimator may
actuate the switch to switch the audio signal sent to the voice
recognition logic from a buffered audio signal to a noise
suppressed audio signal output by the noise suppressor. The
apparatus may further include a noise suppressor algorithms
selector, operatively coupled to the noise estimator and to the
noise suppressor. The noise suppressor algorithms selector is
operative to activate and run the noise suppressor in response to a
noise estimator control signal sent when the noise estimator
determines that noise suppression is required.
[0019] In some embodiments, the apparatus may further include a
noise type classifier, operatively coupled to the noise estimator
and to the noise suppressor algorithms selector. The noise type
classifier is operative to activate and run in response to a
control signal from the noise estimator, and is operative to
determine noise type based on information received from the noise
estimator. The noise suppressor algorithms selector may be further
operative to select a noise suppressor algorithm, from a group of
available noise suppressor algorithms, where the selected noise
suppressor algorithm is the most power consumption efficient for
the noise type. The noise estimator may also be operative to
determine that noise suppression is not required and actuate the
switch to switch the audio signal sent to the voice recognition
logic from a noise suppressed audio signal output by the noise
suppressor to a buffered audio signal.
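Selecting the most power efficient algorithm for a given noise type might look like the following sketch. The `Algorithm` record, the `power_mw` field, and the noise type labels and algorithm names used in the example are hypothetical; the disclosure does not define how algorithms or their power costs are represented.

```python
from typing import FrozenSet, Iterable, NamedTuple, Optional


class Algorithm(NamedTuple):
    """Hypothetical catalog entry for one noise suppressor algorithm."""
    name: str
    noise_types: FrozenSet[str]  # noise types the algorithm can handle
    power_mw: float              # assumed power cost, illustrative only


def select_algorithm(noise_type: str,
                     algorithms: Iterable[Algorithm]) -> Optional[Algorithm]:
    """Pick the lowest power algorithm that handles the given noise type."""
    candidates = [a for a in algorithms if noise_type in a.noise_types]
    if not candidates:
        return None  # nothing suitable in the catalog
    return min(candidates, key=lambda a: a.power_mw)
```

With a catalog holding a 5 mW algorithm for stationary noise and a 20 mW algorithm covering both stationary and babble noise, a "stationary" classification would select the cheaper entry and a "babble" classification would fall back to the more expensive one.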
[0020] In some embodiments, the apparatus includes a plurality of
microphones and microphone configuration logic that includes, among
other things, switch logic operative to turn each microphone on or
off. The energy estimator is further operative to control the
microphone configuration logic to turn on additional microphones in
response to a detected change or increase in the audio signal
energy level. The voice activity detector may be further operative
to deactivate the additional microphones and return to a single
microphone configuration, or to a low power mode of that
microphone, in response to voice not being detected in the audio
signal by the voice activity detector.
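The microphone switching behavior in this paragraph can be illustrated with a small sketch. The class `MicrophoneConfig` and its method names are assumptions; the microphone labels M1, M2 and M3, with M3 as the single always-on microphone, follow the FIG. 1 description.

```python
class MicrophoneConfig:
    """Hypothetical stand-in for the switch logic of the microphone configuration."""

    def __init__(self, names=("M1", "M2", "M3"), primary="M3"):
        if primary not in names:
            raise ValueError("primary microphone must be one of names")
        self.primary = primary
        # Default state: single microphone configuration.
        self.enabled = {name: name == primary for name in names}

    def on_energy_change(self):
        """Energy estimator detected a change: turn on the additional microphones."""
        for name in self.enabled:
            self.enabled[name] = True

    def on_no_voice(self):
        """Voice activity detector found no voice: return to a single microphone."""
        for name in self.enabled:
            self.enabled[name] = name == self.primary

    def active(self):
        """Return the names of the microphones that are currently on."""
        return [name for name, on in self.enabled.items() if on]
```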
[0021] Turning now to the drawings, FIG. 1 is a schematic block
diagram of an apparatus 100 which is a voice recognition system in
accordance with various embodiments. The apparatus 100 may be
incorporated into and used in various battery-powered electronic
devices that employ voice-recognition. That is, the apparatus 100
may be used in any of various mobile devices such as, but not
limited to, a mobile telephone, smart phone, camera, video camera,
tablet, laptop, audio recorder or some other battery-powered
electronic device, etc.
[0022] It is to be understood that the FIG. 1 schematic block
diagram is limited, for the purpose of clarity, to showing only
those components useful to describe the features and advantages of
the various embodiments, and to describe how to make and use the
various embodiments to those of ordinary skill. It is therefore to
be understood that various other components, circuitry, and devices
etc. may be present in order to implement an apparatus and that
those various other components, circuitry, devices, etc., are
understood to be present by those of ordinary skill. For example,
the apparatus may include inputs for receiving power from a power
source and a power bus that may be connected to a battery housed
within one of the various battery powered electronic devices such
as mobile devices, etc. to provide power to the apparatus 100 or to
distribute power to the various components of the apparatus
100.
[0023] Another example is that the apparatus 100 may also include
an internal communication bus, for providing operative coupling
between the various components, circuitry, and devices. The
terminology "operatively coupled" as used herein refers to coupling
that enables operational and/or functional communication and
relationships between the various components, circuitry, devices
etc. described as being operatively coupled and may include any
intervening items (i.e. buses, connectors, other components,
circuitry, devices etc.) used to enable such communication such as,
for example, internal communication buses such as data
communication buses or any other intervening items that one of
ordinary skill would understand to be present. Also, it is to be
understood that other intervening items may be present between
"operatively coupled" items even though such other intervening
items are not necessary to the functional communication facilitated
by the operative coupling. For example, a data communication bus
may be present in various embodiments and may provide data to
several items along a pathway along which two or more items are
operatively coupled, etc. Such operative coupling is shown
generally in FIG. 1 described herein.
[0024] In FIG. 1 the apparatus 100 may include a group of
microphones 110 that provide microphone outputs and that are
operatively coupled to microphone configuration logic 120. Although
the example of FIG. 1 shows three microphones, the embodiments are
not limited to three microphones and any number of microphones may
be used in the embodiments. It is to be understood that the group
of microphones 110 are shown using a dotted line in FIG. 1 because
the group of microphones 110 is not necessarily a part of the
apparatus 100. In other words, the group of microphones 110 may be
part of a mobile device or some other device into which the
apparatus 100 is incorporated. In that case, the apparatus 100 is
operatively coupled to the group of microphones 110, which are
located within the mobile device, via a suitable communication bus
or suitable connectors, etc., such that the group of microphones
110 are operatively coupled to the microphone configuration logic
120.
[0025] The microphone configuration logic 120 may include various
front end processing, such as, but not limited to, signal
amplification, analog-to-digital conversion/digital audio sampling,
echo cancellation, etc., which may be applied to the microphone M1,
M2, M3 outputs prior to performing additional, less power efficient
signal processing such as noise suppression. The microphone
configuration logic 120 may also include switch logic operatively
coupled to the group of microphones 110 and operative to respond to
control signals to turn each of microphones M1, M2 or M3 on or off
so as to save power consumption by not using the front end
processing of the microphone configuration logic 120 for those
microphones that are turned off. Additionally, in some embodiments,
the microphone configuration logic 120 may be operative to receive
control signals from other components of the apparatus 100 to
adjust front end processing parameters such as, for example,
amplifier gain.
[0026] The microphone configuration logic 120 is operatively
coupled to a history buffer 130, to provide the three microphone
outputs M1, M2 and M3 to the history buffer 130. Microphone
configuration logic 120 is also operatively coupled to an energy
estimator 140 and provides a single microphone output M3 to the
energy estimator 140. The energy estimator 140 is operatively
coupled to the history buffer 130 and to a voice activity detector
150. The energy estimator 140 provides a control signal 115 to the
history buffer 130, a control signal 117 to the voice activity
detector 150 and a control signal 121 to the microphone
configuration logic 120.
[0027] The voice activity detector 150 is also operatively coupled
to the microphone configuration logic 120 to receive the microphone
M3 output and to provide a control signal 123 to microphone
configuration logic 120. The voice activity detector 150 is further
operatively coupled to a signal-to-noise ratio (SNR) estimator 160
and provides a control signal 119. The signal-to-noise ratio (SNR)
estimator 160 is operatively coupled to the history buffer 130, a
noise type classifier 170, a noise suppressor algorithms selector
180, and a switch 195. In the various embodiments, the various
signal processing components such as the voice activity detector
150, SNR estimator 160, noise type classifier 170, noise suppressor
algorithms selector 180 and noise suppressor 190 are kept in a
deactivated state until needed and are progressively activated
according to various decisions which may also be made
progressively. Likewise, activated signal components are
progressively deactivated when no longer needed in accordance with
the embodiments.
[0028] The SNR estimator 160 receives a buffered voice signal 113
from the history buffer 130 and provides control signal 127 to the
switch 195, control signal 129 to noise type classifier 170, and
control signal 135 to the noise suppressor algorithms selector 180.
The noise type classifier 170 is operatively coupled to the history
buffer 130, the SNR estimator 160 and the noise suppressor
algorithms selector 180.
[0029] The noise type classifier 170 receives a buffered voice
signal 111 from the history buffer 130 and provides a control
signal 131 to the noise suppressor algorithms selector 180. The
noise suppressor algorithms selector 180 is operatively coupled to
the SNR estimator 160, the noise type classifier 170, the
microphone configuration logic 120, a noise suppressor 190 and
system memory 107. The noise suppressor algorithms selector 180
provides a control signal 125 to the microphone configuration logic
120 and a control signal 137 to a noise suppressor 190. The noise
suppressor algorithms selector 180 is also operatively coupled to
system memory 107 by a read-write connection 139.
[0030] The noise suppressor 190 receives the buffered voice signal
111 from the history buffer 130 and provides a noise suppressed
voice signal 133 to the switch 195. The noise suppressor 190 may
also be operatively coupled to system memory 107 by a read-write
connection 143 in some embodiments. The switch 195 is operatively
coupled to the noise suppressor 190 and to automatic gain control
(AGC) 105, and provides voice signal 141 to the AGC 105. Voice
command recognition logic 101 is operatively coupled to AGC 105 and
to the system control 103, which may be any type of voice
controllable system control depending on the mobile device such as,
but not limited to, a voice controlled dialer of a mobile
telephone, a video recorder system control, an application control
of a mobile telephone, smartphone, tablet, laptop, etc., or any
other type of voice controllable system control. The AGC 105
adjusts the voice signal 141 received from the switch 195 and
provides a gain adjusted voice signal 145 to the voice command
recognition logic 101. The voice command recognition logic 101
sends a control signal 147 to the system control 103 in response to
detected command words or command phrases received on the voice
signal 145. In some embodiments, a transceiver 197 may also be
present and may be operatively coupled to receive either the gain
adjusted voice signal 145 as shown, or to receive the voice signal
141. The transceiver 197 may be a wireless transceiver for wireless
communication using any wireless technology and may utilize the
received voice signal as an uplink (i.e. send) transmission portion
of a wireless duplex communication channel in embodiments where the
apparatus 100 is used in a mobile telephone or smartphone, etc.
In alternative embodiments, either the gain adjusted voice signal
145 or the voice signal 141 may also be provided to a transceiver
external to the apparatus 100 using appropriate connectivity
between the apparatus 100 and a device into which the apparatus 100
is incorporated. In some embodiments, the transceiver 197 may be
used to transmit voice commands to a remote voice command
recognition system.
[0031] The system memory 107 is a non-volatile, non-transitory
memory, and may be accessible by other components of the apparatus
100 for various settings, stored applications, etc. In some
embodiments system memory 107 may store a database of noise
suppression algorithms 109, which may be accessed by noise
suppressor algorithms selector 180, over read-write connection 139.
In some embodiments, the noise suppressor 190 accesses system memory
107 over read-write connection 143 and may retrieve selected noise
suppression algorithms from the database of noise suppression
algorithms 109 for execution.
[0032] The switch 195 is operative to respond to the control signal
127 from the SNR estimator 160, to switch its output voice signal
141 between the buffered voice signal 111 and the noise suppressor
190 noise suppressed voice signal 133. In other words, switch 195
operates as a changeover switch. The output voice signal 141 from
switch 195 is provided to the AGC 105.
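The changeover behavior of the switch, followed by gain adjustment, might be sketched as below. The function names are hypothetical, and the peak-normalizing gain rule with a gain cap is an assumption: the disclosure says only that the AGC adjusts the voice signal, not how.

```python
def changeover(use_suppressed, buffered, suppressed):
    """Pass through either the buffered or the noise suppressed signal."""
    return suppressed if use_suppressed else buffered


def apply_gain(frame, target_peak=0.5, max_gain=8.0):
    """Scale a frame so its peak approaches target_peak (assumed AGC rule)."""
    peak = max((abs(s) for s in frame), default=0.0)
    if peak == 0.0:
        return list(frame)  # silence or an empty frame: nothing to scale
    gain = min(target_peak / peak, max_gain)  # cap gain to avoid boosting noise
    return [s * gain for s in frame]
```

For example, `apply_gain(changeover(False, [0.1, -0.25], [0.0, 0.0]))` selects the buffered frame and scales it by a factor of 2 so that its peak magnitude becomes 0.5.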
[0033] The disclosed embodiments employ voice activity detector 150
to distinguish voice activity from noise and accordingly enable the
voice command recognition logic 101 and noise reduction as needed
to improve voice recognition performance. The embodiments also
utilize a low power noise estimator, SNR estimator 160, to
determine when to enable or disable noise reduction thereby saving
battery power. For example, under low noise conditions, the noise
reduction can be disabled accordingly. Also, some microphones may
be turned off during low noise conditions which also conserves
battery power.
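A decision of this kind could be based on a signal-to-noise ratio expressed in decibels, as in the sketch below. The function names and the 15 dB threshold are assumptions for illustration; the disclosure does not give a threshold or a formula for the SNR estimator.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in dB, computed from two power estimates."""
    if noise_power <= 0.0:
        return math.inf   # no measurable noise
    if signal_power <= 0.0:
        return -math.inf  # no measurable signal
    return 10.0 * math.log10(signal_power / noise_power)


def suppression_required(signal_power, noise_power, threshold_db=15.0):
    """Enable noise reduction only when the SNR falls below the assumed threshold."""
    return snr_db(signal_power, noise_power) < threshold_db
```

Under these assumptions, a signal power 100 times the noise power (about 20 dB) would leave noise reduction disabled, while a signal only twice as strong as the noise (about 3 dB) would enable it.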
[0034] Various actions may be triggered or invoked in the
embodiments based on voice activity or other criteria that
progressively ramp up the application of signal processing
requiring increased power consumption. For example, the voice
activity detector 150 may trigger operation of noise suppressor 190
or may send control signal 123 to the microphone configuration
logic 120 to increase front end processing gain, rather than invoke
the noise suppressor 190, initially for low noise conditions.
[0035] For a high noise environment, dual-microphone noise
reduction may be enabled. For low noise environments, a single
microphone may be used, and the energy estimator 140 may create a
long term energy base line from which rapid deviations will trigger
the noise suppressor 190 and voice activity detector (VAD) 150 to
analyze the voice signal and to decide when noise reduction should
be applied. For example, an absolute ambient noise measurement may
be used to decide if noise reduction should be applied and, if so,
the type of noise reduction best suited for the condition. That is,
because the noise suppressor algorithms selected will impact power
consumption, selectively running or not running certain noise
suppressor algorithms serves to minimize battery power
consumption.
[0036] Thus, the energy estimator 140 is operative to detect
deviations from a baseline that may be an indicator of voice being
present in a received audio signal, received, for example, from
microphone M3. If such deviations are detected, the energy
estimator 140 may send control signal 117 to activate VAD 150 to
determine if voice is actually present in the received audio
signal.
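One possible way to detect deviations from a long-term energy
baseline is sketched below. This is an illustration under stated
assumptions rather than the disclosed implementation: the class
name, the exponential smoothing factor, the decibel threshold and
the choice to freeze the baseline on a triggering frame are all
hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EnergyBaselineDetector:
    """Tracks a slow energy baseline and flags rapid upward deviations."""
    alpha: float = 0.01          # assumed smoothing factor for the baseline
    threshold_db: float = 6.0    # assumed trigger threshold above the baseline
    baseline: Optional[float] = None

    def update(self, frame: Sequence[float]) -> bool:
        """Return True if this frame deviates rapidly from the baseline."""
        # Mean-square energy of the frame; the small floor avoids log(0).
        energy = sum(s * s for s in frame) / len(frame) + 1e-12
        if self.baseline is None:
            self.baseline = energy   # the first frame seeds the baseline
            return False
        deviation_db = 10.0 * math.log10(energy / self.baseline)
        if deviation_db > self.threshold_db:
            return True              # possible voice onset; baseline untouched
        # Otherwise fold the frame slowly into the long-term baseline.
        self.baseline += self.alpha * (energy - self.baseline)
        return False
```

A True result here corresponds to the condition under which the
energy estimator would activate the VAD; a False result leaves all
other components deactivated.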
[0037] An example method of operation of the apparatus 100 may be
understood in view of the flowchart of FIG. 2. The method of
operation begins in operation block 201 which represents a default
state in which the microphone configuration logic 120 is controlled
to use a single microphone configuration in order to conserve
battery power. Any front end processing of the microphone
configuration logic 120 for other microphones of the group of
microphones 110 is therefore turned off. In operation block 203 the
energy estimator 140 determines an energy baseline. The energy
estimator 140 first calculates the signal level and a long-term power
estimate, and then the short-term deviation from that long-term
baseline. A short-term deviation exceeding a threshold causes
additional microphones to be powered and the signals to be buffered.
[0038] Specifically, in decision block 205, the energy estimator
140 monitors the audio output from one microphone such as
microphone M3 and looks for changes in the audio signal energy
level. If an observed short-term deviation exceeds the threshold in
decision block 205, the energy estimator 140 sends control signal
121 to the microphone configuration logic 120 to turn on at least
one additional microphone as shown in operation block 207. In
operation block 213, the energy estimator 140 also sends control
signal 115 to history buffer 130 to invoke buffering of audio
signals from the activated microphones since the buffered audio may
need to have noise suppression applied in operation block 229.
Also, in operation block 209, energy estimator 140 sends control
signal 117 to VAD 150 to activate VAD 150 to determine if speech is
present in the M3 audio signal. If the observed short-term
deviation observed by the energy estimator 140 does not exceed the
threshold in decision block 205, the energy estimator 140 continues
to monitor the single microphone as in operation block 201. In
other words, the energy estimator 140 operates to monitor an audio
signal from at least one of the microphones while other signal
processing components remain deactivated. A deactivated signal
processing component is one that is powered down or placed in a low
power mode such as a sleep state where the signal processing
component operates with either no power consumption or with reduced
power consumption. The signal processing component is therefore
activated when it is either powered up or is awakened from a low
power mode such as a sleep state.
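The staged activation of decision block 205 may be sketched as
follows. The Component class and the function on_energy_check are
hypothetical names introduced for illustration, and each method
call is only an assumed stand-in for the corresponding control
signal.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A signal processing block that is either deactivated or active."""
    name: str
    active: bool = False   # deactivated: powered down or in a sleep state

    def activate(self) -> None:
        self.active = True

    def deactivate(self) -> None:
        self.active = False


def on_energy_check(deviation: float, threshold: float,
                    extra_mics: Component, history_buffer: Component,
                    vad: Component) -> bool:
    """Wake the next stage only when the deviation exceeds the threshold."""
    if deviation <= threshold:
        return False              # keep monitoring the single microphone
    extra_mics.activate()         # stands in for control signal 121
    history_buffer.activate()     # stands in for control signal 115
    vad.activate()                # stands in for control signal 117
    return True
```

Nothing downstream of the energy check is touched unless the
threshold is exceeded, which is the property that keeps the default
state at low power.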
[0039] In decision block 211, if the VAD 150 does not detect
speech, the VAD 150 sends control signal 123 to the microphone
configuration logic 120 and returns the system to a lower power
state. For example, in operation block 231, the control signal 123
may turn off any additional microphones so that only a single
microphone is used. If voice (i.e. speech activity) is detected in
decision block 211, then VAD 150 sends control signal 119 to
activate SNR estimator 160. In operation block 215, the SNR
estimator 160 proceeds to estimate short-term signal-to-noise ratio
and signal levels in order to determine if de-noising is
needed.
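A short-term signal-to-noise ratio estimate of the kind described
above may be sketched as follows. The sketch assumes that a
noise-only frame is available (for example, a frame in which the
VAD found no speech) and that speech power can be approximated by
subtracting noise power from total power; the function names and
the 15 dB decision threshold are hypothetical.

```python
import math
from typing import Sequence


def mean_power(frame: Sequence[float]) -> float:
    """Mean-square power of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def estimate_snr_db(speech_frame: Sequence[float],
                    noise_frame: Sequence[float]) -> float:
    """Estimate SNR in dB from a speech-plus-noise frame and a noise frame."""
    noise = mean_power(noise_frame) + 1e-12
    # Approximate speech power as total power minus noise power, floored
    # at a tiny positive value so that the logarithm stays defined.
    signal = max(mean_power(speech_frame) - noise, 1e-12)
    return 10.0 * math.log10(signal / noise)


def needs_denoising(snr_db: float, min_snr_db: float = 15.0) -> bool:
    """Assumed decision rule: de-noise only when SNR is below the threshold."""
    return snr_db < min_snr_db
```

When needs_denoising returns False, the noise suppressor would
remain deactivated and the buffered voice signal would be passed on
unchanged.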
[0040] If noise reduction is not needed in decision block 217, the
SNR estimator 160 may send control signal 127 to the switch 195 to
maintain the apparatus 100 in a low power state, i.e. bypassing and
not using the noise suppressor 190. The apparatus 100 may also be
returned to a single microphone mode of operation. For example, the
noise suppressor algorithms selector 180 may send control signal
125 to the microphone configuration logic 120 to switch off any
additional microphones. In operation block 219, the voice signal
141 is provided to the AGC 105 and is gained up to obtain the level
required and the gain adjusted voice signal 145 is sent to the
voice command recognition logic 101. In operation block 221, the
voice command recognition logic 101 operates on the gain adjusted
voice signal 145 and, if command words or command phrases are
detected, may send control signal 147 to the system control 103.
The method of operation then ends. If noise
reduction is determined to be necessary by the SNR estimator 160 in
decision block 217, then the SNR estimator 160 sends control signal
129 to activate noise type classifier 170 as shown in operation
block 223.
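The gain adjustment of operation block 219 may be illustrated by
the following simplified sketch. It assumes a single gain per frame
that brings the frame to a target RMS level, with a cap on the
gain; a practical AGC would typically smooth the gain over time.
The function names, the target level and the gain cap are
hypothetical.

```python
import math
from typing import List, Sequence


def agc_gain(frame: Sequence[float], target_rms: float = 0.1,
             max_gain: float = 20.0) -> float:
    """Gain that brings the frame to target_rms, capped at max_gain."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    if rms == 0.0:
        return 1.0   # silent frame: leave it unchanged
    return min(target_rms / rms, max_gain)


def apply_agc(frame: Sequence[float], target_rms: float = 0.1,
              max_gain: float = 20.0) -> List[float]:
    """Return the gain adjusted frame."""
    gain = agc_gain(frame, target_rms, max_gain)
    return [gain * s for s in frame]
```

Applying gain in this way is much cheaper than running a noise
suppressor, which is why it is the preferred path when noise
reduction is not needed.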
[0041] In operation block 223, the noise type classifier 170
receives the buffered voice signal 111, and may also receive
signal-to-noise ratio information from SNR estimator 160 via
control signal 129. The noise type classifier 170 assigns a noise
type and sends the noise type information by control signal 131 to
noise suppressor algorithms selector 180. The noise suppressor
algorithms selector 180 may also receive information from SNR
estimator 160 via control signal 135. In operation block 225, the
noise suppressor algorithms selector 180 proceeds to select an
appropriate noise suppressor algorithm for the observed conditions
(i.e. observed SNR and noise type). This may be accomplished, in
some embodiments, by accessing system memory 107 over read-write
connection 139. The system memory 107 may store the database of
noise suppression algorithms 109 and any other useful information
such as an associated memory table that can be used to compare
observed SNR and noise types to select a suitable noise suppression
algorithm. The noise suppressor algorithms selector 180 may then
send control signal 137 to activate noise suppressor 190 and to
provide a pointer to the location in system memory 107 of the
selected noise suppression algorithm. In operation block 227, the
noise suppressor algorithms selector 180 may also send control
signal 125 to the microphone configuration logic to make any
adjustments that might be needed in relation to the selected noise
suppressor algorithm.
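The table lookup performed by the noise suppressor algorithms
selector may be sketched as follows. The table entries, algorithm
names, noise type labels, SNR limits and power figures are all
hypothetical placeholders; the sketch only illustrates choosing,
among the algorithms suited to the observed conditions, the one
with the lowest power cost.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AlgorithmEntry:
    name: str               # hypothetical algorithm identifier
    noise_types: frozenset  # noise types the algorithm is suited to
    min_snr_db: float       # lowest SNR at which it is assumed to cope
    power_cost: float       # assumed relative power consumption
    microphones: int        # microphones the algorithm requires


# Placeholder table standing in for the database of algorithms.
TABLE = (
    AlgorithmEntry("light_single_mic", frozenset({"stationary"}),
                   10.0, 2.0, 1),
    AlgorithmEntry("medium_single_mic", frozenset({"stationary", "babble"}),
                   5.0, 4.0, 1),
    AlgorithmEntry("dual_mic", frozenset({"stationary", "babble", "wind"}),
                   -5.0, 12.0, 2),
)


def select_algorithm(noise_type: str, snr_db: float,
                     table: Sequence[AlgorithmEntry] = TABLE
                     ) -> Optional[AlgorithmEntry]:
    """Pick the cheapest entry suited to the observed noise type and SNR."""
    candidates = [e for e in table
                  if noise_type in e.noise_types and snr_db >= e.min_snr_db]
    return min(candidates, key=lambda e: e.power_cost, default=None)
```

The microphones field of the selected entry indicates the kind of
adjustment that might then be requested from the microphone
configuration logic.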
[0042] In operation block 229, the noise suppressor 190 may access
system memory 107 and the database of noise suppression algorithms
109 over read-write connection 143 to access the selected noise
suppression algorithm and execute it accordingly. The SNR estimator
160 will also send control signal 127 to switch 195 to switch to
receive the noise suppressed voice signal 133 output from noise
suppressor 190, rather than the buffered voice signal 111. In this case,
the noise suppressor 190 receives the buffered voice signal 111,
applies the selected noise suppression algorithm and provides the
noise suppressed voice signal 133 to switch 195. The method of
operation then again proceeds to operation block 219 where the
voice signal 141 is provided to the AGC 105 and is gained up to
obtain the level required and the gain adjusted voice signal 145 is
sent to the voice command recognition logic 101. In operation block
221, the voice command recognition logic 101 operates on the gain
adjusted voice signal 145 and the method of operation ends as
shown. The apparatus 100 may then return to single microphone
operation and the method of operation beginning at operation block
201 may continue.
[0043] Initially, in the embodiments, a noise suppressor algorithm
is invoked based on an attempt to determine the type of noise
present in the environment, that is, based on the noise type and the
signal-to-noise ratio. As the noise conditions worsen, different
noise suppressor algorithms can be used, with progressively
increased complexity and power consumption cost. As discussed above
with respect to decision block 211, the system returns to a low
power state after a negative VAD 150 decision or, in some
embodiments, after some time-out period.
[0044] In another embodiment, the apparatus 100 may run a
continuous single microphone powered, long-term noise
estimator/classifier which can store a set of noise estimates to be
used by the noise reduction system to help speed up convergence. In
yet another embodiment, a continuously run VAD may be employed to
look for speech activity. In both embodiments, the apparatus will
remain in an elevated power state, returning from voice recognition
invocation to VAD estimation.
[0045] It is to be understood that the various components,
circuitry, devices etc. described with respect to FIG. 1 including,
but not limited to, those described using the term "logic," such as
the microphone configuration logic 120, history buffer 130, energy
estimator 140, VAD 150, SNR estimator 160, noise type classifier
170, noise suppressor algorithms selector 180, noise suppressor
190, switch 195, AGC 105, voice command recognition logic 101, or
system control 103 may be implemented in various ways such as by
software and/or firmware executing on one or more programmable
processors such as a central processing unit (CPU) or the like, or
by ASICs, DSPs, FPGAs, hardwired circuitry (logic circuitry), or
any combinations thereof.
[0046] Also, it is to be understood that the various "control
signals" described herein with respect to FIG. 1 and the various
aforementioned components, may be implemented in various ways such
as using application programming interfaces (APIs) between the
various components. Therefore, in some embodiments, components may
be operatively coupled using APIs rather than a hardware
communication bus if such components are implemented by software
and/or firmware executing on one or more programmable processors.
For example, the noise suppressor algorithms selector 180 and the
noise suppressor 190 may be software and/or firmware executing on a
single processor and may communicate and interact with each other
using APIs.
[0047] Additionally, operations involving the system memory 107 may
be implemented using pointers where the components such as, but not
limited to, the noise suppressor algorithms selector 180 or the
noise suppressor 190, access the system memory 107 as directed by
control signals which may include pointers to memory locations or
database access commands that access the database of noise
suppression algorithms 109. In other words, such operations may be
accomplished in the various embodiments using application
programming interfaces (APIs).
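An API-based realization of a control signal may be sketched as
follows. The class and method names are hypothetical, a dictionary
stands in for the database of noise suppression algorithms, and a
dictionary key stands in for the pointer to the selected algorithm.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A noise suppression algorithm is modeled as a function on a frame.
Algorithm = Callable[[Sequence[float]], List[float]]


class NoiseSuppressor:
    def __init__(self, database: Dict[str, Algorithm]) -> None:
        self._database = database           # stand-in for the database
        self._algorithm: Optional[Algorithm] = None
        self.active = False

    def activate(self, algorithm_key: str) -> None:
        """API call standing in for the activating control signal."""
        # The key plays the role of the pointer to the selected algorithm.
        self._algorithm = self._database[algorithm_key]
        self.active = True

    def process(self, frame: Sequence[float]) -> List[float]:
        if not self.active or self._algorithm is None:
            raise RuntimeError("noise suppressor is not active")
        return self._algorithm(frame)


class AlgorithmSelector:
    def __init__(self, suppressor: NoiseSuppressor) -> None:
        self._suppressor = suppressor

    def choose(self, algorithm_key: str) -> None:
        # A direct method call replaces a hardware communication bus.
        self._suppressor.activate(algorithm_key)
```

With both components executing on a single processor, the method
call is the whole of the "control signal"; no separate hardware
connection is involved.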
[0048] Further methods of operation of various embodiments are
illustrated by the flowcharts of FIG. 3 and FIG. 4. FIG. 3 is a
flow chart showing a method of operation related to voice signal
detection in accordance with various embodiments. In operation
block 301, an apparatus uses a microphone signal level as a measure
to determine if pre-processing is needed. In operation block 303,
the apparatus runs a detector for energy deviations from a long
term base-line and invokes VAD/noise estimators to make decisions
as to when voice recognition logic should operate. In operation
block 305, the apparatus detects the need for signal conditioning
based on a low-power noise estimator (i.e. by running the noise
estimator only). In operation block 307, the apparatus uses a VAD
to distinguish voice activity from noise and to determine whether
or not to run noise suppression, or voice recognition, and runs one
or the other only when needed. In operation block 309, the
apparatus will classify the noise type, and based on noise type,
will invoke appropriate noise suppression or other appropriate
signal conditioning.
[0049] FIG. 4 is a flow chart showing a method of operation related
to selection of signal processing in accordance with various
embodiments. In operation block 401, the apparatus determines which
microphones are not needed (as well as any associated circuitry
such as amplifiers, A/D converters etc.) and turns off the
microphones (and any associated circuitry) accordingly. In
operation block 403, the apparatus uses a single microphone for
continuously running triggers/estimators. In operation block 405,
the apparatus uses an ultra-low-power microphone for monitoring
only (or uses lower power mode for one of the microphones). In
operation block 407, the apparatus stores data in a history buffer,
and when triggered processes only data in the history buffer,
rather than continuously. That is, the history buffer maintains an
audio signal of interest while decisions are made as to whether
voice is present in the audio signal and, subsequently, whether
further signal processing components should be invoked such as
noise suppression. If further signal processing components such as
the noise suppressor are not required, the buffered audio signal
may be sent directly to the voice command recognition logic 101. In
operation block 409, the apparatus uses no noise suppression (in
quiet conditions), single-microphone noise suppression (for example
in favorable SNR and noise types), or multiple-microphone noise
suppression, as per the conditions observed and only when needed. In
operation block 411, the apparatus determines signal level and SNR
dependency and maximizes gain in high SNR conditions (i.e. if
favorable conditions exist, it applies gain to boost the signal
rather than de-noising it). In operation block 413, the apparatus
uses voice recognition specially trained with a power-efficient
noise-reduction pre-processing algorithm, and runs the
power-efficient noise reduction front end on the portable device
(i.e. a mobile device in which the apparatus is incorporated). In
operation block 415, the
apparatus uses long-term noise estimates to configure apparatus
components such as voice recognition and signal conditioning
components, and uses the short-term estimate to select optimal
configurations and switch between those.
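The history buffer behavior of operation block 407 may be sketched
as a bounded buffer that keeps only the most recent frames and is
processed on demand. The class name, the capacity and the method
names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence


class HistoryBuffer:
    """Keeps the most recent audio frames; older frames are discarded."""

    def __init__(self, max_frames: int = 50) -> None:
        # A deque with maxlen drops the oldest frame when it is full.
        self._frames: Deque[List[float]] = deque(maxlen=max_frames)

    def push(self, frame: Sequence[float]) -> None:
        """Store one frame while decisions about the signal are pending."""
        self._frames.append(list(frame))

    def drain(self) -> List[List[float]]:
        """Hand over the buffered frames, oldest first, and empty the buffer."""
        frames = list(self._frames)
        self._frames.clear()
        return frames
```

Downstream processing is thus run only on the drained frames when a
trigger occurs, rather than continuously on the live signal.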
[0050] The flowcharts of FIG. 5 and FIG. 6 provide methods of
operation for the various embodiments described above. In FIG. 5,
operation block 501, an audio signal energy level is monitored
while having other signal processing components deactivated. In
operation block 503, at least one of the other signal processing
components is activated in response to a detected change in the
audio signal energy level. For example, if the energy level
changes, this may be an indication that a device operator is
speaking and attempting to command the device. In response, a VAD
may be activated as the at least one other signal processing
component in some embodiments. If the VAD detects the presence of
voice in the audio signal, further signal processing components,
such as a noise suppressor, may be activated. In another
embodiment, a noise estimator may be activated initially using the
assumption that voice is present in the audio signal.
[0051] The flowchart of FIG. 6 provides a method of operation where
a VAD is activated in response to changes in the audio signal level
as shown in operation block 601. Other signal processing components
are deactivated initially. In operation block 603, if voice is
detected by the VAD, other signal processing components are
activated in order to analyze the audio signal and determine if
noise suppression should be applied or not. Noise suppression is
then either applied, or not applied, accordingly. In operation
block 605, various audio signal processing components are either
activated or deactivated as audio signal conditions change or when
voice is no longer detected. For example, the apparatus may be
returned from a multi-microphone configuration to a single,
low-power microphone configuration and noise suppressors, etc. may
be deactivated.
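The progression of FIG. 6 may be viewed as a small state machine,
sketched below. The state names and the three boolean inputs are
hypothetical simplifications; in particular, the sketch assumes
that loss of voice always returns the apparatus directly to the
single-microphone monitoring state.

```python
from enum import Enum, auto


class State(Enum):
    MONITOR = auto()    # single low-power microphone, energy monitoring only
    VAD = auto()        # voice activity detector running
    ANALYZE = auto()    # noise analysis running, noise suppressor off
    SUPPRESS = auto()   # noise suppressor running


def next_state(state: State, energy_changed: bool = False,
               voice: bool = False, noisy: bool = False) -> State:
    """Advance one step; each later state costs more power than MONITOR."""
    if state is State.MONITOR:
        return State.VAD if energy_changed else State.MONITOR
    if not voice:
        return State.MONITOR   # voice absent or gone: power back down
    if state is State.VAD:
        return State.ANALYZE
    # ANALYZE or SUPPRESS: follow the current noise conditions.
    return State.SUPPRESS if noisy else State.ANALYZE
```

Each transition away from MONITOR activates additional components,
and each return to MONITOR deactivates them again.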
[0052] While various embodiments have been illustrated and
described, it is to be understood that the invention is not so
limited. Numerous modifications, changes, variations, substitutions
and equivalents will occur to those skilled in the art without
departing from the scope of the present invention as defined by the
appended claims.
* * * * *