U.S. patent application number 11/172045, for microphone initialization enhancement for speech recognition, was filed with the patent office on 2005-06-30 and published on 2006-01-05.
This patent application is currently assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Invention is credited to Adam Pieter De Leeuw, Steven Groeger, Stuart John Hayton.
Publication Number | 20060004573 |
Application Number | 11/172045 |
Family ID | 32843359 |
Filed Date | 2005-06-30 |
Publication Date | 2006-01-05 |
United States Patent Application | 20060004573 |
Kind Code | A1 |
De Leeuw; Adam Pieter; et al. | January 5, 2006 |
Microphone initialization enhancement for speech recognition
Abstract
A method and arrangement for improved speech recognition in a
telephonically challenging speakerphone in-car environment. The
method includes receiving a signal from a microphone representative
of speech to be recognised, performing detection of a transition in
the signal indicative of switch on of the microphone, and, in
response to the detection, performing speech recognition on the
signal with reduced contribution from an initial portion thereof.
The initial portion may be treated as optional speech, the speech
recognition may be performed with a predetermined redundant sound,
and a user may be requested to speak the predetermined redundant
sound when speech recognition has fallen below a predetermined
threshold. Thus, recognition may be made possible when it otherwise
would not be, and recognition match scoring is increased because the
low weighting caused by deleted initial sounds is eliminated, which
in turn reduces confusion of the recognised phrase.
Inventors: | De Leeuw; Adam Pieter (Southampton, GB); Groeger; Steven (Poole, GB); Hayton; Stuart John (Waterlooville, GB) |
Correspondence Address: | AKERMAN SENTERFITT, P. O. BOX 3188, WEST PALM BEACH, FL 33402-3188, US |
Assignee: | INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY |
Family ID: | 32843359 |
Appl. No.: | 11/172045 |
Filed: | June 30, 2005 |
Current U.S. Class: | 704/251; 704/E11.005; 704/E15.039 |
Current CPC Class: | G10L 25/87 20130101; G10L 15/20 20130101 |
Class at Publication: | 704/251 |
International Class: | G10L 15/04 20060101 G10L015/04 |
Foreign Application Data
Date | Code | Application Number |
Jul 1, 2004 | GB | 0414711.2 |
Claims
1. A method of speech recognition for use with a system having a
microphone, comprising: receiving a signal from the microphone
representative of speech to be recognized when the system is in a
state where the microphone has been turned off; performing
detection of a transition in the signal indicative of switch on of
the microphone; and in response to the detection performing speech
recognition on the signal with reduced contribution from an initial
portion thereof.
2. The method of claim 1, wherein the step of performing speech
recognition comprises treating the initial portion as optional
speech that is acceptable to have absent from a received utterance
within the signal.
3. The method of claim 2, wherein the initial portion comprises at
least one of a first word and a received and terminal portion of the
first word of a spoken series of words contained within said
signal.
4. The method of claim 3, further comprising the step of:
establishing an utterance initiating word for the system, wherein
said first word is said utterance initiating word.
5. The method of claim 1, wherein the step of performing speech
recognition comprises performing speech recognition with a
predetermined redundant sound.
6. The method of claim 5, wherein said predetermined redundant
sound is user configurable.
7. The method of claim 5 further comprising detecting when speech
recognition has fallen below a predetermined threshold, and in
response thereto requesting a user to speak the predetermined
redundant sound.
8. The method of claim 1, wherein the system is a telephone speaker
phone system.
9. The method of claim 8, wherein the system is an in-car
system.
10. The method of claim 1, further comprising: establishing a
configurable parameter for initial word recognition sensitivity,
wherein a quantity by which a contribution of the initial portion
is reduced is dependent upon a value of said configurable
parameter.
11. A speech recognition arrangement for use with a system having a
microphone, comprising: means for receiving a signal from the
microphone representative of speech to be recognized; means for
performing detection of a transition in the signal indicative of
switch on of the microphone; and means for performing, in response
to the detection, speech recognition on the signal with reduced
contribution from an initial portion thereof.
12. The arrangement of claim 11, wherein the means for performing
speech recognition comprises means for treating the initial portion
as optional speech.
13. The arrangement of claim 11, wherein the means for performing
speech recognition comprises performing speech recognition with a
predetermined redundant sound.
14. The arrangement of claim 13 further comprising means for
detecting when speech recognition has fallen below a predetermined
threshold, and in response thereto requesting a user to speak the
predetermined redundant sound.
15. The arrangement of claim 11, wherein the system is a telephone
speakerphone system.
16. The arrangement of claim 15, wherein the system is an in-car
system.
17. A speech recognition method comprising: a speech recognition
system receiving a speech input when the speech recognition system
is in a state where a microphone input is turned off; identifying
an initial portion of the speech input; speech recognizing said
speech input in a manner that the identified initial portion has a
reduced contribution compared to other portions of the speech
input.
18. The speech recognition method of claim 17, further comprising:
before said speech recognizing step, discarding said initial
portion of the speech input, wherein the initial portion has a
reduced contribution because it is not speech recognized.
19. The speech recognition method of claim 17, further comprising:
identifying the initial portion as a portion of the speech input
preceding a first detected pause in the speech input.
20. The speech recognition method of claim 17, wherein the initial
portion comprises at least one of a first spoken word within the
speech input and a received and terminal portion of the first
spoken word within the speech input.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of United Kingdom Patent
Application No. 0414711.2 filed Jul. 1, 2004.
FIELD OF THE INVENTION
[0002] This invention relates to speech recognition, and more
particularly to a microphone initialization enhancement for
automated speech recognition systems.
BACKGROUND OF THE INVENTION
[0003] In the field of this invention it is known that many speaker
phones and mobile phone car installations suppress the microphone
while sound is being played out in order to eliminate echo and
feedback. According to conventional teachings, the microphone
remains in an off state until a relatively high volume audio signal
is received. This microphone enablement technique causes the
microphone to remain in an off state when a user speaks in a low to
medium volume. Because the speaker phone or mobile kit does not
react until after a high energy audio signal is received, any low
level sounds at the start of the utterance can be lost.
Accordingly, the system only receives a truncated part of the
utterance resulting in a speech recognition of the utterance being
unsuccessful.
[0004] This approach has the disadvantage that the spoken phrase is
highly likely to be rejected outright or possibly confused with
other candidate phrases. A need therefore exists for a method of
increasing speech recognition performance in such systems wherein
the abovementioned disadvantage(s) may be alleviated.
SUMMARY OF THE INVENTION
[0005] In accordance with a first aspect of the present invention
there is provided a method of speech recognition for use with a
system having a microphone. The method includes receiving a signal
from the microphone representative of speech to be recognized,
performing detection of a transition in the signal indicative of
switch on of the microphone, and, in response to the detection,
performing speech recognition on the signal with reduced
contribution from an initial portion thereof.
[0006] In accordance with a second aspect of the present invention
there is provided a speech recognition system for use with a system
having a microphone. The system includes means for receiving a
signal from the microphone representative of speech to be
recognized, means for performing detection of a transition in the
signal indicative of switch on of the microphone, and means for
performing, in response to the detection, speech recognition on the
signal with reduced contribution from an initial portion
thereof.
[0007] It should be noted that various aspects of the invention can
be implemented as a program for controlling computing equipment to
implement the functions described herein, or a program for enabling
computing equipment to perform processes corresponding to the steps
disclosed herein. This program may be provided by storing the
program in a magnetic disk, an optical disk, a semiconductor
memory, any other recording medium, or can also be provided as a
digitally encoded signal conveyed via a carrier wave. The described
program can be a single program or can be implemented as multiple
subprograms, each of which interacts within a single computing
device or interacts in a distributed fashion across a network
space.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] One method of improved speech recognition with speaker
phones and car kits incorporating the present invention will now be
described, by way of example only, with reference to the
accompanying drawings, in which:
[0009] FIG. 1 shows a schematic illustration of a prior art
speakerphone speech recognition system;
[0010] FIG. 2 shows a block schematic diagram depicting a
speakerphone speech recognition system incorporating the present
invention; and
[0011] FIG. 3 shows a schematic illustration of a speakerphone
speech recognition system.
DESCRIPTION OF PREFERRED EMBODIMENTS
[0012] Referring firstly to FIG. 1, in a known speech recognition
system (not shown), the system receives a waveform 10 resulting
from speaking of the words "Balance of account" 20. Because, as
discussed above, the system microphone (not shown) is typically
suppressed while sound is being played out in order to eliminate
echo and feedback, the system does not react until after a high
energy audio signal is received; therefore, sounds at the beginning
of the utterance are lost. Thus, the recognition system hears
"lance of account" which does not contain all of the sounds
expected to be heard in "Balance of account". Typically, such a
speech recognition system is looking for a specific sequence of
sounds and will either not match the expected phrase or will score
the match with a low probability.
[0013] Referring now to FIG. 2, a telephone based automated speech
recognition system for use by a user 100 in challenging telephony
environments (such as in-car) includes a speakerphone arrangement
200 having a controller 201 which controls operation of a speaker
202 and microphone 203. A speech recognition arrangement 300 is
coupled to the speakerphone arrangement 200 and has a speech
detector 301, a speech recognition controller 302 and a speech
recognition engine 303.
[0014] The speech detector 301 automatically detects when the
following mode of operation applies and enables it: the speech
detector 301 determines whether the profile of the audio energy
received from the speakerphone arrangement 200 is silence, or close
to silence, followed by a high energy edge (a rapid transition
caused by the microphone 203 switching on). If so, the speech
recognition controller 302 notifies the speech recognition engine
303 of this situation.
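The detection described in paragraph [0014] can be illustrated with a minimal sketch. It is not the implementation of the speech detector 301: it assumes the audio has already been reduced to one energy value per frame, and the function name and the three thresholds (silence_level, edge_ratio, min_silent_frames) are hypothetical tuning values introduced only for illustration.

```python
from typing import Optional, Sequence


def find_mic_switch_on(
    frame_energy: Sequence[float],
    silence_level: float = 0.01,    # assumed: energy at or below this is "close to silence"
    edge_ratio: float = 20.0,       # assumed: how steep the jump must be to count as an edge
    min_silent_frames: int = 5,     # assumed: how much silence must precede the edge
) -> Optional[int]:
    """Return the index of the first frame that follows a run of near-silence
    with a sudden jump in energy, or None if no such edge is found."""
    silent_run = 0
    for i, energy in enumerate(frame_energy):
        if energy <= silence_level:
            silent_run += 1
            continue
        # Non-silent frame: treat it as a switch-on edge only if enough
        # silence came first and the jump is steep relative to silence.
        if silent_run >= min_silent_frames and energy >= edge_ratio * silence_level:
            return i
        silent_run = 0
    return None


abrupt = [0.001] * 6 + [0.9, 0.8]             # silence, then a hard edge
gradual = [0.001] * 6 + [0.05, 0.1, 0.3, 0.9]  # silence, then a slow ramp
print(find_mic_switch_on(abrupt))   # index of the edge frame
print(find_mic_switch_on(gradual))  # None: natural speech onset, no edge
```

A gradual rise in energy is deliberately not reported, since only the abrupt edge suggests that the start of the utterance was cut off by the microphone switching on.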
[0015] The speech recognition engine 303 will process the utterance
but will, in light of the signal from the speech detector 301,
automatically modify its behavior to not expect to match the
initial sounds of the utterance. In effect, the initial sounds will
become an optional part of the utterance.
[0016] Put another way, the recognition engine will accept
deletions of sounds at the start of the utterance. Exactly how much
of the utterance (in time) may be allowed to be missing may be
established by tuning the system for optimum recognition.
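One way to picture "accepting deletions of sounds at the start of the utterance" is an edit distance in which a bounded number of leading expected sounds may be skipped at no cost. The sketch below is an illustration under stated assumptions, not the scoring used by the speech recognition engine 303: sounds are represented by arbitrary string labels, the function names are hypothetical, and max_missing stands in for the tunable amount of the utterance allowed to be missing.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Plain Levenshtein distance between two sequences of sound labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # sound x expected but not heard
                curr[j - 1] + 1,         # sound y heard but not expected
                prev[j - 1] + (x != y),  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]


def onset_tolerant_distance(
    expected: Sequence[str], heard: Sequence[str], max_missing: int
) -> int:
    """Best distance when up to max_missing leading sounds of the expected
    sequence may be absent without penalty."""
    limit = min(max_missing, len(expected))
    return min(edit_distance(expected[k:], heard) for k in range(limit + 1))


# Hypothetical sound labels for "balance" and its truncated form "lance".
expected = ["b", "a", "l", "a", "n", "s"]
heard = ["l", "a", "n", "s"]
print(edit_distance(expected, heard))               # penalised for the lost onset
print(onset_tolerant_distance(expected, heard, 2))  # lost onset is forgiven
```

Setting max_missing to zero reduces this to ordinary matching, which is the behaviour that would apply when no switch-on transition has been detected.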
[0017] Alternatively, the grammars and call flows of the speech
recognition engine may be constructed to optionally accept a
redundant word (such as "please") at the start of every utterance.
This word is defined with special purpose "sound sequences" for
this method which allow all possible tail ends of the utterance to
match.
[0018] When the system detects repeated failures, it suggests to
the user that they use the "microphone enabling" word which has the
effect that the first thing the user says to interrupt the system
is not required to match.
[0019] Good recognition can proceed with the significant portion of
the grammar after this redundant word.
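The behaviour of paragraph [0018], suggesting the "microphone enabling" word after repeated failures, might be sketched as a small counter. The class name, the confidence threshold, the number of failures that triggers the suggestion and the prompt wording are all assumptions made for this illustration; the specification does not state them.

```python
from typing import Optional


class EnablingWordPrompter:
    """Counts consecutive poor recognitions and, once a limit is reached,
    returns a prompt suggesting the redundant word. All defaults are
    hypothetical values chosen for illustration."""

    def __init__(
        self,
        confidence_threshold: float = 0.5,
        max_failures: int = 2,
        enabling_word: str = "please",
    ) -> None:
        self.confidence_threshold = confidence_threshold
        self.max_failures = max_failures
        self.enabling_word = enabling_word
        self.failures = 0

    def record(self, confidence: float) -> Optional[str]:
        """Record one recognition result; return a prompt if one is due."""
        if confidence >= self.confidence_threshold:
            self.failures = 0  # a good recognition resets the count
            return None
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0  # start counting afresh after prompting
            return f'Try saying "{self.enabling_word}" before your request.'
        return None


prompter = EnablingWordPrompter()
print(prompter.record(0.2))  # first failure: no prompt yet
print(prompter.record(0.3))  # second failure: prompt is returned
```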
[0020] The core of this alternative mode of operation is
illustrated with reference to FIG. 3. As shown in FIG. 3, the
speech recognition arrangement 300 receives a waveform 410
resulting from speaking of the words "Please balance of account".
Because, as discussed above, the microphone 203 is typically
suppressed while sound is being played out in order to eliminate
echo and feedback, the system does not react until a high energy
audio signal is received. Thus, the recognition system receives
"ease balance of account", but it is expecting phrases such as:
[0021] "Please balance of account" or
[0022] "lease balance of account" or
[0023] "ease balance of account" (Match) or
[0024] "se balance of account" or
[0025] "balance of account"
Therefore the desired phrase is successfully recognised.
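The list of expected phrases above can be reproduced approximately with a short sketch. It is a simplification under stated assumptions: it truncates the redundant word letter by letter as a stand-in for sound sequences, so it accepts a few more tail ends (for example "ase") than the five listed, and the function names are hypothetical.

```python
from typing import List


def truncated_forms(word: str) -> List[str]:
    """All tail ends of the word, from the whole word down to nothing."""
    return [word[i:] for i in range(len(word) + 1)]


def accepted_phrases(enabling_word: str, command: str) -> List[str]:
    """Every phrase accepted when the redundant word may be partly or
    wholly missing from the start of the utterance."""
    return [f"{tail} {command}".strip() for tail in truncated_forms(enabling_word)]


def is_recognised(heard: str, enabling_word: str, command: str) -> bool:
    """Case-insensitive check of what was heard against the accepted phrases."""
    accepted = {p.lower() for p in accepted_phrases(enabling_word, command)}
    return heard.strip().lower() in accepted


for phrase in accepted_phrases("please", "balance of account"):
    print(phrase)

# The truncated redundant word still matches, while a truncated command
# (the FIG. 1 situation) does not.
print(is_recognised("ease balance of account", "please", "balance of account"))
print(is_recognised("lance of account", "please", "balance of account"))
```

Only the redundant word is expanded into tail ends; the command itself must still match in full, which is why the sacrificial first word protects the significant portion of the grammar.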
[0026] It will be understood that this method can be implemented
without modification to speech recognition system software, only
the speech recognition controller being new in this
alternative.
[0027] It will be understood that in this alternative, the special
word ("please" in the above example) can be redefined to match
alternate "baseform" sequences (as known in WebSphere(TM) Voice
Server (WVS) terminology) which, in this case, are the various
truncated endings of the sound. The special word will not be usable
in other grammars where it is not permissible to accept truncated
utterances.
[0028] It will be understood that the above-described preferred
embodiment's method of increasing speech recognition performance
with speaker phones and car kits provides the following
advantages.
[0029] While the reliability of recognition may be reduced when
compared with a good (microphone unsuppressed or full duplex)
telephony situation, especially with short utterances, recognition
becomes possible in cases where it would often be impossible
without this method.
[0030] Recognition match scoring will be increased as the low
weighting given by deleted initial sounds will be eliminated and
therefore confusion of the recognized phrase will be reduced.
[0031] The method described with reference to FIG. 3 has the
advantage that no modifications are required to the system software
but requires the cooperation of the user and may be most
appropriate to expert and high value systems where the user has a
lot to gain by making the system work for themselves.
[0032] The present invention may be realized in hardware, software,
or a combination of hardware and software. The present invention
may be realized in a centralized fashion in one computer system or
in a distributed fashion where different elements are spread across
several interconnected computer systems. Any kind of computer
system or other apparatus adapted for carrying out the methods
described herein is suited. A typical combination of hardware and
software may be a general purpose computer system with a computer
program that, when being loaded and executed, controls the computer
system such that it carries out the methods described herein.
[0033] The present invention also may be embedded in a computer
program product, which comprises all the features enabling the
implementation of the methods described herein, and which when
loaded in a computer system is able to carry out these methods.
Computer program in the present context means any expression, in
any language, code or notation, of a set of instructions intended
to cause a system having an information processing capability to
perform a particular function either directly or after either or
both of the following: a) conversion to another language, code or
notation; b) reproduction in a different material form.
[0034] This invention may be embodied in other forms without
departing from the spirit or essential attributes thereof.
Accordingly, reference should be made to the following claims,
rather than to the foregoing specification, as indicating the scope
of the invention.
* * * * *