U.S. patent application number 12/327,594, for a system and method for generating a phrase pronunciation, was filed on December 3, 2008 and published on April 30, 2009. The application is assigned to Dictaphone Corporation. The invention is credited to Jill Carrier and William F. Cote.
United States Patent Application 20090112587
Kind Code: A1
Cote; William F.; et al.
Published: April 30, 2009
Application Number: 12/327594
Family ID: 34891016
SYSTEM AND METHOD FOR GENERATING A PHRASE PRONUNCIATION
Abstract
A system and method for a speech recognition technology that
allows language models to be customized through the addition of
special pronunciations for components of phrases, which are added
to the factory language models during customization. It allows
components of a phrase to have different pronunciations inside
customer-added phrases than are specified for those isolated
components in the factory language models.
Inventors: Cote; William F.; (Carlisle, MA); Carrier; Jill; (Dorchester, MA)
Correspondence Address: KELLEY DRYE & WARREN LLP, 400 ATLANTIC STREET, 13TH FLOOR, STAMFORD, CT 06901, US
Assignee: Dictaphone Corporation, Stratford, CT
Family ID: 34891016
Appl. No.: 12/327594
Filed: December 3, 2008
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
11069203 | Feb 28, 2005 |
12327594 | |
60547801 | Feb 27, 2004 |
Current U.S. Class: 704/244; 704/E15.004
Current CPC Class: G10L 13/08 20130101; G10L 15/187 20130101
Class at Publication: 704/244; 704/E15.004
International Class: G10L 15/06 20060101 G10L015/06
Claims
1.-17. (canceled)
18. A method in a computer system comprising a language model, a
background dictionary, and at least one preexisting pron component
list, for adding phrase pronunciations to a language model, said
method comprising the steps of: inputting at least one phrase to be
added to the language model; determining if said at least one
phrase is contained in said language model; if said at least one
phrase is not contained in said language model, determining if said
at least one phrase is contained in said background dictionary,
and, if so, adding background dictionary pronunciation to said
language model; if said at least one phrase is not contained in
said language model or said background dictionary, parsing said at
least one phrase into an ordered set of tokens in accordance with
certain rules, sequentially associating prons with each said token
of said ordered set of tokens, generating a pron derived phrase
pronunciation from said prons, and adding said pron derived phrase
pronunciation to said language model.
19. A method, in accordance with claim 18, wherein the step of
generating said pron derived phrase pronunciation, further
comprises, for each said token of said ordered set of tokens: a)
sequentially determining if each said associated pron is in said
preexisting pron component list, and, if so, obtaining
pronunciation from said preexisting pron component list; b) if said
associated pron is not in said preexisting pron component list,
determining if said associated pron is in a preexisting language
model, and, if so, adding said language model pron to said
preexisting pron component list; c) if said associated pron is not
in said preexisting pron component list or said preexisting language
model, determining if said associated pron is in said preexisting
background dictionary, and, if so, adding said background
dictionary pron to said preexisting pron component list; d) if said
associated pron is not in said preexisting pron component list,
said preexisting language model or said preexisting background
dictionary, generating a guess pron, and adding said guess pron to
said preexisting pron component list; e) if there is an additional
token in said ordered set of tokens, repeating steps a) to d); and,
f) if there are no additional tokens in said ordered set of tokens,
generating said pron derived phrase pronunciation by combining said
associated pron pronunciations as obtained from said preexisting
pron component list, in sequence.
20. A method for adding phrase pronunciations to a language model,
in accordance with claim 18, wherein said pron component list
includes punctuations or formatting that is present in the said at
least one phrase but is silent in the pronunciation of said at
least one phrase.
21. A method for adding phrase pronunciations to a language model,
in accordance with claim 19, wherein said pron component list is
selected from a plurality of lists in accordance with the position
of said token within said at least one phrase.
22. A method for adding phrase pronunciations to a language model,
in accordance with claim 18, wherein said certain rules comprise
breaking up the said phrase into tokens at certain boundaries.
23. A method for adding phrase pronunciations to a language model,
in accordance with claim 22, wherein said certain boundaries
comprise white spaces and/or punctuation.
24. A method for adding phrase pronunciations to a language model,
in accordance with claim 18, wherein said certain rules comprise
looking for the longest match in said preexisting language model or
said preexisting background dictionary.
25. A method for adding phrase pronunciations to a language model,
in accordance with claim 19, wherein said preexisting pron
component lists comprise an initial pron component list and a
non-initial pron component list.
26. A tangible computer usable medium having computer readable
instructions stored thereon for execution by a processor and
comprising a language model, a background dictionary, and at least
one preexisting pron component list to perform a method comprising:
inputting at least one phrase to be added to the language model;
determining if said at least one phrase is contained in said
language model; if said at least one phrase is not contained in
said language model, determining if said at least one phrase is
contained in said background dictionary, and, if so, adding
background dictionary pronunciation to said language model; if said
at least one phrase is not contained in said language model or said
background dictionary, parsing said at least one phrase into an
ordered set of tokens in accordance with certain rules,
sequentially associating prons with each said token of said ordered
set of tokens, generating a pron derived phrase pronunciation from
said prons, and adding said pron derived phrase pronunciation to
said language model.
27. A tangible computer usable medium, in accordance with claim 26,
to perform a method wherein the step of generating said pron
derived phrase pronunciation further comprises, for each said token
of said ordered set of tokens: a) sequentially determining if each
said associated pron is in said preexisting pron component list,
and, if so, obtaining pronunciation from said preexisting pron
component list; b) if said associated pron is not in said
preexisting pron component list, determining if said associated
pron is in a preexisting language model, and, if so, adding said
language model pron to said preexisting pron component list; c) if
said associated pron is not in said preexisting pron component list
or said preexisting language model, determining if said associated
pron is in said preexisting background dictionary, and, if so, adding
said background dictionary pron to said preexisting pron component
list; d) if said associated pron is not in said preexisting pron
component list, said preexisting language model or said preexisting
background dictionary, generating a guess pron, and adding said
guess pron to said preexisting pron component list; e) if there is
an additional token in said ordered set of tokens, repeating steps
a) to d); and, f) if there are no additional tokens in said ordered
set of tokens, generating said pron derived phrase pronunciation by
combining said associated pron pronunciations as obtained from said
preexisting pron component list, in sequence.
28. A computer usable medium, in accordance with claim 27, wherein
said pron component list includes punctuations or formatting that
is present in the text but is silent in the pronunciation of said
at least one phrase.
29. A computer usable medium, in accordance with claim 27, wherein
said pron component list is selected from a plurality of lists in
accordance with the position of said token within said at least
one phrase.
30. A computer usable medium, in accordance with claim 26, wherein
said certain rules comprise breaking up the said phrase into tokens
at certain boundaries.
31. A computer usable medium, in accordance with claim 30, wherein
said certain boundaries comprise white spaces and/or
punctuation.
32. A computer usable medium, in accordance with claim 26, wherein
said certain rules comprise looking for the longest match in said
preexisting language model or said preexisting background
dictionary.
33. A computer usable medium, in accordance with claim 27, wherein
said preexisting pron component lists comprise an initial pron
component list and a non-initial pron component list.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent
application Ser. No. 11/069,203, filed Feb. 28, 2005, and claims
priority from co-pending U.S. Provisional Patent Application Ser.
No. 60/547,801, entitled "SYSTEM AND METHOD FOR GENERATING A PHRASE
PRONUNCIATION," filed Feb. 27, 2004, which co-pending application
is hereby incorporated by reference in its entirety.
[0002] This application relates to co-pending U.S. patent
application Ser. No. 10/413,405, entitled, "INFORMATION CODING
SYSTEM AND METHOD", filed Apr. 15, 2003; co-pending U.S. patent
application Ser. No. 10/447,290, entitled, "SYSTEM AND METHOD FOR
UTILIZING NATURAL LANGUAGE PATIENT RECORDS", filed on May 29, 2003;
co-pending U.S. patent application Ser. No. 10/448,317, entitled,
"METHOD, SYSTEM, AND APPARATUS FOR VALIDATION", filed on May 30,
2003; co-pending U.S. patent application Ser. No. 10/448,325,
entitled, "METHOD, SYSTEM, AND APPARATUS FOR VIEWING DATA", filed
on May 30, 2003; co-pending U.S. patent application Ser. No.
10/448,320, entitled, "METHOD, SYSTEM, AND APPARATUS FOR DATA
REUSE", filed on May 30, 2003; co-pending U.S. patent Application
Ser. No. 10/953,448, entitled, "SYSTEM AND METHOD FOR DATA DOCUMENT
SECTION SEGMENTATIONS", filed on Sep. 30, 2004; co-pending U.S.
patent application Ser. No. 10/953,474, entitled, "SYSTEM AND
METHOD FOR POST PROCESSING SPEECH RECOGNITION OUTPUT," filed on
Sep. 29, 2004; co-pending U.S. patent application Ser. No.
10/953,471, entitled, "SYSTEM AND METHOD FOR MODIFYING A LANGUAGE
MODEL AND POST-PROCESSOR INFORMATION", filed on Sep. 29, 2004;
co-pending U.S. patent application Ser. No. 10/951,291, entitled,
"SYSTEM AND METHOD FOR CUSTOMIZING SPEECH RECOGNITION INPUT AND
OUTPUT", filed on Sep. 27, 2004; co-pending U.S. patent application
Ser. No. 11/007,626, entitled "SYSTEM AND METHOD FOR ACCENTED
MODIFICATION OF A LANGUAGE MODEL" filed on Dec. 8, 2004; co-pending
U.S. patent application Ser. No. 10/787,889, entitled, "SYSTEM
METHOD, AND APPARATUS FOR PREDICTION USING MINIMAL AFFIX PATTERNS",
filed on Feb. 27, 2004; and co-pending U.S. Provisional Patent
Application 60/547,797, entitled, "SYSTEM AND METHOD FOR
NORMALIZATION OF A STRING OF WORDS", filed on Feb. 27, 2004, all of
which co-pending applications are hereby incorporated by reference
in their entirety.
BACKGROUND OF THE INVENTION
[0003] The present invention relates generally to a system and
method for producing an optimal language model for performing
speech recognition.
[0004] Today's speech recognition technology enables a computer to
transcribe spoken words into computer recognized text equivalents.
Speech recognition is the process of converting an acoustic signal,
captured by a transducive element, such as a microphone or a
telephone, to a set of text words in a document. This process can
be used for numerous applications, including transcription, data
entry, and word processing. The development of speech recognition
technology is primarily focused on accurate speech recognition,
which is a formidable task due to the wide variety of
pronunciations, phrases, accents, and speech characteristics. In
particular, previous attempts to transcribe phrases accurately have
been met with limited success.
[0005] The key to speech recognition technology is the language
model. Today's state-of-the-art speech recognition tools utilize a
factory (or out-of-the-box) language model, which is often
customized to produce a site-specific language model. Further,
site-specific users of speech recognition systems customize factory
language models by including site-specific names and phrases. A
site-specific language model might include, for example, the names
of doctors, hospitals, or medical departments of a specific site
using speech recognition technology. Unfortunately, factory
language models include few names and phrases and previous attempts
to provide phrase customization did not produce customized language
models that accurately recognize phrases during speech
recognition.
[0006] Previous efforts to solve this problem involved customizing
a language model by adding phrases and corresponding phrase
pronunciations to the language model. The phrase pronunciations for
the added phrase were created as a combination of pronunciations of
the components or elements of the phrase. As such, a phrase to be
added to the language model would initially be broken down into
components. For each component, the language model would be
searched for a matching component and corresponding pronunciation.
If all components were found in the language model, the
corresponding pronunciations for each component of the phrase would
be concatenated to form pronunciations of the new multi-word
phrase. The new phrase was then added, together with its
corresponding pronunciations, to the language model.
[0007] If any components were not found in the language model, a
background dictionary was searched for the components. Any
component tokens still not found in either the language model or
the background dictionary were sent to a pronunciation guesser
module, where component pronunciations were guessed based on their
orthography (spelling). Phrase pronunciations were then formed for
that phrase by combining all pronunciations from the language
model, background dictionary, or guesser module. The new phrase was
then added, together with its corresponding pronunciations, to the
language model.
[0008] However, problems occur when phrase components are
pronounced differently when part of a phrase. For example, the
ampersand sign is pronounced as `and` in a phrase but as
`ampersand` in the language model. Some previous systems attempted
to solve this problem by adding additional pronunciations to
problematic words instead of adding phrase pronunciations.
Unfortunately, if "&" in the language model is given an
additional pronunciation of `and`, then when an ordinary phrase
such as "bacon and eggs" is dictated, it may be transcribed with an
ampersand instead of an "and". Conversely, if "&" is not given
an additional pronunciation of `and`, then when the phrase "Brigham
& Women's Hospital" is added to the language model, it would
receive the pronunciation `Brigham ampersand women's hospital` in
the language model. This is a problem because `Brigham &
Women's Hospital` is actually pronounced as `Brigham and women's
hospital.`
[0009] Additional problems occur when elements of a dictated phrase
are not pronounced, that is, are silent. Previous systems failed to
provide transcription for any silent or unspoken aspect of a
phrase. For instance, a slash is used in many phrases but is silent
when the phrase is spoken. For example, "OB/GYN" is a phrase pronounced
`OBGYN`. However, under traditional systems, the slash would not be
recognized or transcribed unless the dictator actually spoke
`slash`, despite the fact that doctors and hospitals expect the
transcribed text of a medical report to include the slash in
"OB/GYN".
[0010] Another problem with silent elements of a phrase includes
well-known formatting or terms of the trade that are shortened or
abbreviated for convenience when spoken. For example, the phrase
"WISC (Revised)" is a phrase that is dictated for convenience in
the medical fields as `WISC Revised`, without specifically
dictating the parentheses around `Revised`. Traditional systems
would require that the phrase in the language model have a
pronunciation including the parentheses. This approach requires
that the parentheses be awkwardly dictated in order for the
automatic transcription to include the parentheses.
[0011] Additionally, traditional systems resulted in prohibitively
large numbers of permutations of possible phrase pronunciations for
many phrases. This is the result of each phrase component having
multiple pronunciations in the language model. When combining the
pronunciations from each phrase component, the number of possible
combinations grows rapidly. Therefore, previous systems added a
huge number of possible pronunciations for a long phrase where one
or maybe two pronunciations would be sufficient for automatic
recognition of a long phrase.
[0012] Previous systems also failed to identify context based
pronunciations in a phrase. For example, the phrases "St. Mulbery"
and "Mulbery St." contain the component `St.` but the first phrase
refers to a saint and the second phrase refers to a street. A
typical language model includes both `street` and `saint`
pronunciations for the component `St.`. Therefore, in previous
systems when the phrase "St. Mulbery" was added to the language
model, the system would inefficiently provide both the `saint
Mulbery` and `street Mulbery` pronunciations.
[0013] Therefore, there exists a need for a speech recognition
technology that updates a language model with phrases that can be
accurately recognized and transcribed.
SUMMARY OF THE INVENTION
[0014] The present invention includes a system and method for a
speech recognition technology that allows language models to be
customized through the addition of phrase pronunciations, through the
use of special pronunciations for components of phrases. The steps
of the method may include generating a list of pron components
whose pronunciations differ when they occur in a phrase, and
assigning at least one pron to each pron component. The steps may
also include determining the pronunciation of a phrase, by
tokenizing the phrase by generating a list of tokens corresponding
to the phrase. Determining the phrase pronunciation may include
determining a pron for each of the list of tokens and assembling
the pronunciation of the phrase based upon a combination of each pron.
Finally, the system may add the phrase and the pronunciation of the
phrase to the language model.
[0015] Another aspect of the present invention may include
identifying initial and non-initial tokens of a phrase. The present
invention may include generating a phonetic transcription for each
pron component based on a literal phonetic transcription or
referencing a phonetic transcription from the language model.
[0016] Another aspect of the present invention may include
determining a pron for each token by searching a pron component
list. The pron component list may include both an initial pron
component list and a non-initial pron component list.
[0017] Another aspect of the present invention may include
searching the language model and/or the background dictionary for a
pron. The present invention may also include a pron guesser for
guessing the pron for a token.
[0018] In another aspect, the present invention includes a system
for adding phrase pronunciations to a language model including a
computer with a computer code mechanism for processing a list of
pron components whose pronunciations differ when they occur in a
phrase, assigning at least one pron to each pron component,
determining the pronunciation of a first phrase by first tokenizing
the first phrase by generating a list of tokens corresponding to
the first phrase, then determining a pron for each of the list of
tokens, then assembling the pronunciation of the first phrase based
on a combination of each pron, and adding the first phrase and the
pronunciation of the first phrase to a language model; a language
model electronically accessible by the computer code mechanism; and
a tokenizer for generating a list of tokens corresponding to the
first phrase, the tokenizer being in electronic communication with
the computer code mechanism. In some embodiments the pron
components list includes non-initial components. In some
embodiments, the pron components list includes initial
components.
[0019] In still another embodiment the system includes a background
dictionary electronically accessible by the computer code
mechanism, wherein the computer code mechanism searches the
background dictionary to determine a pron for each token.
[0020] In another embodiment the system includes a pron guesser in
electronic communication with the computer code mechanism, wherein
the computer code mechanism applies the pron guesser to determine a
pron for each token.
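The lookup sequence outlined in the summary above (pron component list, then language model, then background dictionary, then pron guesser) can be sketched as follows. This is an illustrative sketch only, not part of the claimed invention; all identifiers are hypothetical, and each entry is simplified to a single pron.

```python
def phrase_pron(tokens, pron_components, language_model, background_dict, guess_pron):
    """Resolve a pron for each token and assemble the phrase pronunciation.

    Per-token lookup order: the pron component list, then the language
    model, then the background dictionary, and finally a pronunciation
    guesser; any fallback result is cached back into the pron component
    list for reuse.  Simplified to one pron per entry.
    """
    prons = []
    for token in tokens:
        if token not in pron_components:
            if token in language_model:
                pron_components[token] = language_model[token]
            elif token in background_dict:
                pron_components[token] = background_dict[token]
            else:
                pron_components[token] = guess_pron(token)
        prons.append(pron_components[token])
    # Combine the per-token prons in sequence, skipping silent ('') prons.
    return " ".join(p for p in prons if p)
```

With a pron component entry mapping "&" to `and`, the phrase "Brigham & Women's Hospital" would receive an `and` pron for the ampersand while ordinary words fall through to the language model.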
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] While the specification concludes with claims particularly
pointing out and distinctly claiming the present invention, it is
believed the same will be better understood from the following
description taken in conjunction with the accompanying drawings,
which illustrate, in a non-limiting fashion, the best mode
presently contemplated for carrying out the present invention, and
in which like reference numerals designate like parts throughout
the Figures, wherein:
[0022] FIG. 1 shows an architecture view of the system and method
for modifying a language model in accordance with prior art;
and
[0023] FIG. 2 shows an architecture view of the system and method
for modifying a language model in accordance with certain teachings
of the present disclosure.
DETAILED DESCRIPTION
[0024] The present disclosure will now be described more fully with
reference to the Figures in which an embodiment of the present
disclosure is shown. The subject matter of this disclosure may,
however, be embodied in many different forms and should not be
construed as being limited to the embodiments set forth herein.
[0025] Referring to FIG. 1, an architecture view shows a previously
known system or method for the creation of a multiword phrase
pronunciation and for the modification of a language model in
accordance with the prior art. The method begins with step 10
initializing the steps of the system.
[0026] A list of phrases to be added to the language model is fed
into the system in step 15. Each phrase from the input list is
presented to the system in step 20, and proceeds all the way through
to the end at step 85, at which point the pronunciations created
for each phrase are added to the language model. The process is
repeated for each phrase in the input list until pronunciations
for all the phrases have been added to the language model.
[0027] In step 20, a phrase is compared against the language model
to determine if the phrase already exists in the language model. If
so, the pronunciation or pronunciations associated with the phrase
are collected from the language model in step 25 and provided to
step 75.
[0028] If the phrase is not located in the language model in step
20, the background dictionary is searched in step 30. If a match to
the phrase is found in the background dictionary, the pronunciation
or pronunciations associated with the phrase are collected from the
background dictionary in step 35 and provided to step 75.
[0029] It should be noted that words in the language model may have
multiple pronunciations associated with a given word or phrase.
Likewise, words in the background dictionary may also have multiple
pronunciations associated with a given word or phrase. Therefore,
if a word or phrase is located in the language model or background
dictionary, multiple pronunciations may be provided to step 75 for a
given phrase or component of a phrase.
[0030] If the phrase is not found in either the language model or
the background dictionary, the phrase is broken into smaller parts
or phrasal components if possible. Step 40 determines if the phrase
can be parsed into a first component and a second component at the
first space or punctuation mark. Step 45 determines if the phrase
includes more than one part and, if so, step 50 begins a recursive
loop on the first part or component of the phrase.
[0031] Step 50 sends the first component back to step 20 to
initiate the loop on the first component. Step 20 determines if the
first component exists in the language model. If a matching
component is found in the language model, then the pronunciation of
the first component is retrieved from the language model and
delivered to step 75.
[0032] If a match is not found in the language model, then step 30
determines if the first component is in the background dictionary
in step 30. If the first component is found in the background
dictionary, then its pronunciation is retrieved from the background
dictionary and delivered to step 75.
[0033] If a match is not found in either the language model or the
background dictionary, then step 40 determines if the first
component may be broken down any further into smaller components.
As the first component was removed from the phrase on the initial
pass through the system, the first component cannot be broken into
smaller parts and therefore step 45 will determine that there is no
more than one part of the first component.
[0034] When any phrasal component passing through the system cannot
be broken into smaller parts and cannot be matched in either the
language model or the background dictionary, the pronunciation of
the phrasal component will be guessed in step 60. It should be
noted that the pronunciation guesser in step 60 may guess multiple
pronunciations and that those pronunciations will be passed forward
to step 75.
[0035] Once pronunciations for the first component are delivered to
step 75 from the language model, the background dictionary, or the
pronunciation guesser, the recursive loop of step 50 is finished,
and step 55 begins the recursive loop on the second part of the
phrase, which is sent to step 20.
[0036] The second part passes through steps 20, 25, 30, and 35 as
described above. If a pronunciation is found for the second part,
then the pronunciation or pronunciations are delivered to step 75.
However, if no pronunciations are found, then the second part is
analyzed in step 40 to determine if the second part of the phrase
contains smaller components that can be individually passed through
the system in the same manner as the first component.
[0037] If the second part does not contain any smaller components
and no match for the second part is found in either the language
model or the background dictionary, then step 60 guesses the
pronunciation of the second part. The guessed pronunciations are
delivered to step 75. Step 75 combines the pronunciations from each
phrasal component. Step 80 writes the phrase and the pronunciations
to the language model and step 85 ends the system.
[0038] If the second part does contain multiple parts, then step 45
will determine that there is more than one part and proceed to step
50 where the first component of the second part will be sent to
step 20. The recursive loops of steps 50 and 55 will repeat the
above-described steps with respect to FIG. 1, specifically
repeating the recursive loop steps 50, 55 and 65 until each
individual phrasal component is identified and corresponding
pronunciations assigned and delivered to step 75.
[0039] When all the components or parts have corresponding
pronunciations assigned and delivered to step 75, the
pronunciations are combined. The pronunciations from the top level
call and all recursive calls are combined in step 75 and added to
the language model in step 80 to be used by subsequent passes
through the system. Once the phrase and corresponding
pronunciations are written to the language model in step 80, the
system is ended in step 85.
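The recursive prior-art procedure of FIG. 1 can be summarized in a rough sketch. This is illustrative only: it simplifies each entry to a single pronunciation and merges the language model and background dictionary into one lookup table; all names are hypothetical.

```python
def pronounce(text, lexicon, guess):
    """Recursively build a pronunciation in the manner of FIG. 1.

    Try the whole string against the lexicon (steps 20/30); otherwise
    split off the first component at the first space (step 40) and
    recurse on each part (steps 50/55), guessing when a part cannot be
    split further or matched (step 60).  Results are combined in order
    (step 75).
    """
    if text in lexicon:
        return lexicon[text]
    first, _, rest = text.partition(" ")
    if not rest:                      # step 45: only one part remains
        return guess(text)
    return pronounce(first, lexicon, guess) + " " + pronounce(rest, lexicon, guess)
```

Note how, with no phrase-specific entries, the ampersand in a phrase such as "ham & eggs" can only receive whatever isolated pronunciation the lexicon or guesser assigns it.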
[0040] It should be noted that when the pronunciations are combined
in step 75, the number of phrase pronunciations could multiply very
quickly if each component or part is associated with multiple
corresponding pronunciations. Therefore, the number of permutations
of possible phrase pronunciations to be written to the language
model may be prohibitively large for a long multi-part phrase with
multiple pronunciations for each part of the phrase.
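The multiplicative growth described above can be seen in a short sketch; the pronunciation counts below are hypothetical, chosen only to illustrate that a phrase whose three components carry 3, 2, and 3 pronunciations yields 3 × 2 × 3 = 18 combined phrase pronunciations.

```python
from itertools import product

# Hypothetical per-component pronunciation lists for a three-part phrase.
component_prons = [
    ["p1a", "p1b", "p1c"],  # three prons for the first component
    ["p2a", "p2b"],         # two prons for the second component
    ["p3a", "p3b", "p3c"],  # three prons for the third component
]

# Combining every per-component pronunciation (as in step 75)
# multiplies the counts: 3 * 2 * 3 = 18 phrase pronunciations.
phrase_prons = [" ".join(combo) for combo in product(*component_prons)]
print(len(phrase_prons))  # 18
```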
[0041] Referring to FIG. 2, an architecture view shows a system or
method for the creation of a multiword phrase pronunciation and for
the modification of a language model in accordance with an
embodiment of the present invention. The method begins with step
100 initializing the steps.
[0042] As with the system shown in FIG. 1, an input list of phrases
to be added to the language model is provided to the system in step
101. It should be noted that the phrases may be entered on an
individual basis or entered as a group, sequentially passing
through the system.
[0043] Each phrase from the input list is presented to the system
in step 102, and proceeds through the system to the end at step
135, at which point the pronunciations for each phrase are written
to the language model. The process is repeated for each phrase in
the input list until pronunciations have been added to the language
model for every phrase in the input list.
[0044] In step 102, a phrase is compared against the language model
to determine if the phrase already exists in the language model. If
so, the pronunciation or pronunciations associated with the phrase
are collected from the language model in step 103 and provided to
step 130.
[0045] If the phrase is not located in the language model in step
102, the background dictionary is searched in step 104. If a match
to the phrase is found in the background dictionary, the
pronunciation or pronunciations associated with the phrase are
collected from the background dictionary in step 105 and provided
to step 130.
[0046] If the phrase is not found in either the language model or
the background dictionary, a tokenizer breaks up the phrase into
phrasal components or `tokens` in step 110. These tokens are
delivered to step 120, where a loop begins that sequentially
processes each token of the phrase.
[0047] It should be noted that the tokenizer parses a phrase
according to certain rules. Primarily, the tokenizer breaks up a
phrase into phrasal elements or tokens at certain boundaries,
looking for the longest match in the language model or background
dictionary. For instance, the phrase "ham & eggs" has three phrasal
elements, and the tokenizer would break the phrase up into three
tokens: "ham," "&," and "eggs." However, the phrase "San
Francisco Chronicle" contains two phrasal elements: "San Francisco"
and "Chronicle." The element "San Francisco" is one element because
a match exists in the language model for "San Francisco." The
tokenizer may also parse a phrase simply by white space or
punctuation.
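A greedy longest-match tokenizer of the kind described might be sketched as follows; this is an illustrative sketch only, with `known` standing in for the set of entries in the language model and background dictionary.

```python
def tokenize(phrase, known):
    """Greedy longest-match tokenizer (illustrative sketch).

    Splits the phrase at white space, then greedily merges the longest
    run of words that matches an entry in `known` into a single token;
    a single word is always accepted as a token of its own.
    """
    words = phrase.split()
    tokens, i = [], 0
    while i < len(words):
        # Try the longest candidate first, shrinking toward one word.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if j == i + 1 or candidate in known:
                tokens.append(candidate)
                i = j
                break
    return tokens
```

With "San Francisco" present in `known`, the sketch keeps "San Francisco" as one token while "ham & eggs" splits into three.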
[0048] Step 120 controls the system looping the tokens from the
tokenizer. Each token is provided in turn to step 125. Step 125
determines if additional tokens have not passed through the system.
If a token has not passed through the system, the token is
delivered to step 140. If every token has passed through the
system, step 125 would direct the system to step 130.
[0049] For each token, a pron component list is searched for a
match. The pron component list includes pron components, i.e.,
tokens that are pronounced differently when part of a phrase,
together with their corresponding pronunciations. The pronunciations
in the pron component list, language model, and background
dictionary are referred to as prons. The prons in the pron component
list specify how tokens are pronounced within a phrase. For
example, the token "&" would have a pron of `and` in the pron
component list but a pronunciation of `ampersand` in the language
model.
[0050] The pron component list may also include components that are
not pronounced differently but require fewer pronunciations to be
recognized by a speech recognition system when part of a phrase.
For example, "and" only needs one, maybe two, pronunciations to be
recognized as part of a phrase as opposed to the many more
pronunciations that are typically found in a language model for the
token `and`. Therefore, the token `and` may be present in the pron
component list with only one pron of `and`. The pron component list
may also include punctuation or formatting that is present in the
text of the phrase but silent in the spoken phrase. For example,
if the phrase `OB/GYN` were added to the language model, the token
`/` would have a silent pron.
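A pron component list of this kind can be represented as a simple mapping from tokens to in-phrase prons. The Python sketch below uses the examples above; a silent pron is modeled as an empty string, and all entries and names are illustrative assumptions rather than the patent's actual data:

```python
# Tokens that take special prons inside a phrase (illustrative entries).
PRON_COMPONENT_LIST = {
    "&": ["and"],    # "ham & eggs" is spoken "ham and eggs"
    "and": ["and"],  # one pron suffices when part of a phrase
    "/": [""],       # silent inside a phrase, as in "OB/GYN"
}

def in_phrase_prons(token, language_model):
    """Prefer the pron component list; otherwise fall back to the
    token's ordinary pronunciations in the language model."""
    if token in PRON_COMPONENT_LIST:
        return PRON_COMPONENT_LIST[token]
    return language_model.get(token, [])
```

So "&" resolves to `and` even though a language model entry for "&" would give `ampersand`.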
[0051] It should be noted that prons may be specified in the pron
component list as literal phonetic transcriptions of their
corresponding tokens, or as references to their corresponding
tokens in the language model, in which case the phonetic
transcription is looked up from that token's entry in the language
model.
[0052] To provide additional recognition accuracy, an initial pron
component list may be searched for a match to the first token of
every phrase. This initial pron component list may be used to
identify the unique pronunciations of tokens when they occur at the
start of a phrase. Therefore, the pron component list and the
initial pron component list may be substantially identical except
for those tokens that have different prons when they occur at the
start of a phrase. For example, `St.` is a token that changes prons
depending on whether the token occurs at the start of the phrase.
`St.` has a pron of `saint` when it occurs at the start of a phrase
and `street` or `saint` when it occurs elsewhere in a phrase.
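The relationship between the two lists can be sketched as follows, using the `St.` example. This is a hypothetical representation; in practice the prons would be phonetic transcriptions rather than ordinary words:

```python
# General in-phrase prons (illustrative).
PRON_COMPONENT_LIST = {"St.": ["street", "saint"]}

# Substantially identical, differing only for tokens whose pron
# changes at the start of a phrase.
INITIAL_PRON_COMPONENT_LIST = {**PRON_COMPONENT_LIST, "St.": ["saint"]}

def component_prons(token, is_first_token):
    """Select the initial list for the first token of a phrase and
    the general list otherwise; returns None when the token is absent."""
    table = (INITIAL_PRON_COMPONENT_LIST if is_first_token
             else PRON_COMPONENT_LIST)
    return table.get(token)
```

Thus `St.` at the start of a phrase such as "St. James Street" resolves only to `saint`, while later occurrences allow `street` or `saint`.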
[0053] The embodiment of FIG. 2 utilizes an initial pron component
list. However, the system shown in FIG. 2 might also be
accomplished with only a pron component list and remain within the
scope of the invention. In FIG. 2, step 140 determines whether
the token passing through the loop is the first token in the
phrase. If so, the first token is delivered to step 150, where
a list of initial pronunciation components or `pron` components may
be searched to determine if the first token is in the initial pron
component list. If a match to the first token is found, then the
corresponding initial pron component is retrieved from the initial
pron component list in step 155 and added to a global set of prons
being collected for each of the tokens in the phrase in step
160.
[0054] If the first token of the phrase is not located in the
initial pron component list, then the first token is delivered to
step 181. Step 181 determines if the first token is in the language
model and if so, retrieves the pronunciations from the language
model in step 182. Step 183 adds the pronunciations to the global
set of prons being collected for each of the tokens in the
phrase.
[0055] If the first token is not located in the language model,
then the first token is delivered to step 185. Step 185 determines
if the first token is in the background dictionary and if so,
retrieves the pronunciations from the background dictionary in step
190. Step 195 adds the pronunciations to the global set of prons
being collected for each of the tokens in the phrase.
[0056] If a match is not found in the initial pron component list
or the language model or the background dictionary, then step 200
guesses the pronunciation for the first token. Step 205 adds the
guessed pronunciation to the global set of prons being collected
for each of the tokens in the phrase.
[0057] Once the first token is assigned a pronunciation by the
system, steps 165 and 120 return the system to step 125 where the
second token proceeds through the system. Step 140 determines that
the second token should proceed to step 170, which determines if the
second token is present in the pron component list. If a match of
the second token is found in the pron component list, then the
corresponding pron is retrieved from the pron component list in
step 175 and added to the global set of prons being collected for
each of the tokens in the phrase in step 180.
[0058] If the second token is not located in the pron component
list, then the second token is delivered to step 181. Step 181
determines if the second token is in the language model and if so,
retrieves the pronunciation from the language model in step 182.
Step 183 adds the pronunciation to the global set of prons being
collected for each of the tokens in the phrase.
[0059] If the second token is not located in the language model,
then the token is delivered to step 185. Step 185 determines if the
token is in the background dictionary and if so, retrieves the
pronunciation from the background dictionary in step 190. Step 195
adds the pronunciation to the global set of prons being collected
for each of the tokens in the phrase.
[0060] If a match is not found in the pron component list or the
language model or the background dictionary, then step 200 guesses
the pronunciation for the second token. Step 205 adds the guessed
pronunciation to the global set of prons being collected for each
of the tokens in the phrase.
[0061] Once the second token is assigned a pronunciation by the
system, steps 165 and 120 return the loop to step 125. Step 125
determines whether there are additional tokens in the phrase that
have not passed through the system shown in FIG. 2. It should be
noted that each additional token of the phrase passes through the
system in the same manner as described above with respect to the
second token. It should also be noted that the system may perform
as many loops as necessary to process every token in the phrase and
compile a pronunciation for every token in the phrase. For example,
a phrase with four tokens will make four loops through the system
and a phrase with ten tokens will make ten loops through the
system.
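Taken together, steps 120 through 205 amount to a per-token lookup cascade, which can be sketched compactly as below. The dictionary-style interfaces and the `guess_pron` callable are assumptions for illustration, not the patent's actual code:

```python
def collect_prons(tokens, initial_list, pron_list,
                  language_model, background_dict, guess_pron):
    """For each token: the first token consults the initial pron
    component list and later tokens the pron component list; misses
    fall through to the language model, then the background
    dictionary, and finally a pron guesser. Returns one list of
    prons per token."""
    global_prons = []
    for i, token in enumerate(tokens):
        component_list = initial_list if i == 0 else pron_list
        prons = (component_list.get(token)
                 or language_model.get(token)
                 or background_dict.get(token)
                 or [guess_pron(token)])
        global_prons.append(prons)
    return global_prons
```

A phrase with four tokens makes four passes through this loop, one with ten tokens makes ten, and every token ends up with at least one pron.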
[0062] It should be noted that as prons are added to the global set
of prons being collected for each token of the phrase, the
pronunciation for the phrase is assembled token by token. Once every
token has been assigned a corresponding pronunciation, a
pronunciation for the entire phrase is created from the combined
pronunciations. When there are no additional tokens to be processed,
step 125 indicates that the system is finished and delivers the
phrase and corresponding phrase pronunciations to step 130. Step 130
then writes the phrase and the corresponding phrase pronunciation to
the language model for use during automatic speech recognition.
After the language model is updated, the system ends with step 135.
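One plausible way to combine the collected per-token prons into whole-phrase pronunciations is a cross product over the per-token sets, dropping silent prons. This is an illustrative sketch under that assumption, not necessarily how step 130 operates in the patent:

```python
from itertools import product

def phrase_pronunciations(per_token_prons):
    """Concatenate one pron per token for every combination; silent
    prons (modeled here as empty strings) disappear from the spoken
    form."""
    return [" ".join(p for p in combo if p)
            for combo in product(*per_token_prons)]
```

For example, the collected prons for "ham & eggs" combine into the single phrase pronunciation "ham and eggs," while a token with two prons doubles the number of phrase pronunciations written to the language model.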
[0063] It should be noted that after a phrase and corresponding
phrase pronunciations are written to the language model, the next
phrase from the input list is processed from step 102 to step 135.
Multiple phrases may be processed and automatically assigned
pronunciations until each phrase in the input list is assigned
pronunciations and written in the language model. Thus, phrases may
be individually added to the language model as described above with
reference to FIG. 2 or multiple phrases may be added to the
language model at one time by repeating the step 102 through step
135 for each phrase in the input list.
[0064] A computer system for implementing the methods described
above will now be described. Such a computer system has a computer
with a computer code mechanism capable of processing a list of pron
components whose pronunciations differ when they occur in a phrase.
The computer code mechanism assigns at least one pron to each pron
component. The computer code mechanism then determines the
pronunciation of a phrase by providing the phrase to a tokenizer in
electronic communication with the computer code mechanism. The
computer code mechanism then determines a pron for each of the list
of tokens provided by the tokenizer and assembles the pronunciation
of the phrase based on a combination of each of the prons. The
computer code mechanism then adds the phrase and the pronunciation
of the phrase to a language model electronically accessible by the
computer code mechanism.
[0065] Optionally, the computer code mechanism may be capable of
generating a phonetic transcription for each pron component when
assigning a pron to each pron component. In generating a phonetic
transcription, the computer code mechanism optionally may reference
an item in the language model. The computer code mechanism
optionally may specify a literal phonetic transcription when
generating a phonetic transcription.
[0066] Optionally, the computer code mechanism may also be capable
of processing a pron component list containing initial or
non-initial components.
[0067] The computer system also includes a language model
electronically accessible by the computer code mechanism. After the
computer code mechanism completes determining the pronunciation of
a phrase, the computer code mechanism adds the phrase and its
pronunciation to the language model. Optionally, the language model
may be capable of being referenced by the computer code mechanism
when the computer code mechanism generates a phonetic
transcription. The language model optionally may be capable of
being searched by the computer code mechanism in order to determine
a pron.
[0068] The computer system further includes a tokenizer. The
tokenizer is in electronic communication with the computer code
mechanism and generates a list of tokens corresponding to a phrase
provided by the computer code mechanism. The tokenizer then
provides the list of tokens to the computer code mechanism. The
tokenizer may also identify an initial or a non-initial token.
[0069] Optionally, the computer system may include a background
dictionary electronically accessible by the computer code
mechanism. If such a background dictionary is available, it may be
searched by the computer code mechanism in order to determine a
pron.
[0070] Optionally, the computer system may further include a pron
guesser. The pron guesser, if present, is in electronic
communication with the computer code mechanism and is capable of
being applied to a token in order to determine a pron.
[0071] It will be apparent to one of skill in the art that
described herein is a novel system and method for modifying a
language model. While the invention has been described with
reference to specific embodiments, it is not limited to these
embodiments. The invention may be modified or varied in many ways
and such modifications and variations as would be obvious to one of
skill in the art are within the scope and spirit of the invention
and are included within the scope of the following claims.
* * * * *