U.S. patent application number 11/431894 was published by the
patent office on 2007-11-15 as publication number 20070265826 for
systems and methods for fast and memory efficient machine
translation using a statistical integrated phrase lattice.
Invention is credited to Stanley Chen, Yuqing Gao, and Bowen Zhou.

United States Patent Application 20070265826
Kind Code: A1
Chen; Stanley; et al.
November 15, 2007

Systems and methods for fast and memory efficient machine
translation using a statistical integrated phrase lattice
Abstract
A phrase-based translation system and method includes a
statistically integrated phrase lattice (SIPL) (H) which represents
an entire translational model. An input (I) is translated by
determining a best path through an entire lattice (S) by performing
an efficient composition operation between the input and the SIPL.
The efficient composition operation is performed by a multiple
level search where each operand in the efficient composition
operation represents a different search level.
Inventors: Chen; Stanley (Port Jefferson, NY); Gao; Yuqing
(Mount Kisco, NY); Zhou; Bowen (Ossining, NY)
Correspondence Address: KEUSEY, TUTUNJIAN & BITETTO, P.C.,
20 CROSSWAYS PARK NORTH, SUITE 210, WOODBURY, NY 11797, US
Family ID: 38686195
Appl. No.: 11/431894
Filed: May 10, 2006
Current U.S. Class: 704/2
Current CPC Class: G06F 40/44 (20200101)
Class at Publication: 704/002
International Class: G06F 17/28 (20060101) G06F017/28
Government Interests
GOVERNMENT RIGHTS
[0001] This invention was made with Government support under
Contract No. NBCH2030001 awarded by the Defense Advanced Research
Projects Agency (DARPA). The Government has certain rights in this
invention.
Claims
1. A phrase-based translation method, comprising: providing a
statistically integrated phrase lattice (SIPL) (H) which represents
an entire translational model; and translating an input (I) by
determining a best path through an entire lattice (S) by performing
an efficient composition operation between the input and the SIPL,
wherein the efficient composition operation is performed by a
multiple level search where each operand in the efficient
composition operation represents a different search level.
2. The method as recited in claim 1, wherein the SIPL comprises
multiple finite state transducers computed separately and prior to
the translating step.
3. The method as recited in claim 2, wherein the finite state
transducers include at least one language model (L) and at least
one translation model (M).
4. The method as recited in claim 3, further comprising computing
the at least one translation model (M) offline, wherein the at
least one translation model includes a word-to-phrase sequencer
(P), a phrase translation transducer (T), and a target language
phrase-to-word transducer (W).
5. The method as recited in claim 1, wherein providing a
statistically integrated phrase lattice (SIPL) includes providing
the SIPL as a chain of conditional probabilities wherein portions
of the chain include finite state machines.
6. The method as recited in claim 5, wherein the finite state
machines include determinizable transducers.
7. The method as recited in claim 1, wherein translating includes
performing a state traversal search across the entire lattice (S)
wherein each of multiple levels of the multiple level search is
searched simultaneously.
8. The method as recited in claim 1, wherein multiple levels for
the multiple level search include a level for the input (I), and at
least one level for the SIPL.
9. The method as recited in claim 1, wherein multiple levels
include a level for the input (I), a level for a translation model
and a level for a language model.
10. The method as recited in claim 1, wherein the best path is
determined based on negative log probability cost.
11. The method as recited in claim 1, wherein translating further
comprises merging active search states of two or more of the input
(I), the language model (L) and the translation model (M) when the
states are identical.
12. The method as recited in claim 1, wherein translating further
comprises pruning states to balance between speed and accuracy.
13. The method as recited in claim 1, wherein the method is run on
a portable device.
14. The method as recited in claim 13, wherein the portable device
includes less than 20 MB of operating memory.
15. A computer program product comprising a computer useable medium
including a computer readable program, wherein the computer
readable program when executed on a computer causes the computer to
perform a phrase-based translation method, comprising: providing a
statistically integrated phrase lattice (SIPL) (H) which represents
an entire translational model; and translating an input (I) by
determining a best path through an entire lattice (S) by performing
an efficient composition operation between the input and the SIPL,
wherein the efficient composition operation is performed by a
multiple level search where each operand of the efficient
composition operation represents a different search level.
16. The computer program product as recited in claim 15, wherein
the SIPL comprises multiple finite state transducers computed
separately and prior to the translating step.
17. The computer program product as recited in claim 16, wherein
the finite state transducers include at least one language model
(L) and at least one translation model (M).
18. The computer program product as recited in claim 17, further
comprising computing the at least one translation model (M)
offline, wherein the at least one translation model includes a
word-to-phrase sequencer (P), a phrase translation transducer (T),
and a target language phrase-to-word transducer (W).
19. The computer program product as recited in claim 15, wherein
providing a statistically integrated phrase lattice (SIPL) includes
providing the SIPL as a chain of conditional probabilities wherein
portions of the chain include finite state machines.
20. The computer program product as recited in claim 19, wherein
the finite state machines include determinizable transducers.
21. The computer program product as recited in claim 15, wherein
translating includes performing a state traversal search across the
entire lattice (S) wherein each of multiple levels is searched
simultaneously.
22. The computer program product as recited in claim 15, wherein
the multiple level search includes multiple levels with a level for
the input (I), and at least one level for the SIPL.
23. The computer program product as recited in claim 15, wherein
multiple levels of the multiple level search include a level for
the input (I), a level for a translation model and a level for a
language model.
24. The computer program product as recited in claim 15, wherein
the best path is determined based on negative log probability
cost.
25. The computer program product as recited in claim 15, wherein
translating further comprises merging active search states of two
or more of the input (I), the language model (L) and the
translation model (M) when the states are identical.
26. The computer program product as recited in claim 15, wherein
translating further comprises pruning states to balance between
speed and accuracy.
27. The computer program product as recited in claim 16, wherein
the product is run on a portable device.
28. The computer program product as recited in claim 27, wherein
the portable device includes less than 20 MB of operating
memory.
29. A method for training a phrase-based translation model,
comprising: extracting bilingual phrase pairs from utterances and
estimating translation probabilities of the bilingual pairs to
create an inventory of bilingual phrase pairs; and creating a
statistically integrated phrase lattice (SIPL) (H) using the
inventory of bilingual phrase pairs to represent an entire
translational model which includes a plurality of weighted finite
state transducers (WFSTs) including at least a language model (L)
and a translation model (M) against which an input may be compared
to translate phrases.
30. The method as recited in claim 29, wherein the translation
model (M) includes a word-to-phrase sequencer (P), a phrase
translation transducer (T), and a target language phrase-to-word
transducer (W).
31. The method as recited in claim 29, wherein the entire
translational model is computed offline and stored in a portable
device.
32. A computer program product comprising a computer useable medium
including a computer readable program, wherein the computer
readable program when executed on a computer causes the computer to
train a phrase-based translation model, comprising: extracting
bilingual phrase pairs from utterances and estimating translation
probabilities of the bilingual pairs to create an inventory of
bilingual phrase pairs; and creating a statistically integrated
phrase lattice (SIPL) (H) using the inventory of bilingual phrase
pairs to represent an entire translational model which includes a
plurality of weighted finite state transducers (WFSTs) including at
least a language model (L) and a translation model (M) against
which an input may be compared to translate phrases.
33. A phrase-based translation system, comprising: a statistically
integrated phrase lattice (SIPL) (H) stored in memory which
represents an entire translational model; and a translation module
configured to translate an input (I) by determining a best path
through an entire lattice (S), the translation module being
configured to perform an efficient composition operation between
the input and the SIPL, wherein the efficient composition operation
is performed by a multiple level search where each operand in the
efficient composition operation represents a different search
level.
34. The system as recited in claim 33, wherein the system is run on
a portable device.
35. The system as recited in claim 34, wherein the portable device
includes less than 20 MB of operating memory for performing the
translation and the entire translational model is stored in less
than 100 MB.
Description
BACKGROUND
[0002] 1. Technical Field
[0003] The present invention relates to language translation
systems and more particularly to a phrase-based translation system
built within a finite state transducer (FST) framework that
achieves high memory efficiency, high speed and high translation
accuracy.
[0004] 2. Description of the Related Art
[0005] The need for portable machine translation devices has never
been more apparent; however, existing methods for statistical
machine translation generally require more resources than are
readily available in small devices.
[0006] One of the applications for machine translation is a
handheld device that can perform interactive machine translation.
However, the great majority of research in machine translation has
focused on methods that require at least an order of magnitude more
resources than are readily available on e.g., personal digital
assistants (PDAs).
[0007] The central issue in limited-resource statistical machine
translation (SMT) is translation speed. Not only are PDA's much
slower than PC's, but interactive applications require translation
speeds at least as fast as real time. In practice, it may be
difficult to begin translation until after a complete utterance has
been entered (e.g., because speech recognition is using all
available computation on the PDA). In this case, translation speeds
of much faster than real time are needed to achieve reasonable
latencies.
[0008] Various translation methods have been implemented in the
prior art using weighted finite-state transducers (WFSTs). For
example, Knight et al., in "Translation with Finite-State
Devices", 4th AMTA Conference, 1998, describe a system based on
word-to-word statistical translation models. Bangalore et al., in
"A Finite-State Approach to Machine Translation", NAACL 2001, use
WFST's to select and reorder lexical items for the translation.
More recently, the present inventors, in Zhou et al., "Constrained
Phrase-Based Translation Using Weighted Finite-State Transducers",
Proc. ICASSP '05, 2005, describe a constrained phrase-based
translation system using WFST's, where a limited number of
frequent word sequences and syntactic phrases are re-tokenized in
the training data. Kumar et al. in "A Weighted Finite State
Transducer Translation Template Model for Statistical Machine
Translation", Journal of Natural Language Engineering 11(3), 2005,
implement a phrase-based approach of the alignment template
translation models using WFSTs.
[0009] In the prior art, a desirable way to handle translation
using the WFST scheme is to first build a search hypothesis
transducer by composing the component translation models; second,
to represent the input sentence to be translated as a finite state
acceptor (FSA) and compose it with that transducer, as is common
practice; and finally, to take the best path in the composed
machine as the translation.
[0010] However, the phrase-based translation models of previous
studies cannot be composed into a static lattice offline due to
practical memory constraints. In order to make the chain
composition computationally tractable, some of the key component
transducers have to be collapsed into smaller machines through
online composition with the given input. For example, in
Kumar et al., the integrated transducers have to be built
specifically for a given input, achieved by a sequence of
composition operations on the fly.
[0011] A significant disadvantage of such previous studies is the
heavy online computational burden, as well as the loss of a key
advantage of the FST approach, namely that optimization algorithms
can be applied offline for improved performance. As a result, the
computational speeds of these schemes are significantly slower
than those of phrase-based systems not using FST's. Previous FST
systems translate at around 10 words or less per second, compared
to typical speeds of between 100 and 1600 words per second for
full-scale computers.
SUMMARY
[0012] Advantageously, embodiments of the present invention enable
the building of a phrase-based translation system, developed
within an FST framework, that achieves speeds of between about
4,000 and 7,000 words per second while maintaining high
translation accuracy. This makes statistical machine translation
(SMT) practical for small devices.
[0013] A phrase-based translation system and method includes a
statistically integrated phrase lattice (SIPL) (H) which represents
an entire translational model. An input (I) is translated by
determining a best path through an entire lattice (S) by performing
an efficient composition operation between the input and the SIPL.
The efficient composition operation is performed by a multiple
level search where each operand in the efficient composition
operation represents a different search level.
[0014] A phrase-based translation method includes providing a
statistically integrated phrase lattice (SIPL) (H) which represents
an entire translational model, and translating an input (I) by
determining a best path through an entire lattice (S) by performing
an efficient composition operation between the input and the SIPL,
wherein the efficient composition operation is performed by a
multiple level search where each operand in the efficient
composition operation represents a different search level.
[0015] In alternate methods, the SIPL includes multiple finite
state transducers computed separately and prior to the translating
step. The finite state transducers may include at least one
language model (L) and at least one translation model (M). The at
least one translation model (M) may be computed offline, wherein
the at least one translation model includes a word-to-phrase
sequencer (P), a phrase translation transducer (T), and a target
language phrase-to-word transducer (W). A statistically integrated
phrase lattice (SIPL) may include a chain of conditional
probabilities wherein portions of the chain include finite state
machines. The finite state machines preferably include
determinizable transducers.
[0016] In still other methods, translating includes performing a
state traversal search across the entire lattice (S) wherein each
of multiple levels of the multiple level search is searched
simultaneously. The multiple levels for the multiple level search
may include a level for the input (I), and at least one level for
the SIPL. The multiple levels may include a level for the input
(I), a level for a translation model and a level for a language
model. The best path may be determined based on negative log
probability cost. The translating may further include merging
active search states of two or more of the input (I), the language
model (L) and the translation model (M) when the states are
identical. Pruning states to balance between speed and accuracy may
be performed.
[0017] The methods described herein are preferably run on a
portable device, and the portable device can have less than 20 MB
of operating memory. A computer program product may be provided
comprising a computer useable medium including a computer readable
program, wherein the computer readable program when executed on a
computer causes the computer to perform a phrase-based translation
method as described herein.
[0018] A method for training a phrase-based translation model
includes extracting bilingual phrase pairs from utterances and
estimating translation probabilities of the bilingual pairs to
create an inventory of bilingual phrase pairs, and creating a
statistically integrated phrase lattice (SIPL) (H) using the
inventory of bilingual phrase pairs to represent an entire
translational model which includes a plurality of weighted finite
state transducers (WFSTs) including at least a language model (L)
and a translation model (M) against which an input may be compared
to translate phrases.
[0019] In other embodiments, the translation model (M) includes a
word-to-phrase sequencer (P), a phrase translation transducer (T),
and a target language phrase-to-word transducer (W). The entire
translational model is preferably computed offline and stored in a
portable device. A computer program product may be provided
comprising a computer useable medium including a computer readable
program, wherein the computer readable program when executed on a
computer causes the computer to train a phrase-based translation
model as described herein.
[0020] A phrase-based translation system includes a statistically
integrated phrase lattice (SIPL) (H) stored in memory which
represents an entire translational model. A translation module is
configured to translate an input (I) by determining a best path
through an entire lattice (S), the translation module being
configured to perform an efficient composition operation between
the input and the SIPL, wherein the efficient composition operation
is performed by a multiple level search where each operand in the
efficient composition operation represents a different search
level.
[0021] The system may be run on a portable device, and the portable
device may include less than 20 MB of operating memory for
performing the translation, and the entire translational model may
be stored in less than 100 MB.
[0022] These and other objects, features and advantages will become
apparent from the following detailed description of illustrative
embodiments thereof, which is to be read in connection with the
accompanying drawings.
BRIEF DESCRIPTION OF DRAWINGS
[0023] The disclosure will provide details in the following
description of preferred embodiments with reference to the
following figures wherein:
[0024] FIG. 1 is a flow/block diagram showing a high level
method/system for translating phrases in accordance with an
illustrative embodiment;
[0025] FIG. 2 is a flow/block diagram showing a method/system for
training a translational model in accordance with an illustrative
embodiment;
[0026] FIG. 3 is a diagram showing an illustrative portion of a
source sentence segmentation transducer (P) graph in accordance
with an exemplary embodiment;
[0027] FIG. 4 is a flow/block diagram showing a method/system for
translating phrases in accordance with an illustrative embodiment;
and
[0028] FIG. 5 is a block diagram showing an illustrative system for
translating phrases in accordance with an exemplary embodiment.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0029] In accordance with aspects of present embodiments, a new
phrase-based statistical machine translation framework is provided.
A Statistical Integrated Phrase Lattice (SIPL) is constructed that
is statically optimized using weighted finite-state transducer
(WFST) algorithms and avoids the commonly used on-the-fly
composition applied in previous related studies of the prior art.
Furthermore, a new decoding method has been developed for this
framework.
[0030] Combining these advantages, the translation system built
upon this framework achieves very fast translation speed yet
produces high translation accuracy. In addition, the architecture
and decoding method have the advantage of better memory efficiency,
and high portability for varied computational platforms.
[0031] In this work, a novel framework for performing phrase-based
statistical machine translation is provided using weighted
finite-state transducers (WFST's) that is significantly faster than
existing frameworks while still being memory-efficient. In
particular, the entire translation model is represented with a
single WFST that is statically optimized, in contrast to previous
work in which the translation model must be composed on the fly.
While the language model is dynamically combined with the
translation model, a new decoding algorithm is described that can
be viewed as an optimized implementation of dynamic composition or
efficient composition. Using these techniques, a machine
translation system that can translate at least 500 words/second has
been developed on a PDA device while still retaining excellent
accuracy. The translation system is evaluated on two bidirectional
translation tasks, one for English-Chinese, and one for English and
a dialect of Arabic.
[0032] Embodiments of the present invention can take the form of an
entirely hardware embodiment, an entirely software embodiment or an
embodiment including both hardware and software elements. In a
preferred embodiment, the present invention is implemented in
software, which includes but is not limited to firmware, resident
software, microcode, etc.
[0033] Furthermore, the present invention can take the form of a
computer program product accessible from a computer-usable or
computer-readable medium providing program code for use by or in
connection with a computer or any instruction execution system. For
the purposes of this description, a computer-usable or computer
readable medium can be any apparatus that may include, store,
communicate, propagate, or transport the program for use by or in
connection with the instruction execution system, apparatus, or
device. The medium can be an electronic, magnetic, optical,
electromagnetic, infrared, or semiconductor system (or apparatus or
device) or a propagation medium. Examples of a computer-readable
medium include a semiconductor or solid state memory, magnetic
tape, a removable computer diskette, a random access memory (RAM),
a read-only memory (ROM), a rigid magnetic disk and an optical
disk. Current examples of optical disks include compact disk--read
only memory (CD-ROM), compact disk--read/write (CD-R/W) and
DVD.
[0034] A data processing system suitable for storing and/or
executing program code may include at least one processor coupled
directly or indirectly to memory elements through a system bus. The
memory elements can include local memory employed during actual
execution of the program code, bulk storage, and cache memories
which provide temporary storage of at least some program code to
reduce the number of times code is retrieved from bulk storage
during execution. Input/output or I/O devices (including but not
limited to keyboards, displays, pointing devices, etc.) may be
coupled to the system either directly or through intervening I/O
controllers.
[0035] Network adapters may also be coupled to the system to enable
the data processing system to become coupled to other data
processing systems or remote printers or storage devices through
intervening private or public networks. Modems, cable modem and
Ethernet cards are just a few of the currently available types of
network adapters.
[0036] Referring now to the drawings in which like numerals
represent the same or similar elements and initially to FIG. 1, a
block/flow diagram is illustratively shown for a general
embodiment. In particularly useful embodiments described herein, a
phrase-based translation framework using WFST's is provided that
addresses the issues described above. In block 102, in this
framework, which will be referred to as Statistical Integrated
Phrase Lattices (SIPL's) for simplicity, a single optimized WFST is
statically constructed encoding an entire translational model.
E.g., all transducers are constructed offline in some specific way,
which enables the composition of all component machines into the
Statistical Integrated Phrase Lattice (SIPL), H, with optimization
algorithms such as, e.g., determinization and/or minimization,
applied.
[0037] In block 104, a specific decoder (e.g., a specialized
Viterbi decoder) is designed to translate source sentences in a
way that completely avoids the need for online composition. This
Viterbi decoder can combine the translational model (e.g., a
translation model and a language model) FST's with the input
lattice extremely efficiently using an optimized dynamic
composition operation, resulting in translation speeds of, e.g.,
4000 words/second on a PC and 800 words/second on a PDA device.
[0038] A phrase-based statistical machine translational model or
multiple models are implemented in accordance with one embodiment
using weighted finite-state transducers (WFST's). Phrase-based
statistical translation models have shown clear advantages over
word-based models. In contrast to most word-level statistical
machine translation, phrase-based methods explicitly take word
context into consideration when translating a word. By comparing
several schemes for computing phrase-level correspondences, it is
noted that all of the phrase-level methods consistently outperform
word-based approaches.
[0039] Meanwhile, finite-state methods can be applied in a wide
range of speech and language processing applications. More
importantly, automatic speech recognition (ASR) may employ
WFST-based decoding methods, which can be significantly faster than
other types of systems. A WFST-based approach provides the
availability of mature and efficient algorithms for general purpose
decoding and optimization that can facilitate the translation task.
Adopting the notation introduced by Brown et al. in "The
Mathematics of Statistical Machine Translation: Parameter
Estimation", Computational Linguistics, 19(2); pages 263-611;
(1993), the task of statistical machine translation is to compute a
target language word sequence given a source word sequence
f.sub.1.sup.J as follows: e ^ = argmax .times. e 1 I .times. P
.times. .times. r .function. ( e 1 I | f 1 J ) ##EQU1## = argmax e
1 I .times. .times. P .times. .times. r .function. ( f 1 J | e 1 I
) .times. P .times. .times. r .function. ( e 1 I ) ( 1 )
##EQU2##
[0040] In WFST-based translation, the above computation is
expressed in the following way:

$$\hat{E} = \text{best-path}(S = I \circ M_1 \circ M_2 \circ \cdots
\circ M_m) \qquad (2)$$

where S denotes the full search lattice, I denotes the source word
sequence expressed as a linear finite-state automaton, the $M_i$
are component translation models, and $\circ$ represents the
composition operation. That is, it is possible to express the
models used in equation (1) in terms of a sequence of WFST's.
[0041] In automatic speech recognition (ASR), it has been shown
that this computation can be made much faster by computing
$M^* = M_1 \circ M_2 \circ \cdots \circ M_m$ offline and by
applying determinization and minimization to optimize the
resulting machine.
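To make the composition and best-path machinery of equations (1),
(2) and the offline $M^*$ construction concrete, the following is
a minimal, self-contained Python sketch of epsilon-free weighted
composition and shortest-path search in the tropical semiring. The
WFST representation, the function names, and the toy costs are our
own illustrations, not the patent's implementation; a real toolkit
additionally handles epsilon transitions and applies
determinization and minimization to the composed machine.

```python
import heapq
from collections import defaultdict

class WFST:
    """Arcs are (src, input, output, weight, dst); weights are -log probs."""
    def __init__(self, start, finals, arcs):
        self.start, self.finals = start, set(finals)
        self.arcs = defaultdict(list)
        for src, i, o, w, dst in arcs:
            self.arcs[src].append((i, o, w, dst))

def compose(a, b):
    """Epsilon-free composition: match output labels of a with inputs of b."""
    arcs, finals = [], []
    stack, seen = [(a.start, b.start)], {(a.start, b.start)}
    while stack:
        qa, qb = stack.pop()
        if qa in a.finals and qb in b.finals:
            finals.append((qa, qb))
        for i, o, w1, na in a.arcs[qa]:
            for i2, o2, w2, nb in b.arcs[qb]:
                if o == i2:                      # labels must agree
                    arcs.append(((qa, qb), i, o2, w1 + w2, (na, nb)))
                    if (na, nb) not in seen:
                        seen.add((na, nb))
                        stack.append((na, nb))
    return WFST((a.start, b.start), finals, arcs)

def best_path(m):
    """Dijkstra in the tropical semiring; returns (cost, output labels)."""
    heap, done, tie = [(0.0, 0, m.start, [])], set(), 1
    while heap:
        cost, _, q, out = heapq.heappop(heap)
        if q in done:
            continue
        done.add(q)
        if q in m.finals:
            return cost, out
        for i, o, w, dst in m.arcs[q]:
            heapq.heappush(heap, (cost + w, tie, dst, out + [o]))
            tie += 1
    return None

# a linear input automaton I and a one-state translation machine M
I = WFST(0, [2], [(0, "bon", "bon", 0.0, 1), (1, "jour", "jour", 0.0, 2)])
M = WFST(0, [0], [(0, "bon", "good", 0.7, 0), (0, "jour", "day", 0.4, 0)])
print(best_path(compose(I, M)))   # cost of about 1.1, output ['good', 'day']
```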
[0042] Because of the large number of phrases used in typical
translation systems, previous WFST-based implementations of phrase
based SMT were unable to compute the entire M* as a single FST due
to computational issues. Instead, for these systems, M* is expressed
as the composition of at least three component transducers (two for
the translation model Pr(f|e) and one for the language model Pr(e))
and these component transducers are composed on the fly for every
input sentence I. There are several significant disadvantages to
this scheme, namely the large memory requirements and heavy online
computational burden for each individual composition operation, and
the loss of the benefits from doing static optimization on the
resulting transducers.
[0043] As a consequence, the translation speeds of existing
WFST-based systems are significantly slower than those of other
phrase-based SMT systems. For example, some previous FST systems
translate at a speed of less than a word per second on a personal
computer, which is substantially slower than the speeds of other
SMT systems that can be as high as 100 to 1600 words/second or
more. These speeds make it infeasible to deploy phrase-based WFST
systems for interactive applications.
[0044] Translational Models and FST's: While the task of
statistical machine translation can be expressed using equation
(1), in practice the following decision rule often achieves
comparable results:

$$\hat{e} = \operatorname*{argmax}_{e_1^I}
\Pr(e_1^I \mid f_1^J)\,\Pr(e_1^I) \qquad (3)$$
[0045] As this formulation has some practical advantages, equation
(3) will be employed instead of equation (1) in the illustrations
herein. Phrase-based translation models explicitly take word
contexts into consideration when making a translation decision.
Therefore, the foreign word sequence is segmented into K phrases,
$\bar{f}_1^K$, where $1 \le K \le J$, and each "phrase" here
simply indicates a consecutive sequence of words. e is the target
language word sequence. Then:

$$\Pr(e_1^I \mid f_1^J) = \sum_{K,\,\bar{f}_1^K}
\Pr(e_1^I, \bar{f}_1^K, K \mid f_1^J) \qquad (4)$$
[0046] By approximating the sum in equation (4) with its maximum,
the translation model can be expressed as a chain of conditional
probabilities as follows:

$$\Pr(e_1^I \mid f_1^J) \approx \max_{\bar{f}_1^K} \{ \qquad (5)$$
$$P(K \mid f_1^J)\,P(\bar{f}_1^K \mid K, f_1^J)\,\cdot \qquad (6)$$
$$P(\bar{e}_1^K \mid \bar{f}_1^K, K, f_1^J)\,\cdot \qquad (7)$$
$$P(e_1^I \mid \bar{e}_1^K, \bar{f}_1^K, K, f_1^J)\,\cdot \qquad (8)$$
$$P(e_1^I)\,\} \qquad (9)$$
[0047] For simplicity, each line of the equation above will be
referred to as a separate equation. The conditional probability
distributions in equations (6)-(9) can be represented by
finite-state machines (FSM's) that model the relationships between
their inputs and outputs. Therefore, the right-hand side of
equation (5) can be implemented as a cascade of these machines that
are combined using the composition operation. In particular, the
translation task can be framed as finding the best path in the
following FSM:
$$S = I \circ \mathrm{Det}(P) \circ T \circ W \circ L \qquad (10)$$

where Det denotes a determinization operation and where the
transducers P, T, W and L correspond to equations (6)-(9),
respectively. While determinization can be applied to any of the
component transducers, the transducers other than P are either
already mostly deterministic or non-determinizable.
[0048] In the following section, a description of how translation
models may be trained is provided, and then the construction of
each component WFST in equation (10) will be described.
[0049] Referring to FIG. 2, model training for a translation method
will now be described.
[0050] Bilingual Phrase Induction and Estimation: One task in
phrase-based translation is extracting bilingual phrase pairs and
estimating their translation probabilities in block 202. For this
step, known procedures may be followed to extract bilingual phrases
from training data. (See e.g., Och et al. in "Improved Alignment
Models for Statistical Machine Translation", Proc. EMNLP/VLC '99,
pages 20-28, MD USA, (1999)). First, bidirectional word-level
alignment is carried out on a parallel corpus. Based on the
resulting Viterbi alignments A.sub.e2f and A.sub.f2e, the union,
A.sub.U=A.sub.e2f U A.sub.f2e, is taken as the symmetrized
word-level alignment.
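For illustration, the symmetrization step reduces to a set union
when each alignment is held as a set of (source position, target
position) links; the alignments below are invented.

```python
# bidirectional Viterbi alignments as sets of (source, target) index links
a_e2f = {(0, 0), (1, 2), (2, 1)}     # English-to-foreign alignment
a_f2e = {(0, 0), (1, 2), (3, 3)}     # foreign-to-English alignment
a_union = a_e2f | a_f2e              # the symmetrized alignment A_U
print(sorted(a_union))
```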
[0051] Next, bilingual phrase pairs are extracted from $A_U$ using
an extraction algorithm similar to the one described in Och et al.
(1999). Specifically, any pair of consecutive sequences of words
below a maximum length M is considered to be a phrase pair if its
component words are aligned only within the phrase pair and not to
any words outside. The resulting bilingual phrase pair inventory
is denoted as BP in block 204.
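The sketch below shows one common form of this consistency-based
extraction: a source span and a target span form a phrase pair
only if every alignment link touching either span stays inside the
pair. The function, the toy sentence pair, and the alignment are
our own illustrations; the patent does not spell out this exact
procedure.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len):
    """Collect all alignment-consistent phrase pairs up to max_len words."""
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            tgt_pos = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 + 1 > max_len:
                continue
            # consistency: every link lies inside both spans or outside both
            if all((i1 <= i <= i2) == (j1 <= j <= j2)
                   for (i, j) in alignment):
                pairs.append((tuple(src[i1:i2 + 1]), tuple(tgt[j1:j2 + 1])))
    return pairs

src = ["the", "blue", "house"]
tgt = ["das", "blaue", "Haus"]
align = {(0, 0), (1, 1), (2, 2)}
print(extract_phrase_pairs(src, tgt, align, max_len=3))
```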
[0052] Then, the assumption is made that phrases are mapped from
the source language to the target language without reordering, and
that each phrase is translated independently:

$$P(\bar{e}_1^K \mid \bar{f}_1^K) = \prod_{k=1}^{K}
P(\bar{e}_k \mid \bar{f}_k) \qquad (11)$$
[0053] While it is generally sensible to support phrase reordering
during translation, this incurs a heavy computational cost and a
preliminary investigation suggested that this would have a limited
effect on translation accuracy in the domains under consideration.
Also note that while it is assumed that each phrase is translated
independently, the language model will constrain the translations
of neighboring phrases. To estimate the phrase translation
probabilities in equation (11), a maximum likelihood estimation
(MLE) may be used:

$$P_{\mathrm{MLE}}(\bar{e} \mid \bar{f}) =
\frac{N(\bar{e}, \bar{f})}{N(\bar{f})} \qquad (12)$$
[0054] where $N(\bar{f})$ is the occurrence count of $\bar{f}$ and
$N(\bar{e}, \bar{f})$ is the co-occurrence count of $\bar{f}$
aligning with $\bar{e}$. These counts are all calculated from BP.
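A short sketch of equation (12): relative-frequency estimation
over a toy inventory BP. The counts and phrases are invented.

```python
from collections import Counter

# a toy inventory BP of (target phrase, source phrase) pairs
bp = [(("good", "day"), ("bon", "jour"))] * 3 + [(("good",), ("bon",))] * 2
n_ef = Counter((e, f) for (e, f) in bp)    # N(e, f): co-occurrence counts
n_f = Counter(f for (_, f) in bp)          # N(f): source phrase counts

def p_mle(e, f):
    return n_ef[(e, f)] / n_f[f]

print(p_mle(("good", "day"), ("bon", "jour")))   # 1.0 in this toy inventory
```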
[0055] An implicit advantage of this type of MLE estimation is that
the resulting model will typically favor phrase pairs with longer
span, which is desirable as more contexts are included in longer
phrases. However, this method will also tend to overestimate the
probabilities of long phrases. To address this issue, the MLE
probabilities are smoothed using a word-based lexicon that is
estimated from word-level Viterbi alignments. For an aligned
phrase pair $\bar{e} = e_{i_1}^{i_2}$ and
$\bar{f} = f_{j_1}^{j_2}$, the smoothing distribution $P_s(\cdot)$
may be estimated as:

$$P_s(e_{i_1}^{i_2} \mid f_{j_1}^{j_2}) = \prod_{i=i_1}^{i_2}
\Bigl(1 - \prod_{j=j_1}^{j_2} \bigl(1 - P(e_i \mid f_j)\bigr)\Bigr)
\qquad (13)$$
[0056] Next, in block 206, the phrase-level translation
probabilities are combined as:

$$P(e_{i_1}^{i_2} \mid f_{j_1}^{j_2}) =
P_{\mathrm{MLE}}(\bar{e} \mid \bar{f}) \cdot
P_s(e_{i_1}^{i_2} \mid f_{j_1}^{j_2})^{\lambda_s} \qquad (14)$$

where $\lambda_s \ge 0$ is a smoothing factor that can be tuned.
While this is not a properly normalized model, it has been found
to work well in practice.
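A sketch of equations (13) and (14) together, under the
reconstruction above: a noisy-OR lexical score computed from a
word-level lexicon, raised to the tunable exponent $\lambda_s$ and
multiplied with the MLE phrase probability. The lexicon entries
and the smoothing factor are invented.

```python
def p_smooth(e_words, f_words, lex):
    """Equation (13): each target word must be explained by a source word."""
    prod = 1.0
    for e in e_words:
        miss = 1.0
        for f in f_words:
            miss *= 1.0 - lex.get((e, f), 0.0)   # P(e|f) from the lexicon
        prod *= 1.0 - miss
    return prod

def phrase_score(p_mle, e_words, f_words, lex, lam_s=0.5):
    """Equation (14): P_MLE * P_s ** lambda_s (not normalized)."""
    return p_mle * p_smooth(e_words, f_words, lex) ** lam_s

lex = {("good", "bon"): 0.8, ("day", "jour"): 0.9}
print(phrase_score(1.0, ["good", "day"], ["bon", "jour"], lex))
```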
[0057] In block 208, a statistically integrated phrase lattice
(SIPL) (H) using the inventory of bilingual phrase pairs is created
to represent an entire translational model (S) which includes a
plurality of weighted finite state transducers (WFSTs) including at
least a language model (L) and a translation model (M) against
which an input may be compared to translate phrases. It should be
understood that the lattice H may include multiple language models
and multiple translation models. The translation model (M) may
further include a word-to-phrase sequencer (P), a phrase
translation transducer (T), and a target language phrase-to-word
transducer (W). The entire translational model is preferably
computed offline and stored in a portable device. The finite state
transducers will now be explained in greater detail.
[0058] Source Language Segmentation FST: The source language
segmentation transducer, corresponding to equation (6), explores
all "acceptable" phrase sequences for any given source sentence. It
can be assumed that a uniform distribution exists over all
acceptable segmentations, i.e.,

$$P(K_1 \mid f_1^J)\,P(\bar{f}_1^{K_1} \mid K_1, f_1^J) =
P(K_2 \mid f_1^J)\,P(\bar{f}_1^{K_2} \mid K_2, f_1^J)
\quad \text{for all } K_1, K_2, \bar{f}_1^{K_1}, \bar{f}_1^{K_2}
\qquad (15)$$
[0059] By "acceptable", it is meant that all phrases in resulting
segmentations belong to BP. In addition, the segmentation
transducer forces the resulting segmentation to satisfy:
$$\mathrm{concatenation}(\bar{f}_1, \ldots, \bar{f}_K) = f_1^J
\qquad (16)$$
[0060] Using the WFST framework, the segmentation procedure is
implemented as a transducer P that maps from word sequences to
phrases. For example, Kumar et al. (2005) describes a typical
realization of P. However, in general, this type of realization is
not determinizable, and it is important that this transducer be
determinized because this can radically affect translation speed.
Not only can determinization greatly reduce the size of this FST,
but determinization collapses multiple arcs with the same label
into a single arc, vastly reducing the amount of computation needed
during search. The reason why a straightforward representation of P
is non-determinizable is because of the overlap between phrases
found in BP; i.e., a single word sequence may be segmented into
phrases in multiple ways. Thus, the phrase identity of a source
sentence may not be uniquely determined after the entire sentence
is observed, and such unbounded delays make P
non-determinizable.
[0061] Referring to FIG. 3, an illustrative graph showing the
construction of a portion 300 of transducer P is depicted. Portion
300 includes a plurality of states labeled 1-17, and each arc
connecting the states is labeled using the convention of
input:output. A token <epsilon> (also referred to hereinafter
as epsilon transitions or .epsilon.-transitions) denotes an empty
string and "#" is used as a separator in multi-word labels.
[0062] An auxiliary symbol, denoted as EOP, is introduced to mark
the end of each distinct source phrase. FIG. 3 shows a sample
portion of a resulting transducer. By adding the artificial phrase
boundary markers, each input sequence in FIG. 3 corresponds to a
single segmented output sequence and the transducer becomes
determinizable. Once determinized, the FST can replace the EOP
markers with empty strings in a later step, as appropriate. As it
is assumed that a uniform distribution exists over segmentations,
the cost (or negative log probability) associated with each arc is
set to zero.
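The sketch below builds a toy version of P as a trie over the
source phrases in BP, delaying all output to the end of each
phrase and emitting the phrase label followed by the EOP boundary
marker before looping back to the start state. The arc-tuple
encoding and label conventions are our own and differ from FIG. 3
in layout; arc costs are zero per the uniform-segmentation
assumption of equation (15).

```python
def build_segmenter(phrases):
    """Toy P: arcs are (src, input, output, cost, dst); state 0 is start/final."""
    states = {(): 0}
    arcs = []
    def state(prefix):
        if prefix not in states:
            states[prefix] = len(states)
        return states[prefix]
    for phrase in sorted(set(phrases)):      # phrases: tuples of source words
        prefix = ()
        for word in phrase:
            src = state(prefix)
            prefix = prefix + (word,)
            # consume one source word; delay all output to the phrase end
            arcs.append((src, word, "<epsilon>", 0.0, state(prefix)))
        # emit the multi-word phrase label, then EOP, and return to state 0
        end = state(prefix + ("<eop>",))
        arcs.append((state(prefix), "<epsilon>", "#".join(phrase), 0.0, end))
        arcs.append((end, "<epsilon>", "EOP", 0.0, 0))
    return arcs

arcs = build_segmenter([("how", "are"), ("how", "are", "you"), ("you",)])
```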
[0063] Phrase Translation Transducer: The phrase translation model,
corresponding to equation (7), is implemented by a weighted
transducer that maps source phrases to target phrases. Under the
assumptions of phrase translation independence and monotonic phrase
ordering, the transducer may be a trivial one-state machine, with
every arc corresponding to a phrase pair included in BP. The cost
associated with each arc is obtained based on equation (14).
[0064] To be consistent with the other FST's in equation (10), one
more arc is added to this transducer to map EOP to itself with no
cost. This transducer is denoted as T.
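A sketch of this one-state machine: every phrase pair in BP
becomes a self-loop arc weighted with the negative log of its
equation (14) score, plus the cost-free EOP self-loop. The
probabilities are invented.

```python
import math

def build_T(phrase_table):
    """One-state transducer: arcs (src, input, output, cost, dst) on state 0."""
    arcs = [(0, "EOP", "EOP", 0.0, 0)]           # pass phrase boundaries through
    for (f, e), p in phrase_table.items():
        arcs.append((0, f, e, -math.log(p), 0))  # cost = -log P from eq. (14)
    return arcs

T = build_T({("bon#jour", "good#day"): 0.6, ("bon", "good"): 0.3})
```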
[0065] Target Language Phrase-to-Word FST: After translation, the
target phrases can be simply concatenated to form the target
translation. However, to constrain translations across phrases, it
may be necessary to incorporate the effects of a target language
model in the translation system. To achieve this, the target
phrases are converted back to target words. It is clear that the
mapping from phrases to word sequences is deterministic. Therefore,
the implementation of this transducer is straightforward. Again,
the auxiliary token EOP is placed on additional arcs to mark the
ends of phrases. This transducer is denoted as W, corresponding to
equation (8).
[0066] Target Language Model: The target language model,
corresponding to equation (9), can be represented by a weighted
acceptor L that assigns probabilities to target language word
sequences based on a back-off N-gram language model (See e.g.,
Mohri et al., in "Weighted Finite-State Transducers in Speech
Recognition", Computer Speech and Language, 16(1) pages 69-88,
2002).
[0067] To effectively constrain phrase sequence generation during
translation, an N-gram should be of sufficient length to span
significant cross-phrase word sequences. Hence, a 5-gram language
model is preferably chosen although other N-grams can be
selected.
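For illustration, the back-off recursion that L encodes can be
sketched with a dictionary-based scorer; a real WFST acceptor
realizes the same recursion with back-off weights on epsilon arcs,
as in Mohri et al. (2002). The probabilities and back-off constant
below are invented, and the model is truncated to bigrams for
brevity.

```python
import math

# bigram and unigram probabilities (invented); missing entries back off
ngram = {("good", "day"): 0.2, ("day",): 0.1, ("good",): 0.3}

def lm_cost(word, history, alpha=0.4):
    """Negative log probability of `word`, backing off bigram -> unigram."""
    for ctx in (tuple(history[-1:]) + (word,), (word,)):
        if ctx in ngram:
            # charge a back-off penalty only when a longer context was skipped
            backoff = 0.0 if len(ctx) > 1 or not history else -math.log(alpha)
            return -math.log(ngram[ctx]) + backoff
    return float("inf")

print(lm_cost("day", ["good"]))   # bigram hit: -log 0.2
```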
[0068] The searching aspect of the present invention will now be
described in greater detail.
[0069] Issues with Cascades of WFST's: As mentioned, the decoding
problem can be framed as finding the best path in the lattice S
described in equation (10) given an input sentence/automaton I.
Viterbi search can be applied to S to find its lowest-cost path. To
minimize the amount of computation needed at translation time, it
is desirable to perform as many composition operations in equation
(10) as possible ahead of time. In a preferred embodiment, H is
computed offline:
$$H = \mathrm{Det}(P) \circ T \circ W \circ L \qquad (17)$$
[0070] At translation time, one needs only to compute the best
path of $S = I \circ H$. Applying determinization and minimization to
optimize H can further reduce the computation needed. In the field
of speech recognition, decoders that fall under this paradigm
generally offer the fastest performance. However, it can be very
difficult to construct H given practical memory constraints. While
this has been done in the past for word-level and constrained
phrase-level systems, this has not yet been done for unconstrained
phrase-based systems. In particular, the nondeterministic nature of
the phrase translation transducer interacts poorly with the
language model; it is not clear whether H is of a tractable size
even after minimization, especially for applications with large
vocabularies, long phrases, and large language models. Therefore,
special consideration is needed in constructing transducers for
such domains.
[0071] Furthermore, even when one is able to compute and store H,
the composition $I \circ H$ itself may be quite expensive. To
improve speed, it has been proposed that lazy or on-the-fly
composition be applied followed by Viterbi search with beam
pruning. In this way, only promising states in S are expanded
on-demand. Nevertheless, for large H (e.g., millions of states and
arcs), using such operations from general FSM toolkits can be quite
slow.
[0072] The Multilayer Search Algorithm (Efficient Composition):
While it may not be feasible to compute H in its entirety as a
single FSM, it is possible to separate H into two pieces: the
language model L and the translation model M:
$$M = \mathrm{Min}(\mathrm{Min}(\mathrm{Det}(P) \circ T) \circ W)
\qquad (18)$$

where Min denotes the minimization operation.
[0073] Due to the determinizability of P, M can be computed
off-line using a moderate amount of memory. All operations are
preferably performed using the tropical semiring, as is consistent
with Viterbi decoding; e.g., when two paths with the same labels
are merged, the resulting cost is the minimum of the individual
path costs. The cost associated with a transition is taken to be the
negative logarithm of the corresponding probability. Minimization
is performed following each composition to reduce redundant
paths.
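The two semiring operations just described fit in a few lines:
path merging is min and path extension is addition of negative log
probabilities.

```python
import math

def oplus(a, b):        # semiring "addition": choose the better path
    return min(a, b)

def otimes(a, b):       # semiring "multiplication": extend a path
    return a + b

p1, p2 = 0.5, 0.25      # two path probabilities
c = oplus(-math.log(p1), -math.log(p2))
print(c == -math.log(max(p1, p2)))   # True: min cost = max probability
```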
[0074] To address the problem of efficiently computing
$I \circ M \circ L$ (or $I \circ H$), a multilayer search
algorithm has been developed in accordance with the present
invention. The basic idea is that the search is performed in
multiple FSM's, or layers, simultaneously. Specifically, one layer
for each of the input FSM's: I, L and M (or H) is included. At each
layer, the search process is performed via a state traversal
procedure starting from the start state s.sub.0, and consuming an
input word in each step in a left-to-right manner. (Recall that the
translation model does not support phrase reordering, only word
reordering within phrases.)
[0075] This can be viewed as an optimized version of on-the-fly or
dynamic composition, and is similar to search algorithms that have
been used in large vocabulary speech recognition. This optimized
version of composition may be referred to as efficient composition
to denote that reduced operational memory is needed to perform
this operation.
[0076] Specialized versions of composition have the advantage of
not only being possibly many times faster than general composition
implementations found in FSM toolkits, but the specialized versions
can also incorporate information sources that cannot be easily or
compactly represented using WFST's. For example, the decoder can
permit application of translation length penalties and phrase
penalties to score the partial translation candidates during
search. In addition, the specialized versions can incorporate new
parameter values (e.g., language model weight) at runtime without
the need for any modification of the input WFST's.
[0077] Each state s in the search space can be represented using
the following 7-tuple: $(s_I, s_M, s_L, c_M, c_L, h, s_{prev})$,
where $s_I$, $s_M$ and $s_L$ record the current state in each
input FSM; $c_M$ and $c_L$ record the accumulated cost in M and L
along the best path up to this point; h records the target word
sequence labeling the best path up to this point; and $s_{prev}$
records the best previous state. The initial search state $s_0$
corresponds to being located at the start state of each input FSM
with no accumulated costs.
[0078] At the beginning of the input sentence I, only the start
state $s_0$ is active. The active states at each position t in I
are computed from the active states at the preceding position t-1
in the following way. For each active state s at position t-1,
first advance $s_I$. Then, look at all outgoing arcs of $s_M$
labeled with the current input word, and traverse one of these
arcs, advancing $s_M$. Then, given the output label o of this arc,
look at all outgoing arcs of $s_L$ with o as input, and traverse
one of these arcs, advancing $s_L$. The set of all states
$(s_I, s_M, s_L, \ldots)$ reachable in this way is the set of
active states at position t. The remaining state components
$c_M$, $c_L$, h, and $s_{prev}$ are updated appropriately, and
$\epsilon$-transitions must be handled correctly as well.
[0079] The set of legal translation candidates are those h
associated with states s where each component sub-state is a final
state in its layer. The selected candidate is the legal candidate
with the lowest accumulated cost.
[0080] For each active state, the hypothesis h is a translation of
a prefix of the source sentence, and can conceivably grow to be
quite large. However, the h's for each state can be stored
efficiently using the same ideas as used in token passing in ASR.
In particular, the set of all active h's can be compactly
represented using a prefix tree, and each state can simply keep a
pointer to the correct node in this tree. To reduce the search
space, two active search states are merged whenever the states
have identical $s_I$, $s_M$, and $s_L$ values; the remaining state
components are inherited from the state with the lower cost. In
addition, two pruning methods, histogram pruning and threshold
(beam) pruning, may be employed to achieve the desired balance
between translation accuracy and speed. To deploy the decoder on a
PDA, the search algorithm is preferably implemented using
fixed-point arithmetic.
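Pulling paragraphs [0074]-[0080] together, the following condensed
sketch runs the multilayer search: the input position plays the
role of $s_I$, the translation model M is an arc table keyed on
(state, input word), and the language model layer is folded into a
cost callback with a truncated history standing in for $s_L$.
States agreeing on all layer positions are merged keeping the
lower cost, and threshold (beam) pruning discards states far from
the best. Epsilon handling, histogram pruning, and the prefix-tree
hypothesis store are omitted; all names and toy data are ours.

```python
import math

def decode(words, m_arcs, m_finals, lm_cost, beam=5.0):
    """Multilayer Viterbi sketch over I (positions), M (arcs), L (lm_cost)."""
    # active: (s_M, truncated LM history) -> (cost, hypothesis so far)
    active = {(0, ()): (0.0, ())}
    for word in words:                           # advance s_I left to right
        nxt = {}
        for (s_m, hist), (cost, hyp) in active.items():
            for out, w, dst in m_arcs.get((s_m, word), []):   # advance s_M
                c = cost + w + lm_cost(out, hist)             # advance s_L
                key = (dst, (hist + (out,))[-1:])  # merge identical states
                if key not in nxt or c < nxt[key][0]:
                    nxt[key] = (c, hyp + (out,))
        if not nxt:
            return None                          # no surviving hypothesis
        best = min(c for c, _ in nxt.values())
        active = {k: v for k, v in nxt.items()   # threshold (beam) pruning
                  if v[0] <= best + beam}
    finals = [v for (s_m, _), v in active.items() if s_m in m_finals]
    return min(finals) if finals else None

# toy models: M maps each source word to one target word; L is uniform
m_arcs = {(0, "bon"): [("good", 0.7, 0)], (0, "jour"): [("day", 0.4, 0)]}
uniform = lambda word, hist: -math.log(0.5)
print(decode(["bon", "jour"], m_arcs, {0}, uniform))
```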
[0081] Referring to FIG. 4, a phrase-based translation
system/method is illustratively depicted summarizing the process
described above. In block 402, a statistically integrated phrase
lattice (SIPL) or (H) is provided which represents an entire
translational model. The SIPL comprises multiple finite state
transducers computed separately and offline (e.g., prior to the
translation operation). In one embodiment, the SIPL includes a
chain of conditional probabilities wherein portions of the chain
include finite state machines. The finite state transducers may be
computed in pieces. In one preferred embodiment, the SIPL may
include a language model (L) and a translation model (M).
[0082] In block 404, the SIPL may be computed offline. The SIPL may
include a translation model (M) and a language model (L). The
translation model (M) may include a word-to-phrase sequencer (P), a
phrase translation transducer (T), and a target language
phrase-to-word transducer (W). The finite state machines
(transducers) are preferably determinizable transducers.
[0083] In block 406, an input (I) is translated by determining a
best path through an entire lattice (S) by performing an efficient
composition operation between the input and the SIPL, wherein the
efficient composition operation is performed by a multiple level
search where each operand in the efficient composition operation
represents a different search level.
[0084] In block 408, translating may include performing a state
traversal search across the entire lattice (S) wherein each of the
multiple levels is searched simultaneously. The multiple levels may
include a level for the input (I), and at least one level for the
SIPL. In an alternate embodiment, the multiple levels may include a
level for the input (I), a level for a translation model (M) and a
level for a language model (L). The best path is preferably
determined based on cost.
[0085] The step of translating may include merging active search
states of two or more of the input (I), the language model (L) and
the translation model (M) when the states are identical, in block
410. In block 412, pruning states to balance between speed and
accuracy may be performed.
[0086] Referring to FIG. 5, a system 500 is shown in accordance
with an illustrative embodiment. System 500 preferably includes a
portable computing device, such as a personal digital assistant, a
handheld computer, a handheld language translator, or other
portable electronic device with sufficient memory to run the
translation method described herein and store the translational
model. The system 500 may include a full scale system as well;
however, the efficiency of the present invention makes it
particularly useful in smaller systems where memory space is at a
premium.
[0087] The system 500 includes storage memory 502. Storage memory
502 may include sufficient space to store a precomputed/created
translational model (e.g., a statistically integrated phrase
lattice (SIPL) or H), which preferably represents an entire
translational model. In an alternate embodiment, the storage memory
may include sufficient space to store at least a translation model
(M). The storage memory 502 may be between about 50 MB and about
200 MB, although more or less memory storage is contemplated.
Preferably, storage memory 502 includes less than 100 MB.
[0088] System 500 may include a separate memory 504 of e.g., less
than 20 MB used for performing translation operations and
computations. Memory 504 may be included in memory 502 on a
translation module 506 or as a separate unit.
[0089] Translation module 506 preferably includes a phrase-based
translation module and is configured to translate an input (I) by
determining a best path through an entire lattice (S). The
translation module 506 is configured to perform a composition
operation between the input and the SIPL. The composition operation
is performed by a multiple level search performed by a specially
designed decoder 508 where each operand in the composition
operation represents a different search level.
[0090] The decoder 508 may include a specially designed Viterbi
decoder configured to perform efficient composition by a multiple
level search of the lattice S. The multiple level searches are
preferably performed by traversing states in S simultaneously at
the input (I) level, and the H level. The H level may also include
M and L as described above. If this is the case, then these two
levels, M and L, are searched simultaneously with I. The best path
(lowest cost) is thereby determined as a result of the search to
translate the input utterance/phrase.
[0091] The input is received or provided by an input device 514.
The input device 514 may include a microphone, keypad, or any other
input means to permit speech, text, or other information to be
input to the system for translation. Likewise an output module 516
may include a speaker, a printer, a display or any other output
device that conveys the translated input.
[0092] It should be understood that multiple translational models
may be included in the system to permit translations to/from a
plurality of different languages. System 500 may include a user
interface to select the type of translation desired. In a
preferred embodiment, system 500 includes many customizable
features and settings.
[0093] Experimental Evaluation: The SIPL translation framework in
accordance with the present invention was evaluated on two speech
translation tasks. The first task is a two-way translation between
English and Chinese, and the other is a two-way translation between
English and a dialect of colloquial Arabic (DCA). The objective of
the speech translation system is to facilitate conversation between
speakers of different languages in real time. Thus, both our
training and test data are sentences transcribed from spontaneous
speech rather than written text.
[0094] Corpora and Setup: The majority of the training corpus of
the English-Chinese system was collected from simulated
English-only doctor/patient interactions, and the dialogs were
later translated into Chinese. As Chinese translations may not be
representative of conversational Chinese, an additional 6,000
spoken sentences were collected directly from native Chinese
speakers, to better capture the linguistic characteristics of
conversational Chinese. After being transcribed and translated into
English, this data set was also included in our corpus. In total,
there are about 240K utterance pairs, but with many repeated
utterances.
[0095] Several dialogs were randomly selected to form a development
set and a test set. No punctuation marks are present in any of the
data as it is assumed users exchange information sentence by
sentence. For the English-Chinese task, a Chinese segmenter was
employed to segment Chinese character sequences into word
sequences. Tables 1 and 2 list some statistics of the data sets
used for the English-Chinese and English-DCA tasks.

TABLE 1. English-Chinese corpora statistics.

  Data          English                          Chinese
  Training set  240K sentences; 6.9 words/sent.  8.4 characters/sent.
  Vocabulary    9690 words                       9764 words
  Dev set       300 sentences; 7.1 words/sent.   582 sentences; 8.9 characters/sent.
  Test set      132 sentences; 9.1 words/sent.   73 sentences; 6.2 characters/sent.
[0096] TABLE 2. English-DCA corpora statistics.

  Data          English                           DCA
  Training set  366K sentences; 7.9 words/sent.   5.4 words/sent.
  Vocabulary    24303 words                       79960 words
  Dev set       395 sentences; 10.4 words/sent.   200 sentences; 6.5 words/sent.
  Test set      1856 sentences                    1856 sentences
[0097] Experimental Results: The maximum phrase length M was set to
values between 5 and 9 depending on the language, as listed in
Table 3. This table also displays the sizes of the statically
constructed translation model WFST's H. While the framework can
handle longer spans and larger numbers of phrases, bigger M did not
produce significantly better results in this domain, probably due
to the short sentence lengths. The development sets were used to
adjust the model parameters (e.g., .lamda..sub.l and .lamda..sub.s)
and the search parameters (e.g., pruning thresholds).
[0098] For the results reported in Table 4, all decoding
experiments were conducted on a Linux™ machine with a 2.4 GHz
Pentium™ 4 processor. The machine used for training (including
the SIPL building) possessed a 4 GB memory. Since the decoding
algorithm of the SIPL framework is memory efficient, the
translation is performed on a machine with 512 MB memory (the
actual memory needed is less than 100 MB).

TABLE 3. WFST sizes for various models.

                     English-Chinese          Chinese-English
  M                  7                        7
  H (states / arcs)  2,293,512 / 3,275,733    1,908,979 / 2,777,595

                     English-DCA              DCA-English
  M                  5                        9
  H (states / arcs)  6,303,482 / 11,086,596   8,089,145 / 11,784,418
[0099] TABLE 4. Translation performance. The number in parentheses
is the number of references used in computing the BLEU score.

  Task                  BLEU (%)
  English-Chinese (8)   59.57
  Chinese-English (8)   32.98
  English-DCA (2)       39.83
  DCA-English (2)       50.10
[0100] Experimental results are presented in Table 4 in terms of
the BLEU metric (See, Papineni et al. in "BLEU: A Method for
Automatic Evaluation of Machine Translation", Technical Report
RC22176, IBM TJ Watson Research Center, 2001). Since the BLEU score
is a function of the number of human references, these numbers are
included in parentheses. Note that for English-Chinese
translation, BLEU is measured in terms of characters rather than
words. It should be understood that the speed of translation in
accordance with present embodiments included speeds of hundreds to
thousands of words per second. As a point of comparison prior art
techniques provide speeds of a few words per second to a few
hundred per second at best.
[0101] From Table 4 it can be observed that the present approach
achieves encouraging results for all four translation tasks.
Moreover, using our dedicated translation decoder, all tasks
obtained an average decoding speed of higher than, e.g., 1000 words
per second. Higher or lower speeds may be achieved based upon the
operating conditions and models. For example, the speed varies due
to the complexity and structure of the lattices M and L. The
fastest speed is achieved for DCA-English translation, where the
average speed was, e.g., 4600 words per second. These speeds are
competitive with the highest translation speeds reported in the
literature. More significantly, the complete system can run
comfortably on a PDA or other handheld computing device as part of
a complete speech-to-speech translation system.
[0102] In this case, the translation component preferably runs in
about 20 MB or less of memory, with the FSM's M and L stored on
disk (e.g., taking a total of less than several hundred MB's,
preferably less than 100 MB) and paged in on demand. In one
configuration, the same exact accuracy figures in Table 4 are
achieved but at speeds ranging from high hundreds to thousands of
words/second. Because of these high translation speeds, SMT
contributes almost nothing to the latency in this interactive
application.
[0103] To the knowledge of the present inventors, these are the
first MT results for a handheld device. To give an idea of the
difference in speed between our optimized multilayer search as
compared to using general on-the-fly composition and pruning
operations from an off-the-shelf FSM toolkit, an earlier
toolkit-based SMT system was used for comparison that translated at
around 2 words/second on comparable domains. While the translation
models used are not comparable, this does give some idea of the
possible performance gains from using the specialized decoder of
the present invention.

TABLE 5. Sample translation sentences of the DCA-English task.

  I just came here to visit my family
  sure I understand that
  that's the way that used to be
  how we can know which house
  how long do you have my identification
  doesn't have any brothers or cousins
  we had electricity yes in temporary we had electricity
  about twenty meters further away from our house
  two stories built with bricks
[0104] In terms of translation accuracy, Table 5 provides some
sample translations produced by a system constructed in accordance
with the present invention for the DCA-English task. Note that the
focus of the present disclosure is on designing computationally
limited machine translation, and it would be unrealistic to expect
equivalent performance with systems with no such constraints. To
give some perspective, the best reported BLEU results for the Tides
evaluation for Modern Standard Arabic to English translation are 51%
for Arabic to English. While that data is most likely "harder" than
the present case, it does suggest that the present system produces
translations of similar quality, albeit on a simpler domain. Thus,
the present system is effective for the domain for which it was
designed.
[0105] Aspects of the present invention include a very fast
phrase-based machine translation framework using statistical
integrated phrase lattices (SIPL's). This WFST-based approach is
well-suited to devices with limited computation and memory. This
efficiency is achieved by employing methods that permit performing
more composition and graph optimization offline, and utilizing a
specialized decoder which performs a multilayer search. High
translation accuracies are achieved in all domains evaluated.
[0106] Having described preferred embodiments of systems and
methods for fast and memory efficient machine translation using
statistical integrated phrase lattices (which are intended to be
illustrative and not limiting), it is noted that modifications and
variations can be made by persons skilled in the art in light of
the above teachings. It is therefore to be understood that changes
may be made in the particular embodiments disclosed which are
within the scope and spirit of the invention as outlined by the
appended claims. Having thus described aspects of the invention,
with the details and particularity required by the patent laws,
what is claimed and desired to be protected by Letters Patent is
set forth in the appended claims.
* * * * *