U.S. patent application number 10/091462 was filed with the patent office on March 7, 2002, and published on 2002-09-12, for an operator-assisted translation system and method for unconstrained source text.
The invention is credited to Anglehart, James; Brandon, Marek; and Veres, Maria.
Publication Number: 20020128814
Application Number: 10/091462
Family ID: 21956061
Publication Date: 2002-09-12
United States Patent Application 20020128814
Kind Code: A1
Brandon, Marek; et al.
September 12, 2002
Operator-assisted translation system and method for unconstrained source text
Abstract
The computer-based translation system allows a person having
full understanding and competency in a language of an original
source text to confirm the meaning of the source text without
requiring knowledge of any other language so that data concerning
the meaning of the original language text may be used to translate
automatically the original text into the other language or other
languages.
Inventors: Brandon, Marek (Pincourt, CA); Veres, Maria (Montreal, CA); Anglehart, James (Montreal, CA)
Correspondence Address: OGILVY RENAULT, 1981 MCGILL COLLEGE AVENUE, SUITE 1600, MONTREAL, QC H3A2Y3, CA
Appl. No.: 10/091462
Filed: March 7, 2002
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
10091462 | Mar 7, 2002 |
09394279 | Sep 10, 1999 | 6385568
PCT/CA98/00549 | May 27, 1998 |
60048715 | May 28, 1997 |
Current U.S. Class: 704/1
Current CPC Class: G06F 40/247 20200101; G06F 40/55 20200101; G06F 40/58 20200101; G06F 40/221 20200101; G06F 40/47 20200101; G06F 40/211 20200101
Class at Publication: 704/1
International Class: G06F 017/20
Claims
We claim:
1. A translation system for translating a meaning code into an
output text, said meaning code comprising an identification code
corresponding to a meaning for each word found in an input source
text and sufficient grammatical information related to each said
identification code and to a selected generic sentence structure,
said system comprising: a meaning code to destination language
database providing a translated term corresponding to each said
identification code in said meaning code; a sentence structure
database containing data defining a number of generic sentence
structures acting as a model determining where grammatical
components of a sentence are to be found and relationships between
said grammatical components; and a sentence builder compiling said
output text using said sentence structure database data for those
sentences having said selected generic sentence structure, each
said translated term and said grammatical information for each
sentence structure contained in said meaning code.
2. The system as claimed in claim 1, further comprising: an output
text editor displaying said output text to a user, displaying
meaning information concerning at least some of the words in said
output text, and accepting user input to edit said output text.
3. The system as claimed in claim 2, wherein said output text
editor further displays information concerning grammatical
relationships of said words.
4. The system as claimed in claim 2, wherein said meaning code
includes input language text information, said output text editor
further displays information concerning said input language text to
aid said user to understand said output text in the case that said
user understands said language of said input text.
5. The system as claimed in claim 1, wherein said system is
provided by plug-in software operating on a personal computer in
conjunction with a browser or text editor program.
6. The system as claimed in claim 1, wherein said meaning code is a
substantially numeric code.
7. The system as claimed in claim 6, wherein said meaning code
includes text representing said words of said input text, and said
sentence builder uses at least some of said text representing said
words of said input text, whereby said meaning code can be
converted more faithfully into said language of said input text.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of allowed U.S. patent
application Ser. No. 09/394,279 filed Sep. 10, 1999, which was a
continuation of PCT/CA98/00549 filed May 27, 1998 designating the
U.S. and which claimed priority of U.S. Provisional patent
application Serial No. 60/048,715 filed May 28, 1997.
FIELD OF THE INVENTION
[0002] The present invention relates to a computer-based
translation system. More particularly, the present invention
relates to a computer-based translation system in which a person
having full understanding and competency in a language of an
original source text to be translated provides and/or confirms the
meaning of the text without requiring knowledge of any other
language, so that data concerning the meaning of the original
language text may be used to automatically translate the original
text. The invention also relates to a computer-based translation
system which automatically translates meaning code data
representing a source text in order to obtain a translation in a
particular language. The invention further relates to a method of
translating in which meaning data concerning a source text is
provided by a person familiar with the language of the source text
without requiring any knowledge of another language, and then
translating the meaning data into a translation destination text
automatically without requiring any other understanding of the
source language.
BACKGROUND OF THE INVENTION
[0003] In the field of automated translation systems, two
approaches have traditionally been taken. In the first approach,
artificial intelligence has been used to provide a best guess of
the meaning of the source language in order to be able to generate
automatically a translation of the source text. Such automated
systems recognize parts of speech in the source language and this
grammatical information is used in order to reconstruct in the
destination language a suitable translation. When a word in the
source language has two meanings, the most probable meaning based
on the context is used in order to provide the translation. The
context is determined by the presence of other words. The output
from such systems is a translated text which to date has been of
dubious quality and reliability.
[0004] In the second type of translation systems, the automated
translation systems provide an aid to translators in which the
source text is automatically parsed grammatically and each possible
translation for each word in the sentence may be selected by the
translator in order to obtain efficiently the translation text. The
translator must be knowledgeable as to the meaning of the original
language as well as the destination language in order to be
competent to confirm that the parsing of the source text is
accurate and to select the correct translations for each word in
the sentence, and thus produce an accurate translation.
[0005] In the prior art, two attempts to provide a different type
of translation system are worth noting. In U.S. Pat. No. 5,587,903
to Yale, a sentence input by a user is translated into Esperanto
using his or her native language. This is similar to the second
type of translation systems, except that the user is translating
from his or her native language into Esperanto, and the translation
includes databases containing relational and/or grammatical
information about the Esperanto text. The result obtained is to map
the thought of the translated sentences into a form recognizable by a
machine. In "Technical translation as information transfer across
language boundaries" by P. C. Ganeshsundaram, Journal of
Information Science 2(1980), pp. 91-100, a framework for
pre-editing a text in the source language to define parts of speech
of the words is disclosed. In this pre-editing, no translation or
determining of the meaning of the words is carried out. For basic
technical texts, it is proposed that the pre-edited text can be
accurately machine translated using literal translations of the
pre-edited words into one of many target languages.
OBJECTS OF THE INVENTION
[0006] It is a general object of the present invention to provide a
translation system in which the burden of defining the exact
meaning of a text to be translated is carried out by a person
knowledgeable about the language and the meaning of a text to be
translated without requiring any knowledge of the language into
which the text is to be translated. Data representing the exact
meaning is stored in order to facilitate automated translation into
one or more destination languages. For example, the author of a
text who wants his or her text to be readily translated into other
languages may use a text editor according to the invention to
provide the meaning data needed for the translation to be done
automatically, without requiring any further linguistic data.
[0007] It is a further object of the present invention to provide
an automatic translation text generator which creates a translation
text from the product of a unilingual meaning editor.
SUMMARY OF THE INVENTION
[0008] According to a broad aspect of the invention, there is
provided a translation system for translating a meaning code into
an output text. The meaning code comprises an identification code
corresponding to a meaning for each word found in an input source
text and sufficient grammatical information related to each
identification code and to a selected generic sentence structure.
The system comprises a meaning code to destination language
database providing a translated term corresponding to each
identification code in the meaning code; a sentence structure
database containing data defining a number of generic sentence
structures acting as a model determining where grammatical
components of a sentence are to be found and relationships between
the grammatical components; and a sentence builder compiling the
output text using the sentence structure database data for those
sentences having the selected generic sentence structure, each
translated term and the grammatical information for each sentence
structure contained in the meaning code.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The invention will be better understood by way of the
following detailed description of the preferred embodiment with
reference to the appended drawings in which:
[0010] FIG. 1 is a schematic block diagram of the unilingual
meaning editor according to the preferred embodiment; and
[0011] FIG. 2 is a schematic block diagram of the automatic
translation generator according to the preferred embodiment.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0012] In the preferred embodiment, a computer system such as a
general-purpose personal computer is provided with software to
provide specific functions as will be described hereinbelow.
[0013] As shown in FIG. 1, the unilingual meaning editor 10
comprises text input means 12 such as a communications interface or
any other suitable source of text data. A storage memory 14 is used
to hold the input text as well as a temporary file for the output
meaning code data. A grammatical parser 15 parses the sentences in
the input text stored in memory 14 and displays the parsed
sentences on a display 20 identifying the particular parts of
speech for each of the words and identifying the most probable or
simply the first meaning of each of the words in the sentence being
displayed. Software grammar parsers per se are known in the art.
The language database 25 provides word definition and
part-of-speech data to the parser along with grammar requirements
for the set of languages into which the meaning code data is
intended to be translated. The database 25 combines the term
database, which includes a list of all words and expressions in the
`A` language which are matched with a meaning in the meaning
definition set, with the database of meanings which includes the
corresponding identification codes used to build up the resulting
meaning code.
[0014] The parser 15 allows the user to select the appropriate
meaning for each of the words or group of words appearing in the
sentence using a meaning selector 16 which may be part of a
graphical user interface. The user is also required to provide
grammatical information unrelated to the original language or the
meaning in the original language which, however, may be required in
order to produce the translation in the languages into which the
meaning code data is to be translated. For example, it may be
necessary to identify the gender of a person in order to be able to
translate words associated with that person correctly in another
language, whereas in the original language such gender information
is not required. In the case that the original sentence structure
is simply too complex for the meaning of its terms to be easily
defined, an editor 18 is provided to change the original sentence
in order to facilitate definition or specification of the meaning
of the words contained in the sentence.
[0015] The meaning selector 16 may be provided by software which
displays, in a separate window, the possible definitions of a given
word or group of words, while the sentence containing it is shown
in a main display window with that word or group of words
highlighted. Using the graphical user interface, the user selects
the definition of the word or group of words which best suits the
meaning in the original language. In the case that a word or group
of words has an unambiguous definition in the language or languages
into which the input text is to be translated, there is no need to
select one of a plurality of meanings using the meaning selector,
although the part of speech still needs to be confirmed.
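As a rough illustration of the selection step in paragraph [0015], the sketch below auto-selects the sole candidate meaning of an unambiguous word and otherwise defers to the user, while still requiring part-of-speech confirmation in both cases. It is a minimal sketch under stated assumptions, not the patented implementation. `TERM_DB`, `select_meaning`, the meaning-identifier strings and the `choose`/`confirm_pos` callbacks, which stand in for the graphical user interface, are all hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple


class Meaning(NamedTuple):
    meaning_id: str      # hypothetical meaning-identifier
    part_of_speech: str
    definition: str


# Hypothetical excerpt of the term database: word -> candidate meanings.
TERM_DB: Dict[str, List[Meaning]] = {
    "fan": [
        Meaning("M1001", "noun", "device to move air"),
        Meaning("M1002", "noun", "supporter of something"),
    ],
    "griddle": [Meaning("M2001", "noun", "flat plate for cooking")],
}


def select_meaning(word: str,
                   choose: Callable[[List[Meaning]], Meaning],
                   confirm_pos: Callable[[Meaning], bool]) -> Meaning:
    """Return the meaning of `word`, asking the user to choose only if needed."""
    candidates = TERM_DB.get(word.lower(), [])
    if not candidates:
        # Not in the term database: the thesaurus or "leave untranslated" applies.
        raise KeyError(word)
    # An unambiguous word skips the choice step...
    meaning = candidates[0] if len(candidates) == 1 else choose(candidates)
    # ...but its part of speech must still be confirmed by the user.
    if not confirm_pos(meaning):
        raise ValueError(f"part of speech of {word!r} not confirmed")
    return meaning
```

For example, `select_meaning("fan", choose=lambda c: c[1], confirm_pos=lambda m: True)` returns the "supporter of something" meaning. For "griddle", `choose` is never called.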
[0016] When a word or group of words is not found in the term
database of database 25, thesaurus means 19 offers the user a
reference tool for finding an alternate word or expression having
the same meaning in the language of the input text, which the user
can substitute for the original word or group of words. The
thesaurus means may also provide dictionary definitions of words,
in addition to synonyms and antonyms.
[0017] As previously mentioned herein, for any word or group of
words which cannot be defined using the terms found in the language
database 25, the meaning selector provides the option that the
particular term be left untranslated, since the term is to be
considered a new term, as yet undefined in other languages, or a
trade mark, etc. In such case, the meaning selector is used merely
to designate the part of speech for the word or group of words.
[0018] As can be appreciated, the source of the language database
25 could be an online source in order to ensure that the database
25 is up-to-date and complete. When a word in the input text 12
cannot be found in the language database 25, it would be possible
to provide communication means (e-mail, telephone or the like) for
the user to communicate with the compiler of the language database
25 in order to inform the database compiler that the specific word
or group of words cannot be found at all within the language
database or that the specific meaning intended for the word or
group of words cannot be found within the language database. The
language database compiler can then provide an update to the
language database.
[0019] The meaning database in database 25 cannot be edited by the
user, but rather the inventory of meanings may only be corrected or
expanded by the producer of the software. However, to facilitate
the user's own editing of text to generate meaning code, an editor
29 is provided to allow the user to create new terms in the term
database and to link them to established meaning entries in the
meaning database of database 25. The editor 29 can also be used to
change the links between an existing term in the term database and
entries in the meaning database. This allows the user to create,
for example, a new entry for "flapjack" and link it to the meaning
definition "(noun) thin cake cooked on a pan or griddle" previously
only linked to the term "pancake". As another example, the existing
term "plug" (as a verb meaning to connect) could have an extra link
added to the definition "(verb) to advertise or promote by way of
an action" previously only linked to "pitch". The database 25 may
thus be improved to best suit a user's needs over time with use by
the user.
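The division of responsibility in paragraph [0019] can be sketched as a read-only meaning inventory plus a user-editable, many-to-many table of term links. This is a minimal sketch, assuming identifiers and class names of our own invention (`MEANINGS`, `TermDatabase`, "M3001" and so on). It only mirrors the "flapjack" and "plug" examples and is not the actual storage format.

```python
from types import MappingProxyType
from typing import Dict, Mapping, Set

# Hypothetical meaning database: a read-only view, since only the
# software producer may correct or expand the inventory of meanings.
MEANINGS: Mapping[str, str] = MappingProxyType({
    "M3001": "(noun) thin cake cooked on a pan or griddle",
    "M3002": "(verb) to connect",
    "M3003": "(verb) to advertise or promote by way of an action",
})


class TermDatabase:
    """User-editable links from terms to existing meaning entries."""

    def __init__(self, meanings: Mapping[str, str]) -> None:
        self._meanings = meanings
        self._links: Dict[str, Set[str]] = {}

    def link(self, term: str, meaning_id: str) -> None:
        # New links may only point at meanings that already exist.
        if meaning_id not in self._meanings:
            raise KeyError(meaning_id)
        self._links.setdefault(term, set()).add(meaning_id)

    def unlink(self, term: str, meaning_id: str) -> None:
        self._links.get(term, set()).discard(meaning_id)

    def meanings_of(self, term: str) -> Set[str]:
        return set(self._links.get(term, set()))

    def terms_for(self, meaning_id: str) -> Set[str]:
        return {t for t, ids in self._links.items() if meaning_id in ids}


db = TermDatabase(MEANINGS)
db.link("pancake", "M3001")
db.link("plug", "M3002")
db.link("flapjack", "M3001")  # new term linked to an existing meaning
db.link("plug", "M3003")      # extra link added to an existing term
```

Using `MappingProxyType` makes any attempt to assign into the meaning inventory raise `TypeError`, which loosely mirrors the rule that users cannot edit the meaning database.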
[0020] Updates to database 25 which do not change meaning
identification codes or definitions may be implemented regularly,
while updates that create new meaning identification codes require
corresponding updates to the Readers translating the meaning code,
and thus should be less frequent. To ensure backward compatibility,
editor 10 may include in the generated meaning code 26 both the new
meaning identification code according to the most recent version (a
more accurate meaning) and the old meaning identification code for
older versions (a less accurate meaning), so that the Reader
software may use the most recent meaning code it is able to
recognize rather than refusing to provide a translation because of
incompatible versions.
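The backward-compatibility scheme of paragraph [0020] amounts to a fallback search. Each word carries several meaning identifiers tagged by version, and a Reader uses the newest one it knows. The sketch below assumes integer version numbers and string identifiers, both hypothetical, since the document does not specify how versions are encoded.

```python
from typing import Iterable, List, Optional, Tuple

# One word's entry in the meaning code: (catome_version, meaning_id) pairs.
# Version numbers and identifiers are hypothetical placeholders.
Entry = List[Tuple[int, str]]


def resolve_meaning(entry: Entry, known_ids: Iterable[str]) -> Optional[str]:
    """Return the most recent meaning-identifier this Reader recognizes."""
    known = set(known_ids)
    # Try the newest version first, then fall back to older ones.
    for _version, meaning_id in sorted(entry, reverse=True):
        if meaning_id in known:
            return meaning_id
    return None  # no compatible code; the caller decides how to degrade
```

An older Reader that only knows "M5000" still translates the word with the less accurate meaning, instead of failing outright.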
[0021] When the unilingual meaning editor 10 has been used to
define the meaning of the entire input text, the parser 15 sends a
signal to the storage means 14 to place the meaning code data into
an output file 26. The output file 26 may be communicated by
electronic means to the person who wishes to obtain a translation
of the input text. The meaning code data may also be used within
the same computer on which the translation system operates, in
order to proceed automatically with the generation of a translation.
In the preferred embodiment, the meaning code data includes
information concerning the specific definitions of each word or
group of words appearing in the input text as well as the
grammatical attributes for each word or group of words and the
relationship between the words in the input text. In the preferred
embodiment, additional information is contained in the meaning code
data 26 in order to ensure that a translation of the meaning code
data back into the original language of the input text will
generate an exact copy of the input text. Therefore, in the case
that the editor 18 was used to restructure a sentence or to change
one word for a synonym in order to ascribe a meaning to the
original input text which is closer to the definitions found in the
language database, the meaning code data contains additional
information concerning the original words or group of words which
were replaced by substitutes using editor 18 before selecting an
appropriate meaning.
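Paragraph [0021] lists what the meaning code carries for each word. These are the selected definition, grammatical attributes, relationships to other words, and any original wording replaced through editor 18. The record below is a hypothetical sketch of that content. The field names, the single `head` index for relationships and the space-joined back-translation are assumptions. The sketch only covers word-for-word substitutions, not whole-sentence restructuring.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WordCode:
    meaning_id: str                      # selected meaning (hypothetical ID)
    pos: str                             # part of speech
    grammar: Dict[str, str] = field(default_factory=dict)  # e.g. gender, tense
    head: Optional[int] = None           # index of the word this one relates to
    source_text: str = ""                # wording after any edits in editor 18
    original_text: Optional[str] = None  # wording replaced via editor 18, if any


def back_translate(words: List[WordCode]) -> str:
    """Rebuild the original input sentence from the meaning code."""
    return " ".join(
        w.original_text if w.original_text is not None else w.source_text
        for w in words
    )
```

If the user had swapped "flapjack" for "pancake" before selecting a meaning, storing "flapjack" in `original_text` lets a translation back into the source language reproduce the input exactly.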
[0022] As shown in FIG. 1, the unilingual meaning editor 10 is
supplemented with a memory 28 for storing meaning code data created
from a previous revision of the original text for the purposes of
generating the meaning code data for use with an automatic
translation generator for a language or group of languages (e.g.
languages `Y`) different from the language or group of languages
corresponding to the language database 25 (e.g. languages `X`). A
correspondence database 27 between the two different destination
languages is thus also provided and the meaning code data 28 for
the other language along with the correspondence table data 27 is
provided to parser 15 in order to provide on display 20 the input
text 12 already parsed and with meaning defined, inasmuch as there
are common similarities between the two destination languages (e.g.
between `X` and `Y`).
[0023] The user of the unilingual meaning editor is then required
merely to specify those meanings and provide the grammatical parts
of speech interpretation information which are unique to the
language in database 25. Since the bulk of the grammatical and
meaning selection has already been done for the previous language,
the ability to generate the output meaning code for the destination
language corresponding to database 25 can be done relatively
quickly. As can be appreciated, the preferred embodiment offers
interoperability between destination languages or groups of
destination languages when the person using the unilingual meaning
editor has the task of defining the meaning of the input text for
many different language groups (e.g. Romance, Oriental, Indian,
etc.).
[0024] With reference to FIG. 2, the automatic translation
generator 11 will now be described. The meaning code data file 26
is part of a computer memory which is read by an interpreter 30.
The interpreter 30 obtains data from a language database 35 specific
to the language into which the meaning code data is to be translated.
For each word or group of words, the associated meaning code is
looked up in the language database in order to obtain the correct
term. The part of speech information and relationship information
with respect to other words in the sentence is obtained from the
meaning code data in order to change the form of the word or group
of words in accordance with grammatical rules included in the
language database 35. For example, verbs need to be conjugated in
languages which involve verb conjugation. Some of the grammar
information may not be required in the destination language and, as
such, some of the meaning code data may not be used by interpreter
30 when producing the output text in the destination language.
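A toy version of the interpreter's two steps in paragraph [0024] might look like the sketch below. First it looks up the destination-language term for each meaning identifier. Then it uses the grammatical information in the meaning code to inflect the term. The sketch assumes French as the destination language, a hypothetical three-entry `LANG_DB`, and only regular -er verbs in the present tense. Word order is taken as given, because sentence-structure handling is left out. The unused `aspect` field illustrates meaning-code data that the destination language does not need.

```python
from typing import Dict, List, Tuple

# Hypothetical destination-language database (tiny French excerpt):
# meaning-identifier -> (lemma, part of speech).
LANG_DB: Dict[str, Tuple[str, str]] = {
    "M8001": ("elle", "pronoun"),
    "M8002": ("parler", "verb"),
    "M8003": ("lentement", "adverb"),
}

# Present-tense endings of regular -er verbs, keyed by (person, number).
ER_PRESENT: Dict[Tuple[int, str], str] = {
    (1, "sg"): "e", (2, "sg"): "es", (3, "sg"): "e",
    (1, "pl"): "ons", (2, "pl"): "ez", (3, "pl"): "ent",
}


def render(word: dict) -> str:
    lemma, pos = LANG_DB[word["meaning_id"]]
    if pos == "verb" and lemma.endswith("er"):
        # Conjugate using grammatical information carried in the meaning code.
        return lemma[:-2] + ER_PRESENT[(word["person"], word["number"])]
    # Any other grammatical information is simply not used here.
    return lemma


def interpret(sentence: List[dict]) -> str:
    return " ".join(render(w) for w in sentence)
```

For instance, a three-word meaning code for pronoun, verb (third person singular) and adverb is rendered as "elle parle lentement".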
[0025] The output text in the destination language is stored in a
memory 32 and an editor system 33 including a display window 34 is
provided to handle editing after automatic translation in the event
that the person receiving the translation in the destination
language wishes to make stylistic changes to the translation text.
The editor 33 is provided not only with the text output from memory
32 but also with the appropriate information concerning the
definitions of the terms in the destination language obtained from
the language database 35 which correspond to the meaning codes
responsible for producing the output text. The editor 33 may also
display the grammatical relationship between the terms in the
translation text in order to provide the person using editor 33
with a greater understanding of the translation text in order to
make it easier to carry out corrections which still remain faithful
to the original meaning of the text in the source language.
[0026] It is presumed that the person operating editor 33 may have
no knowledge of the original language. However, in the special case
that the person operating editor 33 has knowledge of the original
language, the original language text could also be produced
alongside the translation text by providing interpreter 30 with
access to the information contained in the original language
database 25 and outputting to the editor 33 the original language
text.
[0027] In the preferred embodiment, the input text may include
format data and this format data may be passed through the
unilingual meaning editor into the meaning code data 26. In the
case of an HTML text for a web browser, the input data format may
include specifications as to text block position and dimensions in
order that such information may be passed on to the meaning code
data. As a consequence, the automatic translation generator 11 may
have a module integrated with the interpreter 30 for the purposes
of automatically generating an HTML output file which would
resemble in layout and font style an original HTML file in the
original language. In the preferred embodiment, the X language
database 35 and the interpreter 30 may comprise the heart of a
plug-in module to be integrated with a web browser. In this case,
the meaning code data 26 would be included in the downloaded file
to be viewed using a web browser.
[0028] With a view to improving understanding of the present
invention, the preferred embodiment will now be described in
greater detail in three specific portions. Firstly the meaning
database and coding used for parts of speech in the meaning code
will be described. Secondly, the meaning editor will be described.
And thirdly, the Reader or machine translation apparatus for
translating the meaning code into output text is described.
[0029] Catome Description
[0030] Within each language handled by the preferred embodiment,
the linguistic databases and tables that drive its capabilities in
that language are held within a logical structure which is termed a
Catome. This new term is a contraction from the descriptive phrase
"CATalogue Of MEanings" which accurately portrays the principal
functions of this structure. The Catome comprises the term database
and the meaning database.
[0031] High-level Structure
[0032] The primary needs that drive the high-level structure of the
Catome are two-fold:
[0033] Size of the Catome--as small as possible
[0034] Speed using the Catome--as fast as possible
[0035] The preferred embodiment uses various techniques to reduce
the size of the Catome--compression to reduce the size of the
Catome during downloads and the use of model tables to reduce the
storage space needed to hold the different forms of each noun and
verb. To satisfy the requirement for speed, there are two indices
that allow for direct access retrieval of information from the
Catome.
[0036] To satisfy the need for a compact size and fast speed, the
Catome contains the following databases, tables and indices:
[0037] Word/Meaning database
[0038] Sentence Structure database
[0039] Idiom database
[0040] Regular Verb model table
[0041] Irregular Verb model table
[0042] Regular Noun model table
[0043] Irregular Noun model table
[0044] Modal Verb model table
[0045] Pronoun model table
[0046] Contraction Model table
[0047] Word/Meaning Index by Word-Identifier
[0048] Word/Meaning Index by Meaning-Identifier
[0049] The essential difference between a database and a table in
the preferred embodiment is that the databases contain further
properties of the basic entities within them, whereas the tables
are simple two-dimensional arrays of basic entities.
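The two direct-access indices listed in paragraphs [0047] and [0048] can be pictured as two hash maps built over the same Word/Meaning records. The sketch below is a hypothetical in-memory version. The class names and identifier values are invented, and it assumes nothing about the on-disk layout or compression. Both indices map to lists, because the document does not say whether identifiers are unique per entry.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WordEntry:
    word_id: str     # Word-Identifier, e.g. "600101.01" (hypothetical value)
    word: str
    meaning_id: str  # Meaning-Identifier (hypothetical value)


class WordMeaningDatabase:
    """Word/Meaning records plus two direct-access indices."""

    def __init__(self, entries: List[WordEntry]) -> None:
        self.entries = list(entries)
        self.by_word_id: Dict[str, List[WordEntry]] = {}
        self.by_meaning_id: Dict[str, List[WordEntry]] = {}
        for e in self.entries:
            # Index by Word-Identifier (word -> its records).
            self.by_word_id.setdefault(e.word_id, []).append(e)
            # Index by Meaning-Identifier (meaning -> all words expressing it).
            self.by_meaning_id.setdefault(e.meaning_id, []).append(e)
```

The meaning index is what lets a Reader go straight from a meaning code to the word that expresses it. The word index serves the Editor as the user types.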
[0050] Database Overviews
[0051] Word/Meaning database
[0052] This database is the largest within the Catome. It contains
all the words of the language with their associated properties. The
database has ten logical sections each devoted to a specific part
of speech:
[0053] 1. Adjectives
[0054] 2. Adverbs
[0055] 3. Articles
[0056] 4. Conjunctions
[0057] 5. Interjections
[0058] 6. Nouns
[0059] 7. Prepositions
[0060] 8. Pronouns
[0061] 9. Verbs
[0062] 10. Numbers
[0063] The first digit of the Word-Identifier is the number of the
logical section to which the word belongs. This is necessary to
help in situations where the same word exists in two or more
different parts of speech, a situation very common in English. For
example, the verb "to keep" has a usage and meanings entirely
different from the English noun "keep" (the strongest part of a
castle). The Word-Identifier is a 6-digit number qualified by a
period and a two-digit usage field, i.e. 999999.99.
[0064] The two-digit usage field is highly specific to each
language. Its primary use is to help the editor identify the
particular way in which a modifiable part of speech (adjectives,
adverbs, nouns and verbs) is being used by users as they type text
in the input window of the Editor.
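The 999999.99 layout of paragraphs [0063] and [0064] can be parsed mechanically. The leading digit selects the part-of-speech section, the next five digits follow, and the two digits after the period give the usage. In this minimal sketch, `parse_word_id` and `WordId` are hypothetical names. The document lists ten sections but only one leading digit, and does not say how Numbers (section 10) is encoded. The sketch therefore assumes, purely hypothetically, that a leading 0 means Numbers.

```python
import re
from typing import NamedTuple

SECTIONS = {
    1: "Adjectives", 2: "Adverbs", 3: "Articles", 4: "Conjunctions",
    5: "Interjections", 6: "Nouns", 7: "Prepositions", 8: "Pronouns",
    9: "Verbs",
    0: "Numbers",  # assumption: encoding of section 10 is not specified
}

# Leading section digit, five further digits, period, two-digit usage field.
WORD_ID = re.compile(r"(\d)(\d{5})\.(\d{2})")


class WordId(NamedTuple):
    section: str
    serial: int
    usage: int


def parse_word_id(text: str) -> WordId:
    m = WORD_ID.fullmatch(text)
    if m is None:
        raise ValueError(f"not a 999999.99 Word-Identifier: {text!r}")
    return WordId(SECTIONS[int(m.group(1))], int(m.group(2)), int(m.group(3)))
```

Under this layout, the verb "to keep" and the noun "keep" can never collide, because their identifiers differ in the first digit (9 versus 6).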
[0065] Properties
[0066] Some parts of speech have properties--some specific to the
language, some relevant to other languages. Verbs can be transitive
or intransitive; nouns can be "Proper", they can have a gender,
they may exist only in the singular or plural form and so on. They
may exist in some but not all regional dialects of the language.
The most important property however is "Meaning". Apart from
numbers (which are self-referring to meaning) all other parts of
speech have a "Meaning-Identifier".
[0067] The Property of Meaning
[0068] Each word in the Word/Meaning database of the Catome has a
Meaning-Identifier associated with it. This allows us to
differentiate between the different meanings that a word can have
in a language. A word such as "fan" in English can be a verb "to
fan", or a noun "a fan" which could mean either a device to move
air, or a supporter of something. We differentiate between verbs
and nouns by the first digit of the word-identifier. But for a
specific word with different meanings within the same part of
speech, we assign them different "Meaning-Identifiers". A "fan" as
a noun therefore would have two entries in a Catome--each with a
different "meaning-identifier" to differentiate between the device
to move air and a human person who is a supporter of something.
[0069] Meaning Loops
[0070] In order to act as a thesaurus and lexicon, the inventors
have evolved the concept of "Meaning Loops". Each "meaning
identifier" is part of a "meaning loop" that consists of other
words with the same meaning. Each word in the Catome is linked to a
"meaning loop" through its "meaning identifier" acting as the key.
But this concept is extended. Each "meaning loop" has additional
pointers: the first to a higher "class of meaning", the second to a
lower class of meaning loop. For example, consider the word "male"
being used as a noun. It points to a meaning loop that has synonyms
for "male" such as "man", "stud", etc. That meaning loop points to
a higher class meaning loop that contains words such as "being",
"person" etc.
[0071] The "male" meaning loop also points to a lower class of
meaning loop containing words such as "man", "cob", "boy" and
"stallion", whose meanings all include "maleness".
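A meaning loop, as described in paragraphs [0070] and [0071], behaves like a node holding synonyms, with one pointer up to a broader class and one down to a narrower class. The sketch below models that with a small dataclass and reuses the "male" example. The class name, the helper functions and the meaning identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MeaningLoop:
    meaning_id: str                        # hypothetical identifier
    words: List[str]                       # words sharing this meaning
    higher: Optional["MeaningLoop"] = None  # broader class of meaning
    lower: Optional["MeaningLoop"] = None   # narrower class of meaning


def synonyms(loop: MeaningLoop, word: str) -> List[str]:
    """Other words in the same loop (thesaurus lookup)."""
    return [w for w in loop.words if w != word]


def broader_terms(loop: MeaningLoop) -> List[str]:
    """Walk the 'higher class' pointers and collect their words."""
    out: List[str] = []
    node = loop.higher
    while node is not None:
        out.extend(node.words)
        node = node.higher
    return out


person = MeaningLoop("M100", ["being", "person"])
male_kinds = MeaningLoop("M102", ["man", "cob", "boy", "stallion"])
male = MeaningLoop("M101", ["male", "man", "stud"], higher=person, lower=male_kinds)
```

For example, `synonyms(male, "male")` yields `["man", "stud"]` and `broader_terms(male)` yields `["being", "person"]`.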
[0072] Meanings and Meaning Loops are common across all Catomes--it
is the meaning that is derived from the text being typed by the
user, and it is those meanings which are embedded in the meaning
code or CCML language to be interpreted by other language readers.
The user can ensure that the meaning of their words is precise
through interaction with the Editor. The meaning-identifier will
target the correct word to use in automatic translation by a Reader
for another language.
[0073] Sentence Structure Database
[0074] Words and meanings are crucial in achieving the preferred
goal of 100% translatability into other languages. But taken alone,
they cannot do that. It is the ability to recognize the sentence
structure being used that is the other component. The preferred
embodiment has a database of generic sentence structures that act
as a model to determine where the grammatical components of the
sentence are to be found, and the relationships between them.
[0075] The key to sentence structure is to determine which "Verb
Phrase" is being used in the sentence. The main verb phrase in the
sentence points to a set of generic sentence structures supporting
that specific "verb phrase" which are then used by the Editor to
identify the subject, object, subordinate clauses, adverbial
phrases and so forth--the grammatical components of the sentence.
Sentence structures are themselves classified into "positive",
"conditional", "querying" and "imperative" sub-types. There then
exists a "negative" form of each sentence type within each
sub-class. (An `imperative` sentence structure such as "Go Away!"
has its "negative" form of "Do not go away!") The sentence
structures use a generic coding scheme to show where words from the
different parts of speech are to be expected in the input text.
[0076] In the above example, the verb is represented as "9*.8*";
the "9*" indicates that any verb can be used, but only in its
imperative usage, which is represented by the two-digit usage
qualifier "8*" following the period.
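The generic coding in paragraph [0076] suggests that each slot of a sentence structure is matched field by field against a Word-Identifier. The sketch below assumes a convention the document only implies: a trailing `*` in either field matches any remaining digits. The two-slot imperative structure and the identifiers for "Go" and "away" are hypothetical. The real database would also locate subjects, objects and clauses, which this sketch does not attempt.

```python
from typing import List


def field_matches(pattern: str, value: str) -> bool:
    # Assumed convention: a trailing "*" matches any remaining digits.
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    return value == pattern


def token_matches(pattern: str, word_id: str) -> bool:
    p_id, p_usage = pattern.split(".")
    w_id, w_usage = word_id.split(".")
    return field_matches(p_id, w_id) and field_matches(p_usage, w_usage)


def structure_matches(structure: List[str], word_ids: List[str]) -> bool:
    return len(structure) == len(word_ids) and all(
        token_matches(p, w) for p, w in zip(structure, word_ids))


# Hypothetical imperative structure: any verb in imperative usage, then an adverb.
IMPERATIVE_VERB_ADVERB = ["9*.8*", "2*.*"]
```

With "Go" coded as a verb in usage 80, the structure matches. The same verb in a non-imperative usage such as 01 does not match.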
[0077] Idiom Database
[0078] Within the Idiom database can be found expressions and
phrases which have a meaning that would not be conveyed merely by
translating their constituent words. The sentence "The reason for
the breakdown was the dying battery and not the starter motor, an
entirely different kettle of fish" uses the idiom "different kettle
of fish". If this phrase were translated word for word into another
language, it would not help the reader understand the meaning of the
phrase or its influence on the meaning of the whole sentence.
[0079] The Idiom database contains a current list of such idioms,
cliches and other multi-word phrases in common usage. Each is
assigned an "Idiom Identifier" and also an equivalent phrase that
conveys the meaning more accurately. In the example used earlier,
the phrase "different matter" would be attached to the meaning code
or CCML code generated by the Editor. If there were no equivalent
idiom in another language, the Reader for that language would show
the translated equivalent of "different matter" rather than
translate the original idiom. An equivalent idiom in another
language exists if it has an "Idiom Identifier" which is the same
as that of the original idiom.
[0080] Table Overviews
[0081] Regular Verb Model Table
[0082] For each type of regular verb within the language, there
exists a full conjugation of the verb for each tense that is in
current usage in that language. The table also identifies (where
appropriate) infinitives, gerunds and past participles.
[0083] Irregular Verb Model Table
[0084] For each specific irregular verb type within the language,
there exists a full conjugation of the verb for each tense that is
in current usage in that language. The table also identifies (where
appropriate) infinitives, gerunds and past participles. In
addition, a regular verb is considered to be irregular if its
conjugation differs between regional versions of the same
language. The verb "to dive" in English is considered irregular
because its simple past differs between British ("dived") and
American ("dove") usage.
[0085] The concept of an irregular verb type arises because some
irregular verbs in a language are modeled on another, unique
irregular verb. For example, the English verb "to become" follows
the irregular verb model of the verb "to come".
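The verb model tables can be pictured as lookup data. The sketch below is hypothetical throughout (table names, form names and the "UK"/"US" region codes are assumptions, not the Catome's schema): regular verbs are formed by suffixing, irregular verbs point to a model plus an optional prefix so that "become" reuses "come", and a form stored per region captures the "dived"/"dove" difference.

```python
# Hypothetical sketch of the verb model tables. Model layout, form names and
# region codes are illustrative assumptions, not the Catome's actual schema.

# Regular model: simple suffixing (spelling changes such as "stop" ->
# "stopped" would need further models and are ignored here).
REGULAR_SUFFIXES = {"past": "ed", "past_participle": "ed", "gerund": "ing"}

# Irregular models hold complete forms; a dict value is a regional variant.
IRREGULAR_MODELS = {
    "come": {"past": "came", "past_participle": "come", "gerund": "coming"},
    "dive": {"past": {"UK": "dived", "US": "dove"},
             "past_participle": "dived", "gerund": "diving"},
}

# Each irregular verb points to a model plus a prefix, so "become" reuses "come".
IRREGULAR_VERBS = {"come": ("come", ""), "become": ("come", "be"), "dive": ("dive", "")}


def verb_form(verb: str, form: str, region: str = "UK") -> str:
    """Return one form of a verb for the given regional variant."""
    if verb in IRREGULAR_VERBS:
        model_name, prefix = IRREGULAR_VERBS[verb]
        value = IRREGULAR_MODELS[model_name][form]
        if isinstance(value, dict):  # form differs between regional versions
            value = value[region]
        return prefix + value
    return verb + REGULAR_SUFFIXES[form]


print(verb_form("walk", "past"))               # walked
print(verb_form("become", "past"))             # became
print(verb_form("dive", "past", region="US"))  # dove
```

Only the UK and US variants are filled in for "dive" here; a full table would carry every regional language the product offers.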
[0086] Regular Noun Model Table
[0087] For each type of regular noun within the language, there
exists a table that models the way that the noun is suffixed in
particular usage within a sentence structure. In English, this
table lists the following situations for noun endings:
[0088] Noun Singular, e.g.: Bus, Train, Plane, City
[0089] Noun Plural, e.g.: Buses, Trains, Planes, Cities
[0090] Noun Singular Possessive, e.g.: Bus's, Train's, Plane's, City's
[0091] Noun Plural Possessive, e.g.: Buses', Trains', Planes', Cities'
[0092] Other languages would list their particular suffix usage.
French, for example, does not have the grammatical concept of a
possessive suffix for nouns and the French table would not list
these situations.
[0093] Irregular Noun Model Table
[0094] Some languages have nouns that do not follow the regular
models, or that are suffixed differently depending on the regional
variant of the language. Nouns with special singular or plural
usage are also listed in this table; English has examples such as
"fish" (identical in the singular and the plural) and "men" (the
irregular plural of "man"). These situations are identified and
listed in the Irregular Noun Model Table.
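A minimal sketch of how the regular and irregular noun tables might combine to produce the four English situations listed above. The suffix rules cover only the patterns in the examples (Bus, Train, Plane, City), and the function names and irregular entries are hypothetical.

```python
# Hypothetical sketch of the English noun model tables. The suffix rules cover
# only the patterns in the examples; the irregular entries are illustrative.

IRREGULAR_PLURALS = {"man": "men", "fish": "fish"}  # Irregular Noun Model Table


def regular_plural(noun: str) -> str:
    """Regular Noun Model: choose the plural suffix from the noun's ending."""
    if noun.endswith(("s", "sh", "ch", "x", "z")):
        return noun + "es"                      # Bus -> Buses
    if noun.endswith("y") and noun[-2:-1].lower() not in ("a", "e", "i", "o", "u"):
        return noun[:-1] + "ies"                # City -> Cities
    return noun + "s"                           # Train -> Trains


def noun_forms(noun: str) -> dict:
    """Return the four English noun situations listed in the table."""
    plural = IRREGULAR_PLURALS.get(noun, regular_plural(noun))
    return {
        "singular": noun,
        "plural": plural,
        "singular_possessive": noun + "'s",
        # plurals ending in "s" take a bare apostrophe: Buses', but men's
        "plural_possessive": plural + "'" if plural.endswith("s") else plural + "'s",
    }


print(noun_forms("City"))
print(noun_forms("man")["plural_possessive"])  # men's
```

The irregular table is consulted first so that its entries override the regular suffix rules.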
[0095] Modal Verb Model Table
[0096] This table lists the uses of the modal verbs. These include
the words should, would, can, will, may, ought, dare and might, as
well as the ubiquitous "be" and "have". These modal verbs precede
and modify the meaning of the following verb. "Be" and "have" are
used extensively as the basis of many tenses in English, for
example the present perfect passive "I have been misled." The modal
verbs have their own sentence structure entries in the Sentence
Structure database because they are a constituent part of the "Verb
Phrase" by which the Sentence Structure database is organized.
[0097] Pronoun Model Table
[0098] A table that contains all the pronouns of the language and
their different forms. In English we would find:
Subject  Object  Possessive adjective  Possessive pronoun
I        Me      My                    Mine
You      You     Your                  Yours
He       Him     His                   His
She      Her     Her                   Hers
It       It      Its                   (Its)
We       Us      Our                   Ours
You      You     Your                  Yours
They     Them    Their                 Theirs
[0099] Contraction Model Table
[0100] A table that lists all the commonly found contractions in
the language of the Catome, together with their fully expanded
forms. These typically apply to pronoun subjects followed by a
modal verb, or to the negative use of a modal verb. Some
contractions have two separate expansions; for example, in English
"I'd" is a contracted form of "I had" or "I would". The context
usually defines which expanded form applies. The Contraction Model
Table lists two entries in this instance.
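The following sketch shows one way a contraction table with multiple entries might be represented and applied. The table contents follow the text, but the rule for choosing between "I had" and "I would" (a following "-ed" word suggests "had") is a deliberately crude, hypothetical heuristic; the description only says that context usually decides, and the Editor could equally ask the user.

```python
# Hypothetical sketch of the Contraction Model Table. "I'd" carries two
# entries, as described above; the choice between them uses a deliberately
# crude context rule (an assumption), where the real Editor may ask the user.

CONTRACTIONS = {
    "I'm": ("I am",),
    "can't": ("cannot",),
    "won't": ("will not",),
    "I'd": ("I had", "I would"),
}


def expand_contractions(words: list) -> str:
    out = []
    for i, word in enumerate(words):
        entries = CONTRACTIONS.get(word, (word,))
        if len(entries) == 1:
            out.append(entries[0])
            continue
        next_word = words[i + 1] if i + 1 < len(words) else ""
        # A following "-ed" form suggests a past participle, hence "I had".
        out.append(entries[0] if next_word.endswith("ed") else entries[1])
    return " ".join(out)


print(expand_contractions("I'd finished early".split()))  # I had finished early
print(expand_contractions("I'd like tea".split()))        # I would like tea
```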
[0101] Index Overviews
[0102] Word/Meaning Index by Word-Identifier
[0103] This index associates a word in the language with its
corresponding entry in the Word/Meaning database. Used in the
Editor when creating the meaning code or CCML language, this index
allows for the instant look-up of words as they are typed in the
Editor's input window.
[0104] Word/Meaning Index by Meaning-Identifier
[0105] In the Reader, the main retrievals from the Word/Meaning
database are meanings taken from the inter-lingua CCML to be
translated into the appropriate words in the Reader's language.
This index allows for instant direct retrieval from the Catome.
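The two indexes can be sketched as two dictionaries built over the same entries: the Editor looks words up by spelling (and may get several meanings back), while the Reader looks meanings up by identifier (and gets one word back). The Entry fields and numeric meaning identifiers below are hypothetical, chosen only to show a shared identifier linking an English and a French entry.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical sketch: Word/Meaning entries and their two indexes. Field names
# and the numeric meaning identifiers are illustrative assumptions.


@dataclass(frozen=True)
class Entry:
    word: str
    part_of_speech: str
    synonym: str
    meaning_id: int


ENGLISH = [
    Entry("bank", "noun", "lender", 2001),
    Entry("bank", "noun", "riverside", 2002),
]
FRENCH = [
    Entry("banque", "noun", "établissement", 2001),
    Entry("rive", "noun", "berge", 2002),
]


def index_by_word(entries):
    """Editor side: word -> all entries, so ambiguous words list every meaning."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.word.lower()].append(entry)
    return dict(index)


def index_by_meaning(entries):
    """Reader side: meaning identifier -> entry, for direct retrieval."""
    return {entry.meaning_id: entry for entry in entries}


english_words = index_by_word(ENGLISH)
french_meanings = index_by_meaning(FRENCH)

# "bank" in the sense "lender" resolves to meaning 2001, which the French
# Reader retrieves directly as "banque".
chosen = next(e for e in english_words["bank"] if e.synonym == "lender")
print(french_meanings[chosen.meaning_id].word)  # banque
```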
[0106] Catome Operating Modes
[0107] Development
[0108] The Catomes are held in Microsoft Access databases. They are
updated and expanded by linguists. There are two other databases on
the development side that the customer never sees:
[0109] Universal Meaning Identifier Database
[0110] Universal Idiom Identifier Database
[0111] These two databases integrate the meanings and idioms across
the different languages.
[0112] Product
[0113] For both the Editor and the Reader, the Catome provided is in
"read-only" mode and cannot be changed by the user. It cannot be
accessed directly by the user; it is accessed solely by Linguistic
Modules to provide data to specific internal tasks within the
Editor or Reader.
[0114] Editor Description
[0115] Introduction
[0116] For every input or source language supported by the
preferred embodiment, there is a unique Language Editor product.
These may be sold over the Internet using electronic commerce
transactions based on credit card processing. The Editor can be
activated as an add-on to Microsoft's Word, Internet Explorer and
Outlook, Qualcomm's Eudora, Netscape Communicator and Corel's
WordPerfect software. In any of these settings, the Editor is
activated from a Windows pull-down menu.
[0117] The first main function of the Editor is to scan text typed
by the Editor user and, through a dialogue of pop-up windows, ensure
that the meaning of each word is identified and that each sentence
is grammatically correct.
[0118] The second main function of the Editor is to take the
sentence and translate it into a proprietary CCML language or
meaning code. The resulting CCML can be read and translated into
any language supported by compatible Reader software.
[0119] Installation
[0120] Customers wishing to obtain a copy of the Editor may
download the product from a web site using an Internet browser. The
Editor is preferably "locked" against illegal copying and software
piracy. As installation starts, the customers are invited to
complete a secure HTML form with their credit card information and
demographic data (optional). Once they send off the form from their
browser, and the transaction is accepted, they receive a unique,
one-time "unlock key" that will allow the installation to
proceed.
[0121] During installation, the users are asked which regional
language they would like to use. For the English language the
choices would be:
[0122] UK English
[0123] US English
[0124] Canadian English
[0125] Australian English
[0126] Similarly, for French, for example, the Editor offers:
[0127] Paris French
[0128] Quebec French
[0129] Belgian French
[0130] Swiss French
[0131] The user can always change this regional language setting
through a pop-up menu at any time.
[0132] Starting the Editor
[0133] The Editor in the preferred embodiment is activated through
a pull-down menu (normally "Tools") in the menu bar of the text
processor, Internet browser or word processing product. Instead of
typing into those products, the user is presented with an Editor
Input window. The user types text in this window one sentence at a
time, ending each sentence normally with a period, exclamation
mark, question mark or colon. The user is then invited to press F7
once the sentence has been typed completely.
[0134] Linguistic Processing
[0135] Linguistic Step 1
[0136] The first step involves a word-by-word translation into
meaning code, i.e. the Catome-to-Catome Meaning Language or just
CCML for short. Each word is compared to see if it matches a word
in the Catome or term database. The Catome is in fact a combination
of both the term database and the meaning database.
[0137] If the word does not exist, a pop-up window appears offering
the user three choices:
[0138] 1. Use the word as typed. It will appear untranslated to
someone reading the text with a Reader for a different
language.
[0139] 2. Go back to the input window, change the word and
re-submit the sentence.
[0140] 3. Select the dictionary function to get a list of words
from the Catome with closely associated spelling, then choose the
word that was meant.
[0141] If the word does exist and the Catome has several entries
for it, a pop-up window appears. Listed in this window is each
entry found in the Catome, its part of speech (e.g. adjective,
noun, verb) and a close synonym establishing the specific meaning
of that entry in the Catome. The user selects the entry with the
desired meaning. Once all the words have been processed, the Editor
prepares the CCML equivalent of each translatable word, using the
"Meaning Identifier" from the Catome as the CCML value for each
selected or unambiguous word.
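Linguistic Step 1 can be summarized as a loop with three outcomes per word: unknown, unambiguous or ambiguous. In this sketch the Catome is a hypothetical dictionary of (part of speech, synonym, meaning identifier) senses, and choose_sense is an assumed stand-in for the pop-up window. Only choice 1 for unknown words (keeping the word as typed) is modeled.

```python
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical sketch of Linguistic Step 1. Each Catome sense is modeled as
# (part of speech, close synonym, meaning identifier); `choose_sense` stands
# in for the pop-up window and is an assumption. Choices 2 and 3 for unknown
# words (retyping, dictionary look-up) are not modeled: the word is kept as typed.

Sense = Tuple[str, str, int]

CATOME: Dict[str, List[Sense]] = {
    "the": [("article", "definite article", 10)],
    "bank": [("noun", "lender", 2001), ("noun", "riverside", 2002)],
    "closed": [("verb", "shut", 3001)],
}


def step1(sentence: str, choose_sense: Callable[[str, List[Sense]], int]) -> List[Union[int, str]]:
    ccml: List[Union[int, str]] = []
    for word in sentence.rstrip(".!?:").split():
        senses = CATOME.get(word.lower())
        if senses is None:
            ccml.append(word)                  # unknown word, used as typed
        elif len(senses) == 1:
            ccml.append(senses[0][2])          # unambiguous word
        else:
            ccml.append(senses[choose_sense(word, senses)][2])  # user's choice
    return ccml


def pick_lender(word: str, senses: List[Sense]) -> int:
    """Simulated user who always means the 'lender' sense."""
    return [synonym for _, synonym, _ in senses].index("lender")


print(step1("The bank closed.", pick_lender))  # [10, 2001, 3001]
print(step1("Zork closed.", pick_lender))      # ['Zork', 3001]
```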
[0142] Linguistic Step 2
[0143] The CCML sentence is scanned for idioms (including cliches
and other multi-word phrases or expressions). If there is a match
between CCML words and an entry in the Idiom database within the
Catome, the Catome returns an "Idiom Identifier", a number that
uniquely identifies the idiom. This is used to replace the CCML
text representing the original idiom. In addition, the Catome
passes a set of words in CCML that express the real meaning of the
idiom, in case the idiom has no equivalent idiom in another
language. This CCML text is appended to the CCML word carrying the
"Idiom Identifier". (On translation using a Reader, if there is no
equivalent idiom in the other language, this alternative CCML is
used.)
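A minimal sketch of the idiom scan, assuming the idiom table is keyed by word sequences and that longer idioms take priority over shorter ones (both assumptions). For readability, English words stand in for CCML meaning identifiers, and idiom number 501 is made up.

```python
# Hypothetical sketch of Linguistic Step 2. For readability, English words
# stand in for CCML meaning identifiers; idiom 501 and its meaning phrase are
# illustrative assumptions. Longer idioms are tried before shorter ones.

IDIOMS = {
    ("different", "kettle", "of", "fish"): (501, ("different", "matter")),
}
LONGEST = max(len(phrase) for phrase in IDIOMS)


def replace_idioms(words):
    out, i = [], 0
    while i < len(words):
        for n in range(min(LONGEST, len(words) - i), 1, -1):
            phrase = tuple(w.lower() for w in words[i:i + n])
            if phrase in IDIOMS:
                idiom_id, meaning = IDIOMS[phrase]
                # The idiom identifier replaces the phrase; the real meaning
                # is appended for Readers with no equivalent idiom.
                out.append({"idiom_id": idiom_id, "meaning": list(meaning)})
                i += n
                break
        else:  # no idiom starts here: keep the word unchanged
            out.append(words[i])
            i += 1
    return out


print(replace_idioms("an entirely different kettle of fish".split()))
```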
[0144] Linguistic Step 3
[0145] The CCML sentence is scanned to identify any CCML component
that needs to have specific attributes that the Catome was unable
to furnish when the original words were processed. This occurs in
two situations:
[0146] Pronouns. Using a pop-up window, the Editor asks once for
each different pronoun whether it refers to a masculine, feminine
or inanimate object. The gender tag is added to the CCML pronoun.
The next time that specific pronoun appears, the program assumes
that the gender information still applies and does not ask again.
If the pronoun does not indicate whether it refers to one or
several people or objects ("you" in English can mean one or many
people), a pop-up window appears asking the user to clarify. The
plurality tag is added to the CCML pronoun.
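The "ask once" behaviour amounts to caching the user's gender answer per pronoun, while the plurality question for "you" is asked at each occurrence. The class below is a hypothetical sketch; the two callbacks stand in for the pop-up windows.

```python
# Hypothetical sketch of the pronoun handling in Linguistic Step 3. The two
# callbacks stand in for the pop-up windows; their names are assumptions.

class PronounTagger:
    NUMBER_AMBIGUOUS = {"you"}  # English "you" can mean one or many people

    def __init__(self, ask_gender, ask_plural):
        self.ask_gender = ask_gender
        self.ask_plural = ask_plural
        self.genders = {}  # pronoun -> gender, so each pronoun is asked once

    def tag(self, pronoun):
        key = pronoun.lower()
        if key not in self.genders:
            self.genders[key] = self.ask_gender(pronoun)
        tags = {"pronoun": pronoun, "gender": self.genders[key]}
        if key in self.NUMBER_AMBIGUOUS:
            tags["plural"] = self.ask_plural(pronoun)
        return tags


asked = []


def ask_gender(pronoun):
    """Simulated pop-up: records the question and answers 'inanimate'."""
    asked.append(pronoun)
    return "inanimate"


tagger = PronounTagger(ask_gender, lambda pronoun: False)
tagger.tag("it")
tagger.tag("It")
print(asked)  # ['it']: the gender question was asked only once
```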
[0147] Unknown words. If the word was unknown to the Catome, and
the user chose to use the original word, the Editor does not know
about any of the properties the word possesses. A pop-up window
appears that invites the user to:
[0148] Identify the part of speech (verb, noun, adjective etc.)
represented by the unknown word.
[0149] Identify whether the word is singular, plural and/or
possessive, if applicable
[0150] Identify the gender of the word (masculine, feminine or
neutral)
[0151] Identify whether the word is a proper noun (always written
starting with a capital letter, like Marek or Brandon)
[0152] If the word is a verb, identify which tense the word-as-verb
reflects.
[0153] The relevant properties are added as an attribute group to
the unknown word.
[0154] Linguistic Step 4 (Specific language anomalies
processing)
[0155] Nouns used as adjectives in English. When the Editor finds a
string of separate, contiguous nouns, it assumes that the last noun
is the "real" noun and that those preceding it are nouns acting as
adjectives. The user is asked to confirm this if the detailed
"verification" option was selected at the beginning of the
session. (Other languages may have different anomalies and
different processing needs.)
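The rule for contiguous nouns can be sketched as a single pass over part-of-speech tags: any noun immediately followed by another noun is re-tagged. The tag names are hypothetical, and the optional user confirmation is not modeled.

```python
# Hypothetical sketch of the English anomaly rule in Linguistic Step 4: in any
# run of contiguous nouns, only the last is treated as the "real" noun. The
# tag names are assumptions; user confirmation is not modeled.

def mark_noun_adjectives(tagged):
    """tagged: list of (word, part_of_speech) pairs."""
    result = []
    for i, (word, pos) in enumerate(tagged):
        followed_by_noun = i + 1 < len(tagged) and tagged[i + 1][1] == "noun"
        if pos == "noun" and followed_by_noun:
            result.append((word, "noun-as-adjective"))
        else:
            result.append((word, pos))
    return result


sentence = [("the", "article"), ("starter", "noun"), ("motor", "noun"), ("failed", "verb")]
print(mark_noun_adjectives(sentence))
```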
[0156] Linguistic Step 5
[0157] The CCML is now scanned as a complete sentence to see if it
matches one of the Sentence Structures in the Sentence Structure
database within the Catome. The Editor identifies the main verb or
main verb phrase within the CCML text and uses that to narrow its
search through the database. Once identified, the Sentence
Structure entry contains a coded description of the grammatical
components and their positioning within that sentence type. With
this coding system, the Editor can understand which CCML word(s) in
the input constitute the subject, the object, the indirect object,
subordinate clauses and other grammatical components in the
sentence. The CCML sentence is tagged with attributes to group the
CCML words accordingly.
[0158] If the software cannot find the verb phrase and resulting
sentence structure, it uses the closest matches in the Sentence
Structure database. The user is shown a pop-up window with the
original sentence reformed under each closest sentence structure,
complete with any important missing words (clause openers such as
"that" or "which") and any critical punctuation such as commas to
mark the end of a clause. The user is invited to select one of
these sentences or to go back and retype the sentence with some
helpful hints shown.
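The matching in Linguistic Step 5 might look roughly as follows, under several assumptions: structures are indexed by the main verb's meaning, each structure is a list of (part of speech, role) slots, and "closest matches" are ranked with Python's difflib similarity ratio. The description does not say how closeness is measured, so that ranking is purely a stand-in.

```python
from difflib import SequenceMatcher

# Hypothetical sketch of Linguistic Step 5. Structures are indexed by the
# main verb's meaning and list (part of speech, role) slots; this layout and
# the use of difflib similarity to rank "closest matches" are assumptions.

STRUCTURES = {
    "give": [
        [("pron", "subject"), ("verb", "verb phrase"), ("pron", "indirect object"),
         ("art", "object"), ("noun", "object")],
        [("pron", "subject"), ("verb", "verb phrase"), ("art", "object"),
         ("noun", "object"), ("prep", "indirect object"), ("pron", "indirect object")],
    ],
}


def match_structure(tagged, top_n=2):
    """tagged: (word, part_of_speech, meaning) triples for one sentence."""
    verb = next((meaning for _, pos, meaning in tagged if pos == "verb"), None)
    candidates = STRUCTURES.get(verb, [])
    pos_seq = [pos for _, pos, _ in tagged]
    for structure in candidates:
        if [pos for pos, _ in structure] == pos_seq:
            # exact match: tag every word with its grammatical role
            return "exact", [(word, role) for (word, _, _), (_, role) in zip(tagged, structure)]

    def similarity(structure):
        return SequenceMatcher(None, [pos for pos, _ in structure], pos_seq).ratio()

    # no exact match: offer the closest structures for the user to choose from
    return "closest", sorted(candidates, key=similarity, reverse=True)[:top_n]


sentence = [("She", "pron", "she"), ("gave", "verb", "give"), ("him", "pron", "he"),
            ("a", "art", "a"), ("book", "noun", "book")]
print(match_structure(sentence))
```

In the "closest" case the Editor would then show the sentence reformed under each returned structure, as described above.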
[0159] Linguistic Step 6
[0160] The completed sentence is now displayed in a new pop-up
window with each grammatical component color-coded. A key to color
coding is displayed indicating which component is colored as the
subject, verb phrase, object, indirect object etc. The main and
subordinate clauses are also color-coded. This is simple to do,
because the software has already matched a specific sentence
structure to the original sentence. Clicking on any of the words in
this window displays the part of speech the word represents, be it
adjective, adverb, conjunction, noun, verb etc. The invention may
thus also serve as a grammar-learning tool.
[0161] The translation of the sentence into CCML is now
complete.
[0162] Reader Description
[0163] Introduction
[0164] For every language supported by the preferred embodiment,
there is a unique language Reader product. These may be distributed
free of charge over the Internet to anyone who wants to download
them. The Reader is activated as a browser plug-in by clicking on
an icon on any web page, or by activating the Reader from a Windows
pull-down menu in an e-mail system. The Reader can translate any
text created using the Editor, whether on a web page or in an
e-mail, automatically and flawlessly into the language of the
Reader and display it back on the screen.
[0165] Functional Step 1
[0166] When the Reader is first installed, the User is asked if
they want to keep the Catome in compressed or dynamic form.
Compressed mode requires 5 MB of disk space, whereas dynamic mode
can take up to 30 MB. The trade-off is speed: if the compressed
version is stored, the dynamic version has to be re-created every
time the Reader is subsequently used.
[0167] Functional Step 2
[0168] The user is asked in which regional language version they
would like to see their translations. For the English language the
choices would be:
[0169] UK English
[0170] US English
[0171] Canadian English
[0172] Australian English
[0173] Similarly for French, the Reader offers:
[0174] Paris French
[0175] Quebec French
[0176] Belgian French
[0177] Swiss French
[0178] The user can always change this regional language setting
through a pop-up menu at any time.
[0179] Functional Step 3--E-mail
[0180] The user receives an e-mail and a pop-up menu appears
informing the user that the e-mail is translatable using the
Reader. The Reader e-mail add-on module has detected that the
incoming e-mail message has a CCML component. The user is asked to
select one of the following:
[0181] Want to read the message in its original language?
[0182] Want to read the message in the language of their reader?
(If the user has several language readers, the pop-up window would
show them all)
[0183] If the user chooses to use a language Reader, the user is
asked in which regional language version they would like to see the
translation. The CCML component is moved to the input buffer of the
Reader and automatically translated in Steps 5-10. The resulting
translation is shown on the screen as if it were the original
message.
[0184] Once read, the user is asked if they want to save the e-mail
message as:
[0185] Translation only.
[0186] Translation and Original Message.
[0187] In either case, the CCML component is always saved so that
the message can be translated again in another language or at
another time in the same language.
[0188] The Reader function for e-mail is completed.
[0189] Functional Step 4--Web Page
[0190] The user surfs the World Wide Web and finds a web page that
displays the icon identifying CCML files. If the user clicks on
that icon, a pop-up menu appears informing the user that this web
page is readable using the CCML Reader. The Reader browser plug-in
module has detected that the current web page has an invisible
CCML component. The user is asked to select one of the
following:
[0191] To read the page in its original language
[0192] To read the page in the language of their Reader. (If the
user has several language readers, the pop-up window would show
them all)
[0193] If the user chooses to use a language Reader, the user is
asked in which regional language version they would like to see the
translation. The CCML component is moved to the input buffer of the
Reader and automatically translated in Steps 5-10. The resulting
translation is shown on the screen as if it were the original page.
All HTML tags for the original language are respected and retained:
this means that the translated text on the web page is formatted in
exactly the same manner as that of the original language.
[0194] The Reader function for web page translation is
completed.
[0195] Functional Step 5
[0196] The CCML to be translated is contained in the input buffer
of the Reader module.
[0197] The Reader takes each sentence in turn, translates it and
places the result in the output buffer. The processes are described
in Steps 6-10 following. Once all sentences have been translated,
the contents of the output buffer are moved to the display message
function of the e-mail software, or to the stored copy of the web
page in the cache of the browser. The e-mail software is triggered
to display the message, or the browser's "refresh" function is
activated to redisplay the web page.
[0198] If the CCML came from a web page, the Reader removes any
formatting HTML surrounding each individual CCML sentence and
reapplies it around the corresponding translated sentence within
the output buffer. This allows the text of the translation to be
formatted in an identical manner to that of the original language.
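A minimal sketch of this strip-and-rewrap step, assuming (hypothetically) that each CCML sentence arrives as a string wrapped only in leading opening tags and trailing closing tags. The translate argument is a placeholder for Steps 6-10, and the regular expression handles only this simple wrapping, not arbitrary nested HTML.

```python
import re

# Hypothetical sketch of the formatting hand-off in Functional Step 5. Each
# CCML sentence is assumed to arrive as a string wrapped in its formatting
# tags; `translate` stands in for Steps 6-10.

WRAPPED = re.compile(r"^((?:<[^>]+>)*)(.*?)((?:</[^>]+>)*)$", re.DOTALL)


def translate_preserving_html(fragments, translate):
    output_buffer = []
    for fragment in fragments:
        opening, ccml, closing = WRAPPED.match(fragment).groups()
        # strip the tags, translate the bare sentence, then re-wrap it
        output_buffer.append(opening + translate(ccml) + closing)
    return output_buffer


page = ["<p><b>CCML-SENTENCE-1</b></p>", "<li>CCML-SENTENCE-2</li>"]
print(translate_preserving_html(page, lambda ccml: "[translated " + ccml + "]"))
```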
[0199] Functional Step 6
[0200] The Sentence Structure Identifier is located and looked up
in the Catome. An equivalent structure is passed back from the
Catome that contains information as to the sequence of the
grammatical components in which the CCML should be translated and
ordered in the translation output. The CCML components are
accordingly moved around to match this specification.
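The reordering can be pictured as a lookup from a sentence structure identifier and a target language to an ordered list of grammatical roles. In this sketch "SS-17" is a made-up identifier for a "subject, verb, pronoun object" structure; French places an object pronoun before the verb ("Elle le voit"), so its order differs from the English one. English words stand in for the CCML components.

```python
# Hypothetical sketch of Functional Step 6. "SS-17" is a made-up identifier
# for a "subject + verb + pronoun object" structure; French places object
# pronouns before the verb ("Elle le voit"), English after it ("She sees him").

TARGET_ORDER = {
    ("SS-17", "en"): ["subject", "verb phrase", "object"],
    ("SS-17", "fr"): ["subject", "object", "verb phrase"],
}


def reorder(components, structure_id, language):
    """components: role -> CCML for that grammatical component."""
    return [components[role] for role in TARGET_ORDER[(structure_id, language)]]


ccml = {"subject": "she", "verb phrase": "sees", "object": "him"}
print(reorder(ccml, "SS-17", "fr"))  # ['she', 'him', 'sees']
```

The words themselves are replaced by French equivalents only later, in Steps 8 and 9.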
[0201] Functional Step 7
[0202] Each CCML component in the sentence is scanned for an Idiom
Identifier. If found, this is passed to the Catome which returns an
equivalent idiom in CCML and replaces the Idiom phrase in the
original CCML. If no expression is returned, the appended "Idiom
Meaning" in the original CCML is used instead to replace the idiom
phrase.
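The idiom resolution reduces to a lookup by Idiom Identifier with a fallback to the appended meaning. The identifier 501 and the per-language tables are hypothetical. The French entry uses "une autre paire de manches", an idiom that expresses the same idea as "a different kettle of fish".

```python
# Hypothetical sketch of Functional Step 7. Idiom identifier 501 and the
# per-language tables are illustrative; French "une autre paire de manches"
# is used as an equivalent of "a different kettle of fish".

IDIOMS_BY_LANGUAGE = {
    "fr": {501: "une autre paire de manches"},
    "de": {},  # no equivalent idiom recorded
}


def resolve_idiom(component, language):
    """component: {'idiom_id': ..., 'meaning': ...} as appended by the Editor."""
    equivalent = IDIOMS_BY_LANGUAGE.get(language, {}).get(component["idiom_id"])
    if equivalent is not None:
        return equivalent             # same Idiom Identifier in the target language
    return component["meaning"]       # appended Idiom Meaning, still to be translated


idiom = {"idiom_id": 501, "meaning": "different matter"}
print(resolve_idiom(idiom, "fr"))  # une autre paire de manches
print(resolve_idiom(idiom, "de"))  # different matter
```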
[0203] Functional Step 8
[0204] Clause by clause, the CCML is read for each "Meaning
Identifier". These are passed in turn to the Catome which returns
the word to be used along with its "Word Identifier" value.
[0205] Functional Step 9
[0206] The words are altered to "correspond" in verb tense, gender,
subject or object agreement, and number (singular or plural). This
is done by taking the two-digit "meaning qualifiers" from the
original CCML components, plus any gender or other attributes
associated with that CCML item in the original CCML. An updated
two-digit qualifier is created and appended to the "Word
Identifier", and the resulting "Word Identifier" is passed to the
Catome. The Catome returns the word to be used, correctly modified.
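A minimal sketch of the qualifier update, using the French adjective "grand" to show gender and number agreement. The word identifier "7310", the qualifier codes and the "id.qualifier" layout are all hypothetical.

```python
# Hypothetical sketch of Functional Step 9. The word identifier "7310", the
# qualifier codes and the French forms of "grand" are illustrative only.

QUALIFIERS = {
    ("masculine", "singular"): "01",
    ("feminine", "singular"): "02",
    ("masculine", "plural"): "03",
    ("feminine", "plural"): "04",
}
FRENCH_CATOME = {
    "7310.01": "grand",
    "7310.02": "grande",
    "7310.03": "grands",
    "7310.04": "grandes",
}


def inflect(word_id, gender, number):
    """Append the updated two-digit qualifier and look up the modified word."""
    qualified_id = f"{word_id}.{QUALIFIERS[(gender, number)]}"
    return FRENCH_CATOME[qualified_id]


print(inflect("7310", "feminine", "plural"))  # grandes
```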
[0207] Functional Step 10
[0208] If the "contraction" attribute is present for the sentence,
or the language uses formal contractions, any applicable
contractions for the words or word combinations are obtained from
the Catome and substituted for the uncontracted text. When
translating into French, for example, the program first renders the
English "I love" as "Je aime". Because French uses formal
contraction, this step produces the correct "J'aime". When this is
done, the whole sentence is passed to the output buffer of the
Reader. The translation is complete.
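For French, the contraction step is mostly elision. The sketch below uses a simplified, hypothetical word list and treats every initial "h" as mute, which is not always correct in French (for example "le héros" does not elide). It is a rule-based stand-in for the Catome's contraction data, not the data itself.

```python
# Hypothetical sketch of the French case in Functional Step 10: elision of a
# short word before a vowel. The word list is a simplified assumption, and
# every initial "h" is treated as mute, which is not always true in French.

ELIDABLE = {"je", "me", "te", "se", "le", "la", "ne", "de", "que"}
VOWEL_INITIALS = set("aeiouhàâéèêîôû")


def apply_elision(words):
    out, i = [], 0
    while i < len(words):
        word = words[i]
        next_word = words[i + 1] if i + 1 < len(words) else ""
        if word.lower() in ELIDABLE and next_word[:1].lower() in VOWEL_INITIALS:
            out.append(word[:-1] + "'" + next_word)  # "Je aime" -> "J'aime"
            i += 2
        else:
            out.append(word)
            i += 1
    return " ".join(out)


print(apply_elision("Je aime".split()))     # J'aime
print(apply_elision("Je le aime".split()))  # Je l'aime
```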
[0209] Although the invention has been described hereinabove in
detail with reference to a specific preferred embodiment, it is to
be understood that the description of the preferred embodiment is
not intended to limit the scope of the present invention.