U.S. patent application number 11/325583 was filed with the patent office on 2006-01-04 and published on 2006-07-06 for sentence displaying method, information processing system, and program product.
Invention is credited to Kazuo Aoki, Miwa Kaneko.
Application Number | 20060149557 11/325583 |
Document ID | / |
Family ID | 36641769 |
Filed Date | 2006-01-04 |
United States Patent
Application |
20060149557 |
Kind Code |
A1 |
Kaneko; Miwa ; et
al. |
July 6, 2006 |
Sentence displaying method, information processing system, and
program product
Abstract
A method of displaying a sentence described in a first language
using an information processor includes the steps of: receiving an
input of the sentence described in the first language; separating
the input sentence into constituent words; determining whether
each constituent word is a predetermined specific word; and
displaying the constituent word in a second language in response to
the determination that the constituent word is the predetermined
specific word.
Inventors: |
Kaneko; Miwa; (Yokohama-shi,
JP) ; Aoki; Kazuo; (Yokohama-shi, JP) |
Correspondence
Address: |
IBM CORPORATION
3039 CORNWALLIS RD.
DEPT. T81 / B503, PO BOX 12195
RESEARCH TRIANGLE PARK
NC
27709
US
|
Family ID: |
36641769 |
Appl. No.: |
11/325583 |
Filed: |
January 4, 2006 |
Current U.S.
Class: |
704/277 |
Current CPC
Class: |
G06F 40/232 20200101;
G06F 40/242 20200101 |
Class at
Publication: |
704/277 |
International
Class: |
G10L 11/00 20060101
G10L011/00 |
Foreign Application Data
Date |
Code |
Application Number |
Jan 4, 2005 |
JP |
2005-207 |
Claims
1. A computer implementable method of displaying a sentence, the
method comprising: receiving the sentence in a first language;
separating the sentence into a plurality of constituent words;
comparing each of the constituent words in the sentence to a list
comprising a plurality of predetermined specific words; and
displaying, in a second language, each constituent word having a
corresponding predetermined specific word in the list.
2. The method according to claim 1, wherein the list comprises a
plurality of mistakable words.
3. The method according to claim 1, further comprising displaying a
proposed correction of words corresponding to one of the displayed
constituent words.
4. The method according to claim 1, further comprising editing a
constituent word displayed in the second language.
5. The method according to claim 4, further comprising receiving an
input from a user to edit a constituent word.
6. The method according to claim 1, wherein the step of separating
the sentence into a plurality of constituent words comprises a
morphological analysis to apply a word class attribute indicating a
word class of the word, an unknown word attribute indicating that
the word is an unknown word, or a stop word attribute indicating
that the word is excluded from the words to be processed as the
specific word.
7. The method according to claim 2, wherein the step of comparing
includes determining whether the word is mistakable for the words
or the word groups listed in a mistakable word dictionary that
classifies words on the basis of mistakability.
8. A computer-usable medium embodying computer program code, the
computer program code comprising computer executable instructions
configured to: receive a sentence in a first language; separate the
sentence into a plurality of constituent words; compare each of the
constituent words in the sentence to a list comprising a plurality
of predetermined specific words; and display, in a second language,
each constituent word having a corresponding predetermined specific
word in the list.
9. The computer-usable medium of claim 8, wherein the list
comprises a plurality of mistakable words.
10. The computer-usable medium of claim 8, wherein the embodied
computer program code further comprises computer executable
instructions configured to display a proposed correction of words
corresponding to one of the displayed constituent words.
11. The computer-usable medium of claim 8, wherein the embodied
computer program code further comprises computer executable
instructions configured to edit a constituent word displayed in the
second language.
12. The computer-usable medium of claim 11, wherein the embodied
computer program code further comprises computer executable
instructions configured to receive an input from a user to edit a
constituent word.
13. The computer-usable medium of claim 8, wherein the computer
executable instructions configured to separate the sentence into a
plurality of constituent words comprises a morphological analysis
to apply a word class attribute indicating a word class of the
word, an unknown word attribute indicating that the word is an
unknown word, or a stop word attribute indicating that the word is
excluded from the words to be processed as the specific word.
14. The computer-usable medium of claim 9, wherein the computer
executable instructions configured to compare includes determining
whether the word is mistakable for the words or the word groups
listed in a mistakable word dictionary that classifies words on the
basis of mistakability thereof.
15. An information processor for displaying a sentence described in
a first language comprising: an input unit for receiving an input
of the sentence described in the first language; a word separation
unit for separating the input sentence into constituent words; a
determination unit for determining whether each of the constituent
words is a predetermined specific word; and a display unit for
displaying the constituent word in a second language in response to
the determination that the constituent word is the predetermined
specific word.
16. The information processor according to claim 15, wherein the
specific word is a mistakable word among words or word groups used
in the first language.
17. The information processor according to claim 15, wherein the
display unit displays a proposed correction of words corresponding
to the constituent word.
18. The information processor according to claim 15, further
comprising an editing unit for displaying a word in the first
language or the second language, the word being associated with the
constituent word displayed in the second language.
19. The information processor according to claim 18, wherein the
editing unit receives an input from a user to edit the constituent
word.
Description
BACKGROUND OF THE INVENTION
[0001] The present invention relates to a method of displaying a
sentence described in other than a native language of a user using
the sentence, as well as an information processor, a program, and
an information processing system to perform the method.
[0002] Conventionally, there is a known method of supporting
writing and reading sentences (hereinafter, referred to as
"foreign-language sentences" as appropriate) in a non-native
language of a user using a translation program by a computer. For
example, a program for checking the spelling of the words in
the foreign-language sentences input by the user determines
whether the spellings of input words are correct by checking them
against a dictionary of the foreign language, and notifies the user
of any misspelling.
[0003] With such spell-check programs, it has become possible to
notify the user of the mistakes as to the spelling. Moreover, there
is a known method of detecting the misspelling in the sentences and
displaying the correct word for the misspelled word (e.g., Patent
Document 1). According to this method, it is possible to detect the
misspelling and display a proposed correction of words with high
accuracy to correct the misspelling.
[0004] Japanese Unexamined Patent Publication (Kokai) No.
2003-223437
SUMMARY OF THE INVENTION
[0005] Even if the spell-check is performed for the respective
words in the sentences as described above, however, the user cannot
be cautioned about the incorrect usage of the words (misusage of
the words). In other words, the spell-check method cannot detect
the incorrect usage of a word when it is mistaken for a word
similar in form or pronunciation while the sentence bears no
incorrect spelling.
[0006] For example, when the user writes the sentence "The register
on the planar should be changed.", a spell-check reports no problem
because all the words in the sentence are correctly spelled.
However, when the user intended to input the word "resistor (chip
resistor)" instead of "register (record)", the sentence is written
with an incorrect word not intended by the user. It is therefore
desirable to provide a method that allows the user to intuitively
find and correct such mistakes when words are misused even though
they are correctly spelled.
[0007] Meanwhile, when reading sentences as well, the user may
mistranslate the mistakable words while continuing to read. It
is thus desirable to provide a method that allows the user to
intuitively find and correct such reading mistakes.
[0008] It is an object of the present invention to provide a
method, an apparatus and a system for displaying foreign-language
sentences, providing a sentence-writing support method and a
correction method, an information processor, and an information
processing system that allow the user more readily to find the
misusage of the words. It is another object of the present
invention to provide a sentence-reading support method, an
information processor, and an information processing system for
supporting the user in reading the foreign-language sentences by
displaying concurrent translation of the mistakable words on, for
example, foreign-language emails and websites for the user.
[0009] Therefore, according to one aspect of the present invention,
the present inventor provides a method of displaying a sentence
described in a first language using an information processor,
including the steps of receiving an input of the sentence described
in the first language, separating the input sentence into
constituent words, determining whether one of the constituent words
is a predetermined specific word, and displaying the constituent
word in a second language in response to the determination that the
constituent word is the predetermined specific word.
[0010] More specifically, there is provided the method wherein the
specific word is a mistakable word among the words or word groups
used in the first language.
[0011] According to the present invention, when the sentence is
displayed in the first language, the word or the word group among
the constituent words of the sentence determined to be mistakable
in the first language is displayed in the second language. Thus,
the user does not need to identify the mistakable word among the
constituent words of the sentence described in the first language;
the mistakable word is displayed in the second language.
[0012] Thus, according to the present invention, it is possible to
allow the user to recognize more readily the word or the word group
being misused when the user is writing the foreign-language
sentences, by separating the sentences into words, determining the
word or the word group that the user tends to misuse among the
separated words or word groups, and displaying the determined word
in the user's native language. Additionally, there is provided the
sentence-reading support method of supporting the user to read the
foreign-language sentences, by separating the sentences into words,
determining the word or the word group that the user tends to
misuse among the separated words or word groups, and displaying the
determined word in the user's native language.
[0013] According to the present invention, when the sentence is
displayed in the first language, the word or the word group
determined to be the specific word among the constituent words of
the sentence is displayed in the second language. Thus, the user
does not need to identify the specific word among the constituent
words of the sentence described in the first language; the specific
word is displayed in the second language. As a result, the user
browsing documents in the first language can view specific words
displayed in the second language without performing a specific
operation.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] FIG. 1 is a diagram illustrating a hardware configuration of
an information processor 1;
[0015] FIG. 2 is a schematic diagram of a second dictionary memory
section 25 according to an embodiment of the present invention;
[0016] FIG. 3 is a diagram illustrating a record format of a
mistakable word dictionary according to the embodiment of the
present invention;
[0017] FIG. 4 is a flowchart illustrating operations executed by
the information processor 1 according to the embodiment of the
present invention;
[0018] FIG. 5 is a flowchart illustrating operations executed in a
morphological analysis;
[0019] FIG. 6 is a graph illustrating ratios of a word determined
to be mistakable when words contain identical spellings;
[0020] FIG. 7 is a flowchart illustrating operations to determine
whether the word is mistakable;
[0021] FIG. 8 is a screen image that appears on a display unit
showing sentences in a first language and translations of the words
determined to be mistakable; and
[0022] FIG. 9 is a diagram illustrating a hardware configuration of
an information processing system 100.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0023] Hereinafter, preferred embodiments of the present invention
will be described based on the drawings.
[0024] FIG. 1 shows a hardware configuration of an information
processor 1. The information processor 1 is provided with an input
unit 12 to receive an input of a sentence in a first language by a
user, a display unit 11 to display the sentence in the input first
language or a translation thereof in a second language, a control
unit 10 to perform recognition of a word in the input sentence in
the first language or a dictionary search, and a memory unit 13 to
store a word dictionary or other dictionaries. The information
processor 1 may be an ordinary computer, a compact personal
terminal (e.g., a PDA) or a mobile phone.
[0025] Here, the first language denotes the language other than the
user's native language, which may be a foreign language. The second
language denotes the user's native language or second native
language. Moreover, a specific word denotes a word or a word group
of the first language that needs to be displayed in the second
language as well, which may be, for example, a commonly mistakable
word (or word group) in writing or reading the sentences in the
first language.
[0026] The input unit 12 receives the input of the sentence in the
first language by the user and sends input information to the
control unit 10 or the memory unit 13. The input unit 12 may be,
for example, a keyboard, a mouse, a voice input system (e.g., a
microphone), or the like. The display unit 11 displays the input
foreign-language sentence or an operation result by the control
unit 10. It may be, for example, a computer monitor which includes
a liquid crystal display monitor.
[0027] The control unit 10 controls the information in the
information processor 1. The control unit 10 may be a conventional
central processing unit (CPU), or may be provided with a buffer
section 23, which temporarily stores data, information or flags,
and an editing section 27. The buffer section 23 is, for example, a
cache or a RAM in the CPU. The buffer section 23 may be provided in
the memory unit 13 instead of the control unit 10. The buffer
section 23 may store the word or the word group itself to be
determined, or the information related to attributes of the word or
the word group (such as word class information of the target word
or word group, stop word information, or unknown word information:
hereinafter, referred to as the "attribute information"). Here, the
unknown word information denotes the information related to
generally unfamiliar words (unknown words). In other words, the
unknown word information denotes the information of the words which
are not listed in ordinary dictionaries or the like. Moreover, the
stop word information denotes the information related to the
attributes of the words not to be processed (e.g., the word or the
word group not to be displayed in the second language). The buffer
section 23 may also store the word or the word group determined to
be mistakable in the second language (translation).
[0028] The control unit 10 may include a word separation section 20
to separate the words in the sentence input by the user in the
first language, a determination section 22 to determine whether
each of the words or the word groups is a specific word or word
group, and the editing section 27 to accept editing by the user of
the word determined to be the specific word in the sentence
displayed in the first language. Moreover, the word separation
section 20 may include an attribute management section 21 and the
buffer section 23. The attribute management section 21 may store
the attribute information of the separated words in the buffer
section 23 together with the word in the first language and the
word in the second language (translation).
[0029] The word separation section 20 separates the words and the
word groups in the sentence in the first language into constituent
words using a word boundary, e.g., a space, a comma, or a colon, as
a marker. The constituent word herein may be either the single word
or the word group consisting of a plurality of words. Moreover, the
word separation section 20 may separate the words in the
foreign-language sentence to apply attributes based on the words
listed in a word dictionary 30.
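The boundary-based separation described above can be sketched as follows. This is a minimal illustration only: the function name `separate_words`, the treatment of all whitespace as a boundary, and the stripping of sentence-final punctuation are assumptions not stated in the specification, and word groups are not handled.

```python
import re

# Boundary characters named in the text: space, comma, colon.
# Treating every whitespace character as a boundary is an assumption.
BOUNDARY = re.compile(r"[\s,:]+")


def separate_words(sentence: str) -> list[str]:
    """Split a first-language sentence into single constituent words."""
    # Dropping sentence-final punctuation is an assumption for illustration.
    stripped = sentence.strip().rstrip(".!?")
    return [word for word in BOUNDARY.split(stripped) if word]


words = separate_words("The register on the planar should be changed.")
print(words)
```

A fuller version would also consult the word group dictionary 31 so that idioms and compound words are kept together as one constituent word.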
[0030] The determination section 22 determines whether the input
constituent word is a specific word (mistakable word) or not. In
the determination, the determination section 22 refers to a
mistakable word dictionary 32 stored in the memory unit 13 and
determines the word or the word group to be mistakable when it is
stored in the mistakable word dictionary 32.
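As a rough sketch of this determination, the mistakable word dictionary 32 can be modeled as a plain mapping from an entry word to its translation. The mapping `MISTAKABLE`, the function `annotate`, the bracketed display style, and the case-insensitive lookup are all hypothetical; the English glosses taken from paragraph [0006] stand in for second-language translations.

```python
# Hypothetical stand-in for mistakable word dictionary 32:
# entry word (first language) -> translation (second language).
# English glosses are used here only as placeholders for translations.
MISTAKABLE = {"register": "record", "resistor": "chip resistor"}


def annotate(words: list[str]) -> str:
    """Show each mistakable word followed by its translation in brackets."""
    shown = []
    for word in words:
        translation = MISTAKABLE.get(word.lower())  # lowercase lookup is an assumption
        if translation is not None:
            shown.append(f"{word} [{translation}]")
        else:
            shown.append(word)
    return " ".join(shown)


print(annotate(["The", "register", "on", "the", "planar"]))
# The register [record] on the planar
```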
[0031] The memory unit 13 stores data, dictionaries,
foreign-language sentences, or translations, used in the
information processor 1. The memory unit 13 may be, for example, a
hard disk, a CD-ROM, a DVD-ROM or the like. The memory unit 13
stores the dictionaries which contain a large amount of data
related to words, and it may be provided with a first dictionary
memory section 24, a second dictionary memory section 25, and a
frequent word dictionary memory section 26. The first dictionary
memory section 24 stores the word dictionary 30 and a word group
dictionary 31. The word dictionary 30 is the data containing the
words in the first language and the words in the second language
corresponding thereto (translation), as well as the word classes of
the words. The word group dictionary 31 stores data containing the
word groups, i.e., idioms or compound words (e.g.,
"trick-or-treat"), and the translations corresponding thereto, as
well as the word classes of the word groups.
[0032] The second dictionary memory section 25 includes a
mistakable word dictionary 32. The mistakable word dictionary 32 is
configured so as to use a record format in which the mistakable
word and the translation thereof in the second language are
registered as a set of words (see FIG. 3). The record format of the
mistakable word dictionary may be composed of an entry word (the
constituent word shown in the first language) with the translation
thereof (the word shown in the second language corresponding to the
constituent word in the first language), a classification code, and
a similar word (the word determined to be similar to the
constituent word in the first language based on, for example,
later-described rules) with the translation thereof (the word shown
in the second language corresponding to the similar word). Here,
the classification code denotes the information associated with the
constituent word, e.g., which of the later-described rules the word
corresponds to.
[0033] The mistakable word dictionary 32 may include a spelling
similarity dictionary 36 that classifies words as mistakable based
on whether there is any other word or word group similar in
spelling, may include a pronunciation similarity dictionary 37 that
classifies words as mistakable based on whether there is any other
word or word group similar in pronunciation, or may include a user
definition dictionary 38 containing the mistakable words registered
by the user. The user definition dictionary 38 may contain the
mistakable words and the translations thereof in the form of a set
of words, or separately (i.e., the entry word, the translation
thereof, and the classification code only; not as a set of words)
(see FIG. 2).
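One possible in-memory representation of the record format described above is sketched below. The class name `MistakableRecord`, the field names, and the sample classification code are assumptions made for illustration; FIG. 3 is not reproduced here, and the translations are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MistakableRecord:
    """One record: entry word, translation, classification code, optional similar word."""
    entry_word: str                  # constituent word in the first language
    entry_translation: str           # corresponding word in the second language
    classification_code: str         # e.g. which similarity rule the word matched
    # A user-defined entry may omit the similar word, so both fields are optional.
    similar_word: Optional[str] = None
    similar_translation: Optional[str] = None


# "rule-2" is a made-up code; the glosses stand in for real translations.
record = MistakableRecord("register", "record", "rule-2", "resistor", "chip resistor")
user_entry = MistakableRecord("planar", "<translation>", "user")
print(record.similar_word, user_entry.similar_word)
```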
[0034] FIG. 4 is a flowchart illustrating information processing
executed by the information processor 1 according to the embodiment
of the present invention. First, the input of the sentence by the
user in the first language is received by the input unit 12 (Step
S01). The input may be received via a dedicated application
executing the information processing of the present invention, or
via general-purpose application software for generating
documents, so that the application software executing the
information processing of the present invention operates
as a subordinate process on the input foreign-language sentence.
[0035] The sentence input may be performed, for example, by
receiving the input of the foreign-language sentence from a
server and displaying the input. The operation will be described
below with reference to FIG. 8.
[0036] Moreover, Step S02 may start by receiving the input of
translation confirmation by the user (e.g., clicking on an icon)
after the input of a series of sentences in the first language.
[0037] The control unit 10 executes a morphological analysis of the
input sentence in the first language (Step S02). The morphological
analysis denotes separating the input sentence in the first
language into words and applying the word class, the attribute, a
stop word attribute, an unknown word attribute or the like to the
respective words. A frequent word may be registered as a stop
word.
[0038] The determination section 22 determines whether the word is
the specific word (mistakable word) by searching the mistakable
word dictionary, based on the morphological analysis information
related to the word and the respective dictionaries stored in the
memory unit 13 (Steps S03 and S04). The determination as to whether
the word is mistakable will be described later in a section
describing a mistakable word determination routine (FIG. 7). Next,
the determination section 22 determines whether the word is the
frequent word (Step S06). The frequent word denotes the word
frequently used for usual writing of the sentences in the first
language. That is, the user is not likely to misuse the word if it
is the frequent word, so that the word is determined not to be the
mistakable word. While the word extracted as a frequently-used word
may be registered in the frequent word dictionary 33, the proper
nouns, the words described in Katakana, or the basic words that are
taught at, for example, a foreign-language school may also be
registered in the frequent word dictionary 33. Alternatively, the
frequent word may be extracted with a stop word attribute being
applied.
[0039] After the word is determined to be the frequent word in Step
S06, it is determined whether a subsequent word is a mistakable
word (Step S05) when the subsequent word (words) still remains in
the sentence in the first language (Step S08). When the word is
determined not to be the frequent word, the process proceeds to
Step S07. If the word is determined to be the mistakable word, it
is stored in, for example, the buffer section 23 with the word in the
second language (translation) as a candidate for the mistakable
word (Step S07). The mistakable word in the second language may be
displayed as the candidate for the mistakable word.
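The loop over the constituent words can be sketched as follows, under one reading of the steps above: a word becomes a candidate only if it is in the mistakable word dictionary and is not a frequent word. The function name `collect_candidates`, the use of a dict and a set for the two dictionaries, and the lowercase lookup are assumptions; the step labels in the comments follow the text, not FIG. 4 itself.

```python
def collect_candidates(words, mistakable, frequent):
    """Return (word, translation) pairs to be stored as mistakable-word candidates."""
    buffer = []  # plays the role of buffer section 23
    for word in words:  # Step S08: continue while words remain
        key = word.lower()  # case-insensitive lookup is an assumption
        if key not in mistakable:  # Steps S03/S04: mistakable word dictionary search
            continue
        if key in frequent:  # Step S06: frequent words are not treated as mistakable
            continue
        buffer.append((word, mistakable[key]))  # Step S07: store with translation
    return buffer


candidates = collect_candidates(
    ["The", "register", "on", "the", "planar"],
    {"register": "record", "the": "<translation>"},
    {"the"},
)
print(candidates)
```

Keeping the frequent-word check separate from the dictionary lookup makes it straightforward to add the user-selectable display options described in the next paragraph.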
[0040] For example, the user may select whether to display in the
second language any one of, or any combination of: 1) a non-frequent
word stored in the mistakable word dictionary 32; 2) the frequent
word stored in the mistakable word dictionary; and 3) the frequent
word not stored in the mistakable word dictionary. Additionally,
the user may change a threshold value (extraction ratio) of the
above-described similar word determined to be similar to the
constituent word in the first language based on the rules described
below or of the non-frequent word.
[0041] Moreover, since the mistakable word dictionary stores the
mistakable word together with the similar word in the record
format, the editing step may be provided to display the candidate
words for correction as a "proposed correction of words" associated
with the mistakable word. In other words, the user may select the
word among the proposed correction of words or input correction via
the editing section 27 by displaying the proposed correction of
words.
[0042] Furthermore, upon reception of the input by the user after
Step S08, the mistakable word displayed together with the
translation thereof may be substituted by a different word. That
is, when the user recognizes that the mistakable word is
misspelled, the user inputs the correct word. The mistakable word
may be corrected (substituted) upon reception of the input by the
user.
[0043] Referring now to FIG. 5, the morphological analysis
operation will be explained. The word separation section 20
separates the sentence in the first language into words (Step S10).
The attribute (e.g., the word class, the stop word, or the unknown
word) is applied to the separated word (Step S11). It is determined
whether the word is found in the word dictionary 30 of the first
dictionary memory section 24 (Step S12). If the word is not found,
regular expression processing, normalization processing, or
compound word processing may be performed (Step S13). The
normalization processing may be the processing to search for the
word again in the word dictionary after excluding unnecessary
letters, a number, or a symbol if the word contains any of them.
The compound word processing may be the processing to search a
hyphenated word consisting of a plurality of words or an idiom as a
single word in the word dictionary instead of searching for the
individual words only. The regular expression processing denotes
the processing to make, for example, a URL (Uniform Resource
Locator) recognized as a single word. The process is repeated from
Step S11 until the end of the process for all the words in the
sentence in the first language (Step S14).
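The dictionary lookup with its fallback processing might look like the sketch below. The names `analyze`, `normalize`, and `URL_RE`, the attribute strings, and the URL pattern are illustrative assumptions; compound-word processing is represented only by looking up a hyphenated token as a single entry.

```python
import re

# A deliberately simple URL pattern; the actual expressions are not given in the text.
URL_RE = re.compile(r"https?://\S+")


def normalize(token: str) -> str:
    """Drop digits and symbols, keeping letters and hyphens (one reading of 'normalization')."""
    return re.sub(r"[^A-Za-z-]", "", token)


def analyze(token: str, word_dictionary: dict[str, str]) -> dict[str, str]:
    """Attach a word-class, 'url', or 'unknown' attribute to a separated token."""
    key = token.lower()
    if key in word_dictionary:  # Step S12: found as-is (covers hyphenated compounds)
        return {"word": token, "attribute": word_dictionary[key]}
    if URL_RE.fullmatch(token):  # Step S13: regular expression processing
        return {"word": token, "attribute": "url"}
    cleaned = normalize(token).lower()  # Step S13: normalization, then search again
    if cleaned in word_dictionary:
        return {"word": cleaned, "attribute": word_dictionary[cleaned]}
    return {"word": token, "attribute": "unknown"}  # unknown word attribute


dictionary = {"resistor": "noun", "trick-or-treat": "noun"}
print(analyze("resistor(2)", dictionary))
```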
[0044] The following describes the determination of the mistakable
word by the information processor 1. The mistakable word dictionary
32 may store the "similar word" as to the spelling or the
pronunciation together with the translation thereof. That is, the
word is determined to be mistakable based on whether the similar
word is present. The dictionary may be customized by the user to
register the word which the user recognizes to be mistakable or to
delete the word. The record format for the mistakable word
dictionary may be hierarchically composed of the entry word: the
translation; the classification (; the similar word: the
translation), as described above referring to FIG. 3.
[0045] There are documents listing the words that are commonly
recognized to be mistakable. For example, "Common Errors in
English" by Paul Brians lists the mistakable words. Among 212 sets
of words in this document, the word pairs in which 50% or more of
the spellings are identical to each other account for 94.8% (201
pairs) (see Graph 50 in FIG. 6). The remaining 11 pairs are, for
example, accede/exceed, bare/bear, cite/sight, close/clothes,
council/consul, and counsel/consul; all of which exhibit the
pronunciation similarity. Thus, the words recognized to be
mistakable can be classified based on the similarity in the
spelling and the pronunciation.
[0046] The similarity in the spelling is determined by applying the
rules described hereinbelow. Here, it is provided that either or
both of the first and last letters of the respective words are
identical. The number of letters herein denotes the number of the
letters constituting the word (e.g., both "adapt" and "adopt"
consist of 5 letters each). Here, the "word pair" denotes "the word
and another word compared thereto" (e.g., "adapt" and "adopt"). The
concordance ratio is the value obtained by dividing the number of
identical letters by the number of letters of the longer word.
[0047] Rule 1: In the case of the words the same or different in
the number of letters, the number of different letters in the
identical positions is:
[0048] For the word pair of 2 to 3 letters: [0049] only 1 letter is
different
[0050] For the word pair of 4 to 5 letters: [0051] 2 letters or
less are different
[0052] For the word pair of 6 to 7 letters: [0053] 3 letters or
less are different
[0054] For the word pair of 8 to 9 letters: [0055] 4 letters or
less are different
[0056] For the word pair of more than or equal to 10 letters:
[0057] 5 letters or less are different
[0058] Example: adapt/adopt (4 letters are identical) (For the word
pair of same word length: count the identical letters in the
identical positions. For the word pair of different word length:
count the identical letters from the beginning of the word if the
first letter is identical, or count the identical letters from the
end of the word if the first letter is not identical and the last
letter is identical.)
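Rule 1 can be sketched in code as follows. The function names are hypothetical, and two points are assumptions rather than statements of the specification: the length bracket is chosen from the longer word of the pair, and the number of different letters is taken as the longer length minus the number of positional matches. Both words are assumed to be non-empty.

```python
def positional_matches(a: str, b: str) -> int:
    """Count identical letters in identical positions under the stated alignment rule."""
    # Different lengths, first letters differ, last letters match: align from the end.
    if len(a) != len(b) and a[0] != b[0] and a[-1] == b[-1]:
        a, b = a[::-1], b[::-1]
    return sum(x == y for x, y in zip(a, b))


def max_differences(length: int) -> int:
    """Allowed number of different letters for a word pair of the given length."""
    for upper, allowed in ((3, 1), (5, 2), (7, 3), (9, 4)):
        if length <= upper:
            return allowed
    return 5  # 10 letters or more


def satisfies_rule_1(a: str, b: str) -> bool:
    longer = max(len(a), len(b))  # using the longer word is an assumption
    differences = longer - positional_matches(a, b)
    return a != b and differences <= max_differences(longer)


print(positional_matches("adapt", "adopt"), satisfies_rule_1("adapt", "adopt"))
# 4 True
```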
[0059] Rule 2: In the case of the words the same or different in
the number of letters, the concordance ratio of letters in the
identical positions of the word pair is 50% or more (For the word
pair of same word length: count the identical letters in the
identical positions. For the word pair of different word length:
count the identical letters from the beginning of the word if the first
letter is identical, or count the identical letters from the end of
the word if the first letter is not identical and the last letter
is identical).
[0060] Example:
[0061] continual/continuous [0062] (7 letters are identical; 7/10=70% of concordance ratio)
[0063] compliance/complaint [0064] (6 letters are identical; 6/10=60% of concordance ratio)
[0065] aural/oral [0066] (3 letters are identical; 3/5=60% of concordance ratio)
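The positional concordance ratio of Rule 2 can be checked against the three examples with a short sketch. The function names are hypothetical, and the words are assumed to be non-empty; the alignment follows the counting note in the rule (from the beginning when the first letters match, from the end when only the last letters match).

```python
def positional_matches(a: str, b: str) -> int:
    """Count identical letters in identical positions under the stated alignment rule."""
    if len(a) != len(b) and a[0] != b[0] and a[-1] == b[-1]:
        a, b = a[::-1], b[::-1]  # align from the end of the words
    return sum(x == y for x, y in zip(a, b))


def concordance_ratio(a: str, b: str) -> float:
    """Identical letters divided by the number of letters of the longer word."""
    return positional_matches(a, b) / max(len(a), len(b))


def satisfies_rule_2(a: str, b: str) -> bool:
    return a != b and concordance_ratio(a, b) >= 0.5


for pair in (("continual", "continuous"), ("compliance", "complaint"), ("aural", "oral")):
    print(pair, positional_matches(*pair), round(concordance_ratio(*pair), 2))
```

Run on the pairs above, the match counts are 7, 6, and 3, which agrees with the ratios of 70%, 60%, and 60% given in the examples.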
[0067] Rule 3: In the case of the words the same or different in
the number of letters, the number of different letters in the
different or identical positions is:
[0068] For the word pair of 2 to 3 letters: [0069] only 1 letter is
different
[0070] For the word pair of 4 to 5 letters: [0071] 2 letters or
less are different
[0072] For the word pair of 6 to 7 letters: [0073] 3 letters or
less are different
[0074] For the word pair of 8 to 9 letters: [0075] 4 letters or
less are different
[0076] For the word pair of more than or equal to 10 letters:
[0077] 5 letters or less are different (For the word pair of same
word length: count the identical letters in the identical
positions. For the word pair of different word length: count the
identical letters from the beginning of the word if the first letter
is identical, or count the identical letters from the end of the
word if the first letter is not identical and the last letter is
identical.)
[0078] Rule 4: In the case of the words the same or different in
the number of letters, the concordance ratio of letters in the
different or identical positions of the word pair is 50% or more
(For the word pair of same word length: count the identical letters
in the identical positions. For the word pair of different word
length: count the identical letters from the beginning of the word
if the first letter is identical, or count the identical letters
from the end of the word if the first letter is not identical and
the last letter is identical).
[0079] Example: [0080] bear/bare [0081] (4 letters are identical;
4/4=100% of concordance ratio) [0082] close/clothes [0083] (5
letters are identical; 5/7=71% of concordance ratio) [0084]
fiscal/physical [0085] (5 letters are identical; 5/8=63% of
concordance ratio)
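The counts in these examples are consistent with comparing the letters of the two words irrespective of position. A minimal Python sketch under that reading is given below; the names `shared_letters` and `concordance_any_position` are hypothetical, and the use of a multiset intersection is an assumption inferred from the examples rather than something the text states.

```python
from collections import Counter


def shared_letters(a: str, b: str) -> int:
    """Count the letters common to both words, regardless of position."""
    # Counter & Counter keeps, for each letter, the smaller of the two counts.
    return sum((Counter(a) & Counter(b)).values())


def concordance_any_position(a: str, b: str) -> float:
    """Shared letters divided by the length of the longer word."""
    return shared_letters(a, b) / max(len(a), len(b))
```

Under this sketch bear/bare shares 4 letters, close/clothes shares 5, and fiscal/physical shares 5, so all three pairs reach the 50% threshold of Rule 4.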
[0086] Rule 5: In the case of the words the same or different in
the number of letters, the concordance ratio of letters in the
identical positions of the word pair is 80% or more, and the
numbers of letters are equal to or less than 5 while 2 letters from
the beginning of each word are identical (For the word pair of same
word length: count the identical letters in the identical
positions. For the word pair of different word length: count the
identical letters from the beginning of the word if the first
letter is identical, or count the identical letters from the end of
the word if the first letter is not identical and the last letter
is identical).
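Rule 5 combines three conditions, which might be checked as in the following sketch. The function name `satisfies_rule_5` is hypothetical, and the word pairs mentioned afterwards are illustrative only and do not come from the description. Because the first two letters must be identical, the words can always be aligned at the beginning.

```python
def satisfies_rule_5(a: str, b: str) -> bool:
    """Check the three conditions of Rule 5 for a word pair."""
    longest = max(len(a), len(b))
    # Both words must have at most 5 letters.
    if longest > 5:
        return False
    # The first 2 letters of each word must be identical.
    if len(a) < 2 or len(b) < 2 or a[:2] != b[:2]:
        return False
    # Letters identical at identical positions, aligned at the beginning.
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longest >= 0.8
```

As an illustration, a pair such as price/prize has 4 of 5 letters identical in place and would satisfy the rule, whereas quiet/quite has only 3 of 5 and would not.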
[0087] Next, the similarity in pronunciation is determined by
applying the rules described below, provided that either or both of
the first and last syllables of the respective words are identical.
The number of syllables herein denotes the number of the syllables
constituting the word (e.g., cite and sight (sa'it/sa'it) each
consist of 4 syllables).
Here, the "word pair" denotes "the word and another word compared
thereto" (e.g., "cite" and "sight"). The concordance ratio is the
value obtained by dividing the number of identical syllables by the
number of syllables of the word consisting of the greater number of
syllables.
[0088] Rule 6: In the case of the words the same or different in
the number of syllables, the number of different syllables in the
identical positions is:
[0089] For the word pair of 2 to 3 syllables: [0090] only 1
syllable is different
[0091] For the word pair of 4 to 5 syllables: [0092] 2 syllables or
less are different
[0093] For the word pair of 6 to 7 syllables: [0094] 3 syllables or
less are different
[0095] For the word pair of 8 to 9 syllables: [0096] 4 syllables or
less are different
[0097] For the word pair of more than or equal to 10 syllables:
[0098] 5 syllables or less are different
[0099] Example: cite/sight (4 syllables are identical) (For the
word pair of same word length: count the identical syllables in the
identical positions. For the word pair of different word length:
count the identical syllables from the beginning of the word if the
first syllable is identical, or count the identical syllables from
the end of the word if the first syllable is not identical and the
last syllable is identical.)
[0100] Rule 7: In the case of the words the same or different in
the number of syllables, the concordance ratio of syllables in the
identical positions of the word pair is 50% or more (For the word
pair of same word length: count the identical syllables in the
identical positions. For the word pair of different word length:
count the identical syllables from the beginning of the word if the
first syllable is identical, or count the identical syllables from
the end of the word if the first syllable is not identical and the
last syllable is identical).
[0101] Example: [0102] cite/sight sa'it/sa'it (100% of concordance
ratio)
[0103] Rule 8: In the case of the words the same or different in
the number of syllables, the number of different syllables in the
different or identical positions is:
[0104] For the word pair of 2 to 3 syllables: [0105] only 1
syllable is different
[0106] For the word pair of 4 to 5 syllables: [0107] 2 syllables or
less are different
[0108] For the word pair of 6 to 7 syllables: [0109] 3 syllables or
less are different
[0110] For the word pair of 8 to 9 syllables: [0111] 4 syllables or
less are different
[0112] For the word pair of more than or equal to 10 syllables:
[0113] 5 syllables or less are different (For the word pair of same
word length: count the identical syllables in the identical
positions. For the word pair of different word length: count the
identical syllables from the beginning of the word if the first
syllable is identical, or count the identical syllables from the
end of the word if the first syllable is not identical and the last
syllable is identical.)
[0114] Rule 9: In the case of the words the same or different in
the number of syllables, the concordance ratio of syllables in the
different or identical positions of the word pair is 50% or more
(For the word pair of same word length: count the identical
syllables in the identical positions. For the word pair of
different word length: count the identical syllables from the
beginning of the word if the first syllable is identical or count
the identical syllables from the end of the word if the first
syllable is not identical and the last syllable is identical).
[0115] Rule 10: In the case of the words the same or different in
the number of syllables, the concordance ratio of syllables in the
identical positions of the word pair is 80% or more, and the
numbers of syllables are equal to or less than 5 while 2 syllables
from the beginning of each word are identical (For the word pair of
same word length: count the identical syllables in the identical
positions. For the word pair of different word length: count the
identical syllables from the beginning of the word if the first
syllable is identical, or count the identical syllables from the
end of the word if the first syllable is not identical and the last
syllable is identical).
[0116] As a further rule, word groups that are not frequently used
(e.g., idioms) may be determined to be mistakable words. Rules 1 to
10 may be applied within a
specific word class to determine whether the word is mistakable
after the word class is specified by, for example, the
morphological analysis.
[0117] FIG. 7 is a flowchart illustrating operations to determine
whether the word is mistakable. The target word is searched for in
the spelling similarity dictionary 36, the pronunciation similarity
dictionary 37, and the user definition dictionary 38 (Steps S20,
S22, and S25). The spelling similarity dictionary 36 and the
pronunciation similarity dictionary 37 store the information on
whether the word is mistakable on the basis of the foregoing rules
1 to 10. Based on the registered information, it is determined
whether the target word is the mistakable word. In other words, the
target word is registered in the spelling similarity dictionary 36
as the mistakable word if the word satisfies any of the rules 1 to
5 (Step S21), resulting in the word being determined to be
mistakable.
[0118] If the word is not registered in the spelling similarity
dictionary 36 as the mistakable word, then the pronunciation
similarity dictionary 37 is searched to see if the word is
registered therein (Step S22). The target word is registered in the
pronunciation similarity dictionary 37 as the mistakable word if
the word satisfies any of the rules 6 to 10, resulting in the word
being determined to be mistakable (Steps S24 and S23).
[0119] If the word is not registered in the pronunciation
similarity dictionary 37 as the mistakable word, then the word
group dictionary 31 is searched to see if the word is registered
therein (Step S27). The target word group is registered in the word
group dictionary 31 as the mistakable word if the word group is,
for example, a non-frequent word group, resulting in the word group
being determined to be mistakable (Step S23). The word group may be
an idiom such as "call for" or a compound word such as
"trick-or-treat". The compound word may be processed as a single
word, instead of being recognized as the word group.
[0120] If the target word group is not registered in the word group
dictionary 31 as a mistakable word, the word group is determined to
be a normal word (Step S29) and the process ends.
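The lookup order of FIG. 7 described above might be sketched as follows. This is a simplified illustration under stated assumptions: each dictionary is modelled as a Python set holding only the entries registered as mistakable, and the function name `classify` and the sample entries are hypothetical.

```python
def classify(word, spelling_dict, pronunciation_dict, word_group_dict):
    """Return "mistakable" or "normal" following the lookup order of FIG. 7."""
    # The dictionaries are consulted one after another; the first hit decides.
    for dictionary in (spelling_dict, pronunciation_dict, word_group_dict):
        if word in dictionary:
            return "mistakable"
    return "normal"


# Hypothetical sample entries built from words used in the description.
SPELLING = {"continual", "continuous", "compliance", "complaint"}
PRONUNCIATION = {"cite", "sight"}
WORD_GROUPS = {"call for"}
```

For example, `classify("sight", SPELLING, PRONUNCIATION, WORD_GROUPS)` is decided by the second dictionary, while a word such as "have" falls through all three lookups and is treated as a normal word.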
[0121] Instead of searching the word group dictionary 31 on a
word-by-word basis, as shown in FIG. 7, the word group may be
processed after all the words in the sentence in the first language
have been searched for in, for example, the spelling similarity
dictionary 36 and the pronunciation similarity dictionary 37.
[0122] FIG. 8 is an example of a display image showing the input
sentences in the first language and the translations of the words
in the sentences in the first language determined to be mistakable.
Such a screen image is displayed in the display unit 11 of the
information processor 1. As shown in FIG. 8, the translation of the
word determined to be mistakable may be displayed associated with
the sentence (in the first language) input by the user.
[0123] In the present invention, the translations of words such as
"compliance" and "supervise" in the sentences in the first language
shown in FIG. 8 are displayed, whereas the translations of words
such as "If", "have" and "System", which the user is not likely to
misuse, are not displayed. Thus, the user can avoid misusing words
by checking only the translations of the mistakable words.
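The selective display described above might be sketched as follows. The sketch rests on several assumptions: whitespace splitting stands in for the morphological analysis, the function name `annotate` is hypothetical, the sample sentence is made up and is not the sentence of FIG. 8, and the translation strings are placeholders rather than actual text in the second language.

```python
def annotate(sentence, translations):
    """Return (word, translation) pairs for the mistakable words only."""
    result = []
    for token in sentence.split():
        # Drop surrounding punctuation and ignore case before the lookup.
        word = token.strip(".,;:!?\"'()").lower()
        if word in translations:
            result.append((word, translations[word]))
    return result


# Placeholder translations for two words named in the description.
TRANSLATIONS = {
    "compliance": "(translation of compliance)",
    "supervise": "(translation of supervise)",
}

pairs = annotate("If you have a System, supervise its compliance.", TRANSLATIONS)
```

Here `pairs` contains entries for "supervise" and "compliance" only; words such as "If", "have" and "System" produce no output.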
[0124] As an alternative embodiment of the present invention, an
information processing system 100 may comprise a client terminal
101, a server 103, and a communication network 102 connecting the
client terminal 101 and the server 103 to achieve the object of the
present invention.
[0125] More specifically, the client terminal 101 may be a computer
which receives the input of the sentence in the first language from
the user and displays the input result, and which is provided with
the display unit 11 and the input unit 12 of the information
processor 1 described above. That is, the sentence in the first
language input by the user is sent from a client input unit of the
client terminal 101 to the server 103 via the communication network
102.
The server 103 is provided with the control unit 10 and the memory
unit 13 of the information processor 1 described above to perform
the morphological analysis or the determination of the mistakable
words for the respective words in the input sentence in the first
language, so that the translation of the mistakable word may be
sent to the client terminal 101 and displayed in the display unit
of the client terminal 101.
[0126] Moreover, the server 103 may be provided with the memory
unit 13, as well as a server transmission section to send the
translation of the mistakable word to the client terminal 101. In
other words, the server transmission section may send the data of
the word determined to be mistakable by the determination section
22 and the translation associated with each other to the client
terminal 101. Furthermore, the first dictionary memory section 24,
the second dictionary memory section 25, and the frequent word
dictionary memory section 26 may be stored in a plurality of servers,
respectively. The communication network 102 may be the Internet,
while a plurality of client terminals 101 may be provided.
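The exchange between the server transmission section and the client terminal might look like the following sketch. The description does not specify a data format, so the use of JSON, the field names "word" and "translation", and the function names `build_response` and `render` are all assumptions made for illustration.

```python
import json


def build_response(pairs):
    """Server side: serialize (word, translation) pairs for the client."""
    return json.dumps([{"word": w, "translation": t} for w, t in pairs])


def render(payload):
    """Client side: turn the received payload into lines for display."""
    return [f'{item["word"]}: {item["translation"]}' for item in json.loads(payload)]
```

The point of the sketch is that each mistakable word travels together with its translation, so the client terminal needs no dictionary of its own to display the result.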
[0127] The information processor, the sentence displaying method,
and the sentence processing system practicing the foregoing
embodiments can be realized by a program executed by the computer or
the server. A memory medium for the program includes an optical
memory medium, a tape medium, and a semiconductor memory. A memory
device such as a hard disk or a RAM provided in a server system
connected to a dedicated communication network or the Internet may
be used as the memory medium to provide the program via the
network.
[0128] While the embodiments of the present invention have been
described, they are intended only to illustrate particular examples
and do not specifically limit the scope of the present invention.
The advantages of the present invention are not limited to those
described in the embodiments, which are shown only as the most
suitable advantages derived from the present invention.
[0129] The first language in which the sentence (foreign-language
sentence) is written in the present invention is not limited to a
specific language. The present invention may be realized
independently of any specific language as long as the user is
writing the sentence in a language other than the native language.
Moreover, the specific word in the present invention is not limited
to a word that is mistakable when using the first language; the
specific word may also include any word that needs to be displayed
in the second language when the first language is used.
* * * * *