U.S. patent application number 14/820472 was filed with the patent office on 2015-08-06 and published on 2016-12-01 for mining parallel data from user profiles.
The applicant listed for this patent is LinkedIn Corporation. Invention is credited to Kin Kan, Bing Zhao.
Application Number | 20160350289 14/820472 |
Document ID | / |
Family ID | 57397088 |
Filed Date | 2015-08-06 |
United States Patent
Application |
20160350289 |
Kind Code |
A1 |
Zhao; Bing ; et al. |
December 1, 2016 |
MINING PARALLEL DATA FROM USER PROFILES
Abstract
Techniques for identifying multilingual cognates and using the
multilingual cognates are provided. In one technique, multilingual
cognates identified from multiple user profiles are used to train
one or more translation models. In another technique, multilingual
cognates identified from a single user's profile are used to
translate text provided by that user. In another technique,
multilingual cognates from a single user are used to align
sentences in one language to sentences in another language and the
aligned sentences are used to train a language model. In another
technique, multilingual cognates identified from multiple user
profiles are used to expand search queries. In another technique,
multilingual cognates identified from multiple user profiles are
used to translate other users' profiles into a target language so
that users associated with a source language are viewing the other
users' profiles.
Inventors: |
Zhao; Bing; (Sunnyvale,
CA) ; Kan; Kin; (Sunnyvale, CA) |
|
Applicant: |
Name | City | State | Country | Type |
LinkedIn Corporation | Mountain View | CA | US | |
Family ID: |
57397088 |
Appl. No.: |
14/820472 |
Filed: |
August 6, 2015 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
62169471 | Jun 1, 2015 | |
62169469 | Jun 1, 2015 | |
Current U.S.
Class: |
1/1 |
Current CPC
Class: |
G06F 16/3344 20190101;
G06F 40/263 20200101; G06F 40/284 20200101; G06F 40/45 20200101;
G06F 16/3337 20190101 |
International
Class: |
G06F 17/28 20060101
G06F017/28; G06F 17/30 20060101 G06F017/30 |
Claims
1. A method comprising: identifying one or more multilingual
cognates provided by a user; wherein a particular multilingual
cognate of the one or more multilingual cognates comprises first
text in a first language and second text in a second language that
is different than the first language, wherein the particular
multilingual cognate associates the first text with the second
text; using the particular multilingual cognate to identify (1) a
first text portion, in a first body of text that is in the first
language and that is provided by the user and (2) a second text
portion in a second body of text that is in the second language and
that is provided by the user; performing a comparison of the first
text portion with the second text portion; based on the comparison,
determining whether to train a translation model based on the first
text portion and the second text portion, wherein the translation
model is used for machine translation of text in the first language
to translated text in the second language; wherein the method is
performed by one or more computing devices.
2. The method of claim 1, wherein: performing the comparison
comprises: aligning the first text portion with the second text
portion; using a second translation model to determine whether one
or more tokens in the first text portion are associated with one or
more tokens in the second text portion; determining whether to
train comprises determining whether to train the translation model
based on determining whether the one or more tokens in the first
text portion are associated with the one or more tokens in the
second text portion.
3. The method of claim 1, wherein: the first body of text is one of
a work summary of the user or a job summary of a job position that
the user has had; the second body of text is one of the work
summary or the job summary.
4. The method of claim 1, wherein the user is a first user that is
different than a second user, the method further comprising:
identifying one or more second multilingual cognates provided by
the second user; wherein a second cognate of the one or more second
multilingual cognates comprises third text in the first language
and fourth text in the second language, wherein the second cognate
associates the third text with the fourth text; using the second
cognate to identify (1) a third text portion in a third body of
text that is in the first language and that is provided by the
second user and (2) a fourth text portion in a fourth body of text
that is in the second language and that is provided by the second
user; performing a second comparison of the third text portion with
the fourth text portion; based on the second comparison,
determining whether to train the translation model based on the
third text portion and the fourth text portion.
5. The method of claim 4, further comprising, prior to performing
the comparison: identifying the one or more multilingual cognates
from a first user profile that was provided by the first user and
that is maintained by a social network service with which the first
user registered; identifying the one or more second multilingual
cognates from a second user profile that was provided by the second
user and that is maintained by the social network service with
which the second user registered.
6. The method of claim 1, wherein a cognate of the one or more
multilingual cognates corresponds to one of a person name, an
employer name, a job title, an academic degree, an academic
institution name, a field of study, or a skill.
7. The method of claim 1, wherein: performing the comparison
comprises determining that a text portion in the first text portion
does not align with any text portion in the second text portion;
the text portion is not used to train the translation model.
8. The method of claim 1, wherein each of the first text portion
and the second text portion is a sentence.
9. A method comprising: storing a set of multilingual cognates,
wherein each cognate in the set of multilingual cognates is
identified from a user profile in a plurality of user profiles;
wherein each cognate in the set of multilingual cognates comprises
a first text in a first language and a second text in a second
language that is different than the first language, wherein one of
the plurality of user profiles associates the first text with the
second text; receiving one or more search criteria that is in the
first language; in response to receiving the one or more search
criteria, generating, based on the set of multilingual cognates and
the one or more search criteria, translated search criteria that is
in the second language; performing a search based on the translated
search criteria; wherein the method is performed by one or more
computing devices.
10. The method of claim 9, wherein a cognate in the set of
multilingual cognates corresponds to a person's name, an employer
name, a job title, an academic degree, an academic institution
name, a field of study, or a skill.
11. The method of claim 10, wherein: the cognate is a first
cognate; the set of multilingual cognates includes a second cognate
that is different than the first cognate; the second cognate
corresponds to a different one of the person's name, the employer
name, the job title, the academic degree, the academic institution
name, the field of study, or the skill.
12. The method of claim 9, wherein: performing the search comprises
searching a plurality of job postings based on the translated
search criteria; the method further comprising generating one or
more search results, each of which includes information about a
different job of the plurality of job postings.
13. The method of claim 12, wherein a search result of the one or
more search results includes data that was translated from the
second language into the first language.
14. The method of claim 9, wherein: the set of multilingual
cognates includes cognates associated with different sets of
languages; receiving the one or more search criteria comprises
receiving language data that indicates one or more languages that
includes the second language; generating the translated search
criteria comprises generating the translated search criteria based
on a strict subset of the set of multilingual cognates, wherein the
strict subset is associated with the one or more languages.
15. The method of claim 9, wherein: the set of multilingual
cognates includes cognates associated with different types of
information; receiving the one or more search criteria comprises
receiving type data that indicates a type of information of the one
or more search criteria; generating the translated search criteria
comprises generating the translated search criteria based on a
strict subset of the set of multilingual cognates, wherein the
strict subset is associated with the type of information.
16. The method of claim 15, wherein receiving the type data
comprises receiving, from a user, input that includes the type
data.
17. The method of claim 9, wherein: storing the set of multilingual
cognates comprises storing a translation model that is trained
based on the set of multilingual cognates; generating the
translated search criteria comprises using the translation model
and the one or more search criteria to generate the translated
search criteria.
18. The method of claim 17, wherein: storing the translation model
comprises storing a plurality of translation models; each
translation model of the plurality of translation models is
associated with a different type of information; receiving the one
or more search criteria comprises receiving type data that
indicates a type of information of the one or more search criteria;
the method further comprising selecting the translation model based
on the type data.
19. The method of claim 18, wherein: the type data indicates a
first type of information of a first search criterion in the one or
more search criteria and a second type of information of a second
search criterion in the one or more search criteria; selecting the
translation model comprises selecting a first translation model
based on the first type of information and selecting a second
translation model based on the second type of information.
20. The method of claim 19, wherein: the first type of information
is one in a set consisting of job title, employer name, academic
degree, job skill, academic institution name, an industry name, and
field of study; the second type of information is another one in
the set.
21. The method of claim 9, wherein: the one or more search criteria
includes a text portion; the translated search criteria includes
(1) a first translated text portion that is based on the text
portion and (2) a second translated text portion that is based on
the text portion; the first translated text portion and the second
translated text portion are in the second language and are
semantically similar.
Description
PRIORITY CLAIM AND CROSS-REFERENCE TO A RELATED APPLICATION
[0001] This application claims the benefit of U.S. Provisional
Application Nos. 62/169,469 and 62/169,471, filed Jun. 1, 2015, the
entire contents of each of which are hereby incorporated by
reference as if fully set forth herein, under 35 U.S.C.
§ 119(e).
[0002] This application is related to U.S. patent application Ser.
No. ______ [Attorney Docket No. 60352-0127], filed the same day
herewith, the entire contents of which are hereby incorporated by
reference as if fully set forth herein.
TECHNICAL FIELD
[0003] The present disclosure relates to analyzing profile data
and, more specifically, utilizing profile data for machine
translation.
BACKGROUND
[0004] Machine translation is a sub-field of computational
linguistics that investigates the use of software to translate text
or speech from one language to another. On a basic level, machine
translation performs simple substitution of words in one language
for words in another language, but that alone usually cannot
produce a good translation of text because recognition of whole
phrases and their closest counterparts in the target language is
needed.
[0005] Current machine translation techniques are relatively
ineffective at producing accurate and reliable translations. One
problem is that the training corpus upon which a machine translator
is based is extremely limited, even if the training corpus is
relatively large in size. The variation in how ideas and concepts
may be expressed in a given language is so great that it is nearly
impossible for a machine translator to generate accurate
translations, unless the phrases that need translating appear
word-for-word in the training corpus. For example, many
English-to-Chinese machine translators of free-form text have a
sub-20% accuracy rate.
[0006] The approaches described in this section are approaches that
could be pursued, but not necessarily approaches that have been
previously conceived or pursued. Therefore, unless otherwise
indicated, it should not be assumed that any of the approaches
described in this section qualify as prior art merely by virtue of
their inclusion in this section.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] In the drawings:
[0008] FIG. 1 is a block diagram that depicts a system for
identifying multilingual cognates in a data set, in an
embodiment;
[0009] FIG. 2 is a flow diagram that depicts a process for
identifying cognates from multiple user profiles, in an
embodiment;
[0010] FIG. 3 is a flow diagram that depicts a process for
translating user-provided text, in an embodiment;
[0011] FIG. 4 is a flow diagram that depicts a process for aligning
multilingual bodies of text, in an embodiment;
[0012] FIG. 5 is a flow diagram that depicts a process for
expanding a query search in the language domain, in an
embodiment;
[0013] FIG. 6 is a flow diagram that depicts a process for
presenting a translated user profile of one user to another user,
in an embodiment;
[0014] FIG. 7 is a flow diagram that depicts a process for
presenting, to a user associated with a first language, content
items (or translated versions thereof) that are in a language that
is different than the first language, in an embodiment;
[0015] FIG. 8 is a block diagram that illustrates a computer system
upon which an embodiment of the invention may be implemented.
DETAILED DESCRIPTION
[0016] In the following description, for the purposes of
explanation, numerous specific details are set forth in order to
provide a thorough understanding of the present invention. It will
be apparent, however, that the present invention may be practiced
without these specific details. In other instances, well-known
structures and devices are shown in block diagram form in order to
avoid unnecessarily obscuring the present invention.
General Overview
[0017] A method and system are provided for identifying
multi-lingual cognates in user profile data. A multi-lingual
cognate (hereinafter "cognate") is a pair of words or phrases that
are provided by a user and that mean (or are intended to mean) the same
thing. For example, a user provides a job title in English and a
job title in Chinese to allow English readers and Chinese readers
to know at least some information about the user. In one technique,
cognates of a particular user are used to translate a summary
provided by the particular user from one language to another
language. In another technique, cognates of a particular user are
used to match sentences (provided by the particular user) in
different languages. The matched sentences are used to train a
language model used for machine translation.
[0018] In a related technique, multiple user profiles, each
provided by a different user, are mined to identify multiple
cognates. Each cognate may correspond to the same type of
information, such as job title. Alternatively, a group of cognates
correspond to one type of information (e.g., academic degree) and
another group of cognates correspond to another type of information
(e.g., name of current employer). Cognates may be used in one of
multiple ways. In another technique, cognates are used to assist
users when performing a search.
System Overview
[0019] FIG. 1 is a block diagram that depicts a system 100 for
identifying cognates in a data set, in an embodiment. System 100
includes client 110, network 120, server system 130, and storage
140.
[0020] Client 110 is an application or computing device that is
configured to communicate with server system 130 over network 120.
Examples of computing devices include a laptop computer, a tablet
computer, a smartphone, a desktop computer, and a Personal Digital
Assistant (PDA). An example of an application includes a dedicated
application that is installed and executed on a local computing
device and that is configured to communicate with server system 130 over
network 120. Another example of an application is a web application
that is downloaded from server system 130 and that executes within
a web browser executing on a computing device. Client 110 may be
implemented in hardware, software, or a combination of hardware and
software. Although only a single client 110 is depicted, system
100 may include multiple clients that interact with server system
130 over network 120.
[0021] Through client 110, a user is able to provide input that
includes multilingual information about the user. Later, the user
may interact with server system 130 to retrieve, supplement, and/or update
the multilingual information.
[0022] Network 120 may be implemented on any medium or mechanism
that provides for the exchange of data between client 110 and
server system 130. Examples of network 120 include, without
limitation, a network such as a Local Area Network (LAN), Wide Area
Network (WAN), Ethernet or the Internet, or one or more
terrestrial, satellite or wireless links. Storage 140 stores a data
set 142 from which cognates are identified. Storage 140 may
comprise persistent storage and/or volatile storage. Storage 140
may comprise multiple storage devices. Also, although depicted
separately from server system 130, storage 140 may be part of
server system 130 or may be accessed by server system 130 over a
local network, a wide area network, or the Internet.
Server System
[0023] As depicted in FIG. 1, server system 130 includes a cognate
identifier 132 that identifies one or more cognates found in data
set 142 that is stored in storage 140.
[0024] In an embodiment, data set 142 comprises a single user
profile. Alternatively, data set 142 comprises multiple user
profiles, each provided by a different user. In this embodiment,
server system 130 maintains accounts for multiple users. Server
system 130 may provide a web service, such as a social networking
service. Examples of social networking services include Facebook,
LinkedIn, and Google+. Although depicted as a single element,
server system 130 may comprise multiple computing elements and
devices, connected in a local network or distributed regionally or
globally across many networks, such as the Internet. Thus, server
system 130 may comprise multiple computing elements other than
cognate identifier 132.
[0025] A user's profile may include a first name, last name, an
email address, residence information, a mailing address, a phone
number, one or more educational institutions attended, one or more
current and/or previous employers, one or more current and/or
previous job titles, a list of skills, a list of endorsements,
and/or names or identities of friends, contacts, or connections of the
user, and derived data that is based on actions that the user
has taken. Examples of such actions include jobs to which the user
has applied, views of job postings, views of company pages, private
messages between the user and other users in the user's social
network, and public messages that the user posted and that are
visible to users outside of the user's social network.
[0026] Some data within a user's profile (e.g., work history) may
be provided by the user while other data within the user's profile
(e.g., skills and endorsements) may be provided by a third party,
such as a "friend" or connection of the user or a colleague of the
user.
[0027] Before data set 142 is analyzed, server system 130 may
prompt users to provide profile information in one of a number of
ways. For example, server system 130 may have provided a web page
with a text field for one or more of the above-referenced types of
information. In response to receiving profile information from a
user's device, server system 130 stores the information in an
account that is associated with the user and that is associated
with credential data that is used to authenticate the user to
server system 130 when the user attempts to log into server system
130 at a later time. Each text string provided by a user may be
stored in association with the field into which the text string was
entered. For example, if a user enters "Sales Manager" in a job
title field, then "Sales Manager" is stored in association with
type data that indicates that "Sales Manager" is a job title. As
another example, if a user enters "Java programming" in a skills
field, then "Java programming" is stored in association with type
data that indicates that "Java programming" is a skill.
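By way of a non-limiting illustration, the association between a user-provided text string and its type data might be sketched in Python as follows. The names `ProfileEntry`, `store_entry`, and the dictionary-based `account` record are hypothetical and are used only for this example; the actual storage format of server system 130 is not specified herein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileEntry:
    text: str        # the string the user entered
    field_type: str  # type data, e.g. "job title" or "skill"


def store_entry(account: dict, field_type: str, text: str) -> ProfileEntry:
    """Store user-provided text together with the type of field it came from."""
    entry = ProfileEntry(text=text, field_type=field_type)
    account.setdefault("entries", []).append(entry)
    return entry


# Hypothetical account record; a real account would also hold credential data.
account = {"user_id": "u1"}
store_entry(account, "job title", "Sales Manager")
store_entry(account, "skill", "Java programming")
```

Keeping the type data next to each string is what later allows corresponding fields in different languages to be paired.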
[0028] In an embodiment, server system 130 stores access data in
association with a user's account. Access data indicates which
users, groups, or devices can access or view the user's profile or
portions thereof. For example, first access data for a user's
profile indicates that only the user's connections can view the
user's personal interests, second access data indicates that
confirmed recruiters can view the user's work history, and third
access data indicates that anyone can view the user's endorsements
and skills.
[0029] In an embodiment, some information in a user profile is
determined automatically by server system 130 (or another automatic
process). For example, a user specifies, in his/her profile, a name
of the user's employer. Server system 130 determines, based on the
name, where the employer and/or user is located. If the employer
has multiple offices, then a location of the user may be inferred
based on an IP address associated with the user when the user
registered with a social network service (e.g., provided by server
system 130) and/or when the user last logged onto the social
network service.
Determining Multiple Languages Associated With A Single Profile
[0030] A user profile comprises multiple (e.g., text) fields.
Example fields include job title, employer name, field of study,
name of an academic institution attended, name of a degree awarded
by the academic institution, an industry name (e.g., "Finance" or
"Law"), and headline (or moniker) possibly indicating a
characteristic of the user (e.g., "Code Ninja"). This set of
information is referred to herein as "top-card information." Each
field is associated with a name for the type of information that is
provided to that field. For example, a job title field is
associated with the name "job title." Thus, after a user enters a job
title in the job title field and submits that and other top-card
information, server system 130 stores the entered job title with a
field label or other metadata, such as "job title."
[0031] In an embodiment, a user profile comprises multiple
language-specific profiles, where each language-specific profile
comprises the same (or similar) top-card information, but in a
language that is different than each other language-specific
profile in the same user profile. For example, after a user fills
out top-card information in one language (which server system 130
may not know at the time the top-card information is provided by
the user), the user provides input (e.g., selecting a particular
graphical button) that indicates that the user desires to enter the
top-card information in another language. Server system 130 stores
each set of top-card information as a language-specific profile of
the user's profile.
[0032] Server system 130 may determine a language of each
language-specific profile in one of multiple ways. For example, a
user that provided the profile information manually specifies the
language of each language-specific profile. Such input may be
specified during registration with server system 130 or sometime
after registration, such as after a successful login attempt.
Specifying a language may comprise the user selecting a language
from a list of languages in a drop-down menu that is displayed
adjacent to user-provided profile information that is in one
language. Server system 130 then stores, in association with the
language-specific profile, data that identifies the user-selected
language.
[0033] Additionally or alternatively, server system 130
automatically determines the language of a language-specific
profile (i.e., without user input that identifies the language).
For example, for each of one or more fields of a language-specific
profile (e.g., a job title or employer name), server system 130
compares user-provided text of that field with a text corpus of
known languages (e.g., English, Spanish, French, German, and
Chinese). If the user-provided text matches text from a text corpus
of a particular language, then server system 130 stores, in
association with the language-specific profile, language
identification data that identifies the particular language. One
example of a text corpus is a set of user profiles where the
languages are known or have already been determined, whether
determined automatically or determined based on user input that
specifies the respective languages.
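As one possible illustration of the automatic determination described above, the following Python sketch picks the language whose corpus vocabulary shares the most tokens with the user-provided text. The function `detect_language`, the tiny `CORPORA` vocabularies, and the token-overlap criterion are assumptions made for this example; the description does not prescribe a particular matching technique, and whitespace tokenization would not suit a character-based language such as Chinese.

```python
from typing import Dict, Optional, Set


def detect_language(text: str, corpora: Dict[str, Set[str]]) -> Optional[str]:
    """Return the language whose vocabulary overlaps most with the text."""
    tokens = set(text.lower().split())
    best_lang, best_overlap = None, 0
    for lang, vocab in corpora.items():
        overlap = len(tokens & vocab)
        if overlap > best_overlap:
            best_lang, best_overlap = lang, overlap
    return best_lang  # None when no corpus matches any token


# Toy vocabularies standing in for text corpora of known languages.
CORPORA = {
    "en": {"software", "engineer", "sales", "manager"},
    "es": {"ingeniero", "de", "programas", "gerente", "ventas"},
}

language = detect_language("Software Engineer", CORPORA)
```

The returned code would then be stored as language identification data in association with the language-specific profile.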
Identifying Cognates
[0034] In an embodiment, cognate identifier 132 identifies a field
in a user's language-specific profile and identifies the
corresponding field in another of the user's language-specific
profiles. A set of data items associated with the corresponding
fields (e.g., job title fields) is considered a cognate. One data
item of a cognate is in one language (e.g., English) and another
data item of the cognate is in another language (e.g., Chinese).
Each data item comprises one or more terms, words, or phrases, or
one or more characters in a character-based language, such as
Chinese.
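The pairing of corresponding fields across language-specific profiles may be sketched, under stated assumptions, as follows. The `Cognate` tuple, the `extract_cognates` function, and the representation of a user profile as a mapping from language code to field values are hypothetical and serve only to illustrate the idea; the "employer" value "Acme" is likewise invented.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple


class Cognate(NamedTuple):
    field: str
    lang_a: str
    text_a: str
    lang_b: str
    text_b: str


def extract_cognates(profile: Dict[str, Dict[str, str]]) -> List[Cognate]:
    """profile maps a language code to {field name: user-provided text}."""
    cognates = []
    # Consider every pair of language-specific profiles of the same user.
    for lang_a, lang_b in combinations(sorted(profile), 2):
        fields_a, fields_b = profile[lang_a], profile[lang_b]
        # Only fields present in both languages can form a cognate.
        for field in sorted(fields_a.keys() & fields_b.keys()):
            cognates.append(
                Cognate(field, lang_a, fields_a[field], lang_b, fields_b[field])
            )
    return cognates


profile = {
    "en": {"job title": "Software Engineer", "employer": "Acme"},
    "es": {"job title": "la ingeniero de programas"},
}
cognates = extract_cognates(profile)
```

In this example only the job title field appears in both language-specific profiles, so a single cognate is produced.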
[0035] In an embodiment, cognate identifier 132 identifies a set of
cognates from a user's profile and stores the set of cognates in
association with the user or with the user's account. The set of
cognates of the user may be used to translate text provided by the
user and/or to train (along with sets of cognates from profiles of
other users) a machine translator.
[0036] In an embodiment, cognate identifier 132 identifies one or
more cognates from a user's profile and updates a cognate storage
that includes cognates identified from multiple users' profiles.
Cognates in the cognate storage may be stored in any number of
ways, including relationally or object relationally in a database,
as values in a key-value store, or as a list of data item pairs in
a file system. Cognates may be indexed for faster retrieval.
[0037] If a cognate from a first user does not already exist in the
cognate storage, then the cognate is added to the cognate storage.
If the cognate is already represented in the cognate storage, then
metadata of the cognate may be updated to increment a count that
indicates a number of times that the cognate appears in the cognate
storage (or in user profiles). As a specific example, if "Software
Engineer" and "la ingeniero de programas" are data items in the job
title field of a user's profile and that pair of data items has
been seen 20 times previously in other user profiles, then a count
associated with that cognate is incremented to become 21.
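The count update in the preceding example may be sketched as follows, assuming for illustration that the cognate storage is an in-memory `collections.Counter` keyed by a pair of data items. The helper `record_cognate` is a hypothetical name; a database or key-value store could serve the same role.

```python
from collections import Counter

# Cognate storage: (data item in language 1, data item in language 2) -> count.
cognate_counts = Counter()


def record_cognate(store: Counter, source: str, target: str) -> int:
    """Add the cognate if new, otherwise increment its count; return the count."""
    store[(source, target)] += 1
    return store[(source, target)]


pair = ("Software Engineer", "la ingeniero de programas")
cognate_counts[pair] = 20  # the pair was seen 20 times in other user profiles
new_count = record_cognate(cognate_counts, *pair)
```

A `Counter` treats a missing key as zero, so the same call handles both the new-cognate case and the already-represented case.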
[0038] As another example, if two data items are already
represented in a cognate, then metadata of the cognate may be
updated to modify a probability associated with each data item.
Each data item may be associated with a probability that, given
that data item, the other data item of the cognate will appear.
Thus, two probabilities may be stored in association with a single
cognate. As a specific example, a probability of seeing "la
ingeniero de programas" given "Software Engineer" (e.g., P(t|s))
may be updated based on this pair of data items being identified in
a user's profile. At the same time, a probability of seeing
"Software Engineer" given "la ingeniero de programas" (e.g.,
P(s|t)) may be updated based on this pair of data items being
identified in the user's profile.
[0039] If a probability of one cognate is updated in response to
analysis of a user profile, then the probability of another cognate
may be updated. For example, one cognate may be "Software Engineer"
and "la ingeniero de programas" and another cognate may be
"Programmer" and "la ingeniero de programas." If the pair of
strings "Software Engineer" and "la ingeniero de programas" is
identified in a user profile, then a probability associated with
the cognate "Software Engineer" and "la ingeniero de programas" may
increase and a probability associated with the cognate "Programmer"
and "la ingeniero de programas" may decrease.
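One way to obtain the probabilities discussed above is by relative frequency over cognate counts. This is an assumption made for illustration, since the description does not state how P(t|s) and P(s|t) are computed. In the following sketch the function names and the starting counts are hypothetical; it shows that observing one more "Software Engineer" pair lowers the probability of the competing "Programmer" cognate.

```python
from collections import Counter


def p_target_given_source(counts: Counter, s: str, t: str) -> float:
    """P(t|s): share of occurrences of s that are paired with t."""
    total = sum(c for (s2, _), c in counts.items() if s2 == s)
    return counts[(s, t)] / total if total else 0.0


def p_source_given_target(counts: Counter, s: str, t: str) -> float:
    """P(s|t): share of occurrences of t that are paired with s."""
    total = sum(c for (_, t2), c in counts.items() if t2 == t)
    return counts[(s, t)] / total if total else 0.0


T = "la ingeniero de programas"
counts = Counter({("Software Engineer", T): 3, ("Programmer", T): 1})

before = p_source_given_target(counts, "Programmer", T)
counts[("Software Engineer", T)] += 1  # pair identified in another user profile
after = p_source_given_target(counts, "Programmer", T)
```

Because both cognates share the same target string, they compete for the same probability mass, which is why an update to one affects the other.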
[0040] Cognate storage may be filtered to remove one or more
cognates that are not frequent or not likely to be good
translations of each other. For example, if a cognate appears fewer
than ten times in the cognate storage, then the cognate is removed
from the cognate storage. As another example, if P(t|s) and/or
P(s|t) of a cognate is below a particular threshold (e.g., 20%),
then the cognate is removed from the cognate storage. After the
filtering step, the cognate storage may be used in one or more ways
described herein, such as being used as training data, used to
translate queries, used to translate user profiles, and used to
identify content items for a user.
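The filtering step may be sketched as follows, using the example thresholds given above (a minimum count of ten and a minimum probability of 20%). The function `filter_cognates`, the metadata keys, and the sample statistics are hypothetical. This sketch removes a cognate when either probability falls below the threshold, which is only one reading of "and/or".

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def filter_cognates(
    stats: Dict[Pair, dict], min_count: int = 10, min_prob: float = 0.2
) -> Dict[Pair, dict]:
    """Keep cognates that are frequent enough and likely in both directions."""
    return {
        pair: meta
        for pair, meta in stats.items()
        if meta["count"] >= min_count
        and meta["p_t_given_s"] >= min_prob
        and meta["p_s_given_t"] >= min_prob
    }


T = "la ingeniero de programas"
stats = {
    ("Software Engineer", T): {"count": 21, "p_t_given_s": 0.7, "p_s_given_t": 0.8},
    ("Programmer", T): {"count": 12, "p_t_given_s": 0.5, "p_s_given_t": 0.15},
    ("Code Ninja", T): {"count": 2, "p_t_given_s": 0.9, "p_s_given_t": 0.05},
}
kept = filter_cognates(stats)
```

Here the second cognate fails the probability threshold and the third fails the count threshold, so only the first remains.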
[0041] FIG. 2 is a flow diagram that depicts a process 200 for
identifying cognates from multiple user profiles, in an
embodiment.
[0042] At block 210, a user profile is identified in a database of
user profiles. Each user profile includes structured information
that associates user-provided data with a context or information
type. For example, when a user creates a profile, each input field
is associated with a context, such as job title, field of study, or
skill. Some contexts may be unique in that no two input fields are
associated with the same context, such as current job title. Other
contexts may not be unique, such as academic degree and skill,
since a user may have earned multiple academic degrees and may have
acquired multiple skills. When the user-provided data is stored,
each data item is stored in association with a context.
[0043] At block 220, it is determined whether the user profile
includes multilingual information. If so, process 200 proceeds to
block 230; otherwise, process 200 proceeds to block 250.
[0044] At block 230, one or more multilingual cognates are
identified in the user profile.
[0045] At block 240, for each identified cognate, a set of
previously-identified cognates is updated. How the set is updated
depends on whether an identified cognate is new or not. If the
identified cognate is new, then that cognate is added to the set.
If the identified cognate already exists in the set, then metadata
associated with that cognate in the set is updated to reflect the
occurrence of the cognate in the database of user profiles. The
metadata may be a count or a probability, as described
previously.
[0046] At block 250, it is determined whether there are more user
profiles in the database that need to be analyzed. If so, process 200
proceeds to block 210. Otherwise, process 200 ends.
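Process 200 may be sketched end to end as follows. This is a minimal illustration under stated assumptions: the function `mine_profiles`, the `Profile` type alias, and the choice of a `Counter` as the set of previously-identified cognates are hypothetical, and only a count (not a probability) is maintained as metadata.

```python
from collections import Counter
from typing import Dict, Iterable

# A profile maps a language code to {context (field name): user-provided text}.
Profile = Dict[str, Dict[str, str]]


def mine_profiles(profiles: Iterable[Profile], src: str, tgt: str) -> Counter:
    """Build a set of (src text, tgt text) cognates with occurrence counts."""
    cognate_set = Counter()
    for profile in profiles:  # blocks 210 and 250: visit each profile in turn
        # Block 220: skip profiles lacking either language.
        if src not in profile or tgt not in profile:
            continue
        # Block 230: fields with the same context in both languages are cognates.
        for field in profile[src].keys() & profile[tgt].keys():
            pair = (profile[src][field], profile[tgt][field])
            cognate_set[pair] += 1  # block 240: add new cognate or update count
    return cognate_set


profiles = [
    {"en": {"job title": "Software Engineer"},
     "es": {"job title": "la ingeniero de programas"}},
    {"en": {"job title": "Software Engineer"},
     "es": {"job title": "la ingeniero de programas"}},
    {"en": {"job title": "Sales Manager"}},  # no multilingual information
]
mined = mine_profiles(profiles, "en", "es")
```

The third profile contains a single language and therefore contributes no cognates.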
Using Cognates as Training Data for Machine Translation
[0047] In an embodiment, a set of cognates (and associated
metadata, such as count information and probability information)
identified based on multiple user profiles are used to train a
translation model. Any technique for training a translation model
may be used. Machine translation is a sub-field of computational
linguistics that investigates the use of software to translate text
or speech from one language to another. On a basic level, machine
translation performs simple substitution of words in one language
for words in another language, but that alone usually cannot
produce a good translation of text because recognition of whole
phrases and their closest counterparts in the target language is
needed.
[0048] Statistical machine translation, a sub-field of machine
translation, generates translations using statistical methods based
on a bilingual (or multilingual) text corpus. In statistical
machine translation, a document is translated according to the
probability distribution p(e|f) that a string e in a target
language (e.g., English) is the translation of a string f in the
source language (e.g., French). The problem of modeling the
probability distribution p(e|f) may be approached in a number of
ways. In one approach, Bayes' theorem is applied, that is, p(e|f) ∝
p(f|e)p(e), where the translation model p(f|e) is the probability
that the source string is the translation of the target string, and
the language model p(e) is the probability of seeing that target
language string. This decomposition is advantageous as it splits
the problem into two subproblems. Finding the best translation
ẽ is done by picking the one that gives the
highest probability:

\[ \tilde{e} = \arg\max_{e \in e^*} p(e \mid f) = \arg\max_{e \in e^*} p(f \mid e)\, p(e) \]
[0049] For a rigorous implementation of this formula, an exhaustive
search would have to be performed by going through all strings e*
in the native language. A machine translation decoder that uses the
foreign string, heuristics, and other methods to limit the search
space may be used to perform the search efficiently and at the same
time keep acceptable quality. If the language model p(e) is limited
to a certain domain (or type of information), such as user profiles
and/or job information, then the size of e* may not be substantial
enough to slow down the translation process.
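The selection rule above can be illustrated with a small sketch that scores a short candidate list instead of searching all of e*. The candidate list stands in for whatever pruned search space a decoder would produce; the dictionaries `tm` and `lm`, the probability floor of 1e-12, and the sample values are all assumptions made for this example.

```python
import math

def best_translation(f, candidates, tm, lm):
    """Return the candidate e that maximizes log p(f|e) + log p(e).

    tm maps (f, e) to p(f|e); lm maps e to p(e). Unseen events get a
    small floor probability so that the logarithm is defined.
    """
    def score(e):
        return math.log(tm.get((f, e), 1e-12)) + math.log(lm.get(e, 1e-12))
    return max(candidates, key=score)

# Made-up probabilities for illustration only.
tm = {("ingeniero", "engineer"): 0.7, ("ingeniero", "engine"): 0.2}
lm = {"engineer": 0.01, "engine": 0.02}
best = best_translation("ingeniero", ["engineer", "engine"], tm, lm)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which is why the product p(f|e)p(e) is computed as a sum of logarithms.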
[0050] In an embodiment, server system 130 supports multiple
translation models, each associated with a different pair of
languages. For example, one translation model is for translating
English to Spanish and vice versa, another translation model is for
translating English to Chinese and vice versa, and another
translation model is for translating French to German and vice
versa.
[0051] In a related embodiment, server system 130 supports multiple
translation models, each associated with a different information
type. For example, one translation model is for translating job
titles and another translation model is for translating employer
names. Each of the translation models may be associated with the
same pair of languages (e.g., English and Chinese). Thus, if the
context or information type of source text is known, then the
appropriate translation model is identified. Additionally or
alternatively, a single translation model may be based on multiple
information types.
[0052] In an embodiment, a set of cognates is used to perform
dictionary lookups during the translation process. To be included
in the set of cognates, a cognate may have to satisfy one or more
criteria. For example, only high-frequency cognates are used in
dictionary lookups. A high-frequency cognate is one that appears in
user profiles a threshold number of times, such as five hundred.
Additionally or alternatively, only high-confidence cognates are
used in dictionary lookups. In one approach, a high-confidence
cognate is determined based on co-occurrence counts using
chi-squared statistics based on a 2×2 contingency table,
which measure how much the observed counts deviate from the counts
expected if the occurrences of the two cognate items were
statistically independent. Another approach to determining a
high-confidence cognate involves selecting thresholds on a log-linear
combined score of both p(e|f) and p(f|e) (where e and f are cognate
pair candidates): score = w1 × log p(f|e) + w2 × log p(e|f),
where w1 and w2 are parameters chosen to maximize the accuracy.
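Both confidence measures can be sketched in a few lines. The chi-squared function below uses the standard closed form for a 2×2 table; the cell layout (a, b, c, d) and the default weights of 0.5 are assumptions chosen for this example, not values given in the description.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for a 2x2 contingency table.

    a: profiles containing both e and f
    b: profiles containing e but not f
    c: profiles containing f but not e
    d: profiles containing neither
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence
    return n * (a * d - b * c) ** 2 / denom

def log_linear_score(p_f_given_e, p_e_given_f, w1=0.5, w2=0.5):
    """score = w1 * log p(f|e) + w2 * log p(e|f)."""
    return w1 * math.log(p_f_given_e) + w2 * math.log(p_e_given_f)
```

A statistic of zero means the two items co-occur exactly as often as independence would predict; larger values indicate a stronger association. A candidate pair would then be kept only if its statistic or score exceeds a chosen threshold.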
[0053] In an embodiment, sets of bilingual cognates (or
translation models) associated with different sets of languages are
used to translate between a source language and a target language
for which no cognate exists or for which current cognates have low
count or confidence. For example, one set of cognates (or one
translation model) may be available for English-Spanish translation
and another set of cognates (or a separate translation model) may be
available for English-Chinese translation. However, only
relatively few, low-confidence cognates (or translation model(s))
exist for Spanish-Chinese translation. Nevertheless,
English-Spanish cognates (or an English-Spanish translation model) and
English-Chinese cognates (or an English-Chinese translation model) may
be used to translate Spanish text into Chinese text by first
translating the Spanish text into English text (using, for example,
the English-Spanish cognates) and then translating the English text
into Chinese text (using, for example, the English-Chinese cognates).
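A minimal sketch of this two-step translation through English follows. It assumes each cognate set has been flattened into a plain dictionary from source phrase to target phrase; the function name and the sample entries are hypothetical.

```python
def pivot_translate(text, src_to_pivot, pivot_to_tgt):
    """Translate via an intermediate language, or return None on a miss."""
    pivot = src_to_pivot.get(text)
    if pivot is None:
        return None  # no source-to-pivot cognate for this text
    return pivot_to_tgt.get(pivot)  # None if the second hop is missing

# Made-up cognate dictionaries for illustration only.
es_en = {"ingeniero de software": "software engineer"}
en_zh = {"software engineer": "软件工程师"}
zh = pivot_translate("ingeniero de software", es_en, en_zh)
```

If counts or probabilities are attached to the cognates, the confidence of the composed translation could be estimated by combining the confidences of the two hops.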
Acronyms, Abbreviations, and Misspellings
[0054] Because user profiles are based on user input, different
users have different preferences, and human error is not uncommon,
some user profiles may contain acronyms, abbreviations, and/or
misspellings. For example, a user may type in "Assistent Professor"
in a job title field of the user's profile, instead of "Assistant
Professor." The user also provides a corresponding set of Chinese
characters.
[0055] In an embodiment, a translation model translates misspelled
words, acronyms (e.g., "IT" from "information technology"), and
abbreviations (e.g., "prof" from "professor") in source text to a
target language. Such a translation is possible if such misspelled
words, acronyms, and abbreviations are sufficiently observed in the
training corpuses comprising multiple user profiles. If E and E'
both occurred with Chinese character sequence C, then E and E' will
be connected with a probability of

\[ p(E \mid E') = \sum_{C} p(E \mid C)\, p(C \mid E'), \]

where p(E|C) and p(C|E') are relative frequencies learnt from data
mined using techniques described herein. Together with other
metrics, such as edit distance between E and E', spelling errors,
acronyms, and abbreviations may be identified. For example, because
the edit distance between "Assistent Professor" and "Assistant
Professor" is one (i.e., only one character needs to change to make
both text strings identical), it is more likely that one of the two
text strings is a misspelling of the other text string.
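The two signals described above, the probability that connects E and E' through shared Chinese sequences and the edit distance between them, can be sketched as follows. The dictionary layouts (keys `(E, C)` and `(C, E')`) and the sample frequencies are assumptions made for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def variant_probability(e, e_prime, p_e_given_c, p_c_given_e):
    """p(E|E') = sum over C of p(E|C) * p(C|E')."""
    total = 0.0
    for (c, source), p_c in p_c_given_e.items():
        if source == e_prime:
            total += p_e_given_c.get((e, c), 0.0) * p_c
    return total

# Made-up relative frequencies for illustration only.
p_e_given_c = {("Assistant Professor", "助理教授"): 0.8,
               ("Assistent Professor", "助理教授"): 0.2}
p_c_given_e = {("助理教授", "Assistent Professor"): 1.0}
prob = variant_probability("Assistant Professor", "Assistent Professor",
                           p_e_given_c, p_c_given_e)
distance = edit_distance("Assistent Professor", "Assistant Professor")
```

A pair with a high connection probability and a small edit distance is a plausible misspelling candidate, while a high probability with a large edit distance is more suggestive of an acronym or abbreviation.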
Using Cognates to Translate User-Provided Text
[0056] In an embodiment, multilingual cognates identified from a
user's profile are used to translate text provided by the user. The
text is in one of the two languages associated with the
multilingual cognates. For example, if multilingual cognates of a
user's profile are English-Spanish cognates, then the text provided
by the user may be in English or in Spanish. If in English, then
the cognates are used to translate the text into Spanish. If in
Spanish, then the cognates are used to translate the text into
English.
[0057] Example text provided by a user includes a summary of the
user. A summary may be in any format, such as a list of words or
phrases or a set of one or more fully structured and grammatically
correct sentences. A summary may be considered part of a user's
profile but is separate from the top-card information from which
the multilingual cognates are identified. Examples of summaries
include a profile summary and a job summary. A profile summary may
indicate what the user has accomplished and skills that the user
has acquired through the user's career in industry and/or academia.
A job summary may describe specific duties, responsibilities,
and/or accomplishments that the user has had at a particular job.
If a user has held multiple job positions, then the user may
provide multiple job summaries, one for each job position. Each of
multiple job positions may be with the same employer.
Alternatively, some job positions (and, thus, job summaries) may be
associated with different employers.
[0058] While many users provide multilingual top-card information,
only a subset of those users might provide multilingual summaries.
Thus, there may be many users that provide multilingual top-card
information but that provide a profile summary and/or one or more
job summaries in a single language. Because a profile or job
summary provided by a user is semantically related to the top-card
information provided by the user, the user is likely to use the
same words and phrases (e.g., job title, employer name, skills) in
the top-card information and in other portions of the user's
profile. This aspect of user profile creation is leveraged to
assist in translating user-provided text from one language to
another.
[0059] FIG. 3 is a flow diagram that depicts a process 300 for
translating user-provided text, in an embodiment. Process 300 may
be performed by one or more computing elements of server system
130.
[0060] At block 310, a user who has provided multilingual
top-card information is identified. Prior to process 300, server
system 130 may have analyzed multiple (e.g., all) users or members
of server system 130 to determine which users have provided
multiple language-specific profiles. Thus, process 300 may involve
iterating through each of those users.
[0061] At block 320, it is determined whether the user has provided
text (e.g., a profile summary or a job summary) in a single
language. If so, then process 300 proceeds to block 330. The single
language is referred to as the "source language" and the language
that text in the source language is to be translated to is referred
to as the "target language."
[0062] Block 320 may involve determining a portion of the user's
profile that comprises free text and that corresponds to a summary.
Then, the language of that portion is determined using, for
example, one or more of the techniques described herein.
[0063] At block 330, a set of one or more cognates from the
multilingual top-card information is identified. In particular, the
data item of each cognate that corresponds to the language of the
text is identified. For example, if the language of the text is
English, then the data item (of each cognate) that is in English is
identified.
[0064] The user that provided the multilingual top-card information
is the same user who provided the text that is to be translated.
Alternatively, the set of cognates is based on multilingual
top-card information from multiple users, none of which may be the
same as the user who provided the text that is to be
translated.
[0065] At block 340, the text is analyzed to identify text portions
that match one or more data items in one or more cognates. For
example, if the text is in English, then one or more English
words/phrases in the cognates are identified in the text. A text
portion that matches a data item of a cognate is referred to as a
"pivot point." Thus, for example, a body of text may include
multiple pivot points, one for job title, another for employer
name, another for job skill, etc.
[0066] At block 350, the one or more cognates are used to translate
the identified text portions or pivot points. Given the previous
example where English words/phrases in one or more cognates are
identified in English text, those English words/phrases are
translated into Spanish based on the one or more cognates. For
example, an English job title is translated into a Spanish job
title by identifying the other data item of the cognate.
[0067] At block 360, one or more translation models are used to
translate the non-translated portions of the text. For example, if
the text is in English and some text strings were translated into
Spanish using the cognates in block 350, then block 360 involves
translating other English text into Spanish. Because there is a
high confidence in the cognates, the other portions of the
translated text should have a relatively high degree of accuracy,
at least relative to translations that do not involve the use of
cognates.
[0068] For example, a user provides the following text of a job
summary: "I have been a software engineer for seven years at XYZ
company" and there is no corresponding Spanish translation provided
by the user. Using cognates identified from multilingual top-card
information from the user, "software engineer" is translated into
"la ingeniero de programas" and "XYZ company" is translated into
"la compania XYZ." Then a translation model is used to translate "I
have been a" into "He habido un." The same or different translation
model is used to translate "for seven years at" into "para siete
anos a." The translation model(s) that are used to translate the
non-translated portions in block 360 may take into account the
results from the translation using the one or more cognates of
block 350. Thus, because it is known that the phrase "la ingeniero
de programas" is immediately preceded by "I have been a," a
translation model is more likely to perform a proper
translation.
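Blocks 340 through 360 can be sketched as a split-and-substitute routine. This is only an illustration under stated assumptions: cognates are held in a dictionary from source phrase to target phrase, matching is exact and case-sensitive, and `fallback` is a hypothetical callable standing in for the translation model of block 360. The placeholder fallback below simply wraps its input in angle brackets so the segments are visible.

```python
import re

def translate_with_pivots(text, cognates, fallback):
    """Translate pivot points with cognates and the rest with fallback."""
    if not cognates:
        return fallback(text)
    # Longest phrases first so that overlapping cognates prefer the
    # longer match.
    phrases = sorted(cognates, key=len, reverse=True)
    pattern = "|".join(re.escape(p) for p in phrases)
    # A capturing group makes re.split keep the matched pivot points.
    pieces = re.split(f"({pattern})", text)
    out = []
    for piece in pieces:
        if piece in cognates:          # block 350: pivot point
            out.append(cognates[piece])
        elif piece.strip():            # block 360: remaining text
            out.append(fallback(piece.strip()))
    return " ".join(out)

cognates = {"software engineer": "la ingeniero de programas",
            "XYZ company": "la compania XYZ"}
text = "I have been a software engineer for seven years at XYZ company"
result = translate_with_pivots(text, cognates, lambda s: f"<{s}>")
```

A real fallback could also receive the neighboring pivot translation as context, which is the point made above about the translation model taking the cognate results into account.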
[0069] In an embodiment, multiple translation models are used to
translate the same body of text, the same summary (e.g., profile
summary), or even the same sentence. Each translation model is
generated based on a different information (or context) type. For
example, one translation model is generated based on job title,
another translation model is generated based on job skill, and
another translation model is generated based on employer name.
Depending on the type of pivot point (e.g., job title, job skill,
or employer name), the corresponding translation model is used to
translate non-translated text adjacent to the pivot point, such as
text that immediately precedes the pivot point and/or text that
immediately succeeds the pivot point. Given the example above, a
job title translation model may be used to translate the text
portion "I have been a," which immediately precedes a specific job
title and an employer name translation model may be used to
translate the text portion "for seven years at," which immediately
precedes the specific employer name.
[0070] In an embodiment, translated text from a user's profile
(such as a profile summary or a job summary) is stored in
association with the user's profile. The user is then allowed to
view the translation and make any changes to the translation and
have the changes persisted with the profile. In an embodiment,
server system 130 generates notification data that is used to
notify a user that a translation of certain text of the user is
available. The notification data may be a flag, graphics, or text
that is displayed on a profile page (or another page) when the user
logs in to server system 130 or an online social network service
provided by server system 130. Additionally or alternatively, the
notification data may be sent to a device of the user through
email, instant message, text message, or a mobile application
associated with the user.
[0071] A user may provide input that allows a translation of a
profile (whether it be of top-card information and/or one or more
summaries) to be viewable by other users. Alternatively, other
users (such as connections or friends of a user) may be able to
view a translated profile or portion thereof without requiring
input from the user.
[0072] If a first user is allowed to view a translated version of a
second user's profile, then, in an embodiment, server system 130
determines a language associated with the first user and, based on
the language, determines which of multiple translations of the
second user's profile to display to the first user. In that way,
the first user, at the time the second user's profile is requested,
is not required to provide further input that indicates or
specifies a particular language. The determination may be based on
the language of the first user's profile or a user preference that
the first user previously set when interacting with server system
130. Alternatively, the first user is provided a selectable option
(e.g., a graphical button) that is displayed adjacent to the second
user's profile that is in one language, where the selectable option
indicates another language in which the profile may be viewed.
Aligning Multilingual Text to Generate Training Data
[0073] In some cases, a user provides not only a translation of
top-card information but also of other non-structured information,
such as free form text. An example of free form text is a personal
summary. For example, a user may provide a summary of his/her work
history in English and a corresponding summary of his/her work
history in Spanish. Thus, there may be a one-to-one correlation in
the sentences of the English summary and the Spanish summary.
Sentence Level Alignment
[0074] In an embodiment, multilingual cognates are used to identify
which text portions (e.g., sentences or phrases) in one language are
to be correlated with which text portions in the other language.
For example, there are N sentences in English and M sentences in
Spanish. The English portion of a job title cognate is used to
identify one or more English sentences. The Spanish portion of the
job title cognate is used to identify one or more Spanish
sentences. If an English portion of a cognate is found in a
particular English sentence and a Spanish portion of the same
cognate is found in a particular Spanish sentence, then the
sentences are correlated. Thereafter, with or without additional
analysis, the English sentence and the Spanish sentence may be
treated as translations of each other and used to train a
translation model.
[0075] In this example, the multilingual cognates that are used to
align text in one language with text in another language originate
from a profile of the user that provided the text.
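The correlation step can be sketched as a nested search for sentences that contain the two halves of the same cognate. The substring test, the function name, and the sample sentences are assumptions made for this example; a production system would likely tokenize and normalize text first.

```python
def align_by_cognates(src_sentences, tgt_sentences, cognates):
    """Return (source index, target index) pairs that share a cognate."""
    pairs = []
    for src_term, tgt_term in cognates:
        for i, src in enumerate(src_sentences):
            if src_term not in src:
                continue
            for j, tgt in enumerate(tgt_sentences):
                if tgt_term in tgt and (i, j) not in pairs:
                    pairs.append((i, j))
    return pairs

# Made-up sample data for illustration only.
english = ["I am a software engineer.", "I like hiking."]
spanish = ["Me gusta caminar.", "Soy ingeniero de software."]
cognates = [("software engineer", "ingeniero de software")]
aligned = align_by_cognates(english, spanish, cognates)
```

Sentences that never appear in the returned pairs are the uncorrelated ones and would simply be left out of the training data.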
[0076] A text portion (e.g., sentence or phrase) within a body of
text may be identified in one or more ways, such as punctuation
(e.g., periods, commas, spaces), carriage returns, newline
characters, and new paragraph characters. For example, two periods
in a summary may indicate a starting of a sentence and an ending of
the sentence. As another example, each bullet in a list of bulleted
textual portions is treated as a phrase or sentence.
[0077] In some situations, it is determined that one or more
sentences or phrases in one text portion (in a first language) are
not correlated with any sentence or phrase in another text portion
(in a second language). For example, if there are N English
sentences and M Spanish sentences, then one of the N English
sentences may not correlate with any of the M Spanish sentences.
Additionally, at the same time, one of the M Spanish sentences may
not correlate with any of the N English sentences. Thus, one or
more English sentences and/or one or more Spanish sentences may not
be used as training data for the translation model.
[0078] In some situations, it is determined that one
sentence/phrase in one language is correlated to multiple
sentences/phrases in another language. For example, an English job
title of a user is found in two sentences in the user's work
summary while only a single Chinese sentence includes a Chinese job
title of the user. In such situations, the multiple
sentences/phrases may be concatenated (e.g., with a space) and
treated as a single sentence or phrase.
[0079] Additionally or alternatively to using cognates to correlate
sentences or phrases from different languages, non-cognate
information is used to correlate sentences or phrases from
different languages. An example of non-cognate information is date
information (e.g., "1998-2013"), which is typically identical in
multiple languages, particularly year information. Another example
of non-cognate information includes proper names, such as employer
names or academic university names (e.g., "U.C. Berkeley"), which
are typically used in the native language and foreign languages.
Some users do not translate proper names into one or more target
languages; instead, they retain the proper names in the native
language. Thus, some users provide multilingual sentences where a
relatively small subset of a multilingual sentence is in one
language (e.g., English) and the remainder of the multilingual
sentence is in another language (e.g., Chinese).
[0080] In an embodiment, correlating text portions (e.g., sentences
or phrases) of different languages involves taking into account
locality of the respective text portions. For example, only the
first three sentences in a Chinese version of a user's work summary
are considered as candidates to be correlated with the first
sentence in an English version of the user's work summary.
Similarly, the first four sentences in the Chinese version of the
user's work summary are considered as candidates to be correlated
with the second sentence of the English version. This is a type of
greedy approach in order to limit the space of possible
alignments.
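One reading of the locality constraint in the example above is a window whose upper bound grows with the position of the source sentence. The sketch below follows that reading; the `slack` parameter and its default of 2 are assumptions chosen to reproduce the "first three" and "first four" figures in the example, not values stated as a rule.

```python
def candidate_indices(src_index, tgt_count, slack=2):
    """Target sentence indices considered for source sentence src_index."""
    upper = min(tgt_count, src_index + 1 + slack)
    return range(0, upper)

first = list(candidate_indices(0, 10))   # candidates for sentence 1
second = list(candidate_indices(1, 10))  # candidates for sentence 2
```

Other window shapes, such as one centered on the source position, would limit the search space in a similar way.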
Word Level Alignment
[0081] Once two text portions are correlated based on one or more
multilingual cognates, a comparison of the two text portions is
performed to determine whether to train a translation model using
the two text portions. Comparing two text portions may involve
tokenizing each text portion to create multiple tokens of each text
portion, aligning tokens from one text portion with tokens from the
other text portion, and comparing aligned tokens. For example,
English sentence E1 is correlated with Chinese sentence C1. E1 is
tokenized into token string "E1w_0 . . . E1w_i" and C1 is
tokenized into token string "C1w_0 . . . C1w_j." Multiple
bilingual token pairs (e.g., E1w_g ↔ C1w_h) are
identified and individually checked to determine if each bilingual
token pair appears together in other instances.
[0082] For example, a generic translation model is used to
determine whether a bilingual token pair (which comprises a token
from one text portion and a token from the other text portion) has
appeared together in a training set. Each bilingual token pair may
be individually scored and multiple bilingual token pair scores may
be combined into a single score to determine whether the bilingual
pair of text portions should be used as training data. A bilingual
token pair score may be a binary 1 or 0 or may be on a continuum,
such as any value between 0 and 1. For example, a bilingual token
pair that has appeared together ten times according to a
translation model may have a token pair score of 0.4 while a
bilingual token pair that has appeared together fifty times
according to the translation model may have a token pair score of
0.98. A combined score for a bilingual pair of text portions may be
an average (or some other aggregate) of the corresponding multiple
token pair scores.
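The scoring scheme can be sketched with two small functions. The description does not say how a co-occurrence count is mapped to a score, so the saturating exponential below and its `scale` parameter are purely illustrative assumptions and do not reproduce the 0.4 and 0.98 figures in the example; only the averaging step comes directly from the text.

```python
import math

def token_pair_score(cooccurrence_count, scale=20.0):
    """Map a co-occurrence count to a score in [0, 1).

    Assumed mapping: the score rises with the count and saturates
    toward 1 for frequently co-occurring pairs.
    """
    return 1.0 - math.exp(-cooccurrence_count / scale)

def combined_score(pair_scores):
    """Average the token pair scores; an empty list scores 0."""
    return sum(pair_scores) / len(pair_scores) if pair_scores else 0.0

average = combined_score([0.4, 0.98])
```

Any other aggregate, such as a minimum or a weighted mean, could replace the average without changing the surrounding logic.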
[0083] One factor in scoring a bilingual token pair may be the
relative distance each token in the token pair is away from the
beginning, end, or middle of the corresponding text portion (e.g.,
sentence). For example, all things being equal, token pair
E1w_0 ↔ C1w_0 will have a higher token pair score
than token pair E1w_0 ↔ C1w_1 since both
E1w_0 and C1w_0 appear at the beginning of their respective
text portions (e.g., sentences). As another example, all things
being equal, token pair E1w_i ↔ C1w_j will have a
higher token pair score than token pair
E1w_i ↔ C1w_(j-2) since both E1w_i and
C1w_j appear at the end of their respective text portions
(e.g., sentences).
[0084] In a related embodiment, some possible bilingual token pairs
are not considered if the respective tokens do not appear within a
threshold distance of each other when the corresponding text
portions are aligned. For example, E1w_0 and C1w_j may not
be considered since E1w_0 appears at the beginning of its
sentence and C1w_j appears at the end of its sentence.
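A distance threshold of this kind can be sketched by comparing normalized token positions. Measuring distance as the difference between relative positions (0 at the start of a sentence, 1 at the end) is an assumption made here so that sentences of different lengths can be compared; the `max_gap` default of 0.5 is likewise illustrative.

```python
def relative_position(index, length):
    """Position of a token in its sentence, from 0.0 (start) to 1.0 (end)."""
    return index / (length - 1) if length > 1 else 0.0

def candidate_token_pairs(src_len, tgt_len, max_gap=0.5):
    """Token index pairs whose relative positions are within max_gap."""
    return [
        (g, h)
        for g in range(src_len)
        for h in range(tgt_len)
        if abs(relative_position(g, src_len)
               - relative_position(h, tgt_len)) <= max_gap
    ]

pairs = candidate_token_pairs(5, 7)
```

The same relative-position difference could also feed into the token pair score, so that pairs at matching positions score higher than pairs that are far apart.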
[0085] The higher the combined score of a bilingual pair of text
portions, the more likely the pair of text portions will be used to
train a translation model. For example, a threshold score may be
established (e.g., manually) such that if a combined score of a
bilingual pair is above the threshold score, then the bilingual
pair is used to train a translation model. The higher the threshold
score, the higher the quality (as training data) the bilingual
pairs should be. On the other hand, fewer bilingual pairs will be
used as training data. Combined scores of multiple bilingual pairs
may be analyzed manually to determine a threshold score such that
as many acceptable quality bilingual pairs as possible are used as
training data.
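The threshold test itself is a simple filter. The sketch below assumes scored pairs are held as `(pair, score)` tuples; the function name and the sample scores are hypothetical.

```python
def select_training_pairs(scored_pairs, threshold):
    """Keep the bilingual pairs whose combined score exceeds threshold."""
    return [pair for pair, score in scored_pairs if score > threshold]

# Made-up scores for illustration only.
scored = [(("E1", "C1"), 0.9), (("E2", "C2"), 0.5), (("E3", "C3"), 0.7)]
lenient = select_training_pairs(scored, 0.6)
strict = select_training_pairs(scored, 0.8)
```

Running the filter at several thresholds and inspecting what is kept at each one is one way to carry out the manual analysis described above.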
[0086] In an embodiment, sets of text provided by different users
are used to train the same translation model. For example, in
addition to a user profile, a first user provides, to a social
network service, a work summary in both English and Chinese. One or
more cognates identified from the user profile of the first user
are used to align the English version of the work summary with the
Chinese version of the work summary. The aligned phrases or
sentences are used to train a particular translation model. A
second user provides, to the social network service, a user profile
and a work summary in both English and Chinese. One or more
cognates identified from the second user's profile are used to
align the English version of the work summary with the Chinese
version of the work summary. The aligned phrases or sentences
provided by the second user are used to train the particular
translation model.
Example Process
[0087] FIG. 4 is a flow diagram that depicts a process 400 for
aligning multilingual bodies of text, in an embodiment. Process 400
may be implemented by one or more computing components of server
system 130.
[0088] At block 410, one or more multilingual cognates are
identified. The multilingual cognates may have been identified from
a profile of a user, where the profile includes top-card
information or any other type of structured information about the
user.
[0089] At block 420, the one or more multilingual cognates are used
to correlate, associate, or align at least a first text portion in
a first body of text with a second text portion in a second body of
text. Each text portion may be a sentence or phrase that comprises
multiple words or characters. The two bodies of text are provided
by the user and are in the languages of the one or more
multilingual cognates. For example, if the multilingual cognates
are in English and Chinese, then one of the bodies of text is in
English and the other body of text is in Chinese. As noted
previously, in an embodiment, the bodies of text are work summaries
or job summaries.
[0090] The initial iteration of block 420 may involve dividing each
body of text into sentences and/or phrases and identifying an
initial match of two text portions based on one or more
multilingual cognates.
[0091] At block 430, the first text portion is compared to the
second text portion. Block 430 may involve aligning tokens in the
first text portion with tokens in the second text portion and

generating one or more scores that reflect how well the tokens in
the respective text portions align with each other.
[0092] At block 440, based on the comparison, it is determined
whether the first text portion and the second text portion should
be used to train a translation model. If so, then process 400
proceeds to block 450 where the first text portion is stored in
association with the second text portion and that association
becomes part of training data for the translation model. Otherwise,
process 400 proceeds to block 460.
[0093] At block 460, it is determined whether there are any more
text portion pairs from the respective bodies of text to analyze.
For example, the first iteration of block 430 may have involved
STP1 ("source text portion 1") with TTP1 ("target text portion 1").
If block 440 was negative, then a second iteration of block 430 may
involve comparing STP1 with TTP2 or comparing STP2 with TTP1. If
an iteration of block 440 is affirmative, then both text portions
that were involved in the corresponding comparison may be removed
as options for further comparison during performance of process
400. For example, if it was determined that STP1 and TTP1 should be
used as training data, then neither STP1 nor TTP1 is compared to
another text portion for the remainder of process 400. In this way,
the number of irrelevant or non-useful comparisons may be
reduced.
[0094] If the determination in block 460 was in the affirmative,
then process 400 proceeds to block 420 where a new set of text
portions is identified. Otherwise, process 400 ends.
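The control flow of process 400 can be sketched as a greedy loop that retires both text portions once a pair is accepted. The candidate ordering, the `score_fn` callable, and the sample scores are assumptions made for this example; `score_fn` stands in for the token-level comparison of block 430.

```python
def build_training_pairs(candidate_pairs, score_fn, threshold):
    """Greedily accept text portion pairs, using each portion at most once."""
    used_src, used_tgt, training = set(), set(), []
    for src, tgt in candidate_pairs:            # blocks 420 and 460
        if src in used_src or tgt in used_tgt:
            continue                            # already paired earlier
        if score_fn(src, tgt) > threshold:      # blocks 430 and 440
            training.append((src, tgt))         # block 450
            used_src.add(src)
            used_tgt.add(tgt)
    return training

# Made-up scores for illustration only.
scores = {("STP1", "TTP1"): 0.9, ("STP1", "TTP2"): 0.8,
          ("STP2", "TTP1"): 0.7, ("STP2", "TTP2"): 0.95}
candidates = list(scores)
training = build_training_pairs(candidates,
                                lambda s, t: scores[(s, t)], 0.75)
```

Because accepted portions are skipped on later iterations, the number of comparisons shrinks as pairs are found, which is the reduction in non-useful comparisons noted above.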
Multilingual Query Expansion
[0095] In some scenarios, a user would like to submit a query to
identify certain information, not knowing that relevant information
may be available in another language. For example, a user may be
interested in changing job positions but a job posting for a
candidate job position is in a language that is different than the
language of the query submitted by the user. Without translating
the query, the user may never discover multiple job
opportunities.
[0096] In an embodiment, multilingual cognates are identified from
multiple user profiles and stored in a set of multilingual
cognates. Techniques for identifying multilingual cognates
described herein may be used. Because different users translate the
same word or phrase differently, there may be multiple cognates for
the same word or phrase. For example, one user may translate
"software engineer" into one phrase in Spanish (e.g., "la
ingenieria de programas") and another user may translate "software
engineer" into another phrase in Spanish. Similarly, one user may
translate "la ingenieria de programas" into "software engineer" and
another user may translate "la ingenieria de programas" into
"programmer."
[0097] A multilingual cognate may be associated with a count that
indicates a number of times that cognate appears in a data set of
user profiles. Additionally or alternatively, a multilingual
cognate may be associated with a probability that, given one
word/phrase of the cognate, the other word/phrase of the cognate
appears in the data set.
[0098] Each cognate may also be associated with a type label that
indicates an information type of the cognate, such as job title,
job skill, employer name, academic institution,
academic degree, or geographical name. The type label
may be used in query processing to limit the number of cognates to
consider (or identify a translation model) when translating a query
term/phrase. Some cognates may be associated with one information
type while other cognates may be associated with another
information type.
[0099] Subsequently, a user (who may be different from the user
associated with any of the user profiles from which the cognates
were identified) submits a query of one or more words. Each of the
one or more words may be associated with a specific context that
indicates an information type. A query engine that accepts the
query as input identifies one or more cognates that are associated
with one or more words. Each cognate may be associated with an
information type. If query text is associated with an information
type, then only a strict subset of all the cognates may need to be
considered to determine a translation of the query text.
[0100] For example, a user submits the query "sales representative"
with a job title information type selected. A query engine accepts
the query and the selected information type and identifies, among
cognates that are associated with the selected information type and
that have "sales representative" as one member of the pair, one or more
possible translations. In this way, the selected information type
is used to limit the number of cognates that need to be searched.
If multiple translations are identified, then each may be in the
same target language (e.g., Chinese). Alternatively, if multiple
translations are identified, then one or more may be in one target
language (e.g., Chinese) and one or more may be in another target
language (e.g., Spanish). Regardless of the number of translations
that are identified based on a query, each translation is used as
part of a search. For example, the search may involve comparing a
query translation to one or more job postings that are in the same
language as the query translation. Specifically, the comparison may
be comparing a query translation of a job title to job titles in
job postings that are in the same language as the query
translation. The job postings may be provided by employers to
server system 130. Additionally or alternatively, the job postings
may be retrieved (e.g., using a web crawling process) from sources
that are remote to server system 130 and stored at server system
130 (or at least accessible to server system 130).
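The use of a selected information type to narrow the set of cognates can be sketched as follows. The sketch is a minimal Python illustration that assumes cognates are indexed by information type and source phrase; the Cognate record, the sample entries, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Cognate(NamedTuple):
    source: str
    target: str
    target_language: str
    info_type: str


# Hypothetical cognates; real entries would be mined from user profiles.
COGNATES = [
    Cognate("sales representative", "representante de ventas", "es", "job_title"),
    Cognate("sales representative", "Vertriebsmitarbeiter", "de", "job_title"),
    Cognate("sales", "ventas", "es", "job_skill"),
]


def build_index(cognates: List[Cognate]) -> Dict[Tuple[str, str], List[Cognate]]:
    """Group cognates by (information type, source phrase)."""
    index: Dict[Tuple[str, str], List[Cognate]] = defaultdict(list)
    for cognate in cognates:
        index[(cognate.info_type, cognate.source.lower())].append(cognate)
    return index


INDEX = build_index(COGNATES)


def translate_query(query: str, info_type: str) -> List[Tuple[str, str]]:
    """Return (target language, translation) pairs for the selected type only."""
    matches = INDEX.get((info_type, query.strip().lower()), [])
    return [(c.target_language, c.target) for c in matches]
```

Because the lookup key includes the information type, cognates of other types are never examined, and translations in more than one target language may be returned for the same query.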
[0101] In an embodiment, a query is translated into multiple
candidates, each in the same language. As part of a search based on
the query, each translation candidate is compared to searchable
items (or metadata thereof) to identify relevant search results.
For example, the query "la ingenieria de programas" is translated
into "software engineer" and "programmer." Each of these
translation candidates is compared to job postings in English.
Therefore, even though "software engineer" may not match a
particular job posting, "programmer" might. As a result, a
searchable item that may not have been identified as relevant for
one translation candidate may be identified as relevant for another
translation candidate.
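The effect of searching with several translation candidates can be sketched in Python as follows. The sketch assumes a simple case-insensitive substring comparison against job titles, which is only one possible way to perform the comparison; the postings and the function name find_matching_postings are hypothetical.

```python
from typing import Dict, List

# Hypothetical English job postings.
postings = [
    {"id": 101, "title": "Senior Programmer"},
    {"id": 102, "title": "Software Engineer II"},
    {"id": 103, "title": "Account Manager"},
]


def find_matching_postings(candidates: List[str], items: List[Dict]) -> List[int]:
    """Return ids of postings whose title contains any translation candidate."""
    wanted = [candidate.lower() for candidate in candidates]
    return [
        item["id"]
        for item in items
        if any(candidate in item["title"].lower() for candidate in wanted)
    ]
```

With only "software engineer" as a candidate, posting 101 is missed; adding "programmer" as a second candidate brings it into the result set.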
[0102] Metadata of a searchable (content) item is separate from
content of the content item. Content of a content item is typically
displayed, played, or otherwise presented when the content item is
requested. Content may be text, graphics, audio, video, or any
combination thereof. Metadata is often not displayed with its
associated content, unless a user explicitly requests to view the
metadata, which is not available in some cases. Examples of
metadata vary depending on the type of content item. An example of
metadata for a blog article may include date of creation, author,
publisher, and one or more tags that describe content of the blog
article. An example of metadata for an audio file may include date
of creation and modification, band name, distributor, track name,
album name, and length of audio (e.g., in time units).
[0103] In a related embodiment, instead of translating the query
into a target language, searchable items (e.g., job postings) are
translated into the language of the query. For example, at least a
portion of a German job posting is translated into English (using
one or more multilingual cognates) and an English query is compared
to the translated portion. The translated portion may have been
generated before receiving the query or in response to receiving
the query.
[0104] In an embodiment, instead of using cognates directly to
translate a query or a searchable item, a translation model that is
based on multilingual cognates identified from multiple user
profiles is used to translate a query from a source language to a
target language. The translation model is trained based on the
multilingual cognates and, optionally, other training data. Such a
translation model is useful if the query is "free form," such that
the query is a complete (or mostly complete) sentence with pronouns
and/or prepositions instead of just nouns that correspond to one or
more information types (e.g., job title, job skill, or academic
degree).
[0105] In this embodiment, if a translation model exists for each
information type and a query submitter indicates one or more
information types, then the translation model(s) that correspond to
the indicated information type(s) are identified and used to
perform a translation of one or more tokens (e.g., words or
characters in a character-based language, such as Chinese)
identified in a query. For example, if a user (1) selects option
"job title" and inputs text to be associated with that selected
option and (2) selects option "skill" and inputs text to be
associated with that selected option, then a search engine
identifies two translation models: one for job titles and one for
skills. The job title translation model is used to translate the
text associated with job title into a target language and the skill
translation model is used to translate the text associated with
skill into the target language.
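The selection of a translation model per information type can be sketched as follows. In this Python illustration, each "model" is a stand-in lookup table rather than a trained translation model, and the names make_lookup_model, MODELS_BY_TYPE, and translate_typed_query are hypothetical.

```python
from typing import Callable, Dict


def make_lookup_model(table: Dict[str, str]) -> Callable[[str], str]:
    """Build a stand-in model that translates by table lookup."""
    def model(text: str) -> str:
        # Fall back to the untranslated text when no entry exists.
        return table.get(text.strip().lower(), text)
    return model


# One hypothetical model per information type.
MODELS_BY_TYPE: Dict[str, Callable[[str], str]] = {
    "job_title": make_lookup_model({"ingeniero de software": "software engineer"}),
    "skill": make_lookup_model({"ventas": "sales"}),
}


def translate_typed_query(fields: Dict[str, str]) -> Dict[str, str]:
    """Translate each query field with the model for its information type."""
    translated = {}
    for info_type, text in fields.items():
        model = MODELS_BY_TYPE.get(info_type)
        translated[info_type] = model(text) if model else text
    return translated
```

Text entered under an information type that has no model is passed through unchanged in this sketch; an implementation might instead fall back to a general-purpose model.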
[0106] In an embodiment, a search engine that expands queries in
the language domain supports translation among multiple distinct
sets of languages. For example, the search engine may support
English-Chinese translation, English-German translation, and
French-Russian translation.
[0107] In an embodiment, the user that submits a query provides
input that indicates one or more target languages. In this way, the
user can dictate into which languages a query is to be translated
or which searchable items will be translated into the language of
the query. In the former case, the query engine ensures that the
query is translated (at least once) into each indicated target
language. Also, in this way, there is a limit to the number of
translated query strings and to the size of the search corpus, which
may keep the search latency relatively low or acceptable to
end-users.
[0108] In an embodiment, instead of providing results of a search
in the same language as the one or more target language(s) (or the
language(s) of the search corpus), results of a search based on a
query are translated into the language of the query. For example,
if the query is in English and the search corpus is Chinese and
German (e.g., job postings in Chinese and job postings in German),
then the results of the query are in English. Therefore, if an
English version of a Chinese job posting is not available, then the
Chinese job posting is translated into English. The same or
different translation model that translated the English query into
Chinese may be used to translate the Chinese job posting into English.
[0109] FIG. 5 is a flow diagram that depicts a process 500 for
expanding a query search in the language domain, in an embodiment.
Process 500 may be implemented by one or more computing components
of server system 130.
[0110] At block 510, one or more search criteria of a query in a
first language are received. The query may target one or more data
sources, such as a searchable database of job postings. The
searchable database may include data items in multiple
languages.
[0111] At block 520, the one or more search criteria are translated
into a second language based on multilingual cognates identified
from multiple user profiles of different users. The multilingual
cognates may be used directly to perform the translation or a
translation model that is trained based on the multilingual
cognates is used to perform the translation.
[0112] At block 530, the translated search criteria are used to
search one or more data sources. Block 530 may involve comparing
the translated search criteria to multiple searchable items or
metadata thereof.
[0113] At block 540, one or more search results are identified.
Each search result may include a link to or otherwise identify a
searchable item that satisfied the translated search criteria.
[0114] At block 550, the one or more search results are presented
to a user that submitted the query. A search result may be
displayed in its native or original language, i.e., the second
language. Alternatively, a translated version of at least a portion
of a content item (or metadata thereof) is presented to the user.
If there are multiple search results, each corresponding to a
different content item that was deemed relevant enough to present
to the user, and each in a different language, then a set of search
results that are presented to the user may all be in the first
language, i.e., the language associated with the user.
[0115] In a related embodiment, instead of translating the one or
more search criteria (as in block 520), portions of the searchable
items (or metadata thereof) are translated into the first language
and compared to the one or more (untranslated) search criteria.
Translating User Profiles
[0116] In online social networks, users are generally provided with
the capability to view each other's profiles. Sometimes, the amount
of a first user's profile data that a second user is able to view
or access depends on (a) whether the second user is connected to
the first user (either directly or indirectly through one or more
other users or groups), (b) a measure of strength of that
connection (if one exists), (c) how closely related the first user
is to the second user, or (d) whether the second user has performed
some action (e.g., paid a subscription) to be granted that access.
Regardless of how the second user is able to view the first user's
profile, the language of the first user's profile may be different
than one or more languages that are known or familiar to the second
user.
[0117] In an embodiment, a user's profile (or a portion thereof) is
automatically translated into one or more languages for a different
(viewing) user. A translation model that is trained based on
cognates identified from multiple user profiles (as described
herein) may be used to translate the user's profile. Alternatively,
a translation model that is trained based on a different set of
training data may be used to translate the user's profile.
[0118] In an embodiment, multiple translation models are used to
translate different portions of a user's profile. Each translation
model is associated with a different information type. For example,
one translation model that is trained based on cognates associated
with job titles is used to translate a job title listed in a user's
profile while another translation model that is trained based on
cognates associated with job skills is used to translate one or
more job skills listed in the user's profile.
[0119] The point at which a user's profile is translated into
another language may vary from one approach to another. In a first
approach, a
translated version of a user's profile is generated before a
determination is made to provide the translated version to another
user. For example, immediately after a user creates a profile or
otherwise updates his/her profile, server system 130 generates one
or more translated versions of the profile in one or more target
languages that are different than the source language (i.e., of the
profile). The target languages that are chosen may be selected
based on attributes of the friends or connections of the user. For
example, if most of a user's friends with foreign language profiles
(or foreign language preferences) are associated with a particular
language, then the particular language is selected as the target
language for translation.
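The selection of a target language from the languages of a user's connections can be sketched as follows. This Python illustration assumes that "most" is interpreted as the most frequent foreign language among the connections; the function name pick_target_language is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def pick_target_language(
    profile_language: str, connection_languages: Iterable[str]
) -> Optional[str]:
    """Return the most common connection language that differs from the profile's."""
    foreign = Counter(
        language for language in connection_languages if language != profile_language
    )
    if not foreign:
        return None  # No foreign-language connections, so nothing to pre-translate.
    return foreign.most_common(1)[0][0]
```

For a profile in English whose connections use English, German, Chinese, and German, the sketch selects German as the target language.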
[0120] As another example of the first approach, a first user
requests (e.g., via a HTTP GET request that includes a member ID) a
view of a second user's profile page. Server system 130 receives
the request and determines a language associated with the first
user. This determination may be made in multiple ways. In one way,
the first user may have set a language viewing preference (e.g., in
association with the first user's profile or in association with
the first user's current viewing session with server system 130).
Alternatively, the language is determined based on a language of
the first user's profile. Thus, if the first user's profile is in
Italian, then the determined language is Italian. In another way,
while a portion of the second user's profile is displayed to the
first user in a first language, the first user selects a language
from a drop-down list of one or more languages that are different
than the language of the second user's profile. Continuing with
this second example of the first approach, regardless of how the
language of the first user is determined, server system 130 then
identifies an already translated version of the second user's
profile in the determined language and causes the translated
version to be presented to the first user on a display of a
computing device of the first user (e.g., by sending the translated
version over network 120 to client 110).
[0121] In a second approach, the profile of a user is translated
after a determination is made to provide a translated version to
another user. For example, a first user requests a view of a second
user's profile page. The request may be an initial view of the
second user's profile. For example, the request may be submitted in
response to a submission of the second user's name in a search
field or in response to selection of a link (associated with the
second user) on a search results page that lists names of multiple
users. Alternatively, while the second user's profile is displayed
to the first user in a first language, the first user selects a
graphical user interface (GUI) element that is displayed adjacent
to the second user's profile and that corresponds to a second
language. The displayed option may be a graphical button that
specifies the second language or may be one item in a list of
options (e.g., in a drop-down menu) that is displayed adjacent to
the second user's profile. In a related example, the first user
provides voice input that indicates the second language.
[0122] Regardless of how the request is generated or how the
request originates, server system 130 receives the request and
determines a language associated with the first user. If the
request does not identify the language, then the language may be
determined as described previously (e.g., by determining a language
of the first user's profile or identifying a language preference
established by the first user). Server system 130 then identifies
and uses a translation model that corresponds to the determined
language and the language of the second user's profile to translate
the second user's profile into the determined language.
[0123] In an embodiment, a translated version of a user's profile
includes multiple data items that are in the same language and that
are based on a single data item in the user's profile. For example,
if a user's profile includes "la ingenieria de programas" as a job
title, then a translated version may include both "Software
Engineer" and "Programmer." The determination of whether to provide
multiple translated candidates for a particular profile data item
may be based on a confidence level or score associated with each
candidate of multiple candidate data items. If, for example, no
score is above a particular threshold (e.g., 0.7), then the top
(e.g., two) candidate data items are presented to the viewing user
(e.g., as long as the score of a candidate data item is above 0.3
in order to avoid presenting candidate data items with low
confidence). If no candidate data item associated with a particular
data item in a user's profile is associated with a score above a
minimum threshold, then no candidate data item may be displayed to
the viewing user.
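The threshold logic described above can be sketched in Python as follows. The thresholds 0.7 and 0.3 and the limit of two candidates come from the example above; presenting only the single best candidate when its score exceeds the upper threshold is an assumption of this sketch, and the function name select_candidates is hypothetical.

```python
from typing import Dict, List


def select_candidates(
    scored: Dict[str, float], high: float = 0.7, low: float = 0.3, top_n: int = 2
) -> List[str]:
    """Choose which translated candidates to present for one profile data item."""
    ranked = sorted(scored.items(), key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return []
    # Assumption: a sufficiently confident best candidate is presented alone.
    if ranked[0][1] > high:
        return [ranked[0][0]]
    # Otherwise present the top candidates that clear the minimum threshold.
    return [text for text, score in ranked[:top_n] if score > low]
```

With scores of 0.5 for "Software Engineer" and 0.45 for "Programmer", both candidates are presented; if every score is at or below 0.3, nothing is presented.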
[0124] In an embodiment, server system 130 provides options for one
or more individual data items within a user's profile to be
translated. For example, a first graphical user interface (GUI)
element (e.g., a button, link, or drop-down menu) may be displayed
adjacent to a user's job title and a second GUI element may be
displayed adjacent to the user's skills. In this way, a viewing
user may select the portions of the user's profile for which to
view corresponding translations. Again, the translations may be
generated in response to selection of one of the GUI elements or
may have been generated prior to selection of either of the
graphical elements.
[0125] Server system 130 may translate one type of data item of one
user's profile but not the same type of data of another user's
profile. For example, the job title of a first user's profile in
English is translated into three languages and the job title of a
second user's profile in English is not translated into any other
language. The decision to present one or more translations (or
options for viewing translations) may be based on a confidence
score related to the translations. A confidence score of a
translation may be based on a count associated with the original
(i.e., non-translated) data item (or a number of times the data
item has been seen). Additionally or alternatively, a confidence
score of a translation may be based on a number of times the
original data item and the translation appear as a cognate in a set
of user profiles. For example, a job title of one user's profile
may be so unique that a level of co-occurrence of the job title
with any translation of the job title is relatively low. In such
situations, an option to display a translated version of the job
title is not provided to a viewing user.
[0126] FIG. 6 is a flow diagram that depicts a process 600 for
presenting a translated user profile of one user to another user,
in an embodiment. Process 600 may be implemented by one or more
computing components of server system 130.
[0127] At block 610, a request to view a user profile of a first
user by a second user is received. The user profile is in a first
language. The user profile may be associated with language data
(e.g., metadata of the user profile) that indicates the first
language. Alternatively, the first language may be determined by
analyzing text within the user profile, as described previously.
The request from the second user may or may not indicate that a
translated version of the user profile is to be presented to the
second user. An example of how a request may originate is the second
user selecting a user icon (or user name) that is associated with
the first user and that is displayed (a) in a messaging window
(e.g., an instant messaging (IM) window) or (b) adjacent to an
email message sent by the first user.
[0128] At block 620, a second language that is different than the
first language is determined. The second language may be specified
in the request or inferred based on a language preference of the
second user or based on one or more characteristics of the second
user's online behavior or own user profile.
[0129] At block 630, a version of the user profile in the second
language is identified. Block 630 may involve retrieving the
version from storage or may involve translating a copy of the user
profile into the second language.
[0130] At block 640, the translated version is presented to the
second user.
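Block 630 of process 600 can be sketched in Python as follows. The sketch assumes a dictionary acts as the storage for previously generated translations and that a translator callable is supplied by the caller; the profile layout, the toy translator, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Fields = Dict[str, str]
Translator = Callable[[Fields, str, str], Fields]


def get_profile_version(
    profile: Dict, target_language: str,
    store: Dict[Tuple[int, str], Fields], translate: Translator,
) -> Fields:
    """Return a stored translation, or translate a copy on demand and store it."""
    if target_language == profile["language"]:
        return dict(profile["fields"])
    key = (profile["member_id"], target_language)
    if key not in store:
        store[key] = translate(
            dict(profile["fields"]), profile["language"], target_language
        )
    return store[key]


# Toy translator used only to exercise the sketch; it records each call.
translation_calls: List[Tuple[str, str]] = []


def toy_translate(fields: Fields, source_language: str, target_language: str) -> Fields:
    translation_calls.append((source_language, target_language))
    table = {("es", "en"): {"ingeniero de software": "software engineer"}}
    mapping = table.get((source_language, target_language), {})
    return {name: mapping.get(value.lower(), value) for name, value in fields.items()}
```

A second request for the same profile and language is served from the store without invoking the translator again, which corresponds to retrieving the version from storage.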
Providing Translated Content to a User
[0131] An online service (e.g., a social network service) that has
global membership may process digital content in many different
languages. However, a global online service may filter content to a
user based on the user's language preference, which may have been
explicitly provided by the user or may have been inferred by the
online service based on information about the user or behavior
associated with the user. Thus, a social network service may ensure
that only English content is presented to English users, only
Chinese content is presented to Chinese users, and so forth.
However, such an approach may prevent some potentially useful and
valuable content from reaching many users of the online
service.
[0132] In an embodiment, an online service determines, based on one
or more criteria, whether to provide a translated version of
content that is in one language to a user that is associated with a
different language. A translatable content item comprises text and
one or more of graphics, audio, video, or any combination thereof.
Examples of types of content items include blog postings,
educational course listings, SlideShare presentations, news
articles provided by third parties, articles from users that are
considered (e.g., based on number of connections) influential by
the online service, comments on other users' content,
advertisements, and job postings.
[0133] An online service may use multiple rules or processes to
determine which content items to present to a target user. For
example, an online service may determine a relevancy of multiple
content items and present only those content items that are
associated with the highest (e.g., top 10) relevancy. Relevancy may
be based on multiple factors, such as how connected the source or
originator of the content item is to the target user (e.g., a first
degree or second degree connection), whether the source/originator
of the content item has communicated with the target user in the
past (e.g., using the online service), whether the target user is
in an address book of the source/originator and/or vice versa, how
similar the target user is to the source/originator (e.g., both
work in the same industry, both work for the same employer, both
having attended the same academic institution), how many words or
keywords in the content item are found in the target user's
profile, and how many words or keywords in the target user's
profile are found in the content item.
Comparing a Profile with a Content Item
[0134] In an embodiment, a user's profile (or portion thereof) is
translated into a target language and compared to a content item
that is already in the target language to determine whether the
content item is relevant to the user. Any techniques may be used to
perform the translation. For example, a translation model that is
trained based on one or more of the techniques described herein
(i.e., using cognates identified from multiple user profiles) is
used to translate the user's profile. The translation of the user's
profile may occur prior to or subsequent to determining whether to
present the content item.
[0135] In an embodiment, the portion of a translated user's profile
that is compared to a content item depends on the type of content
item. For example, if the content item is a job posting, then a job
title and an industry specification in a user's profile are
translated (or they may already be translated) and the translated
job title and industry specification are compared to text within
the job posting. As another example, if the content item is a news
article, then a translated version of an employer name indicated in
the user's profile is compared to metadata associated with the news
article, the title of the news article, or to text within the news
article. As another example, if the content item is a comment of
another content item, then a translated version of the user's
personal interests are compared to text within the comment.
[0136] If it is determined that a content item in a source language
should be presented to a user associated with a target language,
then a translated version of the content item is presented to the
user. The translated version may have been generated prior to the
determination or after the determination (e.g., "on-the-fly"). The
translated version of the content item may be of the entire content
item or a portion of the content item. For example, the translated
version may be only of metadata associated with the content item, a
title of the content item, one or more headings in the content
item, one or more (e.g., relevant) sentences or paragraphs in the
content item, or a combination thereof.
[0137] In a related embodiment, instead of comparing a translated
portion of a user's profile to a content item, a translated version
of the content item (or portion thereof) is compared to at least a
portion of the user's (untranslated) profile. For example, if a
content item is in Chinese and a user's profile is in English, then
a portion of the content item is translated into English and the
English translation is compared to the user's profile.
[0138] A determination may be made regarding which item (i.e., the
content item or the user profile) is to be translated for purposes
of comparison. The determination may be based on how confident the
online service is in the respective translation models that may be
used to perform a translation. For example, if a translation model
for one type of content item is associated with more accurate
translations (or has been rated higher in some fashion) than a
translation model for a user profile in a particular language, then
the content item is translated. Otherwise, the user profile is
translated. Thus, for a single user, at one time, a content item is
translated and compared to a portion of the user's profile and, at
a later time, a portion of the user's profile is translated and
compared to a content item. Because a translation model may vary in
accuracy or reliability over time, the translation model may be
used at one point in time but not at a second point in time, even
though the translation model was available both times.
[0139] Different translation models for different languages may be
associated with different accuracy rates or reliability ratings.
Also, different translation models for different types of content
may vary in accuracy or reliability. For example, the accuracy of a
first translation model for English-to-Chinese translations of news
articles may be relatively high while the accuracy of a second
translation model for Chinese-to-English translations of user
profiles may be relatively low. In this scenario, the first
translation model is used to translate English news articles in
order to compare English-original news articles with profiles of
Chinese readers. As another example, the accuracy of a first
translation model for Chinese-to-English translations of academic
course listings may be relatively low while the accuracy of a second
translation model for English-to-Chinese translations of user
profiles may be relatively high. In this
scenario, the second translation model is used to translate user
profiles into Chinese in order to compare Chinese academic course
listings with translated profiles of English readers.
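The choice of which item to translate can be sketched in Python as follows. The numeric ratings are invented for illustration, and the registry RATINGS and the function choose_item_to_translate are hypothetical; the sketch merely compares the rating of the model that would translate the content item with the rating of the model that would translate the user profile.

```python
from typing import Dict, Tuple

# Hypothetical ratings keyed by (item type, source language, target language).
RATINGS: Dict[Tuple[str, str, str], float] = {
    ("news_article", "en", "zh"): 0.9,
    ("user_profile", "zh", "en"): 0.6,
    ("course_listing", "zh", "en"): 0.5,
    ("user_profile", "en", "zh"): 0.8,
}


def choose_item_to_translate(
    content_type: str, content_language: str, profile_language: str
) -> str:
    """Return which item to translate for the comparison."""
    # Model that would translate the content item into the profile's language.
    content_rating = RATINGS.get(
        (content_type, content_language, profile_language), 0.0
    )
    # Model that would translate the profile into the content item's language.
    profile_rating = RATINGS.get(
        ("user_profile", profile_language, content_language), 0.0
    )
    return "content_item" if content_rating > profile_rating else "user_profile"
```

With these invented ratings, an English news article is translated for a Chinese reader, whereas for a Chinese course listing and an English reader the user profile is translated instead.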
[0140] FIG. 7 is a flow diagram that depicts a process 700 for
presenting, to a user associated with a first language, one or more
content items (or translated versions thereof) that are in a
language that is different than the first language, in an
embodiment. Process 700 may be performed by one or more computing
elements of server system 130, which has access to one or more user
profiles. While process 700 is described and depicted as occurring
in a specific order, other embodiments involve process 700 (or a
subset thereof) as occurring in a different order.
[0141] At block 710, a determination is made to display content to
a user that is associated with a first language. For example, the
user may have set a language preference in the user's profile
indicating the first language. As another example, the server
system 130 may analyze the language of the user's profile or a
majority of content that the user publishes to determine the first
language.
[0142] The determination may be made when the user requests a view
of the user's home page. Alternatively, the determination may be
made after a content item that is associated with a second language
is determined to be presented to the user. Thus, block 710 may
occur after block 740. In this scenario, server system 130 may
notify (e.g., through an application executing on a smartphone of
the user) the user that relevant content is available.
[0143] At block 720, one or more profile data items of the user are
identified. Each profile data item may be one of the data items in
the user's top-card information or any other keyword or relevant
text extracted from the user's profile, such as a word or phrase
from a (e.g., work or job) summary provided by the user.
[0144] At block 730, the one or more profile data items are
translated into the language associated with the content item,
which, in this example, is the second language. In an embodiment, a
single profile data item is translated into multiple profile data
items. For example, a profile data item is "la ingeniero de
programas," which is translated into "Software Engineer" and
"Programmer" using a translation model or cognates directly. Both
translations may be provided because half (or a significant
percentage) of user profiles that have (1) Spanish-English profiles
and (2) "la ingeniero de programas" as a job title have "Software
Engineer" as an English counterpart and the other half have
"Programmer" as an English counterpart.
[0145] At block 740, the translated profile data item(s) are
compared to one or more tokens associated with the content item. A
token is a word, phrase (which may include multiple words that are
combined with one or more hyphens), or character (i.e., a character
in a character-based language, such as Chinese). A token may be
within the content item (e.g., a company name in a blog article)
or may be within metadata of the content item.
[0146] Block 740 may comprise determining how many translated
profile data items are found in the content item or metadata
thereof. The higher the number, the higher the relevancy of the
content item to the user. Additionally, the number of occurrences
of each profile data item found in the content item or metadata
thereof may be a factor in determining relevancy of the content
item.
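The two relevancy factors described for block 740 can be sketched in Python as follows. The sketch assumes whole-phrase, case-insensitive matching using a regular expression, which is one possible comparison among many; the function name relevance is hypothetical.

```python
import re
from typing import List, Tuple


def relevance(
    translated_items: List[str], content_text: str, metadata_text: str = ""
) -> Tuple[int, int]:
    """Return (number of distinct items found, total number of occurrences)."""
    haystack = f"{content_text} {metadata_text}".lower()
    matched = 0
    occurrences = 0
    for item in translated_items:
        # Match the item as a whole phrase, ignoring case.
        pattern = r"\b" + re.escape(item.lower()) + r"\b"
        hits = len(re.findall(pattern, haystack))
        if hits:
            matched += 1
        occurrences += hits
    return matched, occurrences
```

A higher first value indicates that more of the translated profile data items appear in the content item or its metadata; the second value can serve as an additional factor.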
[0147] At block 750, based on the comparison, it is determined
whether the content item is to be presented to the user. If so,
process 700 proceeds to block 760, where the content item is
presented to the user, such as in a news feed (comprising multiple
content items) on a home page associated with the user.
[0148] At block 770, it is determined whether there are more
content items to consider. If not, then process 700 ends.
Otherwise, process 700 proceeds to block 780, after which a portion
of process 700 repeats. For example, process 700 may be used to
identify content items to present to the user in a feed, such as on
a home page of the user.
[0149] At block 780, another content item is selected, after which
process 700 proceeds to block 730 if the language of the other
content item is different than the language of the content item
considered in the previous iteration of block 730. For example, if
the language associated with the user is English, the first content
item is in Chinese, and block 780 involves selecting a content item
that is in Italian, then process 700 proceeds to block 730.
Otherwise, process 700 proceeds to block 740 (since the profile
data items have already been translated into the same language as
the newly selected content item).
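The loop formed by blocks 730 through 780 can be sketched in Python as follows. As a variation on the flow described above, the sketch keeps translated profile data items for every language already seen, rather than only for the language of the previous iteration; the toy translation table, the substring comparison, and all names are hypothetical.

```python
from typing import Callable, Dict, List

# Toy translations of profile data items, keyed by (item, target language).
TOY_TABLE = {
    ("software engineer", "es"): ["ingeniero de software", "programador"],
    ("software engineer", "de"): ["Softwareentwickler"],
}


def toy_translate(item: str, language: str) -> List[str]:
    return TOY_TABLE.get((item.lower(), language), [])


def select_feed_items(
    profile_items: List[str],
    content_items: List[Dict],
    translate: Callable[[str, str], List[str]],
    min_matches: int = 1,
) -> List[int]:
    """Return ids of content items deemed relevant to the user."""
    translated_by_language: Dict[str, List[str]] = {}
    selected = []
    for item in content_items:
        language = item["language"]
        # Block 730: translate only if this language has not been handled yet.
        if language not in translated_by_language:
            translated_by_language[language] = [
                t.lower() for p in profile_items for t in translate(p, language)
            ]
        # Blocks 740 and 750: compare and decide whether to present the item.
        text = item["text"].lower()
        matches = sum(1 for t in translated_by_language[language] if t in text)
        if matches >= min_matches:
            selected.append(item["id"])
    return selected
```

In this sketch, a second Spanish content item reuses the Spanish translations produced for the first, so block 730 is effectively skipped for it.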
Presenting Translated Content
[0150] Translated content may be presented to a user in one of a
number of ways and locations. For example, an online service
maintains an account for the user and, when logged on, the online
service includes a content feed feature where content is displayed
to the user through a dedicated application executing on a
computing device (e.g., smartphone) of the user. The user may
scroll down and view many content items, similar to an infinite
scroll feature. As the user scrolls down, more and more content
items are displayed. As another example, an online service may
maintain a home page for a registered user, where the home page is
of a fixed size. The online service modifies the home page to
include the translated content (and, thus, replace content that
was previously presented to the user). The user may use a web
browser executing on a computing device (e.g., laptop computer) to
send, to the online service, a request with a URL that includes
data that identifies or authenticates the user. In response, the
online service determines content items to place in the
to-be-rendered web page and sends the content items (along with
other displayable content) to the computing device to be rendered
by the web browser. As another example, translated content may
appear on static portions of a display where display elements do
not move even though the user scrolls up or down or side to
side.
Hardware Overview
[0151] According to one embodiment, the techniques described herein
are implemented by one or more special-purpose computing devices.
The special-purpose computing devices may be hard-wired to perform
the techniques, or may include digital electronic devices such as
one or more application-specific integrated circuits (ASICs) or
field programmable gate arrays (FPGAs) that are persistently
programmed to perform the techniques, or may include one or more
general purpose hardware processors programmed to perform the
techniques pursuant to program instructions in firmware, memory,
other storage, or a combination. Such special-purpose computing
devices may also combine custom hard-wired logic, ASICs, or FPGAs
with custom programming to accomplish the techniques. The
special-purpose computing devices may be desktop computer systems,
portable computer systems, handheld devices, networking devices or
any other device that incorporates hard-wired and/or program logic
to implement the techniques.
[0152] For example, FIG. 8 is a block diagram that illustrates a
computer system 800 upon which an embodiment of the invention may
be implemented. Computer system 800 includes a bus 802 or other
communication mechanism for communicating information, and a
hardware processor 804 coupled with bus 802 for processing
information. Hardware processor 804 may be, for example, a general
purpose microprocessor.
[0153] Computer system 800 also includes a main memory 806, such as
a random access memory (RAM) or other dynamic storage device,
coupled to bus 802 for storing information and instructions to be
executed by processor 804. Main memory 806 also may be used for
storing temporary variables or other intermediate information
during execution of instructions to be executed by processor 804.
Such instructions, when stored in non-transitory storage media
accessible to processor 804, render computer system 800 into a
special-purpose machine that is customized to perform the
operations specified in the instructions.
[0154] Computer system 800 further includes a read only memory
(ROM) 808 or other static storage device coupled to bus 802 for
storing static information and instructions for processor 804. A
storage device 810, such as a magnetic disk or optical disk, is
provided and coupled to bus 802 for storing information and
instructions.
[0155] Computer system 800 may be coupled via bus 802 to a display
812, such as a cathode ray tube (CRT), for displaying information
to a computer user. An input device 814, including alphanumeric and
other keys, is coupled to bus 802 for communicating information and
command selections to processor 804. Another type of user input
device is cursor control 816, such as a mouse, a trackball, or
cursor direction keys for communicating direction information and
command selections to processor 804 and for controlling cursor
movement on display 812. This input device typically has two
degrees of freedom in two axes, a first axis (e.g., x) and a second
axis (e.g., y), that allows the device to specify positions in a
plane.
[0156] Computer system 800 may implement the techniques described
herein using customized hard-wired logic, one or more ASICs or
FPGAs, firmware and/or program logic which in combination with the
computer system causes or programs computer system 800 to be a
special-purpose machine. According to one embodiment, the
techniques herein are performed by computer system 800 in response
to processor 804 executing one or more sequences of one or more
instructions contained in main memory 806. Such instructions may be
read into main memory 806 from another storage medium, such as
storage device 810. Execution of the sequences of instructions
contained in main memory 806 causes processor 804 to perform the
process steps described herein. In alternative embodiments,
hard-wired circuitry may be used in place of or in combination with
software instructions.
[0157] The term "storage media" as used herein refers to any
non-transitory media that store data and/or instructions that cause
a machine to operate in a specific fashion. Such storage media
may comprise non-volatile media and/or volatile media. Non-volatile
media includes, for example, optical or magnetic disks, such as
storage device 810. Volatile media includes dynamic memory, such as
main memory 806. Common forms of storage media include, for
example, a floppy disk, a flexible disk, hard disk, solid state
drive, magnetic tape, or any other magnetic data storage medium, a
CD-ROM, any other optical data storage medium, any physical medium
with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM,
NVRAM, or any other memory chip or cartridge.
[0158] Storage media is distinct from but may be used in
conjunction with transmission media. Transmission media
participates in transferring information between storage media. For
example, transmission media includes coaxial cables, copper wire
and fiber optics, including the wires that comprise bus 802.
Transmission media can also take the form of acoustic or light
waves, such as those generated during radio-wave and infra-red data
communications.
[0159] Various forms of media may be involved in carrying one or
more sequences of one or more instructions to processor 804 for
execution. For example, the instructions may initially be carried
on a magnetic disk or solid state drive of a remote computer. The
remote computer can load the instructions into its dynamic memory
and send the instructions over a telephone line using a modem. A
modem local to computer system 800 can receive the data on the
telephone line and use an infra-red transmitter to convert the data
to an infra-red signal. An infra-red detector can receive the data
carried in the infra-red signal and appropriate circuitry can place
the data on bus 802. Bus 802 carries the data to main memory 806,
from which processor 804 retrieves and executes the instructions.
The instructions received by main memory 806 may optionally be
stored on storage device 810 either before or after execution by
processor 804.
[0160] Computer system 800 also includes a communication interface
818 coupled to bus 802. Communication interface 818 provides a
two-way data communication coupling to a network link 820 that is
connected to a local network 822. For example, communication
interface 818 may be an integrated services digital network (ISDN)
card, cable modem, satellite modem, or a modem to provide a data
communication connection to a corresponding type of telephone line.
As another example, communication interface 818 may be a local area
network (LAN) card to provide a data communication connection to a
compatible LAN. Wireless links may also be implemented. In any such
implementation, communication interface 818 sends and receives
electrical, electromagnetic or optical signals that carry digital
data streams representing various types of information.
[0161] Network link 820 typically provides data communication
through one or more networks to other data devices. For example,
network link 820 may provide a connection through local network 822
to a host computer 824 or to data equipment operated by an Internet
Service Provider (ISP) 826. ISP 826 in turn provides data
communication services through the world wide packet data
communication network now commonly referred to as the "Internet"
828. Local network 822 and Internet 828 both use electrical,
electromagnetic or optical signals that carry digital data streams.
The signals through the various networks and the signals on network
link 820 and through communication interface 818, which carry the
digital data to and from computer system 800, are example forms of
transmission media.
[0162] Computer system 800 can send messages and receive data,
including program code, through the network(s), network link 820
and communication interface 818. In the Internet example, a server
830 might transmit a requested code for an application program
through Internet 828, ISP 826, local network 822 and communication
interface 818.
[0163] The received code may be executed by processor 804 as it is
received, and/or stored in storage device 810, or other
non-volatile storage for later execution.
[0164] In the foregoing specification, embodiments of the invention
have been described with reference to numerous specific details
that may vary from implementation to implementation. The
specification and drawings are, accordingly, to be regarded in an
illustrative rather than a restrictive sense. The sole and
exclusive indicator of the scope of the invention, and what is
intended by the applicants to be the scope of the invention, is the
literal and equivalent scope of the set of claims that issue from
this application, in the specific form in which such claims issue,
including any subsequent correction.
* * * * *