U.S. patent application number 11/948374 was filed with the patent office on 2007-11-30 and published on 2008-03-27 for identifying and measuring related queries.
This patent application is currently assigned to YAHOO! INC. Invention is credited to Rosie Jones, Benjamin Rey, Wei Vivian Zhang.
Application Number | 20080077588 11/948374 |
Family ID | 38445252 |
Filed Date | 2007-11-30 |
United States Patent Application | 20080077588 |
Kind Code | A1 |
Zhang; Wei Vivian; et al. | March 27, 2008 |
IDENTIFYING AND MEASURING RELATED QUERIES
Abstract
A system and method are disclosed for identifying similar
queries. A user query may be compared with known search keywords.
The user query may be a Chinese related query, which is converted
into a different form before comparing with other converted queries
or keywords. A similarity score based on different features may be
used for comparing the queries.
Inventors: |
Zhang; Wei Vivian (Glendale, CA)
Jones; Rosie (Pasadena, CA)
Rey; Benjamin (Eguilles, FR) |
Correspondence
Address: |
BRINKS HOFER GILSON & LIONE / YAHOO! OVERTURE
P.O. BOX 10395
CHICAGO
IL
60610
US
|
Assignee: |
YAHOO! INC.
701 First Avenue
Sunnyvale
CA
94089
|
Family ID: |
38445252 |
Appl. No.: |
11/948374 |
Filed: |
November 30, 2007 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11365315 | Feb 28, 2006 | |
11948374 | Nov 30, 2007 | |
Current U.S.
Class: |
1/1 ;
707/999.006; 707/E17.07 |
Current CPC
Class: |
G06Q 30/00 20130101;
G06F 16/3332 20190101; Y10S 707/99934 20130101 |
Class at
Publication: |
707/006 |
International
Class: |
G06F 12/00 20060101
G06F012/00 |
Claims
1. A method for matching queries with keywords comprising:
receiving a non-native language user query; gathering a candidate
set of the keywords to be compared with the user query; converting
the user query to a form for comparison with the keywords, wherein
the keywords are converted to the form for comparison; comparing
the converted user query with each of the keywords, wherein a
similarity score is established for each keyword to determine
similarity with the user query; and matching at least one keyword
from the keywords with the user query based on the similarity
score.
2. The method according to claim 1 wherein the non-native language
user query comprises a Chinese related user query, wherein the
Chinese related user query comprises at least one Chinese
character.
3. The method according to claim 1 wherein each of the keywords are
associated with at least one advertisement.
4. The method according to claim 3 further comprising: providing
the at least one advertisement that is associated with the matched
at least one keyword.
5. The method according to claim 1 wherein the converting of the
user query comprises at least one of adding, removing, or
substituting at least one character from the user query.
6. The method according to claim 1 wherein the similarity score for
each of the keywords is based on an edit distance with the
converted user query.
7. In a computer readable storage medium having stored therein data
representing instructions executable by a programmed processor for
comparing a Chinese query with keywords, the storage medium
comprising instructions operative for: receiving the Chinese query;
selecting a set of the keywords for comparing with the Chinese
query; converting the Chinese query into at least one different
form; converting the set of keywords into the at least one
different form; determining at least one comparison between the
Chinese query and the set of keywords, wherein the at least one
comparison comprises a similarity score between the Chinese query
and the set of keywords; and identifying one of the set of keywords
based on the similarity score.
8. The storage medium according to claim 7 wherein the at least one
comparison comprises calculating an edit distance between the
converted Chinese query and each of the converted set of
keywords.
9. The storage medium according to claim 8 wherein the identified
one of the set of keywords has a closest edit distance with the
converted Chinese query.
10. The storage medium according to claim 7 wherein the at least
one different form comprises a conversion of at least one character
to at least one of a Chinese soundex form, a zhuyin form, a
radicals form, a pinyin without tone form, a pinyin with tone form,
or a Chinese utf8 form.
11. A method for determining similarity between queries comprising:
selecting at least two queries from a set of queries according to
one language system; converting each of the at least two queries
into a different format, wherein the conversion comprises a
transformation of certain characters in the at least two queries;
determining at least one comparison feature for each of the at
least two queries; and comparing the at least two queries based on
the at least one comparison feature to determine a similarity
between the at least two queries based on each of the at least one
comparison feature.
12. The method according to claim 11 wherein the language system
comprises Chinese, and the set of queries are Chinese related
queries.
13. The method according to claim 12 wherein the transformation of
certain characters in the at least two queries comprises changing
at least one character into at least one of a Chinese soundex form,
a zhuyin form, a radicals form, a pinyin without tone form, a
pinyin with tone form, or a Chinese utf8 form.
14. The method according to claim 11 wherein the at least one
comparison feature comprises at least one of comparing an edit
distance, comparing a character level prefix overlap, or comparing
a character level suffix overlap.
15. The method according to claim 14 wherein the comparing the edit
distance further comprises comparing an edit distance by characters
or an edit distance by words.
16. The method according to claim 11 wherein the at least two
queries are converted into a plurality of different formats,
wherein the comparing further comprises comparing the at least two
queries in each of the plurality of different formats.
17. A method for comparing queries comprising: receiving at least
two queries, wherein each of the at least two queries comprise at
least one Chinese representation; converting the at least two
queries into at least one common format; calculating an edit
distance between the converted at least two queries for each of the
at least one common format; and recording the edit distances
between each of the converted at least two queries.
18. The method according to claim 17 wherein the at least one
common format comprises an addition, subtraction, or substitution
of at least one character.
19. The method according to claim 17 wherein the at least one
common format comprises a conversion of at least one character into
at least one of a Chinese soundex form, a zhuyin form, a radicals
form, a pinyin without tone form, a pinyin with tone form, or a
Chinese utf8 form.
20. A system for measuring related queries comprising: a search
engine operative to receive a user search query; an ad server
coupled with the search engine and operative to provide an
advertisement for display in response to the received user search
query, wherein the ad server includes a plurality of search
keywords, each of which are associated with at least one
advertisement; a search log database coupled with the search engine
and operative to store search queries including the plurality of
search keywords; and a language analyzer coupled with the search
engine that comprises: a receiver operative to receive the user
search query; a converter coupled with the receiver and operative
to convert the user search query into a different form; a
comparator coupled with the converter and operative to compare the
converted search query with a candidate set of the plurality of
search keywords; and a calculator coupled with the comparator and
operative to calculate a similarity score for each member of the
candidate set based on the comparison with the converted search
query; wherein the associated at least one advertisement that is
associated with the member of the candidate set with a closest
similarity score is provided for display in response to the
received search query.
21. The system according to claim 20 wherein the user search query
is Chinese related and the converter is operative to change the
Chinese related user search query into the different form by at
least one of adding, deleting or substituting at least one of the
characters of the Chinese related user search query.
22. The system according to claim 20 wherein the calculation of the
similarity score comprises a computation of an edit distance
between the converted user search query and the member of the
candidate set.
23. The system according to claim 20 wherein the converter is
operative to convert the candidate set of the plurality of search
keywords into the different form for comparison with the converted
user search query.
Description
[0001] This application is a continuation-in-part application to
U.S. patent application Ser. No. 11/365,315 (U.S. Pat. Pub. No.
2007/0203894), entitled "SYSTEM AND METHOD FOR IDENTIFYING RELATED
QUERIES FOR LANGUAGES WITH MULTIPLE WRITING SYSTEMS," filed Feb.
28, 2006, the disclosure of which is hereby incorporated by
reference.
BACKGROUND
[0002] Online advertising may be an important source of revenue for
enterprises engaged in electronic commerce. A number of different
kinds of web page based online advertisements are currently in use,
along with various associated distribution requirements,
advertising metrics, and pricing mechanisms. Processes associated
with technologies such as Hypertext Markup Language (HTML) and
Hypertext Transfer Protocol (HTTP) enable a web page to be
configured to contain a location for inclusion of an advertisement.
A page may not only be a web page, but any other electronically
created page or document. An advertisement can be selected for
display each time the page is requested, for example, by a browser
or server application.
[0003] Online advertising may be linked to online searching. Online
searching is a common way for consumers to locate information,
goods, or services on the Internet. A consumer may use an online
search engine to type in a query to search for other pages or web
sites with information related to that query. When the advertising
that is shown on the search engine page is related to the query,
the search may be referred to as a sponsored search. Sponsored
searching may require advertisers to bid for search keywords. The
search keywords are associated with the search query for displaying
advertisements with the search results. It may be difficult to
identify which keyword(s) a search query is related to. In
particular, users may enter search queries that are misspelled or
that are in a different language.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] The system and method may be better understood with
reference to the following drawings and description. Non-limiting
and non-exhaustive embodiments are described with reference to the
following drawings. The components in the drawings are not
necessarily to scale, emphasis instead being placed upon
illustrating the principles of the invention. In the drawings, like
referenced numerals designate corresponding parts throughout the
different views.
[0005] FIG. 1 is a block diagram of an exemplary network
system;
[0006] FIG. 2 is a block diagram of a language analyzer;
[0007] FIG. 3 is a block diagram of exemplary conversion forms;
[0008] FIG. 4 is a block diagram of exemplary comparisons of
queries;
[0009] FIG. 5 is a flow diagram for identifying related queries;
and
[0010] FIG. 6 is a block diagram of a general computer system for
use with the disclosed embodiments.
DETAILED DESCRIPTION
[0011] By way of introduction, the embodiments described below
include a system and method for identifying and measuring related
queries. The embodiments relate to identifying similar Chinese
queries. A user query may be compared with known search keywords or
other search queries. The search keywords may be used by
advertisers for sponsored searching. The user query may be a
non-native language query, such as a Chinese related query in an
English language website or a query in a Chinese website. The user
query is converted into a different form before comparing with
other converted queries or the search keywords. For explanation
purposes, the embodiments are described in terms of a Chinese
related query, but other languages or query platforms may be used.
A similarity score based on various features may be used for
comparing the queries. Based on the similarity score or other
comparison features, the original user query may be substituted by
other queries or be associated with one or more search keywords.
The associated search keywords may be used for selecting the
advertisements that are displayed with the search results for that
search query.
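The flow in paragraph [0011] (convert the query, compare it with
candidate keywords, and pick a keyword by similarity score) can be
sketched as follows. This is only an illustrative sketch under stated
assumptions: the names edit_distance, convert, and best_keyword are
hypothetical, the conversion shown is a simple lowercase and
whitespace normalization rather than any of the Chinese-specific
forms, and a plain Levenshtein edit distance stands in for the
similarity score.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def convert(text: str) -> str:
    """Hypothetical conversion step: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def best_keyword(query: str, keywords: list[str]) -> tuple[str, int]:
    """Return the keyword closest to the query, with its edit distance."""
    converted_query = convert(query)
    scored = [(kw, edit_distance(converted_query, convert(kw)))
              for kw in keywords]
    return min(scored, key=lambda pair: pair[1])


print(best_keyword("Digtal  Camera", ["camera bag", "digital camera"]))
```

In this sketch a lower distance means a closer match, so the keyword
with the minimum distance is selected; a deployed system might combine
several features into the score instead.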
[0012] Alternatively, related queries may be identified from a
reformulation of the original query. The reformulation may be based
on stored query logs and used to compare the original query with
stored queries. As part of the comparison, various features,
including language specific features, may be used to measure query
similarity. Based on the query similarity, the original query may be
replaced by a stored query or search keyword for identifying
the relevant advertisements to display. A user's query may be
misspelled and the system may identify a related query that is
correctly spelled that replaces the initial user query. Chinese
related queries may be identified and measured due to an increased
interest in Chinese search and advertising markets.
[0013] Other systems, methods, features and advantages will be, or
will become, apparent to one with skill in the art upon examination
of the following figures and detailed description. It is intended
that all such additional systems, methods, features and advantages
be included within this description, be within the scope of the
invention, and be protected by the following claims. Nothing in
this section should be taken as a limitation on those claims.
Further aspects and advantages are discussed below in conjunction
with the embodiments.
[0014] FIG. 1 provides a simplified view of a network system 100 in
which the present embodiments may be implemented. Not all of the
depicted components may be required, however, and some embodiments
of the invention may include additional components not shown in the
figure. Variations in the arrangement and type of the components
may be made without departing from the spirit or scope of the
claims as set forth herein. Additional, different or fewer
components may be provided.
[0015] FIG. 1 is a block diagram illustrating an embodiment of an
exemplary network system 100 for language analysis and comparison.
In particular, system 100 includes a language analyzer 104 that may
receive and convert a user's search query for comparison with other
queries or search keywords. A user device 106 is coupled with a
search engine 102 through the network 109. The search engine 102 is
coupled with a search log database 112, and both are coupled with
the language analyzer 104. The search log database 112 is coupled
with a data source 113 and a unit dictionary 116. An ad server 103
may be coupled with the search engine 102 and/or coupled with the
language analyzer 104. Herein, the phrase "coupled with" is defined
to mean directly connected to or indirectly connected through one
or more intermediate components. Such intermediate components may
include both hardware and software based components. Variations in
the arrangement and type of the components may be made without
departing from the spirit or scope of the claims as set forth
herein.
[0016] The user device 106 may be a computing device for a user to
connect to a network 109, such as the Internet. Examples of a user
device include but are not limited to a personal computer, personal
digital assistant ("PDA"), cellular phone, or other electronic
device. The user device 106 may be configured to access other
data/information in addition to web pages over the network 109 with
a web browser, such as INTERNET EXPLORER® (sold by Microsoft
Corp., Redmond, Wash.). The user device 106 may enable a user to
view pages over the network 109, such as the Internet. The user
device 106 may be the user device described below with respect to
FIG. 6.
[0017] The user device 106 may be configured to allow a user to
interact with the search engine 102, the ad server 103, or other
components of the system 100. In one embodiment, the user device
106 may receive and display a site or page provided by the search
engine 102. The user device 106 may include a keyboard, keypad or a
cursor control device, such as a mouse, or a joystick, touch screen
display, remote control or any other device operative to allow a
user to interact with the page(s) provided by the search engine 102
and/or the ad server 103.
[0018] The search engine 102 is coupled with the user device 106
through the network 109, as well as being coupled with the language
analyzer 104, the ad server 103 and/or the search log database 112.
In one embodiment, the search engine 102 is a web server. The
search engine 102 may provide a site or a page over a network, such
as the network 109 or the Internet. A site or page may refer to a
web page or a series of related web pages which may be received or
viewed over a network. The site or page is not limited to a web
page, and may include any information accessible over a network
that may be displayed at the user device 106. In one embodiment, a
site may refer to a series of pages which are linked by a site map.
For example, the web site of www.yahoo.com (operated by Yahoo!
Inc., in Sunnyvale, Calif.) may include thousands of pages, which
are included at yahoo.com. Hereinafter, a page will be described as
a web page, a web site, or any other site/page accessible over a
network. A user of the user device 106 may access a page provided
by the search engine 102 over the network 109. As described below,
the page provided by the search engine 102 may be a search page
that receives a search query from the user device 106 and provides
search results that are based on the received search query.
[0019] The search engine 102 may include an interface, such as a
web page, e.g., the web page which may be accessed on the World
Wide Web at yahoo.com, which is used to search for pages which are
accessible via the network 109. The user device 106, autonomously
or at the direction of the user, may input a search query (also
referred to as a user query, original query, search term or a
search keyword) for the search engine 102. A single search query
may include multiple words or phrases. The search engine 102 may
perform a search for the search query and display the results of
the search on the user device 106. The results of a search may
include a listing of related pages or sites that is provided by the
search engine 102 in response to receiving the search query.
[0020] The ad server 103 is coupled with the search engine 102
and/or the language analyzer 104. The ad server 103 may be
configured to provide advertisements to the search engine 102. In
an alternate embodiment, the search engine 102 and the ad server
103 may be a common component and/or the search engine 102 may
select and provide advertisements. The ad server 103 may include or
be coupled with an advertisement database that includes
advertisements that are available to be displayed by the search
engine 102 for sponsored searching. In addition, the advertisements
may be associated with one or more search keywords. The search
keywords may be purchased or bid on by advertisers. Accordingly,
when that search keyword is searched for, the advertiser who
purchased or placed the highest bid is selected and their
advertisement is displayed. The ad server 103 may include or be
coupled with a database, such as an advertisement database, that
stores search keywords and the respective price or bid for each
keyword from advertisers that is referenced for each search query.
In one embodiment, a search query is received and compared with
known search keywords or other search queries when the ad server
103 selects and provides the advertisement to the search engine
102.
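The highest-bid selection described in paragraph [0020] can be
illustrated with a minimal sketch. The BIDS table, the advertiser
names, the bid amounts, and the select_ad function are all
hypothetical stand-ins for the advertisement database; the actual
storage and auction rules are not specified here.

```python
from typing import Optional

# Hypothetical stand-in for the advertisement database:
# search keyword -> list of (advertiser, bid) pairs.
BIDS = {
    "digital camera": [("advertiser_a", 0.40), ("advertiser_b", 0.55)],
    "camera bag": [("advertiser_c", 0.10)],
}


def select_ad(keyword: str, bids: dict) -> Optional[str]:
    """Return the advertiser with the highest bid for the keyword, if any."""
    offers = bids.get(keyword)
    if not offers:
        return None  # no advertiser has bid on this keyword
    advertiser, _bid = max(offers, key=lambda offer: offer[1])
    return advertiser


print(select_ad("digital camera", BIDS))
```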
[0021] The search log database 112 includes records or logs of at
least a subset of the search queries entered in the search engine
102 over a period of time and may also be referred to as a search
query log, search term database, keyword database or query
database. In one embodiment, the search log database 112 may store
the search keywords that are used by the ad server 103 in selecting
an advertisement for a particular search query. The search log
database 112 may include search queries from any number of users
over any period of time. Alternatively, the search log database 112
may include records or logs of a subset of the queries or requests
for data entered at the search engine 102 over a period of time.
The search log database 112 may also store associations between
search queries from the search engine 102. For example, a search
query may be associated with a search keyword or other search
queries after a conversion and comparison by the language analyzer
104 as discussed below.
[0022] The search log database 112 may also be coupled with a data
source 113. The data source 113 may be an internal source of data,
an external source of search data, or a combination of the two. An
external data source may include search results from other search
engines or other sources. For example, a search engine other than
search engine 102 may be an external data source and provide search
logs to the search log database 112. An internal data source may
include search data or other data from the search engine 102. Other
data may include other searching or web browsing tendencies
identified by the search engine 102.
[0023] The search log database 112 may also be coupled with a unit
dictionary 116. The unit dictionary 116 may be a database of user
queries or search keywords that are coupled with one another as
units. Units may also be referred to as concepts or topics and are
sequences of one or more words that appear in search queries. For
example, the search query "New York City law enforcement" may
include two units, e.g. "New York City" may be one unit and "law
enforcement" may be another unit. A unit is a phrase of common
words that identify a single concept. As another example, the
search query "Chicago art museums" may include two units, e.g.
"Chicago" and "art museums." The "Chicago" unit is a single word,
and "art museums" is a two-word unit. Units identify common groups
of keywords to maximize the efficiency and relevance of search
results. The unit dictionary 116 may include Chinese related
queries, as well as Chinese related units that include Chinese
characters. Categorization of search queries into units is
discussed in commonly owned U.S. Pat. No. 7,051,023 issued May 23,
2006, entitled "SYSTEMS AND METHODS FOR GENERATING CONCEPT UNITS
FROM SEARCH QUERIES," which is hereby incorporated by
reference.
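The unit examples in paragraph [0023] can be reproduced with a small
sketch. It assumes a greedy longest-match lookup against a
hypothetical in-memory unit dictionary; this is a swapped-in
technique for illustration and is not the method of the incorporated
patent. The function name segment_into_units and the max_len
parameter are also assumptions.

```python
def segment_into_units(query: str, unit_dictionary: set[str],
                       max_len: int = 4) -> list[str]:
    """Split a query into units by greedy longest match on whole words."""
    words = query.lower().split()
    units = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shrink one word at a time.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            # A single word is always accepted so the scan makes progress.
            if n == 1 or candidate in unit_dictionary:
                units.append(candidate)
                i += n
                break
    return units


# Hypothetical unit dictionary built from the examples in the text.
UNITS = {"new york city", "law enforcement", "chicago", "art museums"}

print(segment_into_units("New York City law enforcement", UNITS))
print(segment_into_units("Chicago art museums", UNITS))
```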
[0024] The unit dictionary 116 and the categorization of search
queries into units may be used to compare and analyze search
queries received by the search engine 102. A search query may be
broken into units that are compared with units from other queries
or search keywords. In one embodiment, past search queries and
search keywords are stored in the search log database 112 as units
that may be used in an analysis by the language analyzer 104.
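One possible way to compare two queries at the unit level, as
paragraph [0024] suggests, is to measure how many units they share.
The sketch below assumes the queries have already been broken into
units and uses a Jaccard ratio as the comparison; both the measure and
the name unit_overlap are assumptions made for illustration.

```python
def unit_overlap(units_a: list[str], units_b: list[str]) -> float:
    """Jaccard overlap of two unit lists: shared units / all distinct units."""
    a, b = set(units_a), set(units_b)
    if not a and not b:
        return 0.0  # avoid dividing by zero when both queries are empty
    return len(a & b) / len(a | b)


print(unit_overlap(["new york city", "law enforcement"],
                   ["new york city", "hotels"]))
```

A value of 1.0 means the two queries contain exactly the same units,
and 0.0 means they share none.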
[0025] In one embodiment, the ad server 103, the search engine 102
and/or the search log database 112 may be coupled with the language
analyzer 104. The language analyzer 104 receives a user query from
the user device 106 and matches or identifies other queries or
search keywords. The user query may be converted to a different
form for comparing various features of the user query with search
keywords as discussed with respect to FIG. 2.
[0026] The language analyzer 104 may be a computing device as
described below with respect to FIG. 6. In one embodiment, the
language analyzer 104 includes a processor 105, memory 107,
software 108 and an interface 110. The language analyzer 104 may be
a separate component from the search engine 102 and the ad server
103. In an alternative embodiment, any of the language analyzer
104, search engine 102, and the ad server 103 may be combined as a
single component. The interface 110 may communicate with any of the
search engine 102, search log database 112, and ad server 103. In
one embodiment, the interface 110 may include a user interface
configured to allow a user to interact with any of the components
of the language analyzer 104. For example, a user may be able to
modify the conversion form or comparison features that are used by
the language analyzer 104.
[0027] The processor 105 in the language analyzer 104 may include a
central processing unit (CPU), a graphics processing unit (GPU), a
digital signal processor (DSP) or other type of processing device.
The processor 105 may be a component in any one of a variety of
systems. For example, the processor 105 may be part of a standard
personal computer or a workstation. The processor 105 may be one or
more general processors, digital signal processors, application
specific integrated circuits, field programmable gate arrays,
servers, networks, digital circuits, analog circuits, combinations
thereof, or other now known or later developed devices for
analyzing and processing data. The processor 105 may operate in
conjunction with a software program, such as code generated
manually (i.e., programmed).
[0028] The processor 105 may be coupled with a memory 107, or the
memory 107 may be a separate component. The interface 110 and/or
the software 108 may be stored in the memory 107. The memory 107
may include, but is not limited to, computer readable storage media
such as various types of volatile and non-volatile storage media,
including random access memory, read-only memory, programmable
read-only memory, electrically programmable read-only memory,
electrically erasable read-only memory, flash memory, magnetic tape
or disk, optical media and the like. In one embodiment, the memory
107 includes a random access memory for the processor 105. In
alternative embodiments, the memory 107 is separate from the
processor 105, such as a cache memory of a processor, the system
memory, or other memory. The memory 107 may be an external storage
device or database for storing recorded image data. Examples
include a hard drive, compact disc ("CD"), digital video disc
("DVD"), memory card, memory stick, floppy disc, universal serial
bus ("USB") memory device, or any other device operative to store
image data. The memory 107 is operable to store instructions
executable by the processor 105.
[0029] The functions, acts or tasks illustrated in the figures or
described herein may be performed by the programmed processor
executing the instructions stored in the memory 107. The functions,
acts or tasks are independent of the particular type of instruction
set, storage media, processor or processing strategy and may be
performed by software, hardware, integrated circuits, firmware,
micro-code and the like, operating alone or in combination.
Likewise, processing strategies may include multiprocessing,
multitasking, parallel processing and the like. The processor 105
is configured to execute the software 108. The software 108 may
include instructions for analyzing and converting search queries
and comparing features with other queries or search keywords.
[0030] The interface 110 may be a user input device or a display.
The interface 110 may include a keyboard, keypad or a cursor
control device, such as a mouse, or a joystick, touch screen
display, remote control or any other device operative to interact
with the language analyzer 104. The interface 110 may include a
display coupled with the processor 105 and configured to display an
output from the processor 105. The display may be a liquid crystal
display (LCD), an organic light emitting diode (OLED), a flat panel
display, a solid state display, a cathode ray tube (CRT), a
projector, a printer or other now known or later developed display
device for outputting determined information. The display may act
as an interface for the user to see the functioning of the
processor 105, or as an interface with the software 108 for
providing input parameters. In particular, the interface 110 may
allow a user to interact with the language analyzer 104 to
establish a conversion of a user query and the features that are
compared in matching a query with a search keyword.
[0031] Any of the components in system 100 may be coupled with one
another through a network. For example, the language analyzer 104
may be coupled with the search engine 102, search log database 112,
or ad server 103 via a network. Any of the components in system 100
may include communication ports configured to connect with a
network. The present disclosure contemplates a computer-readable
medium that includes instructions or receives and executes
instructions responsive to a propagated signal, so that a device
connected to a network can communicate voice, video, audio, images
or any other data over a network. The instructions may be
transmitted or received over the network via a communication port
or may be a separate component. The communication port may be
created in software or may be a physical connection in hardware.
The communication port may be configured to connect with a network,
external media, display, or any other components in system 100, or
combinations thereof. The connection with the network may be a
physical connection, such as a wired Ethernet connection or may be
established wirelessly as discussed below. Likewise, the
connections with other components of the system 100 may be physical
connections or may be established wirelessly.
[0032] The network or networks that may connect any of the
components in the system 100 to enable communication of data
between the devices may include wired networks, wireless networks,
or combinations thereof. The wireless network may be a cellular
telephone network, a network operating according to a standardized
protocol such as IEEE 802.11, 802.16, 802.20, published by the
Institute of Electrical and Electronics Engineers, Inc., or a WiMax
network. Further, the network(s) may be a public network, such as
the Internet, a private network, such as an intranet, or
combinations thereof, and may utilize a variety of networking
protocols now available or later developed including, but not
limited to TCP/IP based networking protocols. The network(s) may
include one or more of a local area network (LAN), a wide area
network (WAN), a direct connection such as through a Universal
Serial Bus (USB) port, and the like, and may include the set of
interconnected networks that make up the Internet. The network(s)
may include any communication method or employ any form of
machine-readable media for communicating information from one
device to another. For example, the ad server 103 or the search
engine 102 may provide pages to the user device 106 over a network,
such as the network 109. The network or networks described above,
including the network 109, may be the network discussed below with
respect to FIG. 6.
[0033] The ad server 103, the search engine 102, the search log
database 112, the language analyzer 104, the unit dictionary 116
and/or the user device 106 may represent computing devices of
various kinds, such as the components described with respect to
FIG. 6. Such computing devices may generally include any device
that is configured to perform computation and that is capable of
sending and receiving data communications by way of one or more
wired and/or wireless communication interfaces. Such devices may be
configured to communicate in accordance with any of a variety of
network protocols, as discussed above. For example, the user device
106 may be configured to execute a browser application that employs
HTTP to request information, such as a web page, from the search
engine 102 or ad server 103. The present disclosure contemplates a
computer-readable medium that includes instructions or receives and
executes instructions responsive to a propagated signal, so that
any device connected to a network can communicate voice, video,
audio, images or any other data over a network.
[0034] FIG. 2 illustrates an embodiment of a language analyzer. As
described with respect to FIG. 1, the language analyzer 104 may
convert a search query into a different form for comparing its
features with other queries or search keywords that are used for
selecting matching advertisements to be displayed on a search
results page. The language analyzer 104 may include a receiver 202,
a converter 204, a comparator 206, and a calculator 208. As shown,
the language analyzer 104 or any of its components may represent
computing devices of various kinds, such as the components
described with respect to FIG. 6.
[0035] The receiver 202 may receive a user query from the search
engine 102, which may receive the user query from the user device
106. The receiver 202 may also receive search keywords from the ad
server 103. The search keywords may be matched with advertisements,
such that when a user inputs the search keyword in a search engine,
the search results page includes the matched advertisement.
Accordingly, the language analyzer 104 may match user queries with
search keywords for selecting advertisements to be displayed on the
search query results page.
[0036] The converter 204 is coupled with the receiver 202. The
converter 204 receives the user query or other search keywords and
converts them into a different form for comparison. As described,
the user query may be a Chinese related query and the converter 204
may convert the Chinese related query into a different form to aid
comparison. A Chinese related query may include any Chinese
characters, including Roman characters that represent a Chinese
character or phrase. Chinese related queries may also include
queries that originate from or are received by a Chinese search
engine and may be simplified Chinese and/or traditional
Chinese.
[0037] FIG. 3 illustrates exemplary conversion forms. In
particular, the converter 204 may utilize any of the conversion
forms 302 to convert a Chinese related query. The converter 204 may
convert a search query into any of the conversion forms 302 to
compare the query with other converted queries or converted search
terms. As described below, the conversion may include a
transformation of the query by adding, deleting, and/or
substituting characters or words in the queries. The conversion or
transformation may result in a common format or common form that
may be used for comparing the queries. The conversion forms 302
shown in FIG. 3 are merely exemplary. In alternate embodiments,
there may be additional conversion forms 302 that are not
illustrated or described. The conversion may receive a Chinese
related query and convert each element or selected elements of the
query into an array that represents the converted form of the
Chinese related query.
[0038] A first conversion form is a conversion into Chinese soundex
304. The Chinese characters are converted into pinyin without tone,
while the roman letters remain. The query is then converted into a
Chinese soundex-like representation by first retaining the first
letter of a string. Second, all occurrences of a, e, h, i, o, and u
are removed, except where one is the first letter. Third, characters may
be replaced, such as by replacing "zh" with "z," "ch" with "c," "sh"
with "s," "ng" with "n," "rd" with "d," "rl" with "l," "rn" with
"n," "rs" with "s," and/or "rt" with "t." Fourth, the remaining
letters after the first letter are assigned a number, such as, (m,
n, l)=1, (b, p)=2, (f, v, w, h)=3, (d, t)=4, (j, z, s, x, q, c, g,
k)=5, (r)=6, (y)=7, and (a)=8. Fifth, if two or more letters are
adjacent, then the first letter remains and the others are omitted.
Sixth, the spaces are removed. Seventh, all characters remaining
are returned.
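The seven steps above may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the input is assumed to be already converted to pinyin without tone, the whole query is treated as one string, the fifth step is read as collapsing runs of identical adjacent symbols, and the function name `chinese_soundex` is hypothetical.

```python
from itertools import groupby

# Step three: digraph replacements, applied in the order listed in the text.
REPLACEMENTS = [("zh", "z"), ("ch", "c"), ("sh", "s"), ("ng", "n"), ("rd", "d"),
                ("rl", "l"), ("rn", "n"), ("rs", "s"), ("rt", "t")]

# Step four: letter groups and the number assigned to each group.
CODES = {}
for letters, code in [("mnl", "1"), ("bp", "2"), ("fvwh", "3"), ("dt", "4"),
                      ("jzsxqcgk", "5"), ("r", "6"), ("y", "7"), ("a", "8")]:
    for letter in letters:
        CODES[letter] = code


def chinese_soundex(pinyin: str) -> str:
    """Soundex-like code for a query already written as toneless pinyin."""
    s = pinyin.lower()
    if not s:
        return ""
    # Steps one and two: keep the first letter, drop a, e, h, i, o, u elsewhere.
    s = s[0] + "".join(ch for ch in s[1:] if ch not in "aehiou")
    # Step three: replace digraphs.
    for old, new in REPLACEMENTS:
        s = s.replace(old, new)
    first, rest = s[0], s[1:]
    # Step four: number the remaining letters (unknown symbols pass through).
    coded = [CODES.get(ch, ch) for ch in rest]
    # Step five (assumed reading): collapse runs of identical adjacent symbols.
    collapsed = [symbol for symbol, _ in groupby(coded)]
    # Steps six and seven: remove spaces and return what remains.
    return (first + "".join(collapsed)).replace(" ", "")


print(chinese_soundex("zhong guo"))  # z15
print(chinese_soundex("beijing"))    # b51
```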
[0039] A second conversion form is converting the Chinese
characters into the keyboard input form zhuyin (Bopomofo) 306. Each
element in the array is either all zhuyin characters for one
corresponding Chinese character or a roman character originally in
the query without transformation. A third conversion form is a
similar zhuyin (Bopomofo) conversion 308, except each element in
the array is either one zhuyin character or a roman character
originally in the query without transformation.
[0040] A fourth conversion form is converting Chinese characters
into radicals 310. Each element in the array is either the radical
for a Chinese character or the roman character originally in the
query without transformation. A radical 310 may be the semantic
root (i.e., portion bearing the meaning) of a Chinese character. A
radical may be part of a Chinese character and/or the semantic
component of this Chinese character. For example, in the character
pronounced as jie with a meaning of "sister", the left part
(pronounced nǚ in Mandarin Chinese)
is the semantic component. Chinese characters may have at least one
or two radicals. The radicals may be used for Chinese Hanzi. A
dictionary may be used to match a Chinese character with its
radical(s). When a Chinese character has multiple radicals, the
most meaningful radical (which may be identified in a dictionary)
may be considered for comparison.
[0041] A fifth conversion form is converting Chinese characters
into pinyin without tone 312. Each element in the array is either
the complete pinyin without tone for one corresponding Chinese
character or a roman character originally in the query without any
transformation. A sixth conversion form is converting Chinese
characters into pinyin without tone 314 in which each element in
the array is either one pinyin character or a roman character
originally in the query without transformation. Pinyin may be a
Standard Mandarin Romanization system. In pinyin, the pin refers to
a "spelling" and the yin refers to a "sound." There may be a pinyin
corresponding to each Chinese Character. One pinyin may include
more than two roman characters. In the fifth conversion form, each
pinyin may be a unit for similarity comparison. In the sixth
conversion form, each character within pinyin may be a unit for
comparison.
[0042] A seventh conversion form is converting Chinese characters
into pinyin with tone 316. Each element in the array is either the
complete pinyin and its tone for one corresponding Chinese
character or a roman character originally in the query without
transformation. An eighth conversion form is converting Chinese
characters into pinyin with tone 318 in which each element in the
array is either one pinyin character, its tone, or a roman
character originally in the query without transformation. A ninth
conversion form is converting queries into two character-based
arrays 320. In particular, if a character is Chinese, the three bytes
of its UTF-8 encoding form an element in the array. In other words, each
Chinese character is represented in three bytes. If a character is
roman, then the roman character itself is an element.
[0043] A tenth conversion form is the removal of Chinese characters
322. The roman characters are left in the query and the Chinese
characters are removed. Likewise, an eleventh conversion form
removes the roman characters 324, and keeps the Chinese characters
in the query. A twelfth conversion form includes leaving the query
as inputted 326. In other words, the twelfth conversion is no
conversion 326.
[0044] In one embodiment, the receiver 202 receives two queries
that are to be compared to determine the similarity between those
queries. The queries are converted into at least one of the
conversion forms by the converter 204. In one embodiment, both
queries are converted into the twelve exemplary conversion forms
302 and the queries are compared in all twelve converted forms.
Alternatively, certain conversion forms are selected for converting
the queries and the queries are compared for each of those
converted forms.
[0045] After being converted, the queries may be compared by the
comparator 206. The comparator 206 may be configured to perform
comparison of a user's search query with other queries or with
search keywords that are used by the ad server 103 for displaying
relevant advertisements that are linked to particular search
keywords. In one embodiment, the comparator 206 determines the
similarity between two queries. The queries are first converted
into a similar form or similar forms by the converter 204 and each
of those forms is compared by the comparator 206. In one
embodiment, the queries are converted into the twelve forms
illustrated in FIG. 3 and the comparator 206 makes twelve
comparisons between the queries for each of the twelve conversions
of each query. In alternative embodiments, there may be more or
fewer conversion forms that are compared by the comparator 206.
[0046] In one embodiment, a user query may be compared with a
candidate set of queries to determine which of the candidate set is
most similar to the user query. The candidate set may be made up of
search keywords which are compared with the user query to determine
which search keyword is most similar. The candidate set of queries
or keywords for comparison may be chosen based on an initial
analysis of the user query compared with the search log database
112. In one embodiment, when the user query is received the
candidate set is identified and each member of the candidate set is
compared with the user query to determine which is most similar. As
described below, a similarity score may be calculated for each
member of the candidate set that represents a similarity with the
user query. The member of the candidate set with the closest
similarity score may be most similar to the user query. In an
alternative embodiment, the candidate set may include one query or
include all queries, such as those stored in the search log
database 112.
[0047] FIG. 4 illustrates exemplary comparisons of queries. In
particular, the comparator 206 may utilize comparison features 402
when comparing queries. The comparison features 402 shown in FIG. 4
are merely exemplary. In alternate embodiments, there may be
additional comparison features 402 that are not illustrated or
described. The comparison may involve comparing various forms of
converted Chinese related queries. In particular, the comparator
206 may compare an array of elements that is generated by the
converter 204 as a converted form of a Chinese related query. In
one embodiment, the comparator 206 may compare queries as described
in the commonly owned U.S. application entitled, "SYSTEM AND METHOD
FOR IDENTIFYING RELATED QUERIES FOR LANGUAGES WITH MULTIPLE WRITING
SYSTEMS," U.S. Pat. Pub. No. 2007/0203894, filed Feb. 28, 2006, the
disclosure of which is hereby incorporated by reference.
[0048] A first comparison feature may be an edit distance 404
between two queries. The edit distance may be a measure of the
difference between two character strings, such as queries. In one
embodiment, the edit distance may be a minimum number of edit
operations required to transform the first query into the second
query. The edit operation may include inserting a character into a
string, deleting a character from a string, or replacing a character
by another character. In an alternative embodiment, weights may be assigned
for different edit operations. For example, a higher weight may be
placed on replacing the character s by the character p, than on
replacing it by the character a. The edit distance may be the
Levenshtein distance or the Damerau-Levenshtein distance when a
transposition of characters counts as a single edit operation. In
alternative embodiments, there may be other algorithms that are
used for determining the edit distance between queries or there may
be more or fewer edit operations that are used in determining an
edit distance between queries.
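The unweighted Levenshtein distance mentioned above may be computed with the standard dynamic-programming recurrence. The following Python sketch is one conventional formulation, not necessarily the one used in any embodiment; it accepts any two sequences, so it works both on strings and on the arrays produced by the conversion forms.

```python
def levenshtein(a, b) -> int:
    """Minimum number of insertions, deletions, and replacements needed
    to turn sequence a into sequence b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # replace x by y (or match)
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A weighted variant, as described in the alternative embodiment, would replace the constant costs of 1 with per-operation or per-character weights.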
[0049] A second comparison feature may be an edit distance without
a domain 406. In particular, two queries may have their domains
removed before computing the edit distance. The domain may be a web
domain, such as ".com" that is removed. The removal of the domain
may be helpful because a user querying "yahoo.com" and "yahoo.net"
is likely making the same query. A third comparison feature may be
a character level prefix overlap 408. The character level prefix
overlap 408 may be a measure of the characters/words that are the
same at the beginning of the queries. For example, "auto cleaners"
and "auto cleaning" have a prefix overlap of "auto clean." The
prefix overlap may indicate increased similarity. A fourth
comparison feature may be a character level suffix overlap 410. The
character level suffix overlap 410 measures the similarity between
queries at the end of the query. For example, "auto insurance
agent" and "home insurance agent" share a suffix overlap of
"insurance agent." Similar to the prefix overlap, the suffix
overlap may indicate increased similarity.
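Domain removal and the character level prefix and suffix overlaps may be sketched in Python as follows. The list of domain suffixes in `DOMAIN_SUFFIX` is a hypothetical, non-exhaustive assumption, as are the function names.

```python
import re

# Hypothetical, non-exhaustive list of web domain suffixes to strip.
DOMAIN_SUFFIX = re.compile(r"\.(com|net|org)\b")


def strip_domain(query: str) -> str:
    """Remove a web domain suffix such as ".com" from the query."""
    return DOMAIN_SUFFIX.sub("", query)


def common_prefix(a: str, b: str) -> str:
    """Longest run of characters shared at the beginning of both queries."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def common_suffix(a: str, b: str) -> str:
    """Longest run of characters shared at the end of both queries."""
    return common_prefix(a[::-1], b[::-1])[::-1]


print(strip_domain("yahoo.com") == strip_domain("yahoo.net"))  # True
print(common_prefix("auto cleaners", "auto cleaning"))         # auto clean
```

At the character level, the suffix shared by "auto insurance agent" and "home insurance agent" also includes the space that precedes "insurance".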
[0050] A fifth comparison feature may be a minimum edit distance
412 over all the conversion forms. Likewise, a sixth comparison
feature may be a maximum edit distance 414 over all the conversion
forms. Given twelve conversion forms and twelve edit distances for
each conversion, the minimum edit distance 412 and the maximum edit
distance 414 may be identified. In one embodiment, the minimum and
maximum may be removed as outliers. Alternatively, the minimum or
maximum may be weighted higher when computing a similarity score. A
seventh comparison feature may be a minimum edit distance without a
domain 416 and an eighth comparison feature may be a maximum edit
distance without a domain 418. As discussed above, the domains in a
query may not be valuable in terms of determining what the user is
searching for, so the domains are removed before comparison.
[0051] Additional comparison features may be a word level edit
distance 420, a word level prefix overlap 422, or a word level
suffix overlap 424. The word level comparisons are similar to the
character level comparisons, except entire words are compared
rather than individual characters. A length difference 426 between
two queries may also be used for comparing.
[0052] The comparator 206 may be coupled with a calculator 208 that
may calculate a similarity score. The similarity score may be a
measure of the similarity between the queries. The similarity score
may be calculated based on individual comparisons of different
conversion forms of two queries with each individual comparison
being assigned a weighted value. The multiple conversion forms
described with respect to FIG. 3 may each result in a separate
comparison between two queries. Accordingly, using the twelve
conversion forms 302, there may be twelve different edit distances
or similarity scores, one for each conversion. Those twelve
converted forms may be compared and multiplied by a weight for each
form to get an overall similarity score between the queries.
Alternatively, a subset of the twelve conversion forms or
additional conversion forms not described may be utilized to
convert Chinese related queries into different forms for
comparison.
[0053] In one embodiment, the equation presented in Table A may be
used to calculate a similarity score indicating the strength of
similarity between a query pair. The query pair may include a given
query q and a comparison query MODS(q), either of which may be
written according to one or more Chinese writing systems. MODS(q)
may represent a converted query. In alternative embodiments, both q
and MODS(q) may be converted to the same form for comparison, or
MODS(q) is converted into a form for comparison with q. MODS(q) may
represent a related query that is identified as a potential
substitute for the user query q. When MODS(q) has a good similarity
score with the user's original query q, MODS(q) may be used as a
search keyword for fetching advertisements. MODS(q) may also be
referred to as a rewritten query. Both the user's original query q and
MODS(q) may be converted to the same form for comparison. The
equation in Table A makes use of a subset of the conversion forms
302 and the comparison features 402 that are discussed above. In
alternative embodiments, different conversion forms or comparison
features may be utilized to generate a similarity score. Those of
skill in the art recognize that the equation illustrated in Table A
is merely exemplary and may be modified so as to provide for the
calculation of a similarity score for multiple writing systems. A
formula may be optimized based on the source of the query, because
queries from Taiwan may be different from queries from Hong Kong.
Accordingly, the conversions, comparisons, and weights may be
modified for different types of queries.

TABLE A

$$
\begin{aligned}
\mathrm{LM1Score}(q, \mathrm{MODS}(q)) = 2.542
&- 0.1778 \cdot \mathrm{pq12min}(q, \mathrm{MODS}(q)) \\
&- 0.3316 \cdot \mathrm{levroman}(q, \mathrm{MODS}(q)) \\
&- 1.064 \cdot \mathrm{agreechar}(q, \mathrm{MODS}(q)) \\
&+ 1.098 \cdot \mathrm{dlevpynchar}(q, \mathrm{MODS}(q)) \\
&- 0.2432 \cdot \mathrm{q1bidded}(q, \mathrm{MODS}(q)) \\
&+ 0.3486 \cdot \mathrm{wordr}(q, \mathrm{MODS}(q)) \\
&+ 0.2487 \cdot \mathrm{q2hasroman}(q, \mathrm{MODS}(q)) \\
&- 0.1284 \cdot \mathrm{pq12max}(q, \mathrm{MODS}(q)) \\
&- 0.4667 \cdot \mathrm{levtaiwanchar}(q, \mathrm{MODS}(q)) \\
&- 0.2875 \cdot \mathrm{lengthdiffn}(q, \mathrm{MODS}(q)) \\
&- 0.0006 \cdot \mathrm{entropy21min}(q, \mathrm{MODS}(q)) \\
&- 0.2875 \cdot \mathrm{lengthsubtminGT3}(q, \mathrm{MODS}(q))
\end{aligned}
$$
[0054] According to the equation presented in Table A, q represents
a given query written according to one or more Chinese writing
systems and MODS(q) represents a query selected from a candidate
set of potential queries related to query q. Alternatively, query q
may be referred to as query q1 and MODS(q) may be referred to as
query q2 or q'. The initial number before each feature is a weight
that may be used to emphasize or deemphasize features. The
exemplary features utilized in the equation presented in Table A
are described below.
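Because the equation in Table A is a weighted sum of feature values plus a constant, it may be evaluated with a few lines of Python. In this sketch the weights and the constant are copied from Table A, while the dictionary layout, the function name `lm1_score`, and the convention that a missing feature counts as 0 are assumptions made for illustration.

```python
# Constant term and weights as they appear in Table A.
INTERCEPT = 2.542
WEIGHTS = {
    "pq12min": -0.1778,
    "levroman": -0.3316,
    "agreechar": -1.064,
    "dlevpynchar": 1.098,
    "q1bidded": -0.2432,
    "wordr": 0.3486,
    "q2hasroman": 0.2487,
    "pq12max": -0.1284,  # name as written in Table A
    "levtaiwanchar": -0.4667,
    "lengthdiffn": -0.2875,
    "entropy21min": -0.0006,
    "lengthsubtminGT3": -0.2875,
}


def lm1_score(features: dict) -> float:
    """Weighted sum of feature values; absent features are treated as 0
    (an assumption of this sketch)."""
    return INTERCEPT + sum(weight * features.get(name, 0.0)
                           for name, weight in WEIGHTS.items())


# Example call with two feature values supplied and the rest defaulting to 0.
score = lm1_score({"levroman": 0.25, "agreechar": 0.714})
```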
[0055] Pq12min may be a function for calculating the query
substitution probability of query q1 following query q2 in a log of
user query sessions, such as from the search log database 112. The
search log database 112 may identify the order of the one or more
queries submitted by the user, for example, to provide an
indication of how the user refined a query, how the user rewrote a
query, how the user utilized one or more alternate writing systems
of a language with multiple writing systems to express a query Q,
etc. When queries q1 and q2 follow one another in a search log
database 112, it may be an indication that they are similar because
q2 may be a refinement of q1. According to one embodiment, the
pq12min function calculates a query substitution probability of a
given query q1 following a given query q2, and may also be used to
calculate a unit substitution of a unit u following a given unit
u'. In one embodiment, pq12min=prob(U_i->U_i'|U_i)/max_j
prob(U_i->U_j|U_i), where U_i is q1 or its units, U_i' is
possible U_i substitutions, and U_j is q2 or its units. For query
suggestions, pq12min may be the normalized probability of q2 as
q1's substitution. In one embodiment, a normalized probability is
computed for each unit in q1 substituted by the corresponding unit in
q2, and the minimum of these probabilities is taken as pq12min.
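One possible reading of the pq12min computation is sketched below in Python. The nested dictionary of substitution counts, its toy values, and the function names are all hypothetical; the sketch also assumes that the maximum in the denominator is taken over every substitution observed for the same source unit.

```python
def normalized_substitution_prob(counts: dict, unit: str, substitute: str) -> float:
    """prob(unit -> substitute | unit) divided by the largest substitution
    probability observed for that unit."""
    total = sum(counts[unit].values())
    probs = {target: n / total for target, n in counts[unit].items()}
    return probs.get(substitute, 0.0) / max(probs.values())


def pq12min(counts: dict, unit_pairs: list) -> float:
    """Minimum normalized probability over the aligned unit pairs."""
    return min(normalized_substitution_prob(counts, u, v) for u, v in unit_pairs)


# Toy counts invented for illustration: counts[u][v] is how often unit u
# was rewritten as unit v in a hypothetical session log.
toy_counts = {
    "new york": {"manhattan": 3, "nyc": 6},
    "hotel": {"motel": 4, "inn": 1},
}
value = pq12min(toy_counts, [("new york", "manhattan"), ("hotel", "motel")])
```

In this toy data "manhattan" is half as likely as the most frequent substitute for "new york", while "motel" is the most frequent substitute for "hotel", so the minimum over the two units is 0.5.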
[0056] Levroman is a comparison using the roman characters of a
query, such as with conversion form 322, which removes Chinese
characters. For each query all non-roman characters may be removed,
but spaces are left in the query. The roman character parts are
changed into arrays. Each roman character is an element in the
array, including any spaces. The Levenshtein distance is measured
between the two arrays. In the case that neither q1 nor q2 has
roman characters, levroman is set to 0. In the case that one of q1
or q2 has roman characters but the other does not have roman
characters, levroman is set to 1. As an example, consider a first
query q1= map" and a second query q2= map." The first query does
not include a space before map, but the second query includes a
space before map. After the Chinese characters are removed, the
queries are converted into arrays, in which q1 is represented as
the array: ##STR1## and q2 is represented as the array: ##STR2##
The Levenshtein distance between the two arrays is one because of
the space in the first element of q2. Accordingly because there are
four elements, the Levenshtein distance may be represented as
1/4=0.25 and levroman is 0.25 for this query pair.
[0057] Agreechar may relate to character agreement without removing
a space regardless of the order of characters. Agreechar may be
similar to wordr discussed below, except it is for the character
level rather than the word level. In one embodiment, agreechar is
the proportion of unique characters in common between a query pair,
such as:

$$\mathrm{agreechar} = \frac{|C_{q1} \cap C_{q2}|}{|C_{q1} \cup C_{q2}|},$$

in which $C_{q1}$ is the set of unique characters (including space) in q1,
and $C_{q2}$ is the set of unique characters (including space) in
q2. In the levroman example, q1 and q2 have 7 unique characters in
total, which are "m", "a", "p" and a space. Query q1 and q2 share 5
unique characters, which are "m", "a" and "p". Therefore, agreechar
is 0.714 (calculated by 5/7) for this query pair.
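The character agreement is the ratio of shared unique characters to all unique characters, which may be written with Python sets. The function name and the handling of two empty queries (returning 0) are assumptions of this sketch.

```python
def agreechar(q1: str, q2: str) -> float:
    """Shared unique characters divided by all unique characters,
    counting spaces and ignoring character order."""
    c1, c2 = set(q1), set(q2)
    union = c1 | c2
    if not union:
        return 0.0  # assumed convention for two empty queries
    return len(c1 & c2) / len(union)


print(agreechar("abc", "abd"))  # 0.5
```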
[0058] Wordr is similar to agreechar except it matches words rather
than characters. The queries are separated into words, segments, or
units as described above. The percentage of unique words not in
common is determined for wordr. In other words, wordr=1-proportion
of unique words in common, such as:

$$\mathrm{wordr} = 1 - \frac{|w_{q1} \cap w_{q2}|}{|w_{q1} \cup w_{q2}|},$$

in which $w_{q1}$ is the set of unique words in q1, and
$w_{q2}$ is the set of unique words (including space) in q2. In the
previous example of levroman, map" is segmented into two words and
"map" and map" is segmented into two words and "map". There are
three unique words and one of them is common between q1 and q2, so
wordr is 1-1/3=0.667.
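The word level counterpart may be sketched in the same way. Splitting on whitespace is an assumption that only suits queries whose words are space-delimited; Chinese text would first need the segmentation into words or units described earlier. The function name is hypothetical.

```python
def wordr(q1: str, q2: str) -> float:
    """One minus the proportion of unique words that the queries share."""
    w1, w2 = set(q1.split()), set(q2.split())
    union = w1 | w2
    if not union:
        return 0.0  # assumed convention for two empty queries
    return 1 - len(w1 & w2) / len(union)


# Three unique words, one of them shared: 1 - 1/3.
value = wordr("auto insurance", "home insurance")
```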
[0059] Dlevpynchar utilizes the complete pinyin without tone 312
conversion form. The first query q1 and second query q2 first have
a common domain removed and each roman character (including spaces)
is kept, while each Chinese character is converted into pinyin
without tone. The queries are then transformed into arrays. Each
roman character is an element in the array and each Chinese
character's pinyin without tone is an element in the array. The
Levenshtein distance is then measured. In the example described
above, when query q1 map" and query q2 map" where there is no space
in query q1, but there is a space in query q2. The first query q1
is converted into an array: ##STR3## The second query q2 is
converted into an array: ##STR4## The Levenshtein distance is
computed between the two arrays to be 1/6=0.167, which may also be
the dlevpynchar value for this query pair.
[0060] Q1bidded is 1 if q1 is bidded and q1bidded is 0 if q1 is not
bidded. When q1 is a user query and q1 is bidded, it may mean that
an advertiser chooses q1 as a keyword for the advertisements they
want to show. This bidding process may also identify a cost they
would like to pay if web searchers click the ads fetched by the
keyword. When q1 is not bidded that may mean there are no matched
keywords in the advertisement database. Therefore, a query
identifying system may identify a related query (e.g. MODS(q)) to
substitute for the user query.
[0061] Q2hasroman is 1 if q2 contains any roman characters, but not
including any spaces. Q2hasroman is 0 if q2 does not contain any
roman characters. The queries that are analyzed may be from a Chinese
search engine or from a search engine that receives Chinese related
queries. A Chinese search engine may receive queries with roman
characters due to the usage of roman characters in Chinese and the
popularity of roman character based languages such as English. The
Chinese characters and roman characters may be processed
differently. For example, a Chinese character may be converted into
Pinyin for a similarity comparison, while Roman characters are not
converted into Pinyin. Accordingly, a similarity score computation
may be adjusted based on the presence of Roman characters.
[0062] Pq21max may be a function for calculating the query
substitution probability of query q1 following query q2 in a log of
user query sessions, such as from the search log database 112. In
one embodiment, pq21max=prob(U_i->U_i'|U_i')/max_j
prob(U_i->U_j|U_j), where U_i is q1 or its units, U_i' is
possible U_i substitutions, and U_j is q2 or its units. The
normalized probability may be calculated according to the above equation
for each unit pair in the query pair and the maximum is used as
pq21max.
[0063] Levtaiwanchar utilizes the removal of roman characters 324
conversion. In particular, all non-Chinese characters are removed
and the remaining Chinese character parts are put into an array
where each Chinese character is an element in the array. The
Levenshtein distance is measured between the two arrays. When
neither query q1 nor query q2 includes Chinese characters,
levtaiwanchar is 0. When only one of q1 or q2 has Chinese
characters levtaiwanchar is 1. In the example described above, when
query q1= map" and query q2= where there is no space in query q1,
but there is a space in query q2. The first query q1 is converted
into an array:
Query q2 becomes the array:
Accordingly, the Levenshtein distance is computed between the two
arrays, which is 1/3=0.333 and levtaiwanchar is 0.333 for this
query pair.
[0064] Lengthdiffn is the length difference in characters between
q1 and q2, which is normalized by their maximum length in
characters. In one embodiment, lengthdiffn is:

$$\mathrm{lengthdiffn} = \frac{\operatorname{abs}(|q1| - |q2|)}{\max(|q1|, |q2|)},$$

where |q1| and |q2| denote the lengths of q1 and q2 in characters.
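The normalized length difference may be expressed directly in Python. The function name and the value returned for two empty queries are assumptions of this sketch.

```python
def lengthdiffn(q1: str, q2: str) -> float:
    """Absolute difference in character length, divided by the longer length."""
    longest = max(len(q1), len(q2))
    if longest == 0:
        return 0.0  # assumed convention for two empty queries
    return abs(len(q1) - len(q2)) / longest


# Lengths 3 and 5 differ by 2, and the longer query has 5 characters: 2/5.
value = lengthdiffn("abc", "abcde")
```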
[0065] Entropy21min is an uncertainty that may be associated with a
similarity between q1 and q2. For a whole query substitution,

$$\mathrm{entropy21min} = \sum_{i} \frac{\mathrm{freq}(q_1 \to q_2^{i})}{\mathrm{freq}(q_2^{i})} \log\left(\frac{\mathrm{freq}(q_1 \to q_2^{i})}{\mathrm{freq}(q_2^{i})}\right),$$

where i indexes the possible q1 query substitutions with q2. For unit
substitution,

$$\mathrm{entropy21min} = \min_{j} \sum_{i} \frac{\mathrm{freq}(q_{1j} \to q_{2j}^{i})}{\mathrm{freq}(q_{2j}^{i})} \log\left(\frac{\mathrm{freq}(q_{1j} \to q_{2j}^{i})}{\mathrm{freq}(q_{2j}^{i})}\right),$$

where j indexes the unit substitutions between q1 and q2, and i indexes
the possible unit substitutions of q1j.
[0066] LengthsubtminGT3 utilizes a substitution of characters. For
query suggestions, lengthsubstminGT3 is 1 if the minimum length of
q1 and q2 is greater than 3 in characters. Otherwise,
lengthsubstminGT3 is 0. For unit suggestions, lengthsubstminGT3 is
1 if the minimum length of any of the substitution units in
characters is greater than 3. Otherwise, lengthsubstminGT3 is 0.
Query suggestion may refer to a generation of related queries based
on an original user query. The user query may be broken into units
as described above. A related unit may be found for each unit and
combined to form a related query. For example, when a user enters a
query for "New York hotel," it may be split into two units "New
York" and "hotel." "New York" may be rewritten to a related query
"Manhattan" and "hotel" may be rewritten to "motel." Accordingly,
"Manhattan motel" may be a related candidate query for an original
user query of "New York hotel."
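The unit-by-unit rewriting in the "New York hotel" example may be sketched in Python as follows. The unit set `UNITS`, the table `RELATED`, the greedy longest-match segmentation, and the function names are all hypothetical; the disclosure does not specify how units or related units are looked up.

```python
# Hypothetical unit dictionary and related-unit table for illustration only.
UNITS = {"new york", "hotel"}
RELATED = {"new york": "Manhattan", "hotel": "motel"}


def segment(query: str, max_words: int = 3) -> list:
    """Greedy longest-match split of a query into known units;
    unknown single words become their own units."""
    words = query.split()
    units, i = [], 0
    while i < len(words):
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate.lower() in UNITS:
                units.append(candidate)
                i += n
                break
    return units


def suggest(query: str) -> str:
    """Replace each unit by its related unit, when one is known."""
    return " ".join(RELATED.get(unit.lower(), unit) for unit in segment(query))


print(segment("New York hotel"))  # ['New York', 'hotel']
print(suggest("New York hotel"))  # Manhattan motel
```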
[0067] As described, the equation in Table A and the corresponding
features that are used to calculate a similarity score in the
calculator 208 are exemplary. Alternatively, a different equation,
different weights and different features may be utilized to compute
a similarity score. For example, the edit distance may be computed
for each of the conversion forms 302 and averaged to become the
similarity score. Alternatively, weights may be added to each
converted form, or additional comparison features 402 may be
used.
[0068] In one embodiment, the equation that is used to determine
the similarity score, such as the equation in Table A, is analyzed
by comparing with a human or editorial control set. The editorial
control set may include a human review of the similarity scores for
pairs of queries to determine an accuracy of the equation used for
calculating the similarity score. In one embodiment, the human
review may be used to optimize the equation that calculates the
similarity score. Human editors may label query pairs with a
relevance score. The relevance score may be used as a training
label for the similarity score calculation, such as for the weights
used in the equation in Table A. The editorial score may be a
response variable and/or a dependent variable. The model may be
fitted using linear regression.
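Fitting weights against editorial labels may be illustrated with ordinary least squares on a single feature. The sketch below is deliberately reduced to one feature and one constant term; the equation in Table A would require a multiple regression over all of its features. The data points are invented toy values, not editorial judgments from any real data set, and the function name is hypothetical.

```python
def fit_line(xs: list, ys: list) -> tuple:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy training data: a normalized edit distance per query pair (feature)
# and an invented editorial relevance label per pair (response variable).
edit_distances = [0.0, 0.25, 0.5, 1.0]
editorial_scores = [2.0, 1.5, 1.0, 0.0]

intercept, weight = fit_line(edit_distances, editorial_scores)
```

For this toy data the labels fall exactly on a line, so the fit recovers a constant of 2 and a weight of -2 for the feature.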
[0069] FIG. 5 is an illustration for identifying related queries.
In block 502, a user query is received. The user query may be
Chinese-related and include at least one Chinese character. The
user query may be received by a search engine 102. The user query
may be compared with a selected candidate set of queries or search
keywords in block 504. The candidate set may be selected from the
search log database 112. In one embodiment, the candidate set may
be chosen based on an initial comparison of similarity with the
user query. The user query and/or the candidate set of queries may
be converted into a different form or format for comparison, such
as the conversion forms 302. The user query and a member of the
candidate set are compared in block 508. In block 510, a similarity
score is calculated to measure a similarity between the user query
and the member of the candidate set. The similarity score may be
based on utilizing any of the comparison features 402 for comparing
a converted form of the user query with a converted form of the
member. In block 512, another comparison at block 508 occurs for
another member from the candidate set and continues until all
members of the candidate set have been compared and have a
similarity score. In block 514, the similarity scores for the
candidate set may be reviewed to identify the member of the
candidate set with the closest similarity score to the user query.
The identification of a similar member, such as a similar search
keyword, may be used to identify which advertisements to display
for sponsored searching.
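The loop over the candidate set described for FIG. 5 may be sketched in Python as follows. The similarity measure here is `difflib.SequenceMatcher.ratio()` from the Python standard library, used only as a stand-in for the similarity score; the conversion step is reduced to lowercasing, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not the equation of Table A."""
    return SequenceMatcher(None, a, b).ratio()


def best_candidate(user_query: str, candidates: list, convert=str.lower):
    """Convert the query and each candidate, score every pair, and return
    the candidate with the highest score (None for an empty set)."""
    converted_query = convert(user_query)
    scored = [(similarity(converted_query, convert(c)), c) for c in candidates]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]


print(best_candidate("auto cleaners", ["home insurance", "auto cleaning"]))
# auto cleaning
```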
[0070] Referring to FIG. 6, an illustrative embodiment of a general
computer system is shown and is designated 600. The user device
106, ad server 103, the search engine 102, the search log database
112, the data source 113, the unit dictionary 116, and/or the
language analyzer 104 may be a computer or computing devices, such
as the computer system 600 or any of its components. The computer
system 600 can include a set of instructions that can be executed
to cause the computer system 600 to perform any one or more of the
methods or computer based functions disclosed herein. The computer
system 600 may operate as a standalone device or may be connected,
e.g., using a network, to other computer systems or peripheral
devices.
[0071] In a networked deployment, the computer system may operate
in the capacity of a server or as a client user computer in a
server-client user network environment, or as a peer computer
system in a peer-to-peer (or distributed) network environment. The
computer system 600 can also be implemented as or incorporated into
various devices, such as a personal computer (PC), a tablet PC, a
set-top box (STB), a personal digital assistant (PDA), a mobile
device, a palmtop computer, a laptop computer, a desktop computer,
a communications device, a wireless telephone, a land-line
telephone, a control system, a camera, a scanner, a facsimile
machine, a printer, a pager, a personal trusted device, a web
appliance, a network router, switch or bridge, or any other machine
capable of executing a set of instructions (sequential or
otherwise) that specify actions to be taken by that machine. In a
particular embodiment, the computer system 600 can be implemented
using electronic devices that provide voice, video or data
communication. Further, while a single computer system 600 is
illustrated, the term "system" shall also be taken to include any
collection of systems or sub-systems that individually or jointly
execute a set, or multiple sets, of instructions to perform one or
more computer functions.
[0072] As illustrated in FIG. 6, the computer system 600 may
include a processor 602, e.g., a central processing unit (CPU), a
graphics processing unit (GPU), or both. The processor 602 may be a
component in a variety of systems. For example, the processor 602
may be part of a standard personal computer or a workstation. The
processor 602 may be one or more general processors, digital signal
processors, application specific integrated circuits, field
programmable gate arrays, servers, networks, digital circuits,
analog circuits, combinations thereof, or other now known or later
developed devices for analyzing and processing data. The processor
602 may implement a software program, such as code generated
manually (i.e., programmed).
[0073] The computer system 600 may include a memory 604 that can
communicate via a bus 608. The memory 604 may be a main memory, a
static memory, or a dynamic memory. The memory 604 may include, but
is not limited to, computer readable storage media such as various
types of volatile and non-volatile storage media, including but not
limited to random access memory, read-only memory, programmable
read-only memory, electrically programmable read-only memory,
electrically erasable read-only memory, flash memory, magnetic tape
or disk, optical media and the like. In one embodiment, the memory
604 includes a cache or random access memory for the processor 602.
In alternative embodiments, the memory 604 is separate from the
processor 602, such as a cache memory of a processor, the system
memory, or other memory. The memory 604 may be an external storage
device or database for storing data. Examples include a hard drive,
compact disc ("CD"), digital video disc ("DVD"), memory card,
memory stick, floppy disc, universal serial bus ("USB") memory
device, or any other device operative to store data. The memory 604
is operable to store instructions executable by the processor 602.
The functions, acts or tasks illustrated in the figures or
described herein may be performed by the programmed processor 602
executing the instructions stored in the memory 604. The functions,
acts or tasks are independent of the particular type of
instruction set, storage media, processor or processing strategy
and may be performed by software, hardware, integrated circuits,
firmware, micro-code and the like, operating alone or in
combination. Likewise, processing strategies may include
multiprocessing, multitasking, parallel processing and the
like.
[0074] As shown, the computer system 600 may further include a
display unit 614, such as a liquid crystal display (LCD), an
organic light emitting diode (OLED), a flat panel display, a solid
state display, a cathode ray tube (CRT), a projector, a printer or
other now known or later developed display device for outputting
determined information. The display 614 may act as an interface for
the user to see the functioning of the processor 602, or
specifically as an interface with the software stored in the memory
604 or in the drive unit 606.
[0075] Additionally, the computer system 600 may include an input
device 616 configured to allow a user to interact with any of the
components of system 600. The input device 616 may be a number pad,
a keyboard, or a cursor control device, such as a mouse, or a
joystick, touch screen display, remote control or any other device
operative to interact with the system 600.
[0076] In a particular embodiment, as depicted in FIG. 6, the
computer system 600 may also include a disk or optical drive unit
606. The disk drive unit 606 may include a computer-readable medium
610 in which one or more sets of instructions 612, e.g. software,
can be embedded. Further, the instructions 612 may embody one or
more of the methods or logic as described herein. In a particular
embodiment, the instructions 612 may reside completely, or at least
partially, within the memory 604 and/or within the processor 602
during execution by the computer system 600. The memory 604 and the
processor 602 also may include computer-readable media as discussed
above.
[0077] The present disclosure contemplates a computer-readable
medium that includes instructions 612 or receives and executes
instructions 612 responsive to a propagated signal, so that a
device connected to a network 620 can communicate voice, video,
audio, images or any other data over the network 620. Further, the
instructions 612 may be transmitted or received over the network
620 via a communication port 618. The communication port 618 may be
a part of the processor 602 or may be a separate component. The
communication port 618 may be created in software or may be a
physical connection in hardware. The communication port 618 is
configured to connect with a network 620, external media, the
display 614, or any other components in system 600, or combinations
thereof. The connection with the network 620 may be a physical
connection, such as a wired Ethernet connection or may be
established wirelessly as discussed below. Likewise, the additional
connections with other components of the system 600 may be physical
connections or may be established wirelessly.
[0078] The network 620 may include wired networks, wireless
networks, or combinations thereof. The wireless network may be a
cellular telephone network, an 802.11, 802.16, 802.20, or WiMax
network. Further, the network 620 may be a public network, such as
the Internet, a private network, such as an intranet, or
combinations thereof, and may utilize a variety of networking
protocols now available or later developed including, but not
limited to TCP/IP based networking protocols.
[0079] While the computer-readable medium is shown to be a single
medium, the term "computer-readable medium" includes a single
medium or multiple media, such as a centralized or distributed
database, and/or associated caches and servers that store one or
more sets of instructions. The term "computer-readable medium"
shall also include any medium that is capable of storing, encoding
or carrying a set of instructions for execution by a processor or
that cause a computer system to perform any one or more of the
methods or operations disclosed herein.
[0080] In a particular non-limiting, exemplary embodiment, the
computer-readable medium can include a solid-state memory such as a
memory card or other package that houses one or more non-volatile
read-only memories. Further, the computer-readable medium can be a
random access memory or other volatile re-writable memory.
Additionally, the computer-readable medium can include a
magneto-optical or optical medium, such as a disk or tapes or other
storage device to capture carrier wave signals such as a signal
communicated over a transmission medium. A digital file attachment
to an e-mail or other self-contained information archive or set of
archives may be considered a distribution medium that is a tangible
storage medium. Accordingly, the disclosure is considered to
include any one or more of a computer-readable medium or a
distribution medium and other equivalents and successor media, in
which data or instructions may be stored.
[0081] In an alternative embodiment, dedicated hardware
implementations, such as application specific integrated circuits,
programmable logic arrays and other hardware devices, can be
constructed to implement one or more of the methods described
herein. Applications that may include the apparatus and systems of
various embodiments can broadly include a variety of electronic and
computer systems. One or more embodiments described herein may
implement functions using two or more specific interconnected
hardware modules or devices with related control and data signals
that can be communicated between and through the modules, or as
portions of an application-specific integrated circuit.
Accordingly, the present system encompasses software, firmware, and
hardware implementations.
[0082] In accordance with various embodiments of the present
disclosure, the methods described herein may be implemented by
software programs executable by a computer system. Further, in an
exemplary, non-limiting embodiment, implementations can include
distributed processing, component/object distributed processing,
and parallel processing. Alternatively, virtual computer system
processing can be constructed to implement one or more of the
methods or functionality as described herein.
[0083] Although the present specification describes components and
functions that may be implemented in particular embodiments with
reference to particular standards and protocols, the invention is
not limited to such standards and protocols. For example, standards
for Internet and other packet switched network transmission (e.g.,
TCP/IP, UDP/IP, HTML, HTTP) represent examples of the state of the
art. Such standards are periodically superseded by faster or more
efficient equivalents having essentially the same functions.
Accordingly, replacement standards and protocols having the same or
similar functions as those disclosed herein are considered
equivalents thereof.
[0084] The illustrations of the embodiments described herein are
intended to provide a general understanding of the structure of the
various embodiments. The illustrations are not intended to serve as
a complete description of all of the elements and features of
apparatus and systems that utilize the structures or methods
described herein. Many other embodiments may be apparent to those
of skill in the art upon reviewing the disclosure. Other
embodiments may be utilized and derived from the disclosure, such
that structural and logical substitutions and changes may be made
without departing from the scope of the disclosure. Additionally,
the illustrations are merely representational and may not be drawn
to scale. Certain proportions within the illustrations may be
exaggerated, while other proportions may be minimized. Accordingly,
the disclosure and the figures are to be regarded as illustrative
rather than restrictive.
[0085] One or more embodiments of the disclosure may be referred to
herein, individually and/or collectively, by the term "invention"
merely for convenience and without intending to voluntarily limit
the scope of this application to any particular invention or
inventive concept. Moreover, although specific embodiments have
been illustrated and described herein, it should be appreciated
that any subsequent arrangement designed to achieve the same or
similar purpose may be substituted for the specific embodiments
shown. This disclosure is intended to cover any and all subsequent
adaptations or variations of various embodiments. Combinations of
the above embodiments, and other embodiments not specifically
described herein, will be apparent to those of skill in the art
upon reviewing the description.
[0086] The Abstract of the Disclosure is provided to comply with 37
C.F.R. § 1.72(b) and is submitted with the understanding that
it will not be used to interpret or limit the scope or meaning of
the claims. In addition, in the foregoing Detailed Description,
various features may be grouped together or described in a single
embodiment for the purpose of streamlining the disclosure. This
disclosure is not to be interpreted as reflecting an intention that
the claimed embodiments require more features than are expressly
recited in each claim. Rather, as the following claims reflect,
inventive subject matter may be directed to less than all of the
features of any of the disclosed embodiments. Thus, the following
claims are incorporated into the Detailed Description, with each
claim standing on its own as defining separately claimed subject
matter.
[0087] The above disclosed subject matter is to be considered
illustrative, and not restrictive, and the appended claims are
intended to cover all such modifications, enhancements, and other
embodiments, which fall within the true spirit and scope of the
present invention. Thus, to the maximum extent allowed by law, the
scope of the present invention is to be determined by the broadest
permissible interpretation of the following claims and their
equivalents, and shall not be restricted or limited by the
foregoing detailed description. While various embodiments of the
invention have been described, it will be apparent to those of
ordinary skill in the art that many more embodiments and
implementations are possible within the scope of the invention.
Accordingly, the invention is not to be restricted except in light
of the attached claims and their equivalents.
* * * * *