U.S. patent application number 11/675057 was filed with the patent office on 2007-02-14 and published on 2007-10-04 for group customized search.
Invention is credited to Darya Gennadievna Antonova, Bryan Horling, Adam Klein, Bilgehan Uygar Oztekin, Ye Zhou.
Application Number | 20070233671 11/675057 |
Family ID | 38335819 |
Filed Date | 2007-02-14 |
United States Patent Application | 20070233671 |
Kind Code | A1 |
Oztekin; Bilgehan Uygar; et al. | October 4, 2007 |
Group Customized Search
Abstract
A computer-implemented method associates a plurality of groups
with a user. Each group may have at least one profile. The method
also includes receiving a search query from the user and
identifying information items associated with the search query. The
method computes adjusted scores for the information items based on
the groups' profiles, and ranks the information items accordingly
before providing the ranked information items to the user.
Alternately, a computer-implemented method associates a group
having a plurality of profiles with a user. The method also
includes receiving a search query from the user and identifying
information items associated with the search query. The method
computes adjusted scores for the information items based on the
group's profiles, and ranks the information items accordingly
before providing the ranked information items to the user.
Inventors: | Oztekin; Bilgehan Uygar (Mountain View, CA); Horling; Bryan (Tuckahoe, NY); Antonova; Darya Gennadievna (Mountain View, CA); Zhou; Ye (Forest Hills, NY); Klein; Adam (San Francisco, CA) |
Correspondence Address: | MORGAN, LEWIS & BOCKIUS, LLP., 2 PALO ALTO SQUARE, 3000 EL CAMINO REAL, PALO ALTO, CA 94306, US |
Family ID: | 38335819 |
Appl. No.: | 11/675057 |
Filed: | February 14, 2007 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11394620 | Mar 30, 2006 | |
11675057 | Feb 14, 2007 | |
Current U.S. Class: | 1/1; 707/999.005; 707/E17.109 |
Current CPC Class: | G06F 16/9535 20190101 |
Class at Publication: | 707/005 |
International Class: | G06F 17/30 20060101 G06F017/30 |
Claims
1. A computer-implemented method, comprising: associating a
plurality of groups with a user; accessing at least one profile for
each of the plurality of groups associated with the user; receiving
a search query from the user; identifying a plurality of
information items associated with the search query; computing an
adjusted score based on the profiles of the groups; ranking the
information items in accordance with the adjusted score; and
providing the ranked information items to the user.
2. The computer-implemented method of claim 1, further comprising:
identifying a website associated with the user; determining a
website profile of the website; determining a determined group in
accordance with the website profile; associating the determined
group with the user.
3. The computer-implemented method of claim 1, further comprising:
assigning profile confidence values to at least one respective
profile of at least one respective group of the groups; and
computing the adjusted score in accordance with the profile
confidence value.
4. The computer-implemented method of claim 3, wherein assigning
the profile confidence value further comprises assigning the
profile confidence value in accordance with a query type of the
search query.
5. The computer-implemented method of claim 1, further comprising:
assigning group confidence values to respective groups of the
plurality of the groups; and computing the adjusted score in
accordance with the assigned group confidence values.
6. The computer-implemented method of claim 5, wherein assigning
the group confidence value further comprises: evaluating a
coherence of a respective group of the plurality of groups;
evaluating a type of the respective group; and assigning the group
confidence value in accordance with the coherence and the type of
the respective group.
7. The computer-implemented method of claim 6, wherein evaluating
the type of the respective group further comprises: if the
respective group associated with the user is an internet address
type group, determining a traffic volume of an internet address or
internet address range associated with the user; and assigning the
group confidence value in accordance with the traffic volume.
8. The computer-implemented method of claim 1, wherein associating
the plurality of groups with the user further comprises:
identifying an internet address of the user; determining if the
internet address is associated with an internet address group; and
associating the internet address group with the user in accordance
with the internet address of the user.
9. The computer-implemented method of claim 8, wherein determining
if the internet address is associated with the internet address
group further comprises: evaluating a coherence of the internet
address group; evaluating a number of other users associated with
the internet address group; and if the coherence is greater than a
predetermined coherence and the number is greater than a
predetermined value determining that the internet address is
associated with the internet address group.
10. The computer-implemented method of claim 1, wherein associating
the plurality of groups with the user further comprises:
identifying a location of the user; determining if the location is
associated with a location group; and associating the location
group with the user in accordance with the location of the
user.
11. The computer-implemented method of claim 1, wherein associating
the plurality of groups with the user further comprises:
identifying a history of the user; determining if the history is
associated with a respective group; and associating the respective
group with the user in accordance with the history of the user.
12. The computer-implemented method of claim 11, wherein
identifying the history of the user further comprises identifying
at least one of a website visited by the user, a query submitted by
the user, and user interaction by the user with search results
associated with the query.
13. The computer-implemented method of claim 1, wherein associating
the plurality of groups with the user further comprises identifying
a respective group corresponding to a user profile of the user.
14. The computer-implemented method of claim 1, wherein associating
the plurality of groups with the user further comprises:
identifying a domain name associated with the user; determining if
domain name is associated with a domain name group; and associating
the domain name group with the user in accordance with the domain
name associated with the user.
15. The computer-implemented method of claim 1, wherein computing
an adjusted score based on the profiles of the groups further
comprises: assigning a profile confidence value to each profile of
the groups; assigning a group confidence value to each of the
plurality of the groups; calculating the adjusted score based on
the profile confidence value of each profile of the groups and the
group confidence value of each of the plurality of the groups.
16. A computer system for managing time and place information,
comprising: memory; one or more processors; one or more programs
stored in the memory and configured for execution by the one or
more processors, the one or more programs including: instructions
for associating a plurality of groups with a user; instructions for
accessing at least one profile for each of the plurality of groups
associated with the user; instructions for receiving a search query
from the user; instructions for identifying a plurality of
information items associated with the search query; instructions
for computing an adjusted score based on the profiles of the
groups; instructions for ranking the information items in
accordance with the adjusted score; and instructions for providing
the ranked information items to the user.
17. A computer program product for use in conjunction with a
computer system, the computer program product comprising a computer
readable storage medium and a computer program mechanism embedded
therein, the computer program mechanism comprising: instructions
for associating a plurality of groups with a user; instructions for
accessing at least one profile for each of the plurality of groups
associated with the user; instructions for receiving a search query
from the user; instructions for identifying a plurality of
information items associated with the search query; instructions
for computing an adjusted score based on the profiles of the
groups; instructions for ranking the information items in
accordance with the adjusted score; and instructions for providing
the ranked information items to the user.
18. A computer-implemented method, comprising: associating a first
group having a plurality of profiles with a user; receiving a
search query from the user; identifying a plurality of information
items associated with the search query; computing an adjusted score
based, at least in part, on the profiles of the first group;
ranking the information items in accordance with the adjusted
score; and providing the ranked information items to the user.
19. The computer-implemented method of claim 18, further
comprising: identifying a website associated with the user;
determining a website profile of the website; determining a group
in accordance with the website profile; associating the group with
the user.
20. The computer-implemented method of claim 18, further
comprising: assigning profile confidence values to at least one
respective profile of the plurality of profiles; and computing the
adjusted score in accordance with the profile confidence
values.
21. The computer-implemented method of claim 20, wherein assigning
the profile confidence values further comprises assigning the
profile confidence values in accordance with a query type of the
search query.
22. The computer-implemented method of claim 18, further
comprising: assigning a respective group confidence value to a
respective group and computing the adjusted score in accordance
with the respective group confidence value.
23. The computer-implemented method of claim 22, wherein assigning
the group confidence value further comprises: evaluating a
coherence of the respective group; evaluating a type of the
respective group; and assigning the group confidence value in
accordance with the coherence and the type of the respective
group.
24. The computer-implemented method of claim 23, wherein evaluating
the type of a respective group further comprises: if the respective
group associated with the user is an internet address type group,
determining a traffic volume of an internet address or internet
address range associated with the user; and assigning the group
confidence value in accordance with the traffic volume.
25. The computer-implemented method of claim 18, further
comprising: identifying an internet address of the user;
determining if the internet address is associated with an internet
address group; and associating the internet address group with the
user in accordance with the internet address of the user.
26. The computer-implemented method of claim 25, wherein
determining if the internet address is associated with the internet
address group further comprises: evaluating a coherence of the
internet address group; evaluating a number of other users
associated with the internet address group; and if the coherence is
greater than a predetermined coherence and the number is greater
than a predetermined value determining that the internet address is
associated with the internet address group.
27. The computer-implemented method of claim 18, further
comprising: identifying a location of the user; determining if the
location is associated with a location group; and associating the
location group with the user in accordance with the location of the
user.
28. The computer-implemented method of claim 18, further
comprising: identifying a history of the user; determining if the
history is associated with a respective group; and associating the
respective group with the user in accordance with the history of
the user.
29. The computer-implemented method of claim 28, wherein
identifying the history of the user further comprises identifying
at least one of a website visited by the user, a search query
submitted by the user, and user interaction by the user with search
results associated with the search query.
30. The computer-implemented method of claim 18, further comprises
identifying a group corresponding to a user profile of the
user.
31. The computer-implemented method of claim 18, further
comprising: identifying a domain name associated with the user;
determining if domain name is associated with a domain name group;
and associating the domain name group with the user in accordance
with the domain name associated with the user.
32. The computer-implemented method of claim 18, wherein computing
an adjusted score based, at least in part, on the profiles of the
first group further comprises: assigning a profile confidence value
to each profile; assigning a group confidence value to respective
groups including the first group; calculating the adjusted score
based on the profile confidence value of each profile and the group
confidence value of each of the respective groups.
33. A computer system for managing time and place information,
comprising: memory; one or more processors; one or more programs
stored in the memory and configured for execution by the one or
more processors, the one or more programs including: instructions
for associating a first group having a plurality of profiles with a
user; instructions for receiving a search query from the user;
instructions for identifying a plurality of information items
associated with the search query; instructions for computing an
adjusted score based on the profiles of the first group;
instructions for ranking the information items in accordance with
the adjusted score; and instructions for providing the ranked
information items to the user.
34. A computer program product for use in conjunction with a
computer system, the computer program product comprising a computer
readable storage medium and a computer program mechanism embedded
therein, the computer program mechanism comprising: instructions
for associating a first group having a plurality of profiles with a
user; instructions for receiving a search query from the user;
instructions for identifying a plurality of information items
associated with the search query; instructions for computing an
adjusted score based on the profiles of the first group;
instructions for ranking the information items in accordance with
the adjusted score; and instructions for providing the ranked
information items to the user.
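As a rough illustration of claims 9 and 15, the following Python sketch shows one way group confidence values and profile confidence values could jointly weight a score adjustment. The claims do not specify a formula. The weighted-average combination, the names `Profile`, `Group`, `ip_group_applies`, and `adjusted_score`, the `boost` field, and the default threshold values are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    boost: float        # hypothetical: how strongly this profile favors the item
    confidence: float   # profile confidence value (claim 3), assumed in [0, 1]


@dataclass
class Group:
    confidence: float   # group confidence value (claim 5), assumed in [0, 1]
    profiles: list = field(default_factory=list)


def ip_group_applies(coherence, num_users, min_coherence=0.5, min_users=10):
    """Claim 9 style test: both thresholds must be exceeded (values assumed)."""
    return coherence > min_coherence and num_users > min_users


def adjusted_score(generic_score, groups):
    """Scale a generic score by a confidence-weighted average boost.

    The weighting scheme is an assumption, not the claimed method.
    """
    weighted = 0.0
    total_weight = 0.0
    for group in groups:
        for profile in group.profiles:
            weight = group.confidence * profile.confidence
            weighted += weight * profile.boost
            total_weight += weight
    if total_weight == 0.0:
        return generic_score  # no usable profile: leave the score unchanged
    return generic_score * (weighted / total_weight)
```

In this sketch a low-confidence group contributes proportionally less to the final boost, and a user with no associated profiles receives the unadjusted score.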
Description
RELATED APPLICATIONS
[0001] This application is a Continuation in Part of U.S. Patent
Application Ser. No. 11/394,620, filed Mar. 30, 2006, entitled
"Website Flavored Search," which is hereby incorporated by
reference.
[0002] This application is related to U.S. patent application Ser.
No. 10/890,854, filed Jul. 13, 2004, entitled "Personalization of
Placed Content Ordering in Search Results," which is hereby
incorporated by reference.
FIELD OF THE INVENTION
[0003] The present invention relates generally to the field of a
search engine in a computer network system, and in particular to a
system and method of customizing rankings of search results in
response to search queries submitted by members of one or more user
groups.
BACKGROUND OF THE INVENTION
[0004] Search engines are powerful tools for locating and
retrieving documents from the Internet (or an intranet).
Traditionally, the search results produced by a search engine are
independent of the user who issued the search query. For example,
the search engine generates the same search result for the search
query "apple" irrespective of whether the search query is from
users interested in Apple® computers or the fruit Malus
domestica. Clearly, the search results returned for the search query
"apple" are likely to include some results of little interest to
these respective groups of users.
[0005] In view of the aforementioned, it would be desirable to have
a search engine that can customize its search results so as to
highlight information items in the search results that are most
likely to be of interest to the users who submit the search
queries. Further, it would be desirable for such a system to
operate without explicit input from a user with regard to the
user's personal preferences and interests, and for the system to
protect the privacy interests of its users.
SUMMARY
[0006] In some embodiments, a computer-implemented method
associates a plurality of groups with a user. Each group may have
at least one profile. The method also includes receiving a search
query from the user and identifying information items associated
with the search query. The method computes adjusted scores for the
information items based on the groups' profiles, and ranks the
information items accordingly before providing the ranked
information items to the user.
[0007] In some embodiments, a computer-implemented method
associates a group having a plurality of profiles with a user. The
method also includes receiving a search query from the user and
identifying information items associated with the search query. The
method computes adjusted scores for the information items based on
the group's profiles, and ranks the information items accordingly
before providing the ranked information items to the user.
[0008] Some embodiments may be implemented on either the client
side or the server side of a client-server network environment.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The aforementioned features and advantages of the invention
as well as additional features and advantages thereof will be more
clearly understood hereinafter as a result of a detailed
description of preferred embodiments when taken in conjunction with
the drawings.
[0010] FIG. 1 is a block diagram of an exemplary distributed system
that includes a plurality of groups and clients requesting
information from an information server in accordance with some
embodiments.
[0011] FIG. 2 is a flow diagram of a process for generating a group
profile using search queries, search results and user
activities.
[0012] FIG. 3 is a block diagram of a process for updating a group
profile by merging an incremental group profile into the group
profile.
[0013] FIG. 4 is a prophetic example of a curve characterizing the
popularity distribution of search queries.
[0014] FIG. 5A is a block diagram illustrating how a website group
may be associated with multiple profiles in accordance with some
embodiments.
[0015] FIG. 5B is a block diagram illustrating how an internet
address group may be associated with multiple profiles in
accordance with some embodiments.
[0016] FIG. 5C is a block diagram illustrating how a group may be
associated with multiple profiles in accordance with some
embodiments.
[0017] FIG. 6A is a block diagram of an exemplary category map that
may be used for generating category-based group profiles in
accordance with some embodiments.
[0018] FIG. 6B is a block diagram of an exemplary data structure
that may be used for storing category-based group profiles in
accordance with some embodiments of the present invention.
[0019] FIG. 6C is a block diagram "snapshot" of an exemplary data
structure that may be used for storing groups and profiles
associated with a user at a particular point in time in accordance
with some embodiments.
[0020] FIG. 7 is a block diagram of an exemplary data structure
that may be used for storing term-based group profiles in
accordance with some embodiments.
[0021] FIG. 8 is a block diagram of an exemplary data structure
that may be used for storing link-based group profiles in
accordance with some embodiments.
[0022] FIG. 9 is a flow diagram of a process for generating
group-dependent search results using group profiles in accordance
with some embodiments.
[0023] FIG. 10 is a block diagram of exemplary data structures that
may be used for storing category-based, term-based, and link-based
boost factors for documents in the search results in accordance
with some embodiments.
[0024] FIG. 11 is a flow diagram of another process for generating
group-dependent search results using group profiles in accordance
with some embodiments.
[0025] FIG. 12 is a block diagram of an exemplary information
server in accordance with some embodiments.
[0026] FIG. 13 is a flow diagram of a process for providing a user
with customized group-dependent search results according to some
embodiments.
[0027] FIG. 14 is a flow diagram of another process for providing a
user with customized group-dependent search results according to
some embodiments.
[0028] Like reference numerals refer to corresponding parts
throughout the several views of the drawings.
DESCRIPTION OF EMBODIMENTS
[0029] FIG. 1 is a block diagram of an exemplary environment 100
for implementing some embodiments of the present invention. One or
more websites 102 and clients 103 can be connected to a
communication network 104. The communication network 104 can be
connected to an information server 106. The information server 106
may include a front end server 120, a search engine 122, a document
profiler 125, a group profiler 129, a search result ranker 126, a
document profile database 123, a content database 124, a search
history database 127, and a group profile database 128.
[0030] In some embodiments, the information server 106 contains a
subset or superset of the elements illustrated in FIG. 1. Although
FIG. 1 shows the information server 106 as a number of discrete
items, the figure is intended more as a functional description of
the various features which may be present in the information server
106 rather than a structural schematic of the various embodiments.
In practice, items shown separately could be combined and some
items could be further separated, as would be recognized by one of
ordinary skill in the art of designing such systems. For example,
the four different databases 123, 124, 127, and 128 shown
separately in the figure could be implemented by a single database
server. The actual number of computers constituting the information
server 106 and the allocation of features among the computers will
vary from one implementation to another, and may depend in part on
the amount of traffic that the information server 106 must handle
during peak usage periods as well as during average usage
periods.
[0031] A website 102 is typically a collection of webpages
associated with a domain name on the Internet. Each website (or
webpage) has a uniform resource locator (URL) that uniquely
identifies the location of the website (or webpage) on the
Internet. Any visitor can visit the website by entering its URL in
a browser window. A website can be hosted by a web server
exclusively owned by the owner of the domain name or by an Internet
service provider wherein its web server manages multiple websites
associated with different domain names. For illustrative purposes,
the website 102 includes webpage 116, which may have an associated
search box. From the search box, a visitor to the webpage 114 can
search the website 102 or the entire Internet for relevant
information by entering a search query into the search box.
Depending on the context, the term "website" as used in this
document refers to a logical location (e.g., an Internet or
intranet location) identified by a URL, or it refers to a web
server hosting the website represented by the URL. For example,
some "websites" are distributed over multiple Internet or network
locations, but have a shared web server hosting those locations,
and in many situations it is logical to consider those network
locations to all be part of "a website."
[0032] A client 103 can be any of a number of devices (e.g., a
computer, an internet kiosk, a personal digital assistant, a cell
phone, a gaming device, a desktop computer, or a laptop computer)
and can include a client application 132, a client assistant 134,
and/or client memory 136. The client application 132 can be a
software application that permits a user to interact with the
client 103 and/or network resources to perform one or more tasks.
For example, the client application 132 can be a browser (e.g., the
computer program available under the trademark Firefox®) or
other type of application that permits a user to search for,
browse, and/or use resources (e.g., webpages and web services) at
the website 102 from the client 103 and/or accessible via the
communication network 104. The client assistant 134 can be a
software application that performs one or more tasks related to
monitoring or assisting a user's activities with respect to the
client application 132 and/or other applications. For instance, the
client assistant 134 assists a user at the client 103 with browsing
for resources (e.g., files) hosted by the website 102; processes
information (e.g., search results) received from the information
server 106; and monitors the user's activities on the search
results. In some embodiments the client assistant 134 is part of
the client application 132, available as a plug-in or extension to
the client application 132 (provided, for example, from various
online sources), while in other embodiments the client assistant 134
is a stand-alone program separate from the client application 132.
In some embodiments the client assistant 134 is embedded in one or
more webpages or other documents downloaded from one or more
servers, such as the information server 106. Client memory 136 can
store information such as webpages, documents received from the
information server 106, system information, and/or information
about a user.
[0033] The communication network 104 can be any wired or wireless
local area network (LAN) and/or wide area network (WAN), such as an
intranet, an extranet, or the Internet. It is sufficient that the
communication network 104 provide communication capability between
the websites 102, the clients 103 and the information server 106.
In some embodiments, the communication network 104 uses the
Hypertext Transfer Protocol (HTTP) to transport information over
the Transmission Control Protocol/Internet Protocol (TCP/IP). HTTP
permits client computers to access various resources available
via the communication network 104. The various embodiments of the
invention, however, are not limited to the use of any particular
protocol. The term "resource" as used throughout this specification
refers to any piece of information or service that is accessible
via a URL and can be, for example, a webpage, a document, a
database, an image, a computational object, a search engine, or
other online information service.
[0034] In order to receive group-dependent search results, a user
from client 103 (for example) may send to a website 102 a request
for a webpage. The website responds by identifying the requested
webpage and returning it to the requesting client 103. The webpage
may include a document of interest to the user (e.g., a newspaper
article). The webpage may also include a search box (e.g., at or
near the top of the webpage). While or after browsing the content
of the webpage, the user may be interested in getting more
information. To do so, the user can enter a search query and submit
the search query to the website 102, a search engine, or the like.
The search query may include one or more query terms.
[0035] As noted above, many websites do not have a dedicated search
engine. Their search requests are actually handled by a third-party
search engine. In some embodiments, upon receipt of the search
query, the website 102 generates and sends a search request to the
information server 106. In some other embodiments, the client 103
generates and sends the search request directly to the information
server 106 without routing the request through the website 102. In
other embodiments, the user may choose to use the website of a
third party search engine directly. In any case, the search request
may include the search query and unique identifiers of one or more
of the following entities: the website 102 being viewed, a website
previously viewed by the user, the requesting user, and the
requesting client 103. The identifier for a website may be a URL of
a particular web page, or a prefix portion of the URL that
identifies the website or a portion of the website. The search
engine 122 or other portion of the information server 106 may
determine the appropriate portion of the URL to use for determining
a group associated with the user.
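The idea of using a URL prefix as a website identifier can be sketched as follows in Python. This is only an illustration under assumptions: the function name `website_identifier` and the `path_depth` parameter are hypothetical, and the specification does not say how the information server chooses the portion of the URL to use.

```python
from urllib.parse import urlsplit


def website_identifier(url, path_depth=0):
    """Return the host plus the first `path_depth` path segments of a URL.

    path_depth=0 identifies the whole website; a larger value identifies
    a portion of the website (for example, one section of a news site).
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    segments = [s for s in parts.path.split("/") if s]
    prefix = "/".join(segments[:path_depth])
    return f"{host}/{prefix}" if prefix else host
```

For example, `website_identifier("http://www.example.com/sports/nba/article.html", path_depth=1)` yields `www.example.com/sports`, which could serve as a key for looking up a group tied to that section of the site. The sketch expects URLs that include a scheme.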
[0036] Within the information server 106, the front end server 120
is configured to handle a variety of requests from the websites 102
and the clients 103 via their respective connections with the
communication network 104. As shown in FIG. 1, the front end server
120 is connected to the search engine 122, and the search engine 122
is connected to the content database 124. The content
database 124 stores a large number of indexed documents retrieved
from different websites. Alternately, or in addition, the content
database 124 stores an index of documents stored at various
websites. In one embodiment, each indexed document is assigned a
page rank according to the document's link structure. The page rank
serves as a query-independent measure of the document's
importance.
[0037] The front end server 120 passes the search request on to the
search engine 122. The search engine 122 then communicates with the
content database 124 to select a plurality of information items
(e.g., documents) in response to the search request. The search
engine 122 assigns a generic ranking score to each information item
based on the item's page rank, the text associated with the item,
and the search query. For ease of discussion, information items
will often be referred to as "documents;" but it is to be
understood that information items need not be documents, and may
include other types or forms of information.
[0038] The search engine 122 is also connected to the document
profile database 123. The document profile database 123 may store a
document profile for each indexed document in the content database
124. Both the document profile database 123 and the content
database 124 are connected to the document profiler 125. For each
document in the content database 124, the document profiler
generates a document profile by analyzing the content of the
document and its link structure. The generation of document
profiles is independent of the operation of the search engine 122.
In some embodiments, the document profiler 125 is invoked to
generate a document profile whenever the information server 106
identifies a new document or a new version of an existing document
on the Internet. In other embodiments, the document profiler 125 is
invoked periodically to generate document profiles for all new
files identified during a predetermined time period. In some
embodiments, instead of being two separate entities, the document
profile database 123 and the content database 124 are merged
together so that a document and its associated profile can be
located by a single database query.
[0039] There is a connection from the search engine 122 to the
search result ranker 126. Through this connection, the search
engine 122 sends the identified documents and their associated
document profiles to the search result ranker 126. The search
result ranker 126 has a connection to the group profile database
128. Just as the document profile database 123 stores document
profiles, the group profile database 128 stores a large number of
group profiles, including those of one or more groups associated with a requesting
user. For example, the search result ranker 126 may use a group
profile associated with users of a website 102 (e.g., a website
currently or recently visited by the user) to convert the generic
ranking score of each identified document into a group-dependent
ranking score. The documents are then re-ordered in accordance with
their respective group-dependent ranking scores. Next, the search
result ranker 126 creates a search result in accordance with the
updated order of the documents. The search result includes multiple
document links, at least one for each document. The search result,
or a portion of the search result (e.g., information identifying
the top 10, 15 or 20 information items or documents), is returned
to the requesting client 103 and displayed to the user through the
client application 132. The user, after browsing the search result,
may click one or more document links in the search result to
download and view one or more documents identified by the search
result.
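By way of a non-limiting illustration, the conversion of generic
ranking scores into group-dependent ranking scores might be sketched
as follows. The sketch assumes that group profiles and document
profiles are both sparse category-weight vectors and that the generic
score is boosted in proportion to their dot product. The specification
does not prescribe this formula, and the names similarity and rerank
are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: category name -> weight.
Profile = Dict[str, float]


def similarity(group_profile: Profile, doc_profile: Profile) -> float:
    """Dot product of two sparse profiles over their shared categories."""
    return sum(w * doc_profile.get(cat, 0.0) for cat, w in group_profile.items())


def rerank(
    docs: List[Tuple[str, float, Profile]], group_profile: Profile
) -> List[Tuple[str, float]]:
    """Turn generic scores into group-dependent scores and re-order.

    Each input tuple is (document id, generic score, document profile).
    The boost formula generic * (1 + similarity) is an assumption.
    """
    rescored = [
        (doc_id, generic * (1.0 + similarity(group_profile, profile)))
        for doc_id, generic, profile in docs
    ]
    # Highest group-dependent score first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

A multiplicative boost leaves the order of documents unchanged when
the group profile has no overlap with any document profile, which is
one reason it is used in this sketch.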
[0040] While the above description divided tasks among the search
engine 122, search result ranker 126 and front end server 120 in a
particular way, this particular division of tasks is exemplary, and
other divisions may be used in other embodiments. For instance, the
group profile (associated with the user from whom a search query is
received) may be transmitted with the search query to the search
engine 122, and the search engine 122 may use that information to
compute group specific document scores for ranking the search
results. In effect, this would merge the search result ranker 126
into the search engine 122. Alternately, an identifier of the group
profile may be transmitted with the search query to the search
engine 122. If the search engine 122 has a copy of the group
profile or has access to the group profile, it can then use that
information to compute group specific document scores. In yet other
embodiments, other divisions of tasks may be used.
[0041] An important aspect of the process of serving
group-dependent search results is the generation and maintenance of
the group profiles stored in the group profile database 128. A
group profile should reflect the interests of the users of the
associated group, and in many embodiments the group profile will be
unique to its associated group. For example, users of a consumer
electronics website should have a group profile that boosts
webpages related to electronic products while users of an on-line
grocery store should have a group profile that promotes webpages
related to food. Users of both the consumer electronics website and
the on-line grocery store may be associated with both groups.
[0042] In most embodiments, a group profile is not static, because
a static group profile is unlikely to result in the information
server 106 serving the most relevant search results to users of the
associated group. Instead, a group profile is updated from time to
time (e.g., periodically) so as to re-align the group profile with
the current interest of the users associated with the group. While
some group profiles may remain virtually static for long periods of
time (e.g., for groups associated with a small, static population
of users, and/or users having interests focused on a very narrow
range of topics), many group profiles will vary over time as the
associated users change and as the users' interests vary over time.
In an exemplary embodiment, a respective group profile may be based
on dated information, with older information receiving lower
weightings than newer information when constructing a vector or
other representation of the group profile. For example, the
information for each successively older time period may be down
weighted by a predefined scaling factor, so that information from a
period that is more than N (e.g., a value between 5 and 20) periods
old has less than half the impact on the group profile as
information from the current period.
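One way to satisfy the stated half-impact condition, offered only as
an illustrative assumption, is to choose the per-period scaling factor
s so that s raised to the power N equals one half. Information that is
more than N periods old then carries a weight below one half of that
of the current period. The function names below are hypothetical.

```python
from typing import Dict, List


def scaling_factor(n_periods: int) -> float:
    """Per-period factor s chosen so that s ** n_periods == 0.5."""
    return 0.5 ** (1.0 / n_periods)


def decayed_profile(
    period_vectors: List[Dict[str, float]], n_periods: int
) -> Dict[str, float]:
    """Combine per-period vectors, down-weighting older periods.

    period_vectors[0] is the current period and period_vectors[k]
    is k periods old; a vector of age k is scaled by s ** k.
    """
    s = scaling_factor(n_periods)
    profile: Dict[str, float] = {}
    for age, vector in enumerate(period_vectors):
        for key, value in vector.items():
            profile[key] = profile.get(key, 0.0) + (s ** age) * value
    return profile
```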
[0043] In some embodiments, the profile for a particular group may
include weights depending on the number of "clicks" or visits by
the users to a particular website, webpage, or set of websites
during a particular window in time. For example, if users
associated with a particular location group (e.g., Alexandria, Va.)
frequently visit the website of a local park (e.g. Mason Neck State
Park), the weight for the local park website within the local group
profile would be increased. In some embodiments, the weights in a
group profile may be associated with one or more half-lives or
other time modulations to allow the information server to provide
the most relevant and timely search results. For example, "clicks"
on the Mason Neck State Park website may be more frequent on
Fridays and Saturdays in the months of April through October than
at other times, so the weight for the Mason Neck State Park website for
users associated with the Alexandria, Va. group might be higher for
a search done on a Friday in July than for a search done on a
Tuesday in December. In another example, users associated with a
consumer electronics group may "click" on information about Shakira
at a particularly high frequency at a particular time, but the
frequency of "clicks" may decay over a year, a week, a day, a
minute, or other time period. In some embodiments, the time decay
rate of "clicks" is stored in the form of half lives over several
different periods of time. A group profile can also be based on
characteristics other than user clicks. Other characteristics of
user behavior that can be used in a group profile include one or
more of the following: the length of time that a user interacts
with the website, the proportion of the website viewed by the user,
actions (in addition to clicks) taken by a user while visiting the
website (e.g., printing, bookmarking, cutting and pasting,
annotating), and a user's activity subsequent to the interaction
with the website.
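The time modulation described above might be sketched, under stated
assumptions, by keeping click counts in buckets keyed by day of week
and month, and reading the weight for a search from the bucket that
matches the time of the search. The bucket choice and the class name
TimeModulatedWeight are hypothetical; half-lives or other modulations
could be used instead.

```python
from collections import Counter
from datetime import datetime


class TimeModulatedWeight:
    """Click weight for one website within one group profile."""

    def __init__(self) -> None:
        # (weekday, month) -> number of clicks seen in that bucket.
        self.clicks: Counter = Counter()
        self.total = 0

    def record_click(self, when: datetime) -> None:
        self.clicks[(when.weekday(), when.month)] += 1
        self.total += 1

    def weight(self, when: datetime) -> float:
        """Fraction of all recorded clicks that fall in the bucket of `when`."""
        if self.total == 0:
            return 0.0
        return self.clicks[(when.weekday(), when.month)] / self.total
```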
[0044] There are similarities between a group profile and a user
profile, as described above. Both profiles can be used to finely
tune the search results generated by the search engine. Both need
information about at least one user's search history in order to
capture the user's dynamic search interest. But there are also
significant differences between the two types of profiles. A
typical user profile is generated by analyzing an individual user's
search history. This user profile is only used to modulate search
results responsive to search queries submitted by the same user.
For the same search query, two different users may receive
different search results from the same search engine if they have
different user profiles. In contrast, a group profile is usually
generated by analyzing the search history of multiple users so as
to characterize the multiple users' interests. The size of a group
may range, for example, from a predefined minimum value to
approximately one million, or as many as the number of users for
whom characteristics may be reasonably stored. The minimum size of
a group may be selected or determined so as to preserve the privacy
interests of the group members. A group profile can be used to
modulate search results responsive to search queries submitted by
any user from the same group, including new users of the group who
made no prior "contribution" to the group profile. Therefore, the
same user submitting the same search query from two different
locations (e.g. websites, computers, ip addresses, or geolocations)
may receive different search results if the two locations have
different group profiles.
[0045] The group profile also has an important advantage over the
user profile in terms of protecting a user's privacy. Thus, in some
embodiments, a group profile is based on information associated
with at least a sufficient number of users to protect the privacy
of those users. In some embodiments, a group profile is based on at
least approximately 256 users to protect individual users' privacy;
in other embodiments, the minimum size of a respective group is a
value between ten and 512; and in another embodiment, the minimum
size is between ten and 10,000. A user profile is
associated with an individual user. To create the user profile, the
individual user, either explicitly or implicitly (e.g., by
monitoring or logging search queries and other online activities of
the user), needs to complete a survey of his or her personal
preferences. This survey indicates what information items may be of
interest to the user. Further, the user must have an account at a
website or a search engine system and the user must log into his or
her account to invoke the user profile to personalize the search
results. In contrast, the creation and usage of the group profile
does not require any personal information from any user. A group
profile is associated with a group of users, not an individual
user. Any individual user's activity is attributed to all the users
in the corresponding group or groups. A user does not need to log
into his or her account at the website in order to use the group
profile. As long as the user may be associated with a group, the
information server automatically "personalizes" the corresponding
search result in accordance with the group profile.
[0046] As shown in FIG. 1, the group profiler 129 is responsible
for generating and updating group profiles. In order to capture the
current user interest associated with a particular group, the group
profiler 129 may need to have access to the users' search history.
The users' search history includes the search queries submitted by
users, the search results responsive to the search queries, the
users' activities on the search results (e.g., selection of a
document link, sometimes called "clicking" on a search result,
amount of time spent at a document after selecting the document
link, mouse hovering time over a document link, or the like), the
current website viewed by the user, the last n websites viewed by
the user (where n is a whole number, typically on the order of five
to ten), the user's favorite websites, or the like.
[0047] In some embodiments, when the front end server 120 receives
a search query, it submits a copy of the search query to the search
engine 122 in order to solicit a search result. In addition, the
front end server 120 sends another copy of the search query to the
search history database 127. The search history database 127 then
generates a record, the record including at least the search query
and one or more group identifiers or other information from which
one or more group identifiers can be derived.
[0048] The search result ranker 126 prepares search results
responsive to the search query. The search result (i.e.,
information representing at least a portion of the search results)
is sent back to the requesting client through the front end server
120. A copy of the search result, or a portion of the search
result, is also stored in the search history database 127 together
with the search query record. The client assistant 134 at the
requesting client monitors the requesting user's activities on the
search result, e.g., recording the user's selection(s) of the
document links in the search result and/or the mouse hovering time
on different document links. In some embodiments, the client
assistant 134 or the group profiler 129 determines the document
"dwell time" for a document selected by the user, by determining
the amount of time between user selection of the corresponding
document link and the user exiting from the document. In some
embodiments, the client assistant 134 includes executable
instructions, stored in the webpage(s) containing the search
result, for monitoring the user's actions with respect to the
search results and transmitting information about the monitored
user actions back to the information server 106. In some
embodiments, the search results are served to the requesting users
with an embedded client assistant 134 that sends information about
the user activities on the search results to the group profiler
129. The information server 106, in turn, stores information about
these user activities in the search history database 127 for
subsequent use. The search history database 127 may allocate
different amounts of storage space to different groups, so that the
volume of search history associated with a group neither exhausts
its designated space nor wastes too much space before the next
scheduled profile updating.
[0049] For example, the group profiler 129 records the moment that
a user submits a search query (t0), the moment that the user clicks
the first document link in the corresponding search result (t1),
and the moment that the user clicks the second document link in the
search result (t2), etc. The differences between two consecutive
moments (e.g., t1-t0 or t2-t1) are reasonable approximations of the
amount of time spent viewing the search result or the document
whose link was selected by the user. In some embodiments the group
profiler 129 has no information about the user's dwell time for the
last document in the search result that the user selects for
viewing. In some other embodiments (e.g., where at least some users
"opt in" to a version of the client assistant that collects
additional information about the users' online activities), the
group profiler 129 also receives click and timestamp information
for user actions after the user finishes viewing documents from a
search result. Continuing the above example, the group profiler 129
further records the moment that the user submits a second query
(t3), the moment the user selects a document from the second search
results (t4), and so on. Furthermore, the group profiler 129 may
record the moment (t5) when the user either closes the browser
window that was being used to view search results and documents
listed in the search results or navigates away from the webpage or
document with the search results. This additional information
enables the group profiler 129 to determine the user dwell time for
all search result documents (i.e., documents listed in search
results) viewed by a user, which in turn enables the group profiler
129 to generate a more accurate group profile for a group.
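The timestamp arithmetic in the above example can be sketched as
follows. The event list format and the function name dwell_times are
assumptions made for illustration. As in the text, the last recorded
event has no dwell time unless a later event (such as t5) is also
recorded.

```python
from typing import List, Tuple


def dwell_times(events: List[Tuple[float, str]]) -> List[Tuple[str, float]]:
    """Approximate dwell time as the gap between consecutive events.

    `events` is a chronological list of (timestamp in seconds, label),
    e.g. a query submission followed by clicks on document links.
    The final event gets no entry because no later timestamp exists.
    """
    return [
        (label, t_next - t_cur)
        for (t_cur, label), (t_next, _) in zip(events, events[1:])
    ]
```

For instance, with t0 = 0, t1 = 5 and t2 = 65, the search result is
attributed 5 seconds and the first document 60 seconds.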
[0050] Based on characteristics associated with one or more users,
the group profiler 129 generates a group profile. FIG. 2 is a flow
diagram of a process for generating a group profile using
characteristics associated with the user in accordance with some
embodiments of the present invention. Initially, the group profiler
129 identifies characteristics associated with the user (210). The
group profiler 129 may identify a website; a location (including a
geolocation) such as a country, region, city or the like; an
internet address (e.g. the Internet Protocol (ip) address, TCP/IP
address, Ethernet address, network address, subnet address, or the
like); a domain name or a portion of a domain name; a host name;
users' history (including past viewed websites, past search
queries, selected search results, dwell time or the like); user
profiles or portions thereof; a language; and/or other
characteristics that may be associated with more than one user
(such as a user type, for example a power user or a teenager). Some
groups associated with a user may be determined based on known
affiliations of the user, such as membership in a mailing list,
membership of a website, clubs, teams, businesses, schools, etc. At
least some of these groups may be groups that the user has
explicitly or implicitly joined. These user characteristics may
include some or all search queries submitted by users. In some
cases, the identified search queries may comprise a subset or
sampling of the submitted search queries. Search queries submitted
during a predetermined time period presumably represent the general
interest of the users submitting the queries. The search queries
may capture dynamic user interests that vary by time.
[0051] The group profiler 129 may identify online activities of
users. Identified user activities may include user clicks on
document links in search results. In another example, identified
user activities may include mouse hovering time on the document
links. Generally speaking, a user clicks a document link if the
user is interested in the document's content. Similarly, the fact
that the mouse moves onto a particular document link and stays
there for a substantial amount of time indicates that this document
is relevant to the user's interest. In some embodiments,
information about the mouse hovering time may be unavailable.
[0052] From the user activities on different search results, the
group profiler 129 can identify documents selected by the users. In
some embodiments, the group profiler 129 visits the content
database 124 to retrieve the profiles of the corresponding
documents. As noted above, each identified document may have a
profile (e.g., a document profile) that was previously generated.
If any of the identified documents do not yet have profiles, those
documents can be ignored, or the group profiler may call upon the
document profiler 125 to produce document profiles for those
documents.
[0053] In these ways, the group profiler 129 may also identify
other users having one or more of the same characteristics (215).
User characteristics may be correlated by topic, time or the like
to form groups. The group profiler 129 then associates the user
with other users having the same characteristics in at least one
group (220). One or more group profiles are then generated (230).
The group profiler 129 may generate a group profile based on the
user characteristics, the retrieved document profiles, or the like.
Some of the group profiles may be website profiles, the generation
and use of which are described in patent application Ser. No.
11/394,620. A group profile may be validated, for example, by
comparing the group to an average of other groups of the same type,
by averaging the differences between groups, or other validation
procedures. The group profile may include one or more of the
following: a weighted listing or vector of categories (sometimes
called a category-profile), key terms from search queries and/or
user visited documents (sometimes called a term profile), and
information about links to user visited documents (sometimes called
a link profile). The group profile is stored in the group profile
database 128. The search result ranker 126 can retrieve the group
profile to re-order the ranks of the documents within a search
result.
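Operations 215 and 220 might be sketched, purely for illustration, as
an inversion of a user-to-characteristics mapping, keeping only those
groups that reach a minimum size. The string encoding of
characteristics, the min_size parameter (motivated by the privacy
discussion above) and the name form_groups are assumptions.

```python
from collections import defaultdict
from typing import Dict, Set


def form_groups(
    user_traits: Dict[str, Set[str]], min_size: int = 2
) -> Dict[str, Set[str]]:
    """Group users that share a characteristic.

    `user_traits` maps a user id to characteristics such as
    "site:example" or "city:alexandria" (hypothetical encodings).
    Groups smaller than `min_size` are dropped.
    """
    members: Dict[str, Set[str]] = defaultdict(set)
    for user, traits in user_traits.items():
        for trait in traits:
            members[trait].add(user)
    return {t: users for t, users in members.items() if len(users) >= min_size}
```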
[0054] In some other embodiments, operation 230 may include a
clustering operation in which user characteristics are clustered
using statistical analysis to determine suitable groupings. The
clustering may be based on, for example, the fact that a user
clicks a link. Alternatively, the group profiler may directly match
a document's URL against a known set of URLs associated with a
particular category. In either case, the group profiler 129 does
not need to access the documents' contents in order to generate the
group profile.
[0055] In yet other embodiments, operation 230 may be augmented by
a process that maps the user characteristics to a set of
categories. For example, the categorization of queries can be based
on the terms in the queries themselves, or by accessing the
profiles of the top N search results (e.g., the top 5, 10, 15 or 20
search results) produced by those queries, merging those document
profiles to produce a query profile for each query, and merging the
query profiles, weighted in accordance with their frequency of
submission to generate a group profile. As discussed below with
reference to FIG. 4, this process may exclude queries that are
deemed to be unlikely to be related to the primary interests of the
users.
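The merging described in this paragraph might be sketched as follows,
assuming that document profiles are sparse category-weight vectors,
that a query profile is the plain average of the profiles of its top N
results, and that query profiles are combined in proportion to
submission frequency. These choices and the function names are
hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Profile = Dict[str, float]


def average_profiles(profiles: List[Profile]) -> Profile:
    """Merge document profiles into one query profile by averaging."""
    merged: Dict[str, float] = defaultdict(float)
    for profile in profiles:
        for key, value in profile.items():
            merged[key] += value / len(profiles)
    return dict(merged)


def group_profile_from_queries(
    query_counts: Dict[str, int],
    top_results: Dict[str, List[Profile]],
    top_n: int = 10,
) -> Profile:
    """Merge query profiles, weighted by how often each query was submitted."""
    total = sum(query_counts.values())
    group: Dict[str, float] = defaultdict(float)
    for query, count in query_counts.items():
        docs = top_results.get(query, [])[:top_n]
        if not docs:
            continue  # no document profiles available for this query
        for key, value in average_profiles(docs).items():
            group[key] += value * count / total
    return dict(group)
```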
[0056] As noted above, a group profile may be updated from time to
time in order to keep track of the current interests of the users
associated with the group. In some embodiments, a group profile is
updated at a predetermined time interval (e.g., every week or every
day). In some other embodiments, a group profile is updated
whenever the number of new search queries by members of the group
reaches a threshold value since a last (i.e., most recent) update.
Whenever it is time to update the group profile, the group profiler
129 repeats the aforementioned process to update the group
profile.
[0057] In some embodiments, different groups generate substantially
different magnitudes of traffic and therefore should be treated
differently in terms of profile updating. For instance, a group
associated with a popular domain name may generate heavy traffic,
on the order of tens of thousands of clicks per day while a smaller
group may have a much lower click rate.
[0058] Some groups may generate so much traffic that the group
definition should be refined. For example, a group based on
grouping the users' ip addresses in a range could include users of
a proxy, which would be unlikely to have any correlation with
users' interests. In such a case, the outlier ip address (the
suspected proxy ip address in this example) could be excluded from
the group. There are two additional issues with significant traffic
during a short time period. First, the group's profile may be
biased by this traffic peak. Special care may be required to make
sure that the group profile has an appropriate balance between the
short-term and long-term interests of the users, such as by
excluding or down-weighting the associated elements of the group
profile. Second, the search history database 127 may not have the
space to store all the search history. One approach to solve this
issue is to intentionally ignore some of the search queries, search
results and user activities. This may be accomplished by sampling
the search queries, search results and/or user activities so as to
produce an unbiased sample of the search history. While the extent
of the sampling may vary from one embodiment to another,
experiments suggest that a search history encompassing several
months of user activities will have sufficient data to generate a
reliable group profile, for most groups, so long as (A) the
sampling is done in a manner that avoids significant biases, and
(B) it includes user activity data corresponding to a few weeks of
representative search history.
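The specification does not name a sampling method. One well-known
technique that yields a uniform sample of fixed size from a stream of
unknown length is reservoir sampling, sketched below as an assumption
rather than as the method of any embodiment. The function name
reservoir_sample is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream.

    Each item of the stream ends up in the sample with equal
    probability, so bursts of traffic are not over-represented
    beyond their share of the stream.
    """
    rng = rng or random.Random()
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample
```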
[0059] Alternatively, the space shortage issue can be solved by
generating a series of incremental group profiles for different
portions of the search history and merging the incremental group
profiles into the group profile. As shown, for example, in FIG. 3,
the group profiler 129 first generates an incremental profile 311
for the search history section 301. Each search history section
301, 303, 305 may include a predefined quantity of search history
information, or it may include search history information for a
predefined length of time (e.g., an hour), or it may include a
portion of the search history selected in accordance with
predefined selection criteria. The process of generating an
incremental group profile is similar to the process discussed above
in connection with FIG. 2. The incremental profile 311 is
equivalent to the search history section 301 in terms of
characterizing the interests of the users. Once the incremental
profile 311 has been created, the corresponding search history
section 301 in the database can be overwritten by new entries
entering the database. Similarly, the search history section 303
can be overwritten after the incremental profile 313 is generated.
After the creation of the incremental profile 315, the group
profiler 129 can create the new group profile 337 by merging the
incremental profiles 311, 313, and 315 into the old group profile
331. In sum, the group profiler 129 is able to take into account
the entire search history by creating incremental group profiles
for search history sections 301, 303, and 305 and by merging an
existing group profile with incremental profiles 311, 313, and
315.
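The specification does not state how incremental profiles are merged
into an existing group profile. A minimal sketch, assuming that
profiles are sparse weight vectors and that the new profile is a
weighted blend of the old profile and the average of the incremental
profiles, is given below. The old_weight parameter and the name
merge_incremental are hypothetical.

```python
from typing import Dict, List

Profile = Dict[str, float]


def merge_incremental(
    old_profile: Profile, incrementals: List[Profile], old_weight: float = 0.5
) -> Profile:
    """Blend an old group profile with the mean of incremental profiles."""
    if not incrementals:
        return dict(old_profile)  # nothing new to merge
    keys = set(old_profile)
    for inc in incrementals:
        keys.update(inc)
    n = len(incrementals)
    new_profile: Profile = {}
    for key in keys:
        recent = sum(inc.get(key, 0.0) for inc in incrementals) / n
        new_profile[key] = (
            old_weight * old_profile.get(key, 0.0) + (1.0 - old_weight) * recent
        )
    return new_profile
```

Because each search history section is summarized before it is
overwritten, only the incremental profiles and the old profile need to
be retained for the merge.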
[0060] A group profile may be used for anonymously "personalizing"
search results responsive to search queries submitted by a user
associated with a group. An underlying assumption in the present
specification is that a user's search queries are, more or less,
related to at least one of the groups associated with the user. If
not carefully filtered out, the search history associated with
popular, but irrelevant, terms may seriously "contaminate" the
group profile and twist the search results in an unexpected
direction. Another source of contamination of the group profile is
query terms that have very low popularity. Special treatment may be
necessary to make sure that user activities with respect to very
low popularity query terms do not significantly bias the search
results.
[0061] In some embodiments the group profiler 129 (FIG. 1)
identifies search queries, and potentially other user behavior or
user information, that differentiate the group from the general
population of online users, and then either filters the received
information or weights it so as to emphasize characteristics of the
group that are distinct from characteristics of the general
population of online users. FIG. 4 is an exemplary curve 400
characterizing the popularity distribution of search queries
submitted by members of a group. All the search queries are divided
into three categories by the two thresholds 415 and 425. The
leftmost category 410 includes those search queries that are
"abnormally" popular, but may be less relevant to the specific
interests of the group. The search query "Britney Spears" being
submitted by a user associated with a Northern California group is
an example of a search query in this category. The group profiler
129 may eliminate or reduce the influence of the search history
associated with these queries on the group profile by giving them
relatively low weights. The middle category 420 includes those
search queries that are reasonably popular and relevant to the
group. The search history corresponding to these search queries may
be granted higher weights to make a major contribution to the group
profile. Finally, the rightmost category 430 includes those queries
that are only made occasionally or by a single user associated with
the group. They may be treated in a manner similar to the queries
in the leftmost category 410.
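The three-way division of FIG. 4 might be sketched as a simple
weighting function. The sketch assumes that thresholds 415 and 425 are
expressed as popularity counts and that two fixed weights are used;
the numeric defaults and the name query_weight are hypothetical.

```python
def query_weight(
    popularity: int,
    upper_threshold: int,
    lower_threshold: int,
    low_weight: float = 0.1,
    high_weight: float = 1.0,
) -> float:
    """Weight a query's search history by its popularity band.

    Queries above the upper threshold (abnormally popular) or below
    the lower threshold (rare) get a low weight; queries in the
    middle band get a high weight.
    """
    if popularity > upper_threshold or popularity < lower_threshold:
        return low_weight
    return high_weight
```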
[0062] There are multiple factors determining the contribution of a
search query (or a corresponding search result) in the middle
category 420 to the group profile. For example, the popularity of
the search query and the amount of user activities on the search
result affect the contribution of the search query and the search
result on the group profile. Time is another important factor. In
some embodiments, recent search history plays a more prominent role
than less recent search history in the formation of the group
profile. One skilled in the art can easily apply similar principles
to other aspects of the search history associated with the
group.
[0063] FIGS. 5A-5C are block diagrams illustrating various types of
groups, including group i, group j, and group k, having multiple
profiles produced by one or more group profilers 129. A particular
user may be associated with these groups, or a subset or superset
thereof.
[0064] FIG. 5A is a block diagram illustrating how website_i group
530 may be associated with multiple profiles in accordance with
some embodiments. A user may be associated with one or more website
groups. For example, i may be any whole number and a user may be
associated with a website_1 group and a website_2
group. The profiles 531, 533, 535, and 537 are produced by one or
more group profilers 129, based on the online behavior (sometimes
called the online history) of the users in the group. Online
behavior of the group includes the search history of users
associated with the website_i. The search history involves
different types of information from different sources, such as
search queries 501 submitted by users associated with website_i,
search results 503 generated by the search engine in response to
the search queries from users associated with website_i, and
activities 505 of users associated with website_i on the search
results.
[0065] In some embodiments, group profile generation is divided
into multiple sub-processes. Each sub-process produces a specific
type of group profile characterizing the interests of the website
users from a particular perspective. Four examples of the types of
group profiles that may be produced by sub-processes of the group
profiler 129 (FIG. 1) are:
[0066] a category-based group profile 531--this type of profile may
be generated by correlating the search history of a set of users
with a set of categories, which may be organized in a hierarchical
fashion, where each category may be given a weight indicating the
relevance of the category to the interests of the users;
[0067] a term-based group profile 533--this type of profile is
generated by abstracting the search history for a set of users with
a plurality of terms, wherein each term may be given a weight
indicating the relevance of the term to the interests of the users;
[0068] a link-based group profile 535--this type of profile is
generated by identifying a plurality of links that are directly or
indirectly related to the search history, where each link may be
given a weight indicating the relevance of the link to the interests
of the users; and
[0069] a cluster-based group profile 537--this type of profile is
generated by clustering, or performing statistical analysis of, the
search history for a set of users, where each cluster may be given a
weight indicating the relevance of the cluster to the interests of
the users.
[0070] In some embodiments, the website_i group 530 may have only a
subset of the group profiles 531, 533, 535, and 537. For example,
the website_i group 530 may include a single term-based group
profile 533. In some embodiments, the group 530 includes a
plurality of group profiles. In some embodiments, at least one of
the plurality of group profiles is a combination of two or more of the
aforementioned group profiles 531, 533, 535, and 537. In some other
embodiments, the category-based, term-based and/or link-based group
profiles are further processed to generate a cluster-based group
profile. In yet other embodiments, the cluster-based group profile
appears in the form of multiple cluster-based sub-profiles
characterizing different aspects of the group. In other
embodiments, the cluster based group profile 537 is generated
independently of the category-based, term-based, and link-based
group profiles using statistical methods.
[0071] The category-based group profile 531 may be constructed, for
instance, by mapping search history items (e.g., search queries,
content terms, and/or user-selected documents) to categories, and
then aggregating the resulting sets of the categories and weighting
the categories. The categories may be weighted based on their
frequency of occurrence in the search history items. In addition,
the categories may be weighted based on the relevance of the search
history items to the categories. The search history items
accumulated over a period of time may be treated as a group for
mapping into weighted categories. Other suitable ways of mapping
the search history into weighted categories may also be used. In
addition, category-based group profiles may also be based on, or
take into account, information about the language(s) used by the
websites visited by a group of users, the reading level of such
websites, and other characteristics of the websites that may be
used for re-scoring search engine search results.
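The construction of a category-based group profile might be sketched
as follows. The sketch assumes a lookup table from search history
items to category relevance values and accumulates relevance once per
occurrence, so that both frequency and relevance contribute to the
weight. The table and the name category_profile are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def category_profile(
    items: List[str], item_categories: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Map search history items to weighted categories.

    `items` lists history items (repeats count toward frequency);
    `item_categories` maps an item to {category: relevance}.
    Weights are normalized to sum to one.
    """
    totals: Dict[str, float] = defaultdict(float)
    for item in items:
        for category, relevance in item_categories.get(item, {}).items():
            totals[category] += relevance
    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {category: value / norm for category, value in totals.items()}
```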
[0072] FIG. 5B is a block diagram illustrating how an ip address
type group may be associated with multiple profiles in accordance
with some embodiments. In other embodiments, each of these profiles
may be associated with a separate group, based on the online
behavior of the group. Online group behavior includes the search
history of users associated with the ip address range_j, where j
may be any whole number. The ip address range_j may be, for
example, the set of Internet Protocol addresses having the same
first 24 bits or having the last 8 bits fall into a predetermined
range. The search history involves different types of information
from different sources, such as the search queries 506 submitted by
users associated with ip address range_j, the search results 507
generated by the search engine in response to the search queries
from users associated with ip address range_j, and the activities
508 of users associated with ip address range_j on the search
results. The ip address range_j group may have one or more of the
following types of group profiles: a category-based group profile
541, a term-based group profile 543, a link-based group profile
545, and a cluster-based group profile 547 as described above with
respect to the website_i group 530. A user may be associated with
one or more groups, including one or more ip address range groups.
For example, a user may be associated with website_1 group,
ip address range_2 group, and ip address range_3 group
or a subset or superset thereof.
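As a minimal sketch of the "same first 24 bits" example in paragraph [0072], the following Python fragment derives a group key from an IPv4 address using the standard `ipaddress` module. The function name `ip_range_group` and the use of the network string as a group identifier are assumptions for illustration only.

```python
import ipaddress


def ip_range_group(address, prefix_bits=24):
    """Return a key shared by all addresses with the same leading bits."""
    # strict=False lets a host address stand in for its enclosing network.
    network = ipaddress.ip_network(f"{address}/{prefix_bits}", strict=False)
    return str(network)
```

Two users whose addresses differ only in the last 8 bits would receive the same key and could therefore be treated as members of the same ip address range group.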
[0073] FIG. 5C is a block diagram illustrating how any type of
group_k 550 may be associated with multiple group profiles in
accordance with some embodiments. The group profiles are created
based on online behavior of the group. Online behavior of the group
includes the search history of users associated with the group_k.
The search history involves different types of information from
different sources, such as the search queries 511 submitted by
users associated with group_k, the search results 513 generated by
the search engine in response to the search queries from users
associated with group_k, and the activities 515 of users associated
with group_k on the search results. The group_k group may have one
or more of the following types of group profiles: a category-based
group profile 551, a term-based group profile 553, a link-based
group profile 555, and a cluster-based group profile 557 as
described above with respect to the website_i group 530. A user may
be associated with one or more groups, since k may indicate any
whole number. For example, a user may be associated with
website_1 group, ip address range_2 group, and group_3
or a subset or superset thereof.
[0074] FIG. 6A illustrates a hierarchical category map 600 according
to the Open Directory Project (http://dmoz.org/). Starting from the
root level of map 600, documents are organized under several major
topics, such as "Art", "News", "Sports", etc. These major topics
are often too broad to delineate the specific interest of a user.
They are further divided into multiple more specific sub-topics.
For example, the topic "Art" may comprise sub-topics like
"Movie", "Music", and "Literature" and the sub-topic "Music" may
further comprise sub-sub-topics like "Lyrics", "News", and
"Reviews." Note that each topic (or sub-topic) is associated with a
unique category identifier like 1.1 for "Art", 1.4.2.3 for "Talk
Show", and 1.6.1 for "Basketball."
[0075] The categories shown in FIG. 6A are only for illustrative
purposes. One skilled in the art will appreciate that there are
many other ways of categorizing documents. For example, different
concepts can be extracted from the contents of the documents and
different categories of relevant information are grouped in
accordance with these concepts. The interests of users associated
with a particular group may be associated with multiple categories
at different levels, each having a weight indicative of the
category's relevance to the group users' interest. The categories
and their associated weights can be determined from analyzing the
online behaviors of users associated with the group.
[0076] FIG. 6B is a block diagram of an exemplary data structure, a
category-based group profile table 650, which may be used for
storing category-based group profiles in accordance with some
embodiments of the present invention. The category-based group
profile table 650 includes a table 640 having a plurality of
records 642, each record including a GROUP_ID, a PROFILE_ID and a
pointer pointing to another data structure, such as table 660-1. A
group may have one or more profiles to better serve different user
groups. For example, the group "GROUP_1" has at least two
different profiles, "PROFILE_1" and "PROFILE_2." These
two different profiles may, for example, correspond to search
histories aggregated over different time periods. In other words,
the introduction of different profiles for a group refines the
interests of the group users. This is particularly useful for a
large group associated with a broad spectrum of users. Table 660-1
includes two columns, CATEGORY_ID and WEIGHT. The CATEGORY_ID
column contains a category's identifier, for example as shown in
FIG. 6A, and the value in the WEIGHT column indicates the relevance
of the category to the interests of the users.
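One possible in-memory analogue of tables 640 and 660 is sketched below in Python, purely as an illustration. The dictionary layout, the sample weights, and the helper `category_weight` are assumptions; the application does not prescribe a storage format.

```python
# Table 640: (GROUP_ID, PROFILE_ID) -> reference to a category/weight table.
# Tables 660-x: CATEGORY_ID -> WEIGHT. All values below are made up.
table_660_1 = {"1.1": 0.7, "1.6.1": 0.3}
table_660_2 = {"1.4.2.3": 0.9}
table_640 = {
    ("GROUP_1", "PROFILE_1"): table_660_1,
    ("GROUP_1", "PROFILE_2"): table_660_2,
}


def category_weight(group_id, profile_id, category_id):
    """Look up a category weight; unknown entries default to 0.0."""
    return table_640.get((group_id, profile_id), {}).get(category_id, 0.0)
```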
[0077] Different profiles for the same time period can be generated
in different ways, to reflect different aspects of the users in the
group (e.g., short term, medium term, or long term interests). This
may be accomplished by putting different emphasis or different
weighting on different portions of the data used to generate the
group profiles. Some group profiles may be generated for special
time periods, such as holidays and events (e.g., Christmas,
Olympics, etc.) during which the behavior of users may change
significantly. The data for such special time periods may also be
removed, or down-weighted, when generating "regular" profiles for a
group of users.
[0078] In some embodiments, the search history items are
automatically classified in different clusters. Clusters may be
more dynamic than categories, since categories are typically
pre-generated. Search history items associated with different
groups are classified against the same set of categories. In
contrast, there may not be a predefined set of clusters for a
particular group. The search history may fall into a dynamically
generated set of clusters. Therefore, clusters may be better
tailored to characterize the interests and preferences of the
group's users or provide additional information about a group to
improve the customization of search results. For convenience, many
of the discussions of profiles in this document use categories as
an example. But it will be clear to one skilled in the art that the
underlying algorithms are also applicable to clusters with no or
little adjustment.
[0079] A category-based group profile, based upon the category map
600, is a topic-oriented implementation of a group profile. The
items in a category-based profile can also be organized in other
ways. In one embodiment, the interests of the website users can be
categorized based on the formats of the documents identified by the
website users, such as HTML, plain text, PDF, Microsoft Word, etc.
Different formats may have different weights. In another
embodiment, the interests of the website users can be categorized
according to the types of the identified documents, e.g., an
organization's homepage, a person's homepage, a research paper, or
a news group posting, each type having an associated weight.
Documents can also be categorized by document origin, for instance
the country associated with each document's host. In yet another
embodiment, two or more of the above-identified category-based
profiles may co-exist, with each one reflecting a respective aspect
of the interests of the website users.
[0080] FIG. 6C is a block diagram "snapshot" of an exemplary data
structure which may be used for storing groups and profiles
associated with a user in accordance with some embodiments. Since
the groups associated with the user may change over time and group
profiles are dynamically changing, the snapshot 670 will change
accordingly. The snapshot 670 includes a plurality of group tables
672, one for each group associated with the user. For example,
the user associated with snapshot 670 is associated with K groups,
so the snapshot 670 includes group table 672-1 for records
associated with group 1 (website i), group table 672-2 for records
associated with group 2 (ip address range), and group table 672-K
for records associated with group K (location).
[0081] Each group table 672 may store a group confidence value
record 674 for the group, indicating the weight (also called the
confidence value) of the group in the re-ranking process. In some
embodiments, all groups begin with a default weight or group
confidence value of one. Then modifications may be made to the
group confidence value to reflect the appropriate weight of the
group as associated with the user. For example, if a user is
loosely associated with a group, the group confidence value may be
lowered. Similarly, if a group is incoherent or has a low
coherence, the group confidence value may be lowered. (Here
coherence is used in the technical sense to refer to the degree to
which members of a group are similar to other members of the group
and dissimilar to members of other groups.) Group confidence values
may also vary by group type. For example, location type groups may
generally have higher (or perhaps, lower) confidence values than
website type groups. Group confidence values may also vary by any
other factor that affects the preferred weight of a group in
computing the adjusted score for documents in the search result,
including by query type (i.e., varying the group confidence value
when computing the adjusted score in response to an image query as
opposed to a map query, etc.). For example, a high traffic internet
address group may reflect the users' proxy server, which is
unlikely to have any significant relationship to users' interests
and should therefore have a low confidence value relative to groups
that more consistently reflect their users' interests. In some
cases, when the group confidence value approaches or reaches zero,
the user is no longer associated with the group.
[0082] Each group table 672 may also include profile confidence
value records for every group profile of each group, for example
676-1 and 676-2 for Group 1, 676-3 and 676-4 for Group 2, and
676-2X-1 and 676-2X for Group X. Each profile confidence value
record 676 may include a profile confidence value (e.g., Profile 1
Conf. Value) indicating the weight of each profile as associated
with the group in the re-ranking process for a particular user; a
unique identification (ID) of the profile associated with each
group (e.g. Grp1-Prf1 ID); and a pointer to a group profile table,
such as category-based group profile table 650, term-based group
profile table 700, link-based group profile table 800, or the like.
In some embodiments, all profiles begin with a default profile
confidence value of one. Then modifications may be made to the
profile confidence value to reflect the appropriate weight of the
profile as associated with the user and the group as described with
respect to the group confidence value above. For example, group 1,
based on website i, may have a short term profile, profile 1, and a
long term profile, profile 2. The profile confidence value for the
short term profile 1 may be raised for a news query when website i
is www.cnn.com based on the assumption that current news is more
interesting to users associated with www.cnn.com. In some cases,
when the profile confidence value approaches or reaches zero, the
profile is no longer associated with the user or the group.
[0083] In some embodiments, an information server stores
information identifying a limited number of groups per user. In
addition, one or more groups may be associated with a user on the
fly, for instance based on the user's IP address or the website
from which the user is submitting a search query. Any groups not
associated with a user are implicitly assigned a default confidence
value of zero. In some embodiments, group memberships (i.e., groups
associated with a user) are updated from time to time based on
"recent history" (new data concerning recent online behavior of the
user). This may include increasing the confidence of groups for
which there is evidence of continued membership, and decreasing the
confidence of groups for which there is a lack of evidence of
continued membership or evidence of decreased activity. A group may
be removed from the set of groups associated with a user when the
confidence value of the group falls below a predefined threshold
value (e.g., 0.2 or any other appropriate value). Furthermore,
since some group membership information may be determined on the
fly, confidence values for those groups may be based on other
information, such as a website's coherence value (see definition of
coherence value, above).
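The periodic membership update of paragraph [0083] could, for example, be sketched as follows. The fixed `step` adjustment and the clamping of confidence values to the range 0 to 1 are assumptions of this sketch; the threshold of 0.2 is the example value given in the text.

```python
def update_memberships(confidences, evidence, step=0.1, threshold=0.2):
    """Return updated {group_id: confidence} given recent evidence.

    `evidence` is the set of group ids for which recent history shows
    signs of continued membership. `step` and the clamp to [0, 1] are
    illustrative assumptions.
    """
    updated = {}
    for group_id, conf in confidences.items():
        conf = conf + step if group_id in evidence else conf - step
        conf = max(0.0, min(1.0, conf))
        if conf >= threshold:  # groups that fall below the threshold are dropped
            updated[group_id] = conf
    return updated
```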
[0084] FIG. 7 is a block diagram of an exemplary data structure, a
term-based group profile table 700, which may be used for storing
term-based group profiles in accordance with some embodiments of
the present invention. The table 700 includes a plurality of
records 710, each record corresponding to a group's term-based
profile. A term-based group profile record 710 includes a plurality
of columns including a GROUP_ID column 720 and multiple columns of
(TERM, WEIGHT) pairs 740. The GROUP_ID column stores a
group identifier. Each (TERM, WEIGHT) pair 740 includes a term of
typically one to three words that is deemed relevant to the
interests of the users and a weight associated with the term
indicating the relevance of the term. The weight of a term is not
necessarily a positive value. A negative weight suggests that the
website users disfavor documents including this term in the search
results.
[0085] Besides term-based and category-based profiles, another type
of group profile is referred to as a link-based profile. As
discussed above, the page rank of a document is based on the link
structure that connects the document to other documents on the
Internet. A document having more links pointing to it is often
assigned a higher page rank and is therefore deemed more popular by
the search engine. Link information of documents selected by users
can be used to infer the interests of the users. In one embodiment,
a list of preferred URLs is identified for users by analyzing the
click rate of these URLs. Each preferred URL may be further
weighted according to the mouse hovering time by the users at the
URL. In another embodiment, a list of preferred web hosts is
identified for the users by analyzing the users' visit rate at
different web hosts. When two or more preferred URLs are related to
the same web host, the weights of the two or more URLs may be
combined as the weight of the web host.
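A minimal sketch of combining URL weights into host weights, as described in paragraph [0085], is given below. Summation is used here as one possible way of "combining" the weights; that choice, and the function name `host_weights`, are assumptions.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_weights(url_weights):
    """Sum the weights of preferred URLs that share a web host."""
    totals = defaultdict(float)
    for url, weight in url_weights.items():
        host = urlsplit(url).hostname
        if host:  # skip entries from which no host can be parsed
            totals[host] += weight
    return dict(totals)
```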
[0086] FIG. 8 is a block diagram of an exemplary data structure
that may be used for storing link-based group profiles in
accordance with some embodiments of the present invention. The
link-based profile table 800 includes a table 810 that includes a
plurality of records 820, each record including a GROUP_ID and a
pointer pointing to another data structure, such as table 810-1.
Table 810-1 may include two columns, LINK_ID 830 and WEIGHT 840.
The LINK_ID 830 may be associated with a preferred URL or host. The
actual URL/host may be stored in the table instead of the LINK_ID;
however, it is preferable to store the LINK_ID to save storage
space.
[0087] A preferred list of URLs and/or hosts includes URLs and/or
hosts that have been directly identified by the users. The
preferred list of URLs and/or hosts may further extend to URLs
and/or hosts indirectly identified by using methods such as
collaborative filtering or bibliometric analysis, which are known
to one of ordinary skill in the art. In one embodiment, the
indirectly identified URLs and/or hosts include URLs or hosts that
have links to/from the directly identified URLs and/or hosts. These
indirectly identified URLs and/or hosts are weighted by the
distance between them and the directly identified URLs or hosts.
For example, when a directly identified URL or host has a weight of
1, URLs or hosts that are one link away may have a weight of 0.5,
URLs or hosts that are two links away may have a weight of 0.25,
etc. This procedure can be further refined by reducing the weight
of links that are not related to the topic of the original URL or
host, e.g., links to copyright pages or web browser software that
can be used to view the documents associated with the user-selected
URL or host. Irrelevant links can be identified based on their
context or their distribution. For example, copyright links often
use specific terms (e.g., "copyright" and "All rights reserved" are
commonly used terms in the anchor text of a copyright link); and
links to a website from many unrelated websites may suggest that
this website is not topically related (e.g., links to the Internet
Explorer® website are often included in unrelated websites).
The indirect links can also be classified according to a set of
topics or categories and links with very different topics or
categories may be excluded or be assigned a low weight.
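The distance-based weighting of indirectly identified URLs or hosts (1, 0.5, 0.25, and so on) can be illustrated with a breadth-first traversal. The link graph representation, the `max_hops` cutoff, and the function name `propagate_weights` are hypothetical and serve only to make the example concrete.

```python
from collections import deque


def propagate_weights(links, seeds, decay=0.5, max_hops=2):
    """Weight pages by link distance from directly identified seeds.

    `links` maps a URL/host to its neighbours (links to or from it).
    A seed gets weight 1, and each additional hop multiplies by `decay`.
    """
    weights = {seed: 1.0 for seed in seeds}
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbour in links.get(node, ()):
            if neighbour not in weights:  # BFS keeps the shortest distance
                weights[neighbour] = decay ** (hops + 1)
                queue.append((neighbour, hops + 1))
    return weights
```

The further refinement described above, reducing the weight of off-topic links such as copyright pages, would be applied on top of these distance-based weights and is not shown.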
[0088] The types of group profiles discussed above are generally
complementary to one another since different profiles characterize
the interests of users from different vantage points. However, this
does not mean that one type of group profile, e.g., the
category-based profile, is incapable of playing a role that is
typically played by another type of group profile. By way of
example, a preferred URL or host in a link-based profile is often
associated with a specific topic, e.g., finance.yahoo.com is a URL
focusing on financial news. Therefore, what is achieved by a
link-based profile that comprises a list of preferred URLs or hosts
may also be achievable, at least in part, by a category-based
profile that has a set of categories that cover the same topics
covered by preferred URLs or hosts.
[0089] FIG. 9 is a flow diagram of a process for generating
group-dependent search results using the various types of group
profiles in accordance with some embodiments of the present
invention. Initially, the search engine 122 receives a search query
submitted by a user (910). In response, the search engine 122 may
optionally generate a query strategy (915). For example, the search
query is normalized so as to be in proper form for further
processing, and/or the search query may be modified in accordance
with predefined criteria so as to automatically broaden or narrow
the scope of the search query. Next, the search engine 122 searches
the content database 124 for documents that match the search query
(or the query strategy, if one is generated). The search engine 122
identifies a set of documents that match the search query (920),
each document having a generic ranking score that depends on the
document's page rank and the search query. All three operations
(910, 915 and 920) are typically conducted by the search engine
122. The net result is an initial set of search results.
[0090] Each group associated with the requesting user is identified
(922). In some embodiments, some or all of the requesting user's
group identifier(s) are embedded in the search query by the client
assistant 134 or other means. Based on the group identifier(s), the
search result ranker 126 identifies the associated group profiles
in the group profile database 128 (925). For each document
identified by the search engine 122, the search result ranker 126
identifies a document profile (930), based on which a generic
ranking score is derived.
[0091] Next, the search result ranker 126 analyzes each identified
document to determine one or more boost factors using the group and
document profiles (935) and then assigns the document a
group-dependent ranking score using the document's generic ranking
score and the boost factors (940). The search result ranker 126
iterates the process for every identified document (942). The
search result ranker 126 re-orders the list of documents according
to their group-dependent ranking scores (945) to produce re-ordered
search results. At least a portion of the re-ordered search results
(e.g., the top N ranked items, based on the re-ordering), including
links to a list of documents, are sent to the requesting client
103.
[0092] In some embodiments, the analysis of an identified document
at 935 includes determining a correlation between the document's
content and the group profiles. Furthermore, in some embodiments,
this operation includes accessing a previously computed document
profile for the document and then determining a correlation between
the document profile and the group profiles. In some embodiments,
determining the correlation includes one or more operations that
are "dot product" computations, which determine the extent of
overlap, if any, between the document profile and the group
profiles. In addition, instead of determining and then applying a
boost factor (as in operations 935 and 940), some documents may
have their group-dependent ranking score set very high or very low
in accordance with information in a group profile. For instance,
for a group associated with "Apple computers," documents from
websites associated with fruit and produce may be assigned a
predefined very low group-dependent ranking score.
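The "dot product" computation mentioned in paragraph [0092] may be sketched for sparse profiles as follows. Representing each profile as a dictionary from a category, term, or link to a weight is an assumption of this sketch.

```python
def dot_product(document_profile, group_profile):
    """Overlap between two sparse {key: weight} profiles."""
    shared = document_profile.keys() & group_profile.keys()
    return sum(document_profile[k] * group_profile[k] for k in shared)
```

A result of zero indicates no overlap between the document profile and the group profile, while larger magnitudes indicate stronger (positive or negative) correlation.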
[0093] FIG. 10 is a block diagram of exemplary data structures that
may be used for storing category-based, term-based, and link-based
boost factors for documents in the search results in accordance
with some embodiments of the present invention. For each candidate
document, each identified by a respective DOC_ID, category-based
document information table 1010 includes a plurality of identified
categories and associated weights, term-based document information
table 1030 includes multiple pairs of relevant terms and associated
weights, and link-based document information table 1050 includes a
set of links and corresponding weights.
[0094] The rightmost column of each of the three tables (1010, 1030
and 1050) stores the boost factor (i.e., a computed score) of a
document when the document is evaluated using one specific type of
group profile. A document's boost factor can be determined by
combining the weights of the items associated with the document.
For instance, a category-based or term-based boost factor for users
associated with an ip address range group may be computed as
follows. The users may favor documents related to science with a
weight of 0.6, and disfavor documents related to business with a
weight of -0.2. Thus, when a science document matches a search
query, it will be boosted over a business document. In general, the
document topic classification may not be exclusive. A candidate
document may be classified as being a science document with
probability of 0.8 and a business document with probability of 0.4.
A link-based boost factor may be computed based on the relative
weights allocated to the preferred URLs or hosts in the link-based
profile. In one embodiment, the term-based profile rank can be
determined using known techniques, such as "term frequency-inverse
document frequency" (TF-IDF). The "term frequency" of a term is a
function of the number of times the term appears in a document. The
"inverse document frequency" of a term is an inverse function of
the number of documents in which the term appears within a
collection of documents. For example, very common terms like "word"
occur in many documents and consequently are assigned a relatively
low inverse document frequency, while less common terms like
"photograph" and "microprocessor" are each assigned a relatively
high inverse document frequency.
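For illustration, one common form of TF-IDF is sketched below, using a raw term count and a logarithmic inverse document frequency. The exact functions used in any embodiment are not specified in the text, so this particular formula is an assumption.

```python
import math


def tf_idf(term, document, corpus):
    """TF-IDF of `term` in `document`; documents are lists of tokens."""
    tf = document.count(term)
    df = sum(1 for doc in corpus if term in doc)
    # A term that appears in every document gets an IDF of log(1) = 0.
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf
```

With this formula, a term found in every document of the collection contributes nothing, while a term confined to a few documents is weighted more heavily.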
[0095] In some embodiments, when a search engine generates a search
result in response to a search query, a candidate document D that
satisfies the search query is assigned a query score, QueryScore,
in accordance with the search query. This query score is then
modulated by document D's page rank, PageRank, to generate a
generic ranking score, GenericScore, that is expressed as
GenericScore=QueryScore*PageRank.
[0096] This generic ranking score may not appropriately reflect
document D's relevance to a particular group of users if the users'
interest is measurably different from that of a random user of the
search engine. The relevance of document D to the users can be
characterized by a set of boost factors, based on the correlation
between document D's content and the group's term-based profile,
herein called the TermBoostFactor, the correlation between one or
more categories associated with document D and the group's
category-based profile, herein called the CategoryBoostFactor, and
the correlation between the URL and/or host of document D and the
group's link-based profile, herein called the LinkBoostFactor.
Therefore, document D may be assigned a group-dependent ranking
score that is a function of both the document's generic ranking
score and the various group profile-based boost factors. In one
embodiment, this group-dependent ranking score can be expressed as:
GroupScore=GenericScore*(TermBoostFactor*CategoryBoostFactor*LinkBoostFactor),
where a BoostFactor of 1.0 does not modify the generic score,
and a value above or below 1.0 increases or reduces the generic
score, respectively. The relative importance of each type of
profile or boost factor is implemented by controlling the range of
values that are allowed for a given type of boost factor. For
example, a boost factor having a range of 0.1 to 10 has more
importance in determining the GroupScore than a boost factor having
a range of 0.5 to 2.0.
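The multiplicative combination of paragraph [0096] can be sketched as follows. The `clamp` helper, which restricts a boost factor to its allowed range, is a hypothetical addition used to illustrate how the range controls relative importance.

```python
def group_score(generic_score, term_boost=1.0, category_boost=1.0, link_boost=1.0):
    """GroupScore = GenericScore * (TermBoost * CategoryBoost * LinkBoost)."""
    return generic_score * (term_boost * category_boost * link_boost)


def clamp(value, low, high):
    """Restrict a boost factor to its allowed range."""
    return max(low, min(high, value))
```

For example, clamping a link boost to the range 0.5 to 2.0 while allowing a term boost to span 0.1 to 10 lets the term-based profile move the final score much further than the link-based profile can.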
[0097] In other implementations, a linear combination of boost
factors is used:
GroupScore=GenericScore*(Wterm*TermBoostFactor+Wcategory*CategoryBoostFactor+Wlink*LinkBoostFactor)
or
GroupScore=GenericScore*f(Wterm*TermBoostFactor+Wcategory*CategoryBoostFactor+Wlink*LinkBoostFactor),
where the weights (Wterm, Wcategory,
Wlink) are assigned so that the value in parentheses in the above
equations is equal to about 1.0 if the document is to be neither
promoted nor demoted in rank for the group, above 1.0 if the
document should be promoted, and below 1.0 if the document should
be demoted. The greater the deviation from 1.0, the stronger the
promotion or demotion of the document. The function f( ) in the
last of the equations shown above can be a non-linear function
(e.g., f(x)=x^n where n is any suitable real value) that
emphasizes or de-emphasizes the deviation from 1.0 of the value in
the parentheses. In some embodiments the function f( ) is a
transform function used to normalize the linear combination of
boost factors to a range that is suitable for a combined boost
factor. For example, the argument (input value) of the function f(
) may range from -1 to 1 (or any other suitable input range), while
the value produced by the function f( ) ranges from 0.2 to 2.0 (or
any other suitable output range). The f( ) portion of the above
equation (i.e., the value produced by applying the function)
corresponds to BoostFactor in the equation in the following
paragraph.
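A sketch of the linear combination of paragraph [0097], together with two candidate forms of f( ), is given below. The equal default weights and the linear interpolation used in `rescale` are assumptions; the text states only the example input range (-1 to 1) and output range (0.2 to 2.0), not the shape of the mapping.

```python
def combined_boost(term, category, link,
                   w_term=1 / 3, w_category=1 / 3, w_link=1 / 3, f=None):
    """Weighted sum of boost factors, optionally passed through f."""
    value = w_term * term + w_category * category + w_link * link
    return f(value) if f is not None else value


def rescale(x, in_lo=-1.0, in_hi=1.0, out_lo=0.2, out_hi=2.0):
    """Map x linearly from [in_lo, in_hi] onto [out_lo, out_hi]."""
    x = max(in_lo, min(in_hi, x))  # clamp inputs outside the expected range
    return out_lo + (x - in_lo) * (out_hi - out_lo) / (in_hi - in_lo)
```

Passing `f=lambda x: x ** 2` reproduces the power-function example f(x)=x^n with n=2, which emphasizes deviations from 1.0.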
[0098] In another embodiment, in which the group has a single
profile, the group-dependent ranking score can be expressed as:
GroupScore=GenericScore*BoostFactor where the "BoostFactor" is
based on the correlation between document D's content and the
group's profile.
[0099] FIG. 11 is a flow diagram of another process for generating
group-dependent search results using group profiles in accordance
with some embodiments of the present invention. Initially, the
search result ranker 126 receives a search query submitted by a
user associated with at least one group (1110). In response, the
search result ranker 126 may optionally generate a generic query
strategy (1115). The search result ranker 126 identifies the group
profiles associated with the user's group(s) in the group profile
database 128 (1125). Unlike the embodiment discussed above in
connection with FIG. 9, the generic query strategy may be modulated
by the group's profile to create a group-dependent query strategy
(1165). For example, relevant terms from the group profile may be
added to the search query with associated weights. In various
embodiments, the group-dependent query strategy is created by the
search engine 122, the front end server 120, or the search result
ranker 126, respectively. In some other embodiments, a copy of the
group profile(s) is generated by the group profiler 129 and the
group-dependent query strategy is created by the website 102 from
which the user is submitting the search query. Next, the search
engine 122 searches the content database 124 (optionally using the
group-dependent query strategy) for documents matching the query
(1170). The documents are then ordered according to their
group-dependent ranking scores (1175). When using the
group-dependent query strategy, the documents identified by the
content database 124 may be implicitly ordered by their associated
query-dependent ranking score.
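One way the generic query strategy might be modulated by a term-based group profile (operation 1165) is sketched below. Giving the original query terms a weight of 1.0, adding only positively weighted profile terms, and capping the additions at `max_added` are all assumptions of this sketch.

```python
def group_dependent_query(query_terms, term_profile, max_added=3):
    """Return {term: weight} for a group-dependent query strategy.

    Original query terms keep a weight of 1.0; the highest-weighted
    positive terms from the group's term profile are then appended.
    """
    strategy = {term: 1.0 for term in query_terms}
    ranked = sorted(term_profile.items(), key=lambda kv: kv[1], reverse=True)
    added = 0
    for term, weight in ranked:
        if added == max_added:
            break
        if weight > 0 and term not in strategy:
            strategy[term] = weight
            added += 1
    return strategy
```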
[0100] FIG. 13 is a flow diagram of a process for providing a user
with customized group-dependent search results using group profiles
according to some embodiments. After receiving a search query from
the user (1300), a database search is performed to identify
information items associated with the search query (1330). In
addition, the process identifies a plurality of groups associated
with the user (1310). For example, a user might be associated with
an internet address group based on the inclusion of the ip address
of the user's computer in an ip address range associated with the
internet address group and a location group based on the
geolocation of the user's computer as inferred from the computer's
ip address. At least one profile for each group is looked up or
accessed (1320). For example, the search result ranker 126 may
access the group profile database 128 to access each group's
profile. These two steps (1310, 1320) may be done before or after
the process identifies information items associated with the search
query (1330).
[0101] Once search results have been identified, the relevance of
each information item to the user is determined based on the groups
associated with the user. For example, the search result ranker 126
computes an adjusted score or group score for each document based
on the document profile and the groups' profiles (1340). The
adjusted score for each document k is a function of the
GenericScore (as defined above), GroupProfiles(i, j),
GroupConfs(i), ProfileConfs(j), and DocumentProfile(k) for all
groups i and profiles j where i, j, and k are whole numbers. For
example, group profiles GroupProfiles(i, j) as depicted in FIG. 6C
include the profiles for groups 1 through X, such as Grp1-Prf1,
Grp2-Prf2, etc. Thus, in FIG. 6C there are X groups each having two
profiles for a total of 2X profiles. GroupConfs(i) are the group
confidence values stored in records 674-1 through X, while
ProfileConfs(j) are the profile confidence values stored in records
676-1 through 676-2X. The DocumentProfile(k) is the profile of each
document k and may include indications of the document's content
and/or the corresponding document profile stored in document
profile database 123. In some embodiments the adjusted score for
each document k may be expressed as:
AdjustedScore(k)=DocumentProfile(k)*[GenericScore*Σ_i GroupConfs(i)*Σ_j [ProfileConfs(j)*GroupProfiles(i, j)]]
where Σ_i indicates summation over all i groups and
Σ_j indicates summation over all j profiles.
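The adjusted score expression of paragraph [0101] may be sketched as follows. The sketch interprets the product of DocumentProfile(k) with each group profile as a dot product over sparse dictionaries, which is an assumption; the nested list layout of `groups` is likewise hypothetical.

```python
def adjusted_score(generic_score, document_profile, groups):
    """Adjusted score for one document.

    `groups` is a list of (group_conf, profiles) pairs, where `profiles`
    is a list of (profile_conf, profile_dict) pairs for that group.
    """
    total = 0.0
    for group_conf, profiles in groups:
        inner = 0.0
        for profile_conf, profile in profiles:
            # Dot product between the document profile and this group profile.
            overlap = sum(w * profile.get(key, 0.0)
                          for key, w in document_profile.items())
            inner += profile_conf * overlap
        total += group_conf * inner
    return generic_score * total
```

Under this interpretation, a group or profile with a confidence value of zero contributes nothing to the adjusted score.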
[0102] Once the adjusted score for each information item in the
search result is computed, the information items are ranked
accordingly (1350), and then information identifying at least a
portion of the ranked search results (e.g., the top N ranked items,
where N is a suitable integer) is provided to the user (1360). For
example, the search engine and search result ranker may produce over one
thousand ranked search results, but the user may be sent a smaller
number of top ranked results, where the number is between ten and
260, or between ten and 130.
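A minimal sketch of the ranking and truncation step follows, assuming the adjusted scores have already been computed and are held as (document identifier, score) pairs. The function name and data layout are hypothetical.

```python
from typing import List, Tuple


def top_ranked(scored_items: List[Tuple[str, float]], n: int) -> List[str]:
    """Order items by adjusted score, highest first, and keep the top n."""
    ranked = sorted(scored_items, key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:n]]
```

Because Python's sort is stable, items with equal adjusted scores keep their original relative order.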
[0103] FIG. 14 is a flow diagram of a process for providing a user
with customized group-dependent search results using group profiles
according to some embodiments. After receiving a search query from
the user (1300), a database search is performed to identify
information items associated with the search query (1330). In
addition, the process identifies a group associated with the user
(1410). For example, a user might be associated with an internet
address group based on the inclusion of the IP address of the
user's computer in an IP address range associated with the internet
address group. A plurality of profiles for the group are looked up
or accessed (1420). For example, the search result ranker 126 may
access the group profile database 128 to access the group's long
term and short term profiles. These two steps (1410, 1420) may be
done before or after the process identifies information items
associated with the search query (1330). Then an adjusted score is
computed based on the group's profiles (1430). The computation of
the adjusted score may be done in a manner explained above. Once
the adjusted score for each information item in the search result
is computed, the information items are ranked accordingly (1350),
and information identifying at least a portion of the ranked search
results (e.g., the top N ranked items, where N is a suitable
integer) is provided to the user (1360).
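The group identification step (1410) for an internet address group might be sketched as follows, using Python's standard ipaddress module. The group names and address ranges are hypothetical; the ranges are taken from blocks reserved for documentation.

```python
import ipaddress
from typing import Optional

# Hypothetical mapping from internet address groups to IP address ranges.
GROUP_RANGES = {
    "group-a": ipaddress.ip_network("192.0.2.0/24"),
    "group-b": ipaddress.ip_network("198.51.100.0/24"),
}


def find_ip_group(client_ip: str) -> Optional[str]:
    """Return the first group whose address range contains the client's IP."""
    address = ipaddress.ip_address(client_ip)
    for group_name, network in GROUP_RANGES.items():
        if address in network:
            return group_name
    return None  # No internet address group is associated with this address.
```

Once a group is found, its long term and short term profiles would be looked up (1420) before the adjusted scores are computed (1430).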
[0104] Referring to FIG. 12, an exemplary information server 1200
typically includes one or more processing units (CPU's) 1202, one
or more network or other communications interfaces 1210, memory
1212, and one or more communication buses 1214 for interconnecting
these components. The communication buses 1214 may include
circuitry (sometimes called a chipset) that interconnects and
controls communications between system components. The system 1200
may optionally include a user interface, for instance a display and
a keyboard. Memory 1212 may include high speed random access memory
and may also include non-volatile memory, such as one or more
magnetic disk storage devices. Memory 1212 may include mass storage
that is remotely located from the CPU's 1202. In some embodiments,
memory 1212 stores the following programs, modules and data
structures, or a subset or superset thereof:
[0105] an operating system 1216 that includes procedures for
handling various basic system services and for performing
hardware-dependent tasks;
[0106] a network communication module 1218 that is used for
connecting the information server 1200 to other servers or computers
via one or more communication networks (wired or wireless), such as
the Internet, other wide area networks, local area networks,
metropolitan area networks, and so on;
[0107] a system initialization module 1220 that initializes other
modules and data structures stored in memory 1212 required for the
appropriate operation of the information server 1200;
[0108] a search engine 122 for processing a search query, and for
identifying and ordering a search result according to the search
query;
[0109] a content database 124 for storing a plurality of indexed
documents retrieved from the Internet;
[0110] a group profiler 129 for processing data representing the
online behavior of one or more groups of users and creating and
updating one or more profiles that characterize the interests of the
users associated with the one or more groups;
[0111] a search history database 127 for storing search histories
associated with different groups, including search queries, search
results and user activities;
[0112] a group profile database 128 for storing group profiles
associated with different groups;
[0113] a document profiler 125 for analyzing a document's content
and context and creating a profile for the document;
[0114] a document profile database 123 for storing document profiles
associated with different documents stored in the content database
124; and
[0115] a search result ranker 126 for generating a group-dependent
ranking score for each document identified by the search engine 122
using a group profile and re-ordering the documents in a search
result in accordance with their group-dependent ranking scores.
[0116] In some embodiments, the information server 106 may not have
access to all the search history associated with a website. For
example, there may be an agreement between a website 102 and the
information server 106 with respect to the search queries submitted
from the website 102. According to the agreement, when a user
visiting the website 102 submits a search query to the information
server 106, the information server 106 is required to send the
corresponding search result to the website 102 rather than the
requesting user at a client 103. The website 102 may modify the
search result, e.g., attaching advertisements or other information
to the search result, and then serve the modified search result to
the requesting user at the client 103.
[0117] In this scenario, the information server 106 may have no
information identifying the requesting user and the client 103, and
may also be unable to monitor the user's activities on the search
result. For example, the information server 106 may not receive any
information identifying the document links in the search result
that have been clicked by the user. Similarly, the information
server 106 may not receive any information identifying the document
links over which the user moves his or her mouse pointer and the
corresponding mouse hovering time. In other words, the information
server 106 has very limited or no exposure to the activities of the
website users on the search results. Therefore, the information
server 106 has to rely on the user activities on search results
from other venues to generate the group profile.
[0118] In some embodiments, by examining the search queries
submitted from different websites, the information server 106 may
identify another website similar to the website in question. Two
websites are deemed similar if a predefined number or percentage of
search queries submitted from the two websites are identical. It is
also reasonable to infer that users of the two similar websites may
have similar interests and therefore the user activities associated
with one website are a reasonable proxy of the user activities
associated with the other one. If the information server 106 can
access the user activities associated with one of the two websites
(e.g., there is no agreement to deliver the search results to the
website), the information server 106 can use the same user
activities to create the group profile for the other website.
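The similarity test described above might be sketched as follows. The sketch assumes that queries are compared after lowercasing and whitespace normalization, and that the percentage is measured against the smaller of the two sets of distinct queries; the source leaves both choices open. The function names and default thresholds are hypothetical.

```python
from typing import Iterable


def normalize(query: str) -> str:
    """Lowercase a query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def sites_similar(
    queries_a: Iterable[str],
    queries_b: Iterable[str],
    min_shared: int = 2,
    min_fraction: float = 0.5,
) -> bool:
    """Deem two websites similar if they share at least min_shared distinct
    queries, or if the shared queries make up at least min_fraction of the
    smaller site's distinct queries."""
    a = {normalize(q) for q in queries_a}
    b = {normalize(q) for q in queries_b}
    if not a or not b:
        return False
    shared = len(a & b)
    return shared >= min_shared or shared / min(len(a), len(b)) >= min_fraction
```

When two sites pass this test, the user activities observed for one could stand in for the other when the group profile is built.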
[0119] When there is no other website similar to the website in
question, the information server 106 may utilize monitored user
activities associated with search queries submitted directly to the
search engine (e.g., search queries submitted using a toolbar
search box or a webpage associated with the information server 106)
as the proxy of a particular website. For instance, the search
query "golf courses in mountain view" may be submitted both to a
golf-focused website, and to a general purpose search engine.
Profile information developed from clicks on the search results of
this search query is used to generate a group profile by combining
or aggregating statistical information related to the queries
received from each respective website.
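One possible reading of this aggregation is sketched below. It assumes that a per-query profile (category to weight) has already been derived from clicks on results of queries submitted directly to the search engine, and that the website's group profile is the average of those per-query profiles weighted by how often each query arrives from the website. The weighting scheme and all names are assumptions, not details given in the source.

```python
from typing import Dict

Profile = Dict[str, float]  # hypothetical: category name -> weight


def proxy_group_profile(
    site_query_counts: Dict[str, int],
    query_profiles: Dict[str, Profile],
) -> Profile:
    """Average the per-query profiles, weighted by the website's query counts.
    Queries with no click-derived profile are skipped."""
    totals: Profile = {}
    total_count = 0
    for query, count in site_query_counts.items():
        profile = query_profiles.get(query)
        if profile is None:
            continue
        total_count += count
        for category, weight in profile.items():
            totals[category] = totals.get(category, 0.0) + count * weight
    if total_count == 0:
        return {}
    return {category: value / total_count for category, value in totals.items()}
```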
[0120] Placed content may be displayed to users of search services,
email services, and a variety of other services provided via the
Internet or other wide area networks. For example, when search
results are returned to a user in response to a search query,
certain placed content is often returned as well. Placed content is
usually in the form of advertising, but could be any type of
content related to the search query or to a document being sent to
the user. Generally, placed content is any type of content where content
providers compete or pay for placement. The techniques discussed
above for selecting and ranking information items can also be used
for selecting and/or ranking placed content to be presented to
users. In particular, in some embodiments, group profiles are used
to select advertisements or other placed content to be presented to
users along with search results. For example, different
advertisements may have different sets of key terms. A correlation
of the key terms of each advertisement in a set of advertisements
with a term-based group profile (or a category-based profile, or
both) associated with a group of users produces a boost factor
for the advertisement. This boost factor may be used to promote or
demote the particular advertisement in response to a search query
submitted by a user associated with the group. For example, when
the information server 106 receives a search query "world cup 2006"
from a member of a group that is positively weighted in the soccer
category, it may promote those advertisements covering soccer gear,
ticket sales for the 2006 FIFA World Cup Germany, and hotel
reservations in the German cities hosting the soccer games.
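As a rough sketch of the boost factor, the correlation between an advertisement's key terms and a term-based group profile can be taken as the sum of the profile weights of the matching terms. This particular correlation measure is an assumption, as are the names and the sample data; the source does not define how the correlation is computed.

```python
from typing import Dict, List, Set


def boost_factor(ad_terms: Set[str], group_profile: Dict[str, float]) -> float:
    """Sum the group profile weights of the advertisement's key terms.
    A positive result promotes the advertisement; a negative one demotes it."""
    return sum(group_profile.get(term, 0.0) for term in ad_terms)


def rank_ads(ads: Dict[str, Set[str]], group_profile: Dict[str, float]) -> List[str]:
    """Order advertisement names by boost factor, highest first."""
    return sorted(
        ads,
        key=lambda name: boost_factor(ads[name], group_profile),
        reverse=True,
    )
```

For a group weighted positively toward soccer, advertisements with soccer-related key terms would sort ahead of unrelated ones.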
[0121] The foregoing description, for purpose of explanation, has
been described with reference to specific embodiments. However, the
illustrative discussions above are not intended to be exhaustive or
to limit the invention to the precise forms disclosed. Many
modifications and variations are possible in view of the above
teachings. The embodiments were chosen and described in order to
best explain the principles of the invention and its practical
applications, to thereby enable others skilled in the art to best
utilize the invention and various embodiments with various
modifications as are suited to the particular use contemplated.
* * * * *