U.S. patent number RE41,899 [Application Number 10/388,362] was granted by the patent office on 2010-10-26 for a system for ranking the relevance of information objects accessed by computer users.
This patent grant is currently assigned to Apple Inc. The invention is credited to Jeremy J. Bornstein, Dulce B. Ponceleon, Daniel E. Rose, and Kevin Tiene.
United States Patent RE41,899
Rose, et al.
October 26, 2010
Please see images for: (Certificate of Correction)
System for ranking the relevance of information objects accessed by
computer users
Abstract
Information presented to a user via an information access system
is ranked according to a prediction of the likely degree of
relevance to the user's interests. A profile of interests is stored
for each user having access to the system. Items of information to
be presented to a user are ranked according to their likely degree
of relevance to that user and displayed in order of ranking. The
prediction of relevance is carried out by combining data pertaining
to the content of each item of information with other data
regarding correlations of interests between users. A value
indicative of the content of a document can be added to another
value which defines user correlation, to produce a ranking score
for a document. Alternatively, multiple regression analysis or
evolutionary programming can be carried out with respect to various
factors pertaining to document content and user correlation, to
generate a prediction of relevance. The user correlation data is
obtained from feedback information provided by users when they
retrieve items of information. Preferably, the user provides an
indication of interest in each document which he or she retrieves
from the system.
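The additive combination described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the equal default weights, and the toy data are hypothetical, the content value is assumed to be a cosine similarity between a user profile vector and a document vector (as in the vector-based claims), and the correlation value is assumed to be a sum over other users of the user-to-user correlation times each user's feedback on the document.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def correlation_score(i, j, R, V):
    """Assumed form: sum over users k != i of R[i][k] * V[k][j].

    R[i][k] is the correlation between users i and k;
    V[k][j] is the feedback weight of user k on document j.
    """
    return sum(R[i][k] * V[k][j] for k in range(len(V)) if k != i)

def rank_documents(i, profile, doc_vectors, R, V, w_content=0.5, w_corr=0.5):
    """Return (document index, score) pairs for user i, highest score first.

    The weights w_content and w_corr are illustrative assumptions.
    """
    scores = []
    for j, doc in enumerate(doc_vectors):
        score = w_content * cosine(profile, doc) + w_corr * correlation_score(i, j, R, V)
        scores.append((j, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy data (hypothetical): two users, two documents, two terms.
profile = [1.0, 0.0]                   # user 0 is interested in term 0
doc_vectors = [[1.0, 0.0], [0.0, 1.0]]
R = [[1.0, 0.8], [0.8, 1.0]]           # users 0 and 1 are strongly correlated
V = [[0.0, 0.0], [0.0, 1.0]]           # user 1 rated document 1 positively

ranked = rank_documents(0, profile, doc_vectors, R, V)
for doc_index, score in ranked:
    print(doc_index, round(score, 3))
```

In this toy case document 0 ranks first on content alone, while document 1 still receives a nonzero score purely from the feedback of a correlated user, which is the effect the combination is meant to capture.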
Inventors: Rose; Daniel E. (Cupertino, CA), Bornstein; Jeremy J. (San Francisco, CA), Tiene; Kevin (Cupertino, CA), Ponceleon; Dulce B. (Palo Alto, CA)
Assignee: Apple Inc. (Cupertino, CA)
Family ID: 22870143
Appl. No.: 10/388,362
Filed: March 12, 2003
Related U.S. Patent Documents
Reissue of:
Application Number: 08/231,655
Filing Date: Apr 25, 1994
Patent Number: 6,202,058
Issue Date: Mar 13, 2001
Current U.S. Class: 706/46; 706/14; 707/999.003
Current CPC Class: G06F 16/335 (20190101)
Current International Class: G06N 5/02 (20060101)
Field of Search: 706/45,46,14; 707/3
References Cited
Other References
Maltz, D., "Distributing Information for Collaborative Filtering on
Usenet Net News," May 1994, M.S. Thesis, Massachusetts Institute of
Technology, Cambridge, MA. cited by other .
Resnick, P., et al., "GroupLens: An Open Architecture for
Collaborative Filtering of Netnews," Proceedings of the 1994 ACM
Conference on Computer Supported Cooperative Work (CSCW), 1994, p.
175-186, ACM, New York, NY. cited by other .
Loeb, S., "Architecting Personalized Delivery of Multimedia
Information," Information Filtering, Communications of the ACM,
Dec. 1992, pp. 39-48, vol. 35, No. 12. cited by other .
Loeb, S., "Delivering Interactive Multimedia Documents Over
Networks," IEEE Communications Magazine, May 1992, pp. 52-59. cited
by other .
Loeb, S., et al., "Lessons from LyricTime™: A Prototype
Multimedia System," Computer Communication Review, ACM SIGCOMM,
1992, pp. 35-36. cited by other .
Loeb, S., et al., "Lessons from LyricTime™: A Prototype
Multimedia System, Extended Abstract," Bell Communications Research,
Apr. 3, 1992, pp. 106-113. cited by other .
Yan, T.W. et al., "Index Structures for Information Filtering Under
the Vector Space Model," Stanford University, Nov. 8, 1993, pp.
1-33. cited by other .
Belew, Richard K., "Adaptive Information Retrieval: Using A
Connectionist Representation To Retrieve And Learn About
Documents," 12th Int'l Conference on Research &
Development in IR (Jun. 1989), Boston, MA. cited by other .
Bookstein, Abraham, "Fuzzy Requests: An Approach To Weighted
Boolean Searches," Journal of the American Society for Information
Science (Jul. 1980), vol. 31, No. 4, pp. 240-247. cited by other
.
Bussey, Howard E. et al., "Service Architecture, Prototype
Description, And Network Implications Of A Personalized Information
Grazing Service," IEEE Infocom (1990), vol. 3, pp. 1046-1053. cited
by other .
Chang, Shih-Chio et al., "And-Less Retrieval Toward Perfect
Ranking," Proceedings of the 50th ASIS Annual Meeting (Oct.
1987), vol. 24, pp. 30-35. cited by other .
Chang, Shih-Chio et al., "Towards A Friendly Adaptable Information
Retrieval System," Proceedings of the RIAO (Mar. 1988), pp.
172-182. cited by other .
Fischer, Gerhard et al., "Information Access in Complex, Poorly
Structured Information Spaces," CHI '91 Proceedings (Apr.-May
1991), pp. 63-70. cited by other .
Goldberg, David et al., "Using Collaborative Filtering to Weave an
Information Tapestry," Communications of the Association for
Computing Machinery (Dec. 1992), vol. 35, No. 12, pp. 61-70. cited
by other .
Jacobs, Paul S. et al., "Scisor: Extracting Information From
On-Line News," Communications of the Association for Computing
Machinery (Nov. 1990), vol. 33, No. 11, pp. 88-97. cited by other
.
Jennings, Andrew et al., "A Personal News Service Based on a User
Model Neural Network," IEICE Transactions on Information and
Systems, (Mar. 1992), vol. E75-D, No. 2, pp. 198-209. cited by
other .
Jennings, Andrew et al., "Customer Adaptive Communication
Services," IEEE Region 10 International Conference, (Nov. 11-13,
1992), vol. 2, pp. 886-890. cited by other .
Kantardzic, M. et al., "Graphical Knowledge Based Electronic Mail
System," IEEE Conference (May 24, 1991), pp. 1165-1168. cited by
other .
Karlgren, Jussi, "Using Reader Data as a Basis for Measuring
Document Proximity," An Algebra for Recommendations (date unknown),
pp. 1-9. cited by other .
Malone, Thomas W. et al., "The Information Lens: An Intelligent
System for Information Sharing in Organizations," CHI '86
Proceedings (Apr. 1986), pp. 1-8, Boston, MA. cited by other .
Mukhopadhyay, Uttam, et al., "An Intelligent System For Document
Retrieval In Distributed Office Environments," Journal of the
American Society for Information Science (May 1986), vol. 37, No.
3, pp. 123-135. cited by other .
Reynolds, C.F., "On-Line Review: A New Application of the HICOM
Conferencing System," IEEE Colloquium on `Human Factors in
Electronic Mail and Conferencing Systems`, (Feb. 3, 1989), Digest
No. 20, pp. 1-4. cited by other .
Rothman, Matt, "A New Music Retailing Technology says, `Listen
Here`," The New York Times (Sunday Jul. 4, 1993), pp. F8-9. cited
by other .
Salton, Gerard et al., "Extended Boolean Information Retrieval,"
Communications of the ACM (Nov. 1983), vol. 26, No. 11, pp.
1022-1036. cited by other .
Savoy, Jacques, "Searching Information in Hypertext Systems Using
Multiple Sources of Evidence," International Journal of
Man-Machine Studies (Jun. 1993), vol. 38, No. 6, pp. 1017-1030.
cited by other .
Sheth, Beerud et al., "Evolving Agents for Personalized Information
Filtering," Proceedings of the Ninth IEEE Conference on Artificial
Intelligence for Applications (Mar. 5, 1993), pp. 345-352. cited by
other .
Spoerri, Anselm, "Visual Tools For Information Retrieval," IEEE
Conference (Aug. 27, 1993), pp. 160-168. cited by other .
Stanfill, Craig, "Massively Parallel Information Retrieval for Wide
Area Information Servers," IEEE International Conference on
Systems, Man, and Cybernetics (Oct. 13-16, 1991), vol. 1, pp.
679-682. cited by other .
Terry, Douglas B., "Replication In An Information Filtering
System," IEEE Conference (Nov. 13, 1992), pp. 66-67. cited by other
.
Wyle, M.F. et al., "A Wide Area Network Information Filter," IEEE
Conference (Oct. 11, 1991), pp. 10-15. cited by other .
"Announcement of Bellcore Video Rating System," (Nov. 1, 1993).
cited by other .
Scisor: Extracting Information From On-Line News by Jacobs P.S. et
al. Communications of the Association for Computing Machinery, pp.
88-97, Mar. 5, 1993. cited by other .
"Announcement of Bellcore Video Rating System". cited by other
.
Goldberg, David et al, "Using Collaborative Filtering to Weave an
Information Tapestry", Communications of the ACM, Dec. 1991, vol.
35, No. 12, pp. 61-70. cited by other .
Stanfill, "Massively Parallel Information Retrieval for Wide Area
Information Servers", IEEE, Aug. 1991, pp. 679-682. cited by other
.
B. Sheth et al., "Evolving Agents for Personalized Information
Filtering", Proceedings of the Ninth IEEE Conference on Artificial
Intelligence for Applications, CAIA '93, Orlando, Florida, Mar.
'93. cited by other .
Stanfill, Craig, "Massively Parallel Information Retrieval for Wide
Area Information Servers", IEEE, Aug. 1991, pp. 679-682. cited by
other .
Graphical Knowledge Based Electronic Mail System by Kantardzic, M.
et al., IEEE conference paper. pp. 1165-1168, May 24, 1991. cited
by examiner.
Primary Examiner: Sparks; Donald
Assistant Examiner: Fernandez Rivas; Omar F
Attorney, Agent or Firm: Fenwick & West LLP
Parent Case Text
.Iadd.More than one reissue application has been filed for the
reissue of U.S. Pat. No. 6,202,058: the reissue applications are
(i) application Ser. No. 10/388,362 (the present application) filed
on Mar. 12, 2003, (ii) application Ser. No. 11/499,819 (now
abandoned) filed on Aug. 3, 2006, which is a divisional reissue
application of application Ser. No. 10/388,362, and (iii)
application Ser. No. 11/499,820 (now abandoned) filed on Aug. 3,
2006, which is also a divisional reissue application of application
Ser. No. 10/388,362..Iaddend.
Claims
What is claimed:
1. In a computerized information access system, a method for
presenting items of information to users, comprising the steps of:
a) storing user profiles for users having access to the system,
where each user profile is based, at least in part, on the
attributes of information the user finds to be of interest; b)
determining an attribute-based relevance factor for an item of
information which is indicative of the degree to which an attribute
of that item of information matches the profile for a particular
user; c) determining a measure of correlation between the
particular user's interests and those of other users who have
accessed said item of information; d) combining said relevance
factor and said degree of correlation to produce a ranking score
for said item of information; e) repeating steps b, c and d for
each item of information to be presented to said particular user;
and f) displaying the items of information to the user in
accordance with their ranking scores.
2. The method of claim 1, wherein said combining step comprises a
regression analysis of attribute-based and correlation-based
factors for each item of information.
3. The method of claim 1 wherein said combining step comprises
forming a weighted sum of said relevance factor and said degree of
correlation.
4. The method of claim 1, wherein said ranking score is also
related to a date associated with each item of information.
5. The method of claim 1 wherein said step of determining said
degree of correlation includes the steps of obtaining feedback
information from users regarding each user's interest in particular
items of information when each such item is accessed by a user, and
recording said feedback information.
6. The method of claim 5 further including the step of generating a
correlation matrix which indicates the degree of correlation
between respective users based upon commonly accessed items of
information.
7. The method of claim 1 wherein said attribute is the contents of
the item of information.
8. The method of claim 1 wherein said items of information are
displayed in order of their relative rankings to thereby provide
said indication.
9. The method of claim 1 wherein said relevance factor and said
degree of correlation are combined by means of evolutionary
programming techniques to generate a formula that is used to
produce a ranking score for an item of information.
10. The method of claim 9 wherein said evolutionary programming
technique comprises genetic programming.
11. The method of claim 9 wherein said evolutionary programming
technique comprises genetic algorithms.
12. The method of claim 1 wherein said information access system is
an electronic mail system, and said method is employed to filter
messages provided to subscribers of said system.
13. The method of claim 1 wherein said information access system is
an electronic bulletin board system, and said method is employed to
rank items of information in a topic category selected by a
user.
14. A computer-based information access system, comprising: a first
database containing items of information to be provided to users of
said system; means for enabling users to indicate their degree of
interest in particular items of information stored in said first
database; means for determining the correlation between the
indicated interests of respective users and for storing information
related thereto; and means for predicting a given user's likely
degree of interest in a particular item of information on the basis
of said information relating to the determined correlation and at
least one attribute of the item of information.
15. The information access system of claim 14 further including a
user interface for displaying plural items of information with an
indication of their relative predictions regarding likely degree of
interest for a given user.
16. The information access system of claim 14 wherein said
attribute is the contents of the item of information.
17. The information access system of claim 14 further including a
second database containing at least one profile of interests for
each of a number of users of said system, and wherein said
prediction is based on a combination of (i) the relationship of
said attribute to the profile for said given user and (ii) the
correlation between indications provided by the given user and
other users who have had access to said item of information.
18. The information access system of claim 17 wherein each user
profile comprises a vector and said attribute defines a vector for
the item of information, and wherein said relationship is
determined in accordance with the similarities between the vector
for the item of information and the user profile vector.
19. The information access system of claim 14 wherein said
prediction is based on a regression analysis of data related to
said attribute and stored correlation information pertaining to
said given user.
20. The information access system of claim 14 wherein said
prediction is determined by means of evolutionary programming
techniques.
21. The information access system of claim 20 wherein the
evolutionary programming techniques produce a formula which
establishes a combination of attribute-based and correlation-based
factors that determine said prediction.
22. The information access system of claim 20 wherein said
evolutionary programming techniques comprise genetic
programming.
23. The information access system of claim 20 wherein said
evolutionary programming techniques comprise genetic
algorithms.
24. The system of claim 14, wherein said information access system
comprises an electronic mail system.
25. The system of claim 14, wherein said information access system
comprises an electronic bulletin board system.
26. The system of claim 14, wherein said information access system
comprises an electronic search and retrieval system.
27. The method of claim 1 wherein the items of information are
displayed with an indication of their ranking scores.
28. A method for displaying items of information to users,
comprising the steps of: determining a relevance factor for an item
of information, based upon an attribute of the item of information;
defining a relationship between the interests of a given user and
those of other users; determining a correlation factor for the item
of information, based upon said defined relationship; combining
said relevance factor and said correlation factor to produce a
ranking score for the item of information; and displaying the item
of information to the given user in accordance with its ranking
score.
29. The method of claim 28 further including the steps of
determining a ranking score for multiple items of information, and
displaying the items of information in accordance with their
ranking scores.
30. The method of claim 28 wherein the item of information is
displayed with an indication of its ranking score.
.Iadd.31. A method of presenting documents from a document
collection to a user, the method comprising: storing a user profile
vector for the user, the user profile vector in a vector space
derived from terms contained in the document collection and
including a plurality of weights, each weight associated with a
term in the document collection; selecting a plurality of documents
from the document collection, each document associated with a
document vector in the term vector space; for each selected
document: determining a relevance score, the relevance score based
on a relationship between the user profile vector and the document
vector associated with the selected document; determining a
correlation score between the user and other users corresponding to
the selected document; and combining the relevance score and the
correlation score to determine a final ranking score for the
selected document; and presenting the selected documents to the
user according to the final ranking scores..Iaddend.
.Iadd.32. The method of claim 31, wherein determining a correlation
score comprises: storing information relating to users' interest in
the documents in the document collection; storing information
relating to the degree of correlation between the users' interest
in documents; generating the correlation score based upon the
information relating to the users' interest and the information
relating to the degree of correlation..Iaddend.
.Iadd.33. The method of claim 32, wherein: the information relating
to the users' interests in the documents is stored in a user
interest matrix indicating the users' interests in particular
documents; the degree of correlation between the users' interest is
stored in a correlation matrix indicating the degree of correlation
between the users' interest in the documents; and the correlation
score is generated based upon the user interest matrix and the
correlation matrix..Iaddend.
.Iadd.34. The method of claim 32, wherein: storing information
relating to the users' interest comprises generating a user
interest matrix V where each entry $V_{kj}$ is the weight
indicating the feedback of user k on document j; storing
information relating to the degree of correlation comprises
generating a correlation matrix R where each entry $R_{ik}$ is a
measure of the degree of correlation between users i and k; and
generating the correlation score comprises calculating a prediction
score $P_{ij}$ indicating a likelihood of user i's interest in
document j by carrying out an operation,
$$P_{ij} = \sum_{k \neq i} R_{ik} V_{kj}$$ .Iaddend.
.Iadd.35. The method of claim 31, wherein the relationship between
the user profile vector and the document vector is a cosine of an
angle between the document vector and the user profile
vector..Iaddend.
.Iadd.36. The method of claim 31, wherein the relationship between
the user profile vector and the document vector is based on the
similarity between the user profile vector and the document
vector..Iaddend.
.Iadd.37. A computer program product for presenting documents from
a document collection to a user, the computer program product
stored on a computer readable medium and adapted to perform a
method comprising: storing a user profile vector for the user, the
user profile vector in a vector space derived from terms contained
in the document collection and including a plurality of weights,
each weight associated with a term in the document collection;
selecting a plurality of documents from the document collection,
each document associated with a document vector in the term vector
space; for each selected document: determining a relevance score,
the relevance score based on a relationship between the user
profile vector and the document vector associated with the selected
document; determining a correlation score between the user and
other users corresponding to the selected document; and combining
the relevance score and the correlation score to determine a final
ranking score for the selected document; and presenting the
selected documents to the user according to the final ranking
scores..Iaddend.
.Iadd.38. The computer program product of claim 37, wherein
determining a correlation score comprises: storing information
relating to users' interest in the documents in the document
collection; storing information relating to the degree of
correlation between the users' interest in documents; generating
the correlation score based upon the information relating to the
users' interest and the information relating to the degree of
correlation..Iaddend.
.Iadd.39. The computer program product of claim 38, wherein: the
information relating to the users' interests in the documents is
stored in a user interest matrix indicating the users' interests in
particular documents; the degree of correlation between the users'
interest is stored in a correlation matrix indicating the degree of
correlation between the users' interest in the documents; and the
correlation score is generated based upon the user interest matrix
and the correlation matrix..Iaddend.
.Iadd.40. The computer program product of claim 38, wherein:
storing information relating to the users' interest comprises
generating a user interest matrix V where each entry $V_{kj}$ is
the weight indicating the feedback of user k on document j; storing
information relating to the degree of correlation comprises
generating a correlation matrix R where each entry $R_{ik}$ is a
measure of the degree of correlation between users i and k; and
generating the correlation score comprises calculating a prediction
score $P_{ij}$ indicating a likelihood of user i's interest in
document j by carrying out an operation,
$$P_{ij} = \sum_{k \neq i} R_{ik} V_{kj}$$ .Iaddend.
.Iadd.41. The computer program product of claim 37, wherein the
relationship between the user profile vector and the document
vector is a cosine of an angle between the document vector and the
user profile vector..Iaddend.
.Iadd.42. The computer program product of claim 37, wherein the
relationship between the user profile vector and the document
vector is based on the similarity between the user profile vector
and the document vector..Iaddend.
.Iadd.43. A system for presenting documents to a user, the
documents each associated with a document vector in a vector space
and stored in a document database coupled to the system, the system
comprising: a user database storing a user profile vector for the
user, the user profile vector in the vector space derived from
terms contained in the document database and including a plurality
of weights, each weight associated with a term in the document
collection; and a server coupled to the user database and the
document database for selecting documents from the document
database, wherein the server: determines, for each selected
document, a relevance score, the relevance score based on a
relationship between the user profile vector and the document
vector associated with the selected document; determines, for each
selected document, a correlation score between the user and other
users corresponding to the selected document; combines, for each
selected document, the relevance score and the correlation score to
determine a final ranking score for the selected document; and
presents the selected documents to the user according to the final
ranking scores..Iaddend.
.Iadd.44. The system of claim 43, wherein the server determines the
correlation score by: storing information relating to users'
interest in the documents in the document collection; storing
information relating to the degree of correlation between the
users' interest in documents; generating the correlation score
based upon the information relating to the users' interest and the
information relating to the degree of correlation..Iaddend.
.Iadd.45. The system of claim 44, wherein: the information relating
to the users' interests in the documents is stored in a user
interest matrix indicating the users' interests in particular
documents; the degree of correlation between the users' interest is
stored in a correlation matrix indicating the degree of correlation
between the users' interest in the documents; and the server
generates the correlation score based upon the user interest matrix
and the correlation matrix..Iaddend.
.Iadd.46. The system of claim 44, wherein: the information relating
to the users' interest is stored in a user interest matrix V where
each entry $V_{kj}$ is the weight indicating the feedback of user k
on document j; the information relating to the degree of
correlation is stored in a correlation matrix R where each entry
$R_{ik}$ is a measure of the degree of correlation between users i
and k; and the server generates the correlation score by
calculating a prediction score $P_{ij}$ indicating a likelihood of
user i's interest in document j by carrying out an operation,
$$P_{ij} = \sum_{k \neq i} R_{ik} V_{kj}$$ .Iaddend.
.Iadd.47. The system of claim 43, wherein the relationship between
the user profile vector and the document vector is a cosine of an
angle between the document vector and the user profile
vector..Iaddend.
.Iadd.48. The system of claim 43, wherein the relationship between
the user profile vector and the document vector is based on the
similarity between the user profile vector and the document
vector..Iaddend.
.Iadd.49. A method of presenting information items from an
information item collection to a user, the method comprising:
storing a user profile vector for the user, the user profile vector
in a vector space derived from attributes in the information item
collection and including a plurality of weights, each weight
associated with an attribute in the information item collection;
selecting a plurality of information items from the information
item collection, each information item associated with an
information item vector in the attribute vector space; for each
selected information item: determining a relevance score, the
relevance score based on a relationship between the user profile
vector and the information item vector associated with the selected
information item; determining a correlation score between the user
and other users corresponding to the selected information item; and
combining the relevance score and the correlation score to
determine a final ranking score for the selected information item;
and presenting the selected information items to the user according
to the final ranking scores..Iaddend.
.Iadd.50. The method of claim 49, wherein determining a correlation
score comprises: storing information relating to users' interest in
the information items in the information item collection; storing
information relating to the degree of correlation between the
users' interest in information items; generating the correlation
score based upon the information relating to the users' interest
and the information relating to the degree of
correlation..Iaddend.
.Iadd.51. The method of claim 50, wherein: the information relating
to the users' interests in the information items is stored in a
user interest matrix indicating the users' interests in particular
information items; the degree of correlation between the users'
interest is stored in a correlation matrix indicating the degree of
correlation between the users' interest in the information items;
and the correlation score is generated based upon the user interest
matrix and the correlation matrix..Iaddend.
.Iadd.52. The method of claim 50, wherein: storing information
relating to the users' interest comprises generating a user
interest matrix V where each entry $V_{kj}$ is the weight
indicating the feedback of user k on information item j; storing
information relating to the degree of correlation comprises
generating a correlation matrix R where each entry $R_{ik}$ is a
measure of the degree of correlation between users i and k; and
generating the correlation score comprises calculating a prediction
score $P_{ij}$ indicating a likelihood of user i's interest in
information item j by carrying out an operation,
$$P_{ij} = \sum_{k \neq i} R_{ik} V_{kj}$$ .Iaddend.
.Iadd.53. The method of claim 49, wherein the relationship between
the user profile vector and the document vector is a cosine of an
angle between the document vector and the user profile
vector..Iaddend.
.Iadd.54. The method of claim 49, wherein the relationship between
the user profile vector and the document vector is the distance
between the user profile vector and the document
vector..Iaddend.
.Iadd.55. A computer program product for presenting information
items from an information item collection to a user, the computer
program product stored on a computer readable medium and adapted to
perform a method comprising: storing a user profile vector for the
user, the user profile vector in a vector space derived from
attributes contained in the information item collection and
including a plurality of weights, each weight associated with an
attribute in the information item collection; selecting a plurality
of information items from the information item collection, each
information item associated with an information item vector in the
attribute vector space; for each selected information item:
determining a relevance score, the relevance score based on a
relationship between the user profile vector and the information
item vector associated with the selected information item;
determining a correlation score between the user and other users
corresponding to the selected information item; and combining the
relevance score and the correlation score to determine a final
ranking score for the selected information item; and presenting the
selected information items to the user according to the final
ranking scores..Iaddend.
.Iadd.56. The computer program product of claim 55, wherein
determining a correlation score comprises: storing information
relating to users' interest in the information items in the
information item collection; storing information relating to the
degree of correlation between the users' interest in information
items; generating the correlation score based upon the information
relating to the users' interest and the information relating to the
degree of correlation..Iaddend.
.Iadd.57. The computer program product of claim 56, wherein: the
information relating to the users' interests in the information
items is stored in a user interest matrix indicating the users'
interests in particular information items; the degree of
correlation between the users' interest is stored in a correlation
matrix indicating the degree of correlation between the users'
interest in the information items; and the correlation score is
generated based upon the user interest matrix and the correlation
matrix..Iaddend.
.Iadd.58. The computer program product of claim 56, wherein:
storing information relating to the users' interest comprises
generating a user interest matrix V where each entry $V_{kj}$ is
the weight indicating the feedback of user k on information item j;
storing information relating to the degree of correlation comprises
generating a correlation matrix R where each entry $R_{ik}$ is a
measure of the degree of correlation between users i and k; and
generating the correlation score comprises calculating a prediction
score $P_{ij}$ indicating a likelihood of user i's interest in
information item j by carrying out an operation,
$$P_{ij} = \sum_{k \neq i} R_{ik} V_{kj}$$ .Iaddend.
.Iadd.59. The computer program product of claim 55, wherein the
relationship between the user profile vector and the document
vector is a cosine of an angle between the document vector and the
user profile vector..Iaddend.
.Iadd.60. The computer program product of claim 55, wherein the
relationship between the user profile vector and the document
vector is based on the similarity between the user profile vector
and the document vector..Iaddend.
.Iadd.61. A system for presenting information items to a user, the
information items each associated with an information item vector
in the attribute vector space and stored in an information item
database coupled to the system, the system comprising: a user
database storing a user profile vector for the user, the user
profile vector in a vector space derived from attributes contained
in the information item database and including a plurality of
weights, each weight associated with an attribute in the
information item collection; and a server coupled to the user
database and the information item database for selecting
information items from the information item database, wherein the
server: determines, for each selected information item, a relevance
score, the relevance score based on a relationship between the user
profile vector and the information item vector associated with the
selected information item; determines, for each selected
information item, a correlation score between the user and other
users corresponding to the selected information item; combines, for
each selected information item, the relevance score and the
correlation score to determine a final ranking score for the
selected information item; and presents the selected information
items to the user according to the final ranking
scores..Iaddend.
.Iadd.62. The system of claim 61, wherein the server determines the
correlation score by: storing information relating to users'
interest in the information items in the information item
collection; storing information relating to the degree of
correlation between the users' interest in information items;
generating the correlation score based upon the information
relating to the users' interest and the information relating to the
degree of correlation..Iaddend.
.Iadd.63. The system of claim 62, wherein: the information relating
to the users' interests in the information items is stored in a
user interest matrix indicating the users' interests in particular
information items; the degree of correlation between the users'
interest is stored in a correlation matrix indicating the degree of
correlation between the users' interest in the information items;
and the server generates the correlation score based upon the user
interest matrix and the correlation matrix..Iaddend.
.Iadd.64. The system of claim 62, wherein: the information relating
to the users' interest is stored in a user interest matrix V where
each entry V.sub.kj is the weight indicating the feedback of user k
on information item j; the information relating to the degree of
correlation is stored in a correlation matrix R where each entry
R.sub.ik is a measure of the degree of correlation between users i
and k; and the server generates the correlation score by
calculating a prediction score P.sub.ij indicating a likelihood of
user i's interest in information item j by carrying out an
operation, $P_{ij} = \sum_{k \neq i} R_{ik} V_{kj}$ .Iaddend.
.Iadd.65. The server of claim 61, wherein the relationship between
the user profile vector and the document vector is a cosine of an
angle between the document vector and the user profile
vector..Iaddend.
.Iadd.66. The server of claim 61, wherein the relationship between
the user profile vector and the document vector is based on the
similarity between the user profile vector and the document
vector..Iaddend.
.Iadd.67. A method of presenting documents from a document
collection to a user, the method comprising: storing a user profile
for the user, the user profile including terms contained in the
document collection and weights respectively associated with the
terms; selecting a plurality of documents from the document
collection, each document associated with a document profile, the
document profile including terms contained in its associated
document; for each selected document: determining a relevance
score, the relevance score based on a relationship between the user
profile and the document profile associated with the selected
document; determining a correlation score between the user and
other users corresponding to the selected document; and combining
the relevance score and the correlation score to determine a final
ranking score for the selected document; and presenting the
selected documents to the user according to the final ranking
scores..Iaddend.
.Iadd.68. The method of claim 67, wherein the final ranking score
comprises a recommendation score..Iaddend.
.Iadd.69. The method of claim 68, wherein the recommendation score
comprises a movie recommendation score..Iaddend.
.Iadd.70. A method comprising: storing a user profile for a user,
the user profile including terms contained in a document collection
and weights respectively associated with the terms; selecting a
plurality of documents from the document collection, each document
associated with a document profile, the document profile including
terms contained in its associated document; for each selected
document: determining a relevance score, the relevance score based
on a relationship between the user profile and the document profile
associated with the selected document; determining a correlation
score between the user and other users corresponding to the
selected document; and combining the relevance score and the
correlation score to determine a final ranking score for the
selected document; and presenting one or more recommendations to
the user based on the final ranking scores..Iaddend.
.Iadd.71. The method of claim 70, wherein the recommendations
comprise movie recommendations..Iaddend.
.Iadd.72. A method of presenting documents received from a document
collection to a user, the method comprising: retrieving a user
profile vector associated with the user, the user profile vector in
a vector space derived from terms in the document collection;
receiving a plurality of documents from the document collection,
each document having a document vector in the vector space; for
each received document: determining a relevance score for the
document by a vector operation comparing the user profile vector
and the document vector; and determining a correlation score
between the user and other users corresponding to the document; and
ranking the received documents based on a combination of each
received document's relevance score and correlation score for
presentation to the user..Iaddend.
.Iadd.73. The method of claim 72, wherein the vector space is
defined by a set of terms selected from the terms in the document
collection, each user profile vector and each document vector
includes a plurality of vector components, each vector component
corresponding to a weight of one of the terms..Iaddend.
.Iadd.74. The method of claim 72, wherein the vector operation is
the determination of a cosine of an angle between the document
vector and the user profile vector..Iaddend.
.Iadd.75. The method of claim 72, wherein the vector operation is a
geometric operation determining a distance between the user profile
vector and the document vector..Iaddend.
.Iadd.76. The method of claim 72, wherein each user profile vector
and each document vector comprises a plurality of weights, each
weight associated with a term..Iaddend.
.Iadd.77. The method of claim 72, wherein each user profile vector
comprises a plurality of user profile vector weights derived from
the user's interest in documents and each document vector comprises
a plurality of document vector weights indicating the frequency of
occurrence of the terms associated with the document vector weights
in the document..Iaddend.
.Iadd.78. The method of claim 72, further comprising: receiving a
user rating of a document; responsive to a positive user rating,
modifying the user profile vector of the user so that the user
profile vector is more similar to the document vector of the user
rated document; and responsive to a negative user rating, modifying
the user profile vector of the user so that the user profile vector
is less similar to the document vector of the user rated
document..Iaddend.
.Iadd.79. The method of claim 72, further comprising: receiving a
user rating of a document; and modifying the user profile vector as
a function of the user rating and the document vector of the user
rated document..Iaddend.
.Iadd.80. The method of claim 72, further comprising: receiving a
user rating of a document indicating a user interest in the user
rated document; and modifying the user profile vector by
determining which terms of the user rated document are significant
and increasing the weights corresponding to the significant terms
in the user profile vector..Iaddend.
.Iadd.81. The method of claim 72, wherein the document collection
includes a first document database and a second document database
separate from the first document database, and the user profile
vector associated with the user comprises a first user profile
vector and a second user profile vector, the first and second user
profile vectors corresponding to the first and second document
databases, respectively, the method further comprising: updating
the first user profile vector in response to a user rating of a
document from the first document database; and updating the second
user profile vector in response to a user rating of a document from
the second document database..Iaddend.
.Iadd.82. A computer program product for presenting documents
received from a document collection to a user, the computer program
product stored on a computer readable medium and configured to
perform a method comprising: retrieving a user profile vector
associated with the user, the user profile vector in a vector space
derived from terms in the document collection; receiving a
plurality of documents from the document collection, each document
having a document vector in the vector space; for each received
document: determining a relevance score for the document by a
vector operation comparing the user profile vector and the document
vector; and determining a correlation score between the user and
other users corresponding to the document; and ranking the received
documents based on a combination of each received document's
relevance score and correlation score for presentation to the
user..Iaddend.
.Iadd.83. The computer program product of claim 82, wherein the
vector space is defined by a set of terms selected from the terms
in the document collection, each user profile vector and each
document vector includes a plurality of vector components, each
vector component corresponding to a weight of one of the
terms..Iaddend.
.Iadd.84. The computer program product of claim 82, wherein the
vector operation is the determination of a cosine of an angle
between the document vector and the user profile
vector..Iaddend.
.Iadd.85. The computer program product of claim 82, wherein the
vector operation is a geometric operation determining a distance
between the user profile vector and the document
vector..Iaddend.
.Iadd.86. The computer program product of claim 82, wherein each
user profile vector and each document vector comprises a plurality
of weights, each weight associated with a term..Iaddend.
.Iadd.87. The computer program product of claim 82, wherein each
user profile vector comprises a plurality of user profile vector
weights derived from the user's interest in documents and each
document vector comprises a plurality of document vector weights
indicating the frequency of occurrence of the terms associated with
the document vector weights in the document..Iaddend.
.Iadd.88. The computer program product of claim 82, the method
further comprising: receiving a user rating of a document;
responsive to a positive user rating, modifying the user profile
vector of the user so that the user profile vector is more similar
to the document vector of the user rated document; and responsive
to a negative user rating, modifying the user profile vector of the
user so that the user profile vector is less similar to the
document vector of the user rated document..Iaddend.
.Iadd.89. The computer program product of claim 82, the method
further comprising: receiving a user rating of a document; and
modifying the user profile vector as a function of the user rating
and the document vector of the user rated document..Iaddend.
.Iadd.90. The computer program product of claim 82, the method
further comprising: receiving a user rating of a document
indicating a user interest in the user rated document; and
modifying the user profile vector by determining which terms of the
user rated document are significant and increasing the weights
corresponding to the significant terms in the user profile
vector..Iaddend.
.Iadd.91. The computer program product of claim 82, wherein the
document collection includes a first document database and a second
document database separate from the first document database, and
the user profile vector associated with the user comprises a first
user profile vector and a second user profile vector, the first and
second user profile vectors corresponding to the first and second
document databases, respectively, the method further comprising:
updating the first user profile vector in response to a user rating
of a document from the first document database; and updating the
second user profile vector in response to a user rating of a
document from the second document database..Iaddend.
.Iadd.92. A system for presenting documents to a user, the
documents each having a document vector in a vector space and
stored in a document database coupled to the system, the system
comprising: a user database storing a user profile vector
associated with the user, the user profile vector in the vector
space derived from terms in the document database; a server coupled
to the document database and the user database, the server
receiving documents from the document database and determining a
relevance score for each of the received documents by a vector
operation comparing the user profile vector and the document vector
and determining a correlation score for each of the received
documents between the user and other users corresponding to the
document and ranking the received documents based on a combination
of each received document's relevance score and correlation score
for presentation to the user..Iaddend.
.Iadd.93. The system of claim 92, wherein the vector space is
defined by a set of terms selected from the terms in the document
database, each user profile vector and each document vector
includes a plurality of vector components, each vector component
corresponding to a weight of one of the terms..Iaddend.
.Iadd.94. The system of claim 92, wherein the vector operation is
the determination of a cosine of an angle between the document
vector and the user profile vector..Iaddend.
.Iadd.95. The system of claim 92, wherein the vector operation is a
geometric operation determining a distance between the user profile
vector and the document vector..Iaddend.
.Iadd.96. The system of claim 92, wherein each user profile vector
and each document vector comprises a plurality of weights, each
weight associated with a term..Iaddend.
.Iadd.97. The system of claim 92, wherein each user profile vector
comprises a plurality of user profile vector weights derived from
the user's interest in documents and each document vector comprises
a plurality of document vector weights indicating the frequency of
occurrence of the terms associated with the document vector weights
in the document..Iaddend.
.Iadd.98. The system of claim 92, wherein the server receives a
user rating of a document, and: responsive to a positive user rating,
modifies the user profile vector of the user so that the user
profile vector is more similar to the document vector of the user
rated document; and responsive to a negative user rating, modifies
the user profile vector of the user so that the user profile vector
is less similar to the document vector of the user rated
document..Iaddend.
.Iadd.99. The system of claim 92, wherein the server receives a
user rating of a document and modifies the user profile vector as a
function of the user rating and the document vector of the user
rated document..Iaddend.
.Iadd.100. The system of claim 92, wherein the server receives a
user rating of a document indicating a user interest in the user
rated document and modifies the user profile vector by determining
which terms of the user rated document are significant and
increasing the weights corresponding to the significant terms in
the user profile vector..Iaddend.
.Iadd.101. The system of claim 92, wherein the document database
includes a first document database and a second document database
separate from the first document database, and the user profile
vector associated with the user comprises a first user profile
vector and a second user profile vector, the first and second user
profile vectors corresponding to the first and second document
databases, respectively, and the server: updates the first user
profile vector in response to a user rating of a document from the
first document database; and updates the second user profile vector
in response to a user rating of a document from the second document
database..Iaddend.
.Iadd.102. A method of presenting information items from an
information item collection to a user, the method comprising:
accessing a user profile associated with the user; for each
information item in the information item collection: determining a
relevance score for the information item based on a relationship
between the user profile and the information item; and determining
a correlation score between the user and other users corresponding
to the information item; and ranking the information items based on
a combination of each information item's relevance score and
correlation score for presentation to the user..Iaddend.
.Iadd.103. A computer program product for presenting information
items from an information item collection to a user, the computer
program product stored on a computer readable medium and configured
to perform a method comprising: accessing a user profile associated
with the user; for each information item in the information item
collection: determining a relevance score for the information item
based on a relationship between the user profile and the
information item; and determining a correlation score between the
user and other users corresponding to the information item; and
ranking the information items based on a combination of each
information item's relevance score and correlation score for
presentation to the user..Iaddend.
.Iadd.104. A system for presenting information items to a user, the
information items stored in an information item database coupled to
the system, the system comprising: a user database storing a user
profile associated with the user; a server coupled to the
information item database and the user database, the server
identifying information items from the information item database
and determining a relevance score for each of the identified
information items based on a relationship between the user profile
and the information item and determining a correlation score for
each of the identified information items between the user and other
users corresponding to the information item and ranking the
identified information items based on a combination of each
identified information item's relevance score and correlation score
for presentation to the user..Iaddend.
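As an informal illustration of the matrix operation recited in claims 58 and 64, and of the rating-driven profile update recited in claims 78, 88 and 98, the following Python sketch may be helpful. It is not part of the claims. The function names, the list-of-lists matrix layout, the `rate` parameter and the additive update rule are all assumptions chosen for illustration; the claims state only that the profile becomes more or less similar to the rated document.

```python
from typing import List

Matrix = List[List[float]]


def prediction_score(i: int, j: int, R: Matrix, V: Matrix) -> float:
    """Compute P_ij as the sum over all users k != i of R[i][k] * V[k][j].

    R[i][k] is the correlation between users i and k.
    V[k][j] is the feedback weight of user k on information item j.
    """
    return sum(R[i][k] * V[k][j] for k in range(len(V)) if k != i)


def update_profile(profile: List[float], doc_vector: List[float],
                   rating: float, rate: float = 0.5) -> List[float]:
    """Hypothetical update rule (an assumption, not taken from the claims).

    A positive rating adds a fraction of the document vector to the profile,
    making the profile more similar to the document; a negative rating
    subtracts it, making the profile less similar.
    """
    return [p + rate * rating * d for p, d in zip(profile, doc_vector)]


if __name__ == "__main__":
    # Three users, two items; the values are made up for the example.
    R = [[1.0, 0.5, -0.2],
         [0.5, 1.0, 0.0],
         [-0.2, 0.0, 1.0]]
    V = [[1.0, 0.0],
         [1.0, -1.0],
         [0.0, 1.0]]
    print(prediction_score(0, 0, R, V))
    print(update_profile([1.0, 0.0], [0.0, 1.0], rating=1.0))
```

Excluding k = i keeps a user's own feedback from contributing to that user's prediction, matching the k ≠ i condition in the recited operation.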
Description
FIELD OF THE INVENTION
The present invention is directed to information access in
multiuser computer systems, and more particularly to a system for
ranking the relevance of information that is accessed via a
computer.
BACKGROUND OF THE INVENTION
The use of computers to obtain and/or exchange information is
becoming quite widespread. Currently, there are three prevalent
types of systems that can be employed to distribute information via
computers. One of these systems comprises electronic mail, also
known as e-mail, in which a user receives messages, such as
documents, that have been specifically sent to his or her
electronic mailbox. Typically, to receive the documents, no
explicit action is required on the user's part, except to access
the mailbox itself. In most systems, the user is informed whenever
new messages have been sent to his or her mailbox, enabling them to
be read in a timely fashion.
Another medium that is used to distribute information is an
electronic bulletin board system. In such a system, users can post
documents or files to directories corresponding to specific topics,
where they can be viewed by other users who need not be explicitly
designated. In order to view the documents, the other users must
actively select and open the directories containing topics of
interest. Articles and other items of information posted to
bulletin board systems typically expire after some time period, and
are then deleted.
The third form of information exchange is by means of text
retrieval from static databases, which are typically accessed
through dial-up services. A group of users, or a service bureau,
can place documents of common interest on a file server. Using a
text searching tool, individual users can locate documents matching
a specific topical query. Some services of this type enable users
to search personal databases, as well as databases of other
users.
As the use of these types of systems becomes ever more common, the
amount of information presented to users can reach the point of
becoming unmanageable. For example, users of electronic mail
services are increasingly finding that they receive more mail than
they can usefully handle. Part of this problem is due to the fact
that junk mail of no particular interest is regularly sent in bulk
to lists of user accounts. In order to view messages of interest,
the user may be required to sift through a large volume of
undesirable mail.
Similarly, in bulletin board systems, the number of documents in a
particular topical category at any given time can be quite
significant. The user must try to identify documents of interest on
the basis of cryptic titles. As a result, an opportunity to view
documents that are critically relevant may be missed if the user
cannot take the time to view all documents in the category.
Along similar lines, in a text retrieval system, a broadly framed
query can result in the identification of a large number of
documents for the user to view. In an effort to reduce the number
of documents, the user may modify the query to narrow its scope. In
doing so, however, documents of interest may be eliminated because
they do not exactly match the modified query.
In the past, some information access systems, particularly e-mail
systems, have provided the user with the ability to have incoming
information filtered, so that only items of interest would be
presented to the user. The filtering was carried out on the basis
of objective criteria specified by the user. Any messages not
meeting the filtering criteria would be blocked. There is always
the danger in such an objective approach that potentially relevant
items of information can be missed. It is desirable, therefore, to
employ a system for predicting the likely relevance of items of
information to a particular user, so that the items of interest can
be ranked and the need to deal with large amounts of irrelevant
information can be avoided.
Some types of relevance predictors have already been proposed. For
example, the contents of a document can be examined to make a
determination as to whether a user might find that document to be
of interest, based on user-supplied information. While approaches
of this type have some utility, they are limited because the
prediction of relevance is made only on the basis of one attribute,
e.g., word content. It is desirable to improve upon existing
relevance predicting techniques, and provide a system which takes
into account a variety of attributes that are relevant to a user's
likely interest in a particular item of information. In this
regard, it is particularly desirable to provide an information
relevance predicting technique which utilizes community feedback as
one of the factors in the prediction.
SUMMARY OF THE INVENTION
In accordance with the present invention, information to be
presented to a user via an information access system is ranked
according to a prediction of the likely degree of relevance to the
user's interests. A profile of interests is stored for each user
having access to the system. Using this profile, items of
information to be presented to the user, e.g., messages in an
electronic mail network or documents within a particular bulletin
board category, are ranked according to their likely degree of
relevance and displayed with an indication of their relative
ranking. For example, they can be displayed in order of rank.
The prediction of relevance is carried out by combining data
pertaining to one or more attributes of each item of information
with other data regarding correlations of interests between users.
For example, a value indicative of the content of a document can be
added to another value which defines user correlation, to produce a
ranking score for a document. Other information evaluation
techniques, such as multiple regression analysis or evolutionary
programming, can alternatively be employed to evaluate various
factors pertaining to document content and user correlation, and
thereby generate a prediction of relevance.
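The additive combination mentioned above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the content value is taken to be the cosine between a user profile vector and a document vector (one of the relationships named in the claims), the correlation values are supplied as a precomputed dictionary, and the names `cosine`, `rank_documents`, `doc_vectors` and `correlation_scores` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(profile: List[float],
                   doc_vectors: Dict[str, List[float]],
                   correlation_scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Add the content value to the user-correlation value for each document
    and return (document id, ranking score) pairs, highest score first."""
    scores = {
        doc_id: cosine(profile, vec) + correlation_scores.get(doc_id, 0.0)
        for doc_id, vec in doc_vectors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
    # Document "b" has no content overlap with the profile, but other users
    # with correlated interests liked it, which lifts it in the ranking.
    print(rank_documents([1.0, 0.0], docs, {"b": 1.5}))
```

A plain sum weights the two values equally. The regression or evolutionary-programming alternatives mentioned above would instead learn how the factors are combined.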
The user correlation data is obtained through feedback information
provided by users when they retrieve items of information.
Preferably, the user provides an indication of interest in each
document which he or she retrieves from the system.
The relevance predicting technique of the present invention is
applicable to all different types of information access systems.
For example, it can be employed to filter messages provided to a
user in an electronic mail system and search results obtained
through an on-line text retrieval service. Similarly, it can be
employed to route relevant documents to users in a bulletin board
system.
The foregoing features of the invention, as well as the advantages
offered thereby, are explained in greater detail hereinafter with
reference to exemplary implementations illustrated in the
accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a general diagram of the hardware architecture of one
type of information access system in which the present invention
can be implemented;
FIG. 2 is a block diagram of an exemplary software architecture for
a server program;
FIG. 3 is an example of an interface window for presenting a sorted
list of messages to a user;
FIG. 4 is an example of an interface window for presenting the
contents of a message to a user;
FIG. 5A is a graph of content vectors for two documents in a
two-term space;
FIG. 5B is a graph of user profile vectors in a two-term space;
FIG. 6 illustrates the generation of a correlation chart; and
FIG. 7 is an example of an interface window for a movie
recommendation database.
DETAILED DESCRIPTION
To facilitate an understanding of the principles of the present
invention, they are described hereinafter with reference to the
implementation of the invention in a system having multiple
personal computers that are connected via a network. It will be
appreciated, however, that the practical applications of the
invention are not limited to this particular environment. Rather,
the invention can find utility in any situation which provides for
computer access to information. For example, it is equally
applicable to other types of multiuser computer systems, such as
mainframe and mini-computer systems in which many users can have
simultaneous access to the same computer.
The present invention can be employed in various kinds of
information access systems, such as electronic mail, bulletin
board, text search and others. Depending upon the type of system, a
variety of different types of information might be available for
access by users. In addition to more conventional types of
information that are immediately interpretable by a person, such as
text, graphics and sound, the accessible information
might also include data and/or software objects, such as scripts,
rules, data objects in an object-oriented programming environment,
and the like. For ease of understanding, in the following
description, the term "message" is employed in a generic manner to
refer to each item of information that is provided by and
accessible to users, whether or not its contents can be readily
comprehended by the person receiving it. A message, therefore, can
be a memorandum or note that is addressed from one user of an
electronic mail system to another, a textual and/or graphical
document, or a video clip. A message can also be a data structure
or any other type of accessible information.
One example of a hardware architecture for an information access
system implementing the present invention is illustrated in FIG. 1.
The specific hardware arrangement does not form part of the
invention itself. Rather, it is described herein to facilitate an
understanding of the manner in which the features of the invention
interact with the other components of an information access system.
The illustrated architecture comprises a client-server arrangement,
in which a database of information is stored at a server computer
10, and is accessible through various client computers 12, 14. The
server 10 can be any suitable micro, mini or mainframe computer
having sufficient storage capacity to accommodate all of the items
of information to be presented to users. The client computers can
be suitable desktop computers 12 or portable computers 14, e.g.,
notebook computers, having the ability to access the server
computer 10. Such access might be provided, for example, via a
local area network or over a wide area through the use of modems,
telephone lines, and/or wireless communications.
Each client computer is associated with one or more users of the
information access system. It includes a suitable communication
program that enables the user to access messages stored at the
server machine. More particularly, the client program may request
the user to provide a password or the like, by means of which the
user is identified to the server machine. Once the user has been
identified as having authorized access to the system, the client
and server machines exchange information through suitable
communication protocols.
One particular type of information access system in which the
present invention can be utilized is described in detail hereinafter. It will
be appreciated that this description is for exemplary purposes
only, and that the practical applications of the invention are not
limited to this particular embodiment.
The general architecture of a server program for an information
access system is illustrated in block diagram form in FIG. 2.
Referring thereto, at the highest level the server program contains
a message server 16. The message server carries out communications
with each of the clients, for example over a network, and retrieves
information from two databases, a user database 18 and a message
database 20. The user database 18 contains a profile for each of
the system's users, as described in greater detail hereinafter. The
message database contains stored messages 22 supplied by and to
users of the database. In addition, the message database has
associated therewith an index 24, which provides a representation
of each of the stored messages 22, for example its title. The index
can contain other information pertinent to the stored messages as
well.
In the operation of the system, when a user desires to retrieve
messages, the user accesses the system through the client program
on one of the client machines 12, 14. As part of the access
procedure, the user may be required to log into the system. Through
the use of a password or other appropriate form of identification,
the user's identity is provided to the server 10, which
acknowledges the user's right to access the system or disconnects
the client machine if the user has not been authorized. When the
access procedure is successful, the message server 16 on the server
machine retrieves the user's profile from the user database 18.
This profile is used to rank the messages stored within the system.
The particular information within the user's profile is based upon
a ranking technique that is described in detail hereinafter. Once
the user's profile is retrieved, all of the messages to be provided
to the user are ranked on the basis of a predicted degree of
relevance to the user. For example, in an e-mail system, all of the
messages addressed to that user are ranked. Those messages which
are particularly pertinent to the user's interests are highly
ranked, whereas junk mail messages are given a low ranking.
A list of the ranked messages is provided to the client program,
which displays some number of them through a suitable interface.
Preferably, the messages are sorted and displayed in order from the
highest to the lowest ranking. One example of such an interface is
illustrated in FIG. 3. Referring thereto, the interface comprises a
window 26 containing a number of columns of information. The left
hand column 28 indicates the relative ranking score of each
message, for example in the form of a horizontal thermometer-type
bar 30. The remaining columns can contain other types of
information that may assist the user in determining whether to
retrieve a particular message, such as the date on which the
message was posted to the system, the message's author, and the
title of the message. The information that is displayed within the
window can be stored as part of the index 24. If the number of
messages is greater than that which can be displayed in a single
window, the window can be provided with a scroll bar 32 to enable
the user to scroll through and view all of the message titles.
Other display techniques can be employed in addition to, or in lieu
of, sorting the messages in order of rank. For example, the color,
size and/or intensity of each displayed message can be varied in
accordance with its predicted relevance.
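As a minimal sketch of the sorted display described above, the
following assumes that each message carries a ranking score between 0
and 1 and renders the thermometer-type bar as text. The function name
render_ranked, the bar width, and the sample messages are hypothetical
and are not taken from the figures.

```python
def render_ranked(messages, bar_width=10):
    """Return display lines, highest-ranked message first.

    messages is a list of (title, score) pairs; scores are assumed to
    lie in the range 0..1 and are clamped to it before drawing.
    """
    lines = []
    for title, score in sorted(messages, key=lambda m: m[1], reverse=True):
        clamped = min(max(score, 0.0), 1.0)
        filled = round(clamped * bar_width)
        bar = "#" * filled + "." * (bar_width - filled)
        lines.append(f"[{bar}] {title}")
    return lines


# Hypothetical messages for illustration only.
for line in render_ranked([("Junk offer", 0.1), ("Budget review", 0.9)]):
    print(line)
```

Varying color, size or intensity instead of sort order would replace
only the formatting step; the score itself is unchanged.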
When the user desires to view any particular message, the desired
message is selected within the window, using any suitable technique
for doing so. Once a message has been selected by the user, the
client program informs the server 10 of the selected message. In
response thereto, the server retrieves the complete text of the
message from the stored file 22, and forwards it to the client,
where it is displayed.
An example of an interface for the display of a message is
illustrated in FIG. 4. Referring thereto, the message can be
displayed in an appropriate window 34. The contents of the message,
e.g., its text, are displayed in the main portion of the window.
Located above this main portion is a header 36 which contains certain
information regarding the message. For example, the header can
contain the same information as provided in the columns shown in
the interface of FIG. 3, i.e., author, date and title. Located to
the right of this information are two icons which permit the user
to indicate his or her interest in that particular message. If the
user found the message to be of interest, a "thumbs-up" icon 38 can
be selected. Alternatively, if the message was of little or no
interest to the user, a "thumbs-down" icon 40 can be selected. When
either of these two icons is selected, the indication provided
thereby is forwarded to the server 10, where it is used to update
the user profile.
In the example of FIG. 4, the user is provided with only two
possible selections for indicating interest, i.e., "thumbs-up" or
"thumbs-down", resulting in very coarse granularity for the
indication of interest. If desired, finer resolution can be
obtained by providing additional options for the user. For example,
three options can be provided to enable the user to indicate high
interest, mediocre interest, or minimal interest.
To obtain reliable information about each user, it is preferable to
have the user provide an indication of degree of interest for each
message which has been retrieved. To
this end, the interface provided by the client program can be
designed such that the window 34 containing the content of the
message, as illustrated in FIG. 4, cannot be closed unless one of
the options is selected. More particularly, the window illustrated
in FIG. 4 does not include a conventional button or the like for
enabling the window to be closed. To accomplish this function, the
user is required to select one of the two icons 38 or 40 which
indicates his or her degree of interest in the message. When one of
the icons is selected, the window is closed and the message
disappears from the screen. With this approach, each time a message
is retrieved, feedback information regarding the user's degree of
interest is obtained, to thereby maintain an up-to-date profile for
the user.
Depending upon the particular information access system that is
being used, the type of information presented to the user may vary.
In the embodiment illustrated in FIGS. 1 and 2, all items of
information available to users can be stored in a single database
22. If desired, multiple databases directed to specific categories
of information can be provided. For example, a separately
accessible database of movie descriptions can be provided, to make
movie recommendations to users. Each separate database can have its
own profile for users who access that database. Thus, each time a
user sees a movie, he or she can record his or her reaction to it,
e.g., like or dislike. This information is used to update the
user's profile for the movie database, as well as provide
information to rank that movie for viewing by other users whose
interests in movies are similar or opposed. An example of a user
interface for presenting this information is shown in FIG. 7.
Referring thereto, it can be seen that the title of each movie is
accompanied by a recommendation score 46. This particular example
also illustrates a different technique for quantifying the
relevance ranking of each item. Specifically, the scores 46 are
negative as well as positive. This approach may be more desirable
for certain types of information, for example, to provide a clearer
indication that the viewer will probably dislike certain movies.
The values that are used for the ranking display can be within any
arbitrarily chosen range.
Traditionally, the ranking of messages was based only on the
content of the messages. In accordance with the present invention,
however, the ranking of messages is carried out by combining data
based upon an attribute of the message, for example its content,
with other data relating to correlations of indications provided by
users who have retrieved the message. To derive the content-based
data, certain elements of the message, e.g., each word in a
document, can be assigned a weight, based on its statistical
importance. Thus, for example, words which frequently occur in a
particular language are given a low weight value, while those which
are rarely used have a high weight value. The weight value for each
term is multiplied by the number of times that term occurs in the
document. Referring to FIG. 5A, the result of this procedure is a
vector of weights, which represents the content of the
document.
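The weighting procedure just described can be sketched as follows.
This is an illustration only: the term weights in TERM_WEIGHTS are
hypothetical values, the whitespace tokenizer is an assumption, and
the vector is stored as a dictionary keyed by term.

```python
from collections import Counter

# Hypothetical per-term weights: frequent words low, rare words high.
TERM_WEIGHTS = {"the": 0.01, "network": 0.6, "ranking": 0.9}


def content_vector(text, term_weights):
    """Return {term: weight * number of occurrences} for monitored terms."""
    counts = Counter(text.lower().split())
    return {term: weight * counts[term] for term, weight in term_weights.items()}


vec = content_vector("The ranking of the network ranking", TERM_WEIGHTS)
```

Each monitored term contributes one dimension, so a realistic
vocabulary yields a vector with many more entries than this example.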
For non-document types of information, the content data can be
based upon other attributes that are relevant to a user's interest
in that information. For example, in the movie database, the
content vector might take into account the type of movie, such as
action or drama, the actors, its viewer category rating, and the
like.
The example of FIG. 5A illustrates a two-dimensional vector for
each of two documents. In practice, of course, the vectors for
information content would likely have hundreds or thousands of
dimensions, depending upon the number of terms that are monitored.
For further information regarding the computation of vector models
for indexing text, reference is made to Introduction To Modern
Information Retrieval by Gerald Salton and Michael J. McGill
(McGraw-Hill 1983), which is incorporated herein by reference.
Each user profile also comprises a vector, based upon the user's
indications as to his or her relative interest in previously
retrieved documents. Each time a user provides a new response to a
retrieved message, the profile vector is modified in accordance
with the results of the indication. For example, if the user
indicates interest in a document, all of the significant terms in
that document can be given increased weight in the user's
profile.
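One possible form of this profile update is sketched below. The
additive rule and the learning rate are assumptions made for
illustration; the description above requires only that the
significant terms of a document of interest be given increased
weight.

```python
def update_profile(profile, doc_vector, feedback, rate=0.1):
    """Return a new profile adjusted by one feedback indication.

    feedback is +1 for interest and -1 for lack of interest; rate is a
    hypothetical step size. The input profile is left unmodified.
    """
    updated = dict(profile)
    for term, weight in doc_vector.items():
        updated[term] = updated.get(term, 0.0) + rate * feedback * weight
    return updated


profile = {"ranking": 0.5}
profile = update_profile(profile, {"ranking": 1.8, "network": 0.6}, feedback=+1)
```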
Each user in the system will have at least one profile, based upon
the feedback information received each time the user accesses the
system. If desirable, a single user might have two or more
different profiles for different task contexts. For example, a user
might have one profile for work-related information and a separate
profile for messages pertaining to leisure and hobbies.
One factor in the prediction of a user's likely interest in a
particular piece of information can be based on the similarity
between the document's vector and the user's profile vector. For
example, as shown in FIG. 5B, a score of a document's relevance can
be indicated by the cosine of the angle between the document's
vector and the user's profile vector. A document having a vector
which is close to that of the user's profile will be highly ranked,
whereas those which are significantly different will have a lower
ranking.
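The cosine measure can be computed directly from the two vectors. In
the sketch below the vectors are dictionaries keyed by term, which is
an assumed representation; returning 0.0 for an all-zero vector is
also an assumption, since the angle is undefined in that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for an empty vector
    return dot / (norm_a * norm_b)


# Hypothetical document and profile vectors.
score = cosine_similarity({"ranking": 1.8, "network": 0.6},
                          {"ranking": 0.7, "network": 0.1})
```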
A second factor in the prediction of a user's interest in
information is based upon a correlation with the indications
provided by other users. Referring to FIG. 6, each time a user
retrieves a document and subsequently provides an indication of
interest, the result can be stored in a table 42. From this table,
a correlation matrix R can be generated, whose entries indicate the
degree of correlation between the various users' interests in
commonly retrieved messages. More precisely, element R.sub.ij
contains a measure of correlation between the i-th user and the
j-th user. One example of such a matrix is the correlation matrix
illustrated at 44 in FIG. 6. In this example, only the relevant
entries are shown. That is, the correlation matrix is symmetric,
and the diagonal elements do not provide any additional information
for ranking purposes.
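The text does not state which correlation measure fills the matrix.
As one hedged possibility, the sketch below takes the mean product of
two users' votes over the documents both have rated, which for votes
of +1 and -1 lies between -1 and +1. The feedback table shown is
hypothetical and is not the data of FIG. 6.

```python
def correlation(votes_a, votes_b):
    """Mean product of votes on commonly rated documents (assumed measure)."""
    common = votes_a.keys() & votes_b.keys()
    if not common:
        return 0.0
    return sum(votes_a[d] * votes_b[d] for d in common) / len(common)


def correlation_matrix(table):
    """Return only the entries above the diagonal, keyed by user pair.

    The matrix is symmetric and its diagonal carries no ranking
    information, so the remaining entries are omitted.
    """
    users = sorted(table)
    return {(u, v): correlation(table[u], table[v])
            for i, u in enumerate(users) for v in users[i + 1:]}


# Hypothetical feedback table: user -> {document number: vote}.
table = {"A": {1: 1, 2: -1}, "B": {1: 1, 2: 1, 3: -1}, "C": {2: 1, 3: -1}}
R = correlation_matrix(table)
```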
Subsequently, when a user accesses the system, the feedback table
42 and the correlation matrix 44 are used as another factor in the
prediction of the likelihood that the user will be interested in
any given document. As one example of an algorithm that can be used
for this purpose, a prediction score, P.sub.ij for the i-th user
regarding the j-th document, can be computed as:
$$P_{ij} = \sum_{k \neq i} R_{ik} V_{kj}$$
where R.sub.ik is the correlation of users i and k, and V.sub.kj is
the weight indicating the feedback of user k on document j. Thus,
for the corresponding data in FIG. 6, the prediction score for User
C regarding Document 1 is as follows:
(0.00*1)+(-0.33*1)+(-1.00*-1)=0.67
In this
formula, each parenthetical product pertains to one of the other
users, i.e., A, B and D, respectively. Within each product, the
first value represents the degree of correlation between the other
user and the current user in question, as indicated by the matrix
44. The second value indicates whether the other user voted
favorably (+1) or negatively (-1) after reading the document, as
indicated in the table 42. The values of +1 and -1 are merely
exemplary. Any suitable range of values can be employed to indicate
various users' interests in retrieved items of information.
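The prediction score can be reproduced in a few lines. The sketch
below uses the correlation and vote values quoted above for User C
and Document 1; the data layout (pairs of users as keys, per-user
vote dictionaries) and the treatment of a missing vote as
contributing nothing are assumptions.

```python
def prediction_score(user, doc, correlations, feedback):
    """Sum of correlation * vote over every other user who rated doc."""
    total = 0.0
    for other, votes in feedback.items():
        if other == user or doc not in votes:
            continue  # skip the user in question and users with no vote
        total += correlations[frozenset((user, other))] * votes[doc]
    return total


# Values quoted in the text for User C and Document 1.
correlations = {
    frozenset(("C", "A")): 0.00,
    frozenset(("C", "B")): -0.33,
    frozenset(("C", "D")): -1.00,
}
feedback = {"A": {1: 1}, "B": {1: 1}, "D": {1: -1}}

score = prediction_score("C", 1, correlations, feedback)
```

Rounded to two decimal places, the result agrees with the value of
0.67 given above.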
In accordance with the invention, a combination of attribute-based
and correlation-based prediction is employed to rank the relevance
of each item of information. For example, a weighted sum of scores
that are obtained from each of the content and correlation
predictors can be used, to determine a final ranking score. Other
approaches which take into account both the attribute-based
information and user correlation information can be employed. For
example, multiple regression analysis can be utilized to combine
the various factors. In this approach, regression methods are
employed to identify the most important attributes that are used as
predictors, e.g., salient terms in a document and users having
similar feedback responses, and how much each one should be
weighted. Alternatively, principal components analysis can be used
to identify underlying aspects of content-based and
correlation-based data that predict a score.
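The weighted-sum combination is the simplest of these approaches and
can be sketched as follows. The weight of 0.5 and the sample scores
are hypothetical; in practice the weight might itself be fitted by
regression against known feedback.

```python
def combined_score(content_score, correlation_score, content_weight=0.5):
    """Weighted sum of the content-based and correlation-based scores."""
    return (content_weight * content_score
            + (1.0 - content_weight) * correlation_score)


def rank_documents(scores, content_weight=0.5):
    """Order document ids from highest to lowest combined score.

    scores maps a document id to (content_score, correlation_score).
    """
    return sorted(scores,
                  key=lambda d: combined_score(*scores[d], content_weight),
                  reverse=True)


# Hypothetical per-document scores for one user.
scores = {"doc1": (0.20, 0.67), "doc2": (0.90, -0.50), "doc3": (0.60, 0.40)}
ranking = rank_documents(scores)
```

Setting content_weight to 1.0 reduces the ranking to the purely
content-based case.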
As another example, evolutionary programming techniques can be
employed to analyze the available data regarding content of
messages and user correlations. One type of evolutionary
programming that is suitable in this regard is known as genetic
programming. In this type of programming, data pertaining to the
attributes of messages and user correlation are provided as a set
of primitives. The various types of data are combined in different
manners and evaluated, until the combination which best fits known
results is found. The result of this combination is a program that
describes the data which can best be used to predict a given user's
likely degree of interest in a message. For further information
regarding genetic programming, reference is made to Koza, John R.,
Genetic Programming: On The Programming of Computers By Means of
Natural Selection, MIT Press 1992.
In a more specific implementation of evolutionary programming, the
analysis technique known as genetic algorithms can be employed.
This technique differs from genetic programming by virtue of the
fact that pre-defined parameters pertaining to the items of
information are employed, rather than more general programming
statements. For example, the particular attributes of a message
which are to be utilized to define the prediction formula can be
established ahead of time, and employed in the algorithms. For
further information regarding this technique, reference is made to
Goldberg, David E., Genetic Algorithms in Search, Optimization and
Machine Learning, Addison-Wesley 1989.
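A toy genetic algorithm of this kind is sketched below. The
attributes are fixed ahead of time (here a content score and a
correlation score), and only their weights are evolved. The
population size, mutation scale, selection scheme, and the SAMPLES
data are all hypothetical choices for illustration and are not taken
from the cited reference.

```python
import random

# Hypothetical known results: ((content, correlation), observed interest).
SAMPLES = [((1.0, 0.0), 0.7), ((0.0, 1.0), 0.3),
           ((1.0, 1.0), 1.0), ((0.5, -1.0), 0.05)]


def fitness(weights, samples):
    """Negative squared error of the weighted-sum prediction."""
    return -sum((sum(w * x for w, x in zip(weights, feats)) - target) ** 2
                for feats, target in samples)


def evolve(samples, n_weights, pop_size=30, generations=60, seed=0):
    """Evolve a weight vector that best fits the known results."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, samples), reverse=True)
        parents = population[: pop_size // 2]  # the fitter half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # crossover
            child = [w + rng.gauss(0, 0.1) for w in child]    # mutation
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: fitness(w, samples))


best_weights = evolve(SAMPLES, n_weights=2)
```

Because the fitter half of each generation is carried forward
unchanged, the best fitness found never decreases from one generation
to the next.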
In addition to content and correlation scores, other attributes can
be employed. For example, event times can be used in the ranking
equation, where older items might get lower scores. If a message is
a call for submitting papers to a conference, its score might rise
as the deadline approaches, then fall once it has passed. These
various types of data can be combined using any of the data
analysis techniques described previously, as well as any other
well-known analysis technique.
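The call-for-papers example can be expressed as a time-dependent
factor. The sketch below assumes a linear rise over the 30 days
before the deadline and a linear fall over the 7 days after it; both
intervals and the function name deadline_factor are hypothetical.

```python
from datetime import date


def deadline_factor(today, deadline, ramp_days=30, decay_days=7):
    """Factor in 0..1 that peaks on the deadline and falls off afterwards."""
    days_left = (deadline - today).days
    if days_left >= 0:
        # Rises from 0 (ramp_days or more before) to 1 (on the deadline).
        return max(0.0, 1.0 - days_left / ramp_days)
    # Falls from 1 back to 0 over decay_days after the deadline.
    return max(0.0, 1.0 + days_left / decay_days)


# Hypothetical deadline, evaluated 15 days beforehand.
factor = deadline_factor(date(2024, 6, 15), date(2024, 6, 30))
```

Such a factor could be supplied to the ranking equation as one more
attribute alongside the content and correlation scores.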
From the foregoing, it can be seen that the present invention
provides a system for ranking information which is not based on
only one factor, namely content. Rather, a determination is made on
the basis of a combination of factors. In a preferred
implementation, the present invention provides for social
interaction within the community of users, since each individual
can benefit from the experiences of others. A user who has written
about a particular topic is more likely to have other messages
relating to that same topic presented to him or her, without
awareness of the authors of these other items of information.
The invention takes advantage of the fact that a community of users
is participating in the presentation of information to users. In
current systems, if a large number of readers each believe a
message is significant, any given user is no more likely to see it
than any other message. Conversely, the originator of a relatively
uninteresting idea can easily broadcast it to a large number of
people, even though they may have no desire to see it. In the
system of the present invention, however, the relevance score of a
particular message takes into account not only the user's own
interests, but also feedback from the community.
To facilitate an understanding of the invention, its principles
have been explained with reference to specific embodiments thereof.
It will be appreciated, however, that the practical applications of
the invention are not limited to these particular embodiments. The
scope of the invention is set forth in the following claims, rather
than the foregoing description, and all equivalents which are
consistent with the meaning of the claims are intended to be
embraced therein.
* * * * *