System And Method Of Using Movie Taste For Compatibility Matching Wallisch; Pascal [PROMETHEAN VENTURES, LLC]

Patent Application Summary

U.S. patent application number 12/513651 was filed with the patent office on 2010-08-05 for system and method of using movie taste for compatibility matching. This patent application is currently assigned to PROMETHEAN VENTURES, LLC. Invention is credited to Pascal Wallisch.

Application Number	20100198773 12/513651
Family ID	39512382
Filed Date	2010-08-05

United States Patent Application	20100198773
Kind Code	A1
Wallisch; Pascal	August 5, 2010

SYSTEM AND METHOD OF USING MOVIE TASTE FOR COMPATIBILITY MATCHING

Abstract

A method of predicting the compatibility of at least one item of interest to a user of a web-based system. The method includes the steps of providing a survey of items for rating by the system user, collecting a set of ratings for the survey of items from the system user, collecting a set of ratings for the survey of items from each of a plurality of raters, calculating a correlation coefficient between the system user and each of the plurality of raters to obtain a set of correlation coefficients for the survey of items, selecting a group of raters from the plurality of raters, the group of raters selected on the basis that each member of the group of raters has provided a rating of the at least one item of interest and predicting the compatibility of the at least one item of interest to the system user from the ratings provided by the group of raters and the correlation coefficients calculated between the system user and each of the group of raters.

Inventors:	Wallisch; Pascal; (New York, NY)
Correspondence Address:	ROBERTS MLOTKOWSKI SAFRAN & COLE, P.C.; Intellectual Property Department, P.O. Box 10064, MCLEAN, VA 22102-8064, US
Assignee:	PROMETHEAN VENTURES, LLC Chicago IL
Family ID:	39512382
Appl. No.:	12/513651
Filed:	November 6, 2007
PCT Filed:	November 6, 2007
PCT NO:	PCT/US07/83712
371 Date:	March 25, 2010

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
60856831	Nov 6, 2006

Current U.S. Class:	706/54 ; 455/466
Current CPC Class:	G06Q 30/02 20130101
Class at Publication:	706/54 ; 455/466
International Class:	G06N 5/02 20060101 G06N005/02

Claims

1. A method of predicting the compatibility of at least one item of interest to a user of a web-based system, comprising the steps of: (a) providing a survey of items for rating by the system user; (b) collecting a set of ratings for the survey of items from the system user; (c) collecting a set of ratings for the survey of items from each of a plurality of raters; (d) calculating a correlation coefficient between the system user and each of the plurality of raters to obtain a set of correlation coefficients for the survey of items; (e) selecting a group of raters from the plurality of raters, the group of raters selected on the basis that each member of the group of raters has provided a rating of the at least one item of interest; and (f) predicting the compatibility of the at least one item of interest to the system user from the ratings provided by the group of raters and the correlation coefficients calculated between the system user and each of the group of raters.

2. The method of claim 1, wherein the survey of items is a list of movies.

3. The method of claim 2, wherein the plurality of raters are selected from a list of system users.

4. The method of claim 2, wherein the plurality of raters are selected from a group of movie critics who have each rated the list of movies.

5. The method of claim 4, wherein the set of ratings for the survey of items is obtained using at least one web crawler.

6. The method of claim 1, wherein said step of calculating a correlation coefficient between the system user and each of the plurality of raters to obtain a set of correlation coefficients for the survey of items comprises the steps of: (i) obtaining a list of items from the survey of items that the system user and the plurality of raters have rated; (ii) storing the list of items obtained in step (i) in a temporary list in the form of rating pairs; (iii) calculating a mean user rating of survey items and a mean rater rating for each of the plurality of raters of survey items from the temporary list; (iv) calculating the difference between each user rating from the survey of items and the mean user rating; and (v) calculating the difference between each rater rating from the survey of items and the mean rater rating.

7. The method of claim 6, wherein said step of calculating a correlation coefficient between the system user and each of the plurality of raters to obtain a set of correlation coefficients for the survey of items further comprises the steps of: (vii) multiplying the difference between each user rating and the mean user rating and the difference between each rater rating and the mean rater rating for each movie rate; (viii) summing the multiplied differences from step (vii) to determine a coefficient of variance; (ix) dividing the sum obtained in step (viii) by the number of items on the temporary list to arrive at a mean coefficient of variance; (x) calculating a first standard deviation for the system user ratings on the temporary list and a second standard deviation of the rater ratings on the temporary list; (xi) calculating the product of the first and second standard deviations; and (xii) dividing the mean coefficient of variance obtained in step (ix) by the product of the first and second standard deviations obtained in step (xi) to determine a correlation coefficient between a system user and a rater.

8. The method of claim 7, wherein said step of predicting the compatibility of the at least one item of interest to the system user from the ratings provided by the group of raters and the correlation coefficients calculated between the system user and each of the group of raters comprises the steps of: (i) selecting each system rater from the plurality of raters that has rated the item of interest; (ii) selecting and storing the correlation coefficients determined in step (d) between the system user and the system raters selected in step (i); (iii) raising each correlation coefficient to the power of k to obtain a weight; (iv) calculating the sum of all weights; (v) multiplying each weight by a corresponding rating and summing the product of each weight and corresponding rating to yield a raw score; and (vi) dividing the raw score of step (v) by the sum of all weights determined in step (iv) to obtain an estimate of prediction of compatibility of the at least one item of interest to the system user.

9. The method of claim 8, wherein said step of predicting the compatibility of the at least one item of interest to the system user from the ratings provided by the group of raters and the correlation coefficients calculated between the system user and each of the group of raters further comprises the steps of: (vii) calculating an average item rating for the system user; (viii) calculating an average item rating for each system rater selected in step (i); (ix) calculating a correction factor by subtracting the average item rating determined in step (viii) from the average item rating determined in step (vii); and (x) adding the correction factor determined in step (ix) to the estimate of rating prediction determined in step (vi) to obtain the prediction of compatibility of the at least one item of interest to the system user.

10. The method of claim 1, wherein said step of predicting the compatibility of the at least one item of interest to the system user from the ratings provided by the group of raters and the correlation coefficients calculated between the system user and each of the group of raters comprises the steps of: (i) selecting each system rater from the plurality of raters that has rated the item of interest; (ii) selecting and storing the correlation coefficients determined in step (d) between the system user and the system raters selected in step (i); (iii) raising each correlation coefficient to the power of k to obtain a weight; (iv) calculating the sum of all weights; (v) multiplying each weight by a corresponding rating and summing the product of each weight and corresponding rating to yield a raw score; and (vi) dividing the raw score of step (v) by the sum of all weights determined in step (iv) to obtain an estimate of prediction of compatibility of the at least one item of interest to the system user.

11. The method of claim 10, wherein said step of predicting the compatibility of the at least one item of interest to the system user from the ratings provided by the group of raters and the correlation coefficients calculated between the system user and each of the group of raters further comprises the steps of: (vii) calculating an average item rating for the system user; (viii) calculating an average item rating for each system rater selected in step (i); (ix) calculating a correction factor by subtracting the average item rating determined in step (viii) from the average item rating determined in step (vii); and (x) adding the correction factor determined in step (ix) to the estimate of rating prediction determined in step (vi) to obtain the prediction of compatibility of the at least one item of interest to the system user.

12. The method of claim 1, further comprising the step of: (g) providing a list of items of interest to the system user and their corresponding predictions of compatibility over the mobile web to a web-enabled handheld device.

13. The method of claim 12, wherein the list of items of interest is a list of most recently released movies.

14. The method of claim 1, further comprising the step of: (g) providing a list of items of interest to the system user and their corresponding predictions of compatibility over the cellular telephone network to a cellular telephone via text message.

15. The method of claim 14, wherein the list of items of interest is a list of most recently released movies.

16. In a web-based system, a method of predicting the compatibility of a first user of the system to at least one other user of the system, comprising the steps of: (a) providing a survey of items for rating by the first system user; (b) collecting a set of ratings for the survey of items from the first system user; (c) collecting a set of ratings for the survey of items from each of a plurality of raters; (d) calculating a correlation coefficient between the first system user and each of the plurality of raters to obtain a set of correlation coefficients for the survey of items; (e) predicting the compatibility of the first system user to at least one of the plurality of raters from the correlation coefficients calculated between the system user and each of the group of raters; and (f) providing to the first system user at least one other user selected on the basis of correlation to the first system user from the plurality of raters.

17. The method of claim 16, wherein the survey of items is a list of movies.

18. The method of claim 17, wherein the plurality of raters are selected from a list of system users.

19. The method of claim 16, wherein said step of calculating a correlation coefficient between the system user and each of the plurality of raters to obtain a set of correlation coefficients for the survey of items comprises the steps of: (i) obtaining a list of items from the survey of items that the system user and the plurality of raters have rated; (ii) storing the list of items obtained in step (i) in a temporary list in the form of rating pairs; (iii) calculating a mean user rating of survey items and a mean rater rating for each of the plurality of raters of survey items from the temporary list; (iv) calculating the difference between each user rating from the survey of items and the mean user rating; and (v) calculating the difference between each rater rating from the survey of items and the mean rater rating.

20. The method of claim 19, wherein said step of calculating a correlation coefficient between the system user and each of the plurality of raters to obtain a set of correlation coefficients for the survey of items further comprises the steps of: (vii) multiplying the difference between each user rating and the mean user rating and the difference between each rater rating and the mean rater rating for each movie rate; (viii) summing the multiplied differences from step (vii) to determine a coefficient of variance; (ix) dividing the sum obtained in step (viii) by the number of items on the temporary list to arrive at a mean coefficient of variance; (x) calculating a first standard deviation for the system user ratings on the temporary list and a second standard deviation of the rater ratings on the temporary list; (xi) calculating the product of the first and second standard deviations; and (xii) dividing the mean coefficient of variance obtained in step (ix) by the product of the first and second standard deviations obtained in step (xi) to determine a correlation coefficient between a system user and a rater.

21. A multi-user web-based computer system for predicting the compatibility of at least one item of interest to a user of the system, comprising: (a) a web server for communicating with users of the web-based computer system and components thereof; (b) a user profile database for storing information on system users; (c) a user rating module for providing a survey of items for rating by the system user and collecting a set of ratings for storage within said user profile database; (d) a computational process module, said computational process module having a correlation module for calculating a correlation coefficient between the system user and each of a plurality of raters to obtain a set of correlation coefficients for the survey of items and a predictive module for predicting the compatibility of the at least one item of interest to the system user from the ratings provided by the plurality of raters and the correlation coefficients calculated between the system user and each of the plurality of raters; (e) a compatibility-based matched items table for receiving an output regarding the compatibility of the at least one item of interest to the system user from the computational process module; and (f) a recommendation process module for receiving information from said compatibility-based matched items table and returning the information to said web server for transmitting to the system user.

22. The web-based computer system of claim 21, wherein the survey of items is a list of movies.

23. The web-based computer system of claim 22, wherein the plurality of raters are selected from a list of system users.

24. The web-based computer system of claim 22, wherein the plurality of raters are selected from a group of movie critics who have each rated the list of movies.

25. The web-based computer system of claim 24, further comprising a web crawler for obtaining the set of ratings for the survey of items.

26. The web-based computer system of claim 21, wherein said correlation module of said computational process module comprises: (i) means for obtaining a list of items from the survey of items that the system user and the plurality of raters have rated; (ii) means for storing the list of items in a temporary list in the form of rating pairs; (iii) means for calculating a mean user rating of survey items and a mean rater rating for each of the plurality of raters of survey items from the temporary list; (iv) means for calculating the difference between each user rating from the survey of items and the mean user rating; and (v) means for calculating the difference between each rater rating from the survey of items and the mean rater rating.

27. The web-based computer system of claim 26, wherein said correlation module of said computational process module further comprises: (vii) means for multiplying the difference between each user rating and the mean user rating and the difference between each rater rating and the mean rater rating for each movie rate; (viii) means for summing the multiplied differences from step (vii) to determine a coefficient of variance; (ix) means for dividing the sum obtained by said summing means by the number of items on the temporary list to arrive at a mean coefficient of variance; (x) means for calculating a first standard deviation for the system user ratings on the temporary list and a second standard deviation of the rater ratings on the temporary list; (xi) means for calculating the product of the first and second standard deviations; and (xii) means for dividing the mean coefficient of variance by the product of the first and second standard deviations to determine a correlation coefficient between a system user and a rater.

28. The web-based computer system of claim 27, wherein said predictive module of said computational process module comprises: (i) means for selecting each system rater from the plurality of raters that has rated the item of interest; (ii) means for selecting and storing the correlation coefficients between the system user and the system raters selected by said means for selecting each system rater determined by said correlation module of said computational process module; (iii) means for raising each correlation coefficient to the power of k to obtain a weight; (iv) means for calculating the sum of all weights; (v) means for multiplying each weight by a corresponding rating and summing the product of each weight and corresponding rating to yield a raw score; and (vi) means for dividing the raw score by the sum of all weights determined to obtain an estimate of prediction of compatibility of the at least one item of interest to the system user.

29. The web-based computer system of claim 28, wherein said predictive module of said computational process module further comprises: (vii) means for calculating an average item rating for the system user; (viii) means for calculating an average item rating for each system rater selected by said means for selecting each system rater; (ix) means for calculating a correction factor by subtracting the average item rating for each system rater from the average item rating for the system user; and (x) means for adding the correction factor to the estimate of rating prediction to obtain the prediction of compatibility of the at least one item of interest to the system user.

30. The web-based computer system of claim 21, wherein said predictive module of said computational process module comprises: (i) means for selecting each system rater from the plurality of raters that has rated the item of interest; (ii) means for selecting and storing the correlation coefficients between the system user and the system raters selected by said means for selecting each system rater determined by said correlation module of said computational process module; (iii) means for raising each correlation coefficient to the power of k to obtain a weight; (iv) means for calculating the sum of all weights; (v) means for multiplying each weight by a corresponding rating and summing the product of each weight and corresponding rating to yield a raw score; and (vi) means for dividing the raw score by the sum of all weights determined to obtain an estimate of prediction of compatibility of the at least one item of interest to the system user.

31. The web-based computer system of claim 30, wherein said predictive module of said computational process module further comprises: (vii) means for calculating an average item rating for the system user; (viii) means for calculating an average item rating for each system rater selected by said means for selecting each system rater; (ix) means for calculating a correction factor by subtracting the average item rating for each system rater from the average item rating for the system user; and (x) means for adding the correction factor to the estimate of rating prediction to obtain the prediction of compatibility of the at least one item of interest to the system user.

32. The web-based computer system of claim 21, further comprising: (g) means for providing a list of items of interest to the system user and their corresponding predictions of compatibility over the mobile web to a web-enabled handheld device.

33. The web-based computer system of claim 32, wherein the list of items of interest is a list of most recently released movies.

34. The web-based computer system of claim 21, further comprising: (g) means for providing a list of items of interest to the system user and their corresponding predictions of compatibility over the cellular telephone network to a cellular telephone via text message.

35. The web-based computer system of claim 34, wherein the list of items of interest is a list of most recently released movies.
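As an illustration only, the correlation steps recited in claims 6 and 7, the prediction steps recited in claims 8 and 9, and the user-to-user selection of claim 16 can be sketched in Python. This is a minimal sketch under stated assumptions, not the applicant's implementation. The function names, the dictionary layout for ratings, the default k = 1, and the handling of raters with fewer than two shared items or with uniform ratings are all hypothetical. The claims do not say how negative correlation coefficients are weighted, so the sketch simply skips raters whose correlation is not positive. The claims also leave open how the per-rater averages of claim 9 are combined, so the sketch uses their unweighted mean.

```python
from math import sqrt


def correlation(user_ratings, rater_ratings):
    """Correlation over co-rated items, following claims 6-7."""
    # Steps (i)-(ii): temporary list of rating pairs for items both have rated.
    pairs = [(user_ratings[item], rater_ratings[item])
             for item in user_ratings if item in rater_ratings]
    n = len(pairs)
    if n < 2:
        return None  # assumption: too few shared items to correlate
    # Step (iii): mean user rating and mean rater rating over the temporary list.
    mean_u = sum(u for u, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    # Steps (iv)-(ix): mean of the products of the deviations from the means.
    mean_cov = sum((u - mean_u) * (r - mean_r) for u, r in pairs) / n
    # Steps (x)-(xi): the two standard deviations and their product.
    sd_u = sqrt(sum((u - mean_u) ** 2 for u, _ in pairs) / n)
    sd_r = sqrt(sum((r - mean_r) ** 2 for _, r in pairs) / n)
    if sd_u == 0 or sd_r == 0:
        return None  # assumption: undefined when either party rates uniformly
    # Step (xii): divide the mean covariance by the product of the deviations.
    return mean_cov / (sd_u * sd_r)


def predict(user_ratings, raters, item, k=1.0, correct=True):
    """Weighted prediction for one item, following claims 8-9."""
    weights, products, rater_means = [], [], []
    for ratings in raters.values():
        if item not in ratings:
            continue  # step (i): keep only raters who rated the item
        c = correlation(user_ratings, ratings)
        if c is None or c <= 0:
            continue  # assumption: non-positive correlations are ignored
        w = c ** k  # step (iii): raise the coefficient to the power k
        weights.append(w)
        products.append(w * ratings[item])
        rater_means.append(sum(ratings.values()) / len(ratings))
    if not weights:
        return None  # assumption: no usable rater, so no prediction
    # Steps (iv)-(vi): raw score divided by the sum of all weights.
    estimate = sum(products) / sum(weights)
    if correct:
        # Claim 9: shift by the gap between user and rater average ratings.
        user_mean = sum(user_ratings.values()) / len(user_ratings)
        estimate += user_mean - sum(rater_means) / len(rater_means)
    return estimate


def most_compatible(user_ratings, raters, top=3):
    """Rank other raters by correlation to the user (claim 16, steps (e)-(f))."""
    scored = [(name, correlation(user_ratings, r)) for name, r in raters.items()]
    scored = [(name, c) for name, c in scored if c is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Hypothetical ratings on a 1-5 scale; "X" is the item of interest.
user = {"A": 5, "B": 3, "C": 1}
raters = {
    "r1": {"A": 5, "B": 3, "C": 1, "X": 4},
    "r2": {"A": 1, "B": 3, "C": 5, "X": 2},
}
print(predict(user, raters, "X"))
print(most_compatible(user, raters))
```

In this toy data the first rater agrees with the user on every shared item and the second disagrees on every one, so only the first rater contributes to the prediction for "X". A larger k would sharpen the preference for highly correlated raters.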

Description

CROSS REFERENCE TO RELATED CASES

[0001] The present invention claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application Ser. No. 60/856,831 of WALLISCH, entitled "SYSTEM AND METHOD OF USING MOVIE TASTE FOR COMPATIBILITY MATCHING," filed on Nov. 6, 2006, the entire contents of which are incorporated by reference herein.

FIELD OF THE INVENTION

[0002] The present invention relates generally to methods and systems for predicting user preferences. More specifically, the present invention is directed to the use of movie taste ratings in arriving at a compatible match.

BACKGROUND OF THE INVENTION

[0003] People, when in the process of selecting an item, such as a movie to view, a book to read, music to listen to or any other item of content to experience, are often overwhelmed by the sheer volume of available selections. A critical question is how to benefit from the available information relating to such a selection, without getting bogged down by the overwhelming volume.

[0004] One possibility is to make use of the opinions that others have formed when experiencing an individual content item. Taken together, the collection of such opinions becomes a resource that could be used to sift through the available items. This technique is applied informally through word of mouth in the physical world, and through forwarded mail, news, and web pages in the virtual world. However, these informal processes are not powerful enough to deal with the volume of new content being created.

[0005] To assist with this problem, recommendation services are available. A recommendation service can be a computer-implemented service that recommends items from a database of items. The recommendations are customized to particular users based on information known about the users. One common application for recommendation services involves recommending products to online customers. For example, online merchants commonly provide services for recommending products to customers based on profiles that have been developed for such customers. Recommendation services are also common for recommending websites, articles, and other types of informational content to users.

[0006] One technique commonly used by recommendation services is known as content-based filtering. Pure content-based systems operate by attempting to identify items which, based on an analysis of item content, are similar to items that are known to be of interest to the user. For example, a content-based website recommendation service may operate by parsing the user's favorite web pages to generate a profile of commonly-occurring terms, and then use this profile to search for other web pages that include some or all of these terms.
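The term-profile example above can be sketched as follows. This is a minimal illustration only: the function names, the tokenizing regular expression, and the sample page texts are hypothetical, and a practical service would also remove stop words and weight terms, which this sketch omits.

```python
import re
from collections import Counter


def term_profile(favorite_pages, top_n=5):
    """Build a profile of the most commonly occurring terms in favorite pages."""
    counts = Counter()
    for text in favorite_pages:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return {term for term, _ in counts.most_common(top_n)}


def profile_match(page_text, profile):
    """Fraction of profile terms that appear in a candidate page."""
    if not profile:
        return 0.0
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return len(words & profile) / len(profile)


# Hypothetical page texts standing in for parsed web pages.
favorites = ["jazz piano trio", "jazz guitar", "piano jazz"]
profile = term_profile(favorites, top_n=2)
print(profile_match("live jazz piano night", profile))
print(profile_match("football results", profile))
```

Candidate pages can then be ranked by their match score, so pages sharing more of the profile terms are recommended first.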

[0007] Another common recommendation technique is known as collaborative filtering. Collaborative filtering seeks to understand the relationships between people and to use those relationships to help people meet their needs more effectively. Ratings are entered by the user to indicate his or her opinion of the item of content to the collaborative filtering system. Based on previously entered ratings by other users, predictions are made for a user of the value of an item to that user. Ratings often represent the user's evaluation of the item of content along one or more dimensions. There are many possible dimensions, including overall enjoyment, value to the task at hand, interest in the topic, reputation of the author or producer, appropriateness for the context, quality of the material, etc. Ratings along each of these dimensions can be either explicit, requiring special user interaction, or implicit, captured from ordinary user actions.

[0008] The most common explicit rating methods in collaborative filtering systems are single keystrokes entered by users. The keystrokes usually represent values along a single ordered dimension. Ratings can also be entered through graphical sliders, which are similar, except that they often support more possible values. Another common rating method is textual ratings. Textual ratings are either keyword or free-form. Keyword textual ratings often focus on characterizing the topic. Keyword textual ratings that focus on measuring the quality are very similar to keystroke ratings. Free-form textual ratings can be valuable for users, but are difficult to process automatically. Free-form textual ratings are more common in domains in which the total number of documents is relatively low, so users can peruse a substantial fraction of them.

[0009] Implicit ratings may be collected by non-intrusively monitoring the user's use of the item of content. Observations about what the user does with the content may lead to insights into the value of the content to the user. For instance, if a user reads the title or abstract of a document, but chooses not to read the document, that may indicate low interest in the topic of the document. On the other hand, if the user chooses to save a document to a file, or to forward it to a colleague, that may indicate higher interest in the document. The time that a user spends reading a document is another implicit rating. Intuitively, users are likely to spend longer with documents they find valuable than with documents they find uninteresting.

[0010] Collaborative filtering systems have largely focused on explicit ratings. In small tightly focused groups with substantial shared interests, textual ratings have proven valuable. However, in larger groups with more diverse interests, a more structured ratings system with automatic computation of personalized predictions would be beneficial.

[0011] In a system using explicit ratings, the user responds to each item with a keystroke or other indication of preference. The system uses the user's response to influence its prediction algorithms for this user in the future. Users can informally combine their ratings along any of the possible ratings dimensions to create this single rating. Existing prediction algorithms do a good job of making predictions for users based on explicit ratings along this single dimension. However, there are many known prediction algorithms.

[0012] An area of scientific study which is focused on this problem is known as predictive utility. Predictive utility refers generally to the value of having predictions for an item before deciding whether to invest time or money in consuming that item. The concept is general enough to include physical items such as movies, books or videotapes, as well as information items, such as news articles or web pages. A domain with high predictive utility is one where users will adjust their decisions a great deal based on predictions. A domain with low predictive utility is one where predictions will have little effect on user decisions.

[0013] Predictive utility is a function of the relative quantity of desirable and undesirable items and the quality of predictions. The desirability of an item is a measure of a particular user's personal value for that item. Items are not intrinsically good or bad; an item is good for a user if that user finds it desirable and good in general if a large majority of users finds it desirable.

[0014] The cost-benefit analysis for a consumption decision compares four quantities: the value of consuming a desirable item (a hit), the cost of missing a desirable item (a miss), the value of skipping over an undesirable item (a correct rejection), and the cost of consuming an undesirable item (a false positive). For watching a movie, the value of finding desirable movies is high to movie fans, but the cost of missing some good ones is low since there are many desirable movies for most movie fans. The cost of each false positive is the price of the ticket plus the amount of time before the watcher decides to leave, and the value of correct rejections is high because there are so many undesirable movies that without rejecting many of them it would be impractical to see movies at all. Restaurant selection can be seen to follow a similar pattern, though the risk of going to an undesirable restaurant is higher since you typically still have the meal and the bill. Legal research is very different. The cost of missing a relevant and important precedent is very high, and may outweigh the cost of sifting through all of the potentially relevant cases, especially when that cost is being billed to the client and serves as protection against malpractice.

[0015] The costs of misses and false positives represent the risk involved in making a prediction. The values of hits and correct rejections represent the potential benefit of making predictions. Predictive utility is the difference between the potential benefit and the risk. Thus, the risk of mistakes is lowest for movies and the potential benefit is highest for movies, articles and restaurants.

[0016] One important component of the cost-benefit analysis is the total number of desirable and undesirable items. If 90% of the items being considered are desirable, filtering will generally not add much value because there are few correct rejections and the probability of a hit is high even without a prediction. Of course, in some cases, users may refine their desires to select only the most interesting of the interesting ones given their limited time. On the other hand, if there are many items and only 1% are good, then filtering can add significant value because the aggregate value of correct rejections becomes high.
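The effect of the proportion of desirable items can be illustrated with a small calculation. The sketch below is a simplification under stated assumptions: the function name, the numeric values for a hit and a false positive, and the idea of a perfect filter that skips every undesirable item at no cost are all hypothetical and are not taken from the text.

```python
def filtering_gain(p_desirable, hit_value=10.0, false_positive_cost=8.0):
    """Expected per-item gain of a perfect filter over consuming every item."""
    # Without filtering, every item is consumed, desirable or not.
    unfiltered = (p_desirable * hit_value
                  - (1 - p_desirable) * false_positive_cost)
    # With a perfect filter, undesirable items are correctly rejected.
    filtered = p_desirable * hit_value
    return filtered - unfiltered


# The two proportions discussed above: 90% desirable versus 1% desirable.
print(filtering_gain(0.90))
print(filtering_gain(0.01))
```

Under these assumptions the gain equals the share of undesirable items multiplied by the cost of a false positive, so it is small when most items are desirable and large when few are.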

[0017] The issues involving compatibility matching may be extended beyond the items discussed to matching compatible people with one another. As may be appreciated, good matchmaking is the art of addressing an almost universal problem that has faced humans since ancient times: "How do I find the right life-partner for me?" This issue is of extreme importance, since failure results in some of the most significant sources of human suffering, namely remaining alone or, sometimes no less harmful, the fallout due to the choice of an essentially incompatible mate.

[0018] Given the fundamental structure of the mate choice challenge, such problems should not be surprising. The task of finding an appropriate mate seems almost intractable, even if one focuses solely on psychological factors. It is conceptually and empirically well established that an enormously complex nervous system gives rise to a complex personality, capable of a wide range of sophisticated behaviors that are hard to predict. Given the intricacies of the nervous system, it comes as no surprise that there is ample room for an almost infinite number of variations, leading to a large variety of complex personalities.

[0019] At the same time, the opportunities for seeking a mate ("dating") are typically limited to one's own surroundings. Depending on the physical setting and the activity of the individual, estimates for the pool of potential mates range from several hundred to several thousand potential partners. Depending on the variability of personality types and the importance of personality in successful mate choice, this sample is most likely too small to allow a good match. Even if one disregards mate-choice relevant factors like physical attractiveness or financial resources and abstracts from complexities like the competition of rivals, and if one grants potential "perfect" mates the cognitive ability to recognize each other, the endeavor is likely to fail due to the lack of a suitable mate in the pool of available potential mates, because of the relatively small size of said pool, compared to the varied and idiosyncratic nature of personalities.

[0020] A natural solution to this vexing problem is to cast a wider net in order to vastly expand the pool of potential mates. Online dating services are a logical solution to this problem, as they allow one to cast the net arbitrarily wide, far beyond the reach of conventional or accidental dating opportunities. A problem for these services is their current lack of ability to match users with people that are truly compatible. In principle, the idea is simple: The huge number of potential mates yielded by online dating services has to be narrowed down to a small number of matches using criteria that are both rather strict, eliminating many unsuitable mates, and relevant. The failure of conventional online dating services can be attributed to a number of simple yet crippling reasons.

[0021] One problem is that they use categories that are too broad. For example, there are dating services catering to specific ethnic or religious groups, such as services aimed at Catholics, Jews or Indian Americans. Such a constraint is much too weak, leaving enough within-category variance to render it ineffective. Moreover, the validity of such criteria depends on personal preferences, in other words, on whether religion or race is a parameter in the mate choice search space of the user.

[0022] Another problem is that the questions used are not specific enough. For example, users are asked if they like movies or if they like to listen to music. Such questions are both insufficiently constraining and of rather dubious validity.

[0023] Still another problem lies in the fact that many of the questions employed cannot, in principle, be answered introspectively. For example, users are typically asked how outgoing they are. It is highly unlikely that the user can answer this accurately by mere introspection. The basis of his or her answer would necessarily be self-perception, while it is unclear how this relates to what the question really asks, namely how outgoing the individual is perceived to be by others. This problem is compounded by an enormous bias introduced by social desirability. Who would possibly admit that they are not outgoing when looking for a mate?

[0024] Typically, users are presented with a profile of potential mates. There is no compelling evidence that people are good at synthesizing the usually verbose information in a profile into the single relevant parameter: "How likely is it that this is a good match?" As a matter of fact, people are notoriously bad at predicting what, and by proxy, who will make them happy.

[0025] From the foregoing it will be apparent that there is still a need for a method and system of predicting the compatibility of at least one item of interest to a person, which bases that prediction on the individual tastes of that person.

SUMMARY OF THE INVENTION

[0026] In one aspect, provided is a method of predicting the compatibility of at least one item of interest to a user of a web-based system. The method includes the steps of providing a survey of items for rating by the system user, collecting a set of ratings for the survey of items from the system user, collecting a set of ratings for the survey of items from each of a plurality of raters, calculating a correlation coefficient between the system user and each of the plurality of raters to obtain a set of correlation coefficients for the survey of items, selecting a group of raters from the plurality of raters, the group of raters selected on the basis that each member of the group of raters has provided a rating of the at least one item of interest and predicting the compatibility of the at least one item of interest to the system user from the ratings provided by the group of raters and the correlation coefficients calculated between the system user and each of the group of raters.

[0027] In another aspect, provided is a method of predicting the compatibility of a first user of the system to at least one other user of the system. The method includes the steps of providing a survey of items for rating by the first system user, collecting a set of ratings for the survey of items from the first system user, collecting a set of ratings for the survey of items from each of a plurality of raters, calculating a correlation coefficient between the first system user and each of the plurality of raters to obtain a set of correlation coefficients for the survey of items, predicting the compatibility of the first system user to at least one of the plurality of raters from the correlation coefficients calculated between the system user and each of the group of raters and providing to the first system user at least one other user selected on the basis of correlation to the first system user from the plurality of raters.

[0028] In yet another aspect, provided is a multi-user web-based computer system for predicting the compatibility of at least one item of interest to a user of the system. The system includes a web server for communicating with users of the web-based computer system and components thereof, a user profile database for storing information on system users, a user rating module for providing a survey of items for rating by the system user and collecting a set of ratings for storage within the user profile database, a computational process module, the computational process module having a correlation module for calculating a correlation coefficient between the system user and each of a plurality of raters to obtain a set of correlation coefficients for the survey of items and a predictive module for predicting the compatibility of the at least one item of interest to the system user from the ratings provided by the plurality of raters and the correlation coefficients calculated between the system user and each of the plurality of raters, a compatibility-based matched items table for receiving an output regarding the compatibility of the at least one item of interest to the system user from the computational process module and a recommendation process module for receiving information from the compatibility-based matched items table and returning the information to the web server for transmitting to the system user.

[0029] In one form, the step of calculating a correlation coefficient between the system user and each of the plurality of raters to obtain a set of correlation coefficients for the survey of items includes the steps of obtaining a list of items from the survey of items that the system user and the plurality of raters have rated, storing the list of items in a temporary list in the form of rating pairs, calculating a mean user rating of survey items and a mean rater rating for each of the plurality of raters of survey items from the temporary list, calculating the difference between each user rating from the survey of items and the mean user rating, calculating the difference between each rater rating from the survey of items and the mean rater rating, multiplying the difference between each user rating and the mean user rating and the difference between each rater rating and the mean rater rating for each movie rated, summing the multiplied differences to determine a coefficient of variance, dividing the sum so obtained by the number of items on the temporary list to arrive at a mean coefficient of variance, calculating a first standard deviation for the system user ratings on the temporary list and a second standard deviation of the rater ratings on the temporary list, calculating the product of the first and second standard deviations and dividing the mean coefficient of variance by the product of the first and second standard deviations to determine a correlation coefficient between a system user and a rater.
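As a compact restatement of the steps recited in this paragraph, the calculation may be written as shown below. The notation is assumed here purely for illustration and does not appear in the original text: x_i and y_i denote the ratings given by the system user and by one rater to the i-th item on the temporary list, N is the number of items on that list, and sigma_x and sigma_y are the first and second standard deviations, taken here with a divisor of N.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad
r = \frac{\dfrac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \, \sigma_y}
```

The numerator corresponds to the "mean coefficient of variance" of the recited steps, and the denominator to the product of the two standard deviations.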

[0030] In another form, the step of predicting the compatibility of the at least one item of interest to the system user from the ratings provided by the group of raters and the correlation coefficients calculated between the system user and each of the group of raters includes the steps of selecting each system rater from the plurality of raters that has rated the item of interest, selecting and storing the correlation coefficients between the system user and the system raters selected, raising each correlation coefficient to the power of k to obtain a weight, calculating the sum of all weights, multiplying each weight by a corresponding rating and summing the product of each weight and corresponding rating to yield a raw score, dividing the raw score by the sum of all weights to obtain an estimate of prediction of compatibility of the at least one item of interest to the system user.

[0031] In yet another form, the step of predicting the compatibility of the at least one item of interest to the system user from the ratings provided by the group of raters and the correlation coefficients calculated between the system user and each of the group of raters further includes the steps of calculating an average item rating for the system user, calculating an average item rating for each system rater selected, calculating a correction factor by subtracting the second average item rating from the first average item rating determined and adding the correction factor to the estimate of rating prediction to obtain the prediction of compatibility of the at least one item of interest to the system user.
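The correction step recited in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions rather than the definitive routine: the function name `corrected_prediction` and its arguments are hypothetical, and the text does not say how the averages of several raters are combined into the "second average", so a plain mean of the per-rater averages is assumed.

```python
from statistics import mean


def corrected_prediction(estimate, user_ratings, ratings_by_rater):
    """Shift a weighted estimate by the gap between user and rater rating levels.

    estimate         -- the weighted estimate obtained before correction
    user_ratings     -- all ratings given by the system user
    ratings_by_rater -- one list of ratings per selected rater
    """
    user_avg = mean(user_ratings)
    # Assumption: per-rater averages are pooled with an unweighted mean.
    rater_avg = mean(mean(r) for r in ratings_by_rater)
    correction = user_avg - rater_avg  # first average minus second average
    return estimate + correction
```

A user who rates more generously than the selected raters thus receives a prediction shifted upward by the difference in average rating levels, and a stricter user receives one shifted downward.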

[0032] In still yet another form, a list of items of interest to the system user and their corresponding predictions of compatibility are provided over the mobile web to a web-enabled handheld device.

[0033] In further form, a list of items of interest to the system user and their corresponding predictions of compatibility are provided over the cellular telephone network to a cellular telephone via text message.

[0034] These and other features are described herein with specificity so as to make the present invention understandable to one of ordinary skill in the art.

BRIEF DESCRIPTION OF THE DRAWINGS

[0035] The invention is further explained in the description that follows with reference to the drawings illustrating, by way of non-limiting examples, various embodiments of the invention wherein:

[0036] FIG. 1 is a schematic presenting the fundamental item rating scenario;

[0037] FIG. 2 presents a graphical depiction of a solution to the problem of compatibility matching involving a user, expert raters and items to be rated;

[0038] FIG. 3 presents a flowchart of a routine for calculating correlation coefficients for use in a correlation module;

[0039] FIG. 4 presents a flowchart of a routine for predicting user compatibility for use in a predictive module;

[0040] FIG. 5 presents a graphical depiction of a solution to the problem of compatibility matching among users;

[0041] FIG. 6 presents a graphical depiction showing a pooled rating vector of all people in the population for a given movie and a "wizard" that matches the pooled vector;

[0042] FIG. 7 illustrates a web-based computer system for predicting the compatibility of at least one item of interest to a user;

[0043] FIG. 8 presents a flowchart of a routine for providing item ratings over the mobile web; and

[0044] FIG. 9 presents a flowchart of a routine for providing item ratings over a cellular network via text message.

DETAILED DESCRIPTION OF THE INVENTION

[0045] Disclosed herein is a system and method for using movie taste for compatibility matching, each now described in specific terms sufficient to teach one of skill in the practice thereof. In the description that follows, numerous specific details are set forth by way of example for the purposes of explanation and in furtherance of teaching one of skill in the art to practice the invention. It will, however, be understood that the invention is not limited to the specific embodiments disclosed and discussed herein and that the invention can be practiced without such specific details and/or substitutes therefor. The present invention is limited only by the appended claims and may include various other embodiments which are not particularly described herein but which remain within the scope and spirit of the present invention.

[0046] In exploring the fundamental nature of movie ratings ("star ratings"), most studies conducted to date have been concerned with the impact of movie reviews by professional critics on the financial success of a movie or the coherence of the ratings of professional movie critics. Surprisingly, four questions have been virtually ignored: Are non-experts able to rate the quality of a movie in a consistent way--in other words, is there an inherent movie quality? Do professional experts or critics have better access to this inherent movie quality? What is the relationship between the judgment of critics and non-experts? Is movie taste among lay-people homogenous? The lack of relevant knowledge is even more surprising, as it is known that people use recommendations by critics to choose which movie to see, while it remains unknown how accurate these recommendations are.

[0047] As disclosed herein, it has been discovered that movies have an inherent quality that randomly picked people can agree upon. However, this agreement is very limited, reflected by an average correlation of about 0.26 between rating vectors of randomly picked individuals. With agreement this low, vehement and decisive disagreement about the quality of a movie is to be expected frequently. Surprisingly, professional critics have, on average, no better access to this inherent quality than non-experts. This is true even for the most popular reviewers like Roger Ebert, whose correlation to the average non-expert is also about 0.26. Moreover, critics and non-experts seem to be out of phase. Non-experts are better at predicting non-experts; critics are better at predicting critics. On average, pooled non-expert judgments seem to give the best predictions to non-experts. The average correlation is on the order of 0.45 and close to the theoretically possible maximum, given the inherent variance of movie taste in the sample of non-experts. This theoretical maximum lies at around 0.49 and cannot be surpassed by unweighted, averaged raters. This implies an enormous variance in the movie taste of individuals. Moreover, these results are extremely robust. Additionally, the 6 month retest-reliability of the survey is about 0.85, which is extremely high for this kind of data. The predictive utility of movie star ratings derives from the functional structure of the movie rating situation; that is, critics rate movies and people from the general public rate the same movies.

[0048] In one form, provided is a method of predicting the compatibility of at least one item of interest to a user of a web-based system. The method includes the steps of providing a survey of items for rating by the system user, collecting a set of ratings for the survey of items from the system user, collecting a set of ratings for the survey of items from each of a plurality of raters, calculating a correlation coefficient between the system user and each of the plurality of raters to obtain a set of correlation coefficients for the survey of items, selecting a group of raters from the plurality of raters, the group of raters selected on the basis that each member of the group of raters has provided a rating of the at least one item of interest and predicting the compatibility of the at least one item of interest to the system user from the ratings provided by the group of raters and the correlation coefficients calculated between the system user and each of the group of raters.

[0049] In another form, provided is a method of predicting the compatibility of a first user of the system to at least one other user of the system. The method includes the steps of providing a survey of items for rating by the first system user, collecting a set of ratings for the survey of items from the first system user, collecting a set of ratings for the survey of items from each of a plurality of raters, calculating a correlation coefficient between the first system user and each of the plurality of raters to obtain a set of correlation coefficients for the survey of items, predicting the compatibility of the first system user to at least one of the plurality of raters from the correlation coefficients calculated between the system user and each of the group of raters and providing to the first system user at least one other user selected on the basis of correlation to the first system user from the plurality of raters.

[0050] In yet another form, provided is a multi-user web-based computer system for predicting the compatibility of at least one item of interest to a user of the system. The system includes a web server for communicating with users of the web-based computer system and components thereof, a user profile database for storing information on system users, a user rating module for providing a survey of items for rating by the system user and collecting a set of ratings for storage within the user profile database, a computational process module, the computational process module having a correlation module for calculating a correlation coefficient between the system user and each of a plurality of raters to obtain a set of correlation coefficients for the survey of items and a predictive module for predicting the compatibility of the at least one item of interest to the system user from the ratings provided by the plurality of raters and the correlation coefficients calculated between the system user and each of the plurality of raters, a compatibility-based matched items table for receiving an output regarding the compatibility of the at least one item of interest to the system user from the computational process module and a recommendation process module for receiving information from the compatibility-based matched items table and returning the information to the web server for transmitting to the system user.

[0051] Referring now to FIG. 1, there are three entities critical to the process: people 10, critics 20 and movies 30, which can be related through the rating information contained in the star ratings. Both people and critics rate movies as a measure of how much they appreciate seeing them, as indicated by arrows 12 and 22, respectively. It can be readily understood that FIG. 1 is somewhat simplified, in that these three categories consist of individual persons, movies and critics that collectively make up the population of people, movies and critics. Of course, people 10, critics 20 and movies 30 are related in more than these ways, since critics 20 also review movies 30 and people 10 buy movies 30, although the focus herein is on ratings and the way they connect these entities.

[0052] In one form, the present invention provides a method and system for matching critics to individual users. As may be appreciated, potential movie viewers face the problem of deciding which movie to select. This is by no means an easy problem, as the sheer number of movies prohibits a trivial solution, such as watching them all, and most publicly available information (e.g., marketing campaigns) is a highly unreliable predictor of movie enjoyment.

[0053] Professional movie critics are one potential remedy to this problem. In theory, they can advise the movie-going public about what movies to see. Unfortunately, it can be shown that they are essentially just voicing their opinion and it is unlikely that this opinion is in tune with the taste of any given person. Moreover, it can be shown that people are generally poor at determining which critic reflects their tastes accurately when relying on intuitive judgment alone.

[0054] Referring now to FIG. 2, a schematic representation of this situation from the perspective of a given person 100 is shown. Person 100 has rated the movies 120. Individual critics 130, 132 and 136 have also rated the same movies. Arrow 112 represents a vector of judgments, in this case, the star ratings for movies the person 100 has seen, and hence the "movie taste" of the individual, as captured by these ratings. The particular taste vector 112 of person 100 can then be matched with the closest taste, that is, the most similar vector of judgments, from the group of critics. As shown in FIG. 2, this is schematically illustrated through the use of the gray level of the vectors and, in this example, vector 124 of the second critic 132 matches the taste of person 100 best, although, perhaps not perfectly.

[0055] Matching can be implemented in many different fashions. The most straightforward one is correlation. Correlation summarizes the strength of relationship between two variables. Several different correlation coefficients can be calculated, but the two most commonly used in the art are Pearson's correlation coefficient and Spearman's Rank Correlation coefficient. Pearson's correlation coefficient requires both variables to be measured on an interval or ratio scale and the calculation is based on the actual values. The Spearman Rank Correlation is a nonparametric, distribution-free, rank statistic proposed by Spearman in 1904 as a measure of the strength of the associations between two variables. Spearman's Rank Correlation coefficient requires data that are at least ordinal and the calculation, which is the same as for Pearson correlation, is carried out on the ranks of the data. Each variable is ranked separately by putting the values of the variable in order and numbering them: the lowest value is given rank 1, the next lowest is given rank 2 and so on. If two data values for the variable are the same they are given averaged ranks, so if they would have been ranked 14 and 15 then they both receive rank 14.5.
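The ranking rule just described, including the averaging of tied ranks, can be sketched as follows. This is an illustrative helper only; the name `average_ranks` is hypothetical and is not taken from the text.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of the 1-based positions in the run
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


print(average_ranks([3.0, 1.0, 3.0, 2.0]))  # [3.5, 1.0, 3.5, 2.0]
```

In this example the two ratings of 3.0 would have occupied ranks 3 and 4, so both receive rank 3.5, in the same way that the values ranked 14 and 15 in the text both receive rank 14.5.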

[0056] Spearman's Rank Correlation coefficient is used as a measure of linear relationship between two sets of ranked data, that is, it measures how tightly the ranked data cluster around a straight line. Spearman's Rank Correlation coefficient, like all other correlation coefficients, will take a value between -1 and +1. A positive correlation is one in which the ranks of both variables increase together. A negative correlation is one in which the ranks of one variable increase as the ranks of the other variable decrease. A correlation of +1 or -1 will arise if the relationship between the two variables is exactly linear. A correlation close to zero means there is no linear relationship between the ranks.

[0057] To use Pearson's correlation coefficient, it is necessary to assume that both variables have a normal distribution. No such assumption is necessary for tests using Spearman's rank correlation. Thus, Spearman's coefficient is preferred over Pearson's coefficient if either the data are ordinal or ranked or if it is unreasonable to assume that the variables are normally distributed.

[0058] In the practice of the method disclosed herein, the Spearman correlation is preferred, due to the fact that the data are on an ordinal scale and that it has been shown to yield sufficiently close matches. The Spearman rank correlation coefficient can be used to give an R-estimate, and is a measure of monotone association that is used when the distribution of the data makes Pearson's correlation coefficient undesirable or misleading.

[0059] The Spearman rank correlation coefficient is defined by:

$$r' \equiv 1 - \frac{6 \sum d^2}{N(N^2 - 1)},$$

where d is the difference in statistical rank of corresponding variables, and is an approximation to the exact correlation coefficient

$$r \equiv \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$

computed from the original data. Because it uses ranks, the Spearman rank correlation coefficient is much easier to compute.

[0060] As appreciated by those skilled in the art, the procedure for using Spearman's Rank Correlation may be given as follows:

[0061] 1. State the null hypothesis, i.e., "There is no relationship between the two sets of data;"

[0062] 2. Rank both sets of data from the highest to the lowest. Make sure to check for tied ranks;

[0063] 3. Subtract the two sets of ranks to get the difference d;

[0064] 4. Square the values of d;

[0065] 5. Add the squared values of d to get $\sum d^2$;

[0066] 6. Use the formula:

$$r' \equiv 1 - \frac{6 \sum d^2}{N(N^2 - 1)},$$

[0067] where N is the number of ranks;

[0068] 7a. If the value is -1, there is a perfect negative correlation;

[0069] 7b. If the value falls between -1 and -0.5, there is a strong negative correlation;

[0070] 7c. If the value falls between -0.5 and 0, there is a weak negative correlation;

[0071] 7d. If the value is 0, there is no correlation;

[0072] 7e. If the value falls between 0 and 0.5, there is a weak positive correlation;

[0073] 7f. If the value falls between 0.5 and 1, there is a strong positive correlation;

[0074] 7g. If the value is 1, there is a perfect positive correlation between the 2 sets of data; and

[0075] 8. If the value is 0, state that null hypothesis is accepted. Otherwise, state that it is rejected.
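The procedure above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, the data are assumed to contain no tied values (the d-squared formula is exact only in that case), and values falling exactly on -0.5 or 0.5, which the listed categories leave open, are assigned to the "strong" category.

```python
def spearman_from_ranks(x, y):
    """Steps 2-6: rank both series, then apply r' = 1 - 6*sum(d^2) / (N*(N^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least two values")

    def ranks(values):
        # Highest value gets rank 1, as in step 2; ties are assumed absent.
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        out = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            out[idx] = rank
        return out

    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def describe(r):
    """Step 7: map the coefficient onto the verbal categories of the procedure."""
    if r == 0:
        return "no correlation"
    strength = "perfect" if abs(r) == 1 else "strong" if abs(r) >= 0.5 else "weak"
    sign = "positive" if r > 0 else "negative"
    return f"{strength} {sign} correlation"


r = spearman_from_ranks([4, 3, 2, 1], [3.5, 4, 1, 2])
print(r, describe(r))
```

For these four rating pairs the rank differences are -1, 1, -1 and 1, so the sum of squared differences is 4 and the coefficient is 1 - 24/60, or about 0.6, which falls in the strong positive category.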

[0076] Of course, as those skilled in the art will plainly recognize, it is possible to implement this matching with any number of other similarity measures, most of which should give results that are roughly equivalent. The closeness of the match and the amount of remaining variation to be expected can also be quantified. This is particularly important for individuals with rather idiosyncratic tastes.

[0077] In essence, the method disclosed herein provides the user with information about which movie critic judges movies in a fashion that reflects his or her own movie taste most closely. A sufficiently large database of movie critics will almost certainly yield a close match. This is important since once the user has found a reviewer that objectively matches his own movie taste, he or she can also access a large reservoir as well as a current stream of reviews, since many professional critics pride themselves in their timely reviews as well as their huge archives of existing movie reviews. This service might be interesting to most critics as well, since there is evidence that a few popular critics are not necessarily very accurate for the majority of people and they can monopolize the market for movie reviews.

[0078] In another form, the present invention provides a method and system for matching movies to individual users. As may be appreciated, there is a large choice of potential movies to view, yet only limited money and time in which to see them all. Hence, the user faces a serious choice problem, as he or she has to optimize his or her enjoyment under uncertainty and a large choice set. Moreover, time is also a constraint, in that it is also required for reading movie reviews. Even if one finds a matching critic, as described hereinabove, reading and sifting through movie reviews can take a significant amount of time. More importantly, the best match to a critic might still not be as good as the limit given by the retest-reliability (above 0.85). Hence, the suggestions by the critic might not be the best possible. As such, the algorithm for movie recommendations disclosed herein is superior.

[0079] Referring again to FIG. 2, a user 100 rates movies 120, as do critics 130, 132, 136, etc. As may be appreciated, one could take all of the ratings of the critics, average them for each movie and arrive at a predictor vector of all of the combined critics. This methodology is available and provided at websites such as rottentomatoes.com and metacritic.com. One can imagine that the grey-scale color of the combined rating vector would be somewhere in the middle of the spectrum, as one takes all the different flavors of the critics' tastes, equally, into account. Unfortunately, this vector can be shown to empirically match the vector of the individual user 100 with a correlation of only 0.42. This is due to the fact that critics are systematically biased and individual movie taste is very different. Hence, the equal weights solution is far from optimal. On the other hand, a pooled vector is theoretically optimal, given that it is weighted by the preferences of the user 100. In other words, if the vector 122 of an individual critic 130 closely matches the vector 112 of the user 100, it will be assigned a greater weight in determining the luminance, or grey level, of the final vector than the vector of a critic that does not match the vector 112 of the user 100. This two-step process of weighted pooling can then be extended to predict star ratings of movies that the user 100 has not already seen. Predicting the star rating of movies 120 that the user has already seen gives a sense of how accurate the predictions are, on average, as one can compare the predicted pooled, weighted vector from the critics with the actual vector 112 of the user 100.

[0080] There are several ways to implement this. One formula is:

$$s^* = \frac{\sum \left( r_i^k \, s_j \right)}{\sum r_i^k}$$

where $s^*$ is the predicted movie rating for a given movie, $r_i$ is the correlation between the star rating of a given user and a particular critic (for all movies except the one under prediction), $s_j$ is the star rating from a particular critic for the given movie under prediction and k is a scaling factor that is to be optimized empirically.
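The weighted pooling formula can be illustrated with a short sketch. The function name `predict_rating` is hypothetical, the default of k = 3 is only an example value, and k is assumed to be an integer so that negative correlations can be raised to the power without leaving the real numbers.

```python
def predict_rating(correlations, ratings, k=3):
    """Weighted pool: sum(r**k * s) / sum(r**k), one (r, s) pair per critic.

    correlations -- correlation of each critic with the user
    ratings      -- each critic's star rating for the movie under prediction
    k            -- integer scaling exponent (assumed; to be tuned empirically)
    """
    if len(correlations) != len(ratings) or not ratings:
        raise ValueError("need one correlation per rating and at least one rating")
    weights = [r ** k for r in correlations]
    total = sum(weights)
    if total == 0:
        raise ZeroDivisionError("weights sum to zero; the prediction is undefined")
    return sum(w * s for w, s in zip(weights, ratings)) / total


# A closely matched critic (r = 0.8) dominates a weakly matched one (r = 0.2).
print(predict_rating([0.8, 0.2], [4, 2], k=3))
```

With these inputs the weights are 0.512 and 0.008, so the prediction is 2.064 / 0.52, roughly 3.97 stars: far closer to the well-matched critic's rating of 4 than the unweighted average of 3 would be.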

[0081] Alternatively, the weighted pooled vector from all non-critic persons can be used, but this might be sub-optimal as the number of movies seen is typically larger for critics, hence r can be expected to be more robust. However, as the optimum is arrived at quickly, as i becomes relatively large, this factor is minimized. Also, the introduction of bounding factors to bound the result to values between 0 and 4 or 0 and 100 or -10 and 10 may be desirable.

[0082] Referring now to FIG. 3, a correlation module 500 for use in a system for correlating a given user with a given rater or reviewer (critic) is depicted. As shown, correlation module 500 executes the following steps:

[0083] In step 510, all movies that both the reviewer and the user have rated are found and the ratings of these movies are stored in a temporary list of rating pairs. At step 520, a check is made to determine if the list has more than 10 pairs; if so, the process continues to step 530; if not, the process continues to step 540 to determine if there is another critic. As shown, the information needed for this determination is supplied from step 505, which provides a list of all critics from which movie ratings have been obtained. If there is another critic, that critic is entered into the system at step 550; if not, the process must terminate, since the correlation will have little value for a small number of pairs.
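Steps 510 through 550 can be sketched as follows. The data layout is an assumption made for illustration: each rater's ratings are held in a dictionary keyed by movie title, and the names `MIN_PAIRS`, `rating_pairs` and `usable_critics` are hypothetical.

```python
MIN_PAIRS = 10  # step 520 requires more than 10 shared ratings


def rating_pairs(user_ratings, critic_ratings):
    """Step 510: collect (user, critic) rating pairs for movies both have rated."""
    shared = sorted(user_ratings.keys() & critic_ratings.keys())
    return [(user_ratings[m], critic_ratings[m]) for m in shared]


def usable_critics(user_ratings, ratings_by_critic):
    """Steps 520-550: walk the critic list and keep those with enough overlap."""
    result = {}
    for critic, critic_ratings in ratings_by_critic.items():
        pairs = rating_pairs(user_ratings, critic_ratings)
        if len(pairs) > MIN_PAIRS:
            result[critic] = pairs  # ready for the correlation steps
    return result
```

Critics who share ten or fewer rated movies with the user are simply skipped, which mirrors the move from step 520 to the next critic at steps 540 and 550.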

[0084] At step 530, a mean user rating of movies for the temporary list (xmean) and a mean critic rating of movies for the temporary list (ymean) are calculated. At step 560, for each movie rating, the differences between the rating and the respective means are calculated and stored as xdiff and ydiff. As may be appreciated, there will be as many values for xdiff and ydiff as there are movies on the temporary list. At step 570, the differences for each movie are multiplied (xdiff1 × ydiff1, xdiff2 × ydiff2, etc.) and stored as mdiff1, mdiff2, etc.

[0085] At step 580, the multiplied differences are summed to arrive at the coefficient of variance (cov = mdiff1 + mdiff2 + etc.) and the sum (cov) is divided by the number of movies on the temporary list to arrive at mcov. At step 590, the standard deviation of the user ratings on the list (stdevx) and the standard deviation of the critic ratings on the list (stdevy) are calculated.

[0086] At step 600, a check is made to determine if either stdevx or stdevy is equal to 0. If yes, the process is aborted and the correlation coefficient is assumed to be 0 at step 610. Of course, to expedite calculations, this check could be executed at step 530, but it is presented as step 600 due to the logic of correlation. If no, the product of the two standard deviations is calculated and stored as pdev (pdev = stdevx × stdevy) and mcov is divided by pdev to arrive at the correlation. As may be appreciated, this yields the Pearson Product-Moment Correlation coefficient (corr) between a given user and a given rater or reviewer (critic), for a given list of commonly seen movies.
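Steps 530 through 610 can be sketched as a single function that reuses the variable names of the description. Two points are assumptions rather than statements of the text: the standard deviations are taken with a divisor equal to the number of pairs, so that they are consistent with mcov, and an empty pair list returns 0 instead of raising an error.

```python
from math import sqrt


def correlation(pairs):
    """Correlation for one list of (user rating, critic rating) pairs."""
    n = len(pairs)
    if n == 0:
        return 0.0  # assumption: nothing to correlate
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    xmean = sum(xs) / n                                  # step 530
    ymean = sum(ys) / n
    xdiff = [x - xmean for x in xs]                      # step 560
    ydiff = [y - ymean for y in ys]
    mdiff = [dx * dy for dx, dy in zip(xdiff, ydiff)]    # step 570
    mcov = sum(mdiff) / n                                # step 580
    # Assumption: population standard deviations (divisor n), matching mcov.
    stdevx = sqrt(sum(dx * dx for dx in xdiff) / n)      # step 590
    stdevy = sqrt(sum(dy * dy for dy in ydiff) / n)
    if stdevx == 0 or stdevy == 0:                       # steps 600 and 610
        return 0.0
    pdev = stdevx * stdevy
    return mcov / pdev
```

A rater who gives every movie the same rating has a standard deviation of 0, so the function returns 0 for that rater instead of dividing by zero.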

[0087] As may be appreciated, the estimate may need to be bounded. For example, as mentioned above, if the sum of the weights is close to 0, the final prediction would be close to the minimal or maximal value assigned by predictive module 700 (0 to 4 or 0 to 10), depending on the signs of the denominator and the numerator. Similar problems may arise if the summed numerator is less than 1. In this case, all weights may be assigned the value of 1 to determine the prediction, although this case will be rare.

[0088] This strategy exploits the correlational structure inherent in the movie taste of critics and non-critic persons and can be shown to be superior to any individual recommendation source (be it critic or other individual), for any given user, since it takes large amounts of information into account in an optimal fashion. This method yields extremely accurate results and is believed to be very similar to the algorithm the brain of rhesus monkeys employs to arrive at perceptual decisions about moving objects. In effect, this presents the user with the best possible speed/accuracy tradeoff. He or she has to spend no time reading reviews, yet the accuracy of the recommendations increases up to the theoretically possible maximum.

[0089] Referring now to FIG. 4, in order to predict the compatibility of at least one item of interest, for example, but not by way of limitation, a movie or book or the like, to a user, a predictive module 700 executes the following steps.

[0090] At step 710, select all system raters, which may be experts, movie critics, other system users, or the like, who have given this item a rating (rating1, rating2, etc.). At step 720, select and store the correlation coefficients between the user and each of the system raters who have given this item a rating (corr1, corr2, etc.). At step 730, each correlation coefficient is raised to the power of k to obtain a weight (weight1, weight2, etc.), where k is an empirically derived number that minimizes the mean squared error; for example, but not by way of limitation, k may be 3.

[0091] At step 740, the sum of all weights is taken. At step 750, a check is made to determine if the sum of all weights is close to 0; if so, then all weights are set to 1 at step 770. If the sum of all weights is not close to 0, then, each weight is multiplied by its corresponding rating at step 760, and then those products are summed. This yields a raw score.

[0092] At step 780, the raw score of step 760 is divided by the sum of the weights, derived above. The result of this computation is the estimate of the rating prediction x. At step 790, the average item rating is calculated for the system user (a) in order to account for different scale use by different raters and to assure that the absolute ratings are meaningful. At step 800, the average item rating is calculated for the system raters (b). At step 810, a correction factor (y) is calculated to account for the difference in scale use between the system user and the system raters. The correction factor y is determined by subtracting b from a. At step 820, the correction factor y is applied to the initial prediction x, determined in step 780, to obtain a final rating prediction. The final rating prediction is obtained by adding y to the initial prediction x.
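
By way of illustration only, steps 730 through 820 may be sketched in Python as follows. The function name, the argument names and the tolerance eps used to decide that the sum of the weights is "close to 0" are assumptions made for this sketch; the disclosure does not specify a tolerance. The sketch further assumes that, after the weights are reset to 1 at step 770, processing continues with step 760, and that k is an integer so that negative correlations can be raised to the power of k. The average ratings a and b of steps 790 and 800 are taken as inputs rather than computed here.

```python
def predict_rating(ratings, corrs, user_mean, rater_mean, k=3, eps=1e-6):
    """Return the final rating prediction for one item and one system user.

    ratings    -- rating1, rating2, ... given to the item by the system raters
    corrs      -- corr1, corr2, ... between the user and those same raters
    user_mean  -- average item rating of the system user (a)
    rater_mean -- average item rating of the system raters (b)
    eps        -- hypothetical tolerance for "close to 0"
    """
    if not ratings or len(ratings) != len(corrs):
        raise ValueError("need one correlation per rating, and at least one rating")

    # Step 730: raise each correlation coefficient to the power of k.
    weights = [c ** k for c in corrs]
    # Step 740: sum of all weights.
    weight_sum = sum(weights)
    # Steps 750 and 770: if the sum is close to 0, set all weights to 1.
    if abs(weight_sum) < eps:
        weights = [1.0] * len(weights)
        weight_sum = float(len(weights))
    # Step 760: multiply each weight by its rating and sum the products.
    raw_score = sum(w * r for w, r in zip(weights, ratings))
    # Step 780: initial prediction x.
    x = raw_score / weight_sum
    # Step 810: correction factor y = a - b.
    y = user_mean - rater_mean
    # Step 820: final prediction.
    return x + y
```

With an odd k such as 3, a negatively correlated rater keeps a negative weight, so that rater's rating pulls the prediction in the opposite direction rather than being ignored.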

[0093] As movie distribution moves online, it introduces a very long "tail". The "long tail" is a phenomenon that has been, so far, chiefly observed in the online sale of books and music. For example, Amazon has an inventory of several million titles, most of them rather obscure, generally not available in offline book stores and with rather low individual sales volume. However, it has been shown that Amazon generates the majority of its sales revenue outside of the top 130,000 sellers. As may be appreciated, movie taste is rather idiosyncratic. A "good" movie is simply a movie that many people enjoy watching, not necessarily something that is inherent to the movie or relative to other movies. Shifting the focus to the enjoyment of the individual makes it irrelevant, from the perspective of the individual, how many other people enjoy watching a given movie. Hence, it will be of chief importance to match the right flavor of idiosyncratic taste with the right flavor of idiosyncratic movie. There is a total of about 850,000 movies in existence, although even the better offline rental stores typically carry only several thousand titles. As such, the "long tail" problem is likely to become an increasingly important issue in movie selection as online distribution on demand becomes more prevalent.

[0094] In another form, the present invention provides a method and system for matching people to one another. The present invention addresses the issue that current online dating services, while providing a much larger sample of potential mates, are plagued by the use of constraints that are both insufficiently restrictive and often invalid, leading to a failure to actually elicit a good match.

[0095] In many ways, the current dating situation mirrors the search engine market before the advent of Google. In 1997, Altavista had cataloged the entire internet in a huge index and was able to provide a response to any query within fractions of a second. Yet, users were dissatisfied since the results of their search queries rarely matched what they were looking for. The success of Google is largely based on the fact that Google provides a way to rank-order pages in terms of relevance. The present invention achieves similar benefits by automatically rank-ordering potential matches in a way that is meaningful to the user.

[0096] Data obtained on couples, failed couples and people that have not been in a relationship can be correlated with movie ratings. As pointed out above, the correlation between two randomly picked participants in the study is 0.26, on average. The correlation between couples is significantly higher, about twice that. Conversely, the correlation between failed couples is marginally, though not significantly, lower than the correlation between randomly picked people.

[0097] There are several factors that determine the rating that a given individual assigns to a given movie. The first is the objective movie quality. The second comprises emotional, social and environmental factors present both during encoding (watching the movie) and retrieval (assigning the rating) of the movie information. The third is random noise and uncertainty in assigning a number to a movie. These factors, taken together, are rather insignificant as the re-test reliability is above 0.85. Finally, if these factors are constant or insignificant, personality necessarily has to be the key factor accounting for the tremendous variance observed for every single movie in the study. Ultimately, it needs to be explained why the correlation between two randomly picked individuals is as low as it is and why the spread of ratings for any given movie is so large.

[0098] This large variance is a nuisance when trying to make accurate movie recommendations. On the other hand, turning the problem on its head, this variance becomes an immediate treasure trove when trying to estimate personality based on movie ratings. Due to the inherent variance in these ratings, the inverse problem seems to be much easier and rich in information. Of course, this strongly depends on whether this variance in ratings for a given movie is systematic or not. The observed pattern of correlations between couples, failed couples and randomly picked people that do not date suggests exactly that: the pattern is highly systematic.

[0099] Movie ratings may be employed to rank potential mates, for example, in online dating. It has been shown that high similarity in personality parameters is the basis for a good relationship. Hence, a higher correlation implies a higher chance of a good match. This approach seems feasible for a number of reasons. First, users seem to be perfectly able to effortlessly make these movie ratings and there is evidence that a good number of them enjoy doing so. Second, the "questions" (individual movies) are very specific and to the point. Third, social desirability will be unclear in most cases. Based on the premise that a higher correlation is better and that the movie taste space is highly dimensional in itself, users who want to find an actual match would be well advised to be as honest as possible, since there is no single "solution" to each movie question. Finally and most importantly, movie taste seems to tap personality in a unique way. Hence, the same survey that was previously used to predict best individual critics and movies can be successfully employed for the online dating situation.

[0100] What ultimately determines which movies someone likes or hates? It may come down to someone's outlook on life, their philosophy, their humor, likes and dislikes, political positions, intelligence, all of which are prone to produce emotional reactions when confronted with and triggered by movies, creating the stable and distinctive ratings observed. In other words, movie ratings implicitly contain information about the so-called "inner values" that are so notoriously hard to probe, yet so important for a happy relationship.

[0101] Referring now to FIG. 3, the implementation of this procedure is depicted. Arrows 212, 214, 216 and 218 represent the rating vectors and the different shades of grey represent particular movie tastes. As shown, person 200 and person 204 have a closely matched movie taste. Hence, we would recommend them as a "match", out of the many possible persons 200, 202, 204, 206 through person n. As described hereinabove, this can be implemented in many ways, with the Spearman Rank Correlation between the rating vectors being the most straightforward. As such, movie ratings provide the opportunity to rank potential mates based on their suitability as a mate, solving the problems of conventional dating sites due to the fact that they provide a relevant dimension of potentially arbitrary restrictiveness.
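
By way of illustration only, ranking potential matches by the Spearman Rank Correlation between rating vectors may be sketched in Python as follows. All function names and the dictionary-based input format are assumptions made for this sketch. The sketch compares two persons only on the movies both have rated, assigns tied ratings the mean of their ranks, and returns 0 when fewer than two movies are shared or when one person's ratings are all identical; none of these choices is specified in the disclosure.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def spearman(a, b):
    """Spearman rank correlation over the movies rated in both dicts a and b."""
    shared = [m for m in a if m in b]
    if len(shared) < 2:
        return 0.0
    ranks_a = average_ranks([a[m] for m in shared])
    ranks_b = average_ranks([b[m] for m in shared])
    return pearson(ranks_a, ranks_b)


def rank_matches(person, candidates):
    """Return (candidate_id, correlation) pairs, best match first."""
    scored = [(cid, spearman(person, vec)) for cid, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The first entry of the list returned by rank_matches corresponds to the recommended "match"; truncating the list at any position gives the arbitrary restrictiveness referred to above.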

[0102] The approach disclosed herein also helps with social networking and movie recommendations, not necessarily just dating. As pointed out hereinabove, the average retest reliability of movie rating is about 0.85. It is relatively easy to find "statistical twins" in a large enough sample of persons via their movie rating vectors and match them up with each other. They can then keep each other informed about the quality of recent releases. Theoretically, the judgment of a person's "statistical twin" should be as good as if the person saw the movie him or herself.

[0103] In another form, the present invention provides a method and system for person to pooled person matching that can be tailored to assist in producing better movies. The present invention can be employed to ensure that the industry delivers a product that people will want to see. Current practice dictates that after the initial script-selection and green lighting process performed by producers and others, the movie is handed over to professional artists that make the movie. Then, standard industry practice calls for test screenings after a movie is shot and edited to make it more palatable to the target audience. These test screenings are typically attended by a large and diverse crowd of whatever demographic the studio aims to reach with the movie. The problem with this approach is that, while it is trivial to match the test audience in terms of the desired demographic parameters, it remains essentially unclear how well the test audience represents the population at large in terms of movie taste.

[0104] Referring now to FIG. 4, in accordance herewith, the screening of large numbers of people for their movie taste should allow the industry to identify individuals ("wizards") whose ratings correlate essentially perfectly (>0.9) with pooled population measures, such as the averaged ratings of all other participants combined. As shown, on the left is the pooled rating vector 370 of all people 350 in the population for a given movie 320. On the right is the rating vector 360 of a "wizard" person 340 that matches the pooled vector 370, compared to the vector 312 of a randomly picked non-wizard person 312.
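
By way of illustration only, the identification of "wizards" may be sketched in Python as follows. The function names and input format are assumptions made for this sketch. For simplicity, the sketch assumes that every person has rated every movie, so that each person's ratings form a list in the same movie order; the pooled measure is taken to be the average rating of all other participants, and the Pearson correlation is used as the measure of agreement.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def find_wizards(ratings, threshold=0.9):
    """Return (person, correlation) pairs for persons exceeding the threshold.

    ratings -- dict mapping a person to a list of ratings, one per movie,
               with all lists in the same movie order (assumption).
    """
    people = list(ratings)
    if len(people) < 2:
        raise ValueError("need at least two people to form a pooled vector")
    n_movies = len(ratings[people[0]])
    # Per-movie rating totals over the whole population.
    totals = [sum(ratings[p][m] for p in people) for m in range(n_movies)]

    wizards = []
    for p in people:
        # Pooled vector: average rating of all OTHER participants per movie.
        pooled = [(totals[m] - ratings[p][m]) / (len(people) - 1)
                  for m in range(n_movies)]
        r = pearson(ratings[p], pooled)
        if r > threshold:
            wizards.append((p, r))
    return wizards
```

Excluding each person's own ratings from the pooled vector avoids inflating that person's correlation, which matters most when the population is small.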

[0105] These taste experts ("wizards") 340 could consult movie-makers at every stage of the production process, ultimately delivering a better and more enjoyable product. This could cut down on expenses for advertising, since a strong product can rely more on word of mouth, and on expenses for test screenings, since a few "expert" consultants would do. It could also increase the chances of delivering a movie that audiences actually like to see, want to see and will go to see, at which point the interests of movie goers, who want to see an enjoyable movie, and studios, who want to make a bankable movie, converge.

[0106] FIG. 5 illustrates the basic components of a compatibility matching web-based computer system 430, including the components used to implement the compatibility matching or recommendation service. The arrows in FIG. 5 show the general flow of information that is used by the recommendation service. As illustrated by FIG. 5, the web-based computer system 430 includes a web server application 432 ("web server") which processes HTTP (Hypertext Transfer Protocol) requests received over the Internet from user computers 434. The web server 432 accesses a database 436 of HTML (Hypertext Markup Language) content which includes movie or other item information pages and other browsable information about the various items. The "items" that are the subject of the recommendation service are the titles of movies or other items of content employed that are found within this database 436.

[0107] The web-based computer system 430 also includes a "user profiles" database 438 which stores user-specific information about users of the web site. As illustrated by FIG. 5, the data stored for each user may include one or more of the following types of information, among other things, that can be used to generate matches or recommendations in accordance with the invention: (a) the user's past movie viewing history, (b) the user's item ratings profile, and (c) other user-specific information.

[0108] As depicted by FIG. 5, the web server 432 communicates with various external components 440 of the web-based computer system 430. These external components 440 include, for example, a search engine and associated database (not shown) for enabling users to interactively search for information on particular items. Other external components 440 may include various order processing modules (not shown) for accepting and processing orders, and for updating the purchase histories of the users.

[0109] The external components 440 may also include an optional shopping cart process (not shown) which adds and removes items from the users' personal shopping carts based on the actions of the respective users. As used herein, the term "process" is used to refer generally to one or more code modules that are executed by a computer system to perform a particular task or set of related tasks. The shopping cart process may also generate and maintain the user-specific listings of recent shopping cart contents.

[0110] The external components 440 also include compatibility matching recommendation service components 444 that are used to implement the web-based computer system's various recommendation services. Recommendations generated by the compatibility matching recommendation services are returned to the web server 432, which incorporates the recommendations into personalized web pages transmitted to users containing the matched items.

[0111] The recommendation service components 444 include a user rating application process 450 which implements a user rating service for a plurality of items. Users of the user rating service are provided the opportunity to rate individual movies or other items from a pre-selected list. The movie titles or other items are rated according to a four-star scale, in half-star increments, wherein zero is bad and four is excellent.

[0112] As depicted in FIG. 5, the user rating application 450 records the ratings within the user's item ratings profile. For example, if a user of the user rating service gives the movie Gone with the Wind a score of "4 stars," the user rating application 450 would record the item, by title (or identifier), and the score within the user's item ratings profile. The user rating application 450 uses the users' item ratings to generate taste vectors, as described hereinabove, for use in ultimately generating personal recommendations, which can be requested by the user by selecting an appropriate hyperlink.

[0113] The compatibility matching recommendation services components 444 also include a recommendation process 452, a compatibility-based matched items table 460, and a computational process 466, which collectively implement the compatibility matching recommendation service. The computational process 466 includes correlation module 500 (see FIG. 3) and predictive module 700 (see FIG. 4), each of which is described in detail hereinabove. As depicted by the arrows in FIG. 5, the recommendation process 452 generates personal recommendations based on information stored within the compatibility-based matched items table 460, and based on the items that are known to be of interest. The items of known interest are identified based on information stored in the user's profile.

[0114] A webcrawler expert rating collection process 470 may be provided, which searches the web for movie or other item ratings information for use in generating expert taste vectors, as described in detail hereinabove. The output of the webcrawler expert rating collection process 470 is fed to computational process 466 for use in the compatibility matching recommendation process 452.

[0115] The various processes 450, 452, 466 and 470 of the recommendation services may run, for example, on one or more Unix or Windows-based workstations or physical servers (not shown) of the web-based computer system 430. The compatibility-based matched items table 460 may be stored in a data structure that permits efficient look-up, and may be replicated across multiple machines, together with the associated code of the recommendation process 452, to accommodate heavy loads.

[0116] The general form and content of the matched items table 460 will now be described with reference to FIG. 5. As this table can take on many alternative forms, the details of the table are intended to illustrate, and not limit, the scope of the invention.

[0117] As indicated above, the compatibility-based matched items table 460 maps items to lists of similar items based at least upon the taste vector of another user or expert, or the weighted vectors of several users or experts, as described hereinabove, that has been matched to a particular user selected from the community of users. The compatibility-based matched items table 460 is preferably generated periodically (e.g., once per day) by the computational process 466. In the form described herein, the matched items table 460 is, therefore, generated exclusively from the user ratings of the community of users and/or experts. In other embodiments, the compatibility-based table 460 may additionally be generated from other indicia of user-item interests, including indicia based on shopping cart activities, and rating profiles for other item categories (books, for example).

[0118] In other forms involving sales of products, the compatibility-based table 460 may include entries for compatibility-matched, recommended products of the online merchant. In this form, several different types of items (movies, books, CDs, etc.) may be included within the same compatibility-based table 460, although separate tables could alternatively be generated for each type of item. Each matched items table 464 consists of at least one list containing N (e.g., 5) items which are predicted to be the most compatible with the user's taste.
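
By way of illustration only, a list of the N items predicted to be most compatible with a user's taste may be built in Python as follows. The function names, and the choice to key the table by user identifier, are assumptions made for this sketch; as noted above, the table can take on many alternative forms. The predicted ratings are assumed to have been computed beforehand, for example by a predictive module of the kind described hereinabove.

```python
import heapq


def top_n_items(predicted, n=5):
    """Return the identifiers of the n items with the highest predicted rating.

    predicted -- dict mapping an item identifier to its predicted rating
    """
    return heapq.nlargest(n, predicted, key=predicted.get)


def build_matched_items_table(predictions_by_user, n=5):
    """Map each user identifier to that user's list of N best-matched items.

    predictions_by_user -- dict mapping a user identifier to a dict of
                           predicted ratings (hypothetical input format)
    """
    return {user: top_n_items(predicted, n)
            for user, predicted in predictions_by_user.items()}
```

Because the table holds only short lists of identifiers, it can be regenerated periodically and replicated across machines without copying the underlying ratings.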

[0119] The items are represented within the matched items table 460 using movie titles or relevant product IDs, or other identifiers. Although the recommendable items in the described system are in the form of movie titles, book titles or music titles, it will be appreciated that the underlying methods and data structures can be used to recommend a wide range of other types of items, including compatible persons for dating, as has been described hereinabove.

[0120] It is also contemplated that information from system 400 can be made available over the mobile web to a plurality of handheld devices 472 or over the cellular telephone network to a plurality of cellular telephones 474 via text message.

[0121] Referring now to FIG. 8, a process for providing movie ratings or other items of interest 800 to a system user 810 over the mobile web is depicted. As shown, at step 820, user 810 makes a request via a web-enabled mobile device. System 400 (see FIG. 5) responds with a Wireless Markup Language (WML) interface, prompting the user to log into the system 400. At step 840, a check is made to determine if user 810 logged in successfully. If so, system 400 responds by sending to device 472 a list, for example, but not by way of limitation, of the most recently released movies and their predicted ratings, the predictions determined as described hereinabove. If the user 810 has not logged in successfully, an error page is transmitted at step 860 to device 472.

[0122] Referring now to FIG. 9, a process for providing movie ratings or other items of interest 900 to a system user 910 over a cellular telephone network, via text message, is depicted. As shown, at step 920, user 910 registers for the service via a cellular telephone device 474 and sends a text message at step 930 to system 400 (see FIG. 5). System 400 responds, for example, but not by way of limitation, with a list of most recently released movies and their predicted ratings, the predictions determined as described hereinabove.

[0123] As has been shown, adequately gathered movie ratings provide a rich source of information for a diverse range of potential uses.

Example

[0124] A survey was designed consisting of 210 movies, picked largely at random, while ensuring movie popularity. Data were collected from about 2000 subjects. The data were generated by having subjects rate how much they enjoyed a given movie from the list on a 9-point scale, that is, from 0 to 4 "stars," in increments of half-stars.

[0125] These data revealed that movies have an inherent quality that randomly picked people can agree upon. However, this agreement has been found to be very limited, as reflected by an average correlation of about 0.26.

[0126] Professional critics were found to have, on average, no better access to this inherent quality than non-experts. This is true even for the most popular reviewers like Roger Ebert, whose correlation to the average non-expert was found to be 0.26.

[0127] On average, pooled non-expert judgments seem to give the best predictions to non-experts. The average correlation was found to be on the order of 0.45 and close to the theoretically possible maximum, given the inherent variance of movie taste in the sample of non-experts. This theoretical maximum lies at around 0.49 and cannot be surpassed by unweighted, averaged raters. This implies an enormous variance in the movie taste of individuals.

[0128] A weighted average-based algorithm was used to create a most likely rating for a given movie by weighting the ratings from others by their overall correlation to the subject user, computed without the given movie. This achieved an average correlation of 0.72, substantially better than the 0.49 barrier for untailored recommendations.

[0129] All patents, test procedures, and other documents cited herein, including priority documents, are fully incorporated by reference to the extent such disclosure is not inconsistent with this invention and for all jurisdictions in which such incorporation is permitted.

[0130] While the illustrative embodiments of the invention have been described with particularity, it will be understood that various other modifications will be apparent to and can be readily made by those skilled in the art without departing from the spirit and scope of the invention. Accordingly, it is not intended that the scope of the claims appended hereto be limited to the examples and descriptions set forth herein but rather that the claims be construed as encompassing all the features of patentable novelty which reside in the invention, including all features which would be treated as equivalents thereof by those skilled in the art to which the invention pertains.

* * * * *