U.S. patent application number 14/036610 was filed with the patent office on 2013-09-25 and published on 2014-01-23 for content ranking system and method.
This patent application is currently assigned to PulsePoint, Inc. The applicant listed for this patent is Pulsepoint, Inc. Invention is credited to Lawrence A. Birnbaum, Kristian J. Hammond, Sanjay C. Sood, Erik Sundelof, Amra Q. Tareen.
Application Number | 20140025690 14/036610 |
Document ID | / |
Family ID | 40363778 |
Filed Date | 2014-01-23 |
United States Patent Application | 20140025690
Kind Code | A1
Tareen; Amra Q.; et al. | January 23, 2014
CONTENT RANKING SYSTEM AND METHOD
Abstract
A relevance of a content item to an event is determined. A user
rating of the content item is received. A ranking of the content
item is determined with respect to the event, relative to one or
more other content items associated with the event based at least
in part on the relevance and the user rating. In some embodiments,
the ranking is determined based at least in part on a sentiment
score associated with the content item. A ranking algorithm
provides a novel method for retrieving, scoring, ranking, and
presenting content related to an event based on a combination of
scoring contributions such as user rating, relevance, and
sentiment. Embodiments provide users with the most relevant,
engaging, and informative content concerning events of interest to
them.
Inventors: | Tareen; Amra Q. (San Francisco, CA); Sundelof; Erik (Palo Alto, CA); Birnbaum; Lawrence A. (Evanston, IL); Hammond; Kristian J. (Chicago, IL); Sood; Sanjay C. (Claremont, CA)
Applicant:
Name | City | State | Country | Type
Pulsepoint, Inc. | New York | NY | US |
Assignee: | PulsePoint, Inc., New York, NY
Family ID: | 40363778
Appl. No.: | 14/036610
Filed: | September 25, 2013
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
12215684 | Jun 27, 2008 | 8548996
14036610 | |
60937685 | Jun 29, 2007 |
Current U.S. Class: | 707/749
Current CPC Class: | G06F 16/903 20190101; G06F 16/24578 20190101
Class at Publication: | 707/749
International Class: | G06F 17/30 20060101 G06F017/30
Claims
1. A method, comprising: determining, by a content ranking device
having a processor and a memory, a plurality of scoring
contributions for a content item, the plurality of scoring
contributions including a user rating score, a relevance score, and
an emotional score, the user rating score based at least in part by
a number of user votes by different users on the content item and a
value of the user votes based on user reputation associated
therewith, the relevance score based at least in part on a
similarity between the content item and a container of content
items, the emotional score based at least in part on data used to
determine features or feature combinations indicative of particular
sentiment values for the content item; and scoring the content item
by computing a weighted sum of the plurality of scoring
contributions.
2. The method according to claim 1, further comprising: prior to
the scoring, normalizing the user rating score, the relevance
score, and the emotional score.
3. The method according to claim 1, further comprising: adjusting
the plurality of scoring contributions due to an increase in the
number of user votes on the content item.
4. The method according to claim 1, further comprising: adjusting
the plurality of scoring contributions due to a change in the value
of the user votes based on the user reputation associated
therewith.
5. The method according to claim 1, wherein the relevance score is
further based on a geographical overlap between the content item
and the container of content items, a temporal overlap between the
content item and the container of content items, or a combination
thereof.
6. The method according to claim 1, wherein the determining and the
scoring are performed by the content ranking device dynamically in
response to a query or set of queries received by the content
ranking device over a network.
7. The method according to claim 1, further comprising: retrieving
and presenting one or more content items related to an event to a
user over a network, the retrieving based at least in part on how
the one or more content items are scored relative to the event,
each of the one or more content items scored by the content ranking
device based on a combination of scoring contributions.
8. A content ranking apparatus, comprising: at least one processor;
and at least one non-transitory computer readable medium storing
instructions executable by the at least one processor to perform:
determining a plurality of scoring contributions for a content
item, the plurality of scoring contributions including a user
rating score, a relevance score, and an emotional score, the user
rating score based at least in part by a number of user votes by
different users on the content item and a value of the user votes
based on user reputation associated therewith, the relevance score
based at least in part on a similarity between the content item and
a container of content items, the emotional score based at least in
part on data used to determine features or feature combinations
indicative of particular sentiment values for the content item; and
scoring the content item by computing a weighted sum of the
plurality of scoring contributions.
9. The content ranking apparatus of claim 8, wherein the
instructions are further executable by the at least one processor
to perform: prior to the scoring, normalizing the user rating
score, the relevance score, and the emotional score.
10. The content ranking apparatus of claim 8, wherein the
instructions are further executable by the at least one processor
to perform: adjusting the plurality of scoring contributions due to
an increase in the number of user votes on the content item.
11. The content ranking apparatus of claim 8, wherein the
instructions are further executable by the at least one processor
to perform: adjusting the plurality of scoring contributions due to
a change in the value of the user votes based on the user
reputation associated therewith.
12. The content ranking apparatus of claim 8, wherein the relevance
score is further based on a geographical overlap between the
content item and the container of content items, a temporal overlap
between the content item and the container of content items, or a
combination thereof.
13. The content ranking apparatus of claim 8, wherein the
determining and the scoring are performed dynamically in response
to a query or set of queries received by the content ranking
apparatus over a network.
14. The content ranking apparatus of claim 8, wherein the
instructions are further executable by the at least one processor
to perform: retrieving and presenting one or more content items
related to an event to a user over a network, the retrieving based
at least in part on how the one or more content items are scored
relative to the event, each of the one or more content items scored
by the content ranking apparatus based on a combination of scoring
contributions.
15. A computer program product having at least one non-transitory
computer readable medium storing instructions executable by at
least one processor of a content ranking device to perform:
determining a plurality of scoring contributions for a content
item, the plurality of scoring contributions including a user
rating score, a relevance score, and an emotional score, the user
rating score based at least in part by a number of user votes by
different users on the content item and a value of the user votes
based on user reputation associated therewith, the relevance score
based at least in part on a similarity between the content item and
a container of content items, the emotional score based at least in
part on data used to determine features or feature combinations
indicative of particular sentiment values for the content item; and
scoring the content item by computing a weighted sum of the
plurality of scoring contributions.
16. The computer program product of claim 15, wherein the
instructions are further executable by the at least one processor to
perform: prior to the scoring, normalizing the user rating score,
the relevance score, and the emotional score.
17. The computer program product of claim 15, wherein the
instructions are further executable by the at least one processor to
perform: adjusting the plurality of scoring contributions due to an
increase in the number of user votes on the content item.
18. The computer program product of claim 15, wherein the
instructions are further executable by the at least one processor to
perform: adjusting the plurality of scoring contributions due to a
change in the value of the user votes based on the user reputation
associated therewith.
19. The computer program product of claim 15, wherein the relevance
score is further based on a geographical overlap between the
content item and the container of content items, a temporal overlap
between the content item and the container of content items, or a
combination thereof.
20. The computer program product of claim 15, wherein the
instructions are further executable by the at least one processor to
perform: retrieving and presenting one or more content items
related to an event to a user over a network, the retrieving based
at least in part on how the one or more content items are scored
relative to the event, each of the one or more content items scored
by the content ranking device based on a combination of scoring
contributions.
Description
CROSS-REFERENCE TO RELATED APPLICATION(S)
[0001] This application is a continuation of, and claims a benefit
of priority under 35 U.S.C. § 120 of the filing date of U.S. patent
application Ser. No. 12/215,684, by inventors Tareen et al.,
entitled "RANKING CONTENT ITEMS RELATED TO AN EVENT," filed on Jun.
27, 2008, which in turn claims the benefit of priority under 35
U.S.C. § 119 to U.S. Provisional Patent Application No.
60/937,685, entitled "SYSTEM AND METHODS FOR RETRIEVING, RANKING,
AND PRESENTING CONTENT RELATED TO AN EVENT," filed Jun. 29, 2007,
which are both incorporated herein by reference for all
purposes.
BACKGROUND OF THE INVENTION
[0002] When displaying content items to a user, the content items
are typically ranked so that a desired set of content items is
displayed first. One problem is that, when a user rating is used to
rank items, a content item with only a few ratings that are very
high can be displayed within the desired set. However, this content
item may not be appropriately ranked because the few very high
ratings may not represent the content item's rating after more
ratings are aggregated.
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] Various embodiments of the invention are disclosed in the
following detailed description and the accompanying drawings.
[0004] FIG. 1 is a graph illustrating a simulated history of the
user interaction score, $u_s(t)$, in one embodiment.
[0005] FIG. 2 is a graph illustrating an example of user
reputation, $h_m(t)$, for $t_0 = 21$ days, $t_1 = 365$ days,
$\omega_0 = 0.99$, and $\omega_1 = 0.01$ in some embodiments.
[0006] FIGS. 3A, 3B, and 3C are graphs illustrating embodiments of
the distribution of the different scoring contributions. The scores
are normalized to make it easier to compare the different scoring
contributions, and the normalized frequency refers to relatively
how many documents obtain a particular score.
[0007] FIGS. 4A and 4B are graphs illustrating an example of a user
rating score, $U(t)$, and the corresponding weighting function, $f(U)$,
for $U_1 = 20000$ as a typical history of user votes, $s$, in one
embodiment.
[0008] FIGS. 5A, 5B, 5C, and 5D are graphs illustrating an example
of a total Masala score for a content piece for a user score
history for $U_1 = 20000$, $\omega_1^0 = 0.2$,
$\omega_2^0 = 0.4$, and $\omega_3^0 = 0.4$.
DETAILED DESCRIPTION
[0009] The invention can be implemented in numerous ways, including
as a process; an apparatus; a system; a composition of matter; a
computer program product embodied on a computer readable storage
medium; and/or a processor, such as a processor configured to
execute instructions stored on and/or provided by a memory coupled
to the processor. In this specification, these implementations, or
any other form that the invention may take, may be referred to as
techniques. In general, the order of the steps of disclosed
processes may be altered within the scope of the invention. Unless
stated otherwise, a component such as a processor or a memory
described as being configured to perform a task may be implemented
as a general component that is temporarily configured to perform
the task at a given time or a specific component that is
manufactured to perform the task. As used herein, the term
`processor` refers to one or more devices, circuits, and/or
processing cores configured to process data, such as computer
program instructions.
[0010] A detailed description of one or more embodiments of the
invention is provided below along with accompanying figures that
illustrate the principles of the invention. The invention is
described in connection with such embodiments, but the invention is
not limited to any embodiment. The scope of the invention is
limited only by the claims and the invention encompasses numerous
alternatives, modifications and equivalents. Numerous specific
details are set forth in the following description in order to
provide a thorough understanding of the invention. These details
are provided for the purpose of example and the invention may be
practiced according to the claims without some or all of these
specific details. For the purpose of clarity, technical material
that is known in the technical fields related to the invention has
not been described in detail so that the invention is not
unnecessarily obscured.
[0011] Ranking content items is disclosed. A user input is received
from each of one or more users indicating an opinion of the user
with respect to a content item included in a plurality of content
items. Based at least in part on a number of users from whom user
input has been received, a degree is determined to which a ranking
of the content item relative to one or more other content items in
the plurality of content items is determined by user input. In some
embodiments, the content item is associated with an event. In some
embodiments, the user input is weighted by a user reputation. In
some embodiments, the degree to which user ratings contribute to the
overall ranking depends on how many user ratings there are: if there
are very few, they count for little (e.g., to discount a
non-representative sample); if there are more, they count for more;
beyond some point, the contribution levels out with respect to the
number of user ratings.
[0012] A relevance of a content item to an event is determined. A
user rating of the content item is received. A ranking of the
content item is determined with respect to the event, relative to
one or more other content items associated with the event based at
least in part on the relevance and the user rating. In some
embodiments, the ranking is determined based at least in part on a
sentiment score associated with the content item.
[0013] In some embodiments, the ranking algorithm disclosed
provides a novel method for retrieving, scoring, ranking, and
presenting content related to an event based on a combination of:
Geography, Time, Similarity, Sentiment, User rating and User
preferences. It is aimed at providing users with the most relevant,
engaging, and informative content concerning events of interest to
them.
[0014] 2 Description
[0015] This document presents the components of a content ranking
algorithm and describes the various scoring contributions that it
utilizes.
[0016] 2.1 Conceptual Description
[0017] The ranking algorithm dynamically combines the various
scoring contributions in different ways over time, as the
importance of those respective contributions (see section 2.3) will
vary over time. The overall scoring method for a content item and
an event are both defined in section 2.4. An example scoring
history for a content item in the system can be seen in
FIGS. 5A-5D.
[0018] The most basic element of scoring a content item is its
relevance to a given event in the information retrieval sense,
i.e., some characterization of similarity to a query or set of
queries, or to a base or "seed" content item or set of items,
associated with the event. This scoring contribution provides a
baseline characterization of how topically related the content item
is to that event. The model of similarity is fundamentally text
overlap, for example, using the standard term frequency-inverse
document frequency (TF-IDF) model, overlap of named entities, or
similar measures.
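The text-overlap notion of similarity described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used by the system: the function names `tf_idf_vectors` and `cosine` are hypothetical, documents are assumed to be pre-tokenized lists of terms, and the smoothed inverse document frequency used here is one common TF-IDF variant chosen for the example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: idf = log((1 + N) / (1 + df)) + 1, a common smoothed variant.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Usage: compare a hypothetical "seed" item for an event with two candidates.
seed = "ceasefire talks in lebanon".split()
related = "lebanon ceasefire talks resume".split()
unrelated = "celebrity wedding photos released".split()
vecs = tf_idf_vectors([seed, related, unrelated])
related_similarity = cosine(vecs[0], vecs[1])
unrelated_similarity = cosine(vecs[0], vecs[2])
```

In this sketch the related item shares terms with the seed and therefore scores higher than the unrelated item, which shares none.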
[0019] Two additional scoring contributions also address the
topical relevance of a content item to an event: temporal and
geographic proximity of times and locations mentioned in the
content item, or attached as metadata to the content item, to times
and locations associated with the event.
[0020] The ranking algorithm also assesses the emotional impact of
content items, in terms of valence (i.e., positive or negative and
with what intensity), as part of the process of ranking those
items. This scoring contribution allows the system to identify and
prioritize potentially high-impact content.
[0021] Finally, the ranking algorithm also employs user ratings as
one of its scoring contributions. The user rating score ($U(t)$, see
section 2.3.1) is defined by both the number and value of user
votes ($s(t)$) with user reputation ($h_m(t)$) taken into account.
Basically, a few votes by highly reputable users will have the same
effect as a large number of votes from less reputable users. This
behavior is defined by equations (3) and (4) described
below.
[0022] For small numbers of user votes, the overall score of a
content item will be primarily defined by the relevance score
(R(t), see section 2.3.2) and the emotional score (E(t), see
section 2.3.3). As the number of user votes grows, i.e., becomes
statistically reliable, the effect of the user votes increases.
[0023] Finally, as the number of user votes increases further, the
importance of additional user votes decreases. This behavior is
modeled by the weighting function defined in equation (9) described
below.
[0024] User reputation is based on both long term and short term
interaction with the system.
[0025] By defining the user reputation as a function of both long
term and short term interactions in this way, the algorithm is
forgiving of isolated "mistakes" by users, such as submission of
low quality content or less frequent visits to the system. On the
other hand, long term abusive behavior will result in much slower
growth of a user's reputation. The exact definition of the user
reputation, $h_m(t)$, is found in section 2.2.
[0026] An event score is defined by a weighted sum of the user
rating score of the event itself and the (possibly weighted)
average total score of the content items associated with that
event. It can easily be extended to be valid for non-linear
weights.
[0027] 2.2 Definition of the User Reputation, $h_m(t)$
[0028] The user interaction score, $u_s(t)$, measures the
interactivity of a user with the system. Some general
characteristics of the user interaction score include:
[0029] 1 The larger the number of visits, the higher the user
interaction score.
[0030] 2 The higher the frequency of revisits, the higher the user
interaction score.
[0031] 3 Attempts to cram the system, as well as abusive behavior
or language, will lower the user's interaction score.
[0032] 4 The quality of a user's submitted content is also an
important factor in his or her interaction score.
[0033] An example of a user interaction history, $u_s(t)$, is
shown in FIG. 1.
[0034] User reputation is defined as a weighted sum of the long
term user interaction history (defined by the model parameter
$t_1$) as well as the more recent user interaction history
(defined by the model parameter $t_0$) inside the system. (See
FIG. 2 for the effect of variations of these model parameters.) The
user reputation, $h_m(t)$, of a user $m$ is then given by a
weighted sum of the different partial user reputations,
$h_{i,m}$,
$$
h_m(t) =
\begin{cases}
\omega_1 h_{-1,m} & \text{for } t < t_0 \\
\omega_1 h_{-1,m} + \omega_0 h_{0,m} & \text{for } t_0 \leq t < t_1 \\
\omega_1 h_{1,m} + \omega_0 h_{0,m} & \text{for } t \geq t_1
\end{cases}
\tag{1}
$$
where $\omega_1$ and $\omega_0$ are weights and
$\omega_0 + \omega_1 = 1$. The respective partial user
reputations $h_{-1,m}$, $h_{0,m}$, and $h_{1,m}$ are defined
as,
$$
h_{i,m}(t) =
\begin{cases}
\int_{\tau_{i,0}}^{t} u_s(\tau) \exp\left(\frac{\tau - t}{t_i}\right) d\tau & \text{for } i = -1 \text{ and } t < t_0 \\
\int_{\tau_{i,0}}^{t} u_s(\tau) \exp\left(\frac{\tau - t}{t_i}\right) d\tau & \text{for } i = -1, 0 \text{ and } t_0 \leq t < t_1 \\
\int_{\tau_{i,0}}^{t} u_s(\tau) \exp\left(\frac{\tau - t}{t_i}\right) d\tau & \text{for } i = 0, 1 \text{ and } t \geq t_1
\end{cases}
\tag{2}
$$
where $t_{-1} = t_1$, $\tau_{-1,0} = 0$, $\tau_{0,0} = t - t_0$
and $\tau_{1,0} = t - t_1$, and $u_s(\tau)$ is the user
interaction score at time $\tau$. Note that $\tau - t \leq 0$ for all
$t \geq 0$. The integral can be computed using simple numerical
integration.
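As a rough illustration of the "simple numerical integration" mentioned above, the following Python sketch evaluates the exponentially discounted integral with the trapezoidal rule and combines the partial reputations by the three time ranges of equation (1). The function names, the step count, and the choice of the trapezoidal rule are assumptions made for this example; the defaults of 21 and 365 days are the values quoted for FIG. 2, and the long-term weight is taken as one minus the short-term weight because the two weights sum to one.

```python
import math


def partial_reputation(u_s, t, lower, t_i, steps=1000):
    """Trapezoidal estimate of the integral of u_s(tau) * exp((tau - t) / t_i)
    for tau running from `lower` to `t`."""
    if t <= lower:
        return 0.0
    dt = (t - lower) / steps

    def integrand(tau):
        return u_s(tau) * math.exp((tau - t) / t_i)

    total = 0.5 * (integrand(lower) + integrand(t))
    for k in range(1, steps):
        total += integrand(lower + k * dt)
    return total * dt


def user_reputation(u_s, t, w0, t0=21.0, t1=365.0):
    """Weighted sum of long-term and short-term partial reputations."""
    w1 = 1.0 - w0  # the two weights sum to one
    if t < t0:
        # Only the long-term history since time 0 is available.
        return w1 * partial_reputation(u_s, t, 0.0, t1)
    short_term = partial_reputation(u_s, t, t - t0, t0)
    if t < t1:
        return w1 * partial_reputation(u_s, t, 0.0, t1) + w0 * short_term
    return w1 * partial_reputation(u_s, t, t - t1, t1) + w0 * short_term


# Usage: a hypothetical user whose interaction score is constant at 1.0.
reputation_day_100 = user_reputation(lambda tau: 1.0, 100.0, w0=0.99)
```

For a constant interaction score the integral has a closed form, which makes it easy to check the numerical estimate against the analytic value.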
[0035] The model parameters--$t_0$, $t_1$ and
$\omega_0$--enable control of the relative weight between recent
and long term actions. Too high a ratio $t_1/t_0$ will
damp fluctuations of the user reputation over time. For
$t_1 = 1000$ days (almost 3 years), there is a visual artifact:
the user reputation, $h_m(t)$, can show an overall growth
for higher $t_1$. This artifact is purely an effect
of the formulation of equation (1) and can be eliminated by
introducing time-dependent weights $\omega_0(t)$ and
$\omega_1(t)$. Large values of $\omega_0$ will drastically
damp high-frequency fluctuations in the user reputation in time.
Basically, choosing large values of $\omega_0$ results in
low-frequency fluctuations in the user reputation, $h_m(t)$, in
time.
[0036] 2.3 Definition of the Scoring Contributions
[0037] The scoring distributions are shown in FIGS. 3A-3C. FIGS.
3A, 3B, and 3C are graphs illustrating the distribution of the
different scoring contributions in one embodiment. The scores are
normalized to make it easier to compare the different scoring
contributions, and the normalized frequency refers to relatively
how many documents obtain a particular score.
[0038] 2.3.1 User Rating Score, U(t)
[0039] The importance of the user rating contribution should increase
over time, but should also be bounded for large $t$. Furthermore, to
handle different abusive behaviors in online communities, such as
creation of multiple accounts to vote multiple times, other
cramming attempts, and/or to fight online trolls, each user vote,
$s_m^j$, should be weighted by the reputation, $h_m$, of the user
casting it. The user rating contribution can easily be
extended to vary in different stages of the event detection
process.
[0040] As the system will automatically detect such interactions,
such attempts will result in a lower user interaction score and
consequently a lower effective user rating score. The system allows
the user interaction behavior to be dynamic over time, and the
formulation of the user reputation ensures that the system
automatically responds to such attempts and recalculates the total
score of a content item in response.
[0041] Let $s_m^j$ be the user vote by user $m$ on the content piece $j$,
$h_m(t)$ the reputation of the user $m$, and $\omega(h_m)$ the
weighting function depending on the user reputation $h_m(t)$. (Note
that $\omega(h_m)$ is a different weight than the previously mentioned
$\omega_0$ and $\omega_1$.) Furthermore, let
$S^j = S^j(t)$ be the set of users who have voted on a content
piece $j$. The total user rating score, $U^j = U^j(t)$, for a
content piece $j$ will then be defined as,
$$
U^j(t) = \sum_{m \in S^j(t)} \omega(h_m(t)) \, s_m^j = \sum_{m \in S^j(t)} s_{e,m}^j, \tag{3}
$$
where $s_{e,m}^j$ is the effective user rating score. The
weighting function below has been specifically designed to solve
the issues previously mentioned. It gives low weight to users with
low reputation, has an only slightly varying, nearly constant weight
over the intermediate interval of user reputation (Part 1),
and controls the weight obtained by users with higher user
reputation (Part 2). Also, as the user reputation varies with time,
the hierarchy in the system is time-dependent, and thus so is the
effective vote, $s_{e,m}^j$, by a user $m$ on the content item
$j$.
[0042] The weighting function, $\omega(h_m)$, is therefore
defined as,
$$
\omega(h_m) = \underbrace{\left[ 1 + \left( \frac{h_m - h_m^1}{h_m^1} \right)^{h_m^2} \right]}_{\text{Part 1}} \; \underbrace{\exp\left( -\frac{h_m}{h_m^3} \right)}_{\text{Part 2}} \tag{4}
$$
where $h_m^i$ are model parameters.
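The reputation-weighted vote sum can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it reads the weighting function as the product of a polynomial factor (Part 1) and an exponential damping factor (Part 2), it assumes the second model parameter is an odd integer exponent (so that a user with zero reputation gets zero weight and the weight is nearly flat around the first parameter), and the names `reputation_weight`, `user_rating_score`, `h1`, `h2`, and `h3` are hypothetical.

```python
import math


def reputation_weight(h, h1, h2, h3):
    """Weight of a vote cast by a user with reputation h.

    Assumes h2 is an odd integer; h1, h2, h3 stand for the three model parameters.
    """
    part1 = 1.0 + ((h - h1) / h1) ** h2  # low for low h, nearly flat around h1
    part2 = math.exp(-h / h3)            # damps the weight of very high reputations
    return part1 * part2


def user_rating_score(votes, reputations, h1, h2, h3):
    """Sum of effective votes: each vote scaled by its voter's reputation weight.

    votes: {user: vote value}; reputations: {user: reputation}.
    """
    return sum(
        reputation_weight(reputations[user], h1, h2, h3) * vote
        for user, vote in votes.items()
    )


# Usage with illustrative parameter values (not taken from the source).
votes = {"alice": 1.0, "bob": 1.0}
reputations = {"alice": 0.0, "bob": 1.0}
score = user_rating_score(votes, reputations, h1=1.0, h2=3, h3=10.0)
```

With these illustrative values, the vote from the zero-reputation user contributes nothing, so the score comes entirely from the second voter.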
[0043] 2.3.2 Relevance Score, R(t)
[0044] The relevance score, R(t), measures how relevant an item of
content is to a particular event. An event is a collection of
stories, blogs, user contributions, videos and images and can thus
be considered as a container of content items j. (Typical events
are, for instance, "The Conflict in Lebanon", "The tragic death of
Josh Hancock" or "Tom Cruise and Katie Holmes wedding"). The
relevance score of an item will depend on: [0045] Geographical
overlap between the content item and the event [0046] Temporal
overlap between the content item and the event [0047] Similarity
between the content item and the event as described earlier.
[0048] 2.3.3 Emotional Score, E(t)
[0049] An important element of any story, video or image is the
emotional element, which determines the emotional impact a
particular item of content has. The emotional score, E(t), provides
a quantitative measure of the sentiment, attitude and/or opinion of
a piece of content--whether it is positive or negative, and how much.
Sentiment of a content item in this sense can be assessed using a
variety of statistical or other methods, for the most part derived
from corpora of labeled data used to determine features or feature
combinations indicative of particular sentiment values.
[0050] 2.4 Definition of the Scores
[0051] 2.4.1 The Score of a Content Item
[0052] A content item can be a document, an image, a video or any
other type of content in the system. The score of a content item $j$
is denoted $g^j$ and is defined as a weighted sum of its
scoring contributions. Let the vector of the scoring contributions
be defined as
$$
v^j(t) = [\, U^j(t) \;\; R^j(t) \;\; E^j(t) \,],
$$
where $v_i^j(t)$ denotes the $i$-th element of the vector
$v^j(t)$. The score, $g^j$, is then defined by,
$$
g^j(t) = g^j\left(U^j(t), R^j(t), E^j(t)\right) = \frac{\sum_{i=1}^{3} \omega_i^j\left(U^j(t), R^j(t), E^j(t)\right) N_i \, v_i^j(t)}{\sum_{i=1}^{3} \omega_i^j\left(U^j(t), R^j(t), E^j(t)\right)} \tag{5}
$$
where $N_i$ is the normalization constant of contribution $i$ and
$\omega_i^j$ is the weighting function for the vector
element $v_i^j(t)$. The scores need to be normalized as the
different scores result from different sorts of computation, have
different magnitudes, and are not directly comparable. The
weighting functions used in the scoring model are,
$$
\omega_1^j = \omega_1^0 f(U^j(t)) \quad \text{(user rating score weight)} \tag{6}
$$
$$
\omega_2^j = \omega_2^0 \quad \text{(relevance score weight)} \tag{7}
$$
$$
\omega_3^j = \omega_3^0 \quad \text{(emotional score weight)} \tag{8}
$$
where $\omega_i^0$ is the model parameter for the weight of
scoring contribution $i$. The weighting function, $f(U^j(t))$, is
defined as,
$$
f(U^j(t)) =
\begin{cases}
\exp\left( \frac{U^j(t) - U_1}{U_1} \right) & \text{for } U^j(t) \leq U_1 \\
\exp\left( \frac{U_1 - U^j(t)}{U_1} \right) & \text{otherwise}
\end{cases}
\tag{9}
$$
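To make the interplay of equations (5) through (9) concrete, the sketch below computes the score of a single content item in Python. It is an illustrative sketch only: the function names and the normalization constants passed in the usage example are hypothetical, while the threshold of 20000 and the base weights 0.2, 0.4, and 0.4 are the values quoted in the figure descriptions.

```python
import math


def rating_weight_factor(u, u1):
    """Equation (9): grows toward 1 as u approaches u1, then decays beyond it."""
    if u <= u1:
        return math.exp((u - u1) / u1)
    return math.exp((u1 - u) / u1)


def content_score(u, r, e, norms, base_weights, u1):
    """Equation (5): normalized weighted sum of the three scoring contributions.

    norms: (N_1, N_2, N_3); base_weights: the three base weight parameters.
    """
    w1_0, w2_0, w3_0 = base_weights
    # Equations (6)-(8): only the user rating weight depends on the score itself.
    weights = (w1_0 * rating_weight_factor(u, u1), w2_0, w3_0)
    contributions = (u, r, e)
    numerator = sum(w * n * v for w, n, v in zip(weights, norms, contributions))
    return numerator / sum(weights)


# Usage: hypothetical normalization constants scale each contribution to [0, 1].
norms = (1.0 / 20000.0, 1.0, 1.0)
base_weights = (0.2, 0.4, 0.4)
few_votes = content_score(0.0, 0.5, 0.5, norms, base_weights, u1=20000.0)
many_votes = content_score(20000.0, 0.5, 0.5, norms, base_weights, u1=20000.0)
```

With no votes, the user rating weight is at its smallest and the score is driven mainly by relevance and emotional score; at the threshold the user rating weight reaches its full base value.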
[0053] FIGS. 4A and 4B are graphs illustrating an example of a user
rating score, $U(t)$, and the corresponding weighting function, $f(U)$,
for $U_1 = 20000$ as a typical history of user votes, $s$, in one
embodiment.
[0054] Some examples and basic sensitivity analysis of the score
are presented in FIGS. 5A-5D. FIGS. 5A, 5B, 5C, and 5D are graphs
illustrating an example of a total score for a content piece for a
user in one embodiment for $U_1 = 20000$, $\omega_1^0 = 0.2$,
$\omega_2^0 = 0.4$, and $\omega_3^0 = 0.4$.
[0055] 2.4.2 The Event Score
[0056] Let the user rating score for the event $k$ itself be
$U_k^0$, the average score for the content associated with
event $k$ be $\bar{g}_k^{ac}$, and $\omega_k$ be a weight factor. The
total event score can then be defined as,
$$
g_k(t) = \omega_k U_k^0(t) + (1 - \omega_k) \, \bar{g}_k^{ac}(t)
$$
[0057] Note that the total event score is then only given by one
model function $\omega_k$, and the respective scores
$U_k^0(t)$ and $\bar{g}_k^{ac}(t)$ can be defined via one
SQL-query.
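The remark that both inputs to the event score can be obtained with one SQL query can be sketched with Python's built-in `sqlite3` module. The schema here is purely hypothetical (an `events` table holding the event's own user rating score and a `content_items` table holding per-item scores); the source does not describe the actual database layout.

```python
import sqlite3

# Hypothetical schema: events(event_id, user_rating),
# content_items(item_id, event_id, score).
EVENT_QUERY = """
SELECT e.user_rating, AVG(c.score)
FROM events AS e
JOIN content_items AS c ON c.event_id = e.event_id
WHERE e.event_id = ?
GROUP BY e.event_id, e.user_rating
"""


def event_score(conn, event_id, w_k):
    """Weighted sum of the event's own user rating and its average item score."""
    row = conn.execute(EVENT_QUERY, (event_id,)).fetchone()
    if row is None:
        raise KeyError(f"no event with content items for id {event_id}")
    user_rating, avg_item_score = row
    return w_k * user_rating + (1.0 - w_k) * avg_item_score


# Usage with an in-memory database and made-up values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_rating REAL)")
conn.execute(
    "CREATE TABLE content_items (item_id INTEGER PRIMARY KEY, event_id INTEGER, score REAL)"
)
conn.execute("INSERT INTO events VALUES (1, 0.8)")
conn.executemany(
    "INSERT INTO content_items (event_id, score) VALUES (?, ?)",
    [(1, 0.4), (1, 0.6)],
)
demo_event_score = event_score(conn, 1, w_k=0.25)
```

A single joined query returns both the event's own rating and the average of its item scores, so only the weight needs to be supplied by the caller.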
[0058] 3 Description of the Preferred Embodiment
[0059] The system provides users with the ability to access and
contribute content about events of interest to them, defining these
events, tracking the evolution of the events over time, and
organizing the events and the associated content to ensure users
receive the most interesting and engaging experience. An event in
its basic form is a collection of content items that refer to the
same situation or circumstance. The scope of each event will be
dynamic and vary over time. A content item may be a story, an image
and/or a video. Both the content items and the events will be
stored in databases and files, including XML-files.
[0060] The ranking algorithm is distributed throughout the system,
and consists of both on-demand and batch processes, including steps
of both a serial and parallel nature.
TABLE-US-00001
Step I: Defining the scores and/or updating the user interaction record:
  A content item is registered at the system:
    The content item is matched against existing seeds and events:
      If the content item matches the contextual scope of an event:
        Update user interaction history for the particular user and/or
        source submitting the content item
      If no matching event or seed is found to an item:
        - Update user interaction record for the particular user and/or
          source submitting the content item
        - Create a new event instance (seed) in the database
    Calculate scoring contributions:
      Calculate the emotional score of the content item
      Update the contextual state of the event
      Calculate the relevance scores for content items associated with
      the event
  A user interacts with the system:
    Switch depending on the types of interaction with the system:
      A user vote is registered:
        - Update the user interaction record for
          1. the user initially submitting/creating the particular
             content item and/or event
          2. the user voting on the particular content item and/or event
        - Update the user votes
      A user abuse report is registered:
        - Update the user interaction record for
          1. the user initially submitting on the content item or event
          2. the user filing the abuse report on the content item or event
      All other user interactions with the system:
        Update the user interaction records for the users affected by
        the particular interaction
    Redefine user reputations of affected users and calculate the user
    rating score:
      Recalculate the user reputations for the users affected
      Calculate user rating score using the defined user reputations
Step II: Calculate the final scores:
  Calculate total score for the content item
  Update total event score
Step III: Update the state of the system:
  Reprioritize content items for each event and the events
  Update the content storage
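The re-prioritization in Step III can be sketched as a simple ordering pass once the scores of Step II are available. This is only an illustrative sketch: the function name `reprioritize` and the dictionary-based data layout are assumptions made for the example, not the storage format used by the system.

```python
def reprioritize(item_scores, event_scores):
    """Order events, and the content items within each event, by descending score.

    item_scores: {event_id: {item_id: total item score}}
    event_scores: {event_id: total event score}
    Returns a list of (event_id, [item_id, ...]) pairs, best first.
    """
    ranked = []
    for event in sorted(event_scores, key=event_scores.get, reverse=True):
        items = item_scores.get(event, {})
        ranked.append((event, sorted(items, key=items.get, reverse=True)))
    return ranked


# Usage with made-up scores.
item_scores = {"e1": {"a": 0.2, "b": 0.9}, "e2": {"c": 0.5}}
event_scores = {"e1": 0.3, "e2": 0.7}
ranking = reprioritize(item_scores, event_scores)
```

Because the scores change as votes and reputations change, a pass like this would be rerun whenever Step II produces updated totals.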
[0061] 4 Advantages
[0062] The ranking algorithm provides an appropriate prioritization
of the content related to an event, automatically and without
relying on human intervention, and is therefore scalable. The
algorithm utilizes a combination of different scoring contributions:
user rating, relevance or similarity, geography, time, and
emotional impact. The user ratings are weighted by user reputation
to obtain the necessary slight community hierarchy structure, as it is
well known that a too-flat authority hierarchy can destroy groups
from within as much as one that is too steep. The scoring
algorithm solves some of the common problems with user ratings by
varying their contribution vis-a-vis other scoring contributions in a
dynamic manner. The community helps decide what is important,
intelligent information retrieval technology helps decide what is
relevant, and emotional analysis helps decide what is impactful.
Therefore the ranking algorithm can assess and prioritize content
along all three of these dimensions (community, information
retrieval, and emotional analysis) jointly. The ranking algorithm,
together with the event detection model, also enables the swift
detection of breaking events, which any model based on human editing
or community assessment alone will be unable to match.
[0063] The simplicity of the ranking algorithm makes it easy to
implement but also makes the dynamics of the score predictable. The
algorithm can also be easily extended to include additional scoring
contributions, and it can be optimized via only a few model
parameters. The formulation of the ranking algorithm makes it
possible to introduce more complex weighting functions for
different types of content, different users, and different types of
events, varying over time as information about those events
evolves.
[0064] Although the foregoing embodiments have been described in
some detail for purposes of clarity of understanding, the invention
is not limited to the details provided. There are many alternative
ways of implementing the invention. The disclosed embodiments are
illustrative and not restrictive.
* * * * *