U.S. patent application number 12/055618 was filed with the patent office on 2008-03-26 and published on 2008-07-24 for content monitoring in a high volume on-line community application.
This patent application is currently assigned to International Business Machines Corporation. Invention is credited to Daniel F. Gruhl, Kevin Haas.
Application Number | 20080177834 12/055618 |
Document ID | / |
Family ID | 39618572 |
Filed Date | 2008-03-26 |
United States Patent Application | 20080177834 |
Kind Code | A1 |
Gruhl; Daniel F.; et al. | July 24, 2008 |
CONTENT MONITORING IN A HIGH VOLUME ON-LINE COMMUNITY APPLICATION
Abstract
Disclosed are embodiments of a system and method for managing an
on-line community. Electronic postings are pre-screened based on
one or more metrics to determine a risk value indicative of the
likelihood that an individual posting contains objectionable
content. These metrics are based on the profile of a poster,
including various parameters of the poster and/or the poster's
record of objectionable content postings. These metrics can also be
based on the social network profile of a poster, including the
average of various parameters of other users in the poster's social
network and/or a compiled record of objectionable content postings
of other users in the poster's social network. If the risk value is
relatively low, the posting can be displayed to the on-line
community immediately. If the risk value is relatively high,
display of the posting can be delayed until further content
analysis is completed. Finally, if the risk value is above a
predetermined high risk threshold value, the posting can be removed
automatically.
Inventors: | Gruhl; Daniel F. (San Jose, CA); Haas; Kevin (San Jose, CA) |
Correspondence Address: | FREDERICK W. GIBB, III; Gibb & Rahman, LLC, 2568-A RIVA ROAD, SUITE 304, ANNAPOLIS, MD 21401, US |
Assignee: | International Business Machines Corporation, Armonk, NY |
Family ID: | 39618572 |
Appl. No.: | 12/055618 |
Filed: | March 26, 2008 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11622112 | Jan 11, 2007 | |
12055618 | | |
Current U.S. Class: | 709/204 |
Current CPC Class: | Y10S 707/99942 20130101; Y10S 707/99943 20130101; Y10S 707/99948 20130101; G06Q 10/00 20130101; Y10S 707/99945 20130101 |
Class at Publication: | 709/204 |
International Class: | G06F 15/16 20060101 G06F015/16 |
Claims
1. A system for managing an on-line community, said system
comprising: a database adapted to store a plurality of metrics
based on information related to users of said on-line community,
wherein said information comprises parameters associated with each
of said users, records of objectionable content postings for each
of said users, and social networks of each of said users wherein
said objectionable content is defined by preset standards; and a
content management system in communication with said database and
adapted to monitor postings from users to said on-line community
for said objectionable content, wherein said content management
system comprises a pre-screener adapted to determine, based on at
least one predetermined metric, a risk value that indicates a
likelihood that a given posting by a given user contains said
objectionable content, and wherein said content management system
is further adapted to allow said given posting to be displayed to
said on-line community, if said risk value is below a threshold
value.
2. The system of claim 1, all the limitations of which are
incorporated herein by reference, wherein said at least one
predetermined metric comprises at least one of the following: a
metric based on parameters associated with said given user; a
metric based on a record of objectionable content postings by said
given user; a metric based on average predetermined parameters
associated with other users in a social network of said given user;
a metric based on a compiled record of objectionable content
postings by other users in a social network of said given user,
wherein said other users in said social network have a direct
relationship with said given user; and a metric based on a compiled
record of objectionable content postings by other users in a social
network of said given user, wherein at least some of said other
users in said social network have an indirect relationship with
said given user.
3. The system of claim 1, all the limitations of which are
incorporated herein by reference, wherein said content management
system is further adapted to automatically remove said given
posting of said given user, if said risk value is above a second
threshold value.
4. The system of claim 1, all the limitations of which are
incorporated herein by reference, wherein said content management
system further comprises a content filter adapted to analyze
content of each of said postings to determine an objectionable
content score, wherein an order in which each of said postings is
analyzed is based on said risk value.
5. The system of claim 1, all the limitations of which are
incorporated herein by reference, wherein said content management
system further comprises a content filter adapted to analyze
content of each of said postings to determine an objectionable
content score, wherein said objectionable content score is weighted
based on said risk value.
6. A method for managing an on-line community, said method
comprising: receiving postings to said on-line community from
users; prior to analyzing each of said postings for objectionable
content, as determined by preset standards, determining for each
given posting from each given user a risk value based on at least
one predetermined metric, wherein said risk value indicates a
likelihood that said given posting by said given user contains said
objectionable content; and, if said risk value is below a threshold
value, displaying said given posting to said on-line community.
7. The method of claim 6, all the limitations of which are
incorporated herein by reference, wherein said at least one
predetermined metric comprises a metric based on parameters
associated with said given user.
8. The method of claim 6, all the limitations of which are
incorporated herein by reference, wherein said at least one
predetermined metric comprises a metric based on a record of
objectionable content postings by said given user.
9. The method of claim 6, all the limitations of which are
incorporated herein by reference, wherein said at least one
predetermined metric comprises a metric based on average parameters
associated with other users in a social network of said given
user.
10. The method of claim 6, all the limitations of which are
incorporated herein by reference, wherein said at least one
predetermined metric comprises a metric based on a compiled record
of objectionable content postings by other users in a social
network of said given user, wherein said other users in said social
network have a direct relationship with said given user.
11. The method of claim 6, all the limitations of which are
incorporated herein by reference, wherein said at least one
predetermined metric comprises a metric based on a compiled record
of objectionable content postings by other users in a social
network of said given user, wherein at least some of said other
users in said social network have an indirect relationship with
said given user.
12. The method of claim 6, all the limitations of which are
incorporated herein by reference, further comprising, if said risk
value is above said threshold value, automatically removing said
posting.
13. The method of claim 6, all the limitations of which are
incorporated herein by reference, further comprising, if said risk
value is above said threshold value, requesting posting
confirmation from said given user and notifying said given user of
ramifications for violations of said standards.
14. The method of claim 6, all the limitations of which are
incorporated herein by reference, further comprising, dynamically
determining an order for analyzing said postings for said
objectionable content, wherein said order is based on said risk
value of each of said postings.
15. The method of claim 6, all the limitations of which are
incorporated herein by reference, further comprising, analyzing
each of said postings to determine an objectionable content score,
wherein said objectionable content score is weighted based on said
risk value.
16. A computer program product comprising a computer useable medium
having a computer readable program, wherein said computer readable
program when executed causes said computer to perform a method for managing an
on-line community, said method comprising: prior to analyzing
postings to said on-line community from users for objectionable
content, as determined by preset standards, determining for each
given posting from each given user a risk value based on at least
one predetermined metric, wherein said risk value indicates a
likelihood that said given posting by said given user contains said
objectionable content; and if said risk value is below a threshold
value, displaying said given posting to said on-line community.
17. The computer program product of claim 16, all the limitations
of which are incorporated herein by reference, wherein said at
least one predetermined metric comprises at least one of the
following metrics: a metric based on parameters associated with
said given user; a metric based on a record of objectionable
content postings by said given user; a metric based on average
parameters associated with other users in a social network of said
given user; and a metric based on a compiled record of
objectionable content postings by other users in a social network
of said given user, wherein at least some of said other users in
said social network have an indirect relationship with said given
user.
18. The computer program product of claim 16, all the limitations
of which are incorporated herein by reference, wherein said method
further comprises, if said risk value is above said threshold
value, automatically removing said posting.
19. The computer program product of claim 16, all the limitations
of which are incorporated herein by reference, wherein said method
further comprises, dynamically determining an order for analyzing
said postings for said objectionable content, wherein said order is
based on said risk value of each of said postings.
20. The computer program product of claim 16, all the limitations
of which are incorporated herein by reference, wherein said method
further comprises, analyzing each of said postings to determine an
objectionable content score, wherein said objectionable content
score is weighted based on said risk value.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. application Ser.
No. 11/622,112 filed Jan. 11, 2007, the complete disclosure of
which, in its entirety, is herein incorporated by reference.
BACKGROUND
[0002] 1. Field of the Invention
[0003] The embodiments of the invention generally relate to on-line
communities and, more particularly, to a system and method for
filtering the content of postings to on-line communities.
[0004] 2. Description of the Related Art
[0005] Online communities allow groups of people to communicate and
interact via various online media, such as blogs, wikis, internet
forums, chat rooms, instant messaging, electronic mail lists, etc.
Each of these online communities may have its own community
standards related to the content of online media postings. For
example, as with real world media, such online media may have
standards to prevent the dissemination of objectionable content as
defined by each community's own preset or pre-established
standards. These standards are typically enforced by way of manual
and/or automated content review systems which are used to filter
out and remove such objectionable postings. However, as these
online communities continue to increase in size, limitations in the
ability of current manual and/or automated content review systems
to remove objectionable material from community websites in a
timely manner have become increasingly apparent. Thus, additional
workflows and methods are needed to ensure that community standards
are maintained.
SUMMARY
[0006] In view of the foregoing, disclosed herein are embodiments of a
system and an associated method for managing an on-line community
and, particularly, for monitoring and filtering the content of
electronic postings to an on-line community (e.g., to a community
website). The system and method incorporate a pre-screening process
in which individual postings are pre-screened based on one or more
predetermined metrics to determine a risk value indicative of the
likelihood that a posting contains objectionable content. These
metrics are not content based. Rather, they can be based on the
profile of an individual poster, including various parameters of
the poster and/or the poster's record of objectionable content
postings. These metrics can also be based on the social network
profile of an individual poster, including the average of various
parameters of other users in the poster's social network and/or a
compiled record of objectionable content postings of other users in
the poster's social network. If the risk value is relatively low
(e.g., below a low risk threshold value), the posting can be
displayed on the website immediately, thereby keeping delay low. If
the risk value is relatively high (e.g., above the low threshold
value), display of the posting can be delayed until further
automated and/or manual content analysis is completed. Finally, if
the risk value is above a high risk threshold value, removal of the
posting can be made automatic without requiring additional
analysis.
[0007] More particularly, disclosed herein are embodiments of a
system for managing an on-line community and, particularly, for
monitoring and filtering the content of postings to an on-line
community. The system of the invention can comprise a database and
a content management system (e.g., a computer or computer
program-based web server or wireless application protocol (WAP)
server) in communication with the database.
[0008] The database can be adapted to compile and store a plurality
of metrics based on collected information related to users of a
website (e.g., information related to members of the on-line
community). This information can comprise various parameters
associated with each of the users (i.e., posters), such as age,
gender, educational background, location, length of time as a
member of the online community, etc. This information can also
include records of objectionable content postings for each of the
users onto the website and, if practicable, onto other websites.
The information can also include the social networks of each of the
users.
[0009] The content management system (e.g., web server, WAP server,
etc.) can be adapted to receive postings from users, to monitor
those postings for objectionable content, as defined by preset
standards (e.g., community standards), and to determine whether or
not to display the postings (e.g., on a community website). In
order to accomplish this, the system of the invention also
comprises both a pre-screener and a content filter, which can be
integral components of the content management system or can be
separate components in communication with the content management
system.
[0010] The pre-screener is adapted to determine, based on at least
one predetermined metric, a risk value that indicates the
likelihood that a given posting by a given user contains
objectionable content. The metric(s) that are used to determine
this risk value are not content-based, but rather are based on the
individual poster's profile and/or the social network profile of
the individual poster. Specifically, the following are examples of
metrics that may be used to determine a risk value of a given
posting by a given user: (1) a metric based on parameters
associated with the given user; (2) a metric based on a record of
objectionable content postings by the given user; (3) a metric
based on average predetermined parameters associated with other
users in a social network of the given user; and (4) a metric based
on a compiled record of objectionable content postings by other
users in a social network of the given user. A given user's social
network can be limited to other users within the online community
with which the given user has a direct relationship (e.g., other
users on the given user's contacts or friends list, other users
with which the given user has exchanged instant messages, other
users with which the given user has communicated on a forum, etc.).
The user's social network may also be expanded to include other
users with which the given user has an indirect relationship (e.g.,
contacts of contacts, friends of friends, etc.).
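By way of illustration only, the determination of a risk value from the four example metrics can be sketched as follows. The description does not specify how the metrics are combined, so the weighted average, the integer weights, the 0-100 scale, and the names `MetricScores` and `risk_value` are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class MetricScores:
    """Hypothetical per-posting metric scores, each on an assumed 0-100 scale."""
    user_parameters: float     # metric (1): parameters of the given user
    user_record: float         # metric (2): the user's record of objectionable postings
    network_parameters: float  # metric (3): average parameters of the user's social network
    network_record: float      # metric (4): compiled record of the user's social network


# Assumed relative weights; the source does not state any weighting.
DEFAULT_WEIGHTS = {
    "user_parameters": 2,
    "user_record": 4,
    "network_parameters": 1,
    "network_record": 3,
}


def risk_value(scores, weights=DEFAULT_WEIGHTS):
    """Combine the metric scores into one risk value by weighted average."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(getattr(scores, name) * w for name, w in weights.items())
    return weighted_sum / total_weight
```

A pre-screener built along these lines never inspects the posting itself; it only looks up stored scores for the poster and the poster's social network.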
[0011] The content management system (CMS) (e.g., web server, WAP
server, etc.) can further be adapted to perform different
processes, depending upon whether or not the risk value of a given
posting, as determined by the pre-screener, is below a
predetermined low risk threshold value, above a predetermined high
risk threshold value or somewhere in between. For example, to
minimize delay for low risk postings, the CMS can be adapted to
allow the given posting to be immediately displayed on the website,
if the risk value is below the predetermined low risk threshold
value. To minimize the risk of exposure of online community members
to objectionable content, the CMS can be adapted to automatically
remove a given posting from the website without further review, if
the risk value is above a predetermined high risk threshold value.
However, if the risk value is above the low risk threshold value
(e.g., between the low risk threshold value and the high risk
threshold value), the CMS can be adapted to request a posting
confirmation and/or to analyze the posting itself for objectionable
content.
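The three-way handling described above can be sketched as a small routing function. The threshold values, the `Action` names, and the treatment of a risk value exactly equal to a threshold (sent to review here) are assumptions of this sketch and are not stated in the description.

```python
from enum import Enum


class Action(Enum):
    DISPLAY = "display immediately"
    REVIEW = "request confirmation and/or analyze content"
    REMOVE = "remove automatically"


# Assumed threshold values on a 0-100 risk scale.
LOW_RISK_THRESHOLD = 20.0
HIGH_RISK_THRESHOLD = 80.0


def route_posting(risk, low=LOW_RISK_THRESHOLD, high=HIGH_RISK_THRESHOLD):
    """Choose how to handle a posting from its pre-screened risk value."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if risk < low:
        return Action.DISPLAY  # low risk: no added delay
    if risk > high:
        return Action.REMOVE   # high risk: removed without further review
    return Action.REVIEW       # in between (or equal to a threshold)
```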
[0012] Specifically, as mentioned above, the system of the
invention can comprise a content filter. This content filter can be
adapted to analyze the content of each of the postings to determine
an objectionable content score, which can optionally be weighted
based on the risk value. The content management system can further
be adapted to display or remove a posting from the website, based
on this weighted objectionable content score. The order in which
each of the postings is analyzed automatically by the content
filter or, for that matter, manually by a website administrator can
be dynamically determined by the content management system based on
the risk value.
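As a non-limiting sketch, the risk-based ordering and the risk-weighted content score might look as follows. The description says only that the order and the weighting are based on the risk value; reviewing the highest-risk postings first and scaling the score by a factor between 1 and 2 are assumptions, as are the function names.

```python
import heapq


def review_order(postings):
    """Yield posting ids for content analysis, highest risk value first.

    `postings` is an iterable of (posting_id, risk) pairs. Ties are
    broken by arrival order.
    """
    heap = [(-risk, index, posting_id)
            for index, (posting_id, risk) in enumerate(postings)]
    heapq.heapify(heap)
    while heap:
        _, _, posting_id = heapq.heappop(heap)
        yield posting_id


def weighted_content_score(content_score, risk, max_risk=100.0):
    """Scale a content filter score by an assumed risk-dependent factor.

    The factor runs from 1.0 (risk 0) to 2.0 (risk equal to max_risk).
    """
    return content_score * (1.0 + risk / max_risk)
```

The same queue could feed either an automated content filter or a website administrator's manual review list.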
[0013] Also disclosed are embodiments of a method for managing an
on-line community and, particularly, for monitoring and filtering
the content of postings to an on-line community.
[0014] The method can comprise receiving from users (e.g., from
members of an online community) postings to the on-line community
(e.g., to a community website). Then, prior to analyzing each of
the postings for objectionable content, as determined by preset
community standards, a risk value is determined for each given
posting from each given user. This risk value indicates a
likelihood that the given posting by the given user contains
objectionable content and is determined based on at least one
predetermined metric.
[0015] The metric(s) that are used to determine this risk value are
not content-based, but rather are based on the individual poster's
profile and/or the social network profile of the individual poster.
Specifically, the following are examples of metrics that may be
used to determine a risk value of a given posting by a given user:
(1) a metric based on parameters associated with the given user;
(2) a metric based on a record of objectionable content postings by
the given user; (3) a metric based on average predetermined
parameters associated with other users in a social network of the
given user; and (4) a metric based on a compiled record of
objectionable content postings by other users in a social network
of the given user. A given user's social network can be limited to
other users within the online community with which the given user
has a direct relationship (e.g., other users on the given user's
contacts or friends list, other users with which the given user has
exchanged instant messages, other users with which the given user
has communicated on a forum, etc.). The user's social network may
also be expanded to include other users with which the given user
has an indirect relationship (e.g., contacts of contacts, friends
of friends, etc.).
[0016] Once the risk value is determined, then different method
steps are performed depending upon whether or not the risk value of
a given posting is below a predetermined low risk threshold value,
above a predetermined high risk threshold value or somewhere in
between. For example, if the risk value is below the predetermined
low risk threshold value, then to minimize delay for low risk
postings, a given posting can be immediately displayed on the
website. Whereas, if the risk value is above a predetermined high
risk threshold value, then to minimize the risk of exposure of
online community members to objectionable content, a given posting
can be automatically removed from the site without further review.
However, if the risk value is above the low risk threshold value
(e.g., between the low risk threshold value and the high risk
threshold value), additional method steps can be performed.
[0017] For example, a posting confirmation can be requested from
the given user. This request can include a notice setting out the
ramifications for violations of the community standards.
Additionally, the order in which each of the postings is to be
analyzed manually (e.g., by a web administrator) and/or
automatically (e.g., by a content filter) can be dynamically
determined based on the risk value. Then, the content of each of
the postings can be analyzed to determine an objectionable content
score, which can optionally be weighted based on the risk value.
Based on this weighted objectionable content score, a final
decision can be made regarding displaying the posting or removing
it from the website.
[0018] These and other aspects of the embodiments of the invention
will be better appreciated and understood when considered in
conjunction with the following description and the accompanying
drawings. It should be understood, however, that the following
descriptions, while indicating preferred embodiments of the
invention and numerous specific details thereof, are given by way
of illustration and not of limitation. Many changes and
modifications may be made within the scope of the embodiments of
the invention without departing from the spirit thereof, and the
embodiments of the invention include all such modifications.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] The embodiments of the invention will be better understood
from the following detailed description with reference to the
drawings, in which:
[0020] FIG. 1 is a schematic box diagram illustrating an embodiment
of a system of the invention;
[0021] FIG. 2 is a flow diagram illustrating an embodiment of a
method of the invention; and
[0022] FIG. 3 is a schematic representation of a computer system
suitable for implementing the method of the invention as described
herein.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0023] The embodiments of the invention and the various features
and advantageous details thereof are explained more fully with
reference to the non-limiting embodiments that are illustrated in
the accompanying drawings and detailed in the following
description. It should be noted that the features illustrated in
the drawings are not necessarily drawn to scale. Descriptions of
well-known components and processing techniques are omitted so as
to not unnecessarily obscure the embodiments of the invention. The
examples used herein are intended merely to facilitate an
understanding of ways in which the embodiments of the invention may
be practiced and to further enable those of skill in the art to
practice the embodiments of the invention. Accordingly, the
examples should not be construed as limiting the scope of the
embodiments of the invention.
[0024] As mentioned above, online communities allow groups of
people to communicate and interact via various online media, such
as blogs, wikis, internet forums, chat rooms, instant messaging,
electronic mail lists, etc. Each of these online communities may
have its own community standards related to the content of online
media postings. For example, as with real world media, such online
media may have standards to prevent the dissemination of
objectionable content as defined (preset, predetermined) by each
community's standards. However, as these online communities
continue to increase in size, workflows and methods are needed to ensure
that community standards are maintained. For the most part, the
filtering methods used are limited to contextual-based
applications, such as image or text analysis of each posting's
content. However, the challenge of identifying and dealing with
objectionable information postings to online communities differs
from traditional content identification problems in that the
figures of merit are based on "Delay" and "Bad Information On Site"
(BIoS) time, rather than traditional precision and recall. More
particularly, content of one of two types (i.e., good content
(C_g) and bad content (C_b)) can be posted to a web site at
time T_p and can be displayed at time T_d. If the content
is objectionable, it can be removed from the site at time T_r.
The cost of the solution can be defined as follows. The delay cost
is

$$\mathrm{Delay}_g = \sum_{C_g} (T_d - T_p)$$

without consideration of bad content. The cost of Bad Information
(i.e., objectionable content) on the web site (BIoS) is

$$\mathrm{BIoS} = \sum_{C_b} \mathrm{Badness} \cdot (T_r - T_d)$$

where Badness is an indication of how much of a problem having the
objectionable content posted will cause among the members. For
simplicity, Badness can be based on lack of unity among the members
of the online community. The goal of a web site managing system and
method should be to reduce both Delay_g and BIoS as much as
possible. The relative importance of these two will vary based on
the application.
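For illustration, the two figures of merit can be computed from a log of postings as sketched below. The sketch reads the delay cost as the sum of (display time minus posting time) over good content, and BIoS as the sum of Badness times (removal time minus display time) over bad content. The `Posting` record, its field names, and the default Badness of 1.0 are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Posting:
    posted_at: float                    # T_p
    displayed_at: float                 # T_d
    objectionable: bool                 # True for bad content (C_b), False for good (C_g)
    removed_at: Optional[float] = None  # T_r, set only if the posting was removed
    badness: float = 1.0                # assumed per-posting Badness


def delay_cost(postings):
    """Delay_g: total time good content waited between posting and display."""
    return sum(p.displayed_at - p.posted_at
               for p in postings if not p.objectionable)


def bios_cost(postings):
    """BIoS: Badness-weighted time that bad content stayed on the site."""
    return sum(p.badness * (p.removed_at - p.displayed_at)
               for p in postings
               if p.objectionable and p.removed_at is not None)
```

Displaying every posting immediately drives the delay cost to zero but leaves BIoS dependent on how quickly bad content is found, which is the trade-off the pre-screening step is meant to address.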
[0025] In attempting to meet this goal, it has been determined that
the probability of objectionable content being uploaded onto a web
site (e.g., into an online community forum) can be estimated
without deep analysis of the information, based on prior behavior
of the user and/or the social network of the user. That is,
historically well-behaved users (i.e., users that historically post
unobjectionable content) and historically misbehaved users (i.e.,
users that historically post objectionable content) tend to
maintain their pattern of behavior. Additionally, both well-behaved
users and misbehaved users tend to belong to independent social
networks and the individuals within any given social network tend
to have similar behavioral patterns. As a result, the likelihood
that an individual will be a well-behaved or misbehaved user can be
inferred from the historic behavior of the user as well as the
behavior of others in that user's social network. Additionally,
some classes of users (e.g., based on age, gender, length of time
as member of the online community, etc.) are more or less likely to
offend. For example, those who have been members of an online
community for a relatively long period of time are less likely to
post objectionable content, than those that have been members for a
relatively short period of time.
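One simple way to turn historic behavior into an estimate is a smoothed offense rate, sketched below. The description does not prescribe any estimator; the smoothing scheme, the prior values, and the function names are assumptions chosen so that a new user with no history starts at the prior rate instead of at zero.

```python
def offense_rate(objectionable, total, prior_rate=0.05, prior_weight=10.0):
    """Smoothed fraction of postings that were objectionable.

    `prior_rate` and `prior_weight` are assumed values: the estimate
    behaves as if `prior_weight` earlier postings had been observed
    with an objectionable fraction of `prior_rate`.
    """
    if objectionable < 0 or total < 0 or objectionable > total:
        raise ValueError("need 0 <= objectionable <= total")
    return (objectionable + prior_rate * prior_weight) / (total + prior_weight)


def network_offense_rate(histories, prior_rate=0.05, prior_weight=10.0):
    """Pooled smoothed rate over the other users in a social network.

    `histories` is an iterable of (objectionable, total) pairs, one per
    network member.
    """
    histories = list(histories)
    pooled_bad = sum(bad for bad, _ in histories)
    pooled_total = sum(total for _, total in histories)
    return offense_rate(pooled_bad, pooled_total, prior_rate, prior_weight)
```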
[0026] In view of the foregoing, disclosed herein are embodiments of a
system and an associated method for managing an on-line community
and, particularly, for monitoring and filtering the content of
electronic postings to an on-line community (e.g., to a community
website). The system and method incorporate a pre-screening process
in which individual postings are pre-screened based on one or more
predetermined metrics (i.e., scores) to determine a risk value
indicative of the likelihood that a posting contains objectionable
content. These metrics or scores are not content based. Rather,
they can be based on the profile of an individual poster, including
various parameters of the poster and/or the poster's record of
objectionable content postings. These metrics can also be based on
the social network profile of an individual poster, including the
average of various parameters of other users in the poster's social
network and/or a compiled record of objectionable content postings
of other users in the poster's social network. If the risk value is
relatively low (e.g., below a low risk threshold value), the
posting can be displayed on the website immediately, thereby
keeping delay low. If the risk value is relatively high (e.g.,
above the low threshold value), display of the posting can be
delayed until further automated and/or manual content analysis is
completed. Finally, if the risk value is above a high risk
threshold value, removal of the posting can be made automatic
without requiring additional analysis.
[0027] More particularly, referring to FIG. 1, disclosed herein are
embodiments of a system 100 for managing an on-line community 110
(such as a website containing blogs, wikis, forums, chat rooms,
etc.) and, particularly, for monitoring and filtering the content
of electronic postings to an on-line community. The system 100 can
comprise a database 130 and a content management system 120 (e.g.,
a computer or computer program based web server or wireless
application protocol (WAP) server).
[0028] The database 130 can be adapted to compile and store a
plurality of metrics based on collected information related to
users of the on-line community (e.g., information related to
members of the on-line community). This information can comprise
various parameters associated with each of the users (i.e.,
posters), such as age, gender, educational background, location,
etc. This information can also include records of objectionable
content postings for each of the users onto the website and, if
practicable, onto other websites. The information can also include
the social networks of each of the users.
[0029] The information that is stored in the database 130 can be
collected using known techniques. For example, user parameters and
social networks (e.g., friend lists, contact lists, etc.) can be
input and periodically updated by users via remote computers
102a-b. Alternatively, user parameters can be determined by
conventional mining applications that are adapted to scan the
contents of the website for such information. Similarly, social
network information can be determined by applications adapted to
maintain records of online communications between users (e.g.,
records of instant messaging, records of forum discussions, etc.)
and to determine direct and indirect relationships based on those
records.
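A minimal sketch of deriving direct and indirect relationships from communication records follows. It assumes each record is an unordered pair of user identifiers, treats one recorded exchange as a direct relationship, and treats users reachable in two or more hops (up to an assumed `max_hops` limit) as indirect relationships. The function names are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(records):
    """Build an undirected graph from (user_a, user_b) communication records."""
    graph = defaultdict(set)
    for a, b in records:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def relationships(graph, user, max_hops=2):
    """Return (direct, indirect) sets of users related to `user`.

    Direct users are one hop away; indirect users are between two and
    `max_hops` hops away (e.g., contacts of contacts).
    """
    distance = {user: 0}
    queue = deque([user])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    direct = {u for u, d in distance.items() if d == 1}
    indirect = {u for u, d in distance.items() if d > 1}
    return direct, indirect
```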
[0030] The content management system 120 can be in communication
with the database 130 and can be adapted to receive electronic
postings (e.g., text, images, video, audio, etc. postings) from
users, to monitor those postings for objectionable content, as
defined by preset standards (e.g., community standards), and to
determine whether or not to display the postings to the on-line
community 110 (e.g., on the community website). In order to
accomplish this, the system 100 of the invention can comprise both
a pre-screener 121 and a content filter 122. Those skilled in the
art will recognize that the pre-screener 121 and content filter 122
can be integral components of the content management system 120, as
shown, or can comprise separate components in communication with
the content management system 120.
[0031] The pre-screener 121 is adapted to determine, based on at
least one predetermined metric, a risk value that indicates the
likelihood that a given posting by a given user contains
objectionable content. The metric(s) are scores that are used to
determine the risk value. These metrics are not content-based, but
rather are based on the individual poster's profile and/or the
social network profile of the individual poster. That is, a points
system can be predetermined, wherein more points will be assigned
(e.g., on a scale of 1-100, or any other scale) to information
about a given user (or about that given user's social network) if
the information predicts objectionable content postings by the
user. Specifically, the following are examples of metrics or scores
that may be used to determine a risk value of a given posting by a
given user: (1) a metric based on parameters associated with the
given user (e.g., a male user age 16-24 may be more likely to post
objectionable content than a female user 75-90 and thus such young
male user would receive a relatively higher score based on user
parameters than an older female); (2) a metric based on a record of
objectionable content postings by the given user (e.g., a user that
has posted a number of objectionable postings in the past is more
likely to post objectionable postings in the future and thus such a
user would receive a relatively higher score); (3) a metric based
on average predetermined parameters associated with other users in
a social network of the given user (e.g., a male in a social
network with mostly other males having an average age between 16
and 24 may be more likely to post objectionable content than a
female in a social network with mostly other females having an
average age between 75 and 90 and thus such a male user would
receive a relatively higher score); and (4) a metric based on a
compiled record of objectionable content postings by other users in
a social network of the given user (e.g., a user that is in a
social network with other users that regularly post objectionable
postings is more likely to post objectionable postings than a user
that associates with other users that do not post objectionable postings
and thus such a user would receive a relatively higher score). It
should be noted that a given user's social network can be limited
to other users within the online community with which the given
user has a direct relationship (e.g., other users on the given
user's contacts or friends list, other users with which the given
user has exchanged instant messages, other users with which the
given user has communicated on a forum, etc.). The user's social
network may also be expanded to include other users with which the
given user has an indirect relationship (e.g., contacts of
contacts, friends of friends, etc.).
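The following Python sketch illustrates one way a pre-screener could
turn the four metrics into a single risk value. The Metrics class, the
0-100 scale, and the unweighted mean are assumptions chosen for
illustration; the description requires only that the risk value be
based on at least one predetermined metric and does not prescribe how
several metrics are combined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metrics:
    """Non-content-based scores, each on an assumed 0-100 scale."""
    user_parameters: Optional[float] = None      # metric (1)
    user_record: Optional[float] = None          # metric (2)
    network_parameters: Optional[float] = None   # metric (3)
    network_record: Optional[float] = None       # metric (4)


def risk_value(m: Metrics) -> float:
    """Combine whichever metrics are available into one risk value.

    Assumption: an unweighted mean of the supplied scores. A deployed
    system might instead weight the metrics differently.
    """
    scores = [
        s
        for s in (m.user_parameters, m.user_record,
                  m.network_parameters, m.network_record)
        if s is not None
    ]
    if not scores:
        raise ValueError("at least one metric is required")
    return sum(scores) / len(scores)
```

For example, a user whose own record scores 80 and whose social
network's compiled record scores 40 would receive a risk value of 60
under this assumed averaging.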
[0032] The content management system (CMS) 120 can further be
adapted to perform different processes, depending upon whether or
not the risk value of a given posting, as determined by the
pre-screener 121, is below a predetermined low risk threshold
value, above a predetermined high risk threshold value or somewhere
in between. For example, to minimize delay for low risk postings,
the CMS 120 can be adapted to allow the given posting to be
immediately displayed to the on-line community 110 (e.g., displayed
on a website), if the risk value is below the predetermined low
risk threshold value. To minimize the risk of exposure of online
community members to objectionable content, the CMS 120 can be
adapted to automatically remove a given posting from the on-line
community 110 (e.g., from the website) without further review, if
the risk value is above a predetermined high risk threshold value.
However, if the risk value is above the low risk threshold value
(e.g., between the low risk threshold value and the high risk
threshold value), the CMS 120 can be adapted to request a posting
confirmation and/or to analyze the posting itself for objectionable
content.
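The three-way decision described above can be sketched in Python as
follows. The names Action and route, and the default threshold values
of 30 and 90, are hypothetical; the description states only that a low
risk threshold value and a high risk threshold value are predetermined.
Treating a risk value exactly equal to a threshold as belonging to the
middle band is likewise an assumption.

```python
from enum import Enum


class Action(Enum):
    DISPLAY = "display immediately"
    REVIEW = "request confirmation and/or analyze content"
    REMOVE = "remove without further review"


def route(risk: float, low: float = 30.0, high: float = 90.0) -> Action:
    """Choose a process for a posting from its pre-screened risk value."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if risk < low:
        return Action.DISPLAY   # minimizes delay for low risk postings
    if risk > high:
        return Action.REMOVE    # minimizes exposure to objectionable content
    return Action.REVIEW        # between the two thresholds
```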
[0033] Specifically, as mentioned above, the system 100 of the
invention can comprise a content filter 122. This content filter
122 can be adapted to analyze the content of each of the postings
to determine an objectionable content score. For example, the
content filter 122 can be implemented using any conventional
training based classifier, such as a naive Bayes classifier or a
similarity-based (SB) classifier. However, the score can optionally
also be weighted based on the risk value. Thus, scoring of uploaded
content can be accomplished using a weighted fusion of risk value
scores (e.g., based on a user's individual behavior or social
network) combined with techniques using analytics based on content
analysis (e.g., text, image, video, and/or voice analysis) to
determine the probability that the posting contains objectionable
material. The CMS 120 can further be adapted to display or remove a
posting from the on-line community 110 (e.g., a community website),
based on this weighted objectionable content score. Additionally,
the order in which each of the postings is analyzed by the content
filter or, for that matter, manually by a website administrator can
be dynamically determined by the CMS 120 based on the risk
value.
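As an illustration of a training based classifier of the kind
mentioned above, the following Python sketch implements a small
two-class naive Bayes text scorer with add-one smoothing. The class
name NaiveBayesFilter, the whitespace tokenization, and the smoothing
choices are assumptions made for brevity; they do not represent the
actual implementation of the content filter 122, and a practical
filter would also handle images, video, and audio.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Two-class multinomial naive Bayes with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text: str, objectionable: bool) -> None:
        """Record one labeled example posting."""
        self.word_counts[objectionable].update(text.lower().split())
        self.doc_counts[objectionable] += 1

    def score(self, text: str) -> float:
        """Return the estimated P(objectionable | text), in [0, 1]."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        log_post = {}
        for label in (True, False):
            total_words = sum(self.word_counts[label].values())
            # Smoothed log prior, so an unseen class never yields log(0).
            lp = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # The extra +1 reserves probability mass for unknown words.
                lp += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_post[label] = lp
        # Normalize the two log scores into a probability.
        top = max(log_post.values())
        weights = {k: math.exp(v - top) for k, v in log_post.items()}
        return weights[True] / (weights[True] + weights[False])
```

The resulting objectionable content score could then be weighted by
the risk value before a display-or-remove decision is made.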
[0034] Thus, in operation, posts to on-line communities 110 enter
the system 100 from a number of sources 102a-c. Some of these posts
may contain unobjectionable material and some may contain
objectionable material, as defined by preset community standards.
Each of the posts will be pre-screened by the pre-screener 121 and
a decision is made as to the degree of risk (i.e., the risk value)
of each particular post. If the risk is low, the content can be
displayed to the on-line community 110 (e.g. placed on the
community website) immediately, keeping Delay low. If the content
needs further evaluation and review (i.e., the risk is higher than
a threshold value), display of the content can be delayed and the
posting can be subjected to more rigorous computation (e.g., by the
content filter 122) and/or human review prior to displaying it,
keeping BIoS low.
[0035] Referring to FIG. 2, also disclosed are embodiments of a
method for managing an on-line community and, particularly, for
monitoring and filtering the content of electronic postings in
on-line communities.
[0036] The method can comprise receiving from users (e.g., from
members of an online community) electronic postings (e.g., text,
video, images, audio, etc.) (202). These postings can be, for
example, to a community website containing blogs, wikis, forums,
chat rooms, etc. Then, prior to analyzing each of the postings for
objectionable content, as determined by preset community standards,
a risk value is determined for each given posting from each given
user (204). This risk value indicates the likelihood that the given
posting by the given user contains objectionable content and is
determined based on at least one predetermined metric.
[0037] The metric(s) that are used to determine this risk value are
not content-based, but rather are based on the individual poster's
profile and/or the social network profile of the individual poster.
Specifically, the following are examples of metrics that may be
used to determine a risk value of a given posting by a given user:
(1) a metric based on parameters associated with the given user
(205); (2) a metric based on a record of objectionable content
postings by the given user (206); (3) a metric based on average
predetermined parameters associated with other users in a social
network of the given user (207); and (4) a metric based on a
compiled record of objectionable content postings by other users in
a social network of the given user (208). A given user's social
network can be limited to other users within the online community
with which the given user has a direct relationship (e.g., a first
layer of relationships--other users on the given user's contacts or
friends list, other users with which the given user has exchanged
instant messages, other users with which the given user has
communicated on a forum, etc.). The user's social network may also
be expanded to include other users with which the given user has an
indirect relationship (e.g., second layer of
relationships--contacts of contacts, friends of friends, etc.).
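The first and second layers of relationships can be illustrated with
the short Python sketch below. The function name network_layers and
the representation of the community as a mapping from each user to a
set of direct contacts are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def network_layers(contacts: Dict[str, Set[str]],
                   user: str) -> Tuple[Set[str], Set[str]]:
    """Return (first layer, second layer) of a user's social network.

    First layer: users with a direct relationship to `user`.
    Second layer: contacts of those contacts, excluding the user
    and anyone already in the first layer.
    """
    first = set(contacts.get(user, set())) - {user}
    second: Set[str] = set()
    for friend in first:
        second |= contacts.get(friend, set())
    second -= first | {user}
    return first, second
```

Metrics (207) and (208) could then be computed over the first layer
alone or over both layers, depending on how far the social network is
expanded.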
[0038] Once the risk value is determined, then different method
steps are performed depending upon whether or not the risk value of
a given posting is below a predetermined low risk threshold value,
above a predetermined high risk threshold value or somewhere in
between (210). For example, if the risk value is below the
predetermined low risk threshold value, then to minimize delay for
low risk postings, a given posting can be immediately displayed to
the on-line community (e.g., on the community website) (212).
Whereas, if the risk value is above a predetermined high risk
threshold value, then to minimize the risk of exposure of online
community members to objectionable content, a given posting can be
automatically removed from the website without further review
(214). However, if the risk value is above the low risk threshold
value (e.g., between the low risk threshold value and the high risk
threshold value), additional method steps can be performed
(215-219).
[0039] For example, a posting confirmation can be requested from
the given user (215). This request can include a notice setting out
the ramifications for violations of the community standards. That
is, if a posting appears suspicious to the initial set of filters
(i.e., has a relatively high risk value), the poster can be asked
"are you sure?" with some kind of a notation that offenders will be
dealt with harshly. This confirmation is roughly analogous to the
theory that people are less likely to vandalize a subway stop with
a closed circuit TV visible.
[0040] Additionally, the order in which each of the postings is to
be analyzed manually (e.g., by a web administrator) and/or
automatically (e.g., by a content filter) can be dynamically
determined based on the risk value (216-217). Such dynamic ordering
may allow the human or automated analyzer to focus on the
higher-risk content first, while the low-risk, already displayed
content is reviewed as time permits. This ordering process can
allow a fixed number of human analyzers to be maximally effective
in reducing BIoS.
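One way to realize such dynamic ordering is a priority queue keyed on
the risk value, sketched below in Python. The ReviewQueue class and
its method names are hypothetical, and breaking ties in arrival order
is an assumption; the description states only that the order of
analysis is determined based on the risk value.

```python
import heapq
import itertools
from typing import List, Tuple


class ReviewQueue:
    """Hands out postings for review, highest risk value first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._arrival = itertools.count()

    def add(self, posting_id: str, risk: float) -> None:
        # heapq is a min-heap, so the risk is negated to pop the highest
        # risk first; the arrival counter keeps equal risks in order.
        heapq.heappush(self._heap, (-risk, next(self._arrival), posting_id))

    def next_posting(self) -> str:
        """Remove and return the highest-risk posting still waiting."""
        _, _, posting_id = heapq.heappop(self._heap)
        return posting_id

    def __len__(self) -> int:
        return len(self._heap)
```

Because new postings can be added at any time, a high-risk posting
that arrives late is still reviewed ahead of lower-risk postings that
are already waiting.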
[0041] During the analysis process (218), the content of each of
the postings can be analyzed to determine an objectionable content
score. For example, a conventional training based classification
technique, such as a naive Bayes classification technique or a
similarity-based (SB) classification technique, can be used to
determine a score that indicates the probability that the content
of the posting is objectionable, based on information contained in
the posting (e.g., based on an analysis of the text, images,
videos, and/or voices contained in the posting). This score can
optionally also be weighted based on the previously determined risk
value (219). For example, the following exemplary formula can be
applied: POC=0.5*I0+0.33*<I1>+0.17*<I2>, where the
probability of objectionable content is weighted by 50% of the
user's behavior (I0), 33% of the average score (I1) of the user's
first layer social network (e.g., contact list), and 17% of the
average score (I2) of the user's second layer social network (e.g.,
contacts of contacts). Thus, scoring of uploaded content can be
accomplished using a weighted fusion of a risk value (e.g., a value
based on a user's individual behavior or social network) combined
with a score based on content analysis (e.g., text, image, video,
and/or voice analysis) to determine the probability that the
posting contains objectionable material. Based on this weighted
objectionable content score, a final decision can be made regarding
displaying the posting or removing it from the website (220).
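The exemplary formula above can be worked through with the following
Python sketch. The function name poc, the assumption that all scores
lie between 0 and 1, and the treatment of an empty social network
layer as contributing zero are illustrative choices not stated in the
description; the weights 0.5, 0.33, and 0.17 are those of the
exemplary formula.

```python
from typing import Sequence

# Weights from the exemplary formula: POC = 0.5*I0 + 0.33*<I1> + 0.17*<I2>
W_USER, W_FIRST, W_SECOND = 0.5, 0.33, 0.17


def _mean(scores: Sequence[float]) -> float:
    # Assumption: a layer with no users contributes a score of zero.
    return sum(scores) / len(scores) if scores else 0.0


def poc(i0: float,
        first_layer: Sequence[float],
        second_layer: Sequence[float]) -> float:
    """Probability of objectionable content per the exemplary formula.

    i0           -- score for the user's own behavior (I0)
    first_layer  -- scores of the user's direct contacts (averaged as <I1>)
    second_layer -- scores of contacts of contacts (averaged as <I2>)
    """
    return (W_USER * i0
            + W_FIRST * _mean(first_layer)
            + W_SECOND * _mean(second_layer))
```

For instance, with I0 = 0.8, first-layer scores of 0.4 and 0.6
(average 0.5), and second-layer scores of 0.1 and 0.3 (average 0.2),
the result is 0.5*0.8 + 0.33*0.5 + 0.17*0.2 = 0.599.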
[0042] The embodiments of the invention can take the form of an
entirely hardware embodiment, an entirely software embodiment or an
embodiment including both hardware and software elements. In a
preferred embodiment, the invention is implemented in software,
which includes but is not limited to firmware, resident software,
microcode, etc.
[0043] Furthermore, the embodiments of the invention can take the
form of a computer program product accessible from a
computer-usable or computer-readable medium providing program code
for use by or in connection with a computer or any instruction
execution system. For the purposes of this description, a
computer-usable or computer readable medium can be any apparatus
that can comprise, store, communicate, propagate, or transport the
program for use by or in connection with the instruction execution
system, apparatus, or device.
[0044] The medium can be an electronic, magnetic, optical,
electromagnetic, infrared, or semiconductor system (or apparatus or
device) or a propagation medium. Examples of a computer-readable
medium include a semiconductor or solid state memory, magnetic
tape, a removable computer diskette, a random access memory (RAM),
a read-only memory (ROM), a rigid magnetic disk and an optical
disk. Current examples of optical disks include compact disk-read
only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
[0045] A data processing system suitable for storing and/or
executing program code will include at least one processor coupled
directly or indirectly to memory elements through a system bus. The
memory elements can include local memory employed during actual
execution of the program code, bulk storage, and cache memories
which provide temporary storage of at least some program code in
order to reduce the number of times code must be retrieved from
bulk storage during execution.
[0046] Input/output (I/O) devices (including but not limited to
keyboards, displays, pointing devices, etc.) can be coupled to the
system either directly or through intervening I/O controllers.
Network adapters may also be coupled to the system to enable the
data processing system to become coupled to other data processing
systems or remote printers or storage devices through intervening
private or public networks. Modems, cable modems and Ethernet cards
are just a few of the currently available types of network
adapters.
[0047] A representative hardware environment for practicing the
embodiments of the invention is depicted in FIG. 3. This schematic
drawing illustrates a hardware configuration of an information
handling/computer system in accordance with the embodiments of the
invention. The system comprises at least one processor or central
processing unit (CPU) 10. The CPUs 10 are interconnected via system
bus 12 to various devices such as a random access memory (RAM) 14,
read-only memory (ROM) 16, and an input/output (I/O) adapter 18.
The I/O adapter 18 can connect to peripheral devices, such as disk
units 11 and tape drives 13, or other program storage devices that
are readable by the system. The system can read the inventive
instructions on the program storage devices and follow these
instructions to execute the methodology of the embodiments of the
invention. The system further includes a user interface adapter 19
that connects a keyboard 15, mouse 17, speaker 24, microphone 22,
and/or other user interface devices such as a touch screen device
(not shown) to the bus 12 to gather user input. Additionally, a
communication adapter 20 connects the bus 12 to a data processing
network 25, and a display adapter 21 connects the bus 12 to a
display device 23 which may be embodied as an output device such as
a monitor, printer, or transmitter, for example.
[0048] In view of the foregoing, disclosed herein are embodiments of a
system and an associated method for managing an on-line community
and, particularly, for monitoring and filtering the content of
electronic postings to on-line communities. The system and method
incorporate a pre-screening process in which individual postings
are pre-screened based on one or more predetermined metrics (i.e.,
scores) to determine a risk value indicative of the likelihood that
a posting contains objectionable content. These metrics or scores
are not based on content analytics. Rather, they are based on the
profile of an individual poster, including various parameters of
the poster and/or the poster's record of objectionable content
postings. These metrics can also be based on the social network
profile of an individual poster, including the average of various
parameters of other users in the poster's social network and/or a
compiled record of objectionable content postings of other users in
the poster's social network. If the risk value is relatively low
(e.g., below a low risk threshold value), the posting can be
displayed to the on-line community (e.g., on the community website)
immediately, thereby keeping delay low. If the risk value is
relatively high (e.g., above the low threshold value), display of
the posting can be delayed until further automated and/or manual
content analysis is completed. Finally, if the risk value is above
a high risk threshold value, removal of the posting can be made
automatic without requiring additional analysis.
[0049] The foregoing description of the specific embodiments will
so fully reveal the general nature of the invention that others
can, by applying current knowledge, readily modify and/or adapt for
various applications such specific embodiments without departing
from the generic concept, and, therefore, such adaptations and
modifications should and are intended to be comprehended within the
meaning and range of equivalents of the disclosed embodiments. It
is to be understood that the phraseology or terminology employed
herein is for the purpose of description and not of limitation.
Therefore, those skilled in the art will recognize that the
embodiments of the invention can be practiced with modification
within the spirit and scope of the appended claims.
* * * * *