U.S. patent application number 14/303621 was filed with the patent office on June 13, 2014 and published on 2014-12-18 for a method and system for detection, classification and prediction of user behavior trends.
The applicant listed for this patent is Flytxt B.V. Invention is credited to Santanu Chaudhury, Noopur Jain, Prateek Kapadia, Jobin Wilson.
Application Number: 14/303621
Publication Number: 20140372175
Family ID: 51207626
Publication Date: 2014-12-18
United States Patent Application 20140372175
Kind Code: A1
Jain; Noopur; et al.
December 18, 2014
Method and system for detection, classification and prediction of
user behavior trends
Abstract
A method and system for detection, classification and prediction
of user behavior trends using correspondence analysis is disclosed.
The method and system reduce the n-dimensional feature space to a
lower dimensional space for easier processing, improved quality of
emerging clusters and superior prediction accuracy. The method
applies correspondence analysis so that each user is assigned a new
coordinate in the lower dimension that preserves the similarities,
differences and relationships between the variables. Once the
correspondence analysis is completed, the coordinates are clustered
or grouped based on the similar trends of the users. Unlabeled
cluster members are then assigned class membership proportional to
the labeled samples in the cluster. Finally, the method predicts the
future actions of the users based on the past trends observed in the
labeled clusters.
Inventors:
Jain; Noopur (Bahraich, IN)
Chaudhury; Santanu (New Delhi, IN)
Wilson; Jobin (Kerala, IN)
Kapadia; Prateek (Mumbai, IN)
Applicant: Flytxt B.V.
Filed: June 13, 2014
Current U.S. Class: 705/7.31
Current CPC Class: H04W 84/047 20130101; H04W 76/12 20180201;
G06Q 30/0202 20130101
Class at Publication: 705/7.31
International Class: G06Q 30/02 20060101
Foreign Application Data (Date; Code; Application Number):
Jun 13, 2013; IN; 258/CHE/2013
Claims
1. A method for detection of user behaviour trends, wherein the
method comprises performing pre-processing and feature selection
on raw data by a cluster master, wherein the raw data comprises
data related to temporal behaviour of a user; obtaining trend data
from the raw data by the cluster master; reducing dimensionality of
the raw data by the cluster master to a lower dimension using
correspondence analysis, wherein the data with the lower dimension
causes users with similar behaviour to be closer to each other than
those who are dissimilar; performing clustering on the data with
the lower dimension by the cluster master based on attributes of
the user; and assigning at least one label to the clustered data by
the cluster master.
2. The method, as claimed in claim 1, wherein pre-processing and
feature selection on the raw data comprises determining the
attributes of the users.
3. The method, as claimed in claim 1, wherein trend data comprises
behaviour that changes over time.
4. The method, as claimed in claim 1, wherein assigning at least
one label to the clustered data is based on label information of
previous users according to actions taken by the users
previously.
5. The method, as claimed in claim 1, wherein the method further
comprises predicting future actions of the users based on the
labeled clustered data.
6. The method, as claimed in claim 5, wherein the method further
comprises augmenting the predictions of future actions by
generating confidence measures based on class membership
proportional to the labeled clustered data.
7. The method, as claimed in claim 1, wherein the method further
comprises applying association rule mining on the clustered data
to discover at least one rule; and using the at least one
discovered rule for user targeted applications.
8. The method, as claimed in claim 7, wherein applying association
rule mining on the clustered data to discover at least one rule
comprises finding relationships between features of users in a
cluster, features of users who were previously converted by
historical campaigns and features of previous campaigns themselves;
mining underlying rules in the clustered data; and discovering
defining attributes of each campaign, relationship of attributes of
each campaign, other attributes of the campaign and previously
converted users.
9. The method, as claimed in claim 1, wherein the method further
comprises detecting unusual events based on the raw data.
10. The method, as claimed in claim 1, wherein the raw data is at
least one of: numerical multinomial data; and an array having
n dimensions, wherein the raw data comprises continuous individual
features.
11. A computer program product comprising computer executable
program code recorded on a computer readable non-transitory storage
medium, said computer executable program code, when executed,
causing a method for detection, classification and prediction of
user behaviour trends, comprising: performing pre-processing and
feature selection on raw data, wherein the raw data comprises
data related to temporal behaviour of a user; obtaining trend data
from the raw data; reducing dimensionality of the raw data to a
lower dimension using correspondence analysis, wherein the data
with the lower dimension causes users with similar behaviour to be
closer to each other than those who are dissimilar; performing
clustering on the data with the lower dimension based on attributes
of the user; and assigning at least one label to the clustered
data.
12. The computer program product, as claimed in claim 11, wherein
pre-processing and feature selection on the raw data comprises
determining the attributes of the users.
13. The computer program product, as claimed in claim 11, wherein
trend data comprises behaviour that changes over time.
14. The computer program product, as claimed in claim 11, wherein
assigning at least one label to the clustered data is based on
label information of previous users according to actions taken by
the users previously.
15. The computer program product, as claimed in claim 11, wherein
the method further comprises predicting future actions of the
users based on the labeled clustered data.
16. The computer program product, as claimed in claim 15, wherein
the method further comprises augmenting the predictions of
future actions by generating confidence measures based on class
membership proportional to the labeled clustered data.
17. The computer program product, as claimed in claim 11, wherein
the method further comprises applying association rule mining on
the clustered data to discover at least one rule; and using the at
least one discovered rule for user targeted applications.
18. The computer program product, as claimed in claim 17, wherein
applying association rule mining on the clustered data to discover
at least one rule comprises finding relationships between
features of users in a cluster, features of users who were
previously converted by historical campaigns and features of
previous campaigns themselves; mining underlying rules in the
clustered data; and discovering defining attributes of each
campaign, relationship of attributes of each campaign, other
attributes of the campaign and previously converted users.
19. The computer program product, as claimed in claim 11, wherein
the method further comprises detecting unusual events based on
the raw data.
20. The computer program product, as claimed in claim 11, wherein
the raw data is at least one of: numerical multinomial data; and an
array having n dimensions, wherein the raw data comprises
continuous individual features.
Description
[0001] The present application is based on, and claims priority
from, IN Application Number 2581/CHE/2013, filed on 13 Jun. 2013,
the disclosure of which is hereby incorporated by reference
herein.
TECHNICAL FIELD
[0002] Embodiments herein relate to the field of predictive
analytics and, more particularly, to a method and system for
detection, classification and prediction of behaviour trends using
correspondence analysis.
BACKGROUND
[0003] In competitive business environments, companies frequently
desire to forecast events that influence business metrics and
performance indicators. Indeed, such an ability is often important
for effective decision making. Information obtained from accurate
event forecasts results in more efficient operations and cost
savings for the business. For example, a business that forecasts
particular requirements in the near future can make profitable
adjustments to its business practices based on this information. As
another example, if a business can accurately predict potential
failures or inefficiencies in its business processes, the
requirements can be analyzed to mitigate such failures.
[0004] By recognizing future trends, companies can potentially
increase efficiency and gain competitive advantage. Accurate
recognition of such trends also results in significant cost savings
and improved business processes.
[0005] In certain business applications, there are many situations
in which the behavior of users should be predicted and analyzed so
that actions can be taken according to behavioral trends. The
events generated by users are sources of valuable information about
their behavior, interactions and preferences, as well as temporal
changes in their behavior and preferences. Currently, marketers are
unable to take advantage of the large amounts of user data that are
available. This prevents service providers and marketers from
providing accurate service personalization, customized personal
offers and similar services based on user behavior trends. For
large data sets, it would be complex and expensive to predict the
behavior of each and every user at an individual level.
[0006] Existing methods of trend recognition and prediction based on
numerical time series data operate on individual users, treating
each user as an independent entity. Representing and grouping
millions of users (for example, users in a telecommunications
network) based on such time-series data is expensive in terms of
space and time complexity. Existing systems lack a mechanism for a
low-dimensional representation of the time series that captures the
global trending pattern of a data set.
BRIEF DESCRIPTION OF THE FIGURES
[0007] Embodiments herein are illustrated in the accompanying
drawings, throughout which like reference letters indicate
corresponding parts in the various figures. The embodiments herein
will be better understood from the following description with
reference to the drawings, in which:
[0008] FIG. 1 illustrates an overview for detection and
classification of user behavior trends using correspondence
analysis, according to the embodiments as disclosed herein;
[0009] FIG. 2 illustrates a flow diagram explaining the various
steps involved in predicting the user behavior trends using the
correspondence analysis, according to the embodiments as disclosed
herein;
[0010] FIG. 3 depicts the process of reducing the dimensions of
data, according to embodiments as disclosed herein;
[0011] FIG. 4 depicts the process of clustering, according to
embodiments as disclosed herein;
[0012] FIG. 5 is a flowchart illustrating the process of optimizing
campaigns and performing product bundling for a user based on
clusters, according to embodiments as disclosed herein;
[0013] FIG. 6 is a graph showing the representation of users in a
low dimensional feature space, according to the embodiments as
disclosed herein;
[0014] FIG. 7 is a graph showing the grouping of users having
similar trends over certain time period, according to the
embodiments as disclosed herein; and
[0015] FIG. 8 illustrates a computing environment implementing the
method and system for detection and classification of user behavior
trends using correspondence analysis, according to the embodiments
as disclosed herein.
DETAILED DESCRIPTION OF EMBODIMENTS
[0016] The embodiments herein and the various features and
advantageous details thereof are explained more fully with
reference to the non-limiting embodiments that are illustrated in
the accompanying drawings and detailed in the following
description. Descriptions of well-known components and processing
techniques are omitted so as to not unnecessarily obscure the
embodiments herein. The examples used herein are intended merely to
facilitate an understanding of ways in which the embodiments herein
can be practiced and to further enable those of skill in the art to
practice the embodiments herein. Accordingly, the examples should
not be construed as limiting the scope of the embodiments
herein.
[0017] Provided herein is a scalable mechanism for grouping users
based on similar trends in n-dimensional space using correspondence
analysis. The method provides a framework for clustering or grouping
the users, whose trends are represented in an n-dimensional space,
using correspondence analysis. The method reduces the n-dimensional
feature space to a lower dimensional space for easier processing,
better interpretation and higher-quality clusters. Further, the
method applies correspondence analysis so that each user is assigned
a new coordinate in the lower dimension that preserves the
similarities, differences and relationships between the variables.
[0018] Once the correspondence analysis is done, clustering or
grouping of the coordinates based on the similar trends of the
users is performed. Further, unlabeled cluster members are assigned
class membership proportional to the labeled samples in the
cluster. Finally, the method predicts the future actions of the
users based on the past trends that are observed from the labeled
clusters. Completely unlabeled clusters may be inspected by an
administrator for the purpose of manual analysis, labeling and
mapping to predicted trends and actions.
[0019] The embodiments herein achieve a method and system that
provides a scalable mechanism for grouping the users based on
similar trends in n-dimensional space using correspondence
analysis.
[0020] Further, the method and system are applicable in the context
of any user transaction based system (for example, a telecom
network, a banking system and so on). The method provides a
framework for clustering or grouping users with similar trends in
the n-dimensional space using correspondence analysis.
[0021] The correspondence analysis is used to recognize the trends
or nature of the users on the basis of their numerical attributes
as well as temporal variation of such attributes.
[0022] The method and system disclosed herein reduce the
n-dimensional feature space to a lower dimensional space for easier
processing and interpretation, without losing the trend information
of each user, using correspondence analysis. Each user is assigned a
new coordinate in the lower dimension that preserves the
similarities, differences and relationships between the variables
as they existed in the higher dimensional space.
[0023] Once the correspondence analysis is done, clustering or
grouping of the coordinates based on the similar trends of the
users is performed.
[0024] Further, unlabeled cluster members are assigned class
membership proportional to the labeled samples in the cluster.
Finally, the method predicts the future actions of the users based
on the past trends that are observed from the labeled clusters.
[0025] The principal object of the embodiments herein is to provide
a scalable method and system for detection, classification and
prediction of behaviour trends using correspondence analysis.
[0026] Another object of the embodiments herein is to provide a
scalable method and system for effectively reducing the dimensional
space using correspondence analysis on numerical multinomial data,
in order to reduce the complexity of cluster analysis, improve the
quality of emerging clusters and achieve superior prediction
accuracy.
[0027] Referring now to the drawings and more particularly to FIGS.
1 through 5 where similar reference characters denote corresponding
features consistently throughout the figures, there are shown
preferred embodiments.
[0028] FIG. 1 illustrates an overview for detection and
classification of user behavior trends using correspondence
analysis, according to the embodiments as disclosed herein. As
depicted in FIG. 1, consider a group of users performing
transactions with a system (for example, a telecom network, a
banking system and so on). The transactions of the users are
recorded in the network 101, which maintains the raw transaction
logs of all the users in a file server 102.
[0029] In an embodiment, the file server 102 comprises the files
that have all the transactional details of the users.
[0030] The raw transactional logs that are present in the file
server 102 are uploaded by scheduled data upload jobs orchestrated
by cluster master 103 into a distributed file system 105.
[0031] Jobs orchestrated by the cluster master 103 cluster the users
whose transactions exhibit similar trends, in a distributed fashion,
over the worker nodes 104.
[0032] In an embodiment, the n-dimensional feature space of the raw
transactional logs is reduced to a lower dimensional space for
easier processing and interpretation, without losing the trend
information of each user.
[0033] In an embodiment, correspondence analysis is used for trend
recognition and dimensionality reduction of the raw transactional
data of the users.
[0034] Correspondence analysis is a descriptive technique designed
to analyze simple two-way and multi-way tables containing some
measure of correspondence between rows and columns.
[0035] The correspondence analysis is used to recognize the trend
of users on the basis of temporal variations of their numerical
attributes. Each column of the correspondence table represents a
numerical attribute and all the columns will be the observations of
the same variable over time at different time instances.
[0036] In an embodiment, the cluster master 103 maintains the
uploaded files over the worker nodes 104 and distributed file
system 105 (any distributed file system or memory). The raw
transactional data logs are distributed across multiple machines
and the correspondence analysis is applied on the data.
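The disclosure does not say how correspondence analysis is computed once the logs are spread over the worker nodes 104. One possible arrangement, sketched below purely as an assumption, relies on the fact that only small aggregates need to reach the cluster master: the column totals, the grand total, and each worker's t x t Gram matrix S_p^T S_p of standardized residuals. Summing the Gram matrices and taking their eigenvectors yields the principal axes, after which every worker projects its own users locally. The function names (`partial_totals`, `worker_residuals`, `distributed_ca`) are hypothetical, the "workers" are simulated by a Python list of row blocks, and every user row must have a positive total.

```python
import numpy as np

def partial_totals(block):
    """Worker step 1: column sums and grand total of this worker's rows."""
    block = np.asarray(block, dtype=float)
    return block.sum(axis=0), block.sum()

def worker_residuals(block, c, n):
    """Worker step 2: standardized residuals S_p and row masses r_p of local users."""
    P = np.asarray(block, dtype=float) / n
    r = P.sum(axis=1)
    expected = np.outer(r, c)
    return (P - expected) / np.sqrt(expected), r

def distributed_ca(blocks, n_dims=2):
    """Master-side orchestration; each list element stands in for one worker."""
    # Round 1: merge partial sums into the grand total n and column masses c.
    partials = [partial_totals(b) for b in blocks]
    n = sum(total for _, total in partials)
    c = sum(cols for cols, _ in partials) / n
    # Round 2: each worker sends only a t x t matrix S_p^T S_p; the master adds them.
    local = [worker_residuals(b, c, n) for b in blocks]
    gram = sum(S.T @ S for S, _ in local)
    eigvals, eigvecs = np.linalg.eigh(gram)               # ascending eigenvalues
    V = eigvecs[:, np.argsort(eigvals)[::-1][:n_dims]]     # top principal axes
    # Round 3: each worker projects its own users: F_p = D_r^{-1/2} S_p V.
    return [(S @ V) / np.sqrt(r)[:, None] for S, r in local]

# Hypothetical minutes-of-usage table split across two workers.
blocks = [
    [[10, 20, 30, 40],    # rising trend
     [40, 30, 20, 10]],   # falling trend
    [[5, 10, 15, 20]],    # same rising trend as the first user, half the volume
]
coords = distributed_ca(blocks)
```

The first and third users end up with the same coordinates even though they sit on different workers, because their usage rows are proportional. Only O(t^2) numbers per worker cross the network, which is what would make such a layout plausible for millions of users with a modest number of time instances.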
[0037] Once the correspondence analysis is completed, clustering or
grouping of the coordinates based on the similar trends of the
users is performed. Further, unlabeled cluster members are assigned
class membership proportional to the labeled samples in the
cluster. Finally, the method predicts the future actions of the
users based on the past trends that are observed from the labeled
clusters.
[0038] The cluster master 103 further applies association rule
mining on the clusters discovered in a lower dimensional space. The
cluster master 103 further uses the discovered rules for user
targeted applications, such as optimizing advertising campaigns,
performing product bundling, pricing and so on.
[0039] The cluster master 103 may be a standalone device. The
cluster master 103 may comprise of a plurality of devices,
implemented using distributed architecture. The cluster master 103
may be implemented on the cloud.
[0040] FIG. 2 illustrates a flow diagram explaining the various
steps involved in predicting the user behavior trends using the
correspondence analysis, according to the embodiments as disclosed
herein. As depicted in the flow diagram 200, initially the method
obtains (201) the raw data from the network that corresponds to a
particular domain. For example, the domain may include but is not
limited to a telecommunications network or a banking system. In a
telecom domain all the transactions are recorded and stored in a
network and in a banking system, all the transactions of the users
are stored in a bank server.
[0041] The data format used herein as an example consists of u
subjects, U=1, 2, 3 . . . u. For each subject, the numerical value
of an attribute is measured at each time instance T=1, 2, 3 . . . t,
so in table format the data looks like:
          T1    T2    T3    T4    . . .  Tt
User 1    X11   X12   X13   X14   . . .  X1t
User 2    X21   X22   X23   X24   . . .  X2t
. . .     . . . . . . . . . . . . . . .  . . .
User u    Xu1   Xu2   Xu3   Xu4   . . .  Xut
[0042] Here Xij is the value of a numerical attribute of user i
observed at time instance Tj. The data is therefore of dimension
u*t, i.e., each subject is measured in a t-dimensional space.
[0043] The transactional data of the users can be obtained from a
network designed for storing such data (for example, the telecom
network or the bank server). Once the transactional data (raw data)
is obtained, the method performs (202) pre-processing and feature
selection on the raw data. In an embodiment, the pre-processing and
feature selection on the raw data comprises determining the
attributes of the users. One such attribute is minutes of usage
(for example, usage of the network in the telecommunications
domain).
[0044] Further, the method obtains (203) trend data from the raw
transactional logs. The trend data includes values that change over
time.
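As an illustration of steps 202 and 203, the sketch below turns raw log records into the u x t table of [0041]. It assumes, hypothetically, that each record is a `(user_id, period_index, minutes)` tuple and that the selected attribute is minutes of usage summed per period. It also drops users with zero total usage, since correspondence analysis divides by each row total. `build_usage_table` is an illustrative name, not part of the disclosed system.

```python
from collections import defaultdict

def build_usage_table(records, n_periods):
    """Aggregate (user_id, period_index, minutes) records into a u x t table.

    Returns (user_ids, rows), where rows[i][j] is the total minutes of
    user_ids[i] in period j. Users with no usage at all are dropped,
    because correspondence analysis needs a positive total for every row.
    """
    totals = defaultdict(lambda: [0.0] * n_periods)
    for user_id, period, minutes in records:
        if not 0 <= period < n_periods:
            continue                       # ignore records outside the window
        totals[user_id][period] += minutes
    user_ids = sorted(u for u, row in totals.items() if sum(row) > 0)
    return user_ids, [totals[u] for u in user_ids]

# Hypothetical raw log: (user_id, period_index, minutes)
records = [
    ("alice", 0, 12.5), ("alice", 0, 7.5), ("alice", 2, 30.0),
    ("bob", 1, 15.0), ("bob", 3, 5.0),
    ("carol", 1, 0.0),                     # zero usage only: dropped
    ("dave", 7, 60.0),                     # outside a 4-period window: ignored
]
users, table = build_usage_table(records, n_periods=4)
```

Each row of `table` is one user's time series of the chosen attribute, i.e. one row Xi1 . . . Xit of the table above.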
[0045] Once feature selection has been performed and the trend data
obtained from the raw transactional logs, the method reduces (204)
the dimensionality of the raw data (which is multinomial,
n-dimensional data) using correspondence analysis (301, 302), as
depicted in FIG. 3. In the lower-dimensional data, the new
coordinates are such that users who follow similar trends in the
multidimensional time series domain are closer to each other than
those who are dissimilar. For example, if users in a t-dimensional
space can be mapped to a 2- or 3-dimensional space without losing
much information about the subscribers' trends, the data becomes
easy to interpret and analyze and is represented more efficiently
than in t dimensions.
[0046] Correspondence analysis is an exploratory data analysis
technique for contingency tables and multivariate or multinomial
data. Correspondence analysis also emphasizes graphical
representation of the result in a lower dimension for easy
interpretation, maintaining the similarity or dissimilarity between
the rows and the columns of the table. Embodiments herein apply
correspondence analysis in applications where the trends of
high-dimensional user data, consisting of numerical attributes in
the time series domain, must be determined. Correspondence analysis
is used to determine similarities and differences among the trends
of users with respect to their behavior over time and to depict them
graphically in a joint low-dimensional space. Correspondence
analysis assigns each user a coordinate in the lower dimension that
maintains the similarity, difference and relationship between the
variables in the rows and columns of the table: rows that are
similar in their trend are close to each other in the new
low-dimensional space, and rows that are dissimilar are far apart.
Because correspondence analysis is based on the eigenvalues of a
matrix, it can be used for dimension reduction similar to principal
component analysis, which enables easier interpretation of results.
The similarity between users in the new low-dimensional space can be
visualized graphically.
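A minimal single-machine sketch of the reduction in step 204 follows. It assumes the standard textbook formulation of correspondence analysis (singular value decomposition of the matrix of standardized residuals) rather than any implementation detail disclosed here. `correspondence_analysis` and its `n_dims` parameter (the target dimensionality of [0048]) are illustrative names; the input must be non-negative with positive row and column totals.

```python
import numpy as np

def correspondence_analysis(table, n_dims=2):
    """Return row principal coordinates of a non-negative u x t table in
    n_dims dimensions, plus the share of total inertia they retain."""
    N = np.asarray(table, dtype=float)
    if (N < 0).any():
        raise ValueError("correspondence analysis needs non-negative values")
    P = N / N.sum()                    # correspondence matrix
    r = P.sum(axis=1)                  # row masses (one per user)
    c = P.sum(axis=0)                  # column masses (one per time instance)
    if (r == 0).any() or (c == 0).any():
        raise ValueError("every row and column needs a positive total")
    # Standardized residuals S = D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    # Row principal coordinates F = D_r^{-1/2} U diag(sv)
    F = (U * sv) / np.sqrt(r)[:, None]
    inertia = sv ** 2                  # principal inertias (eigenvalues of S^T S)
    # The retained share is undefined if every row has the same profile.
    return F[:, :n_dims], inertia[:n_dims].sum() / inertia.sum()

# Hypothetical minutes of usage of three users over four periods.
usage = [
    [10, 20, 30, 40],   # rising trend
    [ 5, 10, 15, 20],   # same rising trend at half the volume
    [40, 30, 20, 10],   # falling trend
]
coords, retained = correspondence_analysis(usage, n_dims=2)
```

The first two users receive identical coordinates because their rows are proportional: correspondence analysis compares trend shapes (row profiles), not volumes. With all dimensions kept, Euclidean distances between row coordinates equal chi-square distances between profiles, so truncating to `n_dims` keeps as much of that structure as a rank-`n_dims` map allows.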
[0047] In an embodiment, the correspondence analysis is used to
recognize the trend of users (subscribers) on the basis of their
numerical attributes. Once the correspondence analysis is applied,
the correspondence table is generated. Each column of the
correspondence table represents a numerical attribute and all the
columns will be the observations of the same variable over time at
different time instances.
[0048] In an embodiment, the method obtains the number of target
dimensions (for example, it can be 2-dimensional or 3-dimensional
based on the requirement) as an input for reducing the
dimensionality of transactional data of the users.
[0049] Once the dimensionality of the data is reduced using
correspondence analysis, the method performs (205) clustering of the
users' attributes according to clustering parameters to obtain
unlabeled clusters based on trend similarity. Clustering groups the
users having similar trends. In an embodiment, the method obtains
the clustering parameters as an input. In an embodiment, standard
clustering techniques such as DBSCAN (for density-based clusters)
and the k-means clustering algorithm can be used to group the users
in the lower dimension such that users with similar trends are
placed in the same cluster.
[0050] Embodiments disclosed herein first apply DBSCAN clustering to
obtain (401) density-based clusters. DBSCAN treats users whose trend
differs from that of the majority of users as noise, because they
lie in regions of lower density. To avoid losing these users, the
noise is further clustered (402) using the k-means clustering
algorithm before the final clusters are obtained (403), as depicted
in FIG. 4.
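The two-stage clustering of FIG. 4 can be sketched as follows. To stay self-contained, the sketch uses small NumPy implementations of DBSCAN and k-means instead of a library; the values of `eps`, `min_pts` and `k_noise` are hypothetical clustering parameters, and the full pairwise distance matrix restricts it to illustrative data sizes.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Label density-based clusters 0, 1, ...; -1 marks noise."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbours = [np.flatnonzero(row <= eps) for row in dists]  # includes self
    labels = np.full(len(points), -1)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != -1 or len(neighbours[i]) < min_pts:
            continue                       # already clustered, or not a core point
        labels[i] = cluster                # grow a new cluster from core point i
        frontier = list(neighbours[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbours[j]) >= min_pts:   # only core points expand
                    frontier.extend(neighbours[j])
        cluster += 1
    return labels

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd iterations with centres initialised from random points."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        updated = np.array([points[assign == c].mean(axis=0) if np.any(assign == c)
                            else centres[c] for c in range(k)])
        if np.allclose(updated, centres):
            break
        centres = updated
    return assign

def two_stage_clustering(points, eps, min_pts, k_noise):
    """DBSCAN first (401); then split DBSCAN noise with k-means (402)."""
    points = np.asarray(points, dtype=float)
    labels = dbscan(points, eps, min_pts)
    noise = np.flatnonzero(labels == -1)
    if len(noise) > 0:
        k = min(k_noise, len(noise))
        labels[noise] = labels.max() + 1 + kmeans(points[noise], k)
    return labels                          # final clusters (403)

# Hypothetical 2-D CA coordinates: two dense trend groups and three outliers.
coords = [
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.1, 0.1],   # group A
    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0], [5.1, 5.1],   # group B
    [10.0, 0.0], [10.2, 0.1],                         # two similar outliers
    [0.0, 10.0],                                      # a lone outlier
]
labels = two_stage_clustering(coords, eps=0.3, min_pts=3, k_noise=2)
```

DBSCAN alone would discard the last three users as noise; the k-means pass keeps them, grouping the two similar outliers together and the lone outlier on its own.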
[0051] The clusters formed in the lower dimension retain the
properties (similarities, differences and relationships) that
existed in the n-dimensional space.
[0052] Based on trend similarity with the labeled samples, labels
are assigned (206) to the clusters using the label information of
users, i.e., the actions they took previously (historical data).
The clusters may be further divided into classes based on at least
one other feature, and each user in a cluster may be assigned to be
a member of at least one class. Each user may then be assigned a
confidence level for each predicted action, based on the class to
which the user belongs.
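The disclosure leaves open how the class-membership proportion is normalised. The sketch below assumes it is the fraction of all members of a cluster that appear on a given labeled list (for example a churn file); normalising by the labeled members only would be an equally plausible reading. Users in a cluster with no labeled members get an empty result, matching the completely unlabeled clusters that are left for manual analysis. `class_confidences` and the user and class names are hypothetical.

```python
from collections import Counter

def class_confidences(cluster_of, labeled):
    """cluster_of: user -> cluster id; labeled: user -> class from a past-action list.

    Returns user -> {class: confidence} for every unlabeled user, where the
    confidence is the share of the user's cluster that carries that class.
    """
    size = Counter(cluster_of.values())
    hits = Counter((cluster_of[u], cls) for u, cls in labeled.items() if u in cluster_of)
    return {
        user: {cls: n / size[cluster] for (c, cls), n in hits.items() if c == cluster}
        for user, cluster in cluster_of.items()
        if user not in labeled
    }

# Hypothetical clusters and historical labels (e.g. a churn file).
cluster_of = {"u1": 0, "u2": 0, "u3": 0, "u4": 0, "u5": 1, "u6": 1, "u7": 2}
labeled = {"u1": "churn", "u2": "churn", "u5": "postpaid_to_prepaid"}
confidences = class_confidences(cluster_of, labeled)
```

Here the unlabeled users u3 and u4 share a cluster in which half the members churned, so each receives a churn confidence of 0.5, while u7 sits in a fully unlabeled cluster and receives no prediction.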
[0053] Further, the method predicts (207) the future actions of the
users based on the trends of attributes observed in the labeled
samples. In an embodiment, the prediction step forecasts the future
actions of users based on the past trends of attribute values
observed in the labeled samples. The prediction may take the form of
rules consisting of predicates and the relationships among them,
augmented with statistics such as confidence measures that indicate
the degree of algorithmic confidence in each rule. For example,
given a churn file that lists the users who have churned, the trends
these users exhibited prior to churning can be used to label other
users who exhibit similar trends as potential churn candidates.
Further, there can be multiple labeled lists corresponding to user
actions observed in the past (for example, churning, postpaid to
prepaid switching and so on). In each unlabeled cluster that
emerges, the number of users present from a particular labeled list
can be identified. Having more users from a labeled list
(representing a class) in a cluster is a strong indication that the
cluster likely represents a group of users who could potentially
exhibit the same behavior. The various actions in flow diagram 200
may be performed in the order presented, in a different order or
simultaneously. Further, in some embodiments, some actions listed in
FIG. 2 may be omitted.
[0054] FIG. 5 is a flowchart illustrating the process of optimizing
campaigns and performing product bundling for a user based on
clusters, according to embodiments as disclosed herein. After the
clusters are formed in the lower dimensional space, association rule
mining can be applied to each cluster, and the discovered rules can
then be used automatically for user targeted applications, such as
optimizing advertising campaigns, performing product bundling,
pricing and so on. Association rule mining is a method for
discovering interesting relations between variables in large
databases. Given historical purchase data (market-basket analysis),
it finds associations from sets of items to other items. For
example, if people who buy bread and milk also buy butter, a rule
milk, bread->butter [support=5%, confidence=100%] may be
discovered. The support of an item set is the fraction of all
purchases in which that item set appears (e.g., if there are 100
purchases, each of which may contain multiple items such as bread,
jam, butter, oil, juice, milk etc., and 20 of the purchases contain
both bread and butter, then the support of bread->butter is 20%).
The confidence of a rule is the number of purchases in which both
items appear together divided by the total number of purchases
containing the first item (the antecedent); e.g., the confidence of
bread->butter is 1.0 if butter is purchased together with bread in
every purchase that contains bread.
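The two rule statistics defined above can be computed directly over a list of purchases. This is a plain illustration of the standard definitions (confidence = support of antecedent and consequent together, divided by support of the antecedent), not a full rule miner such as Apriori; the function names and the purchase data are hypothetical and chosen to mirror the bread and butter example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent together with consequent) / support(antecedent)."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)

# 100 hypothetical purchases: 20 contain bread and butter, and no purchase
# contains bread without butter.
transactions = ([{"bread", "butter"}] * 20 + [{"milk", "juice"}] * 50
                + [{"jam", "oil"}] * 30)
s = support(transactions, {"bread", "butter"})
conf = confidence(transactions, {"bread"}, {"butter"})
```

With this data the rule bread->butter has a support of 20% and a confidence of 1.0, as in the example in the text.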
[0055] After the clusters are formed in the lower dimensional space,
other features of the users within each cluster and the campaigns
historically sent to them are used as follows. The relationships
between features of users within the cluster (examples of such
features being the ARPU of the user, the number of SMSs sent by the
user, the number of international calls made by the user and so on)
and features of users who were previously converted by previously
run campaigns are found (501), and the underlying hidden rules in
these relationships are mined (502) using association rule mining.
The rules obtained are combined (503) to suggest user targeted
applications, such as optimizing advertising campaigns, performing
product bundling, pricing and so on.
[0056] In an example, each attribute of the users is discretized
into bins (e.g., ARPU (Average Revenue Per User) can be high, medium
or low). Each conversion for each campaign can then be treated like
a "purchase", so that the top association rules are mined for each
campaign. Within each cluster, conversion information corresponding
to several campaigns can be obtained. The discovered rules can then
be ranked by how many times they occur within the cluster, and the
top-ranking rules combined to generate new rules, which can be the
basis for designing a new campaign or optimizing an existing one.
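The discretization and rule ranking described above might look like the sketch below. The ARPU thresholds, the representation of a rule as the string of its antecedent (with "converted" as the implied consequent), and combining the top-ranked rules by conjunction are all assumptions made for illustration; in practice the per-campaign rule lists would come from an association rule miner.

```python
from collections import Counter

def arpu_bin(arpu, low=100.0, high=300.0):
    """Discretize ARPU into low / medium / high (thresholds are hypothetical)."""
    if arpu < low:
        return "low"
    return "medium" if arpu < high else "high"

def rank_rules(rules_per_campaign, top_n=2):
    """Rank rules by the number of campaigns in which they were mined for one cluster."""
    counts = Counter(rule for rules in rules_per_campaign for rule in set(rules))
    return [rule for rule, _ in counts.most_common(top_n)]

# Rules mined per campaign for converters inside one cluster (hypothetical).
rules_per_campaign = [
    ["ARPU=high", "SMS=high"],
    ["ARPU=high", "intl_calls=medium"],
    ["ARPU=high", "SMS=high"],
]
top_rules = rank_rules(rules_per_campaign)
new_rule = " AND ".join(top_rules) + " -> converted"
```

The combined rule can then serve as a candidate targeting condition for a new campaign within that cluster.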
[0057] The various actions in flow diagram 500 may be performed in
the order presented, in a different order or simultaneously.
Further, in some embodiments, some actions listed in FIG. 5 may be
omitted.
[0058] FIG. 6 is a graph showing the representation of users in a
low dimensional feature space, according to the embodiments as
disclosed herein. The graph shown in the figure depicts a two
dimensional feature space with X and Y axes. The graph is obtained
by reducing the n-dimensional feature space and applying the
correspondence analysis on the numerical time series data.
[0059] Consider, as an example, a sample of ten users in a telecom
network. The transactional data of all ten users is recorded in the
telecom network. The transactional data (raw data) of the users is
represented using U=1, 2, 3 . . . u users; for each user, the
numerical value of the attribute at each time instance
T=1, 2, 3 . . . t is measured. Representing each user's numerical
attribute value at each time instance in this way forms multinomial
data, or an array having n dimensions (for example, u×n).
[0060] The first step involved in classification and detection of
user behavior trends using correspondence analysis is the reduction
of n-dimensional space to lower dimensional space.
[0061] The dimensionality of the multinomial data is reduced for
easier processing and interpretation, without losing the trend
information of each user. The multinomial data can be reduced to a
lower dimension (for example, 2 or 3 dimensions, depending on the
requirement). In the lower-dimensional feature space (2-dimensional,
as in the graph), the new coordinates are such that users who follow
a similar trend in the multidimensional time series domain lie
closer to each other than users whose trends are dissimilar, as
shown in the graph.
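As a minimal sketch of the reduction step, assuming the classical form of correspondence analysis (a singular value decomposition of the standardized residuals of the correspondence matrix) and taking the row principal coordinates as each user's new coordinates, the NumPy function below maps a non-negative u×n array to `n_components` dimensions. The function name and the choice of row principal coordinates are assumptions. The input used here is the ARPU data of Table 1 in paragraph [0065]; because the text does not state which correspondence analysis variant or normalization produced Table 2, the signs and scale of this output may differ from the values listed there.

```python
import numpy as np

# ARPU of ten users over four months, copied from Table 1.
ARPU = [
    [473.05, 740, 439, 0], [247, 100, 99, 0], [372, 508, 282, 0],
    [80, 105.1, 55, 30], [235, 334, 50, 120.17], [409, 309, 9, 500],
    [73.01, 75.05, 0, 144.01], [105, 176, 129, 509], [65, 0, 0, 10],
    [200, 0, 0, 50],
]


def correspondence_analysis(X, n_components=2):
    """Return row principal coordinates from a simple correspondence analysis of X."""
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("correspondence analysis requires non-negative data")
    P = X / X.sum()                  # correspondence matrix
    r = P.sum(axis=1)                # row masses (one per user)
    c = P.sum(axis=0)                # column masses (one per time instance)
    if np.any(r == 0) or np.any(c == 0):
        raise ValueError("every row and column must have a positive total")
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sigma, _Vt = np.linalg.svd(S, full_matrices=False)
    # Row principal coordinates: F = D_r^{-1/2} U diag(sigma)
    F = (U / np.sqrt(r)[:, None]) * sigma
    return F[:, :n_components]


coords = correspondence_analysis(ARPU)   # one 2-D coordinate per user
```

Rows with similar profiles over time (similar proportions across months, regardless of absolute ARPU level) receive nearby coordinates, which is the property the paragraph relies on.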
[0062] FIG. 7 is a graph showing the grouping of users having
similar trends over a certain time period, according to the
embodiments as disclosed herein. Once the multinomial data is
reduced to a lower dimension (2-dimensional) as described in FIG.
3, the users of the telecom network can be grouped or clustered as
shown in the graph. Clustering or grouping of the coordinates is
performed based on the similar trends of the users. The resulting
groups or clusters contain users whose trends are similar over a
certain time period, and they are used for group-based prediction
or further analysis of the group.
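The text does not name a clustering algorithm. As one assumed choice, the sketch below runs plain Lloyd's k-means with a deterministic farthest-point initialization on the two-dimensional user coordinates listed in Table 2 of paragraph [0066]; the function name, the initialization scheme, and k=3 are illustrative assumptions.

```python
import numpy as np

# 2-D coordinates of the ten users, copied from Table 2.
COORDS = np.array([
    [0.715, 1.706], [0.14, 1.975], [0.785, 1.2], [-0.854, -0.967],
    [-1.047, -0.134], [-1.488, -0.656], [-1.266, -0.083], [-1.705, -0.622],
    [2.051, -1.46], [2.6, -0.9],
])


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    points = np.asarray(points, dtype=float)
    centers = [points[0]]
    while len(centers) < k:
        # squared distance from every point to its nearest chosen center
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # recompute each center as the mean of its members (keep it if empty)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


labels, centers = kmeans(COORDS, 3)
```

On these coordinates the sketch groups Users 1 to 3, Users 4 to 8, and Users 9 and 10 together. A density-based or hierarchical method could be substituted when the number of groups is not known in advance.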
[0063] Further, unlabeled cluster members are assigned class
membership proportional to the labeled samples in the cluster.
Finally, the method predicts the future actions of the users based
on the past trends that are observed from the labeled clusters.
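A minimal sketch of the proportional class assignment, assuming each user carries a cluster index and either a class label or `None` when unlabeled. The function name and the class names in the usage line are hypothetical. Labeled users keep their own label, and unlabeled users receive the class proportions of the labeled members of their cluster.

```python
from collections import Counter, defaultdict


def propagate_class_membership(cluster_ids, labels):
    """Return one {class: probability} dict per user.

    Unlabeled members (label None) get the class proportions of the labeled
    members in the same cluster; clusters with no labeled members yield {}.
    """
    counts = defaultdict(Counter)
    for cid, lab in zip(cluster_ids, labels):
        if lab is not None:
            counts[cid][lab] += 1
    memberships = []
    for cid, lab in zip(cluster_ids, labels):
        if lab is not None:
            memberships.append({lab: 1.0})
        else:
            total = sum(counts[cid].values())
            memberships.append(
                {c: n / total for c, n in counts[cid].items()} if total else {}
            )
    return memberships


# Hypothetical usage: the unlabeled user in cluster 0 gets {"churn": 2/3, "stay": 1/3}.
example = propagate_class_membership([0, 0, 0, 0, 1, 1], ["churn", "churn", "stay", None, "stay", None])
```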
[0064] From the transactional or historical data of the users in
the telecom network, the actions performed by users who follow a
similar trend can be predicted. This information is then used to
predict the actions of new users who exhibit a similar trend.
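One way to apply this to a new user, sketched under two assumptions not stated in the text: the new user's coordinates are already expressed in the same low-dimensional space (for example, by projecting the user as a supplementary row of the correspondence analysis), and the user is assigned to the nearest cluster centroid. The function name and the per-cluster class distributions are hypothetical.

```python
import numpy as np


def predict_for_new_user(new_coord, centroids, cluster_class_dist):
    """Assign a new user to the nearest centroid (squared Euclidean distance)
    and return (cluster index, that cluster's class distribution)."""
    centroids = np.asarray(centroids, dtype=float)
    d2 = ((centroids - np.asarray(new_coord, dtype=float)) ** 2).sum(axis=1)
    nearest = int(np.argmin(d2))
    return nearest, cluster_class_dist[nearest]
```

The returned distribution is read as the predicted likelihood of each action for the new user, mirroring how unlabeled cluster members are treated.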
[0065] Consider a group of 10 users having a similar trend, as
depicted in Table 1 below, which gives the ARPU (average revenue per
user) of each user.
TABLE 1
ARPU    | Month 1 | Month 2 | Month 3 | Month 4
User 1  | 473.05  | 740     | 439     | 0
User 2  | 247     | 100     | 99      | 0
User 3  | 372     | 508     | 282     | 0
User 4  | 80      | 105.1   | 55      | 30
User 5  | 235     | 334     | 50      | 120.17
User 6  | 409     | 309     | 9       | 500
User 7  | 73.01   | 75.05   | 0       | 144.01
User 8  | 105     | 176     | 129     | 509
User 9  | 65      | 0       | 0       | 10
User 10 | 200     | 0       | 0       | 50
[0066] Applying correspondence analysis to this type of numerical
time series data maps it to a two-dimensional feature space,
assigning new coordinates to the users such that those following a
similar trend lie close to each other in the new space, as indicated
in Table 2.
TABLE 2
ARPU    | Dimension 1 | Dimension 2
User 1  | 0.715       | 1.706
User 2  | 0.14        | 1.975
User 3  | 0.785       | 1.2
User 4  | -0.854      | -0.967
User 5  | -1.047      | -0.134
User 6  | -1.488      | -0.656
User 7  | -1.266      | -0.083
User 8  | -1.705      | -0.622
User 9  | 2.051       | -1.46
User 10 | 2.6         | -0.9
[0067] In another example, the historical stock prices of various
brands can each be expressed as a time series (change in price over
time). Suppose it is required to identify brands that are similar in
terms of their stock value trends over a period of time. The
historical stock prices are reduced from multi-dimensional time
series data into a 2D space, followed by clustering. This results in
clusters of similar brands (for example, brands like Yahoo and
Amazon may fall in one cluster, and so on). Once the grouping is
done, time series models (e.g., ARMA models) can be learned at the
cluster level to predict future stock values.
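A minimal sketch of cluster-level forecasting, assuming the cluster is summarized by the mean of its members' series and modelled with AR(1), the simplest special case of an ARMA model, fitted by ordinary least squares. Both the averaging and the AR(1) simplification are assumptions made for brevity; a library such as statsmodels (its `ARIMA` class in `statsmodels.tsa.arima.model`) could fit full ARMA orders instead. The function names are hypothetical.

```python
import numpy as np


def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    x = np.asarray(series, dtype=float)
    A = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (c, phi), *_ = np.linalg.lstsq(A, x[1:], rcond=None)
    return c, phi


def forecast_cluster(member_series, steps=1):
    """Fit AR(1) to the cluster's mean series and forecast `steps` values ahead."""
    mean_series = np.mean(np.asarray(member_series, dtype=float), axis=0)
    c, phi = fit_ar1(mean_series)
    last, preds = mean_series[-1], []
    for _ in range(steps):
        last = c + phi * last   # iterate the fitted recurrence
        preds.append(float(last))
    return preds
```

For example, two member series whose mean is 0, 1, 1.5, 1.75, 1.875 follow x_t = 1 + 0.5 x_{t-1} exactly, so the two-step forecast is 1.9375 and then 1.96875.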
[0068] Embodiments disclosed herein may be used for video
segmentation, as depicted in the following example. Consider that
objects/users in a video need to be segmented without supervision,
based on the similarity of their motion. This may serve safety
management of large gatherings (big crowds) in a public area,
identification of moving areas in a scene for efficient video
compression, detection of unusual events, video surveillance, or
further analysis of the video for specific purposes.
[0069] To find the trend of object movement in a video, the
magnitude of pixel movement over frames is used as the attribute for
trend recognition. Optical flow is used to obtain the subsequent
position of each pixel from frame to frame. If a pixel at position
(u1, v1) in one frame moves to position (u2, v2) in the next frame,
the magnitude of its movement is calculated as the Euclidean
distance between the two positions. Once the optical flow is
obtained, the magnitude of pixel displacement is calculated between
consecutive frames over all n frames, which yields a time series for
each pixel over (n-1) dimensions. Correspondence analysis maps the
pixel movement data from n-1 dimensions to 2 dimensions such that
pixels belonging to similar object movement lie closer to each other
than pixels with dissimilar motion. On clustering the pixels, all
pixels within each cluster represent a similar trend.
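The displacement-magnitude step can be sketched as below, assuming each pixel's (u, v) position has already been tracked across all n frames (for instance by following a dense optical flow field such as the one produced by OpenCV's `calcOpticalFlowFarneback`); the tracking itself and the array layout are assumptions. Each output row is the (n-1)-dimensional trend of one pixel. Note that a pixel that never moves yields an all-zero row, which has zero mass in correspondence analysis and would need to be filtered out before that step.

```python
import numpy as np


def displacement_magnitudes(tracks):
    """tracks: shape (num_pixels, n_frames, 2), each pixel's (u, v) per frame.

    Returns shape (num_pixels, n_frames - 1): the Euclidean distance moved
    between each pair of consecutive frames.
    """
    tracks = np.asarray(tracks, dtype=float)
    steps = np.diff(tracks, axis=1)          # (u2 - u1, v2 - v1) per frame pair
    return np.linalg.norm(steps, axis=2)     # Euclidean length of each step
```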
[0070] Gatherings often involve movement of crowds in confined
spaces such as city streets, overhead bridges, or narrow
passageways. Because of the small space and large crowd, catastrophic
events can occur. If the usual motion at these places is known a
priori, it is possible to predict the locations of possible
stampedes and hence improve safety management in those areas.
[0071] Unusual events may be detected by identifying areas where
object motion is irregular or deviates from normal behavior.
[0072] FIG. 8 illustrates a computing environment implementing the
method and system for detection and classification of user behavior
trends using correspondence analysis, according to the embodiments
as disclosed herein. The computing environment may consist of a
plurality of such units, forming a distributed cluster over which
the algorithms are executed in a scalable fashion. As depicted, the
computing environment 801 comprises at least one processing unit
804 that is equipped with a control unit 802 and an Arithmetic
Logic Unit (ALU) 803, a memory 805, a storage unit 806, a plurality
of networking devices 808 and a plurality of input/output (I/O)
devices 807. The processing unit 804 is responsible for processing
the instructions of the algorithm. The processing unit 804 receives
commands from the control unit 802 in order to perform its
processing. Further, any logical and arithmetic operations involved
in the execution of the instructions are computed with the help of
the ALU 803.
[0073] The overall computing environment 801 can be composed of
multiple homogeneous and/or heterogeneous cores, multiple CPUs of
different kinds, special media and other accelerators. Further, the
plurality of processing units 804 may be located on a single chip or
over multiple chips. Further, a plurality of nodes such as 801 may
be interconnected over a network to form a distributed computing
environment, in which the described method is executed in a
distributed fashion.
[0074] The algorithm, comprising the instructions and code required
for the implementation, is stored in the memory unit 805, the
storage 806, or both. At the time of execution, the instructions may
be fetched from the corresponding memory 805 and/or storage 806, and
executed by the processing unit 804.
[0075] In the case of hardware implementations, various networking
devices 808 or external I/O devices 807 may be connected to the
computing environment to support the implementation through the
networking unit and the I/O device unit.
[0076] Embodiments disclosed herein enable compression of large
amounts of temporal data related to users into smaller, more
manageable amounts of data, thereby reducing the time required for
processing the data and the complexity of the system required for
computing.
[0077] Embodiments disclosed herein enable detection of unusual
events based on the raw data. An unusual event may be a behaviour of
a user that does not match his history and/or the cluster of users
to which he belongs. For example, the unusual event may be a user of
a telecommunication network sending a large number of SMSs within a
short period of time, when he previously sent only a few SMSs.
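A minimal sketch of such a check, assuming a simple z-score rule (the text does not specify a detection criterion): the current value is flagged when it lies more than `threshold` population standard deviations from a reference set. The reference can be the user's own history (for example, daily SMS counts) or the current values of the other users in the same cluster. The function name and the default threshold of 3 are hypothetical.

```python
import statistics


def is_unusual(reference, current, threshold=3.0):
    """Return True if `current` deviates from `reference` by more than
    `threshold` population standard deviations (z-score rule)."""
    mean = statistics.mean(reference)
    spread = statistics.pstdev(reference)
    if spread == 0:
        # A perfectly flat reference: any change at all counts as a deviation.
        return current != mean
    return abs(current - mean) / spread > threshold


# Hypothetical usage: a user who usually sends 2 to 4 SMSs a day suddenly sends 50.
flagged = is_unusual([2, 3, 2, 4, 3], 50)
```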
[0078] Embodiments disclosed herein account for temporal changes in
user behaviour.
[0079] The embodiments disclosed herein can be implemented through
at least one software program running on at least one hardware
device and performing network management functions to control the
elements. The elements shown in FIGS. 1 and 5 include blocks which
can be at least one of a hardware device, or a combination of
hardware device and software module.
[0080] The foregoing description of the specific embodiments will
so fully reveal the general nature of the embodiments herein that
others can, by applying current knowledge, readily modify and/or
adapt for various applications such specific embodiments without
departing from the generic concept, and, therefore, such
adaptations and modifications should and are intended to be
comprehended within the meaning and range of equivalents of the
disclosed embodiments. It is to be understood that the phraseology
or terminology employed herein is for the purpose of description
and not of limitation. Therefore, while the embodiments herein have
been described in terms of preferred embodiments, those skilled in
the art will recognize that the embodiments herein can be practiced
with modification within the spirit and scope of the embodiments as
described herein.