Method and system for monitoring online media and dynamically charting the results to facilitate human pattern detection

Janssens; Frizo ; et al.

Patent Application Summary

U.S. patent application number 12/639022 was filed with the patent office on 2009-12-16 and published on 2010-12-30 for method and system for monitoring online media and dynamically charting the results to facilitate human pattern detection. Invention is credited to Frizo Janssens and Per Siljubergsasen.

Application Number	12/639022
Publication Number	20100332465
Family ID	42153810
Filed Date	2009-12-16
Publication Date	2010-12-30

United States Patent Application	20100332465
Kind Code	A1
Janssens; Frizo ; et al.	December 30, 2010

Method and system for monitoring online media and dynamically charting the results to facilitate human pattern detection

Abstract

A time frame is specified. A search engine is queried for concepts within the time frame. The similarity and distances between concepts are calculated, and the graph coordinates of the concepts are computed. The search engine is queried for more time frames, and the similarity, distances, and coordinates are calculated for the concepts in each time frame. Consecutive time frames are mapped onto each other. A dynamic chart of the relationships between the concepts and how they evolve over the time frames is generated.

Inventors:	Janssens; Frizo; (Mortsel, BE) ; Siljubergsasen; Per; (Brussels, BE)
Correspondence Address:	ELLIOT FURMAN 15 WEST 81ST STREET #11J NEW YORK NY 10024 US
Family ID:	42153810
Appl. No.:	12/639022
Filed:	December 16, 2009

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
61138073	Dec 16, 2008
61175757	May 5, 2009

Current U.S. Class:	707/722 ; 707/E17.014
Current CPC Class:	G06F 16/338 20190101; G06Q 30/02 20130101
Class at Publication:	707/722 ; 707/E17.014
International Class:	G06F 17/30 20060101 G06F017/30

Claims

1. A method for monitoring online media and charting the results to facilitate human pattern detection comprising: (a) specifying a time frame; (b) querying a search engine for concepts within the time frame; (c) calculating similarity and distances between the concepts, wherein the calculating comprises computing a distance matrix; (d) computing graph coordinates of the concepts from at least part of the matrix in (c); (e) repeating (b), (c) and (d) for at least one more time frame; (f) mapping consecutive time frames onto each other; and (g) generating a dynamic chart of the relationships between the concepts and how they evolve over the time frames.

2. The method of claim 1 wherein the step of specifying further comprises specifying a region.

3. The method of claim 1 wherein the step of specifying further comprises specifying a language.

4. The method of claim 1 wherein the step of specifying further comprises specifying a data source.

5. The method of claim 1 wherein the step of querying comprises querying a search engine for concepts and pair-wise combinations of concepts.

6. The method of claim 1 wherein computing a distance matrix in (c) comprises computing a square symmetric co-reference matrix with co-reference numbers between all possible pairs of concepts.

7. The method of claim 1 wherein computing a distance matrix in (c) comprises computing a co-reference matrix with co-reference numbers between at least one of possible pairs of concepts, wherein the possible pairs comprise entities-topics, topics-topics, and entities-entities.

8. The method of claim 1 wherein the distance matrix is at least one of asymmetric, and not square.

9. The method of claim 1 wherein the distance matrix is at least one of symmetric, and square.

10. The method of claim 1 wherein the query in (b) returns a number of articles or documents and the step of computing in (c) comprises computing buzz numbers and co-reference numbers from the number of articles or documents.

11. The method of claim 1 wherein the computing in (d) comprises computing using one of: a multidimensional scaling algorithm, a centric multidimensional scaling algorithm, a principal component analysis algorithm, and a correspondence analysis algorithm.

12. The method of claim 1 wherein the mapping in (f) comprises mapping using a procrustes procedure.

13. The method of claim 1 wherein the mapping in (f) comprises computing at least one of the following transformations: a rotation, a reflection, a dilation, and a sign change.

14. The method of claim 1 wherein the concepts include at least one of: an entity, and a topic.

15. A computer program product comprising a computer readable medium including a computer readable program, wherein the computer readable program when executed on a computer causes the computer to: (a) query a search engine for concepts within a time frame; (b) calculate similarity and distances between the concepts, wherein the calculating comprises computing a distance matrix; (c) compute graph coordinates of the concepts from at least part of the matrix in (b); (d) repeat (a), (b) and (c) for at least one more time frame; (e) map consecutive time frames onto each other; and (f) generate a dynamic chart of the relationships between the concepts and how they evolve over the time frames.

16. The computer program product of claim 15 wherein at least some of the computer readable program is executed on a server.

17. The computer program product of claim 15 wherein at least some of the computer readable program is executed on a client computer.

18. A system for monitoring online media and charting the results to facilitate human pattern detection comprising: (a) means for specifying a time frame; (b) means for querying a search engine for concepts within the time frame; (c) means for calculating similarity and distances between the concepts, wherein the means for calculating comprises means for computing a distance matrix; (d) means for computing graph coordinates of the concepts from at least part of the matrix in (c); (e) means for repeating (b), (c) and (d) for at least one more time frame; (f) means for mapping consecutive time frames onto each other; and (g) means for generating a dynamic chart of the relationships between the concepts and how they evolve over the time frames.

Description

[0001] This application claims the benefit of U.S. Provisional Application No. 61/138,073, filed Dec. 16, 2008, and U.S. Provisional Application No. 61/175,757, filed May 5, 2009, both of which are hereby incorporated by reference.

BACKGROUND

[0002] Companies like Twitter and Facebook and other social media such as blogs, microblogs, forums, commenting systems, video sites, and the like offer a huge opportunity for professionals such as marketers, advertisers, and public relations specialists to better understand how their products, brands, and topics are perceived by the public, and how they can better position their products, brands, and topics based on the public perception.

[0003] Professionals might want to know brands and topics that are discussed online together, as well as their evolution, and to identify why certain brands and topics are related. This is important since brand value and future sales may be strongly impacted by customers' and consumers' perceptions. Is the perception of a brand in-line with the brand owner's goal? What do consumers see as competing, alternative products?

[0004] Market research companies have traditionally relied on manual collation of this type of information via focus groups and consumer sampling. Social media, however, offers the dream of obtaining this information in a more timely and automatic manner. But there is a never-ending and constantly changing supply of "conversational" social media data, making it extremely difficult, if not impossible, for professionals to accurately assess, in a timely manner, which conversations are of value, how they are interrelated, and how they relate to the professionals' product, brand, or topic.

[0005] Thus, a need presently exists for a method and system for monitoring online media and dynamically charting the results to facilitate human pattern detection.

SUMMARY

[0006] A method for monitoring online media and charting the results to facilitate human pattern detection comprises specifying a time frame. A search engine is queried for concepts within the time frame. Similarity and distances between the concepts are calculated. In calculating the similarity and distances, a distance matrix is calculated. Graph coordinates of the concepts are computed from at least part of the distance matrix. The querying, calculating the similarity and distances, and computing graph coordinates are repeated for at least one more time frame. Consecutive time frames are mapped onto each other. A dynamic chart of the relationships between the concepts and how they evolve over the time frames is generated. A computer program product comprises a computer readable medium including a computer readable program, wherein the computer readable program when executed on a computer causes the computer to carry out the method for monitoring online media and charting the results to facilitate human pattern detection.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] FIG. 1 shows the data, algorithm, and visualization layers of a system for monitoring online media and charting the results to facilitate human pattern detection.

[0008] FIG. 2 illustrates a symmetric co-reference matrix with buzz, restricted buzz and (restricted) co-reference numbers for calculating the similarity and distances between concepts.

[0009] FIG. 3 shows an input for a multidimensional scaling algorithm for calculating the graph coordinates of concepts.

[0010] FIG. 4 shows an input for a principal component analysis algorithm for calculating the graph coordinates of concepts.

[0011] FIG. 5 shows an exemplary output of a multidimensional scaling algorithm, principal component analysis algorithm, and a correspondence analysis algorithm.

[0012] FIG. 6 is a mock-up of a Brand Map chart.

[0013] FIG. 7 is a screenshot of an exemplary Brand Map chart.

[0014] FIG. 8 shows an exemplary architecture of the system of FIG. 1.

[0015] FIG. 9 shows a method for monitoring online media and charting the results to facilitate human pattern detection.

DETAILED DESCRIPTION

[0016] I. Introduction

[0017] Brand Maps (BMs) measure and visualize the evolution of perceived associations or relatedness between concepts, possibly of multiple types (the two types "entities" and "topics" are used throughout this document). Entities can be brands, products, organizations, people, etc., while topics can be events, features, etc. Entities/topics can be either predefined or automatically detected. The result is a temporal visualization of large amounts of data and high-dimensional distances based on large-scale data sets, facilitating human pattern detection. BMs can be generated for any type of digital data having a temporal aspect (timestamps): blogs, forums, news, data sets with scientific articles, patent data sets, corporate data sets, etc.

[0018] Part of the commercial value of BMs lies in the possibility for users to identify brands and topics that are discussed online together, as well as their evolution, and to identify why certain brands and topics are related. This is important since brand value and future sales are strongly impacted by customers' and consumers' perceptions. Is the perception of a brand in line with brand owners' goals? What do consumers see as competing/alternative products?

[0019] Feedback from BMs provides a basis for improving and adjusting marketing campaigns, maintaining brand reputation, discovering new insights and emerging trends, supporting conversational/word-of-mouth marketing, and the like.

[0020] II. Terminology

[0021] Concept: anything that can be described by a query (for example, comprising keywords and Boolean operators) that can be executed in a search engine. Multiple types/categories of concepts are possible. Throughout this document, two categories, "entities" and "topics", will be used.

[0022] Example entity: ("Barack Obama" OR (obama AND (president OR senator)))

[0023] Example topic: (iraq OR iraqi OR escalation OR (("middle east" OR este) AND (crisis OR guerra OR war)))

[0024] Scope: a clause that is conjunctively added to every concept's query to include or exclude certain contexts.

[0025] "Buzz" of a concept: Aggregate number of online articles collected containing pre-selected terms related to the concept. It is the total number of documents that are returned in the search result satisfying the concept's query.

[0026] Article or document: unit of buzz. An individual sentence or post, usually a writing sample, e.g. a blog entry, a forum post, or a news article.

[0027] "Restricted buzz" of a concept: the buzz of a concept that is restricted to also co-occur with any concept of another category. Currently only used for "topic" concepts. For example, the restricted buzz of a topic is the number of documents in the collection that satisfy the conjunctive query consisting of the topic's query AND a disjunction of all entity queries. It will return the number of documents that contain the topic concept and at least one of the entity concepts.

[0028] Number of co-references: Co-reference numbers count the number of documents in a certain collection that refer to each concept or a certain pair of concepts. The concepts are said to "co-occur" in those documents. In practice, the number of co-references of two concepts can be the number of documents that are returned by a search engine in response to a conjunction of the queries of both concepts.

[0029] Restricted number of co-references: Number of times that a pair of concepts both co-occur with at least one concept of another category.

[0030] Co-reference matrix: a matrix containing the co-reference numbers c.sub.ij, i.e., the number of documents in which concepts i and j co-occur.
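The query-based definitions above can be sketched as simple string composition. This is only an illustration: the function names, the `language:en` scope clause, and the "economy" topic are hypothetical, and the Boolean syntax merely mirrors the example queries in paragraphs [0022] and [0023]; a real search engine may require a different syntax.

```python
from itertools import combinations

def with_scope(query, scope=None):
    """Conjunctively add an optional scope clause to a concept query."""
    return f"({query}) AND ({scope})" if scope else f"({query})"

def co_reference_query(query_a, query_b, scope=None):
    """Conjunction of two concept queries; its hit count is the co-reference number."""
    return with_scope(f"({query_a}) AND ({query_b})", scope)

def restricted_buzz_query(topic_query, entity_queries, scope=None):
    """Topic query AND a disjunction of all entity queries."""
    any_entity = " OR ".join(f"({q})" for q in entity_queries)
    return with_scope(f"({topic_query}) AND ({any_entity})", scope)

def all_pair_queries(concepts, scope=None):
    """Map every unordered pair of concept names to its co-reference query."""
    return {
        (a, b): co_reference_query(concepts[a], concepts[b], scope)
        for a, b in combinations(concepts, 2)
    }

# Hypothetical concepts; the entity query is taken from paragraph [0022].
entities = {"obama": '"Barack Obama" OR (obama AND (president OR senator))'}
topics = {"iraq": "iraq OR iraqi", "economy": "economy OR recession"}
pairs = all_pair_queries({**entities, **topics}, scope="language:en")
```

Each generated query would be sent to the search engine, and the number of returned documents would be used as the corresponding buzz or co-reference number.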

[0031] III. Overview of the BM System

[0032] FIG. 1 shows the data, algorithm, and visualization layers of the system. FIG. 8 shows an exemplary architecture for the system of FIG. 1. The architecture includes a server 82 connected to a network 80, such as the internet. At least one client 84 is connected to the network 80 and in communication with the server 82. A plurality of data sources 86 are also in communication with the network 80. FIG. 9 shows a method for monitoring online media and charting the results to facilitate human pattern detection.

[0033] Briefly, server 82, which functions in part as a search engine, searches one or more of the plurality of data sources 86 for concepts within a time frame (steps 92 and 94 of FIG. 9). Calculations are performed on the results of the search to determine the similarity and distances between the concepts (96 of FIG. 9), and to compute graph coordinates of the concepts (98 of FIG. 9). The search engine 82 is queried again for additional concepts in different time frames (104 of FIG. 9). Then, consecutive time frames are mapped onto each other in order to ensure stability of a dynamic chart (100 of FIG. 9). Finally, a dynamic chart (for example, FIG. 7) is generated which displays the relationship between brands and topics and conversation online (102 of FIG. 9).

[0034] The chart is displayed at client computer 84. This chart provides a view of a topic's or brand's online conversational universe and makes it possible to identify brands and topics that are discussed online together, as well as their evolution, and to identify why certain brands and topics are related (also see "Attentio Brand Maps," Frizo Janssens, Proceedings of the Third International ICWSM Conference (2009), which is hereby incorporated by reference).

[0035] Computations may be initiated by the client 84 instead of being pre-calculated by the server 82, allowing flexible sub-selections of computational options made by the client. For server-side computations, a buffering system could be used to incrementally load the data.

[0036] Client 84 may comprise any type of computer, including mobile devices such as cell phones, smart phones, PDAs, portable computers, and any other type of mobile device operable to transmit and receive electronic messages. The network 80 may include the Internet and wireless networks such as a mobile phone network. Computers 82 and 84 may be one or more computers and may comprise any type of computer capable of storing computer executable code and executing the computer executable code on a microprocessor, and communicating with the communication network 80.

[0037] The disclosed systems and methods, and modifications thereof, may be implemented on any conventional computer using any array of widely available and well understood software platforms, programs, and programming languages. For example, the systems and methods may be implemented on an Intel or Intel compatible based computer running a version of the Linux operating system or running a version of Microsoft Windows. The computers may include any and all components of a computer such as storage like memory and magnetic storage, interfaces like network interfaces, and microprocessors. Programs, programming languages, APIs, and the like may be used such as Java, Java Database Connectivity (JDBC), Adobe Flex, and Adobe Flash, such as shown in FIG. 1. Addendum 3 shows an exemplary XML schema for storing and transferring chart data.

[0038] The server 82 may include a database and an Apache web server. The database may be any conventional database such as an Oracle database or an SQL database. The server may include a search platform such as Solr. These components of the computer, including creating, storing, modifying, and querying databases, and interfacing and communicating with networks are well understood by those having ordinary skill in the art.

[0039] FIG. 9 shows a method for monitoring online media and charting the results to facilitate human pattern detection. A computer program product may include a computer readable medium comprising computer readable code which when executed on the computer causes the computer to perform the methods described herein. Some or all of the computer readable code, which includes the data, algorithm, and visualization layers of FIG. 1 and the method of FIG. 9, may be executed on the processor of server 82 and client computer 84.

[0040] IV. Input for BMs

[0041] The similarity and distances between concepts are calculated and a distance matrix is created. In one example, per source and per region (or other demographics), a square, symmetric "co-references matrix" with co-reference numbers between concepts is computed. As will be disclosed below, depending on the algorithm used to compute the similarities and distances, the co-reference numbers may cover one, or any combination, of the following pair types: entities-topics, topics-topics, and entities-entities.

[0042] For two identical concepts, the number of co-references (a value on the diagonal in the co-references matrix) is taken equal to the total number of documents in the collection that contain that concept (i.e., the "buzz" or "restricted buzz" of the concept). The size of the co-references matrix is k×k, where k is the total number of concepts (the number of entities m plus the number of topics n). Because the matrix is symmetric, the upper (or lower) triangular part together with the diagonal contains all needed information.
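As a minimal sketch of such a matrix, the following counts co-occurrences in a tiny in-memory collection. It assumes that each concept is reduced to a set of keywords and that documents are plain strings; in the described system the counts would instead come from search engine result counts. The documents, concept names, and function name are all hypothetical.

```python
def co_reference_matrix(documents, concepts):
    """Return (names, matrix) where matrix[i][j] counts documents mentioning
    both concept i and concept j; the diagonal holds each concept's buzz."""
    names = list(concepts)
    k = len(names)
    matrix = [[0] * k for _ in range(k)]
    for text in documents:
        words = set(text.lower().split())
        # Indices of all concepts with at least one keyword in this document.
        present = [i for i, name in enumerate(names) if words & concepts[name]]
        for i in present:
            for j in present:
                matrix[i][j] += 1
    return names, matrix

# Hypothetical toy collection: two entities followed by two topics.
documents = [
    "obama speaks about iraq",
    "obama and mccain debate the economy",
    "mccain visits iraq",
    "the economy slows",
]
concepts = {
    "obama": {"obama"},
    "mccain": {"mccain"},
    "iraq": {"iraq", "iraqi"},
    "economy": {"economy"},
}
names, C = co_reference_matrix(documents, concepts)
```

Here `C` is a 4×4 symmetric matrix (k = 4, with m = 2 entities and n = 2 topics), so only its upper triangle and diagonal would need to be stored or queried.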

[0043] Each time frame of a BM may aggregate multiple hours or days of data (a `moving window`), and consecutive windows may or may not overlap.
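One way to picture the moving window is the sketch below, which generates consecutive time frames of a fixed width and counts the timestamps that fall into each one. The window width, step size, and function names are illustrative assumptions and are not prescribed by the text.

```python
from datetime import datetime, timedelta

def moving_windows(start, end, width, step):
    """Yield (window_start, window_end) pairs that fit inside [start, end].
    Consecutive windows overlap whenever step < width."""
    current = start
    while current + width <= end:
        yield current, current + width
        current += step

def count_in_window(timestamps, window):
    """Number of timestamps t with window_start <= t < window_end."""
    lo, hi = window
    return sum(lo <= t < hi for t in timestamps)

# Three-day windows advancing one day at a time over one week (overlapping).
start = datetime(2009, 1, 1)
windows = list(moving_windows(start, start + timedelta(days=7),
                              width=timedelta(days=3), step=timedelta(days=1)))
```

Setting `step` equal to `width` would give non-overlapping time frames.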

[0044] V. Algorithms

[0045] The positions (coordinates) of concept representations on a BM can be computed by various algorithms. These coordinates are 2- or 3-D approximations that are optimal in a mathematical/statistical sense. Three exemplary algorithms are:

[0046] 1) Multidimensional Scaling (MDS)

[0047] 2) Principal Component Analysis (PCA)

[0048] 3) Correspondence Analysis (CA)

[0049] It is appreciated that these are not the only algorithms that may be used. The distance matrix may be computed from any other distance or similarity function between concepts. For example, text based cosine similarity between term-document vectors may be used. Accordingly, buzz and co-reference numbers are not specifically required since any similarity or distance relationship between concepts can be used. For example, distances may be calculated by text mining, based on hyperlink information, and the like. The matrix is not necessarily square and symmetric, and the distance function does not need to be symmetric. In the example with co-reference numbers it is symmetric.

[0050] V.1 Multidimensional Scaling (MDS)

[0051] MDS presents the concepts (e.g., entities and topics) in a 2D or 3D space such that the pairwise distances approximate the buzz-based distances as precisely as possible. Highly co-referenced concepts in general are placed close to each other on an MDS BM.

[0052] Multiple MDS algorithms exist. One type is "Classical, metric MDS", which includes advantages such as:

[0053] It gives an analytical solution requiring no iteration

[0054] It gives a nested solution (2D-3D- . . . )

[0055] "metric MDS is more robust in numerical sense; more likely to yield global optimum"

[0056] Input

[0057] The input for an MDS algorithm is a square, symmetric dissimilarity (distance) matrix (see FIG. 3). This k×k dissimilarity matrix is calculated from the (restricted) buzz and co-reference numbers in the co-reference matrix (see FIG. 2) by, for example, applying the following formula,

$$\mathrm{dist}(a,b) = D_{ab} = 1 - \left( \frac{N_{ab}}{1 + 2 N_a} + \frac{N_{ab}}{1 + 2 N_b} \right) \qquad (1)$$

[0058] with Na and Nb the respective (restricted) buzz (values on the diagonal), and Nab the co-occurrence frequency (off-diagonal values). The `1+` in the denominator slightly down-weights cases like Nab=Na=Nb=1 (i.e., if both brands occur only once, their similarity should not be 100%).
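The formula can be checked numerically with a short sketch. The function names and the sample counts are hypothetical, and setting the diagonal of the distance matrix to zero is an assumption made here (the formula itself is only described for pairs of distinct concepts).

```python
def distance(n_a, n_b, n_ab):
    """Equation (1): 1 - (Nab / (1 + 2*Na) + Nab / (1 + 2*Nb))."""
    return 1.0 - (n_ab / (1 + 2 * n_a) + n_ab / (1 + 2 * n_b))

def distance_matrix(co_refs):
    """Turn a square co-reference matrix (buzz on the diagonal) into distances.
    Assumption: a concept is placed at distance 0 from itself."""
    k = len(co_refs)
    return [
        [0.0 if i == j else distance(co_refs[i][i], co_refs[j][j], co_refs[i][j])
         for j in range(k)]
        for i in range(k)
    ]

# Hypothetical counts: buzz 10, 8 and 5 on the diagonal.
co_refs = [[10, 4, 0],
           [4, 8, 2],
           [0, 2, 5]]
D = distance_matrix(co_refs)
```

With Nab=Na=Nb=1 the distance is 1/3 rather than 0, which is the down-weighting effect of the `1+` term; with large, fully overlapping counts the distance approaches 0.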

[0059] Short Description of the MDS Algorithm (Also See [1] in Addendum 2)

[0060] Output

[0061] The output of an MDS algorithm is a (k-by-l) configuration matrix containing the coordinates of concept representations. If the dissimilarity matrix (see FIG. 3) were a Euclidean distance matrix, then l would be the dimension of the smallest space in which the k points can be embedded. In the case of BM, however, the matrix is a more general dissimilarity matrix and l is the number of positive eigenvalues of the matrix. For displaying the BM charts in two or three dimensions, only the first two or three coordinates (out of l) are retained (see FIG. 5). Consequently, a BM is an approximation of the configuration of points that is optimal in a mathematical sense.
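Since the short description of the algorithm is not reproduced here, the following is a minimal sketch of the standard classical (Torgerson) MDS procedure, which matches the input and output described above. It assumes NumPy is available; the function name and the example points are hypothetical, and this is not necessarily the exact variant used by the system.

```python
import numpy as np

def classical_mds(dissimilarities, n_dims=2):
    """Classical (Torgerson) MDS: coordinates from a symmetric dissimilarity matrix."""
    d = np.asarray(dissimilarities, dtype=float)
    k = d.shape[0]
    centering = np.eye(k) - np.ones((k, k)) / k
    b = -0.5 * centering @ (d ** 2) @ centering      # double-centered matrix
    eigenvalues, eigenvectors = np.linalg.eigh(b)    # returned in ascending order
    order = np.argsort(eigenvalues)[::-1]            # largest eigenvalue first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    positive = eigenvalues > 1e-12                   # keep only positive eigenvalues
    coords = eigenvectors[:, positive] * np.sqrt(eigenvalues[positive])
    return coords[:, :n_dims]                        # retain the first n_dims axes

# Corners of a 3-4-5 right triangle: Euclidean distances are reproduced exactly.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = classical_mds(dist)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

The solution is analytical (a single eigendecomposition, no iteration) and nested: the 2D chart consists of the first two columns of the 3D solution. The recovered configuration may differ from the original points by a rotation or reflection, which is why consecutive time frames need to be mapped onto each other.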

[0062] V.1.1 Centric MDS

[0063] To compute a "centric MDS", which has a focal concept in the center, a one-dimensional MDS is calculated with all concept representations except for the centered one, which is left out. The result is a straight line of concept representations. The largest distance is between those at opposite ends of the line. Next, the line is "projected" onto the unit circle (radius=1) around the centric concept in the following manner,

[0064] dMax=max(mdsCoords)-min(mdsCoords);

[0065] scale=dMax/(2*pi-pi/3);

[0066] posOnCirc=mdsCoords/scale;

[0067] posOnCirc=posOnCirc-min(posOnCirc);

[0068] angles=pi/3-posOnCirc;

[0069] centricCoordinates=[cos(angles), sin(angles)];

[0070] where mdsCoords contains the ordinate values of all concepts on the line and centricCoordinates will contain the X- and Y-coordinates of the non-centric concepts, lying on the unit circle around the centric concept.
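The projection steps can be transcribed into a runnable sketch as follows. The function name is hypothetical, and raising an error when all coordinates coincide is an added assumption, since the original steps would divide by zero in that case.

```python
import math

def project_on_unit_circle(mds_coords):
    """Map 1-D MDS coordinates onto the unit circle, following the steps in
    paragraphs [0064]-[0069]. Returns a list of (x, y) pairs."""
    d_max = max(mds_coords) - min(mds_coords)
    if d_max == 0:
        raise ValueError("need at least two distinct coordinates")  # assumption
    scale = d_max / (2 * math.pi - math.pi / 3)
    pos_on_circ = [c / scale for c in mds_coords]
    offset = min(pos_on_circ)
    pos_on_circ = [p - offset for p in pos_on_circ]   # now spans [0, 5*pi/3]
    angles = [math.pi / 3 - p for p in pos_on_circ]
    return [(math.cos(a), math.sin(a)) for a in angles]

centric_coordinates = project_on_unit_circle([-2.0, 0.0, 1.0, 3.0])
```

The line is stretched over an arc of 5π/3 rather than the full circle, so the two concepts at opposite ends of the line stay separated by a gap of π/3 instead of landing on the same spot.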

[0071] Each concept representation (b) on the unit circle is then pulled towards the center according to the number of co-references with the centric concept (a). An exponential multiplier is applied to the coordinates to pull concept (b) towards the centric concept; the x- and y-coordinates are multiplied by:

$$\exp\left( -3 \, \frac{N_{ab}}{\min\left( \sum_{c} N_{ac},\; N_a \right)} \right) \qquad (2)$$

[0072] where Na is the buzz of the centered concept (a), Nab is the number of co-references the centric concept (a) has with the non-centric one (b), and $\sum_{c} N_{ac}$ is the sum of all co-references of any concept (c) with the centric one (a).

[0073] Examples:

[0074] If there are no co-references, then the non-centric concept representation is on the unit circle (exp(0)=1).

[0075] If the number of co-references is maximal (Nab=Na), then the bubble is almost in the center (exp(-3)≈0.05).
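The multiplier and the two examples above can be reproduced with a small sketch. The function names are hypothetical, and returning 1 when the denominator is zero (a centric concept without any co-references) is an assumption added to avoid a division by zero.

```python
import math

def pull_factor(n_ab, n_a, co_refs_with_centric):
    """Equation (2): exp(-3 * Nab / min(sum_c Nac, Na))."""
    denominator = min(sum(co_refs_with_centric), n_a)
    if denominator == 0:
        return 1.0  # assumption: leave the point on the unit circle
    return math.exp(-3.0 * n_ab / denominator)

def pull_towards_center(point, factor):
    """Multiply the x- and y-coordinates of a point on the unit circle."""
    x, y = point
    return (x * factor, y * factor)

# Centric concept with buzz 10 and co-references 10 and 4 with two other concepts.
strong = pull_factor(10, 10, [10, 4])   # maximal co-reference, close to center
none = pull_factor(0, 10, [10, 4])      # no co-reference, stays on the circle
```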

[0076] V.2 Principal Component Analysis (PCA)

[0077] PCA gives the dimensions (axes) that explain most of the variance in the data by calculating the eigenvalue decomposition of the covariance matrix of an object-by-variable matrix. The resulting principal components are orthogonal linear combinations of the original `variables` (columns).

[0078] Input

[0079] The matrix in FIG. 4 is the complement of the dissimilarity matrix of FIG. 3 (Sab=1-Dab), completed with both the upper and lower triangular part. The values on the diagonal are set to the mean of the off-diagonal values on the corresponding row or column. The similarity/proximity/affinity matrix is first standardized and then passed as input to the PCA algorithm, where it is considered as an object-by-variable matrix.

[0080] Short Description of the PCA Algorithm (Also See Addendum 2)

[0081] Output

[0082] The "principal component scores" provide the representation of the data in the space spanned by the principal components, i.e., the coordinates, of which again only the first two or three are retained (see FIG. 5).
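A minimal sketch of the PCA input preparation and the score computation is given below, assuming NumPy. The function names and the sample distance matrix are hypothetical, and "standardized" is interpreted here as a per-column z-score, which is an assumption; a constant column would need separate handling.

```python
import numpy as np

def pca_input(distances):
    """Similarity matrix S = 1 - D with each diagonal entry replaced by the mean
    of the off-diagonal values in its row, then standardized per column."""
    d = np.asarray(distances, dtype=float)
    k = d.shape[0]
    s = 1.0 - d
    off_diagonal = ~np.eye(k, dtype=bool)
    for i in range(k):
        s[i, i] = s[i, off_diagonal[i]].mean()
    # Assumption: "standardized" means zero mean and unit variance per column.
    return (s - s.mean(axis=0)) / s.std(axis=0, ddof=1)

def pca_scores(data, n_dims=2):
    """Principal component scores via eigendecomposition of the covariance matrix."""
    centered = data - data.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1][:n_dims]   # largest variance first
    return centered @ eigenvectors[:, order]

# Hypothetical symmetric distance matrix for four concepts.
D = np.array([[0.0, 0.2, 0.7, 0.9],
              [0.2, 0.0, 0.6, 0.8],
              [0.7, 0.6, 0.0, 0.3],
              [0.9, 0.8, 0.3, 0.0]])
scores = pca_scores(pca_input(D))
```

Each row of `scores` holds the chart coordinates of one concept. The sign of each axis is arbitrary, which is why only reflections need to be resolved when consecutive PCA charts are aligned.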

[0083] V.3 Correspondence Analysis (CA)

[0084] CA is a weighted form of PCA that is appropriate for frequency data of two categorical variables. To compute BMs using CA (unlike MDS and PCA), only the co-reference counts between entities and topics are needed (gray region in FIG. 2, left). Hence, a frequency or contingency table listing all co-occurrence frequencies of entity-by-topic pairs suffices to calculate positions of concepts on the charts, reducing the number of queries needed and thus the computational complexity. However, the buzz values on the diagonal of the co-references matrix are needed in order to determine the "bubble sizes" of the concepts on the charts, and the entity-entity (blue region) and topic-topic (yellow) pairs are useful information to show on the chart when requested (see Section VI). If fewer than two rows or fewer than two columns remain in the contingency table, then the CA map is not generated.
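The following sketch shows the textbook formulation of correspondence analysis (a singular value decomposition of the standardized residuals of the contingency table), assuming NumPy. The function name and the counts are hypothetical, and dropping empty rows and columns before checking the table size is an assumption about what "remain" means above.

```python
import numpy as np

def correspondence_analysis(table, n_dims=2):
    """Return (row_coords, col_coords) in principal coordinates, or None when
    fewer than two non-empty rows or columns remain."""
    n = np.asarray(table, dtype=float)
    n = n[n.sum(axis=1) > 0]                 # assumption: drop empty rows
    n = n[:, n.sum(axis=0) > 0]              # assumption: drop empty columns
    if n.shape[0] < 2 or n.shape[1] < 2:
        return None                          # no CA map is generated
    p = n / n.sum()
    r, c = p.sum(axis=1), p.sum(axis=0)      # row and column masses
    residuals = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    u, singular_values, vt = np.linalg.svd(residuals, full_matrices=False)
    row_coords = (u * singular_values) / np.sqrt(r)[:, None]
    col_coords = (vt.T * singular_values) / np.sqrt(c)[:, None]
    return row_coords[:, :n_dims], col_coords[:, :n_dims]

# Hypothetical entity-by-topic co-reference counts (3 entities, 3 topics).
table = np.array([[20, 5, 1],
                  [3, 15, 4],
                  [2, 6, 18]])
rows, cols = correspondence_analysis(table)
```

Entities (rows) and topics (columns) obtain coordinates in the same space, so both can be drawn on one chart. Only the m×n entity-topic counts are required, rather than all k×k pairs.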

[0085] V.5 Stability of BMs Over Time

[0086] In order to ensure stability of the dynamic charts over time, consecutive time frames are mapped onto each other in a mathematically optimal way. Depending on the algorithm used to compute the BM, this optimal mapping may be achieved by different algorithms. In case of MDS, the temporal mapping is done by the "Procrustes procedure" (also see [1] of Addendum 2): the chart of time t2 is mapped on the chart of time t1 by finding, in the least-squares sense, the optimal combination of the allowed transformations: rotations, reflections, and dilations. For PCA and CA only reflections are allowed; the optimal reflection out of 4 possibilities (change of sign of X and/or Y axes) is calculated in the least-squares sense. Centric MDS maps only consider a change of the sign of X.
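For the PCA and CA case, choosing among the four sign combinations can be sketched as below. The function name is hypothetical, and the sketch assumes two-dimensional charts in which both time frames list the same concepts in the same order.

```python
from itertools import product

def best_reflection(coords_t1, coords_t2):
    """Try the four sign combinations for the X and Y axes of coords_t2 and
    return the flipped copy with the smallest squared distance to coords_t1."""
    def squared_error(signs):
        sx, sy = signs
        return sum((x1 - sx * x2) ** 2 + (y1 - sy * y2) ** 2
                   for (x1, y1), (x2, y2) in zip(coords_t1, coords_t2))
    sx, sy = min(product((1, -1), repeat=2), key=squared_error)
    return [(sx * x, sy * y) for x, y in coords_t2]

# The second frame is roughly the first one mirrored along the X axis.
t1 = [(1.0, 2.0), (-1.0, 0.5), (0.5, -1.0)]
t2 = [(-1.1, 2.1), (0.9, 0.4), (-0.6, -0.9)]
aligned = best_reflection(t1, t2)
```

Restricting the candidates to the pairs with sy = 1 would give the centric MDS variant, which only considers a change of the sign of X.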

[0087] Matrix Algebra Behind the Procrustes Procedure

[U,S,V]=singular_value_decomposition(coordinates_t1'*coordinates_t2)
optimal_coordinates_t2=coordinates_t2*V*U'
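The two lines of matrix algebra above translate to NumPy as in the following sketch. The function name is hypothetical; like the algebra shown, the sketch covers rotations and reflections only and applies no centering or dilation. It assumes both coordinate matrices list the same concepts in the same row order.

```python
import numpy as np

def procrustes_map(coordinates_t1, coordinates_t2):
    """Rotate/reflect coordinates_t2 to best match coordinates_t1 (least squares)."""
    x1 = np.asarray(coordinates_t1, dtype=float)
    x2 = np.asarray(coordinates_t2, dtype=float)
    u, _, vt = np.linalg.svd(x1.T @ x2)
    # NumPy returns V transposed, so V * U' from the text is vt.T @ u.T here.
    return x2 @ vt.T @ u.T

# A chart at time t1 and the same chart rotated by 0.7 radians at time t2.
chart_t1 = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0], [2.0, 1.0]])
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
chart_t2 = chart_t1 @ rotation
mapped_t2 = procrustes_map(chart_t1, chart_t2)
```

In this example the rotation is undone completely, so `mapped_t2` coincides with `chart_t1`. With real data the two frames differ, and the mapping only removes the part of the difference that a rotation or reflection can explain.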

[0088] V.6 Additional Remarks

[0089] In one embodiment, the calculations are done server-side. In another embodiment, the similarity/distance information is transferred from the server to the client, while concept positions are calculated by applying the algorithms on the client-side.

[0090] Classical MDS with (embeddable) Euclidean distances gives the same result as PCA (up to the sign). CA uses the Chi-Square distance as a dissimilarity measure, whereas MDS can accept any (dis)similarity measure.

[0091] VI. Visualization Engine

[0092] FIG. 6 is a mock-up and FIG. 7 is a screenshot of Brand Map charts generated according to the above methods and systems. Some of the features and configuration options of the Brand Map charts include:

[0093] VI.1 Features

[0094] Dimensionality

[0095] The charts can be one-, two- or three-dimensional.

[0096] Source Selection

[0097] The data source may be selected, for example "online news articles."

[0098] Region/Demographics Selection

[0099] The region or demographics may be selected, for example by country.

[0100] Algorithm Selection

[0101] The algorithm may be selected, for example MDS, PCA, or CA.

[0102] Legend

[0103] Shows how the different concept categories (e.g., "Brands" and "Topics") are visualized on the charts.

[0104] Size of Concept Representations

[0105] Concept representations (e.g., the bubbles of FIGS. 6 and 7) are auto-scaled on the charts based on a linear or non-linear (e.g., sqrt, log, . . . ) function of the corresponding number of occurrences (buzz). This number of occurrences may be counted in any (sliding) time window (e.g., one hour or day, or aggregated over multiple days, etc.). The user can also adjust the scaling factor.
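A minimal sketch of such auto-scaling follows. The function name, the maximum radius of 40 pixels, and the normalization against the largest buzz on the chart are illustrative assumptions.

```python
import math

SCALERS = {
    "linear": lambda buzz: float(buzz),
    "sqrt": math.sqrt,
    "log": math.log1p,   # log(1 + buzz), so a buzz of 0 maps to 0
}

def bubble_radius(buzz, max_buzz, method="sqrt", max_radius=40.0, factor=1.0):
    """Radius of a concept bubble, scaled so that the concept with the largest
    buzz gets max_radius * factor (factor is the user-adjustable scaling)."""
    if max_buzz <= 0:
        return 0.0
    scale = SCALERS[method]
    return factor * max_radius * scale(buzz) / scale(max_buzz)

small = bubble_radius(25, 100, "sqrt")    # a quarter of the buzz, half the radius
```

With square-root scaling the bubble area, rather than the radius, grows in proportion to the buzz, which is one common reason to prefer a non-linear function.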

[0106] Selecting Concept Representations

[0107] The user can select one or more concept representations, by either using the mouse or another pointing device to drag a rectangle around concept representations, or by clicking concepts while holding the control button in MS Windows, or the Option button on Apple Mac computers. Without holding the button, only the last clicked item remains selected. Selection can also be made by clicking one concept and holding the Shift button while clicking a second concept. All concepts residing in the implicit rectangle defined by the two selected nodes are selected.

[0108] Non-Exhaustive List of Possible Interactions with one Selected Concept Representation

[0109] Request number of occurrences in the underlying data set ((restricted) buzz: red and green parts of FIG. 2), e.g. by hovering over the concept.

[0110] Request all information entities that can be attributed to the concept, e.g. the collection of articles that contain the concept, potentially ranked by different criteria (date, relevance, rank, etc.). These sets can be pre-computed (static) or generated on the fly (e.g., "Live search" functionality). The resulting list allows a user to browse the original information entities, offline or online.

[0111] Hide/show

[0112] Trace concept over time

[0113] Switch to centric MDS map with the selected concept representation as focal concept

[0114] Non-Exhaustive List of Possible Interactions with Two or More Selected Concept Representations

[0115] Request number of co-references in the underlying data set (blue, grey and yellow parts of FIG. 2)

[0116] Request all information entities that can be attributed to the combination of concepts, e.g. the collection of articles that contain each concept, potentially ranked by different criteria (date, relevance, rank, etc.). These sets can be pre-computed (static) or generated on the fly (e.g., "Live search" functionality). The resulting list allows a user to browse to the original information entities, offline or online, and to drill down to individual articles that have concrete associations between certain entities/topics.

[0117] Hide/show

[0118] Trace pairs of concepts over time

[0119] Hide Selected Concept Representations

[0120] The user interface allows hiding a sub-selection of concepts, with or without recalculating the positions of the remaining concepts. Currently, the selected nodes are just hidden from view, while their underlying data is still considered to define the positions of all concepts on the charts. However, hiding could alternatively trigger a re-calculation of node positions, either client-side or server-side.

[0121] Show All Concept Representations

[0122] Show all hidden nodes again.

[0123] Show/Hide Concept Labels

[0124] Toggles whether the user-defined or automatically defined labels for concepts are shown close to their representations. When activated, label placement is optimized to limit overlap with other labels.

[0125] Interactive Timeline with Play/Pause Button

[0126] The interface may show a time slider (see sliders at bottom of FIGS. 6 and 7) that can be used interactively to go back and forth in time, and play/pause/ . . . buttons to control automatic animation. The timeline shows the current time window of data that is used to make up the current chart. The user can drag the slider to move the sliding window or start/pause the automated advancing of the time window animation. The user can also interactively adjust the speed of the automated advancing of the time window animation.
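The sliding time window behind such a timeline can be sketched as follows. This is only an illustration: the function name `sliding_windows`, the 28-day window length, and the 7-day step are assumptions made for the example; the start date matches the data window of Table A1.1.

```python
from datetime import date, timedelta


def sliding_windows(first_day, last_day, window_days, step_days):
    """Return (start, end) date pairs for a window that slides over a date range."""
    if window_days < 1 or step_days < 1:
        raise ValueError("window_days and step_days must be positive")
    span = timedelta(days=window_days - 1)  # inclusive window: start .. start+span
    windows = []
    start = first_day
    while start + span <= last_day:
        windows.append((start, start + span))
        start += timedelta(days=step_days)
    return windows


# Hypothetical settings: 28-day windows advanced one week at a time.
frames = sliding_windows(date(2008, 8, 20), date(2008, 10, 14),
                         window_days=28, step_days=7)
```

Dragging the slider would then correspond to selecting an index into `frames`, and the play button to advancing that index on a timer.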

[0127] "Interpolation Effect"

[0128] When the current time frame is changed (manually or automatically), the concept representations can visually move on the chart to their new locations (updated coordinates), which are computed by the selected algorithm from the corresponding co-reference matrix. For example, two concepts might move closer together because they are discussed together more often.
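One simple way to animate the move between two frames is linear interpolation of the coordinates. The disclosure does not state how intermediate positions are computed, so the sketch below is an assumption; the function name `interpolate_frame` is hypothetical, and the sample coordinates are the first two rows of Tables A1.4 and A1.6.

```python
def interpolate_frame(coords_a, coords_b, t):
    """Linearly blend two frames; t=0 gives coords_a, t=1 gives coords_b.

    Only concepts present in both frames are interpolated.
    """
    shared = coords_a.keys() & coords_b.keys()
    return {
        label: (
            (1 - t) * coords_a[label][0] + t * coords_b[label][0],
            (1 - t) * coords_a[label][1] + t * coords_b[label][1],
        )
        for label in shared
    }


frame_1 = {"Barack Obama": (-0.037, 0.067), "John McCain": (-0.038, -0.060)}
frame_2 = {"Barack Obama": (-0.040, 0.070), "John McCain": (-0.040, -0.060)}
halfway = interpolate_frame(frame_1, frame_2, 0.5)
```

Rendering a handful of values of `t` between 0 and 1 produces the visual glide from the old positions to the new ones.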

[0129] Non-Exhaustive List of Additional Features

[0130] The user interface automatically or manually groups/annotates concepts based on common features.

[0131] The color of concept representations illustrates the overall sentiment value of underlying information units.

[0132] One or more concepts may optionally be traced on the charts by visualizing the track they follow over time.

[0133] Concepts may be added to the charts by automatic topic detection and/or named entity recognition techniques. Other concepts may disappear from the chart if they become less interesting over time, in whatever sense.

[0134] Scale Labels

[0135] The font size of the concept labels on the map (e.g., "Barack Obama") can be auto-scaled as a function of the corresponding number of occurrences (buzz).

[0136] VII. Interpreting Brand Map Charts

[0137] (Occasional reference is made to reference material of Addendum 2, and to http://faculty.chass.ncsu.edu/garson/PA765/mds.htm.)

[0138] VII.1 MDS

[0139] "While MDS assures that objects which are similar are close on the MDS map, the axes and orientation are arbitrary functions of the input data. . . . Likewise, in intuiting the meaning of dimensions, since the axes are arbitrarily oriented, it may be more interpretable to understand point location diagonally rather than vertically/horizontally."

[0140] The horizontal and vertical axes are not to be interpreted; they have no real meaning. The only thing that matters is the pairwise distances between bubbles. Consequently, no axes are shown on BMs with MDS.

[0141] Prior knowledge about the field of interest should be used to interpret a given MDS plot. For instance, if all nodes on the MDS plot lie on a line or on a circle, or if they cluster in different groups, then you can use your expert knowledge to try to explain the reason why. Particular geometries or groupings on the plot can thus be interpreted, if you know the data.

[0142] Interpreting the MDS representation essentially means to link some of its geometric properties to known or assumed features about the brands or topics represented by the points.

[0143] It involves human interpretation of the scatter of points in specific dimensions, not necessarily the given X and Y axes. So, feel free to draw lines or curves on an MDS plot that partition the space to support your interpretations/explanations.

[0144] Another reason why the actual X and Y axes of the MDS plot have no real meaning is that the MDS representation is insensitive to rotations, translations, reflections, and dilations; i.e., a rotated MDS map represents the same MDS solution.

[0145] VII.2 PCA

[0146] PCA does not establish a direct link between dissimilarity measures and geometric distance.

[0147] It is not necessarily true that the ratio of the distances between two pairs of nodes approximately corresponds to the ratio of their buzz-based distances, as is the case for MDS.

[0148] "A PCA solution is seldom studied geometrically. Rather, typically only the loadings of the vectors on the components are interpreted."

[0149] VII.3 CA

[0150] Distances on CA charts are related to "profile vectors."

[0151] The origin is the average entity (and topic) profile (centroid).

[0152] "In the simultaneous representation, the apparent distance between a point j and a point k is not a genuine distance", so distances between entities and topics are to be interpreted with care.

[0153] From [2] of Addendum 2 ("Geometric Data Analysis", Le Roux and Rouanet), p. 49: "Interpreting an axis amounts to finding out what is similar, on the one hand, between all the elements figuring on the right of the origin and, on the other hand between all that is written on the left; and expressing with conciseness and precision, the contrast (or opposition) between the two extremes."

[0154] Addendum 1 shows two examples of the method of FIG. 9 using actual data. One example uses multidimensional scaling, and the other example uses correspondence analysis.

[0155] With the above disclosure in mind, and referring to FIG. 9, at step 92 a time frame is specified. It is understood that the time frame may be manually specified by a user, automatically specified by, for example, the server (82 of FIG. 8), or any combination thereof. Examples of time frames are hourly, daily, weekly, monthly, or any other arbitrary period of time, such as every 28 days. The specifying may further include specifying a region, specifying a language, specifying a data source, and the like.

[0156] At step 94 a search engine is queried for concepts within the time frame. The concepts include at least one of an entity and a topic. The step of querying further comprises querying a search engine for concepts and pair-wise combinations of concepts. A query may include the conjunction (boolean AND combination) of other queries.
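The pair-wise combination of queries can be sketched as below. The helper name `build_queries` is hypothetical, the query strings are abbreviated from Table A1.1, and the exact query syntax accepted by a given search engine is an assumption.

```python
from itertools import combinations


def build_queries(concept_queries):
    """Return the single-concept queries plus a boolean-AND query per concept pair."""
    single = dict(concept_queries)
    pairs = {
        (a, b): f"({concept_queries[a]}) AND ({concept_queries[b]})"
        for a, b in combinations(concept_queries, 2)
    }
    return single, pairs


# Query strings abbreviated from Table A1.1.
concepts = {
    "Joe Biden": "biden AND (joe OR obama OR president OR candidate OR senator)",
    "taxes": "impuestos OR tax OR taxes OR tariffs OR tariff",
    "health care": "health OR medicare OR medicaid OR salud",
}
single, pairs = build_queries(concepts)
```

The hit count of each single query would give a buzz number and the hit count of each pair query a co-reference number; for k concepts there are k(k-1)/2 pair queries.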

[0157] At step 96 the similarity and distances between the concepts are calculated. As disclosed above the calculating comprises computing a distance matrix. In one example computing the distance matrix comprises computing a square symmetric co-reference matrix with co-reference numbers between all possible pairs of concepts. In another example, computing the distance matrix comprises computing a co-reference matrix with co-reference numbers between at least one of possible pairs of concepts, wherein the possible pairs comprise entities-topics, topics-topics, and entities-entities. In yet another example, the distance matrix is at least one of asymmetric and not square. And, in another example, the distance matrix is at least one of symmetric and square. In still another example, the query of step 94 returns a number of articles or documents and the computing in step 96 comprises computing buzz numbers and co-reference numbers from the number of articles or documents.
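As a worked illustration of turning co-reference numbers into distances, the sketch below uses one minus the mean of the two co-reference fractions (the co-reference number divided by each concept's buzz). This particular formula is an assumption made here: it reproduces the corresponding entries of Table A1.3 from the numbers in Table A1.2, but formula (1) of the disclosure remains the authoritative definition. The function name `distance_matrix` is hypothetical.

```python
def distance_matrix(coref):
    """Distances from a square co-reference matrix with buzz on the diagonal.

    Assumed dissimilarity: 1 - mean(c_ij / buzz_i, c_ij / buzz_j).
    Assumes every concept has nonzero buzz.
    """
    n = len(coref)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = coref[i][j]
                dist[i][j] = 1.0 - 0.5 * (shared / coref[i][i] + shared / coref[j][j])
    return dist


# Upper-left 3x3 block of Table A1.2 (Obama, McCain, Palin).
coref = [
    [195561, 126617, 36112],
    [126617, 162940, 49182],
    [36112, 49182, 63301],
]
dist = distance_matrix(coref)
```

Rounded to three decimals, `dist[0][1]`, `dist[0][2]`, and `dist[1][2]` agree with the values 0.288, 0.622, and 0.461 listed in Table A1.3.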

[0158] At step 98 the graph coordinates of the concepts are computed from at least part of the matrix which was computed in step 96. The graph coordinates are computed using one of a multidimensional scaling algorithm, a centric multidimensional scaling algorithm, a principal component analysis algorithm, and a correspondence analysis algorithm.
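For the multidimensional scaling option, a minimal sketch of classical (metric) MDS in plain Python follows. It is illustrative only: the function names, the power-iteration eigen-solver, and the four-point sample data are assumptions chosen for the example and are not taken from the disclosure. The sketch assumes the leading eigenvalues of the double-centered matrix are non-negative and distinct, which holds for the Euclidean sample data but is not guaranteed for arbitrary dissimilarities.

```python
import math
import random


def top_eigenpairs(matrix, count, iterations=500, seed=0):
    """Leading eigenpairs of a symmetric matrix via power iteration with deflation."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    rng = random.Random(seed)
    pairs = []
    for _ in range(count):
        vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient of the converged unit vector.
        value = sum(vec[i] * sum(work[i][j] * vec[j] for j in range(n))
                    for i in range(n))
        pairs.append((value, vec))
        # Deflate so the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def classical_mds(dist, dims=2):
    """Coordinates whose pairwise distances approximate the given distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering of the squared distances (matrix is symmetric).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k, (value, vec) in enumerate(top_eigenpairs(b, dims)):
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i][k] = scale * vec[i]
    return coords


# Hypothetical sample: corners of a 3-by-1 rectangle.
points = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (0.0, 1.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist)
```

The recovered `coords` reproduce the input distances but generally differ from `points` by a rotation, reflection, or translation, which is why the axes of an MDS chart carry no meaning of their own.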

[0159] As indicated by arrow 104, steps 94, 96, and 98 are repeated for additional time frames.

[0160] At step 100 consecutive time frames are mapped onto each other. In mapping, at least one of the following transformations is computed: a rotation, a reflection, a dilation, and a sign change. One procedure for mapping time frames is a Procrustes procedure.

[0161] At step 102 a dynamic chart is generated showing the relationships between the concepts and how they evolve over the time frames.

[0162] The foregoing detailed description has discussed only a few of the many forms that this invention can take. It is intended that the foregoing detailed description be understood as an illustration of selected forms that the invention can take and not as a definition of the invention. It is only the following claims, including all equivalents, that are intended to define the scope of this invention.

[0163] Addendum 1: Examples Using MDS and CA

TABLE-US-00001 TABLE A1.1 Concepts: types, labels, queries, buzz, and "restricted buzz"
Type	Label	Query	Buzz	"Restricted buzz"
Entity	"Barack Obama"	((("Barack Obama" OR (obama AND (president OR senator)))))	195561	/
Entity	"John McCain"	(((McCain AND (John OR president OR republican OR candidate))))	162940	/
Entity	"Sarah Palin"	(((palin AND (sarah OR president OR candidate OR alaska OR governor OR McCain))))	63301	/
Entity	"Joe Biden"	(((biden AND (joe OR obama OR president OR candidate OR senator))))	59812	/
Topic	"Iraq"	((iraq OR iraqi OR escalation OR (("middle east" OR este) AND (crisis OR guerra OR war))))	43277
Topic	"economy"	((Economia OR economy OR economics OR economic OR dollar OR gastos OR dollars OR (fiscal AND (policy OR crisis))))	67549
Topic	"values"	((values OR morals OR moral OR valores OR "the family" OR abortion OR aborto OR morality))	35470
Topic	"environment"	((environment OR ambiente OR environmental OR eco OR "climate change" OR "climate control"))	13006
Topic	"foreign policy"	(((foreign AND policy) OR (politica AND extranjero)))	26048
Topic	"taxes"	((impuestos OR tax OR taxes OR tariffs OR tariff))	28851
Topic	"big business"	(("big business" OR corporation OR corporate OR corporation OR (negocios AND grandes)))	9918
Topic	"energy"	((energy OR gas OR petrol OR oil OR petroleo OR energia OR petroleo))	45352
Topic	"health care"	((health OR medicare OR medicaid OR salud))	29301
m = number of entities = 4
n = number of topics = 9
k = m + n = 13
Data window: 2008-08-20 to 2008-09-16
Data source: online news articles

TABLE-US-00002 TABLE A1.2 Input for ABM (for 1 region and 1 source): symmetric co-reference matrix with buzz, restricted buzz and (restricted) co-reference numbers.
Co-reference matrix (symmetric)
195561	126617	36112	53053	39356	61478	31901	10637	25439	26038	8625	38311	26162
126617	162940	49182	42074	36049	54944	28069	9748	22680	23575	7479	35807	21897
36112	49182	63301	13702	9942	16290	12487	3780	6095	7982	2372	13904	7093
53053	42074	13702	59812	16230	19893	12412	2729	15401	6620	1849	11534	8098
39356	36049	9942	16230	43277	19989	11116	3580	12819	9751	2001	14831	9808
61478	54944	16290	19893	19989	67549	14296	6552	13483	18533	6264	24791	16718
31901	28069	12487	12412	11116	14296	35470	3378	7071	7527	1714	11153	8044
10637	9748	3780	2729	3580	6552	3378	13006	2368	4030	1347	6988	3867
25439	22680	6095	15401	12819	13483	7071	2368	26048	4849	1224	9246	4798
26038	23575	7982	6620	9751	18533	7527	4030	4849	28851	3956	15693	10358
8625	7479	2372	1849	2001	6264	1714	1347	1224	3956	9918	3704	3326
38311	35807	13904	11534	14831	24791	11153	6988	9246	15693	3704	45352	11021
26162	21897	7093	8098	9808	16718	8044	3867	4798	10358	3326	11021	29301

TABLE-US-00003 TABLE A1.3 Square, symmetric dissimilarity/disparity/distance matrix, calculated from information in the co-reference matrix by applying formula (1).
Distance matrix
0.000	0.288	0.622	0.421	0.445	0.388	0.469	0.564	0.447	0.482	0.543	0.480	0.487
0.288	0.000	0.461	0.519	0.473	0.425	0.518	0.595	0.495	0.519	0.600	0.495	0.559
0.622	0.461	0.000	0.777	0.807	0.751	0.725	0.825	0.835	0.799	0.862	0.737	0.823
0.421	0.519	0.777	0.000	0.677	0.686	0.721	0.872	0.576	0.830	0.891	0.776	0.794
0.445	0.473	0.807	0.677	0.000	0.621	0.715	0.821	0.606	0.718	0.876	0.665	0.719
0.388	0.425	0.751	0.686	0.621	0.000	0.693	0.700	0.641	0.542	0.638	0.543	0.591
0.469	0.518	0.725	0.721	0.715	0.693	0.000	0.823	0.765	0.763	0.889	0.720	0.749
0.564	0.595	0.825	0.872	0.821	0.700	0.823	0.000	0.864	0.775	0.880	0.654	0.785
0.447	0.495	0.835	0.576	0.606	0.641	0.765	0.864	0.000	0.823	0.915	0.721	0.826
0.482	0.519	0.799	0.830	0.718	0.542	0.763	0.775	0.823	0.000	0.732	0.555	0.644
0.543	0.600	0.862	0.891	0.876	0.638	0.889	0.880	0.915	0.732	0.000	0.772	0.776
0.480	0.495	0.737	0.776	0.665	0.543	0.720	0.654	0.721	0.555	0.772	0.000	0.690
0.487	0.559	0.823	0.794	0.719	0.591	0.749	0.785	0.826	0.644	0.776	0.690	0.000

TABLE-US-00004 TABLE A1.4 Two-dimensional configuration matrix resulting from application of classical, metric multidimensional scaling on the distance matrix in Table A1.3.
Multidimensional Scaling (MDS)
Concept	X	Y
"Barack Obama"	-0.037	0.067
"John McCain"	-0.038	-0.060
"Sarah Palin"	-0.044	-0.410
"Joe Biden"	-0.370	0.085
"Iraq"	-0.208	0.130
"economy"	0.106	0.141
"values"	-0.146	-0.194
"environment"	0.184	-0.281
"foreign policy"	-0.377	0.169
"taxes"	0.260	0.084
"big business"	0.363	0.219
"energy"	0.134	-0.055
"health care"	0.174	0.105

[0164] Centric MDS: Example for "Barack Obama" as Focal Concept

TABLE-US-00005 TABLE A1.5 mdsCoords: ordinate values of all concepts but the focal concept, resulting from application of classical, metric MDS on the distance matrix from Table A1.3 in which row 1 and column 1 are first removed (focal concept).
Concept	X
"John McCain"	-0.0441802
"Sarah Palin"	-0.0541157
"Joe Biden"	-0.3703048
"Iraq"	-0.2104367
"economy"	0.1030416
"values"	-0.1489996
"environment"	0.1801164
"foreign policy"	-0.3781218
"taxes"	0.2567263
"big business"	0.3651794
"energy"	0.1282719
"health care"	0.1728232

TABLE-US-00006
dMax = 0.74330
scale = 0.14196
posOnCirc = 2.35236 2.28238 0.05507 1.18121 3.38943 1.61399 3.93236 0.00000 4.47202 5.23599 3.56716 3.88099
angles = -1.30517 -1.23518 0.99213 -0.13402 -2.34223 -0.56679 -2.88516 1.04720 -3.42482 -4.18879 -2.51996 -2.83379
centricCoordinates =
Concept	X	Y
"Barack Obama"	0.00000	0.00000
"John McCain"	0.26252	-0.96493
"Sarah Palin"	0.32935	-0.94421
"Joe Biden"	0.54691	0.83719
"Iraq"	0.99103	-0.13361
"economy"	-0.69716	-0.71691
"values"	0.84363	-0.53693
"environment"	-0.96730	-0.25363
"foreign policy"	0.50000	0.86603
"taxes"	-0.96016	0.27946
"big business"	-0.50000	0.86603
"energy"	-0.81293	-0.58236
"health care"	-0.95300	-0.30297
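The intermediate values listed above are numerically consistent with the following reading, which is an inference from the numbers and not a statement of the disclosed algorithm: the one-dimensional MDS values are rescaled linearly onto an arc of 5π/3 radians, the angle of each concept is π/3 minus its position on that arc, and the concept is placed on the unit circle at that angle. The function name `place_on_arc` and its parameters are hypothetical.

```python
import math


def place_on_arc(mds_1d, arc=5 * math.pi / 3, start=math.pi / 3):
    """Map one-dimensional MDS values onto unit-circle positions around a focal concept.

    Assumes at least two distinct input values, so that d_max is nonzero.
    """
    lo = min(mds_1d.values())
    d_max = max(mds_1d.values()) - lo
    placed = {}
    for label, x in mds_1d.items():
        pos_on_circ = (x - lo) / d_max * arc  # 0 for the minimum, arc for the maximum
        angle = start - pos_on_circ
        placed[label] = (math.cos(angle), math.sin(angle))
    return placed


# Subset of Table A1.5 that includes the smallest and largest value.
mds_coords = {
    "John McCain": -0.0441802,
    "Sarah Palin": -0.0541157,
    "Joe Biden": -0.3703048,
    "foreign policy": -0.3781218,
    "big business": 0.3651794,
}
centric = place_on_arc(mds_coords)
centric["Barack Obama"] = (0.0, 0.0)  # the focal concept sits at the origin
```

With these inputs the sketch reproduces the listed unit-circle coordinates, for example approximately (0.26252, -0.96493) for "John McCain". The subsequent exponential multiplier is not reproduced here because its form is not given in this section.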

[0165] After application of the exponential multiplier to the coordinates (to pull non-centric concepts toward the center), this becomes:

TABLE-US-00007 TABLE A1.5 Two-dimensional configuration matrix resulting from centric MDS. "Barack Obama" is the focal concept.
Concept	X	Y
"Barack Obama"	0.00000	0.00000
"John McCain"	0.03764	0.13834
"Sarah Palin"	0.18927	-0.54260
"Joe Biden"	0.24236	0.37100
"Iraq"	0.54186	-0.07306
"economy"	-0.27149	-0.27918
"values"	0.51715	-0.32914
"environment"	-0.82167	-0.21544
"foreign policy"	0.33844	0.58620
"taxes"	-0.64398	0.18743
"big business"	-0.43803	0.75870
"energy"	-0.45166	-0.32356
"health care"	-0.63796	-0.20281

[0166] Stability of ABMs Over Time

[0167] In case of MDS, the temporal mapping is done by the "Procrustes procedure".

[0168] For example, Table A1.6 contains the coordinates for a subsequent time frame, which are to be mapped on the coordinates of Table A1.4 (previous time frame).

TABLE-US-00008 TABLE A1.6 coordinates_t2: ABM coordinates of a later time frame.
Concept	X	Y
"Barack Obama"	-0.040000	0.070000
"John McCain"	-0.040000	-0.060000
"Sarah Palin"	-0.040000	-0.410000
"Joe Biden"	-0.370000	0.080000
"Iraq"	-0.210000	0.130000
"economy"	0.110000	0.140000
"values"	-0.150000	-0.190000
"environment"	0.180000	-0.280000
"foreign policy"	-0.380000	0.170000
"taxes"	0.260000	0.080000
"big business"	0.360000	0.220000
"energy"	0.130000	-0.050000
"health care"	0.170000	0.110000

TABLE-US-00009 TABLE A1.7 optimal_coordinates_t2: ABM coordinates of the later time frame (cf. Table A1.6) "mapped" onto the previous time frame (Table A1.4) by the Procrustes procedure. (Allowed transformations for an MDS ABM: rotations, reflections, and dilations)
Concept	X	Y
"Barack Obama"	-0.036903	0.067850
"John McCain"	-0.036860	-0.048182
"Sarah Palin"	-0.039172	-0.362750
"Joe Biden"	-0.370051	0.092506
"Iraq"	-0.210448	0.109146
"economy"	0.105558	0.132463
"values"	-0.134775	-0.185817
"environment"	0.185918	-0.313082
"foreign policy"	-0.384220	0.153858
"taxes"	0.256170	0.070392
"big business"	0.362573	0.273206
"energy"	0.131387	-0.079863
"health care"	0.170822	0.090273

[0169] If the set of concepts that are present in timeframe t1 is not exactly the same as in timeframe t2, then the procrustes procedure only considers the concepts that are present in both timeframes (intersection). (For example, concepts might have zero buzz in one of the timeframes, or new concepts could be added to the brand map)
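A two-dimensional Procrustes-style mapping restricted to the shared concepts can be sketched as follows. This is a sketch under stated assumptions, not the implementation behind Table A1.7: it fits a rotation and a dilation about the origin in closed form, tries the fit with and without a mirror of the Y coordinate to cover reflections, and keeps whichever has the smaller squared error. All function names and the sample coordinates are hypothetical, and the shared concepts are assumed not to all lie at the origin.

```python
def _fit(src, dst):
    """Least-squares rotation plus dilation (no translation) taking src onto dst."""
    a = sum(sx * dx + sy * dy for (sx, sy), (dx, dy) in zip(src, dst))
    b = sum(sx * dy - sy * dx for (sx, sy), (dx, dy) in zip(src, dst))
    norm = sum(sx * sx + sy * sy for sx, sy in src)
    return a / norm, b / norm  # scale*cos(theta), scale*sin(theta)


def _apply(fit, point):
    re, im = fit
    x, y = point
    return (re * x - im * y, im * x + re * y)


def _residual(fit, src, dst):
    return sum((px - dx) ** 2 + (py - dy) ** 2
               for (px, py), (dx, dy) in zip((_apply(fit, p) for p in src), dst))


def map_onto_previous(previous, current):
    """Transform all of `current`, fitting only on concepts present in both frames."""
    shared = sorted(previous.keys() & current.keys())
    if not shared:
        raise ValueError("the two time frames share no concepts")
    dst = [previous[k] for k in shared]
    best = None
    for mirror in (1.0, -1.0):  # -1.0 flips Y first, which allows reflections
        src = [(current[k][0], mirror * current[k][1]) for k in shared]
        fit = _fit(src, dst)
        err = _residual(fit, src, dst)
        if best is None or err < best[0]:
            best = (err, fit, mirror)
    _, fit, mirror = best
    return {k: _apply(fit, (x, mirror * y)) for k, (x, y) in current.items()}


# Hypothetical frames: t2 is t1 mirrored, rotated, and doubled, plus a new concept "D".
t1 = {"A": (1.0, 0.0), "B": (0.0, 2.0), "C": (-1.0, -1.0)}
t2 = {"A": (0.0, 2.0), "B": (4.0, 0.0), "C": (-2.0, -2.0), "D": (2.0, 2.0)}
mapped = map_onto_previous(t1, t2)
```

Concept "D" does not influence the fit because it is absent from `t1`, but it is still carried through the fitted transformation so that it can be drawn on the aligned chart.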

[0170] Principal Component Analysis (PCA)

TABLE-US-00010 TABLE A1.8 Contingency table `contTable` (=sub-part of co-reference matrix in Table A1.2) with column sums, row sums and total sum indicated.
Correspondence Analysis (CA)
	Iraq	economy	values	environment	foreign policy	taxes	big business	energy	health care	Row sums (rowSum)
Barack Obama	39356	61478	31901	10637	25439	26038	8625	38311	26162	267947
John McCain	36049	54944	28069	9748	22680	23575	7479	35807	21897	240248
Sarah Palin	9942	16290	12487	3780	6095	7982	2372	13904	7093	79945
Joe Biden	16230	19893	12412	2729	15401	6620	1849	11534	8098	94766
Column sums (colSum)	101577	152605	84869	26894	69615	64215	20325	99556	63250	totSum = 682906

[0171] Octave code to compute CA according to [2] ("Geometric Data Analysis", Le Roux and Rouanet):

TABLE-US-00011
nEntities = m;
nNodes = k;
validBuzzMatrixRowsCols = [1:nNodes];
rowSum = sum(contTable, 2);
colSum = sum(contTable, 1);
totSum = sum(rowSum);
Dr = diag(rowSum);
Dc = diag(colSum);
E = rowSum*colSum/totSum; % matrix of expected values under the independence model
DrPow_05 = Dr^(-0.5);
DcPow_05 = Dc^(-0.5);
DrPow_pos05 = Dr^(0.5);
DcPow_pos05 = Dc^(0.5);
M = DrPow_05 * contTable * DcPow_05;
M0 = M - 1/totSum*( DrPow_pos05 * ones(size(contTable,1),1) * ones(1,size(contTable,2)) * DcPow_pos05);
[u, s, v] = svd(M0);
R = sqrt(totSum) * DrPow_05 * u * s;
if size(v,2) ~= size(s,1)
  sss = s';
else
  sss = s;
end
C = sqrt(totSum) * DcPow_05 * v * sss;
coords2D = zeros(nNodes, 2);
coords2D( [intersect(validBuzzMatrixRowsCols,[1:nEntities])], 1:2) = R(:,1:2);
coords2D( [intersect(validBuzzMatrixRowsCols,[nEntities+1:nNodes])], 1:2) = C(:,1:2);

TABLE-US-00012 TABLE A1.9 Two-dimensional configuration matrix resulting from CA.
Matrix of expected values under the independence model (E) =
3.9855e+04	5.9877e+04	3.3299e+04	1.0552e+04	2.7314e+04	2.5196e+04	7.9748e+03	3.9062e+04	2.4817e+04
3.5735e+04	5.3687e+04	2.9857e+04	9.4614e+03	2.4491e+04	2.2591e+04	7.1504e+03	3.5024e+04	2.2252e+04
1.1891e+04	1.7865e+04	9.9353e+03	3.1484e+03	8.1495e+03	7.5174e+03	2.3794e+03	1.1655e+04	7.4044e+03
1.4096e+04	2.1177e+04	1.1777e+04	3.7320e+03	9.6604e+03	8.9110e+03	2.8205e+03	1.3815e+04	8.7771e+03
M =
0.238555	0.304026	0.211546	0.125305	0.186262	0.198502	0.116874	0.234566	0.200963
0.230763	0.286950	0.196572	0.121271	0.175373	0.189803	0.107028	0.231528	0.177633
0.110327	0.147483	0.151596	0.081521	0.081701	0.111403	0.058844	0.155851	0.099748
0.165422	0.165421	0.138402	0.054057	0.189614	0.084862	0.042130	0.118746	0.104597
. . .
coords2D =
Concept	X	Y
"Barack Obama"	-0.0243545	0.0271031
"John McCain"	-0.0272003	0.0229095
"Sarah Palin"	-0.1183412	-0.1180778
"Joe Biden"	0.2376518	-0.0351015
"Iraq"	0.0731092	0.0307280
"economy"	-0.0125957	0.0416491
"values"	-0.0080728	-0.0993985
"environment"	-0.1202766	-0.0237781
"foreign policy"	0.2449037	-0.0154220
"taxes"	-0.1008656	0.0231540
"big business"	-0.1255398	0.0620026
"energy"	-0.0816193	-0.0395712
"health care"	-0.0233801	0.0294757
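The margins of Table A1.8 and the first entries of the matrices E and M in Table A1.9 can be checked with a few lines of plain Python. This is an arithmetic check only, not a port of the full correspondence analysis; the variable names are chosen for the example.

```python
import math

# Contingency table from Table A1.8 (rows: entities, columns: topics).
cont_table = [
    [39356, 61478, 31901, 10637, 25439, 26038, 8625, 38311, 26162],
    [36049, 54944, 28069, 9748, 22680, 23575, 7479, 35807, 21897],
    [9942, 16290, 12487, 3780, 6095, 7982, 2372, 13904, 7093],
    [16230, 19893, 12412, 2729, 15401, 6620, 1849, 11534, 8098],
]

row_sum = [sum(row) for row in cont_table]
col_sum = [sum(col) for col in zip(*cont_table)]
tot_sum = sum(row_sum)

# Expected counts under independence: E[i][j] = rowSum[i] * colSum[j] / totSum.
expected = [[r * c / tot_sum for c in col_sum] for r in row_sum]

# M[i][j] = contTable[i][j] / sqrt(rowSum[i] * colSum[j]).
m = [[cont_table[i][j] / math.sqrt(row_sum[i] * col_sum[j])
      for j in range(len(col_sum))] for i in range(len(row_sum))]
```

Here `expected[0][0]` comes out at roughly 3.9855e+04 and `m[0][0]` at roughly 0.238555, in line with the first entries printed in Table A1.9.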

[0172] Addendum 2: Reference Material

[0173] The following reference material is hereby incorporated by reference:

[0174] Lee G. Cooper, "A Review of Multidimensional Scaling in Marketing Research,"

[0175] Applied Psychological Measurement, Vol. 7, No. 4, 427-450 (1983)

[0176] http://apm.sagepub.com/cgi/content/abstract/7/4/427

[0177] C. L. Bentley, M. O. Ward, "Animating multidimensional scaling to visualize N-dimensional data sets," infovis, pp. 72, 1996 IEEE Symposium on Information Visualization (Info Vis '96), 1996

[0178] http://www2.computer.org/portal/web/csdl/doi/10.1109/INFVIS.1996.559223

[0179] [1] Modern Multidimensional Scaling. Theory and Applications.

[0180] Series: Springer Series in Statistics

[0181] Borg, Ingwer, Groenen, Patrick J. F.

[0182] Originally published in the series: Springer Series in Statistics

[0183] 2nd ed., 2005, XXII, 614 p. 176 illus., Hardcover

[0184] ISBN: 978-0-387-25150-9

[0185] [2] Geometric Data Analysis

[0186] From Correspondence Analysis to Structured Data Analysis

[0187] Le Roux, Brigitte, Rouanet, Henry

[0188] 2005, XI, 475 p., Hardcover

[0189] ISBN: 978-1-4020-2235-7

[0190] [3] Applied Multivariate Techniques

[0191] Subhash Sharma

[0192] 1995, 493 p., Hardcover

[0193] John Wiley & Sons Inc

[0194] ISBN-10: 0471310646

[0195] ISBN-13: 9780471310648

TABLE-US-00013 Addendum 3: XML Schema Definition for transferring BM data
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:annotation>
    <xs:appinfo>Attentio Note</xs:appinfo>
    <xs:documentation xml:lang="en"> This Schema defines a series of plots. </xs:documentation>
  </xs:annotation>
  <xs:simpleType name="nodeLabelType">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="collapse"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="nodeKindType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Entity"/>
      <xs:enumeration value="Topic"/>
      <xs:enumeration value="unspecified"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="buzzSizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="-1"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="normalizedBuzzSizeType">
    <xs:restriction base="xs:float">
      <xs:minInclusive value="0.0"/>
      <xs:maxInclusive value="100.0"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="coOccNumberType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="-1"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="floatList">
    <xs:list itemType="xs:float"/>
  </xs:simpleType>
  <xs:simpleType name="buzzSizeList">
    <xs:list itemType="buzzSizeType"/>
  </xs:simpleType>
  <xs:complexType name="axisQType">
    <xs:attribute name="ax" type="xs:positiveInteger" use="required"/>
    <xs:attribute name="Q" type="xs:string" use="required"/>
  </xs:complexType>
  <xs:complexType name="nodeType">
    <xs:sequence>
      <xs:element name="co" type="floatList"/>
      <xs:element name="c" type="floatList" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="bz" type="buzzSizeType" minOccurs="0"/>
      <xs:element name="nrmBz" type="normalizedBuzzSizeType" minOccurs="0"/>
      <xs:element name="s" type="buzzSizeList" minOccurs="0"/>
      <xs:element name="query" type="xs:string" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="label" type="nodeLabelType"/>
    <xs:attribute name="ID" type="nodeLabelType" use="required"/>
    <xs:attribute name="v" type="xs:boolean" default="true"/>
  </xs:complexType>
  <xs:complexType name="coOccType">
    <xs:attribute name="u" type="nodeLabelType" use="required"/>
    <xs:attribute name="v" type="nodeLabelType" use="required"/>
    <xs:attribute name="n" type="coOccNumberType" use="required"/>
  </xs:complexType>
  <xs:attributeGroup name="plotAttrGrp">
    <xs:attribute name="type" type="xs:string"/>
    <xs:attribute name="ti" type="xs:string"/>
    <xs:attribute name="dim" type="xs:positiveInteger" default="2"/>
    <xs:attribute name="dataStartDate" type="xs:date"/>
    <xs:attribute name="dataEndDate" type="xs:date"/>
    <xs:attribute name="dateGen" type="xs:date"/>
    <xs:attribute name="timeGen" type="xs:time"/>
    <xs:attribute name="coordsComputationTime" type="xs:duration"/>
    <xs:attribute name="xLab" type="xs:string"/>
    <xs:attribute name="yLab" type="xs:string"/>
    <xs:attribute name="zLab" type="xs:string"/>
    <xs:attribute name="Q" type="xs:string"/>
  </xs:attributeGroup>
  <xs:complexType name="plotType">
    <xs:sequence>
      <xs:element name="Q" type="axisQType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="n" type="nodeType" maxOccurs="unbounded"/>
      <xs:element name="cr" type="coOccType" maxOccurs="unbounded" minOccurs="0"/>
      <xs:any minOccurs="0"/>
    </xs:sequence>
    <xs:attributeGroup ref="plotAttrGrp"/>
    <xs:anyAttribute/>
  </xs:complexType>
  <xs:complexType name="nodeIDsAndLabelsType">
    <xs:attribute name="ID" type="xs:string" use="required"/>
    <xs:attribute name="label" type="nodeLabelType" use="required"/>
    <xs:attribute name="kind" type="nodeKindType" default="unspecified"/>
  </xs:complexType>
  <xs:complexType name="allNodeIDsAndLabelsType">
    <xs:sequence>
      <xs:element name="nodeID" type="nodeIDsAndLabelsType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="plotSeriesType">
    <xs:sequence>
      <xs:element name="NodeIDsAndLabels" type="allNodeIDsAndLabelsType" maxOccurs="1"/>
      <xs:element name="Plot" type="plotType" maxOccurs="unbounded"/>
      <xs:any minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="seriesTitle" type="xs:string" default=""/>
    <xs:attribute name="projectName" type="xs:string" default=""/>
    <xs:attribute name="projectLabel" type="xs:string" default=""/>
    <xs:attribute name="projectID" type="xs:string" default=""/>
    <xs:attribute name="alg" type="xs:string" default=""/>
    <xs:attribute name="version" type="xs:positiveInteger" default="1"/>
    <xs:attribute name="projectStartDate" type="xs:string" default=""/>
    <xs:attribute name="projectEndDate" type="xs:string" default=""/>
    <xs:attribute name="projectReportFreq" type="xs:float" default="24"/>
    <xs:attribute name="srcUsrLab" type="xs:string" default=""/>
    <xs:attribute name="regionUsrLab" type="xs:string" default=""/>
    <xs:attribute name="entitiesPresLabel" type="xs:string" default="Brands"/>
    <xs:attribute name="topicsPresLabel" type="xs:string" default="Topics"/>
    <xs:anyAttribute/>
  </xs:complexType>
  <xs:element name="PlotSeries" type="plotSeriesType"/>
</xs:schema>
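A minimal sketch of producing a `PlotSeries` document shaped after the schema above, using only the Python standard library, is given below. It assumes that the `co` element carries a node's chart coordinates and `bz` its buzz, which the schema's type names suggest but do not state. The function name `build_plot_series`, the node IDs `e1` and `t1`, and the layout of the input dictionaries are hypothetical, and the output is not validated against the schema here.

```python
import xml.etree.ElementTree as ET


def build_plot_series(labels, frames, alg="MDS"):
    """Build a PlotSeries element with one Plot per time frame."""
    root = ET.Element("PlotSeries", {"alg": alg, "version": "1"})
    ids = ET.SubElement(root, "NodeIDsAndLabels")
    for node_id, (label, kind) in labels.items():
        ET.SubElement(ids, "nodeID", {"ID": node_id, "label": label, "kind": kind})
    for frame in frames:
        plot = ET.SubElement(root, "Plot", {
            "dataStartDate": frame["start"],
            "dataEndDate": frame["end"],
        })
        for node_id, (coords, buzz) in frame["nodes"].items():
            node = ET.SubElement(plot, "n", {"ID": node_id})
            # Assumed meaning: "co" = coordinates (float list), "bz" = buzz.
            ET.SubElement(node, "co").text = " ".join(f"{c:.3f}" for c in coords)
            ET.SubElement(node, "bz").text = str(buzz)
    return root


labels = {"e1": ("Barack Obama", "Entity"), "t1": ("Iraq", "Topic")}
frames = [{
    "start": "2008-08-20",
    "end": "2008-09-16",
    "nodes": {"e1": ((-0.037, 0.067), 195561), "t1": ((-0.208, 0.130), 43277)},
}]
document = ET.tostring(build_plot_series(labels, frames), encoding="unicode")
```

The coordinates and buzz values in the sample frame are taken from Tables A1.4 and A1.1; a production exporter would add further `Plot` elements, one per time frame, in chronological order.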
