Method for identification, categorization and search of graphical, auditory and computational pattern ensembles Gnanasambandam; Shanmuga-nathan ; et al. [Xerox Corporation]



Patent Application Summary

U.S. patent application number 13/109343 for a method for identification, categorization and search of graphical, auditory and computational pattern ensembles was filed with the patent office on May 17, 2011 and published on November 17, 2011. This patent application is currently assigned to Xerox Corporation. The invention is credited to Shanmuga-nathan Gnanasambandam and Jonathan David Levine.

Publication Number	20110279458
Application Number	13/109343
Family ID	44911388
Filed Date	2011-05-17

United States Patent Application	20110279458
Kind Code	A1
Gnanasambandam; Shanmuga-nathan ; et al.	November 17, 2011


Abstract

This disclosure provides a system and method to use context graphs for targeting communications to a user of an image rendering device. According to one exemplary embodiment, the method provides targeted communications generated as a function of one or more attributes associated with a user requested printed document. The attributes are provided by accessing a context graph including a plurality of links between a plurality of entities.

Inventors:	Gnanasambandam; Shanmuga-nathan; (Victor, NY) ; Levine; Jonathan David; (Rochester, NY)
Assignee:	Xerox Corporation Norwalk CT
Family ID:	44911388
Appl. No.:	13/109343
Filed:	May 17, 2011

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
61345377	May 17, 2010
61345301	May 17, 2010
61345340	May 17, 2010
61345289	May 17, 2010

Current U.S. Class:	345/440 ; 358/1.15
Current CPC Class:	G06Q 30/08 20130101; G06Q 30/0238 20130101; G06Q 30/0251 20130101
Class at Publication:	345/440 ; 358/1.15
International Class:	G06T 11/20 20060101 G06T011/20; G06K 15/02 20060101 G06K015/02

Claims

1. A method of providing user requested printed material and one or more targeted communications to a user of a printing system, the method comprising: a) the printing system acquiring material to be printed on a printing device to produce the user requested printed material, the printing device associated with the printing system; b) the printing system acquiring the one or more targeted communications associated with the user requested printed material; and, c) the printing system printing the user requested printed material utilizing the printing device and providing the acquired one or more targeted communications to the user, wherein the one or more targeted communications are generated as a function of one or more attributes associated with one or more of the user requested printed material, the printing device and the user, the one or more attributes associated with pattern ensembles included within a context graph, the context graph including a plurality of links between a plurality of entities related to one or more of the user requested printed material, the printing device and the user.

2. The method according to claim 1, wherein the context graph is an n-partite scale-free graph.

3. The method according to claim 2, wherein the context graph includes a historical representation of the plurality of links between the plurality of entities.

4. The method according to claim 3, wherein the plurality of entities are associated with one or more of events, people, devices, locations, date, time, keywords, categories and preferences.

5. The method according to claim 3, wherein the context graph includes weights associated with the plurality of links, the weights related to a frequency of links between a pair of entities.

6. The method according to claim 3, wherein the context graph is dynamically updated.

7. The method according to claim 3, wherein the one or more attributes are provided by one or more sub-graphs extracted from the context graph.

8. The method according to claim 7, wherein the one or more sub-graphs extracted from the context graph are associated with a predefined structure.

9. The method according to claim 7, wherein the one or more attributes are provided by a dominant sub-graph associated with the one or more sub-graphs extracted from the context graph.

10. The method according to claim 3, wherein the one or more attributes are provided by two or more sub-graphs connected by one or more hubs associated with the context graph.

11. The method according to claim 1, wherein a verification action is performed by the user, the verification action indicating the user has read at least one targeted communication.

12. A printing system for providing user requested printed material and one or more targeted communications to a user of the printing system comprising: a printing device; and, one or more servers operatively connected to the printing device, wherein one or more of the printing device and the one or more servers are configured to execute a method comprising: a) the printing system acquiring material to be printed on the printing device to produce the user requested printed material; b) the printing system acquiring the one or more targeted communications associated with the user requested printed material; and, c) the printing device printing the user requested printed material and the printing device providing the acquired one or more targeted communications to the user, wherein the one or more targeted communications are generated as a function of one or more attributes associated with one or more of the user requested printed material, the printing device and the user, the attributes associated with pattern ensembles included within a context graph, the context graph including a plurality of links between a plurality of entities related to one or more of the user requested printed material, the printing device and the user.

13. The printing system according to claim 12, wherein the context graph is an n-partite scale-free graph.

14. The printing system according to claim 13, wherein the context graph includes a historical representation of the plurality of links between the plurality of entities.

15. The printing system according to claim 14, wherein the plurality of entities are associated with one or more of events, people, devices, locations, date, time, keywords, categories and preferences.

16. The printing system according to claim 14, wherein the context graph includes weights associated with the plurality of links, the weights related to a frequency of links between a pair of entities.

17. The printing system according to claim 14, wherein the context graph is dynamically updated.

18. The printing system according to claim 14, wherein the one or more attributes are provided by one or more sub-graphs extracted from the context graph.

19. The printing system according to claim 18, wherein the one or more sub-graphs extracted from the context graph are associated with a predefined structure.

20. The printing system according to claim 18, wherein the one or more attributes are provided by a dominant sub-graph associated with the one or more sub-graphs extracted from the context graph.

21. The printing system according to claim 14, wherein the one or more attributes are provided by two or more sub-graphs connected by one or more hubs associated with the context graph.

22. The printing system according to claim 12, wherein a verification action is performed by the user, the verification action indicating the user has read at least one targeted communication.

23. A method of providing a targeted communication to a first image rendering device operatively connected to one or more of a second image rendering device and a server, the method comprising: a) the first image rendering device acquiring the targeted communication from one of the second image rendering device and the server; and b) the first image rendering device rendering the targeted communication, wherein the targeted communication is generated as a function of one or more attributes associated with one or more of the first image rendering device and a user associated with the first image rendering device, the one or more attributes associated with pattern ensembles included within a context graph, the context graph including a plurality of links between a plurality of entities related to one or more of the first image rendering device, the second image rendering device, and the user associated with the first image rendering device.

24. The method according to claim 23, wherein the context graph is an n-partite scale-free graph.

25. The method according to claim 23, wherein the context graph includes weights associated with the plurality of links, the weights related to a frequency of links between a pair of entities.

Description

[0001] This application claims the benefit of priority to U.S. Provisional Application No. 61/345,377, filed May 17, 2010, entitled "METHOD FOR IDENTIFICATION, CATEGORIZATION AND SEARCH OF GRAPHICAL, AUDITORY AND COMPUTATIONAL PATTERN ENSEMBLES," by Gnanasambandam et al., U.S. Provisional Application No. 61/345,340, filed May 17, 2010, entitled "OPTIMAL AUCTION MECHANISM FOR MULTI-LEVEL DEVICE CLICK-THROUGH (DCT) IN TARGETED PRINT COMMUNICATION," by Lee et al., U.S. Provisional Application No. 61/345,301, filed May 17, 2010, entitled "SYSTEM AND METHOD TO PRODUCE AND CONTROL SUBSIDIZATION OF TARGETED MATERIALS AT POINT OF SALE," by Gnanasambandam et al., and U.S. Provisional Application No. 61/345,289, filed May 17, 2010, entitled "SYSTEM AND METHODS TO USE N-PARTITE SCALE-FREE GRAPHS FOR INTERPRETING CONTEXTUAL INFORMATION AND TARGETING," by Gnanasambandam et al., all of which are hereby incorporated by reference in their entirety.

CROSS REFERENCE TO RELATED PATENTS AND APPLICATIONS

[0002] U.S. patent application Ser. No. 13/100,636, filed May 4, 2011, entitled "Method of Providing Targeted Communications to a User of a Printing System," by Lee et al.

[0003] U.S. patent application Ser. No. 13/104,136, filed May 10, 2011, entitled "System and Method to Produce and Control Subsidization of Targeted Materials at Point of Sale," by Gnanasambandam et al.

[0004] U.S. patent application Ser. No. ______, filed ______, entitled "System and Methods to Use Context Graphs for Targeting Communications," by Gnanasambandam et al., each of which is hereby incorporated by reference in its entirety.

BACKGROUND

[0005] This disclosure relates generally to a system and method for generating targeted advertisements, and more particularly to a system and method for generating targeted advertisements delivered via a printer assembly including at least one printer device. It is currently known for advertisers to use publicly available information for targeting potential and publicly available clients. Additionally, advertisers may use the publicly available information for advertising merchandise that is likely to be of interest to a particular person.

INCORPORATION BY REFERENCE

[0006] The following Patent Applications, Patent Application Publications and Non-patent references are incorporated herein by reference in their entirety:

[0007] Gnanasambandam et al., U.S. patent application Ser. No. 13/104,136, filed May 10, 2011, entitled "System and Method to Produce and Control Subsidization of Targeted Materials at Point of Sale";

[0008] Gnanasambandam et al., U.S. patent application Ser. No. ______; filed ______, entitled "System and Methods to Use Context Graphs for Targeting Communications";

[0009] Lee et al., U.S. patent application Ser. No. 13/100,636; filed May 4, 2011, entitled "Method of Providing Targeted Communications to a User of a Printing System";

[0010] Gnanasambandam et al., U.S. patent application Ser. No. 12/780,543, filed May 14, 2010, entitled "System and Method to Prearrange Hyper-Local Value-Added Marketing Campaigns and Communication Along Consumer Trajectories";

[0011] Gnanasambandam, U.S. patent application Ser. No. 12/780,267, filed May 14, 2010, entitled "In-Situ Mobile Application Suggestions and Multi-Application Updates Through Context Specific Analytics";

[0012] Gnanasambandam et al., U.S. Patent Application Publication No. US 2010/0088178 A1, published Apr. 8, 2010, entitled "System and Method for Generating and Verifying Targeted Advertisements Delivered Via a Printer Device";

[0013] Gross, U.S. Patent Application Publication No. US 2010/0005486 A1, published Jan. 7, 2010, entitled "Apparatus and Method for Embedding Commercials";

[0014] Evevsky, U.S. Patent Application Publication No. US 2009/0313060 A1, published Dec. 17, 2009, entitled "System and Method for Personalized Printing and Facilitated Delivery of Personalized Campaign Items";

[0015] Chow et al., U.S. Patent Application Publication No. US 2009/0157650 A1, published Jun. 18, 2009, entitled "Outbound Content Filtering Via Automated Inference Detection";

[0016] Chow et al., U.S. Patent Application Publication No. US 2009/0150365 A1, published Jun. 11, 2009, entitled "Inbound Content Filtering Via Automated Inference Detection";

[0017] Gnanasambandam, U.S. patent application Ser. No. 12/761,985, filed Apr. 16, 2010, entitled "System and Method for Providing Feedback for Targeted Communications";

[0018] Liu, U.S. Patent Application Publication No. US 2011/0096354 A1, published Apr. 28, 2011, entitled "System and Method for Handling Print Requests from a Mobile Device";

[0019] Liu et al., U.S. Patent Application Publication No. US 2011/0040823 A1, published Feb. 17, 2011, entitled "System and Method for Communicating with a Network of Printers Using a Mobile Device";

[0020] Harrington, U.S. Patent Application Publication No. US 2011/0029952 A1, published Feb. 3, 2011, entitled "Method and System for Constructing a Document Redundancy Graph";

[0021] Gnanasambandam et al., U.S. Patent Application Publication No. US 2010/0325422 A1, published Dec. 23, 2010, entitled "System and Method for Policy-driven File Segmentation and Intercloud File Storage and Retrieval";

[0022] Partridge et al., U.S. Patent Application Publication No. US 2010/0309503 A1, published Dec. 9, 2010, entitled "Method and System for Printing Documents from a Portable Device";

[0023] Gnanasambandam et al., U.S. Patent Application Publication No. US 2010/0268591 A1, published Oct. 21, 2010, entitled "System and Method for Selectively Controlling the Use of Functionality in One or More Multifunction Devices and Subsidizing Their Use Through Advertisements";

[0024] Gnanasambandam et al., U.S. Patent Application Publication No. US 2010/0264214 A1, published Oct. 21, 2010, entitled "Method and System for Providing Contract-free `pay-as-you-go` Options for Utilization of Multi-function Devices";

[0025] St. Jacques, Jr. et al., U.S. Patent Application Publication No. US 2010/0149572 A1, published Jun. 17, 2010, entitled "Method and System for Automatically Providing for Multi-point Document Storing, Access, and Retrieval";

[0026] Edelman, B. and Schwarz, M., "Optimal Auction Design in a Multi-Unit Environment: The Case of Sponsored Search Auctions," Unpublished manuscript, Harvard Business School, 2006;

[0027] Edelman, B., Ostrovsky, M., and Schwarz, M., "Internet Advertising and the Generalized Second-Price Auction: Selling Billions of Dollars Worth of Keywords," American Economic Review, American Economic Association, Vol. 97(1), pages 242-259, March 2007;

[0028] Google AdWords Select, 2002 https://www.google.co/accounts/ServiceLogin?service=adwords&ht;

[0029] Myerson, R. B., "Optimal Auction Design," Mathematics of Operations Research, 6(1), pages 58-73, 1981;

[0030] Shazam Entertainment Ltd., 2002, http://www.shazam.com/music/web/pagesibackground.html;

[0031] Talluri, K. and van Ryzin, G., "The Theory and Practice of Revenue Management," Publisher Springer, 1st Edition, Feb. 23, 2005;

[0032] Ulku, L., "Optimal Combinatorial Mechanism Design," Unpublished manuscript, Rutgers University, 2006;

[0033] Walsh, T., "Search on High Degree Graphs," In Proceedings of IJCAI-2001, pages 266-271, 2001.

BRIEF DESCRIPTION

[0034] In one embodiment of this disclosure, described is a method of providing user requested printed material and one or more targeted communications to a user of a printing system, the method comprising: (a) the printing system acquiring material to be printed on a printing device to produce the user requested printed material, the printing device associated with the printing system; (b) the printing system acquiring the one or more targeted communications associated with the user requested printed material; and, (c) the printing system printing the user requested printed material utilizing the printing device and providing the acquired one or more targeted communications to the user, wherein the one or more targeted communications are generated as a function of one or more attributes associated with one or more of the user requested printed material, the printing device and the user, the one or more attributes associated with pattern ensembles included within a context graph, the context graph including a plurality of links between a plurality of entities related to one or more of the user requested printed material, the printing device and the user.

[0035] In another embodiment of this disclosure, described is a printing system for providing user requested printed material and one or more targeted communications to a user of the printing system comprising: a printing device; and, one or more servers operatively connected to the printing device, wherein one or more of the printing device and the one or more servers are configured to execute a method comprising: (a) the printing system acquiring material to be printed on the printing device to produce the user requested printed material; (b) the printing system acquiring the one or more targeted communications associated with the user requested printed material; and, (c) the printing device printing the user requested printed material and the printing device providing the acquired one or more targeted communications to the user, wherein the one or more targeted communications are generated as a function of one or more attributes associated with one or more of the user requested printed material, the printing device and the user, the attributes associated with pattern ensembles included within a context graph, the context graph including a plurality of links between a plurality of entities related to one or more of the user requested printed material, the printing device and the user.

[0036] In still another embodiment of this disclosure, described is a method of providing a targeted communication to a first image rendering device operatively connected to one or more of a second image rendering device and a server, the method comprising: (a) the first image rendering device acquiring the targeted communication from one of the second image rendering device and the server; and (b) the first image rendering device rendering the targeted communication, wherein the targeted communication is generated as a function of one or more attributes associated with one or more of the first image rendering device and a user associated with the first image rendering device, the one or more attributes associated with pattern ensembles included within a context graph, the context graph including a plurality of links between a plurality of entities related to one or more of the first image rendering device, the second image rendering device, and the user associated with the first image rendering device.

BRIEF DESCRIPTION OF THE DRAWINGS

[0037] FIGS. 1 and 2 are tripartite context graphs according to an exemplary embodiment of this disclosure.

[0038] FIG. 3 is a 5-partite context graph according to an exemplary embodiment of this disclosure.

[0039] FIG. 4 illustrates keyword interconnections associated with k1, k2, k3, k4, k5, k6, k7, k8, k9 and k10 as shown in the context graph of FIG. 3.

[0040] FIG. 5 is an exemplary embodiment of a context graph showing interactions among entities, for example, but not limited to, keywords and devices.

[0041] FIG. 6 shows a power-law in degree distribution of a context graph.

[0042] FIG. 7 shows an un-weighted context graph A including a scaling exponent of 2.58.

[0043] FIG. 8 shows a weighted context graph A including a scaling exponent of 1.61.

[0044] FIG. 9 shows an un-weighted context graph B including a scaling exponent of 2.6.

[0045] FIG. 10 shows a weighted context graph B including a scaling exponent of 2.29.

[0046] FIG. 11 shows a sub-graph including a sequence of edges.

[0047] FIG. 12 shows a sub-graph including a loop.

[0048] FIG. 13 shows a sub-graph including a single-hop hub where k3 is the origin.

[0049] FIG. 14 shows a sub-graph including a two-hop hub where k3 is the origin.

[0050] FIG. 15 shows a sub-graph including a 5-partite graph where k3 is the origin.

[0051] FIG. 16 schematically illustrates a system for context-graph-based targeting using an MFD according to an exemplary embodiment of this disclosure.

[0052] FIG. 17 shows a targeted communication produced according to an exemplary embodiment of this disclosure, the targeted communication including text ads with graphics from two of the advertisers.

[0053] FIG. 18 shows another targeted communication produced according to an exemplary embodiment of this disclosure, the targeted communication including categorized coupons.

[0054] FIG. 19 schematically illustrates Transactional Print Workflow according to an exemplary embodiment of this disclosure.

[0055] FIG. 20 illustrates a first example of a graphical pattern ensemble according to an exemplary embodiment of this disclosure.

[0056] FIG. 21 illustrates a second example of a graphical pattern ensemble according to an exemplary embodiment of this disclosure.

[0057] FIG. 22 illustrates a third example of a graphical pattern ensemble according to an exemplary embodiment of this disclosure.

[0058] FIG. 23 illustrates a general layered representation according to an exemplary embodiment of this disclosure.

[0059] FIG. 24 illustrates an example of a two-layer representation (only details of layer 1 shown) according to an exemplary embodiment of this disclosure.

[0060] FIG. 25 illustrates an example of a newly detected graphical pattern assembly with a pattern "similar" to those of FIGS. 20-22.

[0061] FIG. 26 illustrates an exemplary workflow including the application of pattern assemblies to a context graph.

DETAILED DESCRIPTION

[0062] This disclosure provides systems and methods of using n-partite graphs to determine contextual information that can be used to characterize a user. Such characterizations can be used to personalize and target information, such as advertisements, to the user. Examples of the nodes of the graph include keywords extracted from the titles of documents being printed, the user who prints a document and the device on which it is printed. Graph edges are entered connecting the document keywords to the user and the user to the printing device, but other edges, such as a connection between keywords that occur together in a title, can be entered as well. The disclosed systems and methods maintain a large graph that captures all the activities (e.g., document printing), and can extract sub-graphs for a user and use the information within a sub-graph to target advertisements to the user.
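By way of illustration only, the following Python sketch shows one way such a graph might be accumulated and a per-user sub-graph pulled from it. It assumes print jobs arrive as hypothetical (user, device, keywords) tuples and takes a user's sub-graph to be the edges incident to that user; the function names build_print_graph and user_subgraph are illustrative and not part of the disclosure.

```python
from collections import defaultdict


def build_print_graph(jobs):
    """Accumulate a weighted user/device/keyword graph from print jobs.

    jobs: iterable of (user, device, keywords) tuples (hypothetical format).
    Returns a dict mapping frozenset({node_a, node_b}) -> co-occurrence count,
    where each node is a (partition, value) pair.
    """
    weights = defaultdict(int)
    for user, device, keywords in jobs:
        u = ("user", user)
        # user-device edge: one increment per print job
        weights[frozenset((u, ("device", device)))] += 1
        # keyword-user edges: a keyword repeated within one job counts once
        for kw in set(keywords):
            weights[frozenset((("keyword", kw), u))] += 1
    return weights


def user_subgraph(weights, user):
    """Return the edges incident to one user, keyed by a sorted node pair."""
    u = ("user", user)
    return {tuple(sorted(edge)): w for edge, w in weights.items() if u in edge}
```

Tagging each node with its partition keeps, for example, a keyword and a user name with the same spelling from collapsing into a single node.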

[0063] Conventional targeting techniques have been largely keyword-based or spatio-temporal, without specific attention to the expressivity that an ordering or structure of events can bring to the effectiveness of targeting. Extracting an expressive sub-graph from a large n-partite graph of events (e.g., more than $10^6$ nodes and $10^9$ edges) has not been well studied.

[0064] Capturing multi-dimensional tuples of information and the interconnections between events is structurally complex, all the more so because a large number of users cause or contribute to these events, either directly or through the use of devices (mobile, print or otherwise). These interconnections can be captured coherently by graph-theoretic structures.

[0065] More information can be gained if the structure of these complex interconnections (i.e., the way contextual information becomes related among users, devices, locations, time and other attributes) is analyzed dynamically. This disclosure also provides processes to perform that analysis.

[0066] In this disclosure, a method is provided wherein graph-theoretic structures in a large graph, or a sub-graph thereof, are utilized to express certain correlated user activities and/or event correlations so that they can enhance targeted communications to the users. In other words, graphical structures such as sub-graphs (chains of related nodes, loops, paths, etc.) and statistics on top of these structures (degrees, hop distance, edge weight, edge direction, etc.) can capture certain inherent connectivity- and association-related properties in a more fine-grained fashion. These structures, or the properties that govern them, can be used to relate how users interact with their environments and/or social groups. Of particular interest are at least the following types of contextual information, and their n-partite graphical representation, when a person interacts with a device: the person; keywords, categories or topics that relate to such an interaction; the social group of the person; spatio-temporal aspects of the interaction; usage frequency; and combinations thereof. The disclosed system and method capture and store these associations in real time and represent them as n-partite graphs. Subsequently, sub-graphs are extracted dynamically according to a method described herein. The overall context graph resulting from the collection and storage of the aforementioned attributes is huge, e.g., millions of vertices and edges. Also provided are a method to characterize the context graph and methods to use the graph in the aforementioned targeted communication application.
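Hop distance, one of the statistics listed above, can be used to carve single-hop and two-hop neighborhoods around an origin node, as in FIGS. 13 and 14. The sketch below assumes a plain dict-of-lists adjacency and uses breadth-first search; the disclosure does not specify the extraction algorithm, and the names hop_subgraph and induced_edges are hypothetical.

```python
from collections import deque


def hop_subgraph(adjacency, origin, max_hops):
    """Breadth-first search from origin, keeping nodes within max_hops.

    adjacency: dict mapping each node to an iterable of neighbors.
    Returns a dict mapping each reached node to its hop distance.
    """
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in adjacency.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def induced_edges(adjacency, nodes):
    """Edges of the original graph whose endpoints both lie in nodes."""
    edges = set()
    for a in nodes:
        for b in adjacency.get(a, ()):
            if b in nodes:
                edges.add(tuple(sorted((a, b))))
    return edges
```

Breadth-first search is chosen here because it visits nodes in order of hop distance, so the hop limit doubles as a simple search-termination rule on very large graphs.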

[0067] Complex graphs can capture interactions between events and contextual information that relate to a large user population as and when the users interact with the environment and the multitude of devices contained in it, for example, but not limited to, phones, e-readers, printers, kiosks and the like. Although the interactions are huge in scale, they can exhibit some underlying structure. As described below, some examples of such interactions exhibit specific graph-theoretic properties, such as the presence of hubs and a scale-free nature. Such properties and other structural elements can be utilized for targeting users with information and marketing material of various kinds.

Examples of Targeting Context Derived from Devices and Processes

[0068] Mobile, Phone, E-reader Devices, Printing Devices, Interconnected Printing Device (or MFD) Workflows etc.: Meta-data associated with the use of a device or a multitude of devices can be used to provide targeted communications. Some devices have access to global positioning system coordinates (latitude, longitude) and cell/base station information, while others have sensor readings (accelerometer, magnetometer, etc.) and context from documents/web applications (or apps). One or more subsets of these can be utilized to automatically provide meaningful information to the users of the devices and processes.

[0069] Printed documents: When users print to devices (or use other types of devices in myriad ways), a lot of context goes unused. In the print world, this context includes keywords in the document name, location, time, user_id, device name, device capabilities used and such. Furthermore, when a group of people utilizes a group of devices, there is yet another, social, dimension to context, such as the user names of all people printing to a given device, document details and keywords contained in document names, preferred printing times of people and such.
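As an illustration of how keywords might be pulled from a document name, the following sketch tokenizes a print job name. The stop-word list, the file-extension pattern and the minimum token length are all assumptions chosen for the example, not values taken from the disclosure.

```python
import re

# Hypothetical stop words; a deployment would use a fuller, domain-tuned list.
STOP_WORDS = {"the", "a", "an", "of", "for", "and", "to", "in", "final", "draft", "copy"}

# Illustrative file extensions that commonly trail print job names.
EXTENSIONS = re.compile(r"\.(pdf|docx?|xlsx?|pptx?|txt)$", re.IGNORECASE)


def job_name_keywords(job_name):
    """Split a print job name into lower-case candidate keywords."""
    stem = EXTENSIONS.sub("", job_name)
    # Treat any run of non-alphanumeric characters (_, -, spaces) as a separator.
    tokens = re.split(r"[^A-Za-z0-9]+", stem)
    return [t.lower() for t in tokens
            if len(t) > 2 and t.lower() not in STOP_WORDS and not t.isdigit()]
```

For example, job_name_keywords("Trip to Boston 2010.pdf") yields ["trip", "boston"]: the extension, the short word "to" and the pure number are dropped.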

[0070] Transactional Prints: Credit card statements are one example of transactional documents. These contain not just spatio-temporal information such as when and where the goods and services were purchased and used, but also the relative importance of such goods to the users. This is valuable information which can be characterized as certain kinds of purchases or user behavior being adjacent to each other in time or space.

[0071] The above applications and combinations thereof are areas where targeted communications fit in well. However, these are provided only as exemplary embodiments of the present disclosure, and it is to be understood that the disclosed system and method are not limited to these applications.

[0072] Provided herein are, at least: (1) a system and method to extract, rank and unify sub-graphs from large n-partite graphs; (2) a system and method to utilize the extracted sub-graphs for targeting; (3) control of the scalability of the system by search termination strategies and graph properties; and (4) methods to search and aggregate multiple types of targeted communication by dynamically relating meta-data from the context graph with a content repository.

Multi-Partite Context Graph

[0073] Multi-partite context graphs can represent a multitude of entities (explicitly or inferentially revealed), including but not limited to events, people, devices, locations, date-time, keywords, categories, preferences, etc. The connections between entity types indicate actions performed by one or more users, as dictated by a design choice. Furthermore, the weights of the connections between entities indicate the frequency of involvement of the entities in an action.

[0074] For purposes of this disclosure, the meaning of a context graph is illustrated through an example from the document printing world, although the application of context graphs as provided herein is easily extended to other domains. In the printing world, context in the form of keywords, time-stamps, locations and various subject categories, such as automobiles, banks and investments, is generated collectively by a group of people through the act of printing or using other rendering devices. The extracted keywords may come from job names or from document content. These attributes associated with the printed document can be used to illustrate the concept of a context graph, as shown in FIG. 1.

[0075] With reference to FIG. 1, a tripartite graph is shown including a set of keywords K extracted from attributes associated with the printed document (such as the name of the document and other related information, with or without extracting information from the document itself), a set of people P, and a set of devices D. Each link in the graph shows one of two things: either a person $p_i$ used a keyword $k_j$, or a person $p_i$ used a device $d_l$. Each unique $p_i$, $k_j$ or $d_l$ appears only once in the graph. Furthermore, the edge weights denote how frequently each such usage occurs.
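A minimal Python sketch of the tripartite graph of FIG. 1 follows, under the assumption that each action is reported as a (person, device, keywords) record. Nodes are stored as (partition, value) pairs, so each unique person, keyword or device appears once, and edge weights count occurrences. The class name ContextGraph and its methods are hypothetical.

```python
from collections import Counter


class ContextGraph:
    """Weighted multi-partite context graph (illustrative sketch).

    Nodes are (partition, value) pairs, e.g. ("P", "p1") or ("K", "auto"),
    so each unique entity appears exactly once. Edge weights count how often
    a pair of entities was involved together in an action.
    """

    def __init__(self):
        self.weights = Counter()

    def add_link(self, a, b, count=1):
        # Undirected edge: frozenset makes (a, b) and (b, a) the same key.
        self.weights[frozenset((a, b))] += count

    def record_action(self, person, device, keywords):
        """Record one device use with FIG. 1 semantics (assumed here):
        person-keyword links and a single person-device link."""
        p = ("P", person)
        self.add_link(p, ("D", device))
        for kw in set(keywords):
            self.add_link(p, ("K", kw))

    def weight(self, a, b):
        # Counter returns 0 for absent edges without inserting them.
        return self.weights[frozenset((a, b))]

    def partition(self, name):
        """All nodes belonging to one partition, e.g. "K", "P" or "D"."""
        return {n for edge in self.weights for n in edge if n[0] == name}
```

Adding further partitions, such as the C and N sets of FIG. 3, would only require new partition tags and corresponding add_link calls.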

Associating Meanings with Sub-Graphs and/or the Whole Graph

[0076] With reference to FIG. 2, shown is a graph including additional semantics: if a person used two keywords in one usage of the device, such as one printout, one fax or one scan, a link is provided between the two keywords, as shown in the top layer. Note that FIG. 2 is not the transitive closure of FIG. 1. Relationships that can be derived with transitive closure are left as a matter of choice; they may be computed on the fly for relatively simple relationships, or offline for relatively lengthier chains to which more semantics may be assigned.
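The keyword-to-keyword links of FIG. 2 can be sketched as pairwise co-occurrence counts within a single usage. This assumes each usage is given as a list of keywords (a hypothetical input format); consistent with the text, no links are inferred transitively.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(usages):
    """Count keyword-keyword links, one per pair of keywords that appear
    together in a single usage (one printout, fax or scan)."""
    links = Counter()
    for keywords in usages:
        # sorted(set(...)) drops duplicates and fixes a canonical pair order
        for a, b in combinations(sorted(set(keywords)), 2):
            links[(a, b)] += 1
    return links
```

With usages [["a", "b"], ["b", "c"]], the result links a-b and b-c but not a-c; computing such derived links would be the separate, optional transitive-closure step described above.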

[0077] This concept of tripartite context graphs is easily extended to n-partite graphs that can describe more semantics in addition to the above. For example, FIG. 3 adds the sets C and N to the graph of FIG. 1, where $c_i$ may indicate the type of usage, such as printing or faxing, and $n_i$ indicates a feature corresponding to the devices. Graph-based models are chosen for their expressiveness in representing the complex webs of information that can be associated with keyword usage and context generation. In the remainder of this disclosure, the term context graph is used without differentiating between graphs that are n-partite, their transitive closures, or something in between, as shown in FIG. 2.

[0078] FIG. 4 illustrates keyword interconnections associated with k1, k2, k3, k4, k5, k6, k7, k8, k9 and k10 as shown in the context graph of FIG. 3.

Characterizing Context Graph Structure

[0079] Because the structure and properties of a context graph are fundamental to the disclosed system and method, some findings related to the graphs examined are explained first. According to one example, context generation has yielded several large graphs with heavy keyword interactions, shown in FIG. 5. Entities that users have not associated with others show up as isolated points. Note: in FIG. 5, each "o" is a discrete entity.

[0080] The example of FIG. 5, as well as others not illustrated, from the printer device world shows the presence of hubs, i.e. some entities in an action are heavily connected. Two examples are characterized here: a first with approximately 2,600 entities (approximately 3.38 million feasible edges), denoted graph A, and a second with approximately 4,200 nodes (approximately 8.82 million feasible edges), denoted graph B. The edges further have weights based on the frequency with which pairs of entities, i.e. keywords, devices or both, were selected in combination. As illustrated in FIG. 6, the graph exhibits power-law characteristics in its degree distribution: if $X$ is the random variable denoting the degree of a given node, then $P(X \ge x) \propto x^{-\alpha}$. More precisely, the power law is believed to have an exponential tail in some cases, which causes the distribution to taper off faster at higher degrees.
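The "feasible edges" figures match the number of unordered vertex pairs in a simple graph, n(n-1)/2, and the degree statement is about the empirical complementary distribution P(X >= x). A minimal sketch of both, where the short degree list is made-up illustration data rather than data from graphs A or B:

```python
from collections import Counter


def feasible_edges(n: int) -> int:
    """Number of distinct undirected vertex pairs in a simple graph: n(n-1)/2."""
    return n * (n - 1) // 2


def empirical_ccdf(degrees: list[int]) -> dict[int, float]:
    """Return P(X >= x) for every observed degree x."""
    total = len(degrees)
    counts = Counter(degrees)
    ccdf: dict[int, float] = {}
    remaining = total
    for x in sorted(counts):
        ccdf[x] = remaining / total  # fraction of nodes with degree >= x
        remaining -= counts[x]
    return ccdf


print(feasible_edges(2600))             # 3378700 (about 3.38 million, graph A)
print(feasible_edges(4200))             # 8817900 (about 8.82 million, graph B)
print(empirical_ccdf([1, 1, 1, 2, 5]))  # {1: 1.0, 2: 0.4, 5: 0.2}
```

Plotted on log-log axes, a CCDF that follows a straight line indicates a power law; curvature downward at large x is the exponential tail discussed below.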

[0081] The power-law coefficients can be computed separately for the weighted ($x^{-\alpha_1}$) and unweighted ($x^{-\alpha_2}$) graphs. A large difference may indicate that factoring in repetitive social behavior, i.e. clustering around common keywords (jargon) in an organization, has a significant positive impact. For example, FIG. 7 shows unweighted context graph A with a scaling exponent of 2.58; FIG. 8 shows weighted context graph A with a scaling exponent of 1.61; FIG. 9 shows unweighted context graph B with a scaling exponent of 2.6; and FIG. 10 shows weighted context graph B with a scaling exponent of 2.29.
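The disclosure does not say how the exponents were estimated. As a stand-in, the sketch below uses the continuous maximum-likelihood estimator popularized by Clauset, Shalizi and Newman, which estimates the exponent of the density p(x) proportional to x^-alpha above a chosen x_min; the corresponding CCDF then decays as x^-(alpha-1), and which convention the figures use is not stated. It also assumes that the "weighted" degree of a node is the sum of its edge weights.

```python
import math
from collections import defaultdict


def degree_sequences(edges: dict[tuple[str, str], int]) -> tuple[list[int], list[int]]:
    """Unweighted degree (edge count) and weighted degree (sum of edge weights) per node."""
    deg: dict[str, int] = defaultdict(int)
    strength: dict[str, int] = defaultdict(int)
    for (u, v), w in edges.items():
        for node in (u, v):
            deg[node] += 1
            strength[node] += w
    nodes = sorted(deg)
    return [deg[n] for n in nodes], [strength[n] for n in nodes]


def power_law_alpha(values: list[float], x_min: float) -> float:
    """Continuous MLE of alpha for p(x) ~ x^-alpha, using only values >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("tail has no spread above x_min")
    return 1.0 + len(tail) / log_sum


# Deterministic synthetic Pareto sample with alpha = 2.5 (inverse CDF at evenly spaced quantiles).
N = 10_000
sample = [(1 - (i + 0.5) / N) ** (-1 / 1.5) for i in range(N)]
print(round(power_law_alpha(sample, x_min=1.0), 2))  # 2.5
```

Applying `power_law_alpha` separately to the two sequences returned by `degree_sequences` gives the unweighted and weighted exponents whose difference the paragraph above interprets.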

[0082] In some cases, such as graph B, the weighted context graph exhibits an exponential tail, as shown in FIG. 10. In other words, $P(X \ge x) \propto x^{-\alpha} e^{-\beta x}$, where the exponential term overwhelms the power law for large $x$. In the analysis of contextual information in an organizational setting, it is plausible that the mass usage of jargon and terms trails off more rapidly beyond a certain level of popularity, perhaps due to group size, seasonal effects etc. While such faster-than-power-law trail-off has been observed, its exact causes are not the subject of this disclosure. Instead, the system and method provided herein leverage the detected properties, for example power-law structure and faster-than-power-law trail-off in regions of the graph.

Method for Sub-Graph Extraction from Context Graph

[0083] This section describes various ways in which sub-graphs can be extracted from a context graph to provide targeting, as well as how the context graph is generated and used in real time.

Step 1: Graph Maintenance

[0084] Initially, a context tuple, e.g. a row of comma-separated context, is received by a targeting engine, and each of the entity types in the tuple is recognized by the context graph algorithm associated with the targeting engine. Either explicitly from data or as the result of an inference, the different entities associated with the context tuples are classified. The inference can be based on a machine learning algorithm, for example a known supervised classification technique such as those based on Support Vector Machines or Singular Value Decomposition. After the context tuple's various entities are recognized, they are assigned to one of the n layers of the n-partite graph and edges and weights are updated.
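A minimal sketch of receiving one comma-separated context tuple and assigning its entities to layers. The disclosure mentions learned classifiers (e.g. Support Vector Machines); here a simple rule-based classifier stands in for that step, and the user directory, device-naming pattern and timestamp format are all hypothetical assumptions.

```python
import csv
import io
import re

KNOWN_USERS = {"alice", "bob"}                      # hypothetical user directory
DEVICE_PATTERN = re.compile(r"^mfd-\d+$")           # hypothetical device-naming convention
TIME_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}$")


def classify(entity: str) -> str:
    """Rule-based stand-in for the learned classifier that picks an entity's layer."""
    if entity in KNOWN_USERS:
        return "person"
    if DEVICE_PATTERN.match(entity):
        return "device"
    if TIME_PATTERN.match(entity):
        return "time"
    return "keyword"


def parse_context_tuple(line: str) -> dict[str, list[str]]:
    """Split one comma-separated context tuple and group its entities by layer."""
    layers: dict[str, list[str]] = {}
    row = next(csv.reader(io.StringIO(line)))
    for field in row:
        entity = field.strip()
        if entity:
            layers.setdefault(classify(entity), []).append(entity)
    return layers


print(parse_context_tuple("alice, mfd-3, invoice, bank, 2010-05-17T09:30"))
# {'person': ['alice'], 'device': ['mfd-3'], 'keyword': ['invoice', 'bank'], 'time': ['2010-05-17T09:30']}
```

Once grouped, each layer's entities become vertices in the corresponding part of the n-partite graph, after which edges and weights are updated.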

[0085] Further semantics may be assigned as an extension for each given tuple. The direction of each edge may be altered given the transaction's dependency relationships or ordering. It is the responsibility of the tuple provider to specify the directionality or ordering of the entities at the time the context is provided for producing a targeted message. Each edge has a directionality probability, which is an aggregate over the set (or any reasonable subset) of users assigned to the graph. An edge directionality probability is thus denoted $e(p_a, p_b, p_{ab})$, where $p_a + p_b + p_{ab} = 1$ and $\{a, b\}$ refers to the edge between node $a$ and node $b$. The graph itself is stored on any scalable platform such as a distributed database. According to one exemplary embodiment, a non-relational distributed database, namely HBase, can be used.
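The text does not define the three components of e(p_a, p_b, p_ab) beyond requiring that they sum to 1. The sketch below assumes one plausible reading: per-user observations for the edge {a, b} record whether a came first, b came first, or no ordering was given, and the probabilities are the aggregated fractions. The observation labels and the function name are hypothetical.

```python
from collections import Counter


def directionality(observations: list[str]) -> tuple[float, float, float]:
    """Aggregate ordering observations for edge {a, b} into (p_a, p_b, p_ab).

    Assumed labels: "a" (a came first), "b" (b came first), "ab" (no ordering).
    Other labels are ignored. The three fractions sum to 1 by construction.
    """
    counts = Counter(observations)
    total = sum(counts[k] for k in ("a", "b", "ab"))
    if total == 0:
        raise ValueError("no observations for this edge")
    return counts["a"] / total, counts["b"] / total, counts["ab"] / total


p_a, p_b, p_ab = directionality(["a", "a", "b", "ab"])
print(p_a, p_b, p_ab)  # 0.5 0.25 0.25
```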

Step 2: Retrieving a Sub-graph and Finding the Dominant Sub-graph

[0086] Various graphical structures, such as sub-graphs (chains of related nodes, loops, paths, hubs etc.), and statistics on top of these structures (degrees, hop distance, edge weight, edge direction etc.) can capture certain inherent connectivity and association properties in a fine-grained fashion. These structures, and the properties that govern them, relate to how users interact with their environments and/or social groups, since it is these activities that are used to build the graph in the first place. Of particular interest are at least the following types of contextual information, and their n-partite graphical representation, associated with a person interacting with a device: the person; keywords, categories or topics that relate to such an interaction; the social group of the person; spatio-temporal aspects of the interaction; usage frequency; and combinations thereof.

[0087] Retrieving a sub-graph means extracting a relatively small graph, i.e. a sub-graph, in which the keyword from the query is contained in a designated portion of the sub-graph, e.g. the center or the 2nd layer. Such a keyword is called the origin (with respect to the sub-graph retrieval). According to one exemplary embodiment, the graphical substructures shown in FIGS. 10-15 are of particular interest, without limiting other substructures that may be added.
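One simple way to retrieve a sub-graph with the origin at its center is a breadth-first search bounded by a hop count, where each hop distance corresponds to a layer around the origin. This is a sketch under the assumption that the context graph is held in memory as a symmetric adjacency dictionary with edge weights; the node names are illustrative.

```python
from collections import deque

Graph = dict[str, dict[str, int]]  # node -> {neighbour: edge weight}


def retrieve_subgraph(graph: Graph, origin: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search from the origin; returns {node: layer}, origin at layer 0."""
    if origin not in graph:
        return {}
    layer = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if layer[node] == max_hops:
            continue  # do not expand beyond the outermost requested layer
        for nbr in graph.get(node, {}):
            if nbr not in layer:
                layer[nbr] = layer[node] + 1
                queue.append(nbr)
    return layer


g = {
    "malaria": {"mosquito": 2, "travel": 1},
    "mosquito": {"malaria": 2, "repellent": 3},
    "travel": {"malaria": 1, "airline": 4},
    "repellent": {"mosquito": 3},
    "airline": {"travel": 4},
}
print(retrieve_subgraph(g, "malaria", max_hops=1))
# {'malaria': 0, 'mosquito': 1, 'travel': 1}
```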

[0088] Finding a dominant sub-graph means ranking the retrieved sub-graphs to find the most suitable one. Essentially, a metric is computed from the properties of these sub-graphs to determine a quantitative ranking. The metric can be based on degree, edge weight, edge direction, number of nodes in a loop, hop count etc. Other constraints can be associated with a specific targeting purpose. Usually one of the keywords, denoted the origin, will be contained in the sub-graphs. If multiple keywords are part of a query, then multiple sub-graphs are retrieved iteratively. The resulting set of dominant sub-graphs is then merged based on common nodes. If the sub-graphs of a pair of origin keywords share no common node, they are kept as disjoint pieces at this stage.
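A minimal sketch of ranking and merging, assuming a hypothetical metric (total weight of edges inside a candidate sub-graph) in place of the unspecified combination of degree, weight, direction and hop count. Dominant sub-graphs that share any node are unioned, transitively; the rest stay disjoint.

```python
Graph = dict[str, dict[str, int]]  # symmetric adjacency: node -> {neighbour: weight}


def internal_weight(graph: Graph, nodes: set[str]) -> int:
    """Hypothetical ranking metric: total weight of edges with both ends in `nodes`."""
    total = 0
    for u in nodes:
        for v, w in graph.get(u, {}).items():
            if v in nodes and u < v:  # count each undirected edge once
                total += w
    return total


def dominant(graph: Graph, candidates: list[set[str]]) -> set[str]:
    """Pick the highest-scoring candidate sub-graph for one origin."""
    return max(candidates, key=lambda nodes: internal_weight(graph, nodes))


def merge_on_common_nodes(subgraphs: list[set[str]]) -> list[set[str]]:
    """Union sub-graphs that share at least one node; disjoint ones stay separate."""
    merged: list[set[str]] = []
    for sg in subgraphs:
        current = set(sg)
        rest = []
        for m in merged:
            if m & current:
                current |= m  # absorb every existing group that overlaps
            else:
                rest.append(m)
        merged = rest + [current]
    return merged


g = {"a": {"b": 2, "c": 5}, "b": {"a": 2}, "c": {"a": 5}}
print(dominant(g, [{"a", "b"}, {"a", "c"}]))  # the {'a', 'c'} candidate (weight 5)
print(merge_on_common_nodes([{"a", "b"}, {"c"}, {"b", "d"}]))
```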

Step 3: Utilizing Structure of the Overall Context Graph and Sub-Graphs

[0089] For sub-graphs that do not have any overlapping entities, the structure of the overall graph is used. The overall context graph is known to exhibit hubs, as previously discussed with regard to its power-law structure. Therefore, the strategy is to seek out hubs that connect the seemingly disjoint sub-graphs. The overall context graph is also known to exhibit other characteristics related to the path lengths between nodes; notably, diameter and path lengths can be characterized as other relevant metrics.

[0090] Initially, the nearest hub associated with each sub-graph is determined. In most cases, because of the scale-free structure, this hub will be just a few hops away; in other cases, the search is terminated owing to intractability. Both sub-graphs are used to identify the nearest hub(s), which may have some latent relationship with the sub-graphs. The revised, unified sub-graph consists of the two sub-graphs joined by the hub node. In this way, two seemingly dissimilar entities, such as Malaria and CNN, can be connected, perhaps because a Malaria program aired on CNN. Such connections become important when two sub-graphs seem very different. Notably, the above strategy is used only to discover a set of latent connections between sub-graphs.

[0091] Owing to the scale of the context graph, stopping criteria for search and traversal often become important. The following conditions may be used: hop count from the origin (most common), reversal in edge direction, hitting a hub, accrued edge weight along the direction of traversal, number of layers of the graph traversed etc. Notably, Step 3 may sometimes be skipped without any loss of continuity in the method.
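A sketch of the hub-bridging step with a hop-count stopping criterion. It assumes that a "hub" is any node whose degree reaches a chosen threshold and that ties between equally near hubs are broken alphabetically; both rules are illustrative assumptions, as are the node names. Following the text, only the hub node itself is added when the two sub-graphs are joined.

```python
Graph = dict[str, dict[str, int]]  # symmetric adjacency: node -> {neighbour: weight}


def nearest_hubs(graph: Graph, sources: set[str], hub_degree: int, max_hops: int) -> set[str]:
    """Expand ring by ring from a sub-graph; return the hubs in the first ring that has any.

    The loop stops after `max_hops` rings (hop-count stopping criterion); an empty
    result means the search was abandoned.
    """
    seen = set(sources)
    frontier = set(sources)
    for _ in range(max_hops):
        ring = {n for node in frontier for n in graph.get(node, {}) if n not in seen}
        if not ring:
            break
        hubs = {n for n in ring if len(graph.get(n, {})) >= hub_degree}
        if hubs:
            return hubs
        seen |= ring
        frontier = ring
    return set()


def bridge(graph: Graph, sg1: set[str], sg2: set[str], hub_degree: int = 3, max_hops: int = 3):
    """Join two disjoint sub-graphs through a hub nearest to both, or return None."""
    common = nearest_hubs(graph, sg1, hub_degree, max_hops) & nearest_hubs(graph, sg2, hub_degree, max_hops)
    if not common:
        return None
    hub = min(common)  # deterministic tie-break (an assumption)
    return sg1 | sg2 | {hub}


g = {
    "malaria": {"mosquito": 2, "health-program": 1},
    "mosquito": {"malaria": 2},
    "cnn": {"news": 4, "health-program": 1},
    "news": {"cnn": 4},
    "health-program": {"malaria": 1, "cnn": 1, "vaccine": 1},
    "vaccine": {"health-program": 1},
}
print(sorted(bridge(g, {"malaria", "mosquito"}, {"cnn", "news"})))
# ['cnn', 'health-program', 'malaria', 'mosquito', 'news']
```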

Step 4: Using Sub-Graph Outputs for Targeting

[0092] Once the sub-graphs are unified or kept disparate, a query is constructed and sent to an aggregator of information. The query may be broken into sub-queries that reflect the different ways of weighting the structure of the identified sub-graph(s). The simplest way is to send the high-degree nodes from the sub-graph along with the most recent input received for targeting. Another way is to send all the nodes in the sub-graph in sequence and to prioritize the results that contain the origin entities, e.g. keywords. The sub-graph dictates the order of the sub-queries that are iteratively made to the aggregator.
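The two query strategies above can be sketched as follows. The aggregator's query format is not specified, so the sketch only produces ordered term lists; `k` and the alphabetical tie-break are illustrative assumptions.

```python
Graph = dict[str, dict[str, int]]  # symmetric adjacency: node -> {neighbour: weight}


def degree(graph: Graph, node: str) -> int:
    return len(graph.get(node, {}))


def high_degree_query(graph: Graph, subgraph: set[str], latest_input: str, k: int = 3) -> list[str]:
    """Simplest strategy: the latest input followed by the k highest-degree sub-graph nodes."""
    ranked = sorted(subgraph - {latest_input}, key=lambda n: (-degree(graph, n), n))
    return [latest_input] + ranked[:k]


def ordered_subqueries(graph: Graph, subgraph: set[str], origins: set[str]) -> list[str]:
    """Alternative: every node as its own sub-query, origin entities first, then by degree."""
    return sorted(subgraph, key=lambda n: (n not in origins, -degree(graph, n), n))


g = {
    "bank": {"loan": 1, "mortgage": 1, "alice": 1},
    "loan": {"bank": 1, "rate": 1},
    "mortgage": {"bank": 1},
    "rate": {"loan": 1},
    "alice": {"bank": 1},
}
sub = {"bank", "loan", "mortgage", "rate"}
print(high_degree_query(g, sub, "refinance"))       # ['refinance', 'bank', 'loan', 'mortgage']
print(ordered_subqueries(g, sub, {"mortgage"}))     # ['mortgage', 'bank', 'loan', 'rate']
```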

Step 5: Retrieving Relevant Documents Using Sub-Graph Output

[0093] The previous step sends a query, possibly composed of sub-queries, to an aggregator of information, e.g. of coupons, advertisements or news feeds. This step checks the relevance of the retrieved documents. Relevance is assessed by counting the number of entities that exist in the result and matching them against the sub-graph delineated in Steps 2-3. For result documents in text form, relevance is computed directly from keyword matches. For image/audio/video results, relevance is computed on metadata, tags, transcriptions, translations etc. The filtered and ranked results are returned to the user in the format of choice. An alternative is to use the redundancy observed in retrieved documents for ranking; see U.S. patent application Ser. No. 12/533,901, by Harrington, entitled "method and system for constructing a document redundancy graph," filed Jul. 31, 2009.
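A minimal sketch of the counting approach for text results: each result is scored by how many sub-graph entities occur in it as whole words, results below a threshold are filtered out, and the rest are ranked. It assumes single-word entities; for image/audio/video results the same function could be applied to their metadata or transcription text. The result identifiers and texts are illustrative.

```python
import re


def relevance(document_text: str, subgraph_entities: set[str]) -> int:
    """Count how many (single-word) sub-graph entities occur as whole words in the text."""
    words = set(re.findall(r"[a-z0-9]+", document_text.lower()))
    return sum(1 for e in subgraph_entities if e.lower() in words)


def rank_results(results: dict[str, str], subgraph_entities: set[str], min_score: int = 1) -> list[str]:
    """Drop results with fewer than `min_score` matches and rank the rest by match count."""
    scored = [(relevance(text, subgraph_entities), rid) for rid, text in results.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [rid for score, rid in scored if score >= min_score]


results = {
    "ad1": "Low mortgage rates at Bank X",
    "ad2": "Beach vacation deals",
    "ad3": "Mortgage refinance",
}
print(rank_results(results, {"mortgage", "bank", "loan"}))  # ['ad1', 'ad3']
```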

Implementation Details

[0094] This section briefly outlines how the context graph is built, searched and maintained. It also explains how an incoming job can be monetized with advertisements and how credit is stored for various verification actions by the user. In the development of this system, the open-source Hadoop/MapReduce/HBase framework from the Apache Software Foundation was used to aid computation and prototyping.

[0095] "Printing" is an overloaded term and can be interpreted as sending a document or piece of information from one device to another.

[0096] Step 1: A user prints (or scans or faxes) to/from an MFD. Each MFD receives the metadata associated with the print (or other use). Each MFD is usually a data node of the Hadoop cluster. Upon receipt of the metadata, the MFD calls contextual_ad_fetch( ), which is a remote call to the server component, typically residing with the name-node of the Hadoop file system. The MFD that receives the print job/stream caches the job temporarily so that the advertisements can be fetched.

[0097] Step 2: contextual_ad_fetch( ) first calls ContextGraph_Create_Maintain( ). As every user prints (or uses the network of MFDs), the keywords and context identifiers (time of day, location, username etc.) are extracted. An edge or vertex is then introduced into the context graph $G(E, V)$, where $E$ is the set of edges and $V$ is the set of vertices. Note that the edge weights have to be recomputed as well.

[0098] a. For each context item that is extracted, the n-partite graph is updated. Generally speaking, there may be n types of context. The prototype assumed a tripartite graph whose constituent parts are keywords, usernames and devices.

[0099] b. If any keyword is not present in the graph, it is added as a new vertex. Likewise for username and device.

[0100] c. If the keyword is already present, an edge is added between the username that printed it and the keyword printed. An edge is also added between the username and the device the job was printed on. This process is repeated for all extracted context items. At this point, the vertices and edges of the tripartite graph have been created.

[0101] d. Next, a link is introduced between each pair of keywords associated with the current job, since the user associated those keywords together and attached some semantics to them in the current domain of the user's operation. In this example, only the job name was used for this step. Other similar metadata may be extracted from the document, but only sparingly owing to tractability issues. This creates links in the keyword layer on the fly.

[0102] e. Then, for each newly created edge, the weight is set to one. If an edge already exists between the two vertices under consideration, its weight is increased by one. The latter condition occurs when such a linkage was established in a prior print job, either by another user or by the current user. In this way, the graph is created, updated and maintained after every job.

[0103] f. From a storage perspective, the graph is stored as co-keywords, co-user-device and co-user-keywords, which, together with the edge weights, constitute a social context graph. The graph creation overhead is the same as the update overhead, because the graph is created through iterative updates. The update overhead is estimated as follows. The complexity of finding an item in the ordered list is O(log n). Adding a new element that is not present is O(1). If the element is already present, computing or updating the weights requires another O(log n). This process has to be done for about 5 keywords on average, so the update complexity is about 10 × O(log n), i.e. O(log n), where n is the total number of keywords.
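Steps a to e can be sketched as a per-job update of an in-memory graph. The three edge kinds correspond to the co-user-keywords, co-user-device and co-keywords stores of step f. As an assumption of this sketch, a hash map of edge weights stands in for the ordered lists described in the text, so lookups are expected O(1) rather than O(log n); the class and method names are hypothetical, not the prototype's ContextGraph_Create_Maintain( ).

```python
from collections import Counter
from itertools import combinations

Vertex = tuple[str, str]  # (layer, name)


class ContextGraph:
    """Tripartite context graph (keywords, users, devices) updated once per job."""

    def __init__(self) -> None:
        self.vertices: dict[str, set[str]] = {"keyword": set(), "user": set(), "device": set()}
        self.weights: Counter = Counter()  # sorted (Vertex, Vertex) -> weight

    def _bump(self, u: Vertex, v: Vertex) -> None:
        # Step e: a new edge starts at weight 1; an existing edge gains 1.
        self.weights[tuple(sorted((u, v)))] += 1

    def update(self, user: str, device: str, job_keywords: list[str]) -> None:
        kws = sorted(set(job_keywords))
        # Step b: add any missing vertices.
        self.vertices["user"].add(user)
        self.vertices["device"].add(device)
        self.vertices["keyword"].update(kws)
        # Step c: user-keyword and user-device edges.
        for kw in kws:
            self._bump(("user", user), ("keyword", kw))
        self._bump(("user", user), ("device", device))
        # Step d: link every pair of keywords from the same job (e.g. its job name).
        for a, b in combinations(kws, 2):
            self._bump(("keyword", a), ("keyword", b))


g = ContextGraph()
g.update("alice", "mfd-1", ["bank", "loan"])
g.update("bob", "mfd-1", ["loan", "bank"])
print(g.weights[(("keyword", "bank"), ("keyword", "loan"))])  # 2
```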

[0104] Step 3: contextual_ad_fetch( ) then calls ad_fetch( ) based upon the information in the graph, i.e. history, and in the current context, i.e. the current job. The aim of this step is to maximize the chance of obtaining advertisements and then to obtain the most relevant ones. The advertisements could be obtained by contacting an advertisement aggregator. Since no human is actively querying the advertisement aggregator, it is important to send a list of the most relevant context.

[0105] Step 4: Once the ads are obtained from the advertisement aggregator, the available real estate, both electronic and paper, is populated with the best advertisements. Advertisements with greater relevance (for example, relevance computed from the occurrence of queried keywords in the advertisements) are placed in more conspicuous areas, such as the banner pages and the UI (User Interface) of the device. Once the ad-filled real estate is prepared, the rest of the document that was cached in Step 1 is printed along with the ads and/or coupons. According to a prototype, fetching the targeted ads takes about 2-4 seconds. An additional couple of seconds may be required to meld the ads with the customer job in novel ways.
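A minimal sketch of populating the real estate: ads are scored by how many queried keywords occur in their text, and the highest-scoring ads are assigned to the slots in order of conspicuity. The slot names, ad texts and the whitespace tokenization are illustrative assumptions; queried keywords are assumed to be lowercase.

```python
def place_ads(ads: dict[str, str], queried: set[str], slots: list[str]) -> dict[str, str]:
    """Assign the most relevant ads to the most conspicuous slots.

    `slots` is ordered from most to least conspicuous; extra ads beyond the
    number of slots are left unplaced.
    """
    def score(text: str) -> int:
        return len(queried & set(text.lower().split()))

    ranked = sorted(ads, key=lambda ad_id: (-score(ads[ad_id]), ad_id))
    return dict(zip(slots, ranked))


ads = {"a1": "bank loan offers", "a2": "shoe sale", "a3": "loan rates"}
print(place_ads(ads, {"bank", "loan"}, ["banner page", "device UI", "margin"]))
# {'banner page': 'a1', 'device UI': 'a3', 'margin': 'a2'}
```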

[0106] Step 5: Some advertisements may also be sent to the users' desktops through the print-driver so that they can be clicked on or used at a later point. Alternately, some advertisements may be cached at the device for use at a subsequent time. If an advertisement is clicked upon at the users' workstations, the advertisement identifier and user are captured and transmitted to the device. The device(s) track and store this information and from time-to-time transmit it to the advertisement aggregator. Printed coupons or other hard copies of advertisements that the users can carry with them to a store or vendor include a barcode, micro-text or glyph embedded in the image, so that a vendor can track the source of the advertisement. These methods enable the tracking of printed or non-printed targeted messages and provide verification to the advertiser or advertisement aggregator.

[0107] A diagrammatic representation of one system according to this disclosure is shown in FIG. 16.

[0108] Step 1. According to this embodiment, a user 150 prints a document. Printing a document includes using any device that has a so-called "print button" to invoke the print action. Information including the name/login of the person printing, credentials, location (GPS coordinates), print account details, preferences on the printer, preferences for the document to be printed (including number of pages or page ranges and color/monochrome) and document content (words and images within the document) is provided when invoking the print operation. The GUI may contain the print button and collect these pieces of information explicitly under an opt-in model. The information may also be stored in a profile page on a server. The customer that owns the print job may restrict the use of content internal to the document being submitted for printing. Printing can also mean printing to a driver such as, but not limited to, an Adobe PDF Printer, which performs a document conversion service by accepting the print job.

[0109] Step 2. A service (on the cloud, on a network, or any other known service such as CUPS, the Common UNIX Printing System) takes the printed job and other information from the user who is printing.

[0110] Step 3a. Automatic Ad Fetch and Ranking Service.

[0111] First, context is extracted. The service extracts the context (such as the examples above about the user, device, location, time and document) from the print job. From the context, it applies business rules or algorithms to extract the query that will be used for fetching ads automatically. An optimal subset of the context is used for generating a query. This subset is a function of time, location, user, the device, the service provider (i.e. whether the printing service is in an airport vs. a mall vs. an office vs. a home), past history of printing etc. Relevant context is extracted in the form of a subset of the data, and many such candidate subsets may be unearthed.

[0112] Step 3b. This context subset search is facilitated by the context graph stored in a database (HBase in this case), where the history of prior context is stored as the context graph explained before.
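One way the context graph could guide the choice among candidate subsets is sketched below: enumerate small subsets of the extracted context and keep the one whose members are most strongly linked in the graph (sum of pairwise co-occurrence weights). The scoring rule and `max_size` are assumptions, the business rules mentioned above are not modeled, and exhaustive enumeration is only practical for a handful of context items.

```python
from itertools import combinations


def best_context_subset(context: list[str], co_weights: dict[frozenset, int], max_size: int = 3) -> tuple[str, ...]:
    """Return the candidate subset with the largest sum of pairwise graph weights.

    Ties keep the earlier (and therefore smaller) subset.
    """
    best: tuple[str, ...] = ()
    best_score = -1
    for size in range(1, max_size + 1):
        for subset in combinations(context, size):
            s = sum(co_weights.get(frozenset(pair), 0) for pair in combinations(subset, 2))
            if s > best_score:
                best, best_score = subset, s
    return best


weights = {frozenset({"bank", "loan"}): 5, frozenset({"airport", "bank"}): 1}
context = ["airport", "bank", "loan", "shoes"]
print(best_context_subset(context, weights, max_size=2))  # ('bank', 'loan')
print(best_context_subset(context, weights))              # ('airport', 'bank', 'loan')
```

Because any additional positive link raises the score, a practical version would likely add a size penalty or rule-based filters so that the query stays focused.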

[0113] After the subset is extracted, it is sent to an ad-server that will provide some ads.

[0114] Step 4. Once the ads are received back by the ad fetch and ranking service 154, the service performs relevance ranking according to the settings and business rules within the ad fetch and ranking service. The results are called relevant advertisements or messages. These are passed back to the print server (152).

[0115] Step 5. The print server (152) merges the document to be printed (or converted to another form, if printing really means document conversion) with the ads and messages and then sends them to another application. The other application can be, for example, a phone, an email server, an e-book reader or, in this embodiment, a printer or printer GUI.

[0116] Step 6. Click-through is defined depending on the device. Click-throughs are actions relating to clicks or click-like or view-like activity that is recorded and transmitted to the advertiser that sent the ads to the ad fetch and ranking service (154). Click-through relates only to those individual ads or messages that the user acted upon in some way.

List of Locations and Real-Estate (Paper and Non-Paper)

[0117] 1. Banner page

[0118] 2. Several Places within Document (margins, sides, interspersing etc.)

[0119] 3. MFD UI-EIP+Screen

[0120] 4. Via Print Driver to User Computer

[0121] 5. Windows Application including browser

[0122] 6. Mobile Destinations including e-reader, laptops etc.

[0123] FIG. 17 shows a targeted communication produced according to an exemplary embodiment of this disclosure, the targeted communication including text ads with graphics from two of the advertisers.

[0124] FIG. 18 shows another targeted communication produced according to an exemplary embodiment of this disclosure, the targeted communication including categorized coupons.

[0125] FIG. 18 refers to one embodiment of a personalized ad or coupon listing, arranged by category, that is again custom generated for the user or customer. FIG. 18 may be printed or viewed on a GUI or viewing device, and provides coupons or ads as images relating to 4 categories.

[0126] Business A and Business B coupons 32 and 34 are in the clothes/shoes category.

[0127] Business C and Business D coupons 36 and 38 are in a different vacation category.

[0128] Both the categories chosen and the individual deals/ads chosen are personalized to the user viewing this listing. Service 154 of FIG. 16 generated this listing.

[0129] The personalized ad/coupon listing also contains some images personalized to the user. The content within the picture may be enhanced to appeal to the user, such as by including text or faces of people known to the user, or color preferences etc. Also included are tracking elements for paper or electronic forms, so that an authorized device can use the tracking material (such as bar codes or QR codes) and know who used the personalized ad or coupon.

[0130] FIG. 19 shows another use case, in a production printing environment, where the aforementioned method can be used. Alternatively, the PDFs could simply be viewed on screen instead of printed.

[0131] Described is a large workflow with targeted personalization inserted into the workflow stream through the context graph technology.

[0132] 180: Online or printed statement generation requests are initiated. This starts the process in which credentials and transaction information (such as those in the statements) are provided to the printer/electronic publisher.

[0133] 182: The web (182) sends extracted context to a targeting service (184), which is equivalent to the ad fetch and ranking service 154 in FIG. 16. Here, context refers to content like that explained for FIG. 16, but extracted per user or per transaction for a user, usually one line in, say, a credit card statement.

[0134] 184: The process fetches the ads per user per transaction and ranks the relevance and applicability of the ads.

[0135] 186: The process merges ads and per user content into the stream. Stream is said here because in the workflow we have many statements corresponding to a multitude of users. 186 also applies image color manipulations or document conversions or device format conversions for rich look and appeal w.r.t device or document format.

[0136] 188: Useful information, recommendations, coupons, ads or messages generated by 186 are transmitted by the web to produce a new stream 190.

[0137] 190: A new print stream is augmented with extra content.

[0138] In addition to the methods and systems discussed so far, this disclosure provides methods to detect, learn and/or correct normal/anomalous behaviors in workflows and other systems that can be represented as state transitions. For example, many communications can benefit from the detection and learning of target system, consumer or workflow behaviors. These include consumer activities near devices or printers, printing and other document workflows. In addition, the detection of normal/anomalous behaviors of a printing system workflow, etc. can help in servicing the system. This disclosure maps the states traversed in a workflow into two-dimensional patterns on a grid in which the originating states label the rows and the destination states label the columns. A path represents the allowable behavior in the system. Anomalous behaviors are easily detected by human inspection, so faulty behaviors can be detected and corrected. Collections of patterns can be represented as a stack of tiles. Also included are algorithms to detect similar patterns across tiles, as well as a system on a computational cloud.
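A minimal sketch of the grid mapping, assuming a hypothetical set of workflow states and taking the allowed pattern to be the cells visited by a known-good path. Rows are originating states and columns are destination states; `render` draws the tile as a pattern a person can inspect, and `anomalies` lists transitions that fall outside the allowed cells. A stack of tiles would simply be a list of such grids.

```python
STATES = ["submit", "rip", "print", "finish", "error"]  # hypothetical workflow states


def transition_grid(states: list[str], path: list[str]) -> list[list[int]]:
    """Rows are originating states, columns destination states; cells count transitions."""
    index = {s: i for i, s in enumerate(states)}
    grid = [[0] * len(states) for _ in states]
    for src, dst in zip(path, path[1:]):
        grid[index[src]][index[dst]] += 1
    return grid


def render(grid: list[list[int]]) -> str:
    """Draw the tile: '#' marks a visited cell, '.' an unvisited one."""
    return "\n".join("".join("#" if n else "." for n in row) for row in grid)


def anomalies(grid: list[list[int]], allowed: list[list[bool]], states: list[str]) -> list[tuple[str, str]]:
    """Transitions that occurred but lie outside the allowed pattern."""
    return [
        (states[r], states[c])
        for r, row in enumerate(grid)
        for c, n in enumerate(row)
        if n and not allowed[r][c]
    ]


normal = transition_grid(STATES, ["submit", "rip", "print", "finish"])
allowed = [[n > 0 for n in row] for row in normal]
observed = transition_grid(STATES, ["submit", "rip", "error", "rip", "print", "finish"])
print(render(observed))
print(anomalies(observed, allowed, STATES))  # [('rip', 'error'), ('error', 'rip')]
```

The normal path renders as a staircase of '#' cells; the observed path adds two off-staircase cells, which is the kind of visual deviation the paragraph expects a person to spot.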

[0139] According to one exemplary embodiment, a system identifies categories of digital information (including kinds of bugs)--specific classes of which we call graphical, auditory or computational pattern ensembles or, informally, digital fingerprints. Digital fingerprints are used today to identify individual files such as songs. The disclosed system identifies categories of those fingerprints and makes them searchable in a manner that is easier for both humans and computers to comprehend.

[0140] Finding patterns needs to be simpler (e.g. using digital fingerprints that are visual or auditory), and the patterns found can subsequently be corroborated by an automated service (e.g. a knowledge-as-a-service feed). It is becoming common for businesses to collect vast amounts of information from the systems they develop and subsequently to analyze that data. But those analyses are often complicated and time-consuming and require expensive specialists. Importantly, they do not draw on prior observations of specific visual or auditory patterns (plain symptoms and conditions included) at the customer site.

[0141] The capability for fine-grained, real-time analysis to find opportunities or predict scenarios is absent. It is important to capture how people are using our systems and services in a more real-time fashion. This can lead to the prediction of user behaviors, side opportunities, correlated bugs and market trends by tapping into the combined knowledge of customers and service providers, for example the non-subjective digital fingerprints that are shared knowledge.

[0142] This disclosure provides an automatic (unsupervised) method of generating categories for digital data and identifying which digital data falls into those categories. Specifically, provided are techniques for the identification, categorization and the search of similar looking (graphical) or sounding (auditory) pattern ensembles that both the consumers of the service and the provider(s)/technicians can relate to. These pattern ensembles go beyond the mere manifestation of symptoms and conditions in the sense that they can be described without subjectivity and without going through painfully enormous logs. For instance, graphical pattern ensembles are visual/readable by lay-humans--so they can be simultaneously sophisticated for trained users (i.e. in terms of the meaning it conveys) and simple for untrained users (to read, internalize and describe). Further they utilize techniques from human vision and cognition and the assumption that even untrained humans can internalize normal and abnormal system workings upon repeated use. Computational pattern ensembles (such as the underlying graph-theory structures) are readable and searchable by computer programs and can be utilized to characterize system or workflow behaviors. Some examples of where this technology can be used directly or as a means for tracking usage are as follows: consumer behavior directed to targeted communications when the consumer interacts with a multitude of devices; comparing release and robust workflow signatures; usage signatures for robust/fragile combinations of services; particular kinds of error classification that aid easy description by humans; increasing software conformance to user needs etc. The technology can also be used as a tool for proactive/reactive analysis but more interestingly as a tool for aiding objective communication at the time of troubleshooting.

[0143] Shazam is a server-based application. If you are listening to a song and you want to identify it (or the artist), you fire up the app on your mobile phone, hold the phone near the music, and within a few seconds, the name of the song and the artist are displayed on your phone. Shazam can do this because each song has a "fingerprint," a unique pattern of data that makes it identifiable and easy to search for (See http://www.shazam.com/music/web/pages/background.html). Such digital fingerprints can be used to create categories of digital data. This disclosure provides a method of generating categories for digital data fingerprints with one or more of the following features:

[0144] The detected pattern (or family of patterns referred to as an ensemble) can have a graphical structure (e.g. W pattern, Z pattern, M pattern etc.) or have a tone based correlation (such as the collection of tones heard when dialing a telephone number). These patterns can be seen or heard so that a customer can develop familiarity with normal and abnormal operating conditions of a service or system (i.e. internalization of the system workings). Such patterns in the past would be compiled as an enormous non-volatile-memory (NVM) log analyzed by specialized models or tools.

[0145] In addition to the above human interface, the detected patterns are computer searchable through a graph-theory based representation.

[0146] Instead of describing subjectively what one sees or hears when trouble-shooting an abnormal condition, the technology creates a visual or auditory signature to avoid confusion and aid objective communication.

[0147] It also provides a method of searching for and finding which of these categories new (incoming) information falls into. This is most effective if data is gathered in the cloud, so that the checkpoint/event information is available in one place (the data is complete). One exemplary aspect is the concept of graphical and auditory pattern ensembles (a type of digital fingerprint) that help visual or auditory correlation of patterns in several dimensions.
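The disclosure does not specify how categories are generated or searched. As a stand-in, the sketch below represents each computational pattern ensemble as the set of (from_state, to_state) edges a workflow traversed, generates categories without supervision by "leader" clustering with Jaccard similarity, and assigns an incoming pattern to the most similar category, or reports it as novel. The threshold and the state names are illustrative assumptions.

```python
from typing import Optional


def edges(path: list[str]) -> frozenset:
    """Graph representation of one traversal: the set of (from_state, to_state) edges."""
    return frozenset(zip(path, path[1:]))


def jaccard(a: frozenset, b: frozenset) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def build_categories(patterns: list[frozenset], threshold: float) -> list[list[frozenset]]:
    """Leader clustering: join the first category whose leader is similar enough, else start one."""
    categories: list[list[frozenset]] = []
    for p in patterns:
        for cat in categories:
            if jaccard(p, cat[0]) >= threshold:
                cat.append(p)
                break
        else:
            categories.append([p])
    return categories


def categorize(new: frozenset, categories: list[list[frozenset]], threshold: float) -> Optional[int]:
    """Index of the most similar category for an incoming pattern, or None if it is novel."""
    scores = [jaccard(new, cat[0]) for cat in categories]
    if not scores or max(scores) < threshold:
        return None
    return scores.index(max(scores))


ok1 = edges(["submit", "rip", "print", "finish"])
ok2 = edges(["submit", "rip", "print", "print", "finish"])
bad = edges(["submit", "rip", "error", "rip", "print"])
cats = build_categories([ok1, ok2, bad], threshold=0.6)
print([len(c) for c in cats])  # [2, 1]
print(categorize(edges(["submit", "rip", "error", "rip", "print", "finish"]), cats, 0.6))  # 1
```

A feature combination such as the imposition, watermark and art-box example in the next paragraph could be captured the same way by adding feature nodes to each pattern's edge set.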

[0148] One simple example is from the fields of printing system workflow and associated software (e.g. Xerox FreeFlow Print Server), where a series of human/automated steps is required to complete the creation of an artifact, physical or logical. The system generates a category that identifies when a user employs a two-up-for-one imposition in combination with a centered watermark and a resized art box in a PDF document, and maps that combination to a particular graphical or auditory pattern. In this way, the system automatically, and on an ongoing basis, matches new data with those fingerprints to rapidly identify bugs and bug severity, and notifies the appropriate teams before a customer encounters a problem of greater severity. Customer pain points where the disclosed method/system can be used are described in detail below.

Fragile Workflows

[0149] The first basic idea is that every element of a book production workflow affects every other element of a book production workflow and this fact makes workflows extremely fragile. Consequently, the operators only change the workflows when they absolutely have to. Unfortunately, they always have to; see below.

[0150] This means that any change to any physical or software-setting part of the workflow will either cause the document to print incorrectly or cause the workflow to stop operating. The following list contains examples of changes that potentially disrupt workflows frequently at customer sites.

[0151] (a) Changing the paper's grain direction.

[0152] (b) Changing the model of printer being used.

[0153] (c) Changing the stops on the finisher.

[0154] (d) Changing the imposition.

[0155] (e) Changing the content of the original document (not the size or position of the content).

[0156] (f) Using an on-demand version of an original document will always break a workflow at some point in the future.

[0157] (g) Using different creative applications to send documents into a workflow. For example, if you create the exact same document, with the exact same content, in a first application and in a second application, and emit PostScript (or PDF) from each application using the same driver, there are situations where one file RIPs and the other fails to RIP.

[0158] (h) Changing the weight of the paper used in the workflow.

[0159] (i) Changing the cover lamination procedure.

[0160] In short, changing almost any workflow element, or any part of the target document or its instructions, may require changing at least one other workflow element or part of the original document. Furthermore, changing that second element may require a change in yet another element. Note that a workflow is frequently never really in its final version. Once a functioning workflow is established, workflow operators are routinely forced to modify it because of the nature of the work: evolving customer requirements, supplier constraints, the frequently changing content and format of the original document, and the workflow technologies themselves all compel these changes.

[0161] One example of preventing the kind of problems described above, according to this disclosure, is the generation of a Release Fingerprint, that is, a set of pattern ensembles associated with a particular release of a workflow. The release fingerprint is then used to govern the migration from one release to another.

Each Site is Entirely Unique (Even if they Seem the Same)

[0162] The situation is further complicated by the fact that:

[0163] (a) Required changes to identical workflows can differ from one customer site to another customer site, even within a single company.

[0164] (b) The effect of a workflow change on the output of identical workflows can differ from customer site to customer site, even within a single company.

[0165] (c) The changes necessary to fix a broken workflow can differ from customer site to customer site, even within a single company, and even if the break was caused by the same change in an ostensibly identical workflow. This is due to at least slight differences in the software used at those sites, or at least in the configuration of that software, and to the different documents from different sources being sent through the (ostensibly identical) workflows at the different sites.

[0166] As disclosed, a site-based fingerprint can be used to govern the replication of workflows from one site to another.

Automated Image Quality Problem Identification

[0167] The causes of image quality problems are often difficult to uncover. With pattern ensembles, image quality problem categories are identified, allowing the end user either to solve a problem or to communicate specific error information to the appropriate technical support team so that the correct help can be provided.

[0168] For example, a pattern ensemble can identify that the current image quality problem on copies is due to a dirty platen and the user can be notified to clean the platen to solve the problem. No call to technical support is necessary.

Consumer Behavior Identification

[0169] Graphical pattern ensembles can represent the structure of interactions between users, devices, environments and other external stimuli. For example, when a consumer interacts with a device through a kiosk or a mobile phone in a mall, patterns of interaction can be established. Furthermore, specific items of interest, times and/or places can be deduced. From such pattern ensembles, one can identify elements that provide clues about the person's likes and dislikes, e.g., which aspects of the application get used, which aspects prompt a search query, where the consumer spends the most time, which keywords get typed, etc. When such information is collected over a large population, normal and abnormal patterns can be delineated. The normal patterns so delineated can be used as templates to search large content, metadata or document repositories. Pattern ensembles provide the framework for aggregating and visualizing the search results.

[0170] As previously stated, this disclosure describes a technique to identify, categorize and retrieve a specific pattern from a collection of patterns. In particular, the concept of graphical pattern ensembles is used to visually track and correlate patterns and changes in them. This feature takes advantage of how human vision and cognition work. The disclosure also teaches how to categorize observed patterns into ensembles and how to layer them as a stack to deal with multiple dimensions. Layering or stacking is one way to generalize the graphical ensemble concept to multiple dimensions.

[0171] Important aspects include the following: graphical and auditory pattern ensembles that aid internalization of normal and abnormal symptoms of the system's workings and enhance objective communication with the service provider at the time of troubleshooting.

[0172] Graph theory based representation and formulation for searchability.

[0173] Capability to detect patterns by leveraging change detection capabilities of human vision and cognition.

[0174] Traditional alternatives include statistical models, machine learning, neural networks, etc. Graph-theoretical models that capitalize on human capabilities are rare. These traditional models depend only on logs and model specifics. They fail to leverage the fact that humans (customers) can internalize normal and abnormal aspects of system operation while dealing with a system, such as an MFD or other device, a document service, etc.

[0175] The patterns are presented through a visual interface, together with auditory patterns and other aids put in place for the customer to interact with the patterns.

[0176] The disclosed method deals with monitoring a large number of events (raw or aggregated), especially when human oversight is required because of the domain (e.g., large logistics operations, data centers, power stations), or when Turing-test-like difficulties (e.g., connecting the dots in the case of partial information) or complex decision making (e.g., major decisions that impact cost) are encountered. Of special interest are patterns that can be visually remembered or have an auditory correlate, drawing on familiar analogues such as the connected graphical pattern a phone number traces on a phone dial pad; collections of pictures of a landmark or tourist destination; the tones heard when dialing a phone number; and the sound of a telephone modem connecting to the Internet. Humans may be better able to fill in missing pieces of information that they have internalized over time. In this sense, the aim of pattern ensembles is to present human-in-the-loop decision-making systems with shapes that humans can complete in a more natural fashion. The system attempts to associate each pattern ensemble with a shape that humans can better internalize. At the same time, the pattern designs have a computer-searchable representation.

Graphical Pattern Ensembles (GPE)

[0177] Graphical pattern ensembles are a family of graphical patterns that are easily identifiable by the human eye. These patterns are defined here as connected 2-dimensional shapes where a series of vertices are linked together by edges. These are called ensembles because two patterns may seem graphically close to each other (in terms of shape) when the corresponding attributes vary slightly (see FIGS. 20-22).

[0178] In FIGS. 20-22, there are two unique patterns: an inverted-W pattern and a Z-shaped pattern. All the inverted-W patterns shown in FIG. 20, FIG. 21 and FIG. 22 look alike, although they are slightly displaced laterally or vertically and/or stretched or contracted in one of the two dimensions. Together, all the inverted-W patterns are called a pattern ensemble. Note that there could be many more patterns that are not shown. An important attribute of these patterns is that they are graphical in nature and can be identified as similar by the naked eye without any analysis. These visual similarities are leveraged here to aid monitoring and decision making. A few conventions are used, as follows:

[0179] a) The rows in FIGS. 20-22 are called originating events (O), i.e., components where a condition or event is triggered;

[0180] b) The columns of the above figures refer to destination events (D), i.e., components where an originating event concludes;

[0181] c) The series of components S1 through Sn is arranged so that neighboring components offer similar functionality. This, however, is not a hard requirement.

[0182] Finding patterns as discerned by the human eye is a central concept in this disclosure. By this, two things are meant:

[0183] a) First, a 2D shape of some sort can be discerned by the naked eye;

[0184] b) If two similar patterns appear at roughly the same time (e.g., the combination of FIGS. 20 and 21), the region displaying the pattern shows a "glitch" on the screen.

[0185] This glitch can be simulated by rapidly alternating between FIG. 21 and FIG. 22.
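The glitch of [0184]-[0185] is something the viewer perceives on screen. A rough machine-side analogue, sketched here under the assumption that each displayed pattern is stored as a set of (originating component, destination component) edges, is to flag two frames that overlap substantially but are not identical. The Jaccard-overlap rule, the 0.5 threshold and the function names are illustrative assumptions, not part of the disclosure.

```python
from typing import FrozenSet, Tuple

Edge = Tuple[str, str]  # (originating component, destination component), e.g. ("S2", "S5")


def glitch_edges(frame_a: FrozenSet[Edge], frame_b: FrozenSet[Edge]) -> FrozenSet[Edge]:
    """Edges that would flicker if frame_a and frame_b were shown in quick succession."""
    return frame_a ^ frame_b  # symmetric difference: present in one frame but not the other


def is_glitch(frame_a: FrozenSet[Edge], frame_b: FrozenSet[Edge],
              min_overlap: float = 0.5) -> bool:
    """Hypothetical rule: the frames are similar (Jaccard overlap >= min_overlap) yet differ."""
    union = frame_a | frame_b
    if not union:
        return False
    overlap = len(frame_a & frame_b) / len(union)
    return overlap >= min_overlap and bool(glitch_edges(frame_a, frame_b))
```

Two inverted-W frames that share three of their edges but differ in the last one overlap at 3/5 and are flagged; identical frames, or frames with nothing in common, are not.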

[0186] Described now is a method to generate, represent and display the pattern ensembles.

[0187] A graphical pattern ensemble is a (database-)searchable notation for a visible pattern that represents a layered collection of graphical symbols. In other words, there may exist a collection of graphical symbols or patterns whose geometry is split across multiple tiles or layers. Each tile k is represented by a collection of vertices, where a vertex qualifies only if more than one edge is incident on it. The pattern ensemble is the collection of tiles $F_1$ through $F_q$.

[0188] A graphical pattern ensemble $F_k$ for layer k, comprising m associated fingerprints, and the rolled-up ensemble F, comprising the layers, are:

$$F_k = \Big\{ \big((f_{o1}, f_{a1}, f_{b1}), (k_{11}, k_{21}, \ldots, k_{n1}), (\theta_{11}, \theta_{21}, \ldots, \theta_{n1}), (d_{11}, d_{21}, \ldots, d_{n1})\big),\ \big((f_{o2}, f_{a2}, f_{b2}), (k_{12}, k_{22}, \ldots, k_{n2}), (\theta_{12}, \theta_{22}, \ldots, \theta_{n2}), (d_{12}, d_{22}, \ldots, d_{n2})\big),\ \ldots,\ \big((f_{om}, f_{am}, f_{bm}), (k_{1m}, k_{2m}, \ldots, k_{nm}), (\theta_{1m}, \theta_{2m}, \ldots, \theta_{nm}), (d_{1m}, d_{2m}, \ldots, d_{nm})\big) \Big\}$$

$$F = \{F_1, F_2, \ldots, F_k, \ldots, F_q\}.$$
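The structure in [0187]-[0188] can be mirrored by a small data structure. The following is a hypothetical Python rendering, assuming vertices are addressed by (row, column) grid positions within a layer and that layers are keyed by their index k; the class and field names are illustrative, not taken from the disclosure. The helper `qualifying_vertices` applies the rule of [0187] that only vertices with more than one incident edge are represented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # assumed (row, column) position of a vertex in a layer's grid


@dataclass
class GPERow:
    """One row of F_k: an origin point and the rays leaving it."""
    origin: Point                                      # f_o
    above: List[Point] = field(default_factory=list)   # f_a: linked points in layers 1..k-1
    below: List[Point] = field(default_factory=list)   # f_b: linked points in layers k+1..q
    weights: Tuple[float, ...] = ()                    # (k_1, ..., k_n) co-occurrence frequencies
    angles: Tuple[float, ...] = ()                     # (theta_1, ..., theta_n)
    destinations: Tuple[Point, ...] = ()               # (d_1, ..., d_n)

    def __post_init__(self) -> None:
        n = len(self.destinations)
        if not (len(self.weights) == len(self.angles) == n):
            raise ValueError("weights, angles and destinations must all have length n")


Layer = List[GPERow]          # F_k: one row per qualifying vertex in layer k
Ensemble = Dict[int, Layer]   # F: layer index k -> F_k


def qualifying_vertices(edges: List[Tuple[Point, Point]]) -> List[Point]:
    """Vertices with more than one incident edge; only these get a row in F_k."""
    degree: Dict[Point, int] = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sorted(v for v, d in degree.items() if d > 1)
```

For an inverted-W drawn through (0,0), (1,1), (0,2), (1,3), (0,4), the two end points have a single edge each, so only the three interior vertices qualify.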

[0189] Here $f_{oi}$ stands for the origin point in layer i, and there are m origin points in a layer, connected or otherwise. For each origin point $f_{oi}$, $f_{ai}$ corresponds to another point in a layer above, adjacent or otherwise, i.e., in layers 1 through k-1. Likewise, $f_{bi}$ corresponds to another point in a layer below k, adjacent or otherwise, i.e., in layers k+1 through q. The convention is to stack all the tiles in increasing order, with the lowest layer on top of the pile. The collection of all these is the graphical pattern ensemble for q layers. In FIG. 23, there are three layers, namely S, T and U. Note that S is lexicographically the lowest layer in this case. Each event or feature in S is denoted $S_i$, and the events are arranged as a square matrix. $S_i$ and $S_{i\pm1}$ are more or less similar events or features that can be measured, observed and/or checkpointed. The same holds for the other layers, T and U, which lie under S. $(k_{1i}, \ldots, k_{ni})$ denotes the co-occurrence frequencies of the tuple $f_{oi}$ and $(d_{1i}, \ldots, d_{ni})$, where i corresponds to the layer. $f_{pq}$ and $d_{xy}$ are tuples corresponding to points in layers q and y, respectively. $(\theta_{1i}, \ldots, \theta_{ni})$ correspond to the angles between the rays $(f_{oi}, d_{xi})$ and $(f_{oi}, d_{(x+1)i})$. Note that, with circular wrapping (x = n), x+1 is defined as 1.

[0190] FIG. 24 provides an example with two layers; one layer is shown more prominently for clarity.

$$F_1 = \{((f_o, f_a, f_b), (k_1, k_2, \ldots, k_n), (\theta_1, \theta_2, \ldots, \theta_n), (d_1, d_2, \ldots, d_n))\}$$

where

[0191] $f_o = \{(S_d, S_e)\}$: a tuple denoting the origin;

[0192] $f_a = \{\}$: possibly several tuples denoting points in the layer above;

[0193] $f_b = \{(\,)_{T1}, (\,)_{T4}, (\,)_{T9}\}$: possibly several tuples denoting points in the layer below

[0194] (by convention, listed in clockwise order starting from the top-right quadrant);

[0195] $k_i$ is the event interaction frequency (edge weight);

[0196] $\theta_i$ is the angle of separation between any two adjacent rays from $f_o$, that is, $(f_o, d_i)$ and $(f_o, d_{i+1})$;

[0197] $d_i$ are the destination tuples connected to rays originating from a given $f_o$.
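The ray ordering of [0194] and the angles of [0196] can be computed directly from coordinates. This minimal sketch assumes Cartesian (x, y) positions with y pointing up, and reads "clockwise starting from the top-right quadrant" as measuring bearings clockwise from 12 o'clock; the function name `clockwise_rays` is hypothetical. The wrap-around pairs $d_n$ with $d_1$, as in [0189].

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # assumed Cartesian (x, y) with y pointing up


def clockwise_rays(origin: Point, destinations: List[Point]) -> Tuple[List[Point], List[float]]:
    """Order destinations clockwise starting at 12 o'clock (so the top-right quadrant
    comes first) and return (d_1..d_n, theta_1..theta_n), where theta_i is the angle
    between rays (f_o, d_i) and (f_o, d_{i+1}), wrapping so that d_{n+1} is d_1."""
    ox, oy = origin

    def bearing(p: Point) -> float:
        # Clockwise angle measured from the positive y-axis, in [0, 2*pi).
        return math.atan2(p[0] - ox, p[1] - oy) % (2 * math.pi)

    ordered = sorted(destinations, key=bearing)
    n = len(ordered)
    if n == 0:
        return [], []
    if n == 1:
        return ordered, [2 * math.pi]  # a lone ray is separated from itself by a full turn
    b = [bearing(p) for p in ordered]
    thetas = [(b[(i + 1) % n] - b[i]) % (2 * math.pi) for i in range(n)]
    return ordered, thetas
```

With the origin at (0, 0) and destinations at the four compass points, the rays come out ordered north, east, south, west, each separated by $\pi/2$; for two or more distinct rays the angles always sum to $2\pi$.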

[0198] Note that there could be several rows for each $F_k$, depending on the number of multi-degree vertices in layer k. A set of such ensembles together is called a multi-layer graphical pattern ensemble. Other similar ensembles can be searched for in the database in two broad ways:

[0199] Multi-layer search: take the entire multi-layer graphical pattern ensemble and search for other layers;

[0200] Layer-by-layer search: take a layer of interest and search for similar patterns at different positions on the same layer or on another layer, so that at any given time only one or two layers are being compared.
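The two search modes of [0199]-[0200] differ mainly in what is compared. The sketch below is hypothetical: it treats the per-layer similarity as a pluggable function (higher meaning more similar), and it scores a whole ensemble by averaging that similarity over the layers the query and candidate share, which is an assumption rather than a rule from the disclosure.

```python
from typing import Any, Callable, Dict, List, Tuple

Layer = Any                                  # hypothetical: any per-layer representation
Ensemble = Dict[int, Layer]                  # layer index k -> layer F_k
LayerSim = Callable[[Layer, Layer], float]   # assumed: higher value = more similar


def multi_layer_search(query: Ensemble, database: Dict[str, Ensemble],
                       sim: LayerSim, top: int = 3) -> List[Tuple[str, float]]:
    """Score whole ensembles: average layer similarity over the layers both share."""
    scores = []
    for name, cand in database.items():
        shared = query.keys() & cand.keys()
        if shared:
            scores.append((name, sum(sim(query[k], cand[k]) for k in shared) / len(shared)))
    return sorted(scores, key=lambda t: t[1], reverse=True)[:top]


def layer_by_layer_search(layer: Layer, database: Dict[str, Ensemble],
                          sim: LayerSim, top: int = 3) -> List[Tuple[str, int, float]]:
    """Compare one layer of interest against every layer of every stored ensemble,
    so that only two layers are compared at any one time."""
    scores = [(name, k, sim(layer, cand_layer))
              for name, cand in database.items() for k, cand_layer in cand.items()]
    return sorted(scores, key=lambda t: t[2], reverse=True)[:top]
```

Keeping the layer similarity pluggable lets either mode reuse whichever single-layer comparison is chosen, for instance a cosine-style measure for single-point GPEs.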

[0201] To characterize similarity, single-layer similarity characterization is considered first. Multi-layer similarity comparisons are then performed as extensions of the single-layer characterization. For single-point GPEs, the comparisons are simple variations of cosine similarity in vector algebra. For multi-vertex GPEs, a similarity determination algorithm is used which, in the special case m = n = 1, reduces to the single-point GPE comparison.
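The single-point case can be illustrated with plain cosine similarity. This sketch assumes, hypothetically, that a single-point GPE is summarized as a mapping from destination component to the edge weight $k_i$ on the ray towards it; destinations missing from one side count as weight 0. The disclosure speaks only of "simple variations" of cosine similarity, so this is one possible variant, not the defined one.

```python
import math
from typing import Dict

# Hypothetical single-point GPE: destination component -> edge weight k_i on the ray to it.
SinglePoint = Dict[str, float]


def cosine_similarity(a: SinglePoint, b: SinglePoint) -> float:
    """Cosine of the angle between two weight vectors indexed by destination."""
    dot = sum(w * b.get(dest, 0.0) for dest, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty point is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

Scaling all weights of a point (for example, observing the same pattern twice as often) leaves the result unchanged, which suits patterns that differ only in intensity.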

[0202] The most generic similarity notion (where $m \ge n$) is to determine whether every point of the sample GPE, which contains a total of n points, matches some point in an original GPE containing a total of m points. The algorithm for this is shown below:

Algorithm 1: Similarity Determination Algorithm

$$\theta_{so} = \sum_{g=1}^{D} \frac{\vec{d}_{sg} \cdot \vec{d}_{og}}{k_s\, k_o}$$

Inputs: the original GPE, the sample GPE, $m \ge n$; $\Lambda$ is the set ordering operator
Output: sim, the similarity of the sample to the original

1. for s = 1 to n; reset $\theta_s$ and $\theta_o$
2.   for o = 1 to m
3.     compare $\theta_s$ and $\theta_o$; record $(\theta_s - \theta_o)$
4.   next o
5.   record $\theta_{s,\min}$ as $\min\{\theta_o - \theta_s\}$ for all $o \in \{1, \ldots, m\}$
6.   add o to the set $M_o$
7. next s
8. if $\Lambda(M_o) = \{1, \ldots, n\}$ and $M_o \subseteq \{1, \ldots, m\}$
9.   compute $sim = \sum_{s=1}^{n} \theta_{s,\min}$
10. end if
11. return sim
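A minimal Python reading of Algorithm 1 follows, under assumptions the listing leaves open: each GPE point is summarized by a single angle value, and the recorded difference is taken as an absolute value so that the minimum in step 5 is meaningful. Step 8 is interpreted here as requiring that every sample point match a distinct original point. The function name `gpe_similarity` is hypothetical. Because sim sums angle differences, 0.0 means a perfect match and larger values mean less similar shapes.

```python
from typing import List, Optional


def gpe_similarity(original: List[float], sample: List[float]) -> Optional[float]:
    """Sketch of Algorithm 1 with one angle value per point (assumption).

    Returns the summed minimum angle differences, or None when the sample points
    do not land on distinct original points (interpretation of step 8).
    """
    m, n = len(original), len(sample)
    if not (m >= n >= 1):
        raise ValueError("Algorithm 1 requires m >= n >= 1")
    matched: List[int] = []   # M_o: index of the original point matched by each sample point
    total = 0.0
    for theta_s in sample:                                        # step 1: for s = 1..n
        diffs = [abs(theta_o - theta_s) for theta_o in original]  # steps 2-4
        best = min(range(m), key=diffs.__getitem__)               # step 5: theta_{s,min}
        matched.append(best)                                      # step 6: add o to M_o
        total += diffs[best]
    if len(set(matched)) != n:                                    # step 8 (interpreted)
        return None
    return total                                                  # step 9: sum of theta_{s,min}
```

For an original with angles [0.0, 1.0, 2.0] and a sample [0.9, 2.1], each sample point is 0.1 away from a distinct original point, so the result is approximately 0.2; a sample [0.9, 1.1] returns None because both points match the same original point.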

[0203] Other similarity notions are possible, some based on the order of $M_o$, i.e., the order in which the sample sequentially matches the original.

[0204] For example:

[0205] If $m = n > 1$, $\Lambda(M_o) = \{1, \ldots, n\}$, and the elements of $M_o$ are added in strictly increasing or strictly decreasing order, this is called contiguous similarity. Note that an increasing, top-down traversal of the layer is assumed when considering the sample.
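The contiguous-similarity test of [0205] can be sketched as a check on the order in which original points were matched. Here `match_order` is a hypothetical name for the 1-based indices of $M_o$ in insertion order, and m is the number of points in the original GPE.

```python
from typing import Sequence


def is_contiguous(match_order: Sequence[int], m: int) -> bool:
    """Contiguous similarity: m = n > 1, the matched original indices cover 1..n,
    and they were added in strictly increasing or strictly decreasing order."""
    n = len(match_order)
    if not (m == n and n > 1):
        return False
    if set(match_order) != set(range(1, n + 1)):
        return False
    steps = [b - a for a, b in zip(match_order, match_order[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)
```

A sample that traverses the original in order ([1, 2, 3]) or in exact reverse ([3, 2, 1]) is contiguous; one that jumps around ([1, 3, 2]) is not.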

[0206] Another example assumes that equi-degree points are found in both the sample and the original (going clockwise).

[0207] Both visual pattern matching and automated searching through known patterns help operation under partial data, as shown in FIG. 25.

[0208] For example, partial data can be used to detect problems caused by two dialects of the PostScript language that can break workflows in a print shop. Although both dialects may be grammatically correct, the difference manifests in the GPE interface as the presence or absence of the dotted line shown in FIG. 25. Upon internalization, both the customer, assumed to be untrained in problem diagnosis, and the trained technician can narrow down the PostScript dialect problem through the representation in FIG. 25. Previously, customers refrained from adjusting workflows, often with other adverse consequences, simply because they did not want to break a fragile workflow.

[0209] According to one exemplary embodiment of this disclosure, identification, categorization and search of pattern ensembles is used for targeted communications associated with a user of a printing device. Running software on a cloud computing platform allows data to be collected at a single location, so no data is lost. As the data is collected, a set of ongoing, massive data correlation operations can also be run to discover patterns.

[0210] According to another exemplary embodiment, running prepress software in a cloud provides a solution to some technical support problems:

[0211] For example, by running workflow software on a cloud computing platform, the precise state of a workflow, as well as its associated document, can be captured at the exact moment of job failure. This eliminates the need for technical support to reproduce the problem in its labs before taking action. All of the data about the failure is captured, making debugging relatively straightforward.

[0212] FIG. 26 illustrates an exemplary workflow according to this disclosure.

[0213] Step 1 (200): First, form a comprehensive list of unique events relating to user activities (such as activities on the mobile device: searching, printing, providing input, responding to suggestions, social network access, etc.) and to back-end server actions that facilitate user activities (such as suggesting a friend, running a query to identify conditions, storing output, tagging, etc.). Particular sets of entities in the context graph can lead to the definition of user activities and events. Even back-end server actions can be captured for use in pattern ensemble detection by extending the context graph to contain server-side responses.

[0214] Step 2 (205): From these events, delineate the granular components that perform functions for the user or the back-end server. Both the events and the components generated above can be used as the axes of the graphical pattern ensemble view. When generating the ensemble, for ease of viewing, the components and events are arranged in decreasing order of similarity. This provides clues as to whether the user is accessing similar or completely unrelated services.

[0215] Step 3 (210): Lay out the components as axes in a given layer for representing the GPE, in decreasing order of similarity (or some other metric). Use multiple layers if more dimensions are necessary. Components may also be connected across layers and viewed one layer at a time.

[0216] Step 4 (215): Monitor real events as users utilize the system (such as printing a greeting card in uPrint using a mobile phone as the interface). Monitor the components that facilitate the user interactions. Both user and system events are logged as a time sequence. The frequencies and order of events are also important; these will be used to generate the pattern ensemble.

[0217] Step 5 (220): The interaction of one event/component with another event/component represents a point, and the transition to another point is determined by the time sequence logged in the previous step. Criteria such as considering only the interactions and sequence of events for a single person, group, time or frequency range, as well as other dimensions or layers, are also included. The GPE is thus constructed and, over time, refined depending on the parameters and filter criteria.

[0218] Step 6 (225): The GPE will actually be a group of interrelated patterns as long as a specific set of actions is considered. The GPE can be visualized, and changes in a given layer will be discernible to the human eye as the time sequence of events is stepped through. The GPE is machine-searchable, but one can also watch for glitches in the patterns, which eases condition or fault checking and monitoring.

[0219] Step 7 (230): From within the set of events and components, a relevant set of interactions may be filtered, and all patterns governing the interactions between those components/events may be depicted and viewed layer by layer. Patterns evolve over time, so GPEs need to be updated. As a result, the depiction, the ordering of events and components, the number of dimensions of the quantities measured and tracked, and hence the layers, are some examples of entities that will change over time.
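Steps 4 and 5 can be sketched as turning a time-stamped log into weighted transitions. This minimal sketch assumes each log record is a (timestamp, user, event/component) tuple and that the weight of a transition is simply the number of times one event/component is immediately followed by another for the same user. The record layout, the `keep` filter (standing in for the filter criteria of Step 5) and the function name are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical log record: (timestamp, user_id, event_or_component)
Record = Tuple[float, str, str]


def build_transitions(log: Iterable[Record],
                      keep: Callable[[Record], bool] = lambda r: True,
                      ) -> Dict[Tuple[str, str], int]:
    """Order the logged events by time, apply a filter criterion, and count how often
    each event/component is immediately followed by another for the same user.
    The counts stand in for the edge weights k_i."""
    per_user: Dict[str, List[str]] = defaultdict(list)
    for record in sorted(log, key=lambda r: r[0]):
        if keep(record):
            per_user[record[1]].append(record[2])
    counts: Counter = Counter()
    for sequence in per_user.values():
        counts.update(zip(sequence, sequence[1:]))  # consecutive pairs = transitions
    return dict(counts)
```

Grouping by user before pairing consecutive events keeps interleaved activity from different users from producing spurious transitions; passing a `keep` function such as `lambda r: r[1] == "u2"` restricts the pattern to a single person, as Step 5 allows.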

[0220] It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.
