U.S. patent application number 13/109343 was filed with the patent office on 2011-05-17 for a method for identification, categorization and search of graphical, auditory and computational pattern ensembles.
This patent application is currently assigned to Xerox Corporation. The invention is credited to Shanmuga-nathan Gnanasambandam and Jonathan David Levine.
Publication Number | 20110279458 |
Application Number | 13/109343 |
Family ID | 44911388 |
Filed Date | 2011-05-17 |
United States Patent Application | 20110279458 |
Kind Code | A1 |
Gnanasambandam; Shanmuga-nathan; et al. |
November 17, 2011 |
Method for identification, categorization and search of graphical,
auditory and computational pattern ensembles
Abstract
This disclosure provides a system and method to use context
graphs for targeting communications to a user of an image rendering
device. According to one exemplary embodiment, the method provides
targeted communications generated as a function of one or more
attributes associated with a user requested printed document. The
attributes are provided by accessing a context graph including a
plurality of links between a plurality of entities.
Inventors: | Gnanasambandam; Shanmuga-nathan (Victor, NY); Levine; Jonathan David (Rochester, NY) |
Assignee: | Xerox Corporation, Norwalk, CT |
Family ID: | 44911388 |
Appl. No.: | 13/109343 |
Filed: | May 17, 2011 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61345377 | May 17, 2010 | |
61345301 | May 17, 2010 | |
61345340 | May 17, 2010 | |
61345289 | May 17, 2010 | |
Current U.S. Class: | 345/440; 358/1.15 |
Current CPC Class: | G06Q 30/08 20130101; G06Q 30/0238 20130101; G06Q 30/0251 20130101 |
Class at Publication: | 345/440; 358/1.15 |
International Class: | G06T 11/20 20060101 G06T011/20; G06K 15/02 20060101 G06K015/02 |
Claims
1. A method of providing user requested printed material and one or
more targeted communications to a user of a printing system, the
method comprising: a) the printing system acquiring material to be
printed on a printing device to produce the user requested printed
material, the printing device associated with the printing system;
b) the printing system acquiring the one or more targeted
communications associated with the user requested printed material;
and, c) the printing system printing the user requested printed
material utilizing the printing device and providing the acquired
one or more targeted communications to the user, wherein the one or
more targeted communications are generated as a function of one or
more attributes associated with one or more of the user requested
printed material, the printing device and the user, the one or more
attributes associated with pattern ensembles included within a
context graph, the context graph including a plurality of links
between a plurality of entities related to one or more of the user
requested printed material, the printing device and the user.
2. The method according to claim 1, wherein the context graph is an
n-partite scale-free graph.
3. The method according to claim 2, wherein the context graph
includes a historical representation of the plurality of links
between the plurality of entities.
4. The method according to claim 3, wherein the plurality of entities
are associated with one or more of events, people, devices,
locations, date, time, keywords, categories and preferences.
5. The method according to claim 3, wherein the context graph
includes weights associated with the plurality of links, the
weights related to a frequency of links between a pair of
entities.
6. The method according to claim 3, wherein the context graph is
dynamically updated.
7. The method according to claim 3, wherein the one or more
attributes are provided by one or more sub-graphs extracted from
the context graph.
8. The method according to claim 7, wherein the one or more
sub-graphs extracted from the context graph are associated with a
predefined structure.
9. The method according to claim 7, wherein the one or more
attributes are provided by a dominant sub-graph associated with the
one or more sub-graphs extracted from the context graph.
10. The method according to claim 3, wherein the one or more
attributes are provided by two or more sub-graphs connected by one
or more hubs associated with the context graph.
11. The method according to claim 1, wherein a verification action
is performed by the user, the verification action indicating the
user has read at least one targeted communication.
12. A printing system for providing user requested printed material
and one or more targeted communications to a user of the printing
system comprising: a printing device; and, one or more servers
operatively connected to the printing device, wherein one or more of
the printing device and the one or more servers are configured to
execute a method comprising: a) the printing system acquiring
material to be printed on the printing device to produce the user
requested printed material; b) the printing system acquiring the one
or more targeted communications associated with the user requested
printed material; and, c) the printing device printing the user
requested printed material and the printing device providing the
acquired one or more targeted communications to the user, wherein
the one or more targeted communications are generated as a function
of one or more attributes associated with one or more of the user
requested printed material, the printing device and the user, the
attributes associated with pattern ensembles included within a
context graph, the context graph including a plurality of links
between a plurality of entities related to one or more of the user
requested printed material, the printing device and the user.
13. The printing system according to claim 12, wherein the context
graph is an n-partite scale-free graph.
14. The printing system according to claim 13, wherein the context
graph includes a historical representation of the plurality of links
between the plurality of entities.
15. The printing system according to claim 14, wherein the plurality
of entities are associated with one or more of events, people,
devices, locations, date, time, keywords, categories and preferences.
16. The printing system according to claim 14, wherein the context
graph includes weights associated with the plurality of links, the
weights related to a frequency of links between a pair of
entities.
17. The printing system according to claim 14, wherein the context
graph is dynamically updated.
18. The printing system according to claim 14, wherein the one or
more attributes are provided by one or more sub-graphs extracted
from the context graph.
19. The printing system according to claim 18, wherein the one or
more sub-graphs extracted from the context graph are associated
with a predefined structure.
20. The printing system according to claim 18, wherein the one or
more attributes are provided by a dominant sub-graph associated
with the one or more sub-graphs extracted from the context
graph.
21. The printing system according to claim 14, wherein the one or
more attributes are provided by two or more sub-graphs connected by
one or more hubs associated with the context graph.
22. The printing system according to claim 12, wherein a
verification action is performed by the user, the verification
action indicating the user has read at least one targeted
communication.
23. A method of providing a targeted communication to a first image
rendering device operatively connected to one or more of a second
image rendering device and a server, the method comprising: a) the
first image rendering device acquiring the targeted communication
from one of the second image rendering device and the server; and
b) the first image rendering device rendering the targeted
communication, wherein the targeted communication is generated as a
function of one or more attributes associated with one or more of
the first image rendering device and a user associated with the
first image rendering device, the one or more attributes associated
with pattern ensembles included within a context graph, the context
graph including a plurality of links between a plurality of entities
related to one or more of the first image rendering device, the
second image rendering device, and the user associated with the
first image rendering device.
24. The method according to claim 23, wherein the context graph is
an n-partite scale-free graph.
25. The method according to claim 23, wherein the context graph
includes weights associated with the plurality of links, the
weights related to a frequency of links between a pair of entities.
Description
[0001] This application claims the benefit of priority to U.S.
Provisional Application No. 61/345,377, filed May 17, 2010,
entitled "METHOD FOR IDENTIFICATION, CATEGORIZATION AND SEARCH OF
GRAPHICAL, AUDITORY AND COMPUTATIONAL PATTERN ENSEMBLES," by
Gnanasambandam et al., U.S. Provisional Application No. 61/345,340,
filed May 17, 2010, entitled "OPTIMAL AUCTION MECHANISM FOR
MULTI-LEVEL DEVICE CLICK-THROUGH (DCT) IN TARGETED PRINT
COMMUNICATION," by Lee et al., U.S. Provisional Application No.
61/345,301, filed May 17, 2010, entitled "SYSTEM AND METHOD TO
PRODUCE AND CONTROL SUBSIDIZATION OF TARGETED MATERIALS AT POINT OF
SALE," by Gnanasambandam et al., and U.S. Provisional Application
No. 61/345,289, filed May 17, 2010, entitled "SYSTEM AND METHODS TO
USE N-PARTITE SCALE-FREE GRAPHS FOR INTERPRETING CONTEXTUAL
INFORMATION AND TARGETING," by Gnanasambandam et al., all of which
are hereby incorporated by reference in their entirety.
CROSS REFERENCE TO RELATED PATENTS AND APPLICATIONS
[0002] U.S. patent application Ser. No. 13/100,636, filed May 4,
2011, entitled "Method of Providing Targeted Communications to a
User of a Printing System," by Lee et al.
[0003] U.S. patent application Ser. No. 13/104,136, filed May 10,
2011, entitled "System and Method to Produce and Control
Subsidization of Targeted Materials at Point of Sale," by
Gnanasambandam et al.
[0004] U.S. patent application Ser. No. ______, filed ______,
entitled "System and Methods to Use Context Graphs for Targeting
Communications," by Gnanasambandam et al., each of which is hereby
incorporated by reference in its entirety.
BACKGROUND
[0005] This disclosure relates generally to a system and method for
generating targeted advertisements, and more particularly to a
system and method for generating targeted advertisements delivered
via a printer assembly including at least one printer device. It is
currently known for advertisers to use publicly available
information to target potential clients.
Additionally, advertisers may use the publicly available
information for advertising merchandise that is likely to be of
interest to a particular person.
INCORPORATION BY REFERENCE
[0006] The following Patent Applications, Patent Application
Publications and Non-patent references are incorporated herein by
reference in their entirety:
[0007] Gnanasambandam et al., U.S. patent application Ser. No.
13/104,136, filed May 10, 2011, entitled "System and Method to
Produce and Control Subsidization of Targeted Materials at Point of
Sale";
[0008] Gnanasambandam et al., U.S. patent application Ser. No.
______; filed ______, entitled "System and Methods to Use Context
Graphs for Targeting Communications";
[0009] Lee et al., U.S. patent application Ser. No. 13/100,636;
filed May 4, 2011, entitled "Method of Providing Targeted
Communications to a User of a Printing System";
[0010] Gnanasambandam et al., U.S. patent application Ser. No.
12/780,543, filed May 14, 2010, entitled "System and Method to
Prearrange Hyper-Local Value-Added Marketing Campaigns and
Communication Along Consumer Trajectories";
[0011] Gnanasambandam, U.S. patent application Ser. No. 12/780,267,
filed May 14, 2010, entitled "In-Situ Mobile Application
Suggestions and Multi-Application Updates Through Context Specific
Analytics";
[0012] Gnanasambandam et al., U.S. Patent Application Publication
No. US 2010/0088178 A1, published Apr. 8, 2010, entitled "System
and Method for Generating and Verifying Targeted Advertisements
Delivered Via a Printer Device";
[0013] Gross, U.S. Patent Application Publication No. US
2010/0005486 A1, published Jan. 7, 2010, entitled "Apparatus and
Method for Embedding Commercials";
[0014] Evevsky, U.S. Patent Application Publication No. US
2009/0313060 A1, published Dec. 17, 2009, entitled "System and
Method for Personalized Printing and Facilitated Delivery of
Personalized Campaign Items";
[0015] Chow et al., U.S. Patent Application Publication No. US
2009/0157650 A1, published Jun. 18, 2009, entitled "Outbound
Content Filtering Via Automated Inference Detection";
[0016] Chow et al., U.S. Patent Application Publication No. US
2009/0150365 A1, published Jun. 11, 2009, entitled "Inbound Content
Filtering Via Automated Inference Detection";
[0017] Gnanasambandam, U.S. patent application Ser. No. 12/761,985,
filed Apr. 16, 2010, entitled "System and Method for Providing
Feedback for Targeted Communications";
[0018] Liu, U.S. Patent Application Publication No. US 2011/0096354
A1, published Apr. 28, 2011, entitled "System and Method for
Handling Print Requests from a Mobile Device";
[0019] Liu et al., U.S. Patent Application Publication No. US
2011/0040823 A1, published Feb. 17, 2011, entitled "System and
Method for Communicating with a Network of Printers Using a Mobile
Device";
[0020] Harrington, U.S. Patent Application Publication No. US
2011/0029952 A1, published Feb. 3, 2011, entitled "Method and
System for Constructing a Document Redundancy Graph";
[0021] Gnanasambandam et al., U.S. Patent Application Publication
No. US 2010/0325422 A1, published Dec. 23, 2010, entitled "System
and Method for Policy-driven File Segmentation and Intercloud File
Storage and Retrieval";
[0022] Partridge et al., U.S. Patent Application Publication No. US
2010/0309503 A1, published Dec. 9, 2010, entitled "Method and
System for Printing Documents from a Portable Device";
[0023] Gnanasambandam et al., U.S. Patent Application Publication
No. US 2010/0268591 A1, published Oct. 21, 2010, entitled "System
and Method for Selectively Controlling the Use of Functionality in
One or More Multifunction Devices and Subsidizing Their Use Through
Advertisements";
[0024] Gnanasambandam et al., U.S. Patent Application Publication
No. US 2010/0264214 A1, published Oct. 21, 2010, entitled "Method
and System for Providing Contract-free `pay-as-you-go` Options for
Utilization of Multi-function Devices";
[0025] St. Jacques, Jr. et al., U.S. Patent Application Publication
No. US 2010/0149572 A1, published Jun. 17, 2010, entitled "Method
and System for Automatically Providing for Multi-point Document
Storing, Access, and Retrieval";
[0026] Edelman, B. and Schwarz, M., "Optimal Auction Design in a
Multi-Unit Environment: The Case of Sponsored Search Auctions,"
Unpublished manuscript, Harvard Business School, 2006;
[0027] Edelman, B., Ostrovsky, M., and Schwarz, M., "Internet
Advertising and the Generalized Second-Price Auction: Selling
Billions of Dollars Worth of Keywords," American Economic Review,
American Economic Association, Vol. 97(1), pages 242-259, March
2007;
[0028] Google AdWords Select, 2002
https://www.google.co/accounts/ServiceLogin?service=adwords&ht;
[0029] Myerson, R. B. "Optimal Auction Design," Mathematics of
Operations Research, 6(1), pages 58-73, 1981;
[0030] Shazam Entertainment Ltd., 2002,
http://www.shazam.com/music/web/pagesibackground.html;
[0031] Talluri, K. and van Ryzin, G., "The Theory and Practice of
Revenue Management," Springer, 1st Edition, Feb. 23, 2005;
[0032] Ulku, L. "Optimal Combinatorial Mechanism Design,"
Unpublished Manuscript, Rutgers University, 2006;
[0033] Walsh, T., "Search on High Degree Graphs," In Proceedings of
IJCAI-2001, pages 266-271, 2001.
BRIEF DESCRIPTION
[0034] In one embodiment of this disclosure, described is a method
of providing user requested printed material and one or more
targeted communications to a user of a printing system, the method
comprising: (a) the printing system acquiring material to be
printed on a printing device to produce the user requested printed
material, the printing device associated with the printing system;
(b) the printing system acquiring the one or more targeted
communications associated with the user requested printed material;
and, (c) the printing system printing the user requested printed
material utilizing the printing device and providing the acquired
one or more targeted communications to the user, wherein the one or
more targeted communications are generated as a function of one or
more attributes associated with one or more of the user requested
printed material, the printing device and the user, the one or more
attributes associated with pattern ensembles included within a
context graph, the context graph including a plurality of links
between a plurality of entities related to one or more of the user
requested printed material, the printing device and the user.
[0035] In another embodiment of this disclosure, described is a
printing system for providing user requested printed material and
one or more targeted communications to a user of the printing
system comprising: a printing device; and, one or more servers
operatively connected to the printing device, wherein one or more of
the printing device and the one or more servers are configured to
execute a method comprising: (a) the printing system acquiring
material to be printed on the printing device to produce the user
requested printed material; (b) the printing system acquiring the
one or more targeted communications associated with the user
requested printed material; and, (c) the printing device printing
the user requested printed material and the printing device
providing the acquired one or more targeted communications to the
user, wherein the one or more targeted communications are generated
as a function of one or more attributes associated with one or more
of the user requested printed material, the printing device and the
user, the attributes associated with pattern ensembles included
within a context graph, the context graph including a plurality of
links between a plurality of entities related to one or more of the
user requested printed material, the printing device and the user.
[0036] In still another embodiment of this disclosure, described is
a method of providing a targeted communication to a first image
rendering device operatively connected to one or more of a second
image rendering device and a server, the method comprising: (a) the
first image rendering device acquiring the targeted communication
from one of the second image rendering device and the server; and
(b) the first image rendering device rendering the targeted
communication, wherein the targeted communication is generated as a
function of one or more attributes associated with one or more of
the first image rendering device and a user associated with the
first image rendering device, the one or more attributes associated
with pattern ensembles included within a context graph, the context
graph including a plurality of links between a plurality of entities
related to one or more of the first image rendering device, the
second image rendering device, and the user associated with the
first image rendering device.
BRIEF DESCRIPTION OF THE DRAWINGS
[0037] FIGS. 1 and 2 are tripartite context graphs according to an
exemplary embodiment of this disclosure.
[0038] FIG. 3 is a 5-partite context graph according to an
exemplary embodiment of this disclosure.
[0039] FIG. 4 illustrates keyword interconnections associated with
k1, k2, k3, k4, k5, k6, k7, k8, k9 and k10 as shown in the context
graph of FIG. 3.
[0040] FIG. 5 is an exemplary embodiment of a context graph showing
interactions among entities, for example, but not limited to,
keywords and devices.
[0041] FIG. 6 shows a power law in the degree distribution of a
context graph.
[0042] FIG. 7 shows an un-weighted context graph A with a scaling
exponent of 2.58.
[0043] FIG. 8 shows a weighted context graph A with a scaling
exponent of 1.61.
[0044] FIG. 9 shows an un-weighted context graph B with a scaling
exponent of 2.6.
[0045] FIG. 10 shows a weighted context graph B with a scaling
exponent of 2.29.
[0046] FIG. 11 shows a sub-graph including a sequence of edges.
[0047] FIG. 12 shows a sub-graph including a loop.
[0048] FIG. 13 shows a sub-graph including a single-hop hub where
k3 is the origin.
[0049] FIG. 14 shows a sub-graph including a two-hop hub where
k3 is the origin.
[0050] FIG. 15 shows a sub-graph including a 5-partite graph where
k3 is the origin.
[0051] FIG. 16 schematically illustrates a system for context graph
based targeting using a multifunction device (MFD) according to an
exemplary embodiment of this disclosure.
[0052] FIG. 17 shows a targeted communication produced according to
an exemplary embodiment of this disclosure, the targeted
communication including text ads with graphics from two of the
advertisers.
[0053] FIG. 18 shows another targeted communication produced
according to an exemplary embodiment of this disclosure, the
targeted communication including categorized coupons.
[0054] FIG. 19 schematically illustrates a transactional print
workflow according to an exemplary embodiment of this
disclosure.
[0055] FIG. 20 illustrates a first example of a graphical pattern
ensemble according to an exemplary embodiment of this
disclosure.
[0056] FIG. 21 illustrates a second example of a graphical ensemble
according to an exemplary embodiment of this disclosure.
[0057] FIG. 22 illustrates a third example of a graphical ensemble
according to an exemplary embodiment of this disclosure.
[0058] FIG. 23 illustrates a general layered representation
according to an exemplary embodiment of this disclosure.
[0059] FIG. 24 illustrates an example of two layer representations
(only details of layer 1 shown) according to an exemplary
embodiment of this disclosure.
[0060] FIG. 25 illustrates an example of a newly detected graphical
pattern assembly similar to those of FIGS. 20-22.
[0061] FIG. 26 illustrates an exemplary workflow including the
application of pattern assemblies to a context graph case.
DETAILED DESCRIPTION
[0062] This disclosure provides systems and methods of using
n-partite graphs to determine contextual information that can be
used to characterize a user. Such characterizations can be used for
personalization and targeting of information, such as
advertisements, to the user. Examples of nodes in the graph include
keywords extracted from the titles of documents being printed, the
user who prints a document and the device on which it is printed.
Graph edges are entered connecting the document keywords to the user
and the user to the printing device, but other edges, such as a
connection between keywords that occur together in a title, can be
entered as well. The disclosed systems and methods maintain a large
graph that captures all the activities (e.g., document printing),
but can extract sub-graphs for a user and use the information within
a sub-graph to target advertisements to the user.
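The extraction of a user's sub-graph from the large graph can be sketched as a breadth-first search that keeps every entity within a chosen number of hops of the user, together with the edges among those entities. This is a minimal sketch under stated assumptions, not the disclosure's own method: the (type, name) node labels, the `extract_subgraph` function and the `max_hops` cutoff are hypothetical.

```python
from collections import deque


def extract_subgraph(edges, origin, max_hops):
    """Return the nodes within max_hops of origin and the edges among them.

    edges is an iterable of undirected (u, v) pairs; nodes are hypothetical
    (type, name) tuples such as ("P", "alice") or ("K", "invoice").
    """
    edges = list(edges)
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Breadth-first search records the hop distance of each reachable node
    # and stops expanding once the hop limit is reached.
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)

    nodes = set(hops)
    # Keep only the edges whose endpoints both fall inside the neighborhood.
    sub_edges = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return nodes, sub_edges


edges = [
    (("P", "alice"), ("K", "invoice")),
    (("P", "alice"), ("D", "printer1")),
    (("P", "bob"), ("D", "printer1")),
    (("P", "bob"), ("K", "travel")),
]
nodes, sub_edges = extract_subgraph(edges, ("P", "alice"), max_hops=2)
```

With a two-hop limit, alice's sub-graph reaches bob through the shared device but excludes the keyword "travel", which is three hops away; the hop limit is one simple way to bound the size of the extracted sub-graph.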
[0063] Conventional targeting techniques have been largely keyword
based or spatio-temporal, without specific attention to the
expressivity that an ordering or structure of events can bring to
the effectiveness of targeting. Extracting an expressive sub-graph
from a large n-partite graph of events (e.g., more than $10^6$ nodes
and $10^9$ edges) has not been well studied.
[0064] Capturing multi-dimensional tuples of information and the
interconnections between events is structurally complex, more so
because a large number of users cause or contribute to these events,
directly or through the use of devices (mobile, print or otherwise).
These can be captured coherently by graph-theoretic structures.
[0065] More information can be gained if the structure of these
complex interconnections (i.e., the way contextual information
becomes related among users, devices, locations, time and other
attributes) is analyzed dynamically. This disclosure also provides
processes to perform that analysis.
[0066] In this disclosure, a method is provided wherein graph
theoretic structures in a large graph, or a sub-graph thereof, are
utilized to express certain correlated user activities and/or event
correlations so that they can enhance targeted communications to
the users. In other words, graphical structures such as sub-graphs
(chains of related nodes, loops, paths, etc.) and statistics on top
of these structures (degrees, hop distance, edge weight, edge
direction, etc.) can capture certain inherent connectivity and
association-related properties in a more fine-grained fashion.
These structures, or the properties that govern them, can be used
to relate how the users interact with their environments and/or
social groups. Of particular interest, at least, are the following
types of contextual information, and their n-partite graphical
representation, when a person interacts with a device: the person;
keywords, categories or topics that relate to such an interaction;
the social group of the person; spatio-temporal aspects of the
interaction; usage frequency; and combinations thereof. The
disclosed system and method capture and store these associations in
real time and represent them as n-partite graphs. Subsequently,
sub-graphs are extracted dynamically according to a method
described herein. The overall context graph resulting from the
collection and storage of the aforementioned attributes is huge,
e.g., millions of vertices and edges. Also provided are a method to
characterize the context graph and methods to use the graph in the
aforementioned targeted communication application.
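As an illustration of computing statistics on top of an extracted sub-graph, the sketch below reports each node's degree and the total edge weight, and uses a union-find structure to tell whether the sub-graph contains a loop or is loop-free, like a chain of edges. The function name `summarize_subgraph` and the input format (a dictionary from undirected node pairs to positive weights, with each pair listed only once) are assumptions for illustration, not part of the disclosure.

```python
from collections import defaultdict


def summarize_subgraph(weighted_edges):
    """Degree per node, total edge weight, and whether the sub-graph has a loop.

    weighted_edges maps each undirected pair (u, v), listed once, to a weight.
    """
    degree = defaultdict(int)
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    has_loop = False
    for u, v in weighted_edges:
        degree[u] += 1
        degree[v] += 1
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            # u and v were already connected, so this edge closes a cycle.
            has_loop = True
        else:
            parent[root_u] = root_v
    total_weight = sum(weighted_edges.values())
    return dict(degree), total_weight, has_loop


chain = {("k1", "k2"): 3, ("k2", "k3"): 1}   # a chain of edges
loop = {**chain, ("k3", "k1"): 2}             # the same chain closed into a loop
print(summarize_subgraph(chain))     # ({'k1': 1, 'k2': 2, 'k3': 1}, 4, False)
print(summarize_subgraph(loop)[2])   # True
```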
[0067] Complex graphs can capture interactions between events and
contextual information that relate to a large user population as
and when the users interact with the environment and the multitude
of devices contained in it, for example, but not limited to,
phones, e-readers, printers, kiosks and the like. Although the
interactions are huge in scale, they can exhibit some underlying
structure. Examples are described below of such interactions
exhibiting specific graph-theoretic properties, such as the presence
of hubs, a scale-free nature, etc. Such properties and other
structural elements can be utilized for targeting users with
information and marketing material of various kinds.
Examples of Targeting Context Derived from Devices and
Processes
[0068] Mobile, Phone, E-reader Devices, Printing Devices,
Interconnected Printing Device (or MFD) Workflows, etc.: Metadata
associated with the use of a device or a multitude of devices can be
used to provide targeted communications. Some devices have access to
global positioning system coordinates (latitude, longitude) and
cell/base station information, while others have sensor readings
(accelerometer, magnetometer, etc.) and context from documents or
web applications (apps). One or more subsets of these can be
utilized to automatically provide meaningful information to the
users of the devices and processes.
[0069] Printed documents: When users print to devices (or use other
types of devices in myriad ways), much of the available context goes
unused. In the print world, this context includes keywords in the
document name, location, time, user_id, device name, device
capabilities used and such. Furthermore, when a group of people
utilize a group of devices, there is yet another, social dimension
to context, such as the user names of all people printing to a given
device, document details and keywords contained in document names,
the preferred printing times of people, and such.
[0070] Transactional Prints: Credit card statements are one example
of transactional documents. These contain not just spatio-temporal
information such as when and where the goods and services were
purchased and used, but also the relative importance of such goods
to the users. This is valuable information which can be
characterized as certain kinds of purchases or user behavior being
adjacent to each other in time or space.
[0071] The above applications and combinations thereof are areas
where targeted communications fit in well. However, these are only
provided as exemplary embodiments of the present disclosure and it
is to be understood that the disclosed system and method are not
limited to these applications.
[0072] Provided herein are, at least: (1) a system and method to
extract, rank and unify sub-graphs from large n-partite graphs; (2)
a system and method to utilize the extracted sub-graphs for
targeting; (3) control of the scalability of the system through
search termination strategies and graph properties; and (4) methods
to search and aggregate multiple types of targeted communication by
dynamically relating metadata from the context graph with a content
repository.
Multi-Partite Context Graph
[0073] Multi-partite context graphs can represent a multitude of
entities (explicitly or inferentially revealed), including but not
limited to events, people, devices, locations, date-time, keywords,
categories, preferences, etc. The connections between the entity
types indicate actions performed by one or more users, as dictated
by a design choice. Furthermore, the weights of the connections
between entities indicate the frequency of involvement of those
entities in an action.
[0074] For purposes of this disclosure, the meaning of a context
graph is illustrated through an example from the document printing
world, although the application of context graphs as provided
herein extends easily to other domains. In the printing world,
context in the form of keywords, time-stamps, locations and various
subject categories, such as automobiles, banks, investments, etc.,
is generated collectively by a group of people through the act of
printing or of using other rendering devices. The extracted keywords
may come from job names or from document content. These attributes
associated with the printed document can be used to illustrate the
concept of a context graph, as shown in FIG. 1.
[0075] With reference to FIG. 1, a tripartite graph is shown
including a set of keywords K extracted from attributes associated
with the printed document, such as the name of the document and
other related information, with or without extracting information
from the document itself; a set of people P; and a set of devices D.
Each link in the graph indicates one of two things: either a person
p_i used a keyword k_i, or a person used a device d_i. Each unique
p_i, k_i or d_i appears only once in the graph. Furthermore, the
edge weights denote the frequency of these usage relationships.
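A minimal sketch of the weighted tripartite graph of FIG. 1 follows, assuming each usage is recorded as a hypothetical (person, device, keywords) tuple. Keying nodes by (type, name) makes every unique person, keyword or device appear once, and a counter accumulates each edge weight as the number of usages in which the link was observed. The name `build_tripartite` and the type tags "P", "K" and "D" are illustrative assumptions.

```python
from collections import Counter


def build_tripartite(usages):
    """Weighted tripartite graph from hypothetical (person, device, keywords) usages.

    Nodes are (type, name) tuples, so each unique person ("P"), keyword ("K")
    or device ("D") appears only once; each edge weight counts the number of
    usages in which that link was observed.
    """
    weights = Counter()
    for person, device, keywords in usages:
        p = ("P", person)
        weights[(p, ("D", device))] += 1          # person used a device
        for keyword in set(keywords):             # count a keyword once per usage
            weights[(p, ("K", keyword))] += 1     # person used a keyword
    return weights


graph = build_tripartite([
    ("alice", "mfd-2", ["invoice", "q3"]),
    ("alice", "mfd-2", ["invoice"]),
    ("bob", "mfd-7", ["travel"]),
])
print(graph[(("P", "alice"), ("D", "mfd-2"))])   # 2
```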
Associating Meanings with Sub-Graphs and/or the Whole Graph
[0076] With reference to FIG. 2, shown is a graph that includes
additional semantics: if a person used two keywords in one usage of
the device, such as one printout, fax or scan, a link is provided
between the two keywords, as shown in the top layer. Note that FIG. 2
is not the transitive closure of FIG. 1. Whether to derive such
relationships by transitive closure is left as a design choice; it
may be done on the fly for relatively simple relationships, or
offline for relatively lengthy chains to which more semantics may be
assigned.
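The keyword-keyword links of FIG. 2 can be sketched as pairs of keywords used together in a single usage. For contrast, the second function lists the keyword pairs that a transitive closure through a shared person would add, which illustrates why FIG. 2 is not that closure. Both functions and the (person, keywords) input format are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_links(usages):
    """Keyword-keyword edges weighted by how often two keywords were used
    together in a single usage (one printout, fax or scan)."""
    links = Counter()
    for _person, keywords in usages:
        for pair in combinations(sorted(set(keywords)), 2):
            links[pair] += 1
    return links


def closure_keyword_pairs(usages):
    """Keyword pairs joined by a keyword-person-keyword path, i.e. the pairs a
    transitive closure through shared persons would add."""
    keywords_by_person = {}
    for person, keywords in usages:
        keywords_by_person.setdefault(person, set()).update(keywords)
    pairs = set()
    for keywords in keywords_by_person.values():
        pairs.update(combinations(sorted(keywords), 2))
    return pairs


usages = [("alice", ["invoice", "q3"]), ("alice", ["travel"])]
print(dict(keyword_links(usages)))            # {('invoice', 'q3'): 1}
print(sorted(closure_keyword_pairs(usages)))  # [('invoice', 'q3'), ('invoice', 'travel'), ('q3', 'travel')]
```

Only the keywords printed together are linked directly; the closure would also connect "travel" to both keywords from the other printout.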
[0077] This concept of tripartite context graphs is easily extended
to n-partite graphs that can describe more semantics in addition to
the above. For example, FIG. 3 adds the sets C and N to the graph of
FIG. 1, where c_i may indicate the type of usage, such as printing or
faxing, and n_i indicates a feature of the devices. Graph-based
models are chosen for their expressivity in capturing the complex
webs of information that can be associated with keyword usage and
context generation. In the remainder of this disclosure, however,
the term context graph does not differentiate between graphs that
are n-partite, their transitive closures, or something in between as
shown in FIG. 2.
[0078] FIG. 4 illustrates keyword interconnections associated with
k1, k2, k3, k4, k5, k6, k7, k8, k9 and k10 as shown in the context
graph of FIG. 3.
Characterizing Context Graph Structure
[0079] Because the structure and properties of a context graph are
fundamental to the disclosed system and method, some findings related
to the graphs examined are explained first. According to one example,
context generation has yielded several large graphs with heavy keyword
interactions, as shown in FIG. 5. Entities that are not associated
with others by the users show up as isolated points; each "o" is a
discrete entity.
[0080] The example of FIG. 5, as well as others not illustrated,
from the device world of printers shows the presence of hubs. These
hubs show that some entities in an action are heavily connected.
Two examples are characterized here: a first example with about
2,600 entities (about 3.38 million feasible edges), denoted graph A,
and a second example with about 4,200 nodes (about 8.82 million
feasible edges), denoted graph B. The edges further have weights
based on the frequency with which pairs of entities, i.e. keywords,
devices or both, were selected in combination. As illustrated in
FIG. 6, the graph exhibits power-law characteristics in its degree
distribution; i.e., if $X$ is the random variable denoting the degree
of a given node, then $P(X \geq x)$ follows a power law of the form
$x^{-\alpha}$. More precisely, in some cases the distribution appears
to have an exponential tail that causes it to taper off faster at
higher degrees.
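The quoted edge counts are consistent with treating every unordered pair of distinct entities as a feasible edge, $n(n-1)/2$; this reading is an assumption, since the text does not define the term. The sketch below checks that arithmetic and computes the empirical $P(X \geq x)$ that a degree-distribution plot such as FIG. 6 would show. The function names are hypothetical.

```python
from collections import Counter


def feasible_edges(n):
    """Unordered pairs of distinct entities: n(n-1)/2 (assumed meaning of 'feasible edges')."""
    return n * (n - 1) // 2


def degree_ccdf(degrees):
    """Empirical P(X >= x) for every distinct degree x in the sample."""
    counts = Counter(degrees)
    total = len(degrees)
    remaining = total  # number of nodes with degree >= current x
    ccdf = {}
    for x in sorted(counts):
        ccdf[x] = remaining / total
        remaining -= counts[x]
    return ccdf


print(feasible_edges(2600))  # 3378700
print(feasible_edges(4200))  # 8817900
print(degree_ccdf([1, 1, 2, 3]))  # {1: 1.0, 2: 0.5, 3: 0.25}
```

The two counts round to the 3.38 and 8.82 million figures above. Plotting the output of `degree_ccdf` on log-log axes shows a power law as a roughly straight line.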
[0081] The power-law coefficients can be computed separately for
the weighted ($x^{-\alpha_1}$) and un-weighted ($x^{-\alpha_2}$)
graphs. A large difference between them may indicate a significant
positive impact from factoring in repetitive social behavior, i.e.
clustering around common keywords (jargon), in an organization. For
example, FIG. 7 shows un-weighted context graph A with a scaling
exponent of 2.58; FIG. 8 shows weighted context graph A with a
scaling exponent of 1.61; FIG. 9 shows un-weighted context graph B
with a scaling exponent of 2.6; and FIG. 10 shows weighted context
graph B with a scaling exponent of 2.29.
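The text does not state how the scaling exponents were fitted. One standard option, shown here as an assumed substitute, is the continuous maximum-likelihood estimator $\hat\alpha = 1 + m / \sum_i \ln(x_i / x_{\min})$ over the $m$ samples at or above $x_{\min}$. It estimates the exponent of the density $p(x) \propto x^{-\alpha}$; under that model $P(X \geq x)$ decays as $x^{-(\alpha - 1)}$, so which convention FIGS. 7-10 use should be checked before comparing numbers. Degrees are integers, so the continuous form is only an approximation, and the sampler exists only to make the sketch testable.

```python
import math
import random


def powerlaw_alpha_mle(samples, x_min):
    """Continuous MLE of alpha for a density p(x) ~ x^-alpha on x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need samples strictly above x_min")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(alpha, x_min, n, seed=0):
    """Inverse-transform samples from the same density (used only for testing)."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


data = sample_powerlaw(alpha=2.5, x_min=1.0, n=100_000)
print(round(powerlaw_alpha_mle(data, x_min=1.0), 2))  # approximately 2.5
```

Running the estimator once on raw degrees and once on weighted degrees would give the $\alpha_1$ versus $\alpha_2$ comparison described above.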
[0082] In some cases, such as graph B, the weighted context graph
exhibits an exponential tail, as shown in FIG. 10. In other words,
$P(X \geq x) \propto x^{-\alpha} e^{-\beta x}$, where the exponential
term overwhelms the power law for large $x$. When analyzing
contextual information in an organizational setting, it is plausible
that the mass usage of jargon and terms trails off more rapidly
beyond a certain level of popularity, perhaps due to group size,
seasonal effects etc. While such faster-than-power-law trail-off has
been observed to occur, its exact causes are not the subject of this
disclosure. The system and method provided herein leverage the
detected properties, for example the power-law structure and the
faster-than-power-law trail-off in regions of the graph.
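Taking logarithms makes the faster-than-power-law trail-off explicit. Writing $C$ for a normalizing constant and assuming $\beta > 0$, the local slope of $P(X \geq x)$ on a log-log plot is:

$$
\begin{aligned}
P(X \geq x) &= C\,x^{-\alpha} e^{-\beta x} \\
\ln P(X \geq x) &= \ln C - \alpha \ln x - \beta x \\
\frac{d \ln P(X \geq x)}{d \ln x} &= x \frac{d}{dx}\left(\ln C - \alpha \ln x - \beta x\right) = -\alpha - \beta x
\end{aligned}
$$

For $\beta x \ll \alpha$ the slope stays close to the pure power-law value $-\alpha$; as $x$ grows, the $-\beta x$ term dominates and the curve bends downward, since the ratio to the pure power law, $e^{-\beta x}$, tends to zero.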
Method for Sub-Graph Extraction from Context Graph
[0083] This section describes various ways in which sub-graphs can
be extracted from a context graph to provide targeting, as well as
how the context graph is generated and used in real time.
Step 1: Graph Maintenance
[0084] Initially, a context tuple, e.g. a row of comma-separated
context, is received by a targeting engine, and each of the entity
types in the tuple is recognized by the context graph algorithm
associated with the targeting engine. The different entities in the
context tuple are classified either explicitly from the data or by
inference. The inference can be based on a machine learning
algorithm, for example a known supervised classification technique
such as those based on Support Vector Machines or Singular Value
Decomposition. After the context tuple's entities are recognized,
they are assigned to one of the n layers of the n-partite graph, and
the edges and weights are updated.
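The sketch below illustrates Step 1 under stated assumptions: a comma-separated tuple is parsed, each token is assigned to a layer, and edge weights between entities in different layers are incremented. A fixed lookup table (`KNOWN_TYPES`, hypothetical) stands in for the explicit data or classifier (e.g. an SVM) mentioned above, unmatched tokens are assumed to be keywords, and linking every cross-layer pair is a simplification; an actual system might link only selected layer pairs, as in FIG. 1.

```python
import csv
import io

# Hypothetical lookup standing in for explicit data or a trained classifier.
KNOWN_TYPES = {"alice": "person", "bob": "person", "mfd-1": "device", "mfd-2": "device"}


def classify(token):
    """Layer for a token: explicit lookup first, otherwise assume it is a keyword."""
    return KNOWN_TYPES.get(token, "keyword")


def ingest_tuple(line, layers, weights):
    """Parse one comma-separated context tuple, assign layers, update edge weights."""
    tokens = [t.strip() for t in next(csv.reader(io.StringIO(line)))]
    typed = [(classify(t), t) for t in tokens if t]
    for layer, name in typed:
        layers.setdefault(layer, set()).add(name)
    # n-partite rule: edges only join entities from different layers.
    for i in range(len(typed)):
        for j in range(i + 1, len(typed)):
            if typed[i][0] != typed[j][0]:
                edge = tuple(sorted((typed[i], typed[j])))
                weights[edge] = weights.get(edge, 0) + 1
    return typed


layers, weights = {}, {}
ingest_tuple("alice, mfd-1, budget", layers, weights)
ingest_tuple("alice, mfd-1, budget", layers, weights)
print(weights[(("device", "mfd-1"), ("person", "alice"))])  # 2
```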
[0085] Further semantics may be assigned as an extension for each
given tuple. The direction of each edge may be altered given the
transaction's dependency relationships or ordering. It is the
responsibility of the tuple provider to specify the directionality
or ordering of the entities at the time the context is provided for
producing a targeted message. Each side of an edge has a
directionality probability that is aggregated over the set (or any
reasonable subset) of users assigned to the graph. The directionality
probability of an edge is therefore denoted $e(p_a, p_b, p_{ab})$,
where $p_a + p_b + p_{ab} = 1$ and $\{a, b\}$ refers to the edge
between node a and node b. The graph itself is stored on any
scalable platform such as a distributed database. According to one
exemplary embodiment, a non-relational distributed database, namely
HBase, can be used.
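One way to estimate $e(p_a, p_b, p_{ab})$ for an edge $\{a, b\}$ is by relative frequency over the users' observations. The labels used below are assumptions: `"a"` means an observation ordered a before b, `"b"` the reverse, and `"ab"` no ordering. The text does not define the three probabilities further, so this is only one plausible reading; whenever there is any evidence, the three values sum to 1.

```python
from collections import Counter


def directionality(observations):
    """Estimate e(p_a, p_b, p_ab) for one edge {a, b} by relative frequency.

    observations: labels collected over users, where (by assumption)
    "a" = a ordered before b, "b" = b ordered before a, "ab" = no ordering given.
    """
    counts = Counter(o for o in observations if o in ("a", "b", "ab"))
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 1.0)  # assumption: no evidence -> treat as unordered
    return (counts["a"] / total, counts["b"] / total, counts["ab"] / total)


p_a, p_b, p_ab = directionality(["a", "a", "b", "ab"])
print(p_a, p_b, p_ab)  # 0.5 0.25 0.25
```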
Step 2: Retrieving a Sub-graph and Finding the Dominant Sub-graph
[0086] Various graphical structures, such as sub-graphs (chains of
related nodes, loops, paths, hubs etc.), together with statistics on
top of these structures (degrees, hop-distance, edge weight, edge
direction etc.), can capture certain inherent connectivity and
association-related properties in a fine-grained fashion. These
structures, or the properties that govern them, relate to how the
users interact with their environments and/or social groups, since
these activities are what is used to build the graph in the first
place. Of particular interest are at least the following types of
contextual information, and their n-partite graphical representation,
associated with a person interacting with a device: the person;
keywords, categories or topics that relate to such an interaction;
the social group of the person; spatio-temporal aspects of the
interaction; usage frequency; and combinations thereof.
[0087] Retrieving a sub-graph relates to extracting a relatively
smaller graph, i.e. a sub-graph, in which the keyword from the query
is contained in a designated portion of the sub-graph, e.g. the center
or the 2nd layer. Such a keyword is called the origin (with respect to
the sub-graph retrieval). According to one exemplary embodiment,
the graphical substructures shown in FIGS. 10-15 are of particular
interest, without any limitation on other substructures that may be
added.
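A minimal sketch of sub-graph retrieval, assuming the context graph is held as an adjacency mapping and the designated portion is expressed as a hop distance from the origin (0 for the center, 1 for the next layer, and so on). A breadth-first search bounded by `max_hops` returns each retrieved node with its distance; the function name and the hop-based reading of "layer" are illustrative assumptions.

```python
from collections import deque


def retrieve_subgraph(adj, origin, max_hops=2):
    """Nodes within max_hops of the origin, mapped to their hop distance."""
    if origin not in adj:
        return {}
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the designated portion
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


adj = {
    "malaria": {"vaccine", "who"},
    "vaccine": {"malaria", "trial"},
    "who": {"malaria"},
    "trial": {"vaccine", "fda"},
    "fda": {"trial"},
}
print(sorted(retrieve_subgraph(adj, "malaria", max_hops=2).items()))
# [('malaria', 0), ('trial', 2), ('vaccine', 1), ('who', 1)]
```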
[0088] Finding a dominant sub-graph relates to ranking the
retrieved sub-graphs to find the most suitable one. A metric is
computed from the properties of these sub-graphs to determine a
quantitative ranking. The metric can be based on degree, edge weight,
edge direction, number of nodes in a loop, hop count etc. Other
constraints can be associated with a specific targeting purpose.
Usually one of the keywords, denoted the origin, will be contained in
the sub-graphs. If multiple keywords are part of a query, then
multiple sub-graphs are retrieved iteratively. The resulting set of
dominant sub-graphs is then merged based on common nodes. If there is
no common node between a pair of origin keywords, the dominant
sub-graphs are kept as disjoint pieces at this stage.
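The ranking metric is left open above, so the `score` function below is a hypothetical example that adds total edge weight to node count; any of the listed properties (degree, edge direction, loop size, hop count) could be substituted. The merge step follows the text directly: sub-graphs sharing a node are unioned, and sub-graphs with no common node stay separate.

```python
def score(edges, weights):
    """Hypothetical dominance metric: total edge weight plus number of distinct nodes."""
    nodes = {n for edge in edges for n in edge}
    return sum(weights.get(e, 0) for e in edges) + len(nodes)


def dominant(candidates, weights):
    """Highest-scoring candidate sub-graph, each given as a list of edges."""
    return max(candidates, key=lambda edges: score(edges, weights))


def merge_on_common_nodes(node_sets):
    """Union sub-graphs that share a node; disjoint sub-graphs stay separate."""
    merged = []
    for nodes in node_sets:
        nodes = set(nodes)
        for other in [m for m in merged if m & nodes]:
            nodes |= other
            merged.remove(other)
        merged.append(nodes)
    return merged


w = {("a", "b"): 3, ("b", "c"): 1, ("a", "d"): 1}
print(dominant([[("a", "b")], [("b", "c"), ("a", "d")]], w))  # [('b', 'c'), ('a', 'd')]
print(merge_on_common_nodes([{"a", "b"}, {"x", "y"}, {"b", "c"}]))
```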
Step 3: Utilizing Structure of the Overall Context Graph and Sub-Graphs
[0089] For sub-graphs that do not have any overlapping entities,
the structure of the overall graph is used. The overall context
graph is known to exhibit hubs as previously discussed with regard
to the power-law structure. Therefore, the strategy is to seek out
hubs to connect the seemingly disjoint sub-graphs. The overall
context graph is also known to exhibit other characteristics related
to the path lengths between nodes. Notably, diameter and path lengths
can be characterized as other relevant metrics.
[0090] Initially, the nearest hub associated with the sub-graph is
determined. In most cases, this will be just a few hops away
because of the scale-free structure. In other cases, the search is
terminated because it becomes intractable. In this case, both
sub-graphs are used to identify the nearest hub(s). The hubs may have
some latent relationship with the sub-graphs. The revised, unified
sub-graph is the two sub-graphs joined by the hub node. In this
way, two seemingly dissimilar entities, such as Malaria and CNN,
can be connected, perhaps because of the airing of a Malaria program
on CNN. Such connections become important when two sub-graphs seem
very different. Notably, the above strategy is used only to discover
a set of latent connections between sub-graphs.
[0091] Owing to the scale of the context graph, stopping criteria
for search and traversal often become important. The following are
some conditions that may be used: hop-count from the origin (most
common), reversal in edge direction, hitting a hub, accrued edge
weight along the direction of traversal, number of layers of the graph
traversed etc. Notably, Step 3 may sometimes be skipped without any
loss of continuity to the method.
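The hub-seeking strategy of [0089]-[0091] can be sketched as a breadth-first search outward from each sub-graph that stops at the first node whose degree meets a hub threshold, or when a hop-count limit is reached. The threshold `hub_min_degree`, the limit `max_hops` and the rule that both sub-graphs must reach the same hub are assumptions for illustration; only two of the listed stopping criteria (hitting a hub, hop count) are implemented.

```python
from collections import deque


def nearest_hub(adj, start_nodes, hub_min_degree, max_hops):
    """BFS outward from a sub-graph. Stops at the first hub outside it (hitting a hub)
    or gives up beyond max_hops (hop-count limit). Returns (hub, hops) or None."""
    start = set(start_nodes)
    dist = {n: 0 for n in start}
    queue = deque(start)
    while queue:
        node = queue.popleft()
        if node not in start and len(adj.get(node, ())) >= hub_min_degree:
            return node, dist[node]
        if dist[node] == max_hops:
            continue
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return None


def connect_via_hub(adj, sub_a, sub_b, hub_min_degree=3, max_hops=3):
    """Unify two disjoint sub-graphs through a shared nearest hub, if one is found."""
    hub_a = nearest_hub(adj, sub_a, hub_min_degree, max_hops)
    hub_b = nearest_hub(adj, sub_b, hub_min_degree, max_hops)
    if hub_a and hub_b and hub_a[0] == hub_b[0]:
        return set(sub_a) | set(sub_b) | {hub_a[0]}
    return None  # no shared hub within the limit: keep the sub-graphs disjoint


adj = {
    "malaria": {"mosquito", "program"},
    "mosquito": {"malaria"},
    "program": {"malaria", "cnn", "news", "tv"},
    "cnn": {"program", "anchor"},
    "anchor": {"cnn"},
    "news": {"program"},
    "tv": {"program"},
}
print(nearest_hub(adj, {"malaria", "mosquito"}, 3, 3))  # ('program', 1)
```

In this toy version of the Malaria/CNN example, both keyword neighborhoods reach the high-degree "program" node within one hop, so the unified sub-graph contains both neighborhoods plus that hub.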
Step 4: Using Sub-Graph Outputs for Targeting
[0092] Once the sub-graphs are unified or kept disparate, a query
is constructed and sent to an aggregator of information. The query
may be broken into sub-queries to reflect the different ways to
give importance to the structure of the identified sub-graph(s).
The simplest way is to send the high-degree nodes from the
sub-graph along with the most recent input that was received for
targeting. Another way is to send all the nodes in the sub-graph in
a sequence and to prioritize the results that contain the origin
entities, e.g. keywords. The sub-graph will dictate the order of
the sub-queries that are iteratively made to the aggregator.
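The two query strategies above can be sketched as follows, given a degree for each node of the identified sub-graph. The function names and the `top_k` cut-off are hypothetical. As a simplification, the second strategy places origin entities first in the sequence of sub-queries rather than re-ranking the returned results.

```python
def query_high_degree(degree, recent_input, top_k=3):
    """Strategy 1: the top_k highest-degree sub-graph nodes plus the latest input."""
    top = sorted(degree, key=lambda n: (-degree[n], n))[:top_k]
    return top + [t for t in recent_input if t not in top]


def query_sequence(degree, origin_entities):
    """Strategy 2: every node as its own sub-query, origin entities first,
    then by descending degree (ties broken alphabetically)."""
    order = sorted(degree, key=lambda n: (n not in origin_entities, -degree[n], n))
    return [[n] for n in order]


degree = {"budget": 5, "q3": 2, "forecast": 4, "alice": 1}
print(query_high_degree(degree, ["invoice"], top_k=2))  # ['budget', 'forecast', 'invoice']
print(query_sequence(degree, {"q3"}))  # [['q3'], ['budget'], ['forecast'], ['alice']]
```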
Step 5: Retrieving Relevant Documents Using Sub-Graph Output
[0093] The previous step described sending a query, possibly
composed of sub-queries, to an aggregator of information, e.g.
coupons, advertisements, news feeds etc. In this step, the relevance
of the retrieved documents is checked. Relevance is assessed by
counting the number of entities that appear in a result and matching
them against the sub-graph delineated in Steps 2-3. For result
documents in text form, relevance is computed directly from keyword
matches. For image/audio/video results, relevance is computed on
metadata, tags, transcriptions, translations etc. The filtered and
ranked results are returned to the user in the format of choice. An
alternative is to use the redundancy observed in retrieved documents
for the purpose of ranking. See U.S. patent application Ser. No.
12/533,901, by Harrington, entitled "Method and System for
Constructing a Document Redundancy Graph," filed Jul. 31, 2009.
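A minimal sketch of the text-form relevance check: each result is scored by how many sub-graph entities it contains, results with no matches are filtered out, and the rest are ranked. Tokenization is deliberately simple and assumes single-word entities; for image, audio or video results the same function could be applied to metadata, tags or transcriptions. The function names are hypothetical.

```python
import re


def relevance(text, subgraph_entities):
    """Number of (single-word) sub-graph entities that occur in a text result."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(1 for entity in subgraph_entities if entity.lower() in tokens)


def rank_results(results, subgraph_entities):
    """(score, result) pairs with non-matching results filtered out, best first."""
    scored = [(relevance(r, subgraph_entities), r) for r in results]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)


results = ["Malaria vaccine trial news", "Shoe sale this weekend", "CNN airs malaria program"]
entities = {"malaria", "cnn", "program", "vaccine"}
print(rank_results(results, entities))
# [(3, 'CNN airs malaria program'), (2, 'Malaria vaccine trial news')]
```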
Implementation Details
[0094] This section briefly outlines how to build the context graph
and how to search and maintain it. It also explains how an incoming
job can be monetized with advertisements and how credit is stored for
various verification actions by the user. The development of this
system used the open source Hadoop/MapReduce/HBase framework from the
Apache Software Foundation to aid computation and prototyping.
[0095] "Printing" is an overloaded term; here it can be taken to
mean sending a document or piece of information from one device to
another.
[0096] Step 1: User prints (or scans or faxes) to/from an MFD. Each
MFD receives the metadata associated with the print (or other use).
Each MFD is usually a data node of the Hadoop cluster. Upon receipt
of metadata, the MFD calls contextual_ad_fetch( ) which is a remote
call to the server component, typically residing with the name-node
of the Hadoop file system. The MFD that receives the print
job/stream caches the job temporarily so that the advertisements
can be fetched.
[0097] Step 2: contextual_ad_fetch( ) first calls
ContextGraph_Create_Maintain( ). As every user prints (or uses the
network of MFDs), the keywords and context identifiers (time of
day, location, username etc.) are extracted. An edge or vertex is
then introduced in the context graph $G(E, V)$, where $E$ is the set
of edges and $V$ is the set of vertices. Note that the weights of the
edges have to be recomputed as well.
[0098] a. For each context item that is extracted, update the
n-partite graph. Generally speaking, there may be n types of
context. The prototype assumed a tripartite graph whose constituent
parts are keywords, usernames and devices.
[0099] b. If any keyword is not present in the graph, it is added
as a new vertex. Likewise for username and device.
[0100] c. If the keyword is already present, an edge is added
between the username that printed it and the keyword printed. Also,
an edge is added between the username and the device that the job
was printed on. The above process is repeated for all context items
that are extracted. At this point, the vertices and edges of the
tripartite graph have been created.
[0101] d. Next, a link is introduced between each pair of keywords
associated with the current job. This means that the user
associated those keywords together and attached some semantics to
them in the current domain of the users' operation. In this example,
only the job name was used for this step. Other similar metadata may
be extracted from the document, but only sparingly owing to
tractability issues. This creates links in the keyword layer on the
fly.
[0102] e. Then, for each edge that was newly created, the weight is
set to one. If there is an edge already between the two vertices
under consideration then the weight is increased by one. The latter
condition occurs when such a linkage was established by another
user or the very same user as the current user in a prior print
job. In this way, the graph is created, updated and maintained
after every job.
[0103] f. From a storage perspective, the graph is stored as
co-keyword, co-user-device and co-user-keyword relations which,
together with the edge weights, constitute a social context graph.
The graph creation overhead is the same as the update overhead,
because graph creation is performed as a series of iterative updates.
The update overhead is estimated as follows. The complexity of
finding an item in the ordered list is $O(\log n)$. Adding a new
element that is not there is $O(1)$. If the element is already there,
computing or updating the weights requires another $O(\log n)$. This
process has to be done for about 5 keywords on average, so the update
complexity is about $10 \cdot O(\log n)$, where $n$ is the total
number of keywords.
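Steps a-f can be sketched as an in-memory update routine, assuming keywords have already been extracted from the job name. Using `defaultdict(int)` makes a newly created edge start at weight 1 and an existing edge increase by one, as in step e, and the three dictionaries mirror the co-keyword, co-user-device and co-user-keyword relations of step f. Python dictionaries are hash tables rather than the ordered list analyzed above, so their costs differ from the $O(\log n)$ figures. The class name is hypothetical, and a deployment would persist these relations in a store such as HBase.

```python
from collections import defaultdict
from itertools import combinations


class TripartiteContextGraph:
    """Keywords, usernames and devices with co-keyword, co-user-device and
    co-user-keyword edge weights."""

    def __init__(self):
        self.vertices = {"keyword": set(), "user": set(), "device": set()}
        self.co_keyword = defaultdict(int)
        self.co_user_device = defaultdict(int)
        self.co_user_keyword = defaultdict(int)

    def update(self, username, device, job_keywords):
        """Apply steps a-e for one job; keywords are assumed already extracted."""
        keywords = sorted(set(job_keywords))  # sorted so each keyword pair has one key
        # b. add any vertex that is not yet present
        self.vertices["user"].add(username)
        self.vertices["device"].add(device)
        self.vertices["keyword"].update(keywords)
        # c. user-keyword and user-device edges; e. new edge -> 1, existing edge -> +1
        for kw in keywords:
            self.co_user_keyword[(username, kw)] += 1
        self.co_user_device[(username, device)] += 1
        # d. link each pair of keywords that appeared together in this job
        for pair in combinations(keywords, 2):
            self.co_keyword[pair] += 1


g = TripartiteContextGraph()
g.update("alice", "mfd-1", ["budget", "q3"])
g.update("bob", "mfd-2", ["q3", "budget", "forecast"])
print(g.co_keyword[("budget", "q3")])  # 2
```

The second job reinforces the "budget"/"q3" link created by the first, even though a different user printed it, as step e describes.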
[0104] Step 3: contextual_ad_fetch( ) then calls ad_fetch( ) based
upon information in the graph, i.e. history, and information in the
current context, i.e. the current job. The aim of this step is to
maximize the chance of obtaining advertisements and then to get the
most relevant ones. The advertisements could be obtained by
contacting an advertisement aggregator. Since it is not a human
that is actively querying the advertisement aggregator, it is
important to send a list of the most relevant context.
[0105] Step 4: Once the ads are obtained from the advertisement
aggregator, the available real estate, both electronic and paper,
is populated with the best advertisements. The advertisements that
have greater relevance (for example, relevance computed from the
occurrence of queried keywords in the advertisements) are placed in
more conspicuous areas such as the banner pages and the UI (User
Interface) of the device. Once the ad-filled real estate is prepared,
the rest of the document that was cached in Step 1 is printed along
with the ads and/or coupons. In a prototype, it takes about 2-4
seconds to fetch the targeted ads. An additional couple of seconds
may be required to meld the ads with the customer job in novel ways.
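A sketch of Step 4's placement rule, assuming relevance is the number of occurrences of queried keywords in each ad's text and that slots are supplied from most to least conspicuous. Slot names such as `banner_page` and `device_ui` are hypothetical labels; ads beyond the number of available slots are simply left unplaced.

```python
import re


def ad_relevance(ad_text, query_keywords):
    """Relevance = occurrences of queried keywords in the ad text."""
    words = re.findall(r"[a-z0-9]+", ad_text.lower())
    return sum(words.count(kw.lower()) for kw in query_keywords)


def place_ads(ads, query_keywords, slots):
    """Map the most relevant ads to the most conspicuous slots.

    slots must be ordered from most to least conspicuous; extra ads stay unplaced.
    """
    ranked = sorted(ads, key=lambda ad: ad_relevance(ad, query_keywords), reverse=True)
    return dict(zip(slots, ranked))


ads = ["Office chairs on sale", "Budget planning software for budget season", "Q3 templates"]
print(place_ads(ads, ["budget", "q3"], ["banner_page", "device_ui"]))
# {'banner_page': 'Budget planning software for budget season', 'device_ui': 'Q3 templates'}
```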
[0106] Step 5: Some advertisements may also be sent to the users'
desktops through the print-driver so that they can be clicked on or
used at a later point. Alternately, some advertisements may be
cached at the device for use at a subsequent time. If an
advertisement is clicked at a user's workstation, the
advertisement identifier and user are captured and transmitted to
the device. The device(s) track and store this information and from
time-to-time transmit it to the advertisement aggregator. Printed
coupons or other hard copies of advertisements that the users can
carry with them to a store or vendor include a barcode, micro-text
or glyph embedded in the image, so that a vendor can track the
source of the advertisement. These methods enable the tracking of
printed or non-printed targeted messages and provide verification
to the advertiser or advertisement aggregator.
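The device-side tracking in Step 5 can be sketched as a buffer of (advertisement identifier, user, timestamp) records that is sent to the aggregator in batches. The class name, the `send` callback and the count-based `flush_every` trigger are assumptions; the text only says the information is transmitted from time to time.

```python
import time


class ClickTracker:
    """Device-side store of click records, sent to the aggregator in batches."""

    def __init__(self, send, flush_every=100):
        self.send = send  # callable that transmits one batch (hypothetical transport)
        self.flush_every = flush_every
        self.pending = []

    def record(self, ad_id, user):
        """Capture the advertisement identifier and the user for one click."""
        self.pending.append((ad_id, user, time.time()))
        if len(self.pending) >= self.flush_every:
            self.flush()

    def flush(self):
        if self.pending:
            self.send(list(self.pending))
            self.pending.clear()


batches = []
tracker = ClickTracker(batches.append, flush_every=2)
tracker.record("ad-7", "alice")
tracker.record("ad-9", "bob")
print([r[:2] for r in batches[0]])  # [('ad-7', 'alice'), ('ad-9', 'bob')]
```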
[0107] A diagrammatic representation of one system according to
this disclosure is shown in FIG. 16.
[0108] Step 1. According to this embodiment, a user 150 prints a
document. Printing a document includes using any device that may
have a so-called "print button" to invoke the print action.
Information including the name/login of the person printing,
credentials, location (GPS coordinates), print account details,
preferences on the printer, and preferences for the document to be
printed, including number of pages or page ranges, color/monochrome
and document content (words and images within the document), is
provided when invoking the print operation. The GUI may contain the
print button and collect these pieces explicitly according to an
opt-in model with respect to the information. The information may
also be stored in a profile page on a server. Content internal to the
document that is being submitted for printing may be restricted from
use by the customer that owns the print job. Printing can also mean
printing to a driver such as, but not limited to, an Adobe PDF
Printer, which performs the document conversion service by accepting
the print job.
[0109] Step 2. A service (on the cloud, network or any other known
service such as CUPS, the Common Unix Printing System) takes the
printed job and other information from the user that is
printing.
[0110] Step 3a. Automatic Ad Fetch and Ranking Service.
[0111] First, context is extracted. The service will extract the
context (such as the examples about the user, device, location,
time, document above) from the print job. From the context it
applies some business rules or algorithms to extract the query that
will be used for fetching ads automatically. An optimal subset of
the context is used for generating a query. This subset is a
function of time, location, user, the device, the service provider
(i.e. whether the printing service is in the airport vs. mall vs.
office vs. home), past history of printing etc. Relevant context is
extracted in the form of a subset of data. Many such candidate
subsets may be unearthed.
[0112] Step 3b. This context subset search is facilitated by the
context graph that is stored in a database (HBase in this
embodiment), where the history of prior context is stored as the
context graph explained before.
[0113] After the subset is extracted, it is sent to an ad-server
that will provide some ads.
[0114] Step 4. Once the ads are received back by the ad-fetch and
ranking service 154, the service performs relevance ranking
according to settings and business rules within the ad fetch and
ranking service. The results are called relevant advertisements or
messages. These are passed back to the print server (152).
[0115] Step 5. In this step, the print server (152) merges the
document to be printed (or converted to other forms, if printing
really means document conversion) with ads and messages and then
sends them to another application. The other application can be, for
example, a phone, an email server, an e-book reader or, in this
embodiment, a printer or printer GUI.
[0116] Step 6. Click-through is defined depending on the device.
Click-throughs are actions relating to clicks or click-like or
view-like activity that are recorded and transmitted to the
advertiser that sent the ads to the ad-fetch and ranking service
(154). Click-through relates only to those individual ads or
messages that were acted upon by the user in some way.
List of Locations and Real-Estate (Paper and Non-Paper)
[0117] 1. Banner page
[0118] 2. Several Places within Document (margins, sides,
interspersing etc.)
[0119] 3. MFD UI-EIP+Screen
[0120] 4. Via Print Driver to User Computer
[0121] 5. Windows Application including browser
[0122] 6. Mobile Destinations including e-reader, laptops etc.
[0123] FIG. 17 shows a targeted communication produced according to
an exemplary embodiment of this disclosure, the targeted
communication including text ads with graphics from two of the
advertisers.
[0124] FIG. 18 shows another targeted communication produced
according to an exemplary embodiment of this disclosure, the
targeted communication including categorized coupons.
[0125] FIG. 18 shows one embodiment of a personalized ad or
coupon listing in a categorized fashion that is again custom
generated for the user or customer. FIG. 18 may be printed or may
be viewed on a GUI or viewing device, and provides coupons or ads
as images relating to 4 categories.
[0126] Business A and Business B coupons 32 and 34 are in the
clothes/shoes category.
[0127] Business C and Business D coupons 36 and 38 are in a
different vacation category.
[0128] Both the categories chosen and the individual deals/ads
chosen are personalized to the user viewing this listing. Service
154 of FIG. 16 generated this listing.
[0129] The personalized ad/coupon listing also contains some images
personalized to the user. The content within the picture may be
enhanced to appeal to the user, such as by including text or faces of
people known to the user, or color preferences etc. Also included are
tracking elements for paper or electronic forms such that an
authorized device can use the tracking material (such as bar codes
or QR codes) and know who utilized the personalized ad or
coupon.
[0130] FIG. 19 shows another use-case in a production printing
environment where the aforementioned method can be utilized.
Alternatively, the PDFs could simply be viewed on screen rather than
printed.
[0131] Described is a large workflow with targeted personalization
inserted into the workflow stream through the context graph
technology.
[0132] 180: Online or printed statement generation requests are
initiated. This starts the process of providing credentials and
transaction information (such as that in the statements) to the
printer/electronic publisher.
[0133] 182: The web (182) sends the extracted context (here, context
refers to content like that explained for FIG. 16, but extracted per
user or per transaction for a user, usually one line in, say, a
credit card statement) to a targeting service (184), equivalent to
the ad-fetch and ranking service 154 in FIG. 16.
[0134] 184: The process fetches the ads per user per transaction
and ranks the relevance and applicability of the ads.
[0135] 186: The process merges ads and per-user content into the
stream. The term stream is used here because the workflow contains
many statements corresponding to a multitude of users. 186 also
applies image color manipulations, document conversions or device
format conversions for a rich look and appeal with respect to the
device or document format.
[0136] 188: Useful information, recommendations, coupons, ads or
messages that are generated by 186 are transmitted by web to
produce a new stream 190.
[0137] 190: A new print stream is augmented with extra content.
[0138] In addition to the methods and systems discussed so far,
this disclosure provides methods to detect, learn and/or correct
normal/anomalous behaviors in workflows and other systems that can
be represented as state transitions. For example, many
communications can benefit from the detection and learning of
target system, consumer or workflow behaviors. These include
consumer activities near devices or printers, printing and other
document workflows. In addition, the detection of normal/anomalous
behaviors of a printing system workflow etc. can help in servicing
the system. This disclosure maps states traversed in a workflow into
two-dimensional patterns on a grid in which the originating states
are labels on the rows and destination states are labels on the
columns. A path represents the allowable behavior in the system.
Anomalous behaviors are easily detected by human inspection; faulty
behaviors can be detected and corrected. Collections of patterns can
be represented as a stack of tiles. Also included are algorithms to
detect similar patterns across tiles, and a system on a computational
cloud.
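A minimal sketch of the grid mapping in [0138], using hypothetical workflow state names: each consecutive pair of states in a path increments the cell at (origin row, destination column), a text rendering shows the resulting tile, and transitions outside an allowed set are reported as anomalies. Representing allowable behavior as a set of permitted transitions is an assumption, and the rendering labels rows and columns by first letter, so the example states are chosen to have distinct initials.

```python
def transition_grid(states, path):
    """Rows = originating state, columns = destination state; cells count transitions."""
    index = {s: i for i, s in enumerate(states)}
    grid = [[0] * len(states) for _ in states]
    for src, dst in zip(path, path[1:]):
        grid[index[src]][index[dst]] += 1
    return grid


def render(grid, states):
    """Text tile for human inspection: '#' marks a transition that was taken."""
    lines = ["   " + " ".join(s[0] for s in states)]  # column labels (first letters)
    for state, row in zip(states, grid):
        lines.append(state[0] + "  " + " ".join("#" if c else "." for c in row))
    return "\n".join(lines)


def anomalies(path, allowed):
    """Transitions in the path that are not in the allowed set."""
    return [(s, d) for s, d in zip(path, path[1:]) if (s, d) not in allowed]


states = ["submit", "rip", "print", "finish"]
allowed = {("submit", "rip"), ("rip", "print"), ("print", "finish")}
retry = ["submit", "rip", "submit", "rip", "print", "finish"]
print(render(transition_grid(states, retry), states))
print(anomalies(retry, allowed))  # [('rip', 'submit')]
```

For the straight path submit → rip → print → finish the tile is a diagonal staircase; the retry path adds a mark below the diagonal in the "r" row, which a reader can spot at a glance.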
[0139] According to one exemplary embodiment, a system identifies
categories of digital information (including kinds of
bugs)--specific classes of which we call graphical, auditory or
computational pattern ensembles or, informally, digital
fingerprints. Digital fingerprints are used today to identify
individual files such as songs. The disclosed system identifies
categories of those fingerprints and makes them searchable in a
manner that is easier for both humans and computers to
comprehend.
[0140] Finding patterns needs to be simpler (e.g. digital
fingerprints that are visual or auditory), which can subsequently
be corroborated by an automated service (e.g. a knowledge as a
service feed). It is becoming common for businesses to collect vast
amounts of information from the systems they develop and
subsequently analyze that data. But those analyses are often
complicated and time-consuming and require expensive specialists.
Importantly, they do not make use of prior observations of specific
visual or auditory patterns (plain symptoms and conditions
included) at the customer site.
[0141] A capability for fine-grained, real-time analysis to find
opportunities or predict scenarios is absent. It is important to
capture how people are using our systems and services in a more
real-time fashion. This can lead to the prediction of user
behaviors, side opportunities, correlated bugs and market trends by
tapping into the combined knowledge of customers and service
providers, for example the non-subjective digital fingerprints that
are shared knowledge.
[0142] This disclosure provides an automatic (unsupervised) method
of generating categories for digital data and identifying which
digital data falls into those categories. Specifically, provided
are techniques for the identification, categorization and the
search of similar looking (graphical) or sounding (auditory)
pattern ensembles that both the consumers of the service and the
provider(s)/technicians can relate to. These pattern ensembles go
beyond the mere manifestation of symptoms and conditions in the
sense that they can be described without subjectivity and without
going through painfully enormous logs. For instance, graphical
pattern ensembles are visual and readable by lay people, so they can
be simultaneously sophisticated for trained users (in terms of the
meaning they convey) and simple for untrained users (to read,
internalize and describe). Further, they utilize techniques from
human vision and cognition and the assumption that even untrained
humans can internalize normal and abnormal system workings upon
repeated use. Computational pattern ensembles (such as the
underlying graph-theory structures) are readable and searchable by
computer programs and can be utilized to characterize system or
workflow behaviors. Some examples of where this technology can be
used directly or as a means for tracking usage are as follows:
consumer behavior directed to targeted communications when the
consumer interacts with a multitude of devices; comparing release
and robust workflow signatures; usage signatures for robust/fragile
combinations of services; particular kinds of error classification
that aid easy description by humans; increasing software
conformance to user needs etc. The technology can also be used as a
tool for proactive/reactive analysis but more interestingly as a
tool for aiding objective communication at the time of
troubleshooting.
[0143] Shazam is a server-based application. If you are listening
to a song and you want to identify it (or the artist), you fire up
the app on your mobile phone, hold the phone near the music, and
within a few seconds, the name of the song and the artist are
displayed on your phone. Shazam can do this because each song has a
"fingerprint," a unique pattern of data that makes it identifiable
and easy to search for (See
http://www.shazam.com/music/web/pages/background.html). Such
digital fingerprints can be used to create categories of digital
data. This disclosure provides a method of generating categories
for digital data fingerprints with one or more of the following
features:
[0144] The detected pattern (or family of patterns referred to as
an ensemble) can have a graphical structure (e.g. W pattern, Z
pattern, M pattern etc.) or have a tone based correlation (such as
the collection of tones heard when dialing a telephone number).
These patterns can be seen or heard so that a customer can develop
familiarity with normal and abnormal operating conditions of a
service or system (i.e. internalization of the system workings).
In the past, such patterns would have been compiled into an enormous
non-volatile-memory (NVM) log analyzed by specialized models or
tools.
[0145] In addition to the above human interface, the detected
patterns are computer searchable through a graph-theory based
representation.
[0146] Instead of describing subjectively what one sees or hears
when trouble-shooting an abnormal condition, the technology creates
a visual or auditory signature to avoid confusion and aid objective
communication.
[0147] It also provides a method of searching for and finding which
of these categories new (incoming) information falls into. This is
most effective if the data is gathered in the cloud, where the
checkpoint/event information is available in one place (i.e., the
data is complete). One exemplary aspect is the concept of graphical
and auditory pattern ensembles (a type of digital fingerprint) that
help visual or auditory correlation of patterns in several
dimensions.
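As one illustrative way to make pattern ensembles computer-searchable, the sketch treats a fingerprint as the set of transitions (graph edges) it contains, compares fingerprints by Jaccard similarity, and forms categories without supervision using simple leader clustering. Jaccard similarity, leader clustering and the 0.6 threshold are substitutes chosen for the sketch, not techniques named in the text; the state names are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two fingerprints, each a set of (from_state, to_state) edges."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def categorize(fingerprints, threshold=0.6):
    """Leader clustering: join the first category whose leader is similar enough,
    otherwise start a new category led by this fingerprint."""
    leaders, members = [], []
    for fp in fingerprints:
        for i, leader in enumerate(leaders):
            if jaccard(fp, leader) >= threshold:
                members[i].append(fp)
                break
        else:
            leaders.append(fp)
            members.append([fp])
    return leaders, members


def search(new_fp, leaders):
    """Index of the most similar existing category and its similarity."""
    scores = [jaccard(new_fp, leader) for leader in leaders]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


normal = frozenset({("submit", "rip"), ("rip", "print"), ("print", "finish")})
normal_plus = normal | {("finish", "deliver")}
retry = frozenset({("submit", "rip"), ("rip", "submit")})
leaders, members = categorize([normal, normal_plus, retry])
print(len(leaders))  # 2
print(search(frozenset({("submit", "rip"), ("rip", "submit"), ("rip", "print")}), leaders))
# (1, 0.6666666666666666)
```

The incoming fingerprint shares two edges with each category, but it is closer to the "retry" category because that category's edge set is smaller.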
[0148] One simple example is from the fields of printing system
workflow and associated software (e.g. Xerox FreeFlow Print Server),
where a series of human/automated steps is required to complete
the creation of an artifact, physical or logical. The system
generates a category that identifies when a user employs a
two-up-for-one imposition in combination with a centered watermark
and a resized art box in a PDF document, and maps that combination to
a particular graphical or auditory pattern. In this way, the system
automatically, and on an ongoing basis, matches new data with those
fingerprints to rapidly identify bugs and their severity, and to
notify the appropriate teams before a customer encounters a problem
with greater severity. Customer pain points where the disclosed
method/system can be utilized are described in detail below.
Fragile Workflows
[0149] The first basic idea is that every element of a book
production workflow affects every other element of a book
production workflow and this fact makes workflows extremely
fragile. Consequently, the operators only change the workflows when
they absolutely have to. Unfortunately, they always have to; see
below.
[0150] This means that any change to any physical or
software-setting part of the workflow will either cause the
document to print incorrectly or cause the workflow to stop
operating. The following list contains examples of changes that
potentially disrupt workflows frequently at customer sites.
[0151] (a) Changing the paper's grain direction.
[0152] (b) Changing the model of printer being used.
[0153] (c) Changing the stops on the finisher.
[0154] (d) Changing the imposition.
[0155] (e) Changing the content of the original document (not the
size or position of the content).
[0156] (f) Using an on-demand version of an original document will
always break a workflow at some point in the future.
[0157] (g) Using different creative applications to send documents into a workflow. For example, if the exact same document, with the exact same content, is created in a first application and in a second application, and PostScript (or PDF) is emitted from each application using the same driver, there are situations where one RIPs and the other fails to RIP.
[0158] (h) Changing the weight of the paper used in the
workflow.
[0159] (i) Changing the cover lamination procedure.
[0160] In short, changing almost any workflow element, or any part of the target document or its instructions, may require changing at least one other workflow element or part of the original document. Furthermore, changing that second element may require a change in yet another element. Note that a workflow is frequently never really in its final version. Once a functioning workflow is established, workflow operators are routinely forced to modify it because of the nature of the work: evolving customer requirements, supplier constraints, the content and format of the original document (which change frequently), and the workflow technologies all compel these changes.
[0161] One example of preventing the kind of problems described above, according to this disclosure, is the generation of a Release Fingerprint, that is, a set of pattern ensembles connected with a certain release of a workflow. The release fingerprint is subsequently used to govern the migration from one release to another.
Each Site is Entirely Unique (Even if they Seem the Same)
[0162] The situation is further complicated by the fact that:
[0163] (a) Required changes to identical workflows can differ from
one customer site to another customer site, even within a single
company.
[0164] (b) The effect of a workflow change on the output of identical workflows can differ from customer site to customer site, even within a single company.
[0165] (c) The changes necessary to fix a broken workflow can differ from customer site to customer site, even within a single company, and even if the break was caused by the same change in an ostensibly identical workflow. This is due to at least slight differences in the software used at those sites, or at least in the configuration of that software, and to different documents from different sources being sent through the (ostensibly identical) workflows at the different sites.
[0166] A site-based fingerprint as disclosed can be used to govern
replicating workflows from one site to another.
Automated Image Quality Problem Identification
[0167] The causes of image quality problems are often difficult to uncover. With pattern ensembles, image quality problem categories are identified, which allow the end user to solve a problem or to communicate specific error information to the appropriate technical support teams so that the correct help can be provided.
[0168] For example, a pattern ensemble can identify that the
current image quality problem on copies is due to a dirty platen
and the user can be notified to clean the platen to solve the
problem. No call to technical support is necessary.
Consumer Behavior Identification
[0169] Graphical pattern ensembles can represent the structure of interactions between users, devices, environments and other external stimuli. For example, when a consumer interacts with a device through a kiosk or a mobile phone in a mall, patterns of interaction can be established. Furthermore, specific items of interest, times and/or places can be deduced. From such pattern ensembles, one can identify elements that provide clues about the person's likes and dislikes, e.g., which aspects of the application get used, which aspects get a search query, where the consumer spends the most time, what keywords get typed, etc. When such information is collected over a large population, normal and abnormal patterns can be delineated. Normal patterns delineated in this way can be utilized as templates to search large content, metadata or document repositories. Pattern ensembles provide the framework for aggregating and visualizing the search results.
[0170] As previously stated, this disclosure describes a technique to identify, categorize and retrieve a specific pattern from a collection of patterns. In particular, the concept of graphical pattern ensembles is used to visually track and correlate patterns and changes in them. This feature takes advantage of how human vision works and of processes in human cognition. The disclosure also teaches how to categorize observed patterns into ensembles and how to layer them as a stack to deal with multiple dimensions. Layering or stacking is provided as one way to generalize the graphical ensemble concept to multiple dimensions.
[0171] Important aspects include the following: graphical and auditory pattern ensembles that aid internalization of normal and abnormal symptoms of the system workings and enhance objective communication with the service provider at the time of troubleshooting.
[0172] Graph theory based representation and formulation for
searchability.
[0173] Capability to detect patterns by leveraging change detection
capabilities of human vision and cognition.
[0174] Traditional alternatives include statistical models, machine learning, neural networks, etc. Graph-theoretical models that capitalize on human capabilities are few. These traditional models depend only on logs and model specifics. They fail to leverage the fact that humans (customers) can internalize normal and abnormal aspects of system operation while dealing with a system, such as an MFD or other device, a document service, etc.
[0175] The patterns are shown through a visual interface, together with auditory patterns and aids that are put in place for the customer to interact with the patterns.
[0176] The disclosed method deals with monitoring a large number of events (raw or aggregated), especially when human oversight is required because of the domain (e.g., large logistics operations, data centers, power stations), or when Turing-test-like difficulties (e.g., connecting the dots in the case of partial information) or complex decision making (e.g., major decisions that impact cost) are encountered. Of special interest are patterns that can be visually remembered or have an auditory correlation, drawing on analogues such as the connected graphical patterns of phone numbers on a phone dial pad, collections of pictures of a landmark or tourist destination, the tones heard when dialing a phone number, or the sound of a telephone modem connecting to the Internet. Humans may be better able to respond to missing pieces of information that they have internalized over time. In this sense, the aim of pattern ensembles is to present human-in-the-loop decision-making systems with shapes that humans can complete in a more natural fashion. The system attempts to associate pattern ensembles with shapes that humans can better internalize. At the same time, the pattern designs have a computer-searchable representation.
Graphical Pattern Ensembles (GPE)
[0177] Graphical pattern ensembles are a family of graphical
patterns that are easily identifiable by the human eye. These
patterns are defined here as connected 2-dimensional shapes where a
series of vertices are linked together by edges. These are called
ensembles because two patterns may seem graphically close to each
other (in terms of shape) when the corresponding attributes vary
slightly (see FIGS. 20-22).
[0178] FIGS. 20-22 show two distinct patterns: an inverted-W pattern and a Z-shaped pattern. All of the inverted-W patterns shown in FIG. 20, FIG. 21 and FIG. 22 look alike, although they are slightly displaced laterally/vertically and/or stretched/contracted in one of the two dimensions. All the inverted-W patterns together are called a pattern ensemble. Note that there could be many more patterns that are not shown. An important attribute of these patterns is that they are graphical in nature and can be identified as similar by the naked eye, without any analysis. Here, these visual similarities are leveraged to aid in monitoring and decision making. A few conventions used are as follows:
[0179] a) The rows in FIGS. 20-22 are called originating events (O), i.e., a component where a condition or event is triggered;
[0180] b) The columns of the above figures refer to destination events (D), i.e., a component where an originating event concludes;
[0181] c) The series of components S1 through Sn are arranged in such a way that neighboring components offer similar functionality. However, this is not a hard requirement.
[0182] Finding patterns as discerned by the human eye is a central concept in this disclosure. By that, two things are meant:
[0183] a) First, a 2D shape of some sort can be discerned by the naked eye;
[0184] b) If two similar patterns appear at roughly the same time (e.g., the combination of FIGS. 20 and 21), then the region displaying the pattern shows a "glitch" on the screen.
[0185] This glitch can be simulated by rapidly alternating between FIG. 21 and FIG. 22.
[0186] Described now is a method to generate, represent and display
the pattern ensembles.
[0187] A graphical pattern ensemble is a (database) searchable
notation for a visible pattern that represents a layered collection
of graphical symbols. In other words, there might exist a
collection of graphical symbols or patterns whose geometry is split
across multiple tiles or layers. Each tile k is represented by
denoting a collection of vertices, where each vertex that qualifies
has more than one edge incident on it. The pattern ensemble is a
collection of tiles $F_1$ through $F_q$.
[0188] The graphical pattern ensemble $F_k$ for layer $k$, comprising $m$ associated fingerprints, and the rolled-up ensemble $F$, comprising the $q$ layers, are:
$$
\begin{aligned}
F_k = \Big\{\, & \big( (f_{o1}, f_{a1}, f_{b1}),\ (k_{11}, k_{21}, \ldots, k_{n1}),\ (\theta_{11}, \theta_{21}, \ldots, \theta_{n1}),\ (d_{11}, d_{21}, \ldots, d_{n1}) \big), \\
& \big( (f_{o2}, f_{a2}, f_{b2}),\ (k_{12}, k_{22}, \ldots, k_{n2}),\ (\theta_{12}, \theta_{22}, \ldots, \theta_{n2}),\ (d_{12}, d_{22}, \ldots, d_{n2}) \big), \ldots, \\
& \big( (f_{om}, f_{am}, f_{bm}),\ (k_{1m}, k_{2m}, \ldots, k_{nm}),\ (\theta_{1m}, \theta_{2m}, \ldots, \theta_{nm}),\ (d_{1m}, d_{2m}, \ldots, d_{nm}) \big) \,\Big\}
\end{aligned}
$$

$$
F = \{F_1, F_2, \ldots, F_k, \ldots, F_q\}
$$
[0189] Here $f_{oi}$ stands for the origin point in layer $i$, and there are $m$ origin points in a layer, connected or otherwise. For each origin point $f_{oi}$, $f_{ai}$ corresponds to another point in a layer above, adjacent or otherwise, i.e., in layers 1 through $k-1$. Likewise, $f_{bi}$ is another point in a layer below $k$, adjacent or otherwise, i.e., in layers $k+1$ through $q$. The convention is to stack up all the tiles in increasing order, with the lowest layer on the top of the pile. The collection of all these is the graphical pattern ensemble for $q$ layers. In FIG. 23, there are 3 layers, namely S, T and U. Note that S is lexicographically the lowest layer in this case. Each event or feature in S is denoted $S_i$, and the events are arranged as a square matrix. $S_i$ and $S_{i\pm1}$ are more or less similar events or features which can be measured, observed, and/or check-pointed. The same holds for the other layers, T and U, which lie under S. $(k_{1i}, \ldots, k_{ni})$ denote the co-occurrence frequencies of the tuples $f_{oi}$ and $(d_{1i}, \ldots, d_{ni})$, where $i$ corresponds to the layer. $f_{pq}$ and $d_{xy}$ are tuples corresponding to points in layers $q$ and $y$, respectively. $(\theta_{1i}, \ldots, \theta_{ni})$ correspond to the angles between $(f_{oi}, d_{xi})$ and $(f_{oi}, d_{(x+1)i})$. Note that, with circular wrapping ($x = n$), $x+1$ is defined as 1.
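As a rough illustration of how the tuple notation above might be held in memory for database search, the following Python sketch models one row of $F_k$, a layer, and the rolled-up ensemble $F$. The class and field names, the use of integer grid coordinates for points, and the validation rule are assumptions made for illustration, not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A point on a layer's grid as (row, column); an assumed encoding.
Point = Tuple[int, int]


@dataclass
class GPERow:
    """One row of F_k (hypothetical layout): an origin point f_o and the n rays leaving it."""
    f_o: Point
    f_a: List[Point] = field(default_factory=list)    # linked points in layers above
    f_b: List[Point] = field(default_factory=list)    # linked points in layers below
    k: List[int] = field(default_factory=list)        # co-occurrence frequencies k_1..k_n
    theta: List[float] = field(default_factory=list)  # angles between adjacent rays
    d: List[Point] = field(default_factory=list)      # destination points d_1..d_n

    def __post_init__(self) -> None:
        # Each ray carries one weight, one angle and one destination.
        if not (len(self.k) == len(self.theta) == len(self.d)):
            raise ValueError("k, theta and d must all have n entries")


@dataclass
class Layer:
    """F_k: one row per multi-degree vertex in layer k."""
    name: str
    rows: List[GPERow] = field(default_factory=list)


@dataclass
class Ensemble:
    """F = {F_1, ..., F_q}, with layers stored in increasing order."""
    layers: List[Layer] = field(default_factory=list)

    def row_count(self) -> int:
        # Total number of multi-degree vertices across all layers.
        return sum(len(layer.rows) for layer in self.layers)


# Example: one origin in layer S with two rays and a link to a point in layer T.
row = GPERow(f_o=(3, 4), f_b=[(1, 1)], k=[2, 5], theta=[90.0, 270.0], d=[(2, 5), (4, 5)])
ensemble = Ensemble(layers=[Layer("S", [row]), Layer("T")])
print(ensemble.row_count())  # 1
```

A layer with several multi-degree vertices simply holds several rows, which matches the note that $F_k$ may have several rows.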
[0190] FIG. 24 shows an example that has two layers, with one layer shown more prominently for clarity:

$$
F_1 = \{((f_o, f_a, f_b), (k_1, k_2, \ldots, k_n), (\theta_1, \theta_2, \ldots, \theta_n), (d_1, d_2, \ldots, d_n))\}
$$

where
[0191] $f_o = \{(S_d, S_e)\}$ is a tuple denoting the origin;
[0192] $f_a = \{\}$ holds possibly several tuples denoting points in the layer above;
[0193] $f_b = \{(\,)_{T1}, (\,)_{T4}, (\,)_{T9}\}$ holds possibly several tuples denoting points in the layer below;
[0194] (by convention, points are listed in clockwise order starting from the top-right quadrant);
[0195] $k_i$ stands for the event interaction frequency, i.e., the edge weight;
[0196] $\theta_i$ is the angle of separation between any two adjacent rays from $f_o$, that is, $(f_o, d_i)$ and $(f_o, d_{i+1})$;
[0197] $d_i$ are the destination tuples connected to rays originating from a given $f_o$.
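To make the angle convention concrete, the sketch below orders destinations clockwise starting from the top-right quadrant and computes $\theta_i$ between adjacent rays, wrapping the last ray back to the first. It assumes Cartesian coordinates with $y$ pointing up and measures each ray clockwise from 12 o'clock; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) with y pointing up; an assumed convention


def bearing(origin: Point, dest: Point) -> float:
    """Clockwise angle of the ray origin->dest, in degrees from 12 o'clock, in [0, 360)."""
    dx, dy = dest[0] - origin[0], dest[1] - origin[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def ray_angles(origin: Point, dests: List[Point]) -> Tuple[List[Point], List[float]]:
    """Return (d_1..d_n, theta_1..theta_n) for the rays leaving origin.

    Destinations are ordered clockwise starting in the top-right quadrant;
    theta_i separates ray i from ray i+1, and ray n wraps around to ray 1.
    """
    ordered = sorted(dests, key=lambda d: bearing(origin, d))
    b = [bearing(origin, d) for d in ordered]
    n = len(b)
    if n == 1:
        return ordered, [360.0]  # a single ray is separated from itself by a full turn
    thetas = [(b[(i + 1) % n] - b[i]) % 360.0 for i in range(n)]
    return ordered, thetas


ordered, thetas = ray_angles((0.0, 0.0), [(1.0, -1.0), (-1.0, 0.0), (1.0, 1.0)])
print(ordered)                        # [(1.0, 1.0), (1.0, -1.0), (-1.0, 0.0)]
print([round(t, 6) for t in thetas])  # [90.0, 135.0, 135.0]
```

Because of the wrap-around, the angles of a vertex with two or more rays always sum to a full turn, which is one quick consistency check on a stored row.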
[0198] Note that there could be several rows for each $F_k$, depending on the number of multi-degree vertices in layer $k$. A set of such ensembles together is called a multi-layer graphical pattern ensemble. Other similar ensembles can be searched for in the database in two broad ways:
[0199] Multi-layer search: take the entire multi-layer graphical pattern ensemble and search for other layers;
[0200] Layer-by-layer search: take a layer of interest and search for similar patterns at different positions on the same layer or on another layer, where just one or two layers are being compared at any given time.
[0201] To characterize similarity, single-layer similarity is considered first. Multi-layer similarity comparisons are then performed as extensions of the single-layer characterization. For single-point GPEs, the comparisons are simple variations of cosine similarity in vector algebra. Multi-vertex similarity determination algorithms reduce, in the special case of $m = n = 1$, to the single-point GPE comparison.
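The text does not fix which variation of cosine similarity is used. One plausible sketch, assuming each single-point GPE is reduced to a vector of edge weights $k_i$ keyed by each destination's offset from the origin (so that a laterally or vertically displaced copy of the same shape still matches), is shown below; the function names are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[int, int]


def offset_vector(origin: Point, rays: Dict[Point, float]) -> Dict[Point, float]:
    """Re-key each ray's weight k_i by the destination's offset from the origin."""
    return {(d[0] - origin[0], d[1] - origin[1]): k for d, k in rays.items()}


def single_point_similarity(sample: Dict[Point, float],
                            original: Dict[Point, float]) -> float:
    """Cosine similarity of two offset vectors; 1.0 means same shape and proportions."""
    keys = set(sample) | set(original)
    dot = sum(sample.get(key, 0.0) * original.get(key, 0.0) for key in keys)
    norm_s = math.sqrt(sum(w * w for w in sample.values()))
    norm_o = math.sqrt(sum(w * w for w in original.values()))
    if norm_s == 0.0 or norm_o == 0.0:
        return 0.0  # an empty pattern is treated as matching nothing
    return dot / (norm_s * norm_o)


a = offset_vector((2, 2), {(3, 3): 4.0, (1, 3): 2.0})
b = offset_vector((5, 1), {(6, 2): 8.0, (4, 2): 4.0})  # same shape, shifted and scaled
print(round(single_point_similarity(a, b), 6))  # 1.0
```

Keying by offsets makes the measure insensitive to displacement, while the cosine makes it insensitive to a uniform scaling of the frequencies; stretching along one axis would still lower the score, so a real ensemble search may need a looser notion.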
[0202] The most generic similarity notion (where $m \ge n$) is determining whether every point of the sample, which includes a total of $n$ points in its GPE, matches some point in an original GPE, which includes a total of $m$ points. The algorithm for this is shown below:

$$
\theta_{so} = \sum_{g=1}^{D} \frac{\vec{d}_{sg} \cdot \vec{d}_{og}}{k_s\, k_o}
$$

Algorithm 1: Similarity Determination Algorithm
Inputs: the original GPE, the sample GPE, $m \ge n$; $\Lambda$ is the set ordering operator
Output: sim, the similarity of the sample to the original
1. for s = 1 to n: reset $\theta_s$ and $\theta_o$
2.    for o = 1 to m
3.       compare $\theta_s$ and $\theta_o$; record $(\theta_s - \theta_o)$
4.    next o
5.    record $\theta_{s,\min}$ as $\min\{(\theta_o - \theta_s)\}\ \forall o \in \{1, \ldots, m\}$
6.    add o to set $M_o$
7. next s
8. if $\Lambda(M_o) = \{1, \ldots, n\}$ and $M_o \subseteq \{1, \ldots, m\}$
9.    compute $\text{sim} = \sum_{s=1}^{n} \theta_{s,\min}$
10. end if
11. return sim
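A minimal Python sketch of Algorithm 1 follows. It assumes each point is summarized by a single scalar comparison value (for example, an angle such as $\theta_{so}$), uses absolute differences where the listing records signed ones, and reads $\Lambda(M_o) = \{1, \ldots, n\}$ as requiring that the $n$ sample points match $n$ distinct original points. These interpretations are assumptions. Because sim sums minimum differences, a value of 0 indicates an exact match and larger values indicate weaker similarity.

```python
from typing import List, Optional, Tuple


def similarity(original: List[float], sample: List[float]) -> Tuple[Optional[float], List[int]]:
    """Sketch of Algorithm 1 on per-point scalar comparison values.

    Returns (sim, M_o): M_o lists, for each sample point s, the 1-based index of
    the closest original point o; sim is the sum of the minimum differences, or
    None when the matching condition of step 8 fails.
    """
    m, n = len(original), len(sample)
    if m < n:
        raise ValueError("Algorithm 1 requires m >= n")
    matched: List[int] = []
    minima: List[float] = []
    for s in range(n):                                            # step 1
        diffs = [abs(sample[s] - original[o]) for o in range(m)]  # steps 2-4
        best = min(range(m), key=lambda o: diffs[o])              # step 5
        minima.append(diffs[best])
        matched.append(best + 1)                                  # step 6
    # Step 8, as interpreted here: the n sample points matched n distinct original points.
    if len(set(matched)) != n:
        return None, matched
    return sum(minima), matched                                   # steps 9 and 11


print(similarity([10.0, 50.0, 90.0], [48.0, 12.0]))  # (4.0, [2, 1])
```

Returning $M_o$ alongside sim keeps the match order available for the order-based similarity notions that follow.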
[0203] Other kinds of similarity notions are possible, some based on the order of $M_o$, i.e., the order in which the sample sequentially matches the original.
[0204] For example:
[0205] If $(m = n) > 1$, $\Lambda(M_o) = \{1, \ldots, n\}$, and the elements of $M_o$ are added in strictly increasing or decreasing order, then this is called contiguous similarity. Note that an increasing, top-down traversal of the layer is assumed when considering the sample.
[0206] Here, it is assumed that equi-degree points are found in the sample and the original (going clockwise).
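The contiguous-similarity condition can be checked mechanically on the match list $M_o$. The sketch below assumes $\Lambda(M_o) = \{1, \ldots, n\}$ means the $n$ 1-based matches are distinct and cover $1, \ldots, n$; the function name is hypothetical. Read literally, with $m = n$ this admits only the order $1, \ldots, n$ or its reverse.

```python
from typing import Sequence


def is_contiguous(matched: Sequence[int], m: int) -> bool:
    """Contiguous-similarity test on M_o (1-based matches, in the order they were added)."""
    n = len(matched)
    if not (m == n and n > 1):                      # requires (m = n) > 1
        return False
    if sorted(matched) != list(range(1, n + 1)):    # Lambda(M_o) = {1, ..., n}
        return False
    steps = list(zip(matched, matched[1:]))
    # Elements must be added in strictly increasing or strictly decreasing order.
    return all(a < b for a, b in steps) or all(a > b for a, b in steps)


print(is_contiguous([3, 2, 1], 3))  # True
```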
[0207] Both visual pattern matching and automated searching through known patterns help the system operate under partial data, as shown in FIG. 25.
[0208] For example, partial data can be used to detect problems arising from two dialects of the PostScript language that can break workflows in a print shop. Although both dialects may be grammatically correct, the difference manifests in the GPE interface as the presence or absence of the dotted line shown in FIG. 25. Once they have internalized this pattern, both the customer, assumed to be untrained in problem diagnosis, and the trained technician can narrow down the PostScript dialect problem through the representation in FIG. 25. Previously, customers refrained from adjusting workflows, often with other adverse consequences, simply because they did not want to break a fragile workflow.
[0209] According to one exemplary embodiment of this disclosure, identification, categorization and search of pattern ensembles are used for targeted communications associated with a user of a printing device. Running software on a cloud computing platform allows data to be collected at a single location. No data is lost. As the data is collected, a set of ongoing, massive data correlation operations can also be run to discover patterns.
[0210] According to another exemplary embodiment, running prepress software in the cloud provides a solution to some technical support problems:
[0211] For example, by running workflow software on a cloud computing platform, the precise state of a workflow, as well as its associated document, can be captured at the exact moment of job failure. This eliminates the need for technical support to reproduce the problem in its labs before taking action. All of the data about the failure is generated and captured, making debugging relatively straightforward.
[0212] FIG. 26 illustrates an exemplary workflow according to this disclosure.
[0213] Step 1 (200): First, form a comprehensive, unique list of events relating to user activities (such as activities on the mobile device: searching, printing, providing input, responding to suggestions, social network access, etc.) and back-end server actions that facilitate user activities (such as suggesting a friend, running a query to identify conditions, storing output, tagging, etc.). A particular set of entities in the context graph could lead to the definition of user activities and events. Even back-end server actions can be captured for use with pattern ensemble detection by extending the context graph to contain server-side responses.
[0214] Step 2 (205): From these events, delineate the granular components that perform functions for the user or the back-end server. Both the events and the components generated above could be utilized as the axes of the graphical pattern ensemble view. While generating the ensemble, the components and events are arranged in decreasing order of similarity for ease of viewing. This provides clues as to whether the user is accessing similar or completely unrelated services.
[0215] Step 3 (210): Lay out the components as axes in a given layer for representing the GPE, in decreasing order of similarity (or some other metric). Use multiple layers if more dimensions are necessary. Components may also be connected across layers and viewed one layer at a time.
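Step 3 leaves the similarity measure open. One possible sketch, consistent with the convention that neighboring components offer similar functionality, builds the axis order greedily: starting from a chosen component, it repeatedly appends the unplaced component most similar to the last one placed. Describing each component by a set of functional tags and comparing the sets with Jaccard similarity are assumptions made here for illustration; the component names and tags are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two tag sets; two empty sets count as identical."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def order_axes(components: Dict[str, Set[str]], start: str) -> List[str]:
    """Greedy chain: append the unplaced component most similar to the last placed one."""
    order = [start]
    remaining = set(components) - {start}
    while remaining:
        last = components[order[-1]]
        # max() keeps the first maximum, so iterating in sorted order breaks ties alphabetically.
        nxt = max(sorted(remaining), key=lambda c: jaccard(last, components[c]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


components = {
    "search": {"query", "index"},
    "suggest": {"query", "rank"},
    "preview": {"render", "rank"},
    "print": {"render", "spool"},
}
print(order_axes(components, "search"))  # ['search', 'suggest', 'preview', 'print']
```

A greedy chain is simple and deterministic but not globally optimal; any ordering that keeps similar components adjacent would serve the same visual purpose.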
[0216] Step 4 (215): Monitor for real events as users utilize the
system (such as printing a greeting card in uPrint using the mobile
phone as interface). Monitor the components that facilitate the
user interactions. Both user and system events are logged as a
time-sequence. Frequencies of events and order of events are
important as well. These will be utilized in generating the pattern
ensemble.
[0217] Step 5 (220): The interaction of one event/component with another event/component represents a point, and the transition to another point is determined by the time-sequence that was logged in the previous step. Filter criteria, such as considering only the interactions and sequence of events for a single person, group, or time or frequency range, and other dimensions or layers, are also included. Thus the GPE is constructed and, over time, refined depending on the parameters and filter criteria.
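Steps 4 and 5 can be sketched as turning a time-stamped log into weighted transitions between points. In the Python below, each point is an (event, component) pair, transitions are counted only within each user's own time-ordered sequence, and the counts serve as edge weights $k$. The log schema, the per-user grouping and the filter parameters are assumptions, not a prescribed format.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# One logged record: (timestamp, user, event, component); a hypothetical schema.
Record = Tuple[float, str, str, str]
PointKey = Tuple[str, str]  # (event, component) pair plotted as one GPE point


def build_transitions(log: Iterable[Record], users: Optional[Set[str]] = None,
                      t_min: float = float("-inf"), t_max: float = float("inf")) -> Counter:
    """Count point-to-point transitions within each user's time-ordered sequence."""
    per_user: Dict[str, List[Tuple[float, PointKey]]] = defaultdict(list)
    for ts, user, event, component in log:
        # Filter criteria: an optional set of users and a time window.
        if (users is None or user in users) and t_min <= ts <= t_max:
            per_user[user].append((ts, (event, component)))
    weights: Counter = Counter()
    for seq in per_user.values():
        seq.sort()                                # order this user's points by timestamp
        points = [p for _, p in seq]
        weights.update(zip(points, points[1:]))   # each consecutive pair is one transition
    return weights


log = [
    (1.0, "u1", "search", "query-svc"),
    (2.0, "u1", "print", "spooler"),
    (1.5, "u2", "search", "query-svc"),
    (3.0, "u1", "search", "query-svc"),
    (4.0, "u1", "print", "spooler"),
]
w = build_transitions(log)
print(w[(("search", "query-svc"), ("print", "spooler"))])  # 2
```

Grouping by user before pairing avoids inventing transitions between interleaved records from different people; dropping the grouping would instead model the system-wide event stream.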
[0218] Step 6 (225): The GPE will actually be a group of interrelated patterns, so long as a specific set of actions is considered. The GPE can be visualized, and changes in a given layer will be discernible to the human eye as one steps through the time-sequence of events. The GPE is machine searchable, but one can also watch for glitches in the patterns for ease of use with condition or fault checking/monitoring.
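On the machine side, a rough analogue of watching for a glitch is comparing consecutive snapshots of edge weights and flagging edges that appear, vanish, or change sharply. The relative-tolerance rule and its default of 0.5 in this Python sketch are arbitrary assumptions, not values from the disclosure, and the function name is hypothetical.

```python
from typing import Dict, Hashable, Set


def glitches(prev: Dict[Hashable, int], curr: Dict[Hashable, int],
             rel_tol: float = 0.5) -> Set[Hashable]:
    """Edges that appeared, vanished, or changed weight by more than rel_tol
    (relative to the larger of the two weights) between two snapshots."""
    flagged: Set[Hashable] = set()
    for edge in set(prev) | set(curr):
        a, b = prev.get(edge, 0), curr.get(edge, 0)
        if a == b:
            continue
        if min(a, b) == 0 or abs(a - b) / max(a, b) > rel_tol:
            flagged.add(edge)
    return flagged


print(glitches({"x": 10, "y": 4}, {"x": 9, "y": 4, "z": 1}))  # {'z'}
```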
[0219] Step 7 (230): From within the set of events and components, a relevant set of interactions may be filtered, and all patterns governing the interactions between those components/events may be depicted and viewed layer-wise. Patterns will evolve over time, and GPEs need to be updated. As a result, the depiction, the ordering of events and components, and the number of dimensions of quantities measured and tracked (and hence the layers) are some examples of entities that will change over time.
[0220] It will be appreciated that variants of the above-disclosed
and other features and functions, or alternatives thereof, may be
combined into many other different systems or applications. Various
presently unforeseen or unanticipated alternatives, modifications,
variations or improvements therein may be subsequently made by
those skilled in the art which are also intended to be encompassed
by the following claims.
* * * * *
References