U.S. patent application number 13/924343 was filed with the patent office on 2013-06-21 and published on 2014-09-18 for data management platform for digital advertising.
The applicant listed for this patent is Turn Inc. Invention is credited to Songting CHEN, Ali DASDAN, Hazem ELMELEEGY, Santanu KOLAY, Yinan LI, Yan QI, Peter WILMOT, Mingxi WU.
Application Number | 20140279074 13/924343 |
Document ID | / |
Family ID | 51532086 |
Filed Date | 2013-06-21 |
United States Patent Application | 20140279074
Kind Code | A1
CHEN; Songting ; et al. | September 18, 2014
DATA MANAGEMENT PLATFORM FOR DIGITAL ADVERTISING
Abstract
A data management apparatus for digital advertising includes a
data integration processor for collecting and storing data from
providers, resolving heterogeneity of the data at schema and data
levels, and performing validity checks of the data; an analytics
processor for receiving validated data from the data integration
processor and providing to users custom, nesting-aware, SQL-like
query language and a library of data mining methods, machine
learning models, and analytical user profiles (AUP); and an
activation processor for encapsulating complex computations
performed in real-time, segment evaluation, and online user
classification using runtime user profiles (RUP).
Inventors:
CHEN; Songting; (San Jose, CA)
DASDAN; Ali; (San Jose, CA)
ELMELEEGY; Hazem; (San Mateo, CA)
KOLAY; Santanu; (San Jose, CA)
LI; Yinan; (San Jose, CA)
QI; Yan; (Fremont, CA)
WILMOT; Peter; (San Francisco, CA)
WU; Mingxi; (San Mateo, CA)
Applicant:
Name | City | State | Country | Type
Turn Inc. | Redwood City | CA | US |
Family ID: |
51532086 |
Appl. No.: |
13/924343 |
Filed: |
June 21, 2013 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
61801001 | Mar 15, 2013 |
Current U.S. Class: | 705/14.73
Current CPC Class: | G06Q 30/0269 20130101; G06F 40/137 20200101; G06F 40/117 20200101; G06F 16/958 20190101; G06F 3/0482 20130101; G06N 5/02 20130101; G06Q 30/0201 20130101; G06F 3/0484 20130101; G06Q 30/0277 20130101
Class at Publication: | 705/14.73
International Class: | G06Q 30/02 20060101 G06Q030/02
Claims
1. A data management apparatus for digital advertising, comprising:
a data integration processor configured for collecting and storing
data from providers, resolving heterogeneity of the data at schema
and data levels, and performing validity checks of the data; an
analytics processor configured for receiving validated data from
the data integration processor and providing to users a custom,
nesting-aware, SQL-like query language and a library of data mining
methods, machine learning models, and analytical user profiles
(AUP); and an activation processor configured for encapsulating
complex computations performed in real-time, segment evaluation,
and online user classification using runtime user profiles
(RUP).
2. The apparatus of claim 1, wherein user online activity data is
sent from a demand-side platform (DSP) and wherein the user online
activity data comprises any of impressions, clicks, and
actions.
3. The apparatus of claim 1, wherein user online activity data is
obtained from an internal DSP and external DSPs.
4. The apparatus of claim 1, wherein a user profile comprises a
central data repository and the profile comprises any of: marketing
campaign data, online behavior data, and customer relations
management (CRM) data.
5. The apparatus of claim 1, wherein the data integration processor
is configured to: upload offline data files to a Hadoop File
System (HDFS); check the offline data files for data type or out of
range errors; and record lineage by joining the data files with the
user profiles based on user id mappings.
6. The apparatus of claim 1, wherein the data integration processor
is configured to use configuration files for instantiating
different loading templates and a centralized catalog, supporting
schema evolvement, to mitigate heterogeneity issues.
7. The apparatus of claim 1, wherein the data integration processor
is configured to consistently save metrics comprising any of files
that were received and stored, records processed, records rejected,
last successful pipeline step, and profiling times in a
database.
8. The apparatus of claim 1, wherein the analytics processor is
configured to employ optimization techniques for AUP queries, said
techniques comprising any of: correlating sub-queries over nested
tables; organizing columns in the user profile in a PAX-like format
to achieve better compression ratios; and multi-query
execution.
9. The apparatus of claim 1, wherein the analytics processor is
configured to incorporate different multi-touch attribution (MTA)
models as user defined functions (UDFs) into a Cheetah Query
Language (CQL).
10. The apparatus of claim 1, wherein the analytics processor is
configured to allow users to perform their own ad-hoc analysis
to obtain unique insights.
11. The apparatus of claim 1, wherein profile stores are
high-performance key-value stores for RUPs, with keys being user
ids and values being RUPs.
12. The apparatus of claim 1, wherein RUPs are replicated locally
in each data center and globally between data centers to achieve
high availability and local low-latency access.
13. The apparatus of claim 1, wherein the activation processor is
configured to provide a replication bus that incrementally
replicates user events across data centers and distributes the user
events to profile stores to keep RUPs up-to-date.
14. The apparatus of claim 1, wherein the activation processor is
configured to support multiple types of data as any of impression
and click events, structured data events, and arbitrary key-value
pair data events, wherein the data events are available in RUPs in
real-time to use for any of algorithmic computation, decision
making, and analytics, and wherein the data events from RUP are
replicated to AUP.
15. The apparatus of claim 1, wherein the activation processor is
configured to process complex segments, where complex segments are
represented by executable code that is evaluated against the RUP
data in real-time.
16. The apparatus of claim 1, wherein the activation processor is
configured to perform a computation for a particular algorithm that
is complex and requires multiple stages of a computation layer,
comprising a series of real-time MapReduce jobs processing the data
step-by-step, wherein the computation is represented by a
continuous query language or by predefined operators using UDFs,
which operate on RUPs in real-time.
17. The apparatus of claim 1, wherein the activation processor is
configured to generate signals out of the computation layer, store
the signals as unstructured data in RUPs and AUPs, and send back to
clients for any of immediate action or for better user behavior
prediction to achieve better campaign performance.
18. A computer implemented data management method for digital
advertising, comprising: collecting and storing, by a data
integration processor, data from providers, resolving heterogeneity
of the data at schema and data levels, and performing validity
checks of the data; receiving, by an analytics processor, validated
data from the data integration processor and providing to users
custom, nesting-aware, SQL-like query language and a library of
data mining methods, machine learning models, and analytical user
profiles (AUP); and encapsulating, by an activation processor,
complex computations performed in real-time, segment evaluation,
and online user classification using runtime user profiles (RUP).
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. provisional patent
application Ser. No. 61/801,001, filed Mar. 15, 2013, which
application is incorporated herein in its entirety by this
reference thereto.
BACKGROUND OF THE INVENTION
[0002] 1. Technical Field
[0003] The invention relates generally to digital advertising. More
particularly, the invention relates to a data management platform
for digital advertising.
[0004] 2. Description of the Related Art
[0005] Over the last decade, a number of radical changes have
reshaped the worlds of digital advertising, marketing, and media.
The first is an innovation called programmatic buying, which is the
process of executing media buys in an automated fashion through
digital platforms, such as real-time bidding exchanges (RTBEs) and
demand-side platforms (DSPs). This method replaces the traditional
use of manual processes and negotiations to purchase digital media.
Instead, an advertisement (ad) impression is made available through
an auction in a RTBE in real-time. Upon requests from RTBEs, DSPs
then choose to respond with bids and proposed ads on behalf of
their advertisers for this impression. The entire end-to-end buying
process between RTBEs and DSPs typically takes less than 150 ms
including the network time, leaving less than 50 ms for DSPs to run
their runtime pipelines. It is well understood that to make such
dynamic buying decisions optimal, particular data, including user
data, advertiser data, and contextual data, plays a central role.
[0006] A second important shift is the prolific use of mobile
devices, social networks, and video sites. As a result, marketers
have gained powerful tools to reach customers through multiple
channels such as but not limited to mobile, social, video, display,
email, and search. There are numerous platforms dedicated to single
channel optimization. For example, video channel platforms aim to
maximize the user engagements with video ads, while social ad
platforms aim to increase the number of fans and likes of a given
product. Regardless of channel, data-driven approaches have
proven very effective at lifting campaign performance.
[0007] With the advance of such technologies, one challenge to the
marketers today is that the marketing strategy becomes more
complicated than ever before. While much work has been done to
optimize each individual channel, how different channels interact
with each other is little understood. This is however very
important as customers often interact with multiple touch points
through multiple channels. One main obstacle is that while there
are abundant data to leverage, such data may be in different
platforms and in different forms. As a result, it may be a
non-trivial task to create a global dashboard by extracting
aggregated reporting data from different platforms. Performing even
finer-grained analytics across channels, which may be important for
assessing the effectiveness, attribution, and accurate rate of
return of different channels, may be virtually impossible.
[0008] Recently, data management platforms (DMPs) have been
emerging as the solution to address the above challenge. A DMP may
be a central hub to seamlessly and rapidly collect, integrate,
manage, and activate large volumes of data.
SUMMARY OF THE INVENTION
[0009] An embodiment of the invention comprises a data management
platform (DMP) that integrates the following functionalities:
1. Data integration: A DMP is configured to cleanse and integrate
data from multiple platforms or channels with heterogeneous schema.
Importantly, such integration may have to happen at the finest
granular level by linking the same audience or users across
different platforms. By such functionality, a deeper and more
insightful audience analytics may be obtained across campaign
activities.
2. Analytics: A DMP provides full cross channel reporting and
analytics capabilities. Examples may include, but are not limited
to, aggregation, user behavior correlation analysis, multi-touch
attribution, defined as attributing credit to the channels which
contributed to a final action of an audience, tag management,
analytical modeling, etc. Furthermore, such DMP may be delivered
through cloud-based software-as-a-service (SaaS) to end users and
provide them the flexibility to plug in their own analytical
intelligence.
3. Activation: A DMP is configured to not only get data in, but
also send data out in real-time. In other words, such DMP may need
to make the insights actionable. For example, such DMP may be
configured to perform modeling and scoring in real-time by
combining online and offline data and sending the data to other
platforms to optimize the downstream media and enhance the customer
experience.
[0010] Thus, an embodiment of the invention provides a data
management apparatus for digital advertising. A data integration
processor is provided for collecting and storing data from
providers, resolving heterogeneity of the data at schema and data
levels, and performing validity checks of the data. An analytics
processor is provided for receiving validated data from the data
integration processor and providing to users custom, nesting-aware,
SQL-like query language and a library of data mining methods,
machine learning models, and analytical user profiles (AUP).
Further, an activation processor is provided for encapsulating
complex computations performed in real-time, segment evaluation,
and online user classification using runtime user profiles
(RUP).
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIG. 1 is a schematic diagram showing an architecture of a
DMP according to an embodiment of the invention; and
[0012] FIG. 2 is a block schematic diagram of a system in the
exemplary form of a computer system according to an embodiment of
the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0013] An embodiment of the invention comprises a data management
platform (DMP) that integrates the following functionalities:
1. Data integration: A DMP is configured to cleanse and integrate
data from multiple platforms or channels with heterogeneous schema.
Importantly, such integration may have to happen at the finest
granular level by linking the same audience or users across
different platforms. By such functionality, a deeper and more
insightful audience analytics may be obtained across campaign
activities.
2. Analytics: A DMP provides full cross channel reporting and
analytics capabilities. Examples may include, but are not limited
to, aggregation, user behavior correlation analysis, multi-touch
attribution, defined as attributing credit to the channels which
contributed to a final action of an audience, tag management,
analytical modeling, etc. Furthermore, such DMP may be delivered
through cloud-based software-as-a-service (SaaS) to end users and
provide them the flexibility to plug in their own analytical
intelligence.
3. Activation: A DMP is configured to not only get data in, but
also send data out in real-time. In other words, such DMP may need
to make the insights actionable. For example, such DMP may be
configured to perform modeling and scoring in real-time by
combining online and offline data and sending the data to other
platforms to optimize the downstream media and enhance the customer
experience.
[0014] Thus, an embodiment of the invention provides a data
management apparatus for digital advertising. A data integration
processor is provided for collecting and storing data from
providers, resolving heterogeneity of the data at schema and data
levels, and performing validity checks of the data. An analytics
processor is provided for receiving validated data from the data
integration processor and providing to users custom, nesting-aware,
SQL-like query language and a library of data mining methods,
machine learning models, and analytical user profiles (AUP).
Further, an activation processor is provided for encapsulating
complex computations performed in real-time, segment evaluation,
and online user classification using runtime user profiles
(RUP).
[0015] The following is an overview of an exemplary DMP in
accordance with an embodiment of the invention. It has been found
that DMPs are able to handle big data in batch mode, as well as in
real-time, thus unifying techniques from multiple fields of data
science, including databases, data mining, streaming, distributed
systems, key-value stores, and machine learning as disclosed in
K.-C. Lee, B. Orten, A. Dasdan, and W. Li, Estimating Conversion
Rate in Display Advertising from Past Performance Data, in KDD,
pages 768-776, 2012; X. Shao and L. Li. Data-driven Multi-touch
Attribution Models, in KDD, pages 258-264, 2011 ("Shao"); etc.
[0016] The remainder of the discussion herein is organized as a
high-level overview of an embodiment of a DMP and three main
components thereof: data integration, analytics, and
activation.
Audience and Nested Data Model
[0017] In an embodiment of the invention, an audience or user
profile covers available information for a given anonymized user,
including but not limited to, demographics, psychographics,
campaign, and behavioral data. User profile data may be typically
collected from various sources. Such data may be first party data,
i.e. historical user data collected by advertisers in their own
private customer relationship management (CRM) systems, or third
party data, i.e. data provided by third party data partners,
typically each specializing in a specific type of data, e.g. credit
scores, buying intentions, etc. In one embodiment, user profiles
are treated as first-class citizens and are the basic units for
offline analytics, as well as for real-time applications.
[0018] In an embodiment of the invention, user profile data may
arrive in various types, formats, and cardinalities, which may be
best captured using a nested relational data model. Logically, each
user profile is one record, where some attributes of this record
could be another table storing a certain type of events. In addition
to the digital marketing domain, the use of the nested relational
data model has already gained wide adoption in the field of big
data such as disclosed in S. Melnik, A. Gubarev, J. J. Long, G.
Romer, S. Shivakumar, M. Tolton, and T. Vassilakis. Dremel:
Interactive analysis of web-scale datasets. PVLDB, 3(1):330-339,
2010.
[0019] Based on the above-described functionality of DMPs, in
accordance with an embodiment of the invention, a DMP maintains two
versions of the user profiles. First, the analytical user profile
(AUP) is designed for the purpose of offline analytics and data
mining. In an embodiment of the invention, the AUP is stored in a
Hadoop File System (HDFS), such as disclosed in Hadoop, Open Source
Implementation of MapReduce at hadoop.apache.org. Second, a runtime
user profile (RUP) is stored in a globally replicated key-value
store to enable fast and reliable retrieval in a few milliseconds for
real-time applications.
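As a rough illustration of the nested relational model described above, a single user profile can be pictured as one record whose attributes include nested tables of events. The sketch below is an assumption made for illustration only: the class names (UserProfile, Impression, Action) and the attribute key "age_band" are hypothetical and are not taken from the disclosed system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Impression:
    impression_id: str
    advertiser: str
    ts: int  # event time stamp, e.g. seconds since epoch


@dataclass
class Action:
    advertiser: str
    ts: int


@dataclass
class UserProfile:
    """One record per anonymized user; some attributes are nested tables."""
    uid: str
    attributes: Dict[str, str] = field(default_factory=dict)
    impressions: List[Impression] = field(default_factory=list)  # nested table
    actions: List[Action] = field(default_factory=list)          # nested table


# Hypothetical profile with one impression followed by one action.
profile = UserProfile(uid="u1", attributes={"age_band": "25-34"})
profile.impressions.append(Impression("i1", "acme", ts=100))
profile.actions.append(Action("acme", ts=250))
```

The same logical record could back both the AUP and the RUP; only the storage layer (HDFS versus a key-value store) would differ.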
System Architecture
[0020] An embodiment of the invention can be understood with
reference to FIG. 1, which shows an overall architecture of a DMP
3402. The DMP 3402 comprises three key components: a data
integration engine 3404; an analytics engine 3406; and a real-time
activation engine 3408, also referred to herein as a runtime
engine.
[0021] The data integration engine 3404, referred to herein as the
Datahub, is responsible for gathering and storing data from first
and third party providers, resolving heterogeneity at schema and
data levels, e.g. disparate user ids, and performing the necessary
validity checks. Once the data are received from the external
partners, such data then flow into the other two components. In an
embodiment, the analytics engine 3406 may be known as Cheetah, in
S. Chen, "Cheetah: A high performance, custom data warehouse on top
of mapreduce," PVLDB, 3(2):1459-1468, 2010 ("Cheetah document") and
has an AUP store as a data layer. The analytics engine 3406
provides the data analysts with a custom, nesting-aware, SQL-like
query language called Cheetah Query Language (CQL), in addition to
a rich library of data mining methods and machine learning models.
In an embodiment, a runtime engine 3408 runs on top of an RUP
store. The runtime engine 3408 encapsulates the complex
computations performed in real-time, such as but not limited to
segment evaluation, online user classification, etc.
Ecosystem
[0022] FIG. 1 also depicts the interaction between the DMP 3402 and
the other components in the digital advertising ecosystem in
accordance with an embodiment of the invention. The DSPs 3410 are
the entities responsible for real-time bidding (RTB) or
programmatic buying of ad space from publishers, i.e. supply side,
on behalf of advertisers 3411, i.e. demand side. The DSPs 3410
interact directly with the runtime engine 3408 of the DMP 3402 to
obtain the information that is necessary for ad selection and bid
optimization. The DSPs 3410 mainly respond to bid requests from the
RTBEs. The RTBEs are the entities where publishers make their
inventory of ad space available for the highest bidders. The RTBEs
have support for public exchanges 3412 and private exchanges 3414.
Unlike public exchanges 3412, private exchanges 3414 provide
publishers 3416 with more control over which advertisers may use
their media channels and which ads may run on their media
channels.
[0023] In an embodiment of the invention, user online activity
data, e.g. in the form of impressions, clicks, and actions, is sent
to the DMP 3402 by the DSPs 3410, e.g. for impression and click
data, and the advertiser, e.g. for action data. Because an
embodiment of the DMP may be integrated both with its own DSP, as
well as other DSPs in the ecosystem, the online data may be
obtained from those external DSPs. In such cases, they are
considered to be third party media providers 3418, analogous to
third party offline data providers 3420.
[0024] In the following discussion, particular, important
components of one or more embodiments of the DMP 3402 are explained
in more detail.
Data Integration
[0025] In an embodiment of the invention, a user profile may be a
central data repository in the DMP. Each profile contains marketing
campaign data, online behavior data, CRM data, etc. Some of such
data are collected online, while others are collected by loading
offline data files, which are typically keyed off disparate user
ids from another platform.
[0026] In an embodiment of the invention, the DMP designed
integration software is referred to as the Datahub and is used to
receive offline data files. At a high level, the Datahub implements
three steps:
[0027] Upload offline data files to HDFS;
[0028] Check the offline data files for data type or out of range
errors; and
[0029] Record lineage by joining the data files with the user
profiles based on user id mappings.
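The validation and join steps listed above can be sketched as follows. This is a minimal, in-memory illustration under stated assumptions, not the Datahub implementation: the upload to HDFS is omitted, and the field names ("partner_id", "credit_score"), the accepted score range, and the file name are all hypothetical.

```python
from typing import Dict, List, Tuple


def validate(record: Dict[str, str]) -> bool:
    """Reject records with a data type or out-of-range error (hypothetical rule)."""
    try:
        score = int(record["credit_score"])
    except (KeyError, ValueError):
        return False
    return 300 <= score <= 850


def load_offline_file(
    records: List[Dict[str, str]],
    id_map: Dict[str, str],
    profiles: Dict[str, dict],
    source_file: str,
) -> Tuple[int, int]:
    """Validate records, join them to profiles via the id mapping, record lineage."""
    processed = rejected = 0
    for rec in records:
        uid = id_map.get(rec.get("partner_id", ""))
        if uid is None or uid not in profiles or not validate(rec):
            rejected += 1
            continue
        # Lineage: remember which file each loaded value came from.
        profiles[uid].setdefault("offline", []).append(
            {"credit_score": int(rec["credit_score"]), "lineage": source_file}
        )
        processed += 1
    return processed, rejected


profiles = {"u1": {}}
id_map = {"p-9": "u1"}  # partner user id -> platform user id
rows = [
    {"partner_id": "p-9", "credit_score": "720"},  # valid
    {"partner_id": "p-9", "credit_score": "abc"},  # data type error
    {"partner_id": "p-0", "credit_score": "700"},  # no id mapping
]
counts = load_offline_file(rows, id_map, profiles, "partner_2013_06.csv")
```

The returned pair (records processed, records rejected) corresponds to the kind of metrics the Datahub is described as saving.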
[0030] In an embodiment of the invention, the Datahub handles
scalability through multiple FTP servers, multi-pipeline concurrent
loading, a Hadoop MapReduce computation model, and its own job
scheduler to prioritize specific jobs. The Datahub also achieves
immunity from bad data by an initial validation of offline data,
thus shielding an embodiment of the DMP from any dirty data.
Additionally, the Datahub uses configuration files for
instantiating different loading templates and a centralized
catalog, supporting schema evolvement, to mitigate heterogeneity
issues. The Datahub consistently saves metrics, such as files that
were received and stored, records processed, records rejected, last
successful pipeline step, and profiling times, in a database. Such
monitoring information enables system alerts, client notifications,
and billing statements. Furthermore, the Datahub may be configured
to recover after failure through a fault tolerance protocol relying
on persistent status files. As well, by leveraging the nested data
model of user profiles, the Datahub may incorporate more custom
logic into a join algorithm, e.g. two data files may easily be
differentiated and loaded incrementally.
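One way to picture recovery through persistent status files is a pipeline that records each completed step and, on restart, skips the steps already recorded. The sketch below is only an assumption about how such a protocol could look; the step names, the JSON layout, and the function run_pipeline are hypothetical.

```python
import json
import os
import tempfile

STEPS = ["upload", "validate", "join"]


def run_pipeline(status_path, handlers):
    """Run the steps in order, skipping those already recorded as done."""
    done = []
    if os.path.exists(status_path):
        with open(status_path) as fh:
            done = json.load(fh)["done"]
    for step in STEPS:
        if step in done:
            continue
        handlers[step]()  # may raise; earlier steps stay recorded on disk
        done.append(step)
        tmp_path = status_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump({"done": done}, fh)
        os.replace(tmp_path, status_path)  # atomic swap of the status file
    return done


calls = []
status_file = os.path.join(tempfile.mkdtemp(), "status.json")


def failing_validate():
    raise RuntimeError("simulated crash during validation")


try:
    run_pipeline(status_file, {
        "upload": lambda: calls.append("upload"),
        "validate": failing_validate,
        "join": lambda: calls.append("join"),
    })
except RuntimeError:
    pass

# The second run resumes after the last successful step.
finished = run_pipeline(status_file, {
    "upload": lambda: calls.append("upload"),
    "validate": lambda: calls.append("validate"),
    "join": lambda: calls.append("join"),
})
```

After the simulated crash, the second run executes only the validate and join steps, because the upload step is already listed in the status file.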
Analytics
Cheetah Query Language
[0031] In an embodiment of the invention, analytics over AUPs may
be based on Cheetah, which is a high performance, custom data
warehouse, as disclosed in the Cheetah document, supra. Cheetah has
a SQL-like query language (CQL), which also supports queries over
nested data models. Below is an example query:
TABLE-US-00001
SELECT advertiser, count(*) actions, count(distinct uid)
FROM prof.actions a
WHERE (SELECT count(impression_id)
       FROM prof.impressions b
       WHERE a.advertiser = b.advertiser and b.ts < a.ts) > 0
GROUP BY advertiser
[0032] In an embodiment of the invention, there are two nested
tables in the user profile: prof.actions and prof.impressions,
which record user's actions or conversions and impressions,
respectively. Both nested tables have the field, advertiser, to
identify which advertiser the action/impression is related to; and
the field, ts, as the time stamp. Therefore, the query above
applies GROUP-BY to the column advertiser of the nested table
prof.actions, to compute the total occurrence of actions, i.e.
count(*), and the number of users who have the action, i.e.
count(distinct uid), given the WHERE clause indicating at least one
impression from the same advertiser should take place before an
action. The filtering condition is composed of a sub-query, which
calculates the total number of impressions occurring before the
concerned action, i.e. b.ts<a.ts, from the same advertiser, i.e.
a.advertiser=b.advertiser, by querying on the nested table
prof.impressions of the same user profile.
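To make the semantics of the example query concrete, the same computation can be written as a plain loop over nested profiles. This is a sketch of what the query computes, not of how Cheetah executes it; the dictionary layout and the function name actions_after_impression are assumptions made for illustration.

```python
from collections import defaultdict


def actions_after_impression(profiles):
    """Per advertiser: (number of qualifying actions, number of distinct users)."""
    action_counts = defaultdict(int)
    users = defaultdict(set)
    for prof in profiles:
        for a in prof["actions"]:
            # Correlated sub-query: earlier impressions from the same
            # advertiser within the same user profile.
            prior = sum(
                1
                for b in prof["impressions"]
                if b["advertiser"] == a["advertiser"] and b["ts"] < a["ts"]
            )
            if prior > 0:
                action_counts[a["advertiser"]] += 1
                users[a["advertiser"]].add(prof["uid"])
    return {adv: (action_counts[adv], len(users[adv])) for adv in action_counts}


profiles = [
    {
        "uid": "u1",
        "impressions": [{"advertiser": "acme", "ts": 10}],
        "actions": [
            {"advertiser": "acme", "ts": 20},
            {"advertiser": "acme", "ts": 30},
            {"advertiser": "zeta", "ts": 40},  # no impression from zeta
        ],
    },
    {
        "uid": "u2",
        "impressions": [{"advertiser": "acme", "ts": 50}],
        "actions": [{"advertiser": "acme", "ts": 5}],  # action precedes impression
    },
]
result = actions_after_impression(profiles)
```

Because the sub-query only touches the nested tables of one profile at a time, no cross-user join is needed.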
[0033] In an embodiment of the invention, Cheetah employs a number
of optimization techniques for AUP queries. To name a few, but not
to be limiting: [0034] Cheetah optimizes correlated sub-queries
over nested tables as in the above example. [0035] Cheetah
organizes the columns in the user profile in the PAX-like format to
achieve better compression ratios, such as disclosed in A.
Ailamaki, D. J. DeWitt, M. D. Hill, and M. Skounakis. Weaving
Relations for Cache Performance, in VLDB, pages 169-180, 2001.
Cheetah also leverages the Hadoop column store format as disclosed
in Trevni: A column file format,
avro.apache.org/docs/current/trevni/spec.html to avoid scanning
irrelevant data. [0036] Multi-query execution: Cheetah allows
multiple queries to be submitted simultaneously and executed in a
batch mode, where the input data are scanned only once for all
those queries.
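The idea behind multi-query execution, a single scan of the input shared by every query in the batch, can be sketched as follows. The classes and names here (CountQuery, run_batch) are hypothetical and stand in for far richer query operators.

```python
from typing import Callable, Iterable, List


class CountQuery:
    """A toy query: count the rows that satisfy a predicate."""

    def __init__(self, predicate: Callable[[dict], bool]) -> None:
        self.predicate = predicate
        self.count = 0

    def consume(self, row: dict) -> None:
        if self.predicate(row):
            self.count += 1


def run_batch(rows: Iterable[dict], queries: List[CountQuery]) -> List[int]:
    """Feed every row to every query so the input is read exactly once."""
    for row in rows:
        for query in queries:
            query.consume(row)
    return [query.count for query in queries]


rows = [
    {"advertiser": "acme", "ts": 10},
    {"advertiser": "zeta", "ts": 20},
    {"advertiser": "acme", "ts": 30},
]
# A generator can be consumed only once, which demonstrates the single scan.
results = run_batch(
    (row for row in rows),
    [
        CountQuery(lambda r: r["advertiser"] == "acme"),
        CountQuery(lambda r: r["ts"] > 15),
    ],
)
```

When the scan dominates the cost, as it typically does for large profile stores on disk, adding a query to the batch costs far less than running it separately.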
Advanced Analytics
[0037] In an embodiment of the invention, CQL allows for SQL-based
aggregations and correlations between different audience events.
Sometimes, marketers look for more advanced analytics, such as
modeling and machine learning. One example is multi-touch
attribution (MTA) as described in Shao, supra.
[0038] In an implementation of an embodiment, MTA is a billing
model that defines how advertisers distribute credit, e.g. customer
purchase, to their campaigns in different media channels, e.g.
video, display, mobile, etc. For example, suppose a user sees a car
ad on a Web browser. Later, the user sees a TV commercial about the
same car, which makes the user more interested. Finally, after the
user sees the ad again on a mobile phone, the user takes action and
registers for a test drive. Marketers know that such media channels
may contribute to a final conversion of an audience. However, a
current common practice is last-touch attribution (LTA), where the
last impression, the one on the mobile phone, gets the credit. A
better and fairer advertising ecosystem is expected to distribute
the credit to the channels that contributed to the user's final action.
This is the so-called multi-touch attribution problem. In an
embodiment of the DMP, different MTA models are incorporated as
user defined functions (UDFs) into CQL. This way, CQL users have
the freedom to feed an MTA algorithm with arbitrary input data.
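The difference between last-touch and multi-touch attribution in the car example can be sketched with two small functions of the kind that might be registered as UDFs. The equal-split (linear) rule used here is a deliberately simple stand-in chosen for brevity; it is not one of the data-driven models of Shao, and the function names and the Touch layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Touch = Tuple[int, str]  # (time stamp, channel)


def last_touch(touches: List[Touch], credit: float = 1.0) -> Dict[str, float]:
    """Give all credit to the channel of the most recent touch."""
    if not touches:
        return {}
    _, channel = max(touches, key=lambda t: t[0])
    return {channel: credit}


def linear_touch(touches: List[Touch], credit: float = 1.0) -> Dict[str, float]:
    """Split the credit equally across all touches (illustrative rule only)."""
    shares: Dict[str, float] = defaultdict(float)
    for _, channel in touches:
        shares[channel] += credit / len(touches)
    return dict(shares)


# Web display ad, then TV commercial, then mobile ad, then the conversion.
touches = [(1, "display"), (2, "tv"), (3, "mobile")]
lta = last_touch(touches)
mta = linear_touch(touches)
```

Under last-touch attribution the mobile channel receives the whole credit, whereas the equal-split rule gives each of the three channels one third.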
[0039] In an embodiment of the invention, CQL as well as the data
mining UDFs are exposed to external clients as a data service in
the cloud and are configured such that the external clients may
perform ad hoc analysis and obtain unique insights on their
own.
Real-Time Activation
Runtime User Profile
[0040] For purposes of understanding herein, RUPs may refer to user
profiles stored in profile stores for real-time applications. In an
embodiment of the invention, as with AUPs, RUPs also have a nested
data model and are updated incrementally and in real-time with new
user events. Profile stores, as with other Not only SQL (NoSQL)
systems, are high-performance, key-value stores for RUPs, with keys
being user ids and values being RUPs. As important runtime
components, profile stores are highly optimized to provide
low-latency read/write RUP access, typically within a few
milliseconds to support peak 1,000,000 queries per second across
multiple, geographically distributed data centers.
[0041] A design of an embodiment of a profile store is inspired by
G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman,
A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels'
disclosure entitled, "Dynamo: Amazon's highly available key-value
store," in SOSP, pages 205-220, 2007 and by Voldemort, which may be
found at www.project-voldemort.com/voldemort. In an embodiment of
the invention, a software layer is built on top of Berkeley DB
(BDB) that uses consistent hashing to achieve sharding,
replication, consistency, and fault tolerance. The embodiment of
the profile store also employs flash drives because hard disks may
not be fast enough for the purpose. RUPs are replicated locally in
each data center, as well as globally between data centers, to
achieve high availability and local low-latency access.
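Consistent hashing, which the profile store is described as using for sharding and replication, can be sketched as a hash ring with virtual nodes. The code below is a generic textbook-style illustration, not the disclosed software layer; the class HashRing, the node names, the number of virtual nodes, and the choice of MD5 as the ring hash are all assumptions.

```python
import bisect
import hashlib
from typing import List


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Maps user ids to storage nodes; each node owns many points on the ring."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, uid: str, n: int = 2) -> List[str]:
        """Walk clockwise from the key's position and collect n distinct nodes."""
        n = min(n, self._node_count)
        i = bisect.bisect_right(self._points, _hash(uid))
        chosen: List[str] = []
        while len(chosen) < n:
            node = self._ring[i % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen


ring = HashRing(["store-a", "store-b", "store-c"])
owners = ring.replicas("user-42", n=2)  # primary first, then one replica
```

A useful property of this scheme is that removing one node only remaps the keys that node was responsible for; keys whose primary was another node keep the same primary.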
[0042] In an embodiment of the invention, to guarantee real-time
synchronized RUPs in every data center, an infrastructure called
the replication bus is built and employed that incrementally
replicates user events across data centers and distributes such to
profile stores to keep RUPs up-to-date. The replication bus is
highly optimized to synchronize tens of billions of events daily
between data centers with an average end-to-end service level
agreement (end-to-end SLA) of within a few seconds.
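Incremental replication of the kind attributed to the replication bus can be pictured as an append-only event log with a separate read position per data center, so that each synchronization ships only the events that data center has not yet applied. This is a single-process sketch under that assumption; the class ReplicationBus and its methods are hypothetical, and networking, ordering guarantees, and failure handling are left out.

```python
from typing import Dict, List


class ReplicationBus:
    """Append-only event log with one read offset per data center."""

    def __init__(self) -> None:
        self._log: List[dict] = []
        self._offsets: Dict[str, int] = {}

    def publish(self, event: dict) -> None:
        self._log.append(event)

    def sync(self, data_center: str, profile_store: Dict[str, List[dict]]) -> int:
        """Apply only the events this data center has not yet seen."""
        start = self._offsets.get(data_center, 0)
        pending = self._log[start:]
        for event in pending:
            profile_store.setdefault(event["uid"], []).append(event)
        self._offsets[data_center] = start + len(pending)
        return len(pending)


bus = ReplicationBus()
east: Dict[str, List[dict]] = {}
west: Dict[str, List[dict]] = {}

bus.publish({"uid": "u1", "type": "impression", "ts": 1})
first = bus.sync("east", east)     # ships one event
bus.publish({"uid": "u1", "type": "click", "ts": 2})
second = bus.sync("east", east)    # ships only the new click
catch_up = bus.sync("west", west)  # a lagging data center gets both events
```

After the three calls, both profile stores hold the same two events for the user, even though they were synchronized at different times.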
[0043] Real-Time Processing Pipeline
[0044] It has been found that an important feature of a modern data
management platform (DMP) is its ability to cope with data flow in
real-time. An embodiment of the DMP herein disclosed is equipped
with many real-time data processing components. In an embodiment of
the invention, some real-time DMP components may consist of data,
analytics, user modeling, complex event processing (CEP), and
actionable signal generation.
[0045] In an embodiment of the invention, both AUP and RUP may
store arbitrary user level data in a nested format. Ingress servers
are responsible for receiving and storing data, looking up user
profiles in real-time, and performing mapping between cookies when
necessary. The platform supports multiple types of data, such as
impression/click events, structured data events, or arbitrary
key-value pair data events. These data events are available in RUPs
in real-time for the platform to use for algorithmic computation
and decision making as well as analytics. Eventually such events
from RUP are replicated to AUP.
[0046] In an embodiment of the invention, the platform supports
multiple real-time operations on the received data. Many of such
operations may be modeled as complex event processing. For example,
one entity might want to find if a user belongs to a particular set
of predefined segments in real-time. The segments are represented
as a complex Boolean expression of attributes defined by some
predefined taxonomy. Often the segments may be significantly more
complicated than simple Boolean expressions, e.g. having some user
behavior constraints such as having seen a display advertisement in
the last seven days. Such complex segments may be represented by
some form of executable code that is evaluated against the RUP data
in real-time.
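A complex segment of this kind can be sketched as a small composable predicate that is evaluated against a runtime profile. The example combines a Boolean expression over attributes with the behavioral constraint mentioned above (a display advertisement seen in the last seven days). The helper names, the profile layout, and the attribute values are hypothetical and serve only to illustrate the idea of representing a segment as executable code.

```python
from typing import Callable, Dict

Predicate = Callable[[Dict, int], bool]  # (RUP, current time) -> membership
SEVEN_DAYS = 7 * 24 * 3600  # seconds


def has_attr(name: str, value: str) -> Predicate:
    return lambda rup, now: rup.get("attributes", {}).get(name) == value


def saw_channel_within(channel: str, window: int) -> Predicate:
    """Behavioral constraint: an impression on the channel inside the window."""
    return lambda rup, now: any(
        e["channel"] == channel and 0 <= now - e["ts"] <= window
        for e in rup.get("impressions", [])
    )


def all_of(*preds: Predicate) -> Predicate:
    return lambda rup, now: all(p(rup, now) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    return lambda rup, now: any(p(rup, now) for p in preds)


# (interest = autos OR interest = travel) AND saw a display ad in the last 7 days
segment = all_of(
    any_of(has_attr("interest", "autos"), has_attr("interest", "travel")),
    saw_channel_within("display", SEVEN_DAYS),
)

now = 1_000_000
rup = {
    "attributes": {"interest": "autos"},
    "impressions": [{"channel": "display", "ts": now - 2 * 24 * 3600}],
}
in_segment = segment(rup, now)
```

Because the segment is an ordinary function of the profile and the current time, the same definition can be re-evaluated whenever a new event updates the RUP.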
[0047] Another example use of real-time computation on a user
profile in accordance with an embodiment of the invention involves
evaluating a user against machine-learned models. Such models may
be specified by the users of the DMP in some proprietary format or
by using industry standard model specification language, such as
Predictive Model Markup Language (PMML), an example of which may be
found at en.wikipedia.org/wiki/Predictive_Model_Markup_Language. An
example model may predict a car buyer based on the latest online
activity or a person likely to apply for a credit card. Having such
knowledge in real-time may be immensely valuable to clients because
they may use such prediction as signals to bias the campaigns or
take other actions in real-time.
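A minimal sketch of scoring a user against a machine-learned model
follows. It assumes, purely for illustration, a logistic regression
model supplied as a plain dictionary of coefficients; it does not
parse PMML, and the feature names and the functions score and
classify are hypothetical.

```python
import math
from typing import Dict, Mapping


def score(model: Mapping, features: Dict[str, float]) -> float:
    """Return the predicted probability for one user's features."""
    z = model["intercept"] + sum(
        weight * features.get(name, 0.0)
        for name, weight in model["weights"].items())
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(model: Mapping, features: Dict[str, float],
             threshold: float = 0.5) -> bool:
    """Turn the probability into a yes/no signal for downstream use."""
    return score(model, features) >= threshold


# Hypothetical "likely car buyer" model; the weights are made up.
car_buyer_model = {
    "intercept": -2.0,
    "weights": {"dealer_site_visits_7d": 0.8, "auto_review_reads_7d": 0.5},
}
```

The Boolean returned by classify is the kind of signal a client might
use to bias a campaign; the 0.5 threshold is an arbitrary default.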
[0048] In an embodiment of the invention, a computation for a
particular algorithm may be complex enough to require multiple
stages of a computation layer. Such a style of computation may be
thought of as a series of real-time MapReduce jobs processing the
data step-by-step. The computation is represented by a continuous
query language or by predefined operators using UDFs, which operate
on RUPs in real-time. Such an approach may solve complex tasks,
such as learning a classification
model, performing anomaly detection, or performing other data
stream algorithms, such as maintaining top-K elements in a stream,
as disclosed for example in A. Metwally, D. Agrawal, and A. El
Abbadi. Efficient computation of frequent and top-k elements in
data streams. In ICDT, pages 398-412, 2005.
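To make the top-K example concrete, the following is a simplified
sketch of the Space-Saving counter scheme from the cited Metwally et
al. paper. It keeps a fixed number of counters and, when a new item
arrives with no free counter, replaces the item with the smallest
count. The class and method names are assumptions, and the linear
scan for the minimum is a simplification; the paper describes a
dedicated summary structure that avoids that scan.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Approximate top-K over a stream using a bounded set of counters."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}  # item -> estimated count
        self.errors: Dict[Hashable, int] = {}  # item -> max overestimate

    def offer(self, item: Hashable) -> None:
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest estimated count and let
            # the newcomer inherit that count as its possible error.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top_k(self, k: int) -> List[Tuple[Hashable, int]]:
        ranked = sorted(self.counts.items(),
                        key=lambda pair: pair[1], reverse=True)
        return ranked[:k]
```

Each estimated count may exceed the true count by at most the stored
error for that item, so a larger capacity gives tighter estimates at
the cost of more memory per stream.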
[0049] In an embodiment of the invention, signals generated out of
the computation layer are stored as unstructured data in RUPs and
AUPs and may also be sent back to the clients through egress
servers for immediate action. DSPs or other platforms may
immediately leverage such signals for better user behavior
prediction to achieve better campaign performance.
Conclusion
[0050] Digital advertising has now reached a state where the
pipeline between publishers on the supply side and advertisers on
the demand side necessitates many technology partners to help
publishers and advertisers deal with real-time optimal decisioning
on a huge scale. Among such technology partners, data management
platforms may occupy a prominent role as the hub where data
relevant to reaching the audience over different channels is
integrated, analyzed, and shared. A high-level overview of one or
more embodiments of the DMP as an example demand side platform has
been disclosed. It is contemplated that due to efficiencies gained
through real-time decisioning and the scales involved with more
online usage, the future of advertising may be more real-time,
which may imply more data and components in real-time.
Machine Implementation
[0051] FIG. 2 is a block schematic diagram of a system in the
exemplary form of a computer system 3500 within which a set of
instructions for causing the system to perform any one of the
foregoing methodologies may be executed. In alternative
embodiments, the system may comprise a network router, a network
switch, a network bridge, personal digital assistant (PDA), a
cellular telephone, a Web appliance or any system capable of
executing a sequence of instructions that specify actions to be
taken by that system.
[0052] The computer system 3500 includes a processor 3502, a main
memory 3504 and a static memory 3506, which communicate with each
other via a bus 3508. The computer system 3500 may further include
a display unit 3510, for example, a liquid crystal display (LCD) or
a cathode ray tube (CRT). The computer system 3500 also includes an
alphanumeric input device 3512, for example, a keyboard; a cursor
control device 3514, for example, a mouse; a disk drive unit 3516,
a signal generation device 3518, for example, a speaker, and a
network interface device 3520.
[0053] The disk drive unit 3516 includes a machine-readable medium
3524 on which is stored a set of executable instructions, i.e.
software, 3526 embodying any one, or all, of the methodologies
described herein. The software 3526 is also shown to reside,
completely or at least partially, within the main memory 3504
and/or within the processor 3502. The software 3526 may further be
transmitted or received over a network 3528, 3530 by means of a
network interface device 3520.
[0054] In contrast to the system 3500 discussed above, a different
embodiment uses logic circuitry instead of computer-executed
instructions to implement processing entities. Depending upon the
particular requirements of the application in the areas of speed,
expense, tooling costs, and the like, this logic may be implemented
by constructing an application-specific integrated circuit (ASIC)
having thousands of tiny integrated transistors. Such an ASIC may
be implemented with CMOS (complementary metal oxide semiconductor),
TTL (transistor-transistor logic), VLSI (very-large-scale
integration), or another suitable construction. Other alternatives
include a digital signal processing chip (DSP), discrete circuitry
(such as resistors, capacitors, diodes, inductors, and
transistors), field programmable gate array (FPGA), programmable
logic array (PLA), programmable logic device (PLD), and the
like.
[0055] It is to be understood that embodiments may be used as or to
support software programs or software modules executed upon some
form of processing core (such as the CPU of a computer) or
otherwise implemented or realized upon or within a system or
computer readable medium. A machine-readable medium includes any
mechanism for storing or transmitting information in a form
readable by a machine, e.g. a computer. For example, a machine
readable medium includes read-only memory (ROM); random access
memory (RAM); magnetic disk storage media; optical storage media;
flash memory devices; electrical, optical, acoustical or other form
of propagated signals, for example, carrier waves, infrared
signals, digital signals, etc.; or any other type of media suitable
for storing or transmitting information.
[0056] Further, it is to be understood that embodiments may include
performing operations and using storage with cloud computing. For
the purposes of discussion herein, cloud computing may mean
executing algorithms on any network that is accessible by
internet-enabled or network-enabled devices, servers, or clients
and that does not require complex hardware configurations, e.g.
requiring cables, or complex software configurations, e.g.
requiring a consultant to install. For example, embodiments may
provide one or more cloud computing solutions that enable users,
e.g. users on the go, to obtain advertising analytics or universal
tag management in accordance with embodiments herein on such
internet-enabled or other network-enabled devices, servers, or
clients. It further should be appreciated that one or more cloud
computing embodiments may include providing or obtaining
advertising analytics or performing universal tag management using
mobile devices, tablets, and the like, as such devices are becoming
standard consumer devices.
[0057] Although the invention is described herein with reference to
the preferred embodiment, one skilled in the art will readily
appreciate that other applications may be substituted for those set
forth herein without departing from the spirit and scope of the
present invention. Accordingly, the invention should only be
limited by the Claims included below.
* * * * *
References