U.S. patent application number 13/949472 was filed with the patent office on 2013-07-24 and published on 2013-11-21 for automated channel abstraction for advertising auctions.
This patent application is currently assigned to CombineNet, Inc. The applicant listed for this patent is CombineNet, Inc. Invention is credited to Craig E. Boutilier, George L. Nemhauser, David C. Parkes, Tuomas Sandholm, Robert L. Shields, JR., William E. Walsh.
Application Number | 20130311304 13/949472 |
Family ID | 42166030 |
Filed Date | 2013-07-24 |
United States Patent Application | 20130311304 |
Kind Code | A1 |
Boutilier; Craig E. ; et al. | November 21, 2013 |
Automated Channel Abstraction for Advertising Auctions
Abstract
In a computer-implemented method of determining an abstraction
of a plurality of differentiated goods available for exchange, data
regarding each differentiated good is stored in a computer storage,
wherein the data regarding each differentiated good includes an
attribute value assigned to at least one attribute of the
differentiated good. A processor of a computer determines a first
abstraction of the plurality of differentiated goods based on the
stored data. The first abstraction includes at least one abstract
good. Each abstract good includes one or more differentiated goods.
At least one abstract good of the first abstraction includes at
least two distinct differentiated goods. The processor determines
for each abstract good a specification for the abstract good based
on the data regarding one or more differentiated goods forming the
abstract good. The processor stores in the computer storage the
specification determined for each abstract good.
Inventors: |
Boutilier; Craig E. (Toronto, CA);
Nemhauser; George L. (Atlanta, GA);
Parkes; David C. (Cambridge, MA);
Sandholm; Tuomas (Pittsburgh, PA);
Shields, JR.; Robert L. (Pittsburgh, PA);
Walsh; William E. (Pittsburgh, PA) |
Applicant: |
Name | City | State | Country | Type |
CombineNet, Inc. | Pittsburgh | PA | US | |
Assignee: | CombineNet, Inc., Pittsburgh, PA |
Family ID: | 42166030 |
Appl. No.: | 13/949472 |
Filed: | July 24, 2013 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
12616217 | Nov 11, 2009 | 8515814 |
13949472 | | |
61113306 | Nov 11, 2008 | |
61151333 | Feb 10, 2009 | |
Current U.S. Class: | 705/14.71 |
Current CPC Class: | G06Q 30/0254 20130101; G06Q 30/0275 20130101; G06Q 30/02 20130101; G06Q 30/08 20130101; G06Q 10/087 20130101; G06Q 30/0246 20130101; G06Q 30/0204 20130101; G06Q 30/0601 20130101 |
Class at Publication: | 705/14.71 |
International Class: | G06Q 30/02 20060101 G06Q030/02 |
Claims
1. A computer-implemented method for determining an allocation of
an abstraction of a plurality of differentiated goods to a set of
offers comprising: (a) storing in a computer storage accessible to
a processor of a computer an abstraction of a supply of
differentiated goods, said abstraction comprising a set of one or
more abstract goods, each abstract good an aggregation comprising
at least one differentiated good, and at least one abstract good in
the abstraction comprises at least two differentiated goods; (b)
storing in the computer storage data about at least one offer to
purchase, where each offer to purchase includes at least one set of
at least one of the plurality of differentiated goods and a price
associated with said set of differentiated goods; (c) storing in
the computer storage a current allocation of none or a portion of
each abstract good to each offer to purchase; (d) the processor
determining for each abstract good whether the current allocation
is feasible or infeasible, wherein the current allocation is
feasible when there is an allocation of differentiated goods to
each offer to purchase assigned a portion of the abstract good in
the current allocation such that: (i) the allocated differentiated
goods are contained in the allocated abstract good; (ii) the
allocated differentiated goods lie within the one or more sets of
goods associated with said offer to purchase; and (iii) the
quantity of allocated differentiated goods meets or exceeds the
portion of the allocation of the abstract good allocated to the
offer to purchase, and wherein the current allocation is infeasible
when there is no such allocation of differentiated goods to each
offer to purchase assigned a portion of the abstract good; (e) when
at least one abstract good is determined to be infeasible in step
(d), the processor determining a new allocation of none or a
portion of each abstract good to each offer to purchase that
satisfies the constraint that, for any abstract good determined to
be infeasible in step (d) and for all offers that are allocated a
portion of said abstract good in the current allocation, the new
allocation of said abstract good to each offer to purchase is
feasible in that an allocation of differentiated goods exists that
does not exceed the supply of any differentiated good, and
designating the new allocation to be the current allocation; and
(f) repeating steps (d) and (e) until a termination condition is
met.
2. The computer-implemented method of claim 1, wherein the
allocation of abstract goods to the set of at least one offer to
purchase in steps (c) and (e) achieves one of the following
objectives: maximizes total payments for a known supply of
differentiated goods; maximizes total estimated payments for an
estimated supply of differentiated goods; maximizes worst-case
payments over a plurality of possible realizations of the estimated
supply of differentiated goods; maximizes risk-adjusted revenue or
expected utility given a distribution on possible realizations of
estimated supply of differentiated goods; or maximizes total social
welfare given estimated utility of buyers associated with offers to
purchase and an estimated supply of differentiated goods.
3. The computer-implemented method of claim 1, wherein the
termination condition is one of the following: each abstract good
is determined in step (d) to be feasible in the current allocation;
a number of repetitions of steps (d)-(e) meets or exceeds a
threshold number of repetitions; a computation time of steps
(d)-(e) meets or exceeds a threshold computation time; a number of
abstract goods that are infeasible in the current allocation in
step (d) falls below a threshold number of infeasible abstract
goods; a maximum amount of infeasibility for every abstract good in
the current allocation in step (d) falls below a threshold amount
of infeasibility, where the degree of feasibility for an abstract
good given an allocation is based on the amount by which the
quantity of supply required of goods to meet an assigned portion of
abstract goods exceeds the available supply; a value of the
objective criteria achieved in determining the new allocation meets
or exceeds a threshold value; a value of the objective criteria
achieved in determining the new allocation meets or falls below a
threshold value; and an absolute difference in the value of the
objective criteria of the new allocation and the current allocation
meets or falls below a threshold value.
4. The computer-implemented method of claim 1, wherein: the
allocation of abstract goods to offers in steps (c) and (e) is
determined by the processor solving a linear or mixed integer
program; the processor determines the feasibility of the current
allocation of the supply of an abstract good by checking the
feasibility of a linear program that assigns to each offer assigned
a portion of an abstract good, a quantity of each differentiated
good that comprises the abstract good and lies within the one or
more sets of goods associated with said offer, subject to a set of
at least one constraint of a type (i) that requires respecting the
available supply of each differentiated good and constraints of a
type (ii) that require meeting or exceeding the portion of the
abstract good allocated to said offer to purchase in the current
allocation; and for each abstract good that is determined to be
infeasible in step (d), the processor determines a second set of
constraints that is jointly infeasible and is comprised of at least
one constraint from the set of constraints of type (i) and at least
one constraint from the set of constraints of type (ii), and then
introducing a new constraint based on this second set of jointly
infeasible constraints to the determination of the new allocation
in step (e).
5. The computer-implemented method of claim 4, wherein the second
set of constraints that are jointly infeasible is computed as a
minimal infeasible set.
6. The computer-implemented method of claim 4, wherein, in
determining whether the allocation of an abstract good in the
current allocation is feasible, the processor constructs a
sub-abstraction of said abstract good, said sub-abstraction
comprising a set of sub-abstract goods, each sub-abstract good
comprising an aggregation of at least one differentiated good from
the abstract good, and at least one sub-abstract good in the
sub-abstraction comprising at least two differentiated goods from
the abstract good.
7. The computer-implemented method of claim 1, wherein: the
differentiated goods available for exchange are advertising slots
on television, internet or other media; and the properties by which
advertising slots are differentiable are of interest to a set of at
least one potential advertiser.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a divisional of U.S. Ser. No.
12/616,217, filed Nov. 11, 2009, which claims the benefit of U.S.
61/113,306, filed Nov. 11, 2008, and U.S. 61/151,333, filed Feb.
10, 2009, all of which are incorporated herein by reference.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention relates to expressive auctions for the
allocation of differentiated supply. The invention will be
described in connection with advertising auctions, i.e., auctions
for the display of advertisements on computer devices, but applies
more broadly to the allocation of differentiated supply of any
collection of goods (e.g., to bidders or participants in a
marketplace).
[0004] 2. Description of Related Art
[0005] Online advertising has radically changed both the nature of
advertising and the technology used to support the development and
deployment of ad campaigns. While ad targeting and campaign design
are inherently complex, the variety of online advertising services
has only increased this complexity. In particular, the ability to
target ads to specific individuals based on detailed, personalized
online information--information that is simply not available in
broadcast media--presents compelling opportunities and tremendous
technical challenges for ad delivery. For instance, the development
of sophisticated matching and bidding algorithms for sponsored
search, such as position auctions using the generalized second
price (GSP) mechanism, can be viewed as a response to such
opportunities.
[0006] Contextual information about a user that suggests what a
user is looking for and thinking about when engaged in online
activities (such as web search) is extremely valuable to
advertisers. The ability to target their advertisements to those
users where the likelihood of impact is greatest--as suggested by
the current contextual information--can dramatically improve the
effectiveness of ads. The publishers of online information and
providers of online services (such as search engines) have access
to valuable contextual cues that can be used to help target
advertisements.
[0007] The current state of online advertising can be
characterized roughly as follows: Potential advertisers can
interact directly with a publisher or search engine to determine
where (and when) to place their ads, or they can reach a wide set
of publishers by placing their ads via an advertising network.
Generally, an advertiser will indicate the contextual conditions of
interest, the amount they are willing to pay for placement of their
ad or for some event caused by the ad placement (such as a click on
their ad)--assuming the contextual conditions are met--and often a
budget limiting the total amount of spend (and, indirectly, the
number of ads placed).
[0008] One example of this interaction is provided by the Google
search engine. An advertiser indicates their interest in specific
users by bidding on particular query keywords, with their bid
indicating a maximal willingness-to-pay per click for different
keyword queries. For instance, "I will pay $0.10 for each query
with search terms basketball+betting but no more than $200 each 24
hours". Advertisers (bidders) compete for the right to display ads,
and the price paid per click for any winning bidder is a function
of the other bids competing for the same (or related) keywords. In
other words, Google, like many other search engines, publishers,
and ad networks, uses an auction to place ads.
[0009] Another example is the display of banner ads through an
advertising network. The network (e.g., Right Media) consists of
potential advertisers and publishers of web
content/services. Advertisers bid for the right to display ads on
particular types of web sites/pages, while publishers can restrict
the types of ads that can appear on their sites. Again, an auction
is typically used to determine a suitable assignment of ads to
specific web sites/pages. Advertisers can indicate their target
users of interest by specifying appropriate contextual conditions.
For instance, the advertiser might specify that their ad must only
be displayed: on web pages with a particular type of content (e.g.,
as indicated by the presence of keywords, tags, etc.); at a certain
time of day; to users with a particular demographic profile (e.g.,
as verified by publishers of subscription sites/services, or
estimated by the publisher).
[0010] More general expressive advertising auctions allow
advertisers to express extremely complex conditions on the
placement of their ads. In expressive auctions, advertisers can
express their willingness to pay for sequences or sets of ad
impressions, clicks, or other events. Expressiveness forms include:
payment only if minimum targets are met; multiple targets and
tiered payment; temporal sequencing of different ads; substitution
of different web sites or other contextual properties.
[0011] It is important to realize that the supply of advertising
space to advertisers need not be determined by an auction or an
explicit market mechanism. It could be allocated using
non-competitive processes such as manually negotiated contracts
between publishers/ad networks and advertisers. However,
advertisers are rarely interested in arbitrary placement of ads:
even in non-market based settings, the contextual information
associated with an ad placement is vital.
[0012] In contrast to sponsored search, the selling of banner ads
(a.k.a. display ads) is still largely approached through manual
negotiation. There are some exceptions to this, with online
exchanges for banner ads established by companies like Right Media
(now part of Yahoo!) and DoubleClick (now part of Google); however,
these exchanges largely deal with lower-value, "remnant" inventory
on web sites. Premium display advertising space (e.g., slots near
the top, or "above the fold", of high traffic, high profile
websites) is sold almost exclusively by non-automated means. The
primary reason for this is a perception that auction/market
mechanisms cannot be made to work for the types of campaign-level
expressiveness required for display ads (e.g., as required by brand
advertisers). This parallels the situation in sourcing, where
advances in modeling and optimization have led to the adoption of
expressive bidding (and expressive bid-taking) for what had
previously been widely viewed as "too valuable" to leave to auction
mechanisms. The expressive auction mechanisms are now used also for
striking strategic long-term contracts on the most valuable parts
of the sourcing spend.
[0013] In campaign-level-expressiveness, a variety of
expressiveness forms are outlined (these include impression
targets, smoothness of delivery, temporal sequencing, complements
and substitutes, and many others). Although sophisticated bidding
strategies can be effective in optimizing some limited forms of
expressive preferences (e.g., long-term budgets) in an inexpressive
auction, arbitrarily large inefficiencies can nevertheless arise.
Allowing richer languages in which advertisers can express their
campaign preferences directly, rather than forcing them into
standard per-impression or per-click bidding models, is critical to
admitting the automated matching and selling of banner ads.
[0014] A key bottleneck remains: the use of expressive bidding
requires optimization to match ad supply with advertisers' demand.
The richer the expressiveness forms, the more complex the
optimization. For example, a stochastic optimization model can
allocate ad supply to advertisers who bid using rich,
campaign-level expressiveness forms. It can explicitly account for
uncertainty in both supply and demand. However, it has long been
recognized that, even with very limited forms of expressiveness (as
simple as per-impression value/pricing with budget constraints and
bid expiration), optimization is critical to extracting full
value from one's ad inventory. Indeed, using simple myopic
mechanisms like GSP can lead to significant loss in efficiency and
revenue.
SUMMARY OF THE INVENTION
[0015] Herein, an impediment to the use of optimization in ad
auctions, namely that of channel explosion, is addressed. A key
advantage advertisers have in online settings is the ability to
segment the target audience using an enormous variety of
attributes: both static attributes (like user demographic) and
dynamic attributes such as context (e.g., current browsing history,
location) or historical data (e.g., past purchases, activity,
etc.). This means that the number of attributes over which ad
allocation must occur is extremely large. And the number of
specific ad channels to which ads can be assigned--i.e., the number
of distinct attribute instantiations--grows exponentially in the
number of attributes. Any optimization model must (usually quite
explicitly) assign advertisers to explicit channels over time--both
simple linear programming (LP) models that use only budget
constraints and sophisticated mixed-integer (MIP) models use
variables of the form x_j^i to denote the allocation of
some amount of the supply of channel j to advertiser i. For
example, each distinct keyword/query can be a channel; and bids (or
more precisely, slates of bids) are allocated to each query.
Tractability is achieved by focusing on only the few thousand
highest-volume queries. The MIP model of this uses assignment
variables for losslessly "abstracted" channels consisting of (bid,
attribute)-intersections, and is limited to a relatively small
number of channels. These models simply cannot scale directly to
problems involving more than a few thousand channels (e.g., on the
order of 10-15 (Boolean) channel attributes).
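The exponential growth described above can be illustrated with a minimal sketch. The attribute names and the helper `concrete_channels` are hypothetical and chosen only for illustration; the point is simply that each full attribute instantiation is one concrete channel, so n Boolean attributes yield 2^n channels.

```python
from itertools import product


def concrete_channels(attributes):
    """Enumerate every full attribute instantiation (one per concrete channel).

    `attributes` maps an attribute name to the tuple of values it can take.
    """
    names = sorted(attributes)
    return [
        dict(zip(names, values))
        for values in product(*(attributes[name] for name in names))
    ]


# Hypothetical Boolean attributes, for illustration only.
attrs = {f"attr_{k}": (False, True) for k in range(12)}
channels = concrete_channels(attrs)
print(len(channels))  # 2**12 = 4096 concrete channels
```

With 15 Boolean attributes the same enumeration would already produce 32,768 channels, each of which needs its own allocation variable per advertiser in a model that assigns supply channel by channel.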
[0016] The channel explosion problem can be addressed through the
use of channel abstraction. Intuitively, an abstract channel is any
aggregation of these "concrete" channels (i.e., attribute
instantiations) into a single abstract channel for the purposes of
optimization. During allocation optimization, ads are assigned to
abstract channels rather than concrete channels; as discussed
below, dispatch of ads assigned to an abstract channel will
generally be sensitive to the actual channel, or full attribute
instantiation, in question. Hence, with appropriate abstraction, an
exponential reduction in the number of channels can be obtained,
thus rendering optimization practical. Furthermore, a well-chosen
abstraction will often provide very little sacrifice of revenue or
efficiency (often even providing an optimal, lossless abstraction).
Such abstractions should be derived by considering their impact on
value (e.g., efficiency or revenue), as opposed to clustering based
purely on, say, statistical properties of the attributes in
question.
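As a rough illustration of the aggregation step only, the sketch below groups concrete channels that agree on a chosen subset of attributes and sums their supply. The function `abstract_channels`, the attribute names, and the supply figures are assumptions made for this example. The attributes to keep are picked by hand here, whereas the text above says a good abstraction should be chosen for its impact on value.

```python
from collections import defaultdict


def abstract_channels(supply, keep):
    """Group concrete channels that agree on the attributes in `keep`.

    `supply` maps a concrete channel, written as a tuple of
    (attribute, value) pairs, to its supply. Returns two dicts keyed by
    abstract channel: the concrete channels it aggregates, and their
    total supply.
    """
    members = defaultdict(list)
    totals = defaultdict(float)
    for channel, quantity in supply.items():
        key = tuple((attr, value) for attr, value in channel if attr in keep)
        members[key].append(channel)
        totals[key] += quantity
    return dict(members), dict(totals)


# Hypothetical supply over two Boolean attributes.
supply = {
    (("sports", True), ("evening", True)): 100.0,
    (("sports", True), ("evening", False)): 300.0,
    (("sports", False), ("evening", True)): 200.0,
    (("sports", False), ("evening", False)): 400.0,
}
members, totals = abstract_channels(supply, keep={"sports"})
print(totals)
```

Keeping only the `sports` attribute collapses four concrete channels into two abstract ones; dropping k Boolean attributes in this way shrinks the channel count by a factor of 2^k.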
[0017] Proposed herein are computer-implemented techniques for
automatically generating abstractions and for optimization using a
set of abstract channels. The first computer-implemented algorithm
uses a form of column generation to generate an abstraction:
starting with a crude abstraction, the abstraction is gradually
refined by introducing distinctions that have maximal impact on
objective value. Unlike standard column generation, the present
invention determines which collection of columns to add (and
remove). Novel scoring techniques have been developed to do just
this. This computer-implemented algorithm can be used for arbitrary
expressiveness forms, but is designed to take advantage of LP
structure. Also developed is a new computer-implemented constraint
generation algorithm for optimizing an ad allocation MIP using a
specific set of abstract channels (e.g., those generated utilizing
the column generation algorithm described herein). This method
incrementally refines the allocation of bids to abstract channels
by posting constraints to ensure advertisers are assigned only
relevant ad slots. This method will converge, in principle, to an
optimal solution given enough time. However, the technique can be
cut short with an approximate solution, and it can be used to
suggest further channel refinement for purposes of
tractability.
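The idea of posting constraints so that advertisers receive only relevant ad slots can be sketched for a single abstract channel. The check below is a brute-force stand-in, not the linear-program formulation recited in the claims: it assumes fractional allocation and tests, for every subset of offers, whether their combined demand exceeds the supply of the concrete channels at least one of them can use. The names `find_violated_subset`, `supply`, `eligible`, and `demand` are hypothetical, and the search is exponential in the number of offers, so it is suitable only for small illustrations.

```python
from itertools import combinations


def find_violated_subset(supply, eligible, demand):
    """Return a set of offers whose combined demand exceeds the supply they
    can jointly draw on, or None if the allocation is feasible.

    supply:   concrete channel -> available quantity
    eligible: offer -> set of concrete channels the offer accepts
    demand:   offer -> portion of the abstract channel assigned to the offer
    """
    offers = list(demand)
    for size in range(1, len(offers) + 1):
        for subset in combinations(offers, size):
            reachable = set().union(*(eligible[o] for o in subset))
            available = sum(supply[c] for c in reachable)
            needed = sum(demand[o] for o in subset)
            if needed > available + 1e-9:
                return set(subset)
    return None


# Hypothetical abstract channel made of two concrete channels.
supply = {"c1": 60, "c2": 40}
eligible = {"A": {"c1"}, "B": {"c1", "c2"}}

infeasible = find_violated_subset(supply, eligible, {"A": 70, "B": 30})
feasible = find_violated_subset(supply, eligible, {"A": 50, "B": 50})
print(infeasible, feasible)
```

In the first call the total demand equals the total supply, yet offer A alone asks for more than channel c1 can provide. A returned subset of this kind suggests a constraint of the form "the allocation to these offers may not exceed the supply of the channels they can use", which could be added before re-solving the allocation.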
[0018] The invention is a computer-implemented method of
determining an abstraction of a plurality of differentiated goods
available for exchange. Each differentiated good is differentiable
based on at least one property of the good and each property has
associated therewith at least one attribute. The
computer-implemented method can include: (a) storing in a computer
storage accessible to a processor of a computer data regarding each
differentiated good of a plurality of differentiated goods, wherein
the data regarding each differentiated good can include an
attribute value assigned to at least one attribute of the
differentiated good; (b) the processor determines a first
abstraction of the plurality of differentiated goods based on the
data in step (a), wherein the first abstraction comprises at least
one abstract good, each abstract good can include one or more
differentiated goods, and at least one abstract good of the first
abstraction can include at least two distinct differentiated goods;
(c) the processor determines for each abstract good a specification
for said abstract good based on the data regarding one or more
differentiated goods comprising the abstract good; and (d) the
processor stores in the computer storage the specification
determined for each abstract good in step (c).
[0019] The data in step (a) can further include a supply quantity
of at least one differentiated good over a time period, wherein the
supply quantity is either known with certainty or estimated.
[0020] The method can further include: (e) storing in the computer
storage data about at least one set of one or more first offers to
purchase, wherein each offer to purchase is either an actual offer
or an estimate of a current or future offer to purchase, and each
offer to purchase can include at least one differentiated good set,
wherein each differentiated good set can include at least one
differentiated good and an associated price, wherein: step (b) can
include the processor determining the first abstraction further
based on the set of first offers to purchase; and the specification
for at least one abstract good in step (c) can include data about a
supply quantity available or predicted to be available for the
abstract good, and that portion of the supply quantity for the
abstract good that satisfies one or more attribute values.
[0021] The specification of the prices related to the one or more
sets of goods includes at least one of the following: a price to be
paid per unit of good in any set; for each set, a price to be paid
per unit of good in that set; a price per unit of good satisfying
some property that is delivered to the buyer; an overall budget
constraint limiting the maximum total payment for the one or more
sets of goods; for each set, a budget constraint on the maximum
total payment for goods in that set; for each set, a price to be
paid for the entire set; or a price to be paid for the satisfaction
of one or more constraints on the relationship between goods
within a set or the quantity of goods in a set.
[0022] The at least one differentiated good set in step (e) can
include at least one of the following: at least one property that
each differentiated good in the differentiated good set must
satisfy; at least one constraint on a relationship between the
properties of a first subset of differentiated goods in the
differentiated good set and the properties of a second subset of
differentiated goods in the differentiated goods set; and at least
one constraint on the quantity of differentiated goods in the
differentiated goods set.
[0023] At least one constraint on the quantity of goods in the set
of at least one good associated with the first offer to purchase
includes at least one of the following: a minimum quantity
constraint, a maximum quantity constraint, and a constraint on the
quantity of goods with at least one temporal property.
[0024] The at least one property that each differentiated good in
at least one differentiated good set must satisfy can include at
least one of the following: inclusion in a set of one or more
differentiated goods; the differentiated good has an attribute with
a value from a set of one or more attribute values; and the values
assigned to each of the attributes associated with the
differentiated good satisfy a Boolean formula using operators
AND, OR or NOT.
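A Boolean formula over attribute values can be evaluated against a differentiated good with a few lines of code. The nested-tuple encoding, the `"IS"` leaf test, and the attribute names below are assumptions made for this sketch and are not a representation taken from the text.

```python
def satisfies(good, formula):
    """Evaluate a Boolean formula against a good's attribute values.

    `good` maps attribute names to values. `formula` is a nested tuple:
      ("IS", attribute, value)   leaf test on one attribute
      ("NOT", f)                 negation
      ("AND", f1, f2, ...)       conjunction
      ("OR", f1, f2, ...)        disjunction
    """
    op = formula[0]
    if op == "IS":
        _, attribute, value = formula
        return good.get(attribute) == value
    if op == "NOT":
        return not satisfies(good, formula[1])
    if op == "AND":
        return all(satisfies(good, f) for f in formula[1:])
    if op == "OR":
        return any(satisfies(good, f) for f in formula[1:])
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical ad slot and targeting condition.
slot = {"content": "sports", "daypart": "evening", "region": "US"}
condition = (
    "AND",
    ("IS", "content", "sports"),
    ("OR", ("IS", "daypart", "evening"), ("IS", "daypart", "night")),
    ("NOT", ("IS", "region", "EU")),
)
print(satisfies(slot, condition))
```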
[0025] The at least one constraint on the relationship between the
properties of the first subset of differentiated goods and the
properties of the second subset of differentiated goods can include
a constraint on the ratio of a quantity of differentiated goods in
the first subset of differentiated goods and a quantity of
differentiated goods in the second subset of differentiated goods,
where the differentiated goods in the first subset of
differentiated goods satisfy a first property and the
differentiated goods in the second subset satisfy a second
property.
[0026] The price associated with at least one differentiated good
set can include at least one of the following: a per-unit price for
each differentiated good in the differentiated good set; a per-unit
price for each differentiated good in the differentiated good set
that satisfies some property of the differentiated good; a budget
constraint on a maximum total payment for differentiated goods in
the differentiated good set; and a total price for the
differentiated good set.
[0027] At least one first offer to purchase can include at least
one of the following: a price to be paid per unit of differentiated
good in each differentiated good set associated with the first
offer to purchase; and an overall budget constraint limiting the
maximum total payment for goods across all differentiated good sets
associated with the first offer to purchase.
[0028] The at least one differentiated good and an associated price
in step (e) can include at least one of the following: a constraint
requiring that every good in the set satisfies at least one
property, including either verifiable or stochastically verifiable
properties; a condition requiring that an observable action occurs
in response to the allocation of a good; a condition that defines a
set of allowable substitutes for at least one good; a payment rule
with one of an all-or-nothing bonus, tiered pricing, a trigger
condition and effect; a constraint requiring that a minimum
quantity of goods are allocated; a constraint requiring that a
maximum quantity of goods is allocated; a constraint on the
sequencing of the allocation of goods to an offer to purchase over
a specified interval of time, including at least one of a
smoothing, ramp-up, or sequencing constraint; a payment rule
specifying a positive adjustment in the event that complementary
goods are allocated; a payment rule that can include a counting
constraint on the maximum number of goods for which payment will be
made; a budget constraint on the total payment that can be made
associated with an offer to purchase; an aggregate volume
constraint on the total volume of goods that can be allocated to an
offer to purchase; a frequency constraint on the frequency with
which goods are allocated over some time interval to an offer to
purchase; a joint allocation constraint on the allocation of one or
more goods to the offer to purchase based on the allocation of one
or more related goods to at least one other offer to purchase; and
a payment rule with an adjustment that has at least one
prerequisite that must be satisfied as a condition to adjusting the
payment.
[0029] The method can further include the processor receiving a set
of second offers to purchase, wherein at least one second offer to
purchase can include at least one set of at least one abstract good
and a price associated with said set of abstract goods, where the
abstract goods included in the second offer to purchase are
selected from the abstract goods of the first abstraction
determined in step (b).
[0030] The first offers to purchase can be actual offers to
purchase and the method can further include: (f) the processor
determining an allocation of differentiated goods to at least one
first offer to purchase in response to allocating none or at least
a portion of one or more abstract goods of the first abstraction of
step (b) to the first offer to purchase; (g) the processor
determining a payment rule for the supply of abstract goods
allocated to the first offer to purchase in step (f), wherein said
payment rule is based on the specification of the abstract goods
allocated in step (f) and the at least one differentiated good set
included in the first offer to purchase in step (f); and (h)
storing in the computer storage each allocation of differentiated
goods and the payment rule.
[0031] The payment rule in step (g) can be determined in one of the
following ways: when differentiated goods associated with an
abstract good allocated to each first offer to purchase in step (f)
are allocated randomly, the processor determines for each first
offer to purchase the number of differentiated goods allocated to
the first offer to purchase that are included in the one or more
differentiated good sets associated with the first offer to
purchase, and calculates a payment based on this determination; or
the processor allocates to each first offer to purchase in step (f)
only differentiated goods comprising said abstract good that are
included in each differentiated good set associated with the first
offer to purchase, and calculates a payment based on this
determination.
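The two ways of determining payment can be contrasted with a small sketch. It assumes a simple per-unit price and, for random dispatch, replaces the actual count of matching goods with its expected value, computed from the share of the abstract good's supply that falls within the offer's differentiated good sets. Both function names and all figures are hypothetical.

```python
def payment_random_dispatch(unit_price, allocated, matching_supply, total_supply):
    """Expected payment when goods of an abstract good are dispatched at random.

    Only goods that fall within the offer's differentiated good sets are
    paid for, so the allocated quantity is scaled by the matching share
    of the abstract good's supply.
    """
    if total_supply <= 0:
        raise ValueError("total_supply must be positive")
    matching_fraction = matching_supply / total_supply
    return unit_price * allocated * matching_fraction


def payment_targeted_dispatch(unit_price, allocated):
    """Payment when only matching goods are dispatched to the offer.

    Every allocated good lies within the offer's sets, so all are paid for.
    """
    return unit_price * allocated


# Hypothetical abstract good: 1000 units of supply, 300 of which match the offer.
random_payment = payment_random_dispatch(0.10, 1000, 300, 1000)
targeted_payment = payment_targeted_dispatch(0.10, 300)
print(random_payment, targeted_payment)
```

Under these assumptions the two payments coincide, but random dispatch consumes the whole abstract good to deliver 300 relevant units, while targeted dispatch consumes only the 300 matching units and leaves the rest for other offers.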
[0032] The allocation in step (f) can satisfy one of the following
objectives: maximizing total payments given a known supply of
goods; maximizing total estimated payments given an estimated
supply of goods; maximizing worst-case payments over a plurality of
possible realizations of the estimated supply of goods; maximizing
risk-adjusted revenue or expected utility given a distribution on
possible realizations of estimated supply of goods; or maximizing
total social welfare given estimated utility of buyers associated
with offers to purchase and an estimated supply of goods.
[0033] The method can further include (e) storing in the computer
storage data about at least one set of one or more first offers to
trade, wherein each offer to trade is either an actual offer or an
estimate of a current or future offer to trade, each offer to trade
can include at least one set of at least one of the plurality of
differentiated goods, wherein each differentiated good set can
include at least one differentiated good and an associated price,
and each offer to trade is either an offer to purchase or an offer
of sale, wherein step (b) can include the processor determining the
first abstraction further based on the set of first offers to
trade.
[0034] The first offers to trade can be actual offers to trade and
the method can further include: (f) the processor determining an
allocation of differentiated goods to at least one first offer to
trade in response to allocating none or at least a portion of one
or more abstract goods of the first abstraction of step (b) to the
first offer to trade; (g) the processor determining a payment rule
for the allocation of abstract goods to at least one first offer to
trade in response to allocating at least a portion of one or more
abstract goods of the first abstraction in step (b) to said first
offer to trade, wherein said payment rule is based on the
specification of the abstract goods allocated in step (f) and the
at least one differentiated good set included in the first offer to
trade; and (h) storing in the computer storage each allocation of
differentiated goods and the payment rule.
[0035] The method can further include: (f) the processor receiving
a set of second offers to trade; (g) the processor determining an
allocation of differentiated goods to at least one second offer to
trade in response to allocating none or at least a portion of one
or more abstract goods of the first abstraction of step (b) to the
second offer to trade; (h) the processor determining a payment rule
for the allocation of abstract goods to at least one second offer
to trade in response to allocating at least a portion of one or
more abstract goods of the first abstraction in step (b) to said
second offer to trade, wherein said payment rule is based on the
specification of the abstract goods allocated in step (g) and the
at least one differentiated good set included in said second
offer to trade; and (i) storing in the computer storage each
allocation of differentiated goods and the payment rule.
[0036] The method can further include: (f) the processor receiving
a set of second offers to purchase; (g) the processor determining
an allocation of differentiated goods to at least one second offer
to purchase in response to allocating at least a portion of each
abstract good of the first abstraction of step (b); (h) the
processor determining a payment rule for the supply of abstract
goods allocated to each second offer to purchase in step (g),
wherein said payment rule is based on the specification of the
abstract goods allocated in step (g) and the at least one
differentiated good set included in the second offer to purchase;
and (i) storing in the computer storage each allocation of
differentiated goods and its payment rule.
[0037] The allocation in step (g) can satisfy one of the following
objectives: maximizing total payments given a known supply of
goods; maximizing total estimated payments given an estimated
supply of goods; maximizing worst-case payments over a plurality of
possible realizations of the estimated supply of goods; maximizing
risk-adjusted revenue or expected utility given a distribution on
possible realizations of estimated supply of goods; or maximizing
total social welfare given estimated utility of buyers associated
with offers to purchase and an estimated supply of goods.
[0038] The payment rule in step (h) can be determined in one of the
following ways: when differentiated goods associated with an
abstract good allocated to each second offer to purchase in step
(g) are allocated randomly, the processor determines for each
second offer to purchase the number of differentiated goods
allocated to the second offer to purchase that are included in the
one or more differentiated good sets associated with the second
offer to purchase, and calculates a payment based on this
determination; or the processor allocates to each second offer to
purchase in step (g) only differentiated goods comprising said
abstract good that are included in each differentiated good set
associated with the second offer to purchase, and calculates a
payment based on this determination.
[0039] Determining the first abstraction in step (b) can include:
(i) storing in the computer storage a current abstraction of the
plurality of differentiated goods, wherein said current abstraction
comprises at least one abstract good; (ii) the processor creating
one or more candidate abstractions based on the data in step (a)
and the set of first offers to purchase; (iii) the processor
determining a score for each candidate abstraction based on at
least the set of first offers to purchase; (iv) the processor
replacing the current abstraction with a candidate abstraction that
satisfies a predetermined objective, said objective being either a
maximal score or minimal score; and (v) repeating steps (ii)-(iv)
until a predetermined termination criterion is met, whereupon the
current abstraction is selected as the first abstraction.
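By way of non-limiting illustration, steps (i)-(v) can be sketched in
Python as the following search loop. The names refine_abstraction,
propose, and score, the choice of a maximal-score objective, and the
rule of stopping when no candidate improves the score are assumptions
of this sketch; the toy generator split_off_singletons is likewise
hypothetical.

```python
from typing import Callable, FrozenSet, Iterable

# An abstraction is a set of abstract goods; each abstract good is a set of goods.
Abstraction = FrozenSet[FrozenSet[str]]


def refine_abstraction(initial: Abstraction,
                       propose: Callable[[Abstraction], Iterable[Abstraction]],
                       score: Callable[[Abstraction], float],
                       max_rounds: int = 100) -> Abstraction:
    """Repeat steps (ii)-(iv) until a termination criterion is met."""
    current = initial                              # step (i)
    current_score = score(current)
    for _ in range(max_rounds):                    # step (v): repetition limit
        candidates = list(propose(current))        # step (ii)
        if not candidates:
            break
        best = max(candidates, key=score)          # step (iii)
        best_score = score(best)
        if best_score <= current_score:            # assumed stop: no improvement
            break
        current, current_score = best, best_score  # step (iv)
    return current


def split_off_singletons(abstraction: Abstraction) -> Iterable[Abstraction]:
    """Toy candidate generator: split one good out of a multi-good abstract good."""
    for good_set in abstraction:
        if len(good_set) > 1:
            for g in sorted(good_set):
                rest = abstraction - {good_set}
                yield rest | {frozenset([g]), good_set - {g}}
```

For example, starting from a single abstract good containing goods a,
b and c, and using a toy score that prefers exactly two abstract
goods, the loop stops after one split.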
[0040] The at least one constraint on a first and second candidate
abstract good that can co-occur in an abstraction requires that no
good appears in more than one abstract good.
[0041] The termination criterion can include one of the following:
a property of the current abstraction meets, is greater than, or is
less than a threshold, where the property is either: the number of
abstract goods in the abstraction, or the improvement in absolute
score relative to the prior current abstraction; the number of
repetitions of steps (ii)-(iv) meets or exceeds a limit; and the
computation time spent in steps (ii)-(iv) meets or exceeds a
limit.
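By way of non-limiting illustration, the termination criteria listed
above can be combined in a single predicate, sketched here in Python.
The SearchState record, the default thresholds, and the direction of
each comparison are assumptions chosen for the example only.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchState:
    num_abstract_goods: int   # size of the current abstraction
    last_improvement: float   # score gain over the prior current abstraction
    rounds: int               # repetitions of steps (ii)-(iv) so far
    started_at: float         # time.monotonic() value when the search began


def should_terminate(state: SearchState,
                     max_goods: int = 50,
                     min_improvement: float = 1e-6,
                     max_rounds: int = 1000,
                     max_seconds: float = 60.0,
                     now: Optional[float] = None) -> bool:
    """Return True when any one of the illustrated criteria is met."""
    now = time.monotonic() if now is None else now
    return (state.num_abstract_goods >= max_goods          # abstraction large enough
            or state.last_improvement < min_improvement    # score has stalled
            or state.rounds >= max_rounds                  # repetition limit
            or now - state.started_at >= max_seconds)      # computation time limit
```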
[0042] Determining the first abstraction in step (b) can include: (i) storing
in the computer storage a plurality of candidate abstract goods,
wherein each candidate abstract good can include one or more
differentiated goods; (ii) storing in the computer storage at least
one constraint to preclude the first and second candidate abstract
goods stored in step (i) from both occurring in at least one
abstract good that comprises an abstraction; (iii) the processor
determining one or more candidate abstractions based on the data in
step (a) and the set of first offers to purchase; (iv) the
processor determining a score for each candidate abstraction based
on at least the set of first offers to purchase; and (v) the
processor determining the first abstraction comprising one or more
candidate abstract goods of step (i), wherein the first abstraction
satisfies the at least one constraint of step (ii); and the first
abstraction satisfies either minimal score or maximal score among
all candidate abstractions comprised of one or more candidate
abstract goods from step (i) and the first abstraction satisfies
the at least one constraint of step (ii).
[0043] The method can further include: (f) the processor
determining an allocation of differentiated goods to at least one
first offer to purchase in response to allocating at least a
portion of one or more abstract goods of the first abstraction of
step (b); (g) the processor determining a payment rule for the
supply of abstract goods allocated to each first offer to purchase
in step (f), wherein said payment rule is based on the
specification of the abstract goods allocated in step (f) and the
at least one differentiated good set included in the first offer to
purchase; and (h) storing in the computer storage each allocation
of differentiated goods and the payment rule.
[0044] The method can further include: (f) the processor receiving
a set of second offers to purchase; (g) the processor determining
an allocation of differentiated goods to at least one second offer
to purchase in response to allocating at least a portion of one or
more abstract goods of the first abstraction of step (b); (h) the
processor determining a payment rule for the supply of abstract
goods allocated to each second offer to purchase in step (g),
wherein said payment rule is based on the specification of the
abstract goods allocated in step (g) and the at least one
differentiated good set included in the second offer to purchase;
and (i) storing in the computer storage each allocation of
differentiated goods and its payment rule.
[0045] Determining a score for each candidate abstraction in step
(iii) can include: the processor determining properties of the
abstract goods that comprise the candidate abstraction from the
specification for each abstract good determined in step (c),
wherein the properties of the abstract goods include at least the
total amount of supply available or estimated to be available of
each abstract good and either none or the portion of each abstract
good that satisfies one or more attribute values; the processor
determining an allocation of differentiated goods to at least one
first offer to purchase in response to allocating none or at least
a portion of each abstract good of the candidate abstraction to the
first offer to purchase in a way that is feasible given the
available or estimated supply of the abstract good; and the
processor determining the score of the candidate abstraction as a
sum of payments from all first offers to purchase, wherein said
payments depend on the allocation.
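By way of non-limiting illustration, the scoring of a candidate
abstraction can be sketched in Python for a deliberately simplified
case. This sketch assumes that each offer names one good set and a
price per relevant good, that offers have no quantity limits, and that
goods within an abstract good are allocated randomly in proportion to
supply. Under those assumptions the revenue-maximizing allocation
assigns each abstract good to the offer with the highest expected
payment per unit, so no linear programming solver is needed; a general
implementation would solve the allocation problem instead. All names
are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Bid = Tuple[Set[str], float]  # (good set of interest, price per relevant good)


def score_abstraction(abstraction: Iterable[Set[str]],
                      supply: Dict[str, float],
                      bids: List[Bid]) -> float:
    """Sum of payments from all offers under the simplifying assumptions above."""
    total_payment = 0.0
    for abstract_good in abstraction:
        # Property 1: total supply of the abstract good.
        good_supply = sum(supply[g] for g in abstract_good)
        if good_supply == 0:
            continue
        best_rate = 0.0
        for wanted, price in bids:
            # Property 2: portion of the abstract good relevant to this offer.
            matching = sum(supply[g] for g in abstract_good if g in wanted)
            best_rate = max(best_rate, price * matching / good_supply)
        total_payment += good_supply * best_rate
    return total_payment
```

With ten units each of goods A and B, one offer paying 1 per unit of A
and another paying 2 per unit of B, the abstraction that keeps A and B
separate scores 30, whereas the abstraction that merges them scores
20, which shows how the score can expose the revenue lost to
coarsening.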
[0046] When the first offers to purchase are actual offers to
purchase, the allocation of differentiated goods to first offers to
purchase computed for the selected first abstraction is output by
the processor to at least one of the following: a display of the
computer or a computer network.
[0047] The payment from each first offer to purchase under the allocation
can be determined in one of the following ways: when differentiated
goods associated with an abstract good allocated to each first
offer to purchase are allocated randomly, the processor determines
for each first offer to purchase the number of differentiated goods
allocated to the first offer to purchase that are included in the
one or more differentiated good sets associated with the first
offer to purchase, and calculates a payment based on this
determination; or the processor allocates to each first offer to
purchase only differentiated goods comprising said abstract good
that are included in each differentiated good set associated with
the first offer to purchase, and calculates a payment based on this
determination.
[0048] The offers to purchase can be actual offers to purchase, and
the allocation that corresponds to the first abstraction can be
implemented by allocating the abstract goods to offers as dictated
by the allocation, and the payments are those determined in
determining the score for the abstraction.
[0049] Determining the first abstraction in step (b) further can
include: the processor determining a current allocation of the
abstract goods in the current abstraction based on an objective of
maximizing a total payment to be received from the current
allocation and in a way that is feasible given the available or
estimated supply of the abstract goods; the processor determining
at least one candidate abstract good to introduce into the current
abstraction and creating at least one candidate abstraction by
introducing at least one of the candidate abstract goods into the
current abstraction; the processor determining, for each of the at
least one candidate abstract goods, an estimate of the improvement
in total payment that would be realized using a candidate
allocation instead of the current allocation, said candidate
allocation allocating none or at least a portion of one or more
abstract goods from the set comprising the current abstraction and
the candidate abstract good, to one or more first offers; and for
each candidate abstraction, determining a score based on the
computed estimate of the improvement in total payment for each
candidate abstract good introduced into the current abstraction to
form the candidate abstraction.
[0050] The method can further include the processor removing from
each candidate abstraction each abstract good that is in the
current abstraction and which intersects with a candidate abstract
good that is introduced to the current abstraction to create the
candidate abstraction.
[0051] The score of the at least one candidate abstract good that
is introduced into the current abstraction is determined as one of
the following: a maximum reduction in costs of variables
corresponding to the candidate abstract good in a linear
programming solution to the allocation of the current abstraction;
a sum of reduced costs of variables corresponding to the candidate
abstract good in a linear programming solution to the allocation
problem of the current abstraction; a sum of non-negative reduced
costs of variables corresponding to the candidate abstract good in
a linear programming solution to the allocation problem of the
current abstraction; a sum of the reduced costs of variables
corresponding to the assignment of some portion of the candidate
abstract good to each first offer to purchase in a linear
programming solution to the allocation problem of the current
abstraction, wherein at least one variable for each offer is
included in the sum; and a sum of non-negative reduced costs of
variables corresponding to the assignment of some portion of the
candidate abstract good to each first offer to purchase in a linear
programming solution to the allocation problem of the current
abstraction, wherein at least one variable for each offer is
included in the sum.
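By way of non-limiting illustration, the alternative scores listed
above can be sketched in Python once the reduced costs have been read
from a linear programming solution. The data layout (a dictionary
keyed by offer and variable identifiers) and the choice of the largest
reduced cost as the one variable retained per offer are assumptions of
this sketch.

```python
from typing import Dict, Tuple

# (offer id, variable id) -> reduced cost of that variable for the candidate good
ReducedCosts = Dict[Tuple[str, str], float]


def max_reduced_cost(rc: ReducedCosts) -> float:
    """Largest single reduced cost among the candidate's variables."""
    return max(rc.values())


def sum_reduced_costs(rc: ReducedCosts) -> float:
    """Sum over all variables, negative values included."""
    return sum(rc.values())


def sum_nonnegative_reduced_costs(rc: ReducedCosts) -> float:
    """Sum over variables whose reduced cost is non-negative."""
    return sum(v for v in rc.values() if v >= 0)


def sum_best_per_offer(rc: ReducedCosts, nonnegative_only: bool = False) -> float:
    """Keep one variable per offer (assumed: the largest), then sum."""
    best: Dict[str, float] = {}
    for (offer, _variable), value in rc.items():
        best[offer] = max(best.get(offer, float("-inf")), value)
    values = [v for v in best.values() if v >= 0 or not nonnegative_only]
    return sum(values)
```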
[0052] Each candidate abstract good can be a child abstract good of
a parent abstract good included in the current abstraction; and
each child abstract good can include a subset of the goods in the
parent abstract good included in the current abstraction.
[0053] The method can further include determining a score for each
child abstract good given the current abstraction as follows: the
processor determining a set of values for the child abstract good,
wherein each child abstract good value is based on at least a
reduced cost of each variable corresponding to the child abstract
good in a linear programming solution to the allocation problem for
the abstract goods that comprise the current abstraction and on the
total supply of differentiated goods available in the child
abstract good; the processor determining a set of values for a
complement child abstract good of the child abstract good, wherein
said complement child abstract good comprises each differentiated
good in the parent abstract good that was not included in the child
abstract good, wherein each complement child abstract good value is
based on the reduced cost of each variable corresponding to the
complement of the child abstract good in a linear programming
solution to the allocation problem for the abstract goods that
comprise the current abstraction and on the total supply of
differentiated goods included in the complement child abstract
good; and the processor computing the score for the child abstract
good based on the set of values for the child abstract good and the
set of values for the complement child abstract good.
[0054] Each abstract good in each current abstraction can be
associated with a property. The property of each abstract good can
be determined from values assigned to one or more attributes of the
differentiated goods included in the abstract good or by a Boolean
formula using operators AND, OR or NOT over such assignments.
Determining the first abstraction in step (b) further can include:
the processor selecting at least one parent abstract good from the
current abstraction and, for each selected parent abstract good,
selecting at least one attribute-value pair; the processor
determining for each selected parent abstract good and each
attribute value pair selected for that parent abstract good: (a) a
child abstract good that can include differentiated goods in the
parent abstract good that satisfy the attribute-value pair, wherein
a property associated with the child abstract good corresponds to
the Boolean conjunction of the parent abstract good's property and
its attribute-value pair; and (b) a complement child abstract good
that can include differentiated goods in the parent abstract good
that do not satisfy the attribute-value pair, wherein a property
associated with the complement child abstract good corresponds to
the Boolean conjunction of the parent abstract good's property and
the negation of its attribute-value pair; and the processor
determining at least one candidate abstraction by adding to the
current abstraction the child abstract good and the complement
child abstract good corresponding to the selected parent abstract
good and removing the selected parent abstract good from the
current abstraction; and the processor determining a score for each
candidate abstraction based on the reduced costs of the variables
corresponding to the child abstract good and the reduced costs of
the variables corresponding to the complement child abstract good
in a linear programming solution to the allocation problem that
assigns abstract goods in the current abstraction to the set of
first offers to purchase.
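By way of non-limiting illustration, the construction of a child
abstract good and its complement from an attribute-value pair, and the
formation of the resulting candidate abstraction, can be sketched in
Python. Representing goods as dictionaries, properties as strings, and
an abstraction as a mapping from property to goods are assumptions of
this sketch; the scoring by reduced costs is omitted.

```python
from typing import Dict, List, Tuple

Good = Dict[str, str]                 # attribute name -> attribute value
Abstraction = Dict[str, List[Good]]   # property (as text) -> goods it covers


def split_parent(parent_goods: List[Good], parent_property: str,
                 attribute: str, value: str
                 ) -> Tuple[Tuple[List[Good], str], Tuple[List[Good], str]]:
    """Return (child, property) and (complement child, property)."""
    child = [g for g in parent_goods if g.get(attribute) == value]
    complement = [g for g in parent_goods if g.get(attribute) != value]
    child_property = f"({parent_property}) AND {attribute}={value}"
    complement_property = f"({parent_property}) AND NOT {attribute}={value}"
    return (child, child_property), (complement, complement_property)


def candidate_abstraction(current: Abstraction, parent_property: str,
                          attribute: str, value: str) -> Abstraction:
    """Replace the selected parent with its child and complement child."""
    (child, child_prop), (comp, comp_prop) = split_parent(
        current[parent_property], parent_property, attribute, value)
    candidate = {p: goods for p, goods in current.items() if p != parent_property}
    candidate[child_prop] = child
    candidate[comp_prop] = comp
    return candidate
```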
[0055] Each abstract good in each current abstraction is associated
with a property. The property of each abstract good can be
determined from values assigned to one or more attributes of
the differentiated goods included in the abstract good or by a
Boolean formula using operators AND, OR or NOT over such
assignments. Determining the first abstraction in step (b) can
further include the processor selecting at least one parent
abstract good from the current abstraction and, for each selected
parent abstract good, the processor determining a selected target
refinement of the selected parent abstract good, wherein the
selected target refinement is a child abstract good that is a
subset of the parent abstract good.
[0056] The selected target refinement of the selected parent
abstract good can be determined as follows: (i) storing in the
computer storage a current target refinement corresponding to the
parent abstract good and associating with the current target
refinement the property associated with the parent abstract good;
(ii) storing in the computer storage a set of one or more
attribute-value pairs, wherein each attribute value pair comprises
an attribute used to describe properties of differentiated goods
and a value assigned to said attribute; (iii) the processor
determines a set of one or more candidate conjunctive target
refinements, wherein for each attribute-value pair in step (ii) the
candidate conjunctive target refinement is a child abstract good
comprising the differentiated goods in the parent abstract good
that satisfy the Boolean conjunction of the property associated
with the current target refinement and said attribute-value pair,
and associating with each candidate conjunctive target refinement
the property comprised of the Boolean conjunction of the property
associated with the current target refinement and said
attribute-value pair; (iv) the processor determines for each
candidate conjunctive target refinement, a complement child
abstract good comprising the differentiated goods in the parent
abstract good that do not satisfy the conjunction of the property
associated with the current target refinement and the
attribute-value pair for the candidate conjunctive target
refinement; (v) the processor determines a set of one or more
candidate disjunctive target refinements, wherein for each
attribute-value pair in step (ii) the candidate disjunctive target
refinement is a child abstract good comprising the differentiated
goods in the parent abstract good that satisfy the Boolean
disjunction of the property associated with the current target
refinement and said attribute-value pair, and associating with each
candidate disjunctive target refinement the property comprised of
the Boolean disjunction of the property associated with the current
target refinement and said attribute-value pair; (vi) the processor
determines for each candidate disjunctive target refinement a
complement child abstract good comprising the differentiated goods
in the parent abstract good that do not satisfy the disjunction of
the property associated with the current target refinement and the
attribute-value pair for the candidate disjunctive target
refinement; (vii) the processor determines a score for each
candidate target refinement in the set of one or more candidate
conjunctive target refinements in step (iii) and in the set of one
or more candidate disjunctive target refinements in step (v),
wherein each score is based on the reduced costs of the variables
corresponding to the child abstract good and its complement child
abstract good in a linear programming solution to the allocation
problem that assigns abstract goods in the current abstraction to
the set of first offers to purchase; (viii) the processor
designates as the current target refinement the target refinement
from step (vii) that satisfies a predetermined objective, the
predetermined objective being either a maximal score or a minimal
score, from the set of one or more candidate conjunctive target
refinements in step (iii) and the set of one or more candidate
disjunctive target refinements in step (v); and (ix) repeating
steps (ii)-(viii) until a second predetermined termination
condition is met, whereupon the current target refinement becomes
the selected target refinement of the parent abstract good.
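By way of non-limiting illustration, steps (i)-(ix) can be sketched in
Python as a greedy search over conjunctive and disjunctive
refinements. The score function is supplied by the caller and stands
in for the reduced-cost computation of step (vii); the maximal-score
objective, the rule of stopping when no candidate improves on the
current score, and all names are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Good = Dict[str, str]
Predicate = Callable[[Good], bool]
ScoreFn = Callable[[List[Good], List[Good]], float]  # (child, complement) -> score


def select_target_refinement(parent: List[Good], parent_property: str,
                             pairs: Sequence[Tuple[str, str]], score: ScoreFn,
                             max_rounds: int = 10) -> Tuple[Predicate, str, float]:
    # Step (i): the current target refinement starts as the parent itself.
    current_pred: Predicate = lambda g: True
    current_desc = parent_property
    current_score = float("-inf")
    for _ in range(max_rounds):                       # step (ix)
        best = None
        for attr, val in pairs:                       # step (ii)
            def atom(g: Good, a: str = attr, v: str = val) -> bool:
                return g.get(a) == v
            options = [
                # Step (iii): conjunctive candidate.
                (lambda g, p=current_pred, t=atom: p(g) and t(g),
                 f"({current_desc}) AND {attr}={val}"),
                # Step (v): disjunctive candidate.
                (lambda g, p=current_pred, t=atom: p(g) or t(g),
                 f"({current_desc}) OR {attr}={val}"),
            ]
            for pred, desc in options:
                child = [g for g in parent if pred(g)]
                complement = [g for g in parent if not pred(g)]  # steps (iv), (vi)
                s = score(child, complement)                     # step (vii)
                if best is None or s > best[0]:
                    best = (s, pred, desc)
        if best is None or best[0] <= current_score:  # assumed termination condition
            break
        current_score, current_pred, current_desc = best  # step (viii)
    return current_pred, current_desc, current_score
```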
[0057] The processor can determine the set of one or more candidate
abstractions by constructing a candidate abstraction from the
current abstraction for each of the selected parent abstract goods
by adding the child and complement child abstract goods
corresponding to the selected target refinement for the selected
parent abstract good to the current abstraction, and removing the
selected parent abstract good from the current abstraction.
[0058] The second predetermined termination condition can include
one of the following: a score computed for the new target
refinement meets or exceeds some maximum threshold; a score
computed for the new target refinement meets or is less than some
minimum threshold; a difference in the score of the new target
refinement and the score of the current target refinement is below
some minimum threshold; a number of repetitions of steps
(ii)-(viii) exceeds a maximum number of repetitions; or a computation
time of steps (ii)-(viii) meets or exceeds a maximum computation
time.
[0059] In the method the differentiated goods available for
exchange can be advertising slots on TV, internet or other media;
and the properties by which advertising slots are differentiable
are of interest to a set of at least one potential advertiser.
[0060] In the method, the properties of an advertising slot include
at least one of the following: a time at which the advertising slot
is displayed to one or more members of the slot's audience; an
expected or actual audience size for the advertising slot; an
expected or actual demographic property of the slot's audience,
such demographic property including at least one of the following:
gender, income, geographic location, number of times audience
members have previously viewed the advertisement allocated to that
slot, or related advertisements; in the case of television, a show
or show genre in which the advertising slot occurs; in the case of
the Internet, content properties of a web page or web site on which
the advertising slot occurs; a nature or content of advertisements
displayed in temporally proximal advertising slots to said
advertising slot; a nature or content of advertisements displayed
in spatially proximal advertising slots to said advertising slot on
a web page; and past trackable behavior of an audience member,
including television viewership of previous shows or
advertisements, previous visits to the web site or page on which
the advertising slot occurs, or the pattern of previous visits to
other web sites or web pages.
[0061] At least one offer to trade can be an offer to purchase that
can include at least one set of one or more advertising slots and
an associated price, and can further include at least one of the
following: in connection with the Internet, a specification of at
least one property of web-sites on which the one or more
advertising slots occur; in connection with the Internet, a
specification of at least one property of the content or meta-tags
of web-pages or web-sites on which the one or more advertising
slots occur; in connection with the Internet, a specification of at
least one temporal property of the one or more advertising slots;
in connection with the Internet, a specification of at least one
demographic property of at least one user who views the one or more
advertising slots; in connection with the Internet, a specification
of at least one behavioral property of at least one user who views
the one or more advertising slots, said behavioral property
describing past web browsing, click-stream, or purchasing behavior
of said user; in connection with the Internet, a specification of
at least one constraint on the advertisements that can be shown in
other advertising slots spatially or temporally proximate to the
one or more advertising slots; in connection with the Internet, a
specification of a minimum quantity of the one or more advertising
slots; in connection with the Internet, a specification of a
maximum quantity of the one or more advertising slots; in
connection with the Internet, a specification of at least one
temporal constraint on the timing of two or more advertising slots
or the advertisements displayed in said advertising slots; in
connection with the Internet, a specification of a price for a set
of advertising slots based on per-impression, per-click, or
per-action pricing; in connection with the Internet, a
specification of a single price for an entire set of advertising
slots; in connection with the Internet, a specification of price
discounts or price bonuses based on reaching designated advertising
slot volume thresholds; in connection with television, a
specification of at least one property of the TV shows on which the
one or more advertising slots occur; in connection with television,
a specification of at least one property of the content of the TV
shows on which the one or more advertising slots occur; in
connection with television, a specification of at least one
demographic property of the audience of the TV show on which the
one or more advertising slots occur; in connection with television,
a specification of at least one behavioral property of at least one
user who views the one or more advertising slots, said behavioral
property describing the past viewing behavior, remote control
usage, or personal video recorder usage of said user; in connection
with television, a specification of at least one constraint on the
advertisements that can be shown in other advertising slots in the
same or temporally proximate commercial breaks as the one or more
advertising slots; and in connection with television, a
specification of a price for a set of advertising slots based on
expected or actual demographic data of the TV audience.
[0062] Abstract goods can be represented using Boolean Decision
Diagrams (BDDs), the available supply of differentiated goods
satisfying specific attributes can be represented using BDDs, and
new abstract goods can be determined using standard BDD
algorithms/software for creating logical combinations of
formulas.
[0063] The properties of each abstract good determined from the
specification of differentiated goods can be computed by
manipulating the BDD representation of each abstract good and the
BDD representation of supply.
[0064] The invention is also a computer-implemented method for
determining an allocation of an abstraction of a plurality of
differentiated goods to a set of offers. The method includes: (a)
storing in a computer storage accessible to a processor of a
computer an abstraction of a supply of differentiated goods, said
abstraction comprising a set of one or more abstract goods, each
abstract good an aggregation comprising at least one differentiated
good, and at least one abstract good in the abstraction comprises
at least two differentiated goods; (b) storing in the computer
storage data about at least one offer to purchase, where each offer
to purchase can include at least one set of at least one of the
plurality of differentiated goods and a price associated with said
set of differentiated goods; (c) storing in the computer storage a
current allocation of none or a portion of each abstract good to
each offer to purchase; (d) the processor determining for each
abstract good whether the current allocation is feasible or
infeasible, wherein the current allocation is feasible when there
is an allocation of differentiated goods to each offer to purchase
assigned a portion of the abstract good in the current allocation
such that: (i) the allocated differentiated goods are contained in
the allocated abstract good; (ii) the allocated differentiated
goods lie within the one or more sets of goods associated with said
offer to purchase; and (iii) the quantity of allocated
differentiated goods meets or exceeds the portion of the allocation
of the abstract good allocated to the offer to purchase, and
wherein the current allocation is infeasible when there is no such
allocation of differentiated goods to each offer to purchase
assigned a portion of the abstract good; (e) when at least one
abstract good is determined to be infeasible in step (d), the
processor determining a new allocation of none or a portion of each
abstract good to each offer to purchase that satisfies the
constraint that, for any abstract good determined to be infeasible
in step (d) and for all offers that are allocated a portion of said
abstract good in the current allocation, the new allocation of said
abstract good to each offer to purchase is feasible in that an
allocation of differentiated goods exists that does not exceed the
supply of any differentiated good, and designating the new
allocation to be the current allocation; and (f) repeating steps
(d) and (e) until a termination condition is met.
[0065] The allocation of abstract goods to the set of at least one
offer to purchase in steps (c) and (e) can achieve one of the
following objectives: maximizing total payments for a known supply
of differentiated goods; maximizing total estimated payments for an
estimated supply of differentiated goods; maximizing worst-case
payments over a plurality of possible realizations of the estimated
supply of differentiated goods; maximizing risk-adjusted revenue or
expected utility given a distribution on possible realizations of
estimated supply of differentiated goods; or maximizing total social
welfare given estimated utility of buyers associated with offers to
purchase and an estimated supply of differentiated goods.
[0066] The termination condition can be one of the following: each
abstract good is determined in step (d) to be feasible in the
current allocation; a number of repetitions of steps (d)-(e) meets
or exceeds a threshold number of repetitions; a computation time of
steps (d)-(e) meets or exceeds a threshold computation time; a
number of abstract goods that are infeasible in the current
allocation in step (d) falls below a threshold number of infeasible
abstract goods; a maximum amount of infeasibility for every
abstract good in the current allocation in step (d) falls below a
threshold amount of infeasibility, where the degree of infeasibility
for an abstract good given an allocation is based on the amount by
which the quantity of supply required of goods to meet an assigned
portion of abstract goods exceeds the available supply; a value of
the objective criteria achieved in determining the new allocation
meets or exceeds a threshold value; a value of the objective
criteria achieved in determining the new allocation meets or falls
below a threshold value; and an absolute difference in the value of
the objective criteria of the new allocation and the current
allocation meets or falls below a threshold value.
[0067] The allocation of abstract goods to offers in steps (c) and
(e) can be determined by the processor solving a linear or mixed
integer program. The processor can determine the feasibility of the
current allocation of the supply of an abstract good by checking
the feasibility of a linear program that assigns to each offer
assigned a portion of an abstract good, a quantity of each
differentiated good that comprises the abstract good and lies
within the one or more sets of goods associated with said offer,
subject to a set of at least one constraint of a type (i) that
requires respecting the available supply of each differentiated good
and constraints of a type (ii) that require meeting or exceeding
the portion of the abstract good allocated to said offer to
purchase in the current allocation. For each abstract good that is
determined to be infeasible in step (d), the processor determines a
second set of constraints that is jointly infeasible and is
comprised of at least one constraint from the set of constraints of
type (i) and at least one constraint from the set of constraints of
type (ii), and then introduces a new constraint based on this
second set of jointly infeasible constraints to the determination
of the new allocation in step (e).
[0068] The second set of constraints that are jointly infeasible
can be computed as a minimal infeasible set.
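As a rough illustration of the feasibility check in [0067] and the infeasible constraint sets in [0068], the sketch below swaps the linear program for an equivalent subset test that holds for this pure supply/demand structure: the allocation of one abstract good is feasible exactly when, for every group of offers, the quantity assigned to the group does not exceed the total supply of the differentiated goods that at least one offer in the group accepts. The function name, the dictionaries, and the example quantities are assumptions made for illustration. Enumerating groups is exponential in the number of offers, so this is only practical for tiny cases; an LP solver would be used otherwise.

```python
from itertools import combinations


def find_infeasible_offers(supply, accepts, assigned):
    """Return a smallest group of offers whose assigned quantity exceeds the
    supply they can jointly draw on, or None if the allocation is feasible.

    supply   -- {differentiated good: available quantity}   (type (i) data)
    accepts  -- {offer: set of goods the offer will accept}
    assigned -- {offer: portion of the abstract good assigned} (type (ii) data)
    """
    offers = [o for o, q in assigned.items() if q > 0]
    # Try groups in order of increasing size so the first hit is a smallest one.
    for size in range(1, len(offers) + 1):
        for group in combinations(offers, size):
            usable = set().union(*(accepts[o] for o in group)) & set(supply)
            demand = sum(assigned[o] for o in group)
            if demand > sum(supply[g] for g in usable):
                return set(group)
    return None


# Hypothetical abstract good W1 with two differentiated goods.
supply = {"red-small-high": 30, "red-small-low": 70}
accepts = {
    "B1": {"red-small-high"},                    # wants high quality only
    "B2": {"red-small-high", "red-small-low"},   # indifferent to quality
}

infeasible = find_infeasible_offers(supply, accepts, {"B1": 40, "B2": 60})
feasible = find_infeasible_offers(supply, accepts, {"B1": 30, "B2": 70})
```

In the first call the totals match (100 assigned, 100 available), yet B1 is assigned 40 units and only 30 high-quality units exist, so the group {B1} is reported. One possible reading of the new constraint in step (e), offered here as an assumption, is a cut that limits the combined assignment of the reported group to the supply that group can use.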
[0069] In determining whether the allocation of an abstract good in
the current allocation is feasible, the processor can construct a
sub-abstraction of said abstract good, said sub-abstraction
comprising a set of sub-abstract goods, each sub-abstract good
comprising an aggregation of at least one differentiated good from
the abstract good, and at least one sub-abstract good in the
sub-abstraction comprising at least two differentiated goods from
the abstract good.
[0070] The differentiated goods available for exchange are
advertising slots on TV, internet or other media; and the
properties by which advertising slots are differentiable are of
interest to a set of at least one potential advertiser.
BRIEF DESCRIPTION OF THE DRAWINGS
[0071] FIG. 1 is a block diagram of an exemplary computer that can
be utilized either alone or as part of a computer network for
implementing the present invention;
[0072] FIG. 2 is an exemplary search tree that can be utilized for
choosing a refinement of an abstract channel by generating split or
child abstract channels;
[0073] FIG. 3 is a representation of an overall refinement process
for forming and scoring child/split abstractions of a parent
abstraction; and
[0074] FIG. 4 is a graph showing the fraction of upper bound on
optimal allocation value obtained after generation of a given
number of abstract channels (abstract goods).
DETAILED DESCRIPTION OF THE INVENTION
[0075] The present invention will be described with reference to
the accompanying figures.
[0076] With reference to FIG. 1, the present invention is embodied
in computer software which executes on one or more standalone or
networked computers 2. Each computer 2 includes a microprocessor or
processor 4, a local or remote computer storage 6, and an
input/output system 8. Each computer 2 can also include a media
drive 10, such as a disk drive, CD-ROM drive, and the like. Media
drive 10 can operate with a computer storage medium 12 capable of
storing the computer software that embodies the present invention,
which computer software is able to configure and operate computer 2
in a manner to implement the present invention. Input/output system
8 can include a keyboard 14, a mouse 16, and/or a display 18.
Computer 2 is exemplary of standalone or network computers capable
of executing the computer software of the present invention and is
not to be construed as limiting the invention.
[0077] An example of the concept of abstract goods as described
herein, their value, and the difficulties that arise when
allocating them will now be described.
[0078] Suppose a supply of widgets is offered for sale by auction.
The widgets have three different attributes, Color, Size and
Quality, with the different attributes each taking the following
different values: Color can be red, green, or blue; Size can be
small, medium or large; and Quality can be high-quality or
low-quality.
[0079] There are 18 different widget configurations in this
example: red, small, high-quality; red, small, low-quality; red,
medium, high-quality; blue, large, low-quality; and so on. But as
the number of attributes grows, the number of configurations grows
exponentially. In general, with N attributes, each taking on K
distinct values, there will be \(K^N\) configurations. (E.g., if
N=20 and K=3, then there will be roughly 3.5 billion distinct
configurations.)
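The counts in this paragraph can be checked with a few lines of Python. This is only a sketch; the dictionary and function names are illustrative assumptions.

```python
from math import prod

# Domain sizes from the widget example: 3 colors, 3 sizes, 2 quality levels.
domain_sizes = {"Color": 3, "Size": 3, "Quality": 2}
widget_configs = prod(domain_sizes.values())


def num_configurations(n_attributes, k_values):
    """Number of configurations with n attributes of k values each: k ** n."""
    return k_values ** n_attributes


large_case = num_configurations(20, 3)
```

Here `widget_configs` is 18 and `large_case` is 3,486,784,401, which is about 3.5 billion.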
[0080] In principle, a seller must determine how to allocate each
of these different configurations to bidders in a way that maximizes
revenue. This allocation problem can be computationally very
difficult, even practically unsolvable, as the number of distinct
widget configurations grows. In accordance with abstraction
techniques described herein, the allocation of these different
configurations is addressed by aggregating distinct goods--in this
case, widget configurations--into sets of (generally, related)
goods, and allocating portions of the smaller sets of goods to
bidders.
[0081] For example, group the widgets into three sets: W1, the set
of red, small widgets (either high or low quality); W2, the set of
blue, large widgets (either high or low quality); and W3, all other
widgets.
[0082] Each of these sets of widgets is called an abstract good.
The supply of abstract good W1 is equal to the supply of the
differentiated widgets (red, small, low quality, or red, small,
high quality) that make it up. An especially convenient way to
construct and describe abstract goods is by reference to the
attributes satisfied by its elements. In this example:
[0083] W1 is characterized by the logical (Boolean) formula: Color=red AND Size=small;
[0084] W2 is characterized by the logical (Boolean) formula: Color=blue AND Size=large; and
[0085] W3 is characterized by the logical (Boolean) formula: NOT(Color=red AND Size=small) AND NOT(Color=blue AND Size=large).
[0086] These logical formulas are the "properties" corresponding to
the abstract goods W1, W2, and W3. For example, the abstract good
W1 comprises those widgets satisfying the property "Color is red
and Size is small." Each abstract good can be characterized
alternatively as a particular set of differentiated goods, or as a
specific property, namely, the property defining which goods belong
to that set.
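The two views of an abstract good, as a property and as a set of differentiated goods, can be sketched as follows. The predicates follow the grouping given in [0081] (W1 is red and small, W2 is blue and large, W3 is everything else); the function and variable names are assumptions made for illustration.

```python
from itertools import product

COLORS = ["red", "green", "blue"]
SIZES = ["small", "medium", "large"]
QUALITIES = ["high", "low"]


def is_w1(color, size, quality):
    return color == "red" and size == "small"


def is_w2(color, size, quality):
    return color == "blue" and size == "large"


def is_w3(color, size, quality):
    return not is_w1(color, size, quality) and not is_w2(color, size, quality)


# Each property picks out the set of differentiated goods that satisfy it.
properties = {"W1": is_w1, "W2": is_w2, "W3": is_w3}
abstraction = {name: [] for name in properties}
for config in product(COLORS, SIZES, QUALITIES):
    for name, holds in properties.items():
        if holds(*config):
            abstraction[name].append(config)

sizes = {name: len(goods) for name, goods in abstraction.items()}
```

Every one of the 18 configurations lands in exactly one abstract good: 2 in W1, 2 in W2, and 14 in W3.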
[0087] An abstraction is any aggregation of differentiated goods
into a smaller collection of abstract goods such as these. In this
example, the "abstraction" is the set of abstract goods {W1, W2,
W3}. One value of such an abstraction is that it provides a simpler
way of describing the available inventory. For instance, one might
limit bidders to bidding on abstract goods instead of arbitrary
widget configurations.
[0088] This would allow for simpler communication of bids, and
could also be used to promote competition.
[0089] Another benefit of such an abstraction arises if the seller
allocates the supply of abstract widgets instead of the supply of
actual widgets to bidders. This makes the optimization problem
considerably easier. Instead of considering how many widgets of
each of the 18 different configurations should be allocated to each
bidder, the seller can now consider how to allocate the supply of
the 3 abstract widgets to each bidder. Once the allocation of this
abstract supply is determined, the supplier can then either
allocate actual widgets from the abstract supply so-allocated to
each bidder randomly, or can attempt to only allocate widgets of
interest to each bidder.
[0090] The techniques disclosed herein describe how to do both.
[0091] This also illustrates a potential drawback of allocating
abstract goods to bidders rather than allocating actual
differentiated goods. Suppose a bidder B is interested in red,
small, high-quality widgets. By allocating abstract widgets to
bidder B using the abstraction above, the seller has no means of
allocating only red, small, high quality widgets to B. Instead, the
seller is forced to allocate abstract widgets W1 to B, leading to a
situation in which some of the (differentiated) widgets do not meet
B's needs. The techniques disclosed herein resolve this issue in
two ways.
[0092] First, the techniques choose the abstraction very carefully
to minimize the amount of "wasted supply" assuming random
allocation. For example, actual widgets from the abstract supply of
W1 are allocated randomly to B.
[0093] If the supply of widgets is distributed such that most
small, red widgets are of high quality, then most of the widgets in
W1 (all of which are small and red) will be high quality and thus,
most widgets allocated to B will satisfy B's needs. Similarly,
suppose that most bidders who desire small, red widgets are
indifferent with respect to quality. In this case, allocating
abstract widgets from W1 will meet the full demands of such
bidders. The techniques described hereinafter for creating
abstractions, using data about both supply and demand
characteristics, will ensure that allocating using abstraction
creates as little "waste" as possible while keeping the number of
abstract goods to be allocated as small as possible.
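A minimal sketch of the "wasted supply" idea under random allocation: if widgets are drawn at random from an abstract good, the expected share that is useful to a bidder is the share of that abstract good's supply that the bidder wants. The supply figures and names below are hypothetical.

```python
def useful_fraction(supply, wanted):
    """Expected share of randomly allocated units that the bidder wants.

    supply -- {differentiated good: quantity} for one abstract good
    wanted -- set of differentiated goods the bidder is interested in
    """
    total = sum(supply.values())
    useful = sum(q for good, q in supply.items() if good in wanted)
    return useful / total


# Hypothetical supply of W1 in which most small, red widgets are high quality.
w1_supply = {("red", "small", "high"): 90, ("red", "small", "low"): 10}

bidder_b = {("red", "small", "high")}      # wants high quality only
indifferent = set(w1_supply)               # accepts either quality

share_b = useful_fraction(w1_supply, bidder_b)
share_indifferent = useful_fraction(w1_supply, indifferent)
```

With these numbers, bidder B can expect about 90% of its W1 units to be useful, while a quality-indifferent bidder wastes nothing.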
[0094] Second, once abstract goods are allocated to a bidder, the
constraint generation technique described herein will determine how
best to allocate the differentiated goods constituting an abstract
good in a way that ensures each bidder receives only differentiated
goods that it desires.
[0095] One important application of the techniques is in the
exchange of advertising inventory, for example, TV advertising
slots or internet display advertising slots. For example, a TV
network may have tens of thousands of 30-second prime-time
advertising slots for sale over its prime-time season, and hundreds
of potential advertisers interested in them. Each advertising slot
is characterized by a number of attributes, such as, without
limitation: what day of the week, what week of the year, what time,
what position in a commercial break, the genre of the show in which
it appears, the identity of the show, various demographic elements
(e.g., age, gender, geographic distribution, income level, etc.),
and so on.
[0096] Of course, predicted and actual ratings, broken down by
various demographic elements are vital, characterizing the
(predicted) amount of inventory available. Different advertisers
have interest in different attributes, and each will generally
desire a collection of slots satisfying some different combination
of slot properties to meet its advertising campaign needs. The
optimal allocation problem is completely intractable for problems
of a decent size. Slot (or channel) abstraction aggregates these
different channels into groups, and allows the allocation problem
to become much more tractable. As above, the abstraction is
desirably chosen very carefully to ensure that advertiser demand is
met as effectively as possible.
[0097] As a very simple example, a TV network might aggregate all
of the ad slots on a particular TV show into a single abstract
channel when advertisers who desire slots on that show have no
specific preferences for timing of their ads shown on that show. It
might aggregate all the TV shows within a specific genre (e.g.,
crime drama), while still distinguishing day, week, and slot
position, if no advertiser has specific preferences for shows
within that genre.
[0098] Finally, while the examples above are described in
connection with an auction (forward auction), the benefits of
abstraction apply equally well in a reverse auction in which a
single buyer solicits offers from multiple sellers/suppliers, or in
a two-sided exchange with multiple buyers and sellers.
[0099] Having thus generally described the present invention, the
detailed techniques (algorithms) for implementing various aspects
of the present invention will now be described.
[0100] Channels: Combinations of Contextual Attributes.
[0101] As described above, the set of contextual properties (e.g.,
search terms in a search engine; user or web site attributes in
banner ads) is the key determinant of the value of an ad placement
to the advertiser and the revenue generated for the publisher or ad
network. Let a contextual attribute be any attribute associated
with either the display of an advertisement, or with some event
(such as a click, purchase, or other measurable/estimable user
action) that an advertiser is allowed to include within a condition
on their ad placement(s), or the payment associated with (the set
of) ad placement(s). Let F be the set of such attributes. Each
attribute \(F_i \in F\) has an associated set (or domain)
of attribute values \(\mathrm{Dom}(F_i)\), corresponding to the values that
attribute can take. For example, "Medical Content on Web Page" is an
attribute that takes values "true" or "false"; "query contains
keyword betting" is another Boolean attribute; "time of day" is a
multivalued attribute that could take values 1 AM, 2 AM, 3 AM,
etc.; "user location" is an attribute that could be determined
roughly in some contexts (e.g., using IP address) or very precisely
in others (e.g., using cellular or GPS localization for users of
mobile devices).
[0102] Each ad placement (or event caused by such) has a value for
each attribute \(F_i \in F\) of interest.
[0103] This allows the potential ad supply to be broken into supply
channels. Formally, a channel c is any instantiation of the set of
attributes:
\[ c \in \times \{\mathrm{Dom}(F_i) : F_i \in F\} \]
The supply S(c) of channel c is the number of user impressions
expected to satisfy the contextual conditions dictated by c. These
channels are the differentiated goods that are offered for sale by
the publisher. This supply is dependent on the time frame of
interest, and is typically estimated using statistical models of
user behavior (e.g., search behavior in a search engine, or viewing
habits on a publisher's site).
[0104] Notice that the number of channels grows exponentially in
the number of attributes (if they have discrete and finite
domains).
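The definition of a channel as one instantiation of every attribute can be sketched with a Cartesian product. The attributes, their domains, and the tiny impression log are invented for illustration; in practice the supply S(c) would come from a statistical forecast rather than a raw count.

```python
from collections import Counter
from itertools import product

# Hypothetical attribute domains Dom(F_i).
domains = {
    "medical_content": [True, False],
    "day_part": ["morning", "afternoon", "evening"],
    "region": ["US", "CA"],
}
attributes = list(domains)

# A channel is one value per attribute; the channel set is the full product.
channels = list(product(*(domains[a] for a in attributes)))

# Toy impression log, one tuple of attribute values per impression.
impressions = [
    (True, "evening", "US"),
    (True, "evening", "US"),
    (False, "morning", "CA"),
]
supply = Counter(impressions)


def S(channel):
    """Observed number of impressions falling in the given channel."""
    return supply[channel]
```

Three small attributes already give 2 x 3 x 2 = 12 channels, most of which have no observed supply in this toy log.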
[0105] Optimization for Advertising Assignment.
[0106] By far the most common way in which ads are allocated in
online settings is through auctions using relatively simple schemes
to decide on winners (i.e., which advertisers are awarded which ad
slots on a web page or search page) and the price paid by the
winning advertisers for the ad placement (or event caused by the
placement, e.g., click). One common auction scheme is the
generalized second price auction (GSP). Some instantiations of this
mechanism adjust the bids using some quality score (e.g., the
predicted click rate for the ad).
[0107] A key drawback of such auction schemes is their myopic
nature. The supply of ad slots available over a particular period
of time is not allocated to bids in a way that maximizes an
objective such as efficiency (best use of the slots) or publisher
(or ad network) revenue. Instead as ad slots become available--as
users view web pages or access web services--they are myopically
allocated to the "highest bidder" using schemes such as GSP.
[0108] It is well-known that myopic allocation of ad slots to
bidders in this fashion is generally suboptimal. For example, if
bidders have budgets, such myopic schemes can often fail to exhaust
a bidder's budget, thus reducing ad revenue (and in most cases,
using ad slots inefficiently). Optimization techniques have been
proposed as a means to overcome this problem. Roughly speaking,
optimization techniques attempt to allocate the entire supply of ad
slots over some specified time interval to the set of bidders in a
way that maximizes some objective (such as revenue or reported
efficiency). The key distinction from myopic models is that in
optimization techniques the allocation of future slots is done
jointly with active slots, so that more effective use of slots
results.
[0109] Most sophisticated optimization techniques require the explicit
allocation of bids (or bidders or advertisers) to ad slots within
specific channels. Specifically, quantities (or variables)
\(x_j^i\) relating the amount of supply from channel j that is
allocated to advertiser or bid i, for all channels j and
advertisers i, must be considered to determine an optimal
allocation. The quantity \(x_j^i\) can safely be ignored if
advertiser i has no interest in channel j. But since each channel
is of interest to at least one bidder, as discussed below, there is
at least one such relevant quantity \(x_j^i\) for each channel
j.
[0110] The enumeration of these channels within an algorithmic
optimization model can cause tremendous difficulties. As noted
above, the number of channels grows exponentially in the number of
attributes of interest to advertisers. Optimization techniques
involving linear programming, mixed integer programming, heuristic
search, etc. have computational complexity that scales at least
polynomially in the number of channels (LP methods), and often
exponentially (MIP methods). Thus considering allocations of
advertisers to all concrete channels is inherently problematic.
[0111] Consider the problem faced by an ad network selling and
serving banner ads over a variety of web sites. Ads are served
(dispatched) to specific locations on web pages as the pages are
served by members of the network. Dispatch decisions can be based
on a variety of attributes (attributes of the impression in
question): either attributes of the web page (e.g., page identity,
page category, predicted demographic profile of users, page
content, etc.), attributes of the user, if available (e.g.,
demographic properties such as gender, income level, geographic
location), and transient contextual attributes (e.g., day-part,
browsing history, past purchases, etc.).
[0112] In typical ad auctions, advertisers bid for ad slots
satisfying specific attributes. Advertisers that match the
attributes of the current ad slot are often allocated via the GSP.
Such mechanisms are deployed much more frequently for sponsored
search than for banner ads. Regardless of the setting, however,
expressiveness is typically non-sequential and restricted to
per-item bidding (e.g., a bid price is offered per-impression or
per-click), time eligibility conditions, and simple budget
constraints (often linking multiple bids/items). Structured
(tree-based) languages have been proposed for specifying item
prices over ad attributes. These do not extend expressiveness
beyond per-item, but allow compact, natural specification of a set
of item prices that can be exploited in optimization.
[0113] Even in such a simple setting, the need for optimization can
be acute. Consider the following example.
[0114] There are two sites A and B. Bidder \(b_1\) bids $1 per
thousand impressions on A and $0.50 on B, with a budget of $50K.
Bidder \(b_2\) bids $0.50 per thousand impressions on A, with a
budget of $20K. Suppose supply on A is 5 times that of B for the
first 50K units, but is then exhausted (only B has supply from then
on). In a typical per-item auction, \(b_1\) will win all of A's and
B's supply until its budget is exhausted. Specifically, \(b_1\)
would win \(x = (500/11)\)K impressions on A, together with one fifth
as many on B (since \(x + 0.5x/5 = 50\)K). At this point
\(b_2\) wins the remaining (50/11)K impressions on A. Total revenue
is \(50 + (0.5)(50/11) \approx\) $52.3K. The optimal allocation would
collect revenue of $70K by selling 40K units of A to \(b_2\), and
10K units of A plus 80K units of B to \(b_1\).
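The arithmetic of this example can be reproduced exactly with rational numbers. The sketch below only re-derives the figures stated above; the variable names are assumptions, and quantities and budgets are expressed in thousands (K).

```python
from fractions import Fraction

# Prices in dollars per unit; budgets and supply in thousands (K).
price_b1_A, price_b1_B, price_b2_A = Fraction(1), Fraction(1, 2), Fraction(1, 2)
budget_b1, budget_b2 = 50, 20
supply_A = 50

# Myopic per-item auction: b1 outbids b2 and takes x units of A plus x/5 units
# of B until its budget is spent, i.e. 1*x + 0.5*(x/5) = 50.
x = budget_b1 / (price_b1_A + price_b1_B / 5)
leftover_A = supply_A - x                      # what b2 then wins on A
myopic_revenue = budget_b1 + price_b2_A * leftover_A

# Allocation described as optimal in the text.
b2_A, b1_A, b1_B = 40, 10, 80
b1_spend = price_b1_A * b1_A + price_b1_B * b1_B
b2_spend = price_b2_A * b2_A
optimal_revenue = b1_spend + b2_spend
```

Here `x` is 500/11, `leftover_A` is 50/11, `myopic_revenue` is 575/11 (about 52.3), and `optimal_revenue` is 70, with both bidders spending exactly their budgets in the optimal case.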
[0115] Optimization is also critical when one considers slates of
ads (multiple advertisers shown on a single web page).
[0116] The need for richer expressiveness in ad auctions is
evident, especially campaign-level expressiveness for banner ads. A
similar need for campaign-level expressiveness is clear in other
media as well, e.g., in TV advertising. The techniques disclosed
herein apply directly to such problems, though the channel
explosion problem is somewhat mitigated by the inability (or at
least, current unwillingness) to segment individual impressions by
very fine-grained attributes. The technology and willingness to
monitor viewing habits and individual details to target ads as they
are online would make the problem addressed here as acute in such
settings. This point was described in U.S. patent application Ser.
No. 11/396,410, filed Mar. 31, 2006, entitled "System For And
Method Of Expressive Sequential Auctions In A Dynamic Environment
On A Network" which is incorporated herein by reference, where
various forms of expressiveness are described, along with an
optimize-and-dispatch architecture in which: (a) optimization is
used to allocate ads over time at a coarse level of time
granularity; and (b) a dispatcher assigns ads in real time to
specific page impressions using parameters determined by the
optimizer. Further forms of campaign-level expressiveness can
include algorithms for the online, approximate solution of the
Markov decision process induced by the allocation model.
Specifically, given uncertain supply (in the form of web page hits)
and demand (in the form of bids or contracts), the approach
optimizes the allocation of (long-term) expressive ad contracts to
ad channels (that is, groups of attributes satisfying specific
properties) based on the distribution of predicted supply.
Regarding the cost of inexpressiveness in ad auctions, it can be
shown that GSP can be arbitrarily inefficient for certain
distributions of agent preferences (even with per-item
preferences), and that this inefficiency can be removed by adding
expressiveness.
[0117] In what follows, it is assumed that advertisers make
expressive offers: offers in which their preferences for particular
sequences or sets of impressions are articulated (clicks,
conversions, etc., can also be mixed with impressions if desired).
These can include per-item bids, budgets, and other standard forms,
but are extended to include much richer offer terms. Below are just
a few examples of expressiveness over sets of impressions:
[0118] Minimum targets/threshold preferences: bidder pays a fixed
amount only if a minimum impression threshold is met during a
target period (e.g., $d for 300K impressions satisfying some
condition \(\phi\)). Multiple targets may be mixed, as may
per-impression bids with bonuses for achieving specific targets.
Maximums, even frequency capping at the site or individual level,
can be imposed as well.
[0119] Temporal sequencing/smoothness: bidder desires a minimum
number of impressions satisfying condition \(\phi\) in each of a set
of time periods (e.g., 200K impressions per day for two weeks); or
the bidder may make a threshold or per-impression offer that is
only "valid" if the variance in the number of impressions per time
period is no more than 10% (here validity may mean that the
impressions outside that range are not counted, or that the entire
contract is invalid).
[0120] Complements: ads on site A and site B must appear in a 2:1
ratio (either over the life of the campaign, or during each
relevant time period, e.g., hour, day-part, day, week).
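The three kinds of offer terms listed above can be sketched as simple checks on delivered impressions. The function names, the exact-ratio test, and the sample figures are illustrative assumptions; a practical system would likely allow tolerances and would encode these terms inside the optimization model rather than evaluate them after the fact.

```python
def threshold_payment(impressions, minimum, amount):
    """Minimum target: pay `amount` only if at least `minimum` qualifying
    impressions were delivered in the target period, otherwise nothing."""
    return amount if impressions >= minimum else 0


def meets_period_minimum(impressions_per_period, minimum):
    """Temporal smoothness: every period must reach the minimum."""
    return all(n >= minimum for n in impressions_per_period)


def in_ratio(site_a, site_b, ratio=(2, 1)):
    """Complements: impressions on sites A and B must be in the given ratio."""
    return site_a * ratio[1] == site_b * ratio[0]


paid = threshold_payment(320_000, minimum=300_000, amount=5_000)
unpaid = threshold_payment(250_000, minimum=300_000, amount=5_000)
smooth = meets_period_minimum([200_000] * 14, minimum=200_000)
balanced = in_ratio(200_000, 100_000)
```

The all-or-nothing jump in `threshold_payment` is the kind of term that cannot be written as a single linear expression in the number of impressions.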
[0121] While per-item expressiveness and budgets can usually be
incorporated directly into an LP model, some of these richer forms
of expressiveness require the introduction of binary variables
(e.g., threshold preferences). However, existing LP/MIP models are
unable to scale to practical problems involving a large number of
attributes (impression attributes); yet it is precisely the ability
to segment on very detailed attributes that explains the appeal of
online advertising! The key bottleneck is channel explosion: the
number of specific ad channels to which ads can be assigned in an
LP/MIP--i.e., the number of distinct attribute instantiations--grows
exponentially in the number of attributes. Hereinafter this problem
is addressed through the use of channel abstraction.
[0122] Channel Abstraction.
[0123] Channel Abstraction offers a means for overcoming the
problem of a large (possibly exponential) number of channels in
optimization techniques for ad auctions. An abstract channel is any
aggregation of supply channels: the channels within the abstract
channel are then treated as indistinguishable at one or more points
during the process of determining the allocation of supply channels
to bidders. Channel abstraction is one specific embodiment of the
abstraction technique described herein for any type of
differentiated goods, where the goods in question are the
advertising channels made available to advertisers.
[0124] In what follows, focus is on the use of abstract channels
during the optimization phase of ad assignment. However, abstract
channels can be used at any or all of the phases of the online
advertising cycle:
[0125] During bidding or contract negotiation: the publisher or ad
network can restrict advertisers to expressing their demand over
abstract channels. This can be accomplished by restricting the
attributes an advertiser can express, or by mapping more expressive
demands into the smaller set of abstract channels. Advantages of
such restrictions include ease of communication and
understandability (advertiser attention is focused on a smaller set
of distinctions); ability to guide advertisers to relevant supply
through appropriate abstract channel definition; and demand
intensification, which can increase publisher/network revenue by
preventing fine-grained market segmentation and encouraging
competition.
[0126] During the offline allocation of channels to advertisers:
This is elaborated on below. The benefit of using abstract channels
is a reduction--often dramatic--in computational complexity of the
allocation procedure.
[0127] During the online serving or dispatch of ad slots: While
offline optimization and allocation determines the supply of an
abstract channel that is given to a particular advertiser, the ads
must be dispatched in an online fashion to (concrete) channels as
the supply is realized. At dispatch time, fine-grained distinctions
among channels can be made (subject to the allocation constraints
imposed by the offline assignment of abstract channels); or the
fine-grained distinctions can be ignored, implying the use of
abstract channels at dispatch time.
[0128] The need for channel abstraction is
clear--optimization-based approaches to online advertising cannot
scale to problems of realistic size and scope. The number of
channels that must be accounted for grows so quickly in practice
that optimization-based approaches cannot be used. However, with
suitable channel abstraction, the number of channels can be reduced
exponentially, thus allowing optimization-based techniques to be
used in practice. However, channel abstraction usually implies some
form of approximation: advertisers are assigned to abstract
channels that incorporate some channels that are not of interest to
them. As such, they pay only for some of the ad displays on an
abstract channel (in particular, they pay only for those displays
that are dispatched on a concrete channel of interest). Thus the
appropriate construction of abstract channels is vital to meet the
goals outlined above (e.g., demand intensification or computational
efficiency) with as little sacrifice in objective value (e.g.,
economic efficiency or revenue) as possible.
[0129] A detailed description of the invention and several
illustrative examples will now be provided.
[0130] The invention is a computer-implemented method for
automatically constructing a set of abstract supply channels for
the purposes of assigning advertisements to advertising inventory
(or ad slots) in an online advertising environment. For the purpose
of describing the invention, the following will be assumed
hereinafter.
[0131] A collection of ad slots are available to be allocated to
advertisers over some time frame of interest. The available slot
inventory for the period may be known in advance, or may be
uncertain. Uncertainty arises because of the inherent uncertainty
of user views of specific web pages or use of specific web
services. In the latter case, estimates of inventory may be
available (e.g., using statistical models), or other
characterizations of the uncertainty may be available (inventory
level bounds, probability distributions, etc.).
[0132] Ad inventory is characterized by a set of attributes F.
Specifically, for each attribute \(F_i \in F\), each ad slot
has associated with it a specific attribute value \(f \in
\mathrm{Dom}(F_i)\) (the possibility of unknown or inapplicable attributes
is allowed). As with ad inventory, the attribute values may be
known precisely at the time the ad slot is created/becomes
available, may be estimated in a statistical sense, or may be
unknown. The term condition or property will be used to refer to
logical combinations of attribute values, saying, for instance,
that a slot satisfies condition or property \(\phi\).
[0133] Ad inventory is broken into channels, where each channel
groups together ad slots that are indistinguishable according to
the attributes of interest. Specifically, any two slots within the
channel are indistinguishable from the point of view of satisfying
any advertiser's expressed interests.
[0134] Potential advertisers communicate their interest in having
their ads displayed on ad slots: this interest is captured by
expressing the attribute values desired of the slots on which their
ads are shown. They can also express their interest in receiving
various collections or sequences of such ad slots during the period
of interest. Advertisers also express their willingness to pay for
displays on ad slots--or events (e.g., clicks) caused by such
displays--satisfying their conditions of interest (including
potential conditions on collections or sequences of ad displays).
The communication of interest and willingness to pay can be
effected via an auction mechanism, by the direct purchase of slots
at prices posted by the publisher/network, manual negotiation, or
any other means.
[0135] Advertisers pay the publisher, ad network, or other provider
of the advertising inventory for the displays of their ads--or the
events caused by the display of their ads--as a function of the
attribute values of said displays or events, their expression of
interest and willingness to pay for ads satisfying specific
conditions, and possibly the expressions of interest and
willingness to pay of other advertisers. (E.g., in an auction, the
price for an ad display paid by a winning bidder often depends on
the bids of other bidders.)
[0136] The publisher, ad network, or provider of advertising
inventory wishes to determine an assignment of (actual or
predicted) ad slots to specific advertisers over the time period of
interest in order to achieve some objective. Potential objectives
include, but are not limited to: (a) maximizing total (predicted)
payment by advertisers for ad displays (where predictions are made
with respect to realizations of ad inventory, attribute values, and
events caused by ad display); (b) maximizing expected payment (with
expectation taken with respect to possible realizations of
inventory, attribute values, and events); (c) maximizing worst-case
revenue over possible realizations of inventory, attributes, and
events; (d) maximizing risk-adjusted revenue (or utility); (e)
maximizing predicted social welfare or efficiency (with respect to
predicted utility of advertisers given their declared interests and
willingness to pay); and so on.
[0137] The publisher, ad network, or provider of advertising
inventory will determine the assignment of (actual or predicted) ad
slots using some form of optimization, for instance, mathematical
programming, heuristic search, specialized combinatorial
algorithms, or some other algorithmic technique. The provider may
insist on an optimal assignment, may allow an approximately optimal
assignment (where a difference from optimal can be guaranteed), or
may allow purely heuristic assignments (where it is expected that
the assignment will perform well with respect to the provider's
objective, but no guarantee can be provided).
[0138] As noted above, exact, approximate, and heuristic means of
determining assignments all run into computational difficulty when
the number of supply channels is large. And since the number of
supply channels grows exponentially in the worst case with the
number of attributes of interest, the number of supply channels is
too large to allow practical optimization of ad assignments except
in very small domains (small attribute set or small number of
bidders).
[0139] Proposed herein is a computer-implemented means for
constructing a set of abstract channels that supports effective
optimization with as little sacrifice of objective quality as
possible. Algorithmically, the approach starts with an initial set
of abstract channels, and iteratively refines these channels into
more fine-grained abstract channels until either: (a) the number of
abstract channels achieves some (statically or dynamically) defined
limit (e.g., reflecting limits on the ability of an optimizer to
handle a large number of channels, or limits on the communication
complexity desired to express ad conditions); or (b) until the true
or estimated quality of the set of abstract channels, with respect
to the objective criterion, reaches a satisfactory level.
[0140] To establish terminology, let C be the set of (concrete)
channels determined by the attributes of interest. An abstract
channel a.OR right.C consists of a non-empty set of (concrete)
channels {c.sub.1.sup.a, . . . , c.sub.n.sub.a.sup.a}, where
n.sub.a is the number of concrete channels that have been
aggregated within a. Each concrete channel is viewed as a special
case of an abstract channel (namely, a singleton set). A split of
an abstract channel a is a set {a.sub.1, . . . , a.sub.n} of
mutually disjoint subsets of a that covers a (i.e., a partition of
a). The abstract channel a is often referred to as the parent
abstract good, and the subsets of a are often referred to as child
abstract goods, in reference to such a split. When a is split, it
is "replaced" by this set of more fine-grained abstract channels,
a.sub.1.OR right.a, a.sub.2.OR right.a, etc. A split of a set of
abstract channels A is any union of splits of its constituent
components. Thus if A={a,b} and {a.sub.1, a.sub.2, a.sub.3} is a
split of a and {b.sub.1,b.sub.2} is a split of b, then {a.sub.1,
a.sub.2, a.sub.3, b.sub.1, b.sub.2} is a split of A. A binary split
of a is any partitioning of a into two subsets, while a binary
split of a set of abstract channels A is any binary split of a
single abstract channel in A.
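The terminology above can be illustrated with a minimal Python sketch. It assumes that an abstract channel is represented as a frozenset of concrete-channel labels; the function names `is_split` and `split_of_set` and the labels `c1` through `c5` are hypothetical and are not part of the described method.

```python
from itertools import combinations


def is_split(parts, a):
    """Return True if `parts` is a partition of abstract channel `a`:
    non-empty, mutually disjoint subsets of `a` that together cover `a`."""
    parts = [frozenset(p) for p in parts]
    if not parts or any(len(p) == 0 or not p <= a for p in parts):
        return False
    disjoint = all(p.isdisjoint(q) for p, q in combinations(parts, 2))
    covers = frozenset().union(*parts) == a
    return disjoint and covers


def split_of_set(A, splits):
    """Replace each channel of A by its split and return the union.
    `splits` maps a channel to its list of child channels; a channel
    without an entry is kept unchanged."""
    result = set()
    for a in A:
        children = splits.get(a, [a])
        if not is_split(children, a):
            raise ValueError("children do not partition the parent channel")
        result.update(frozenset(c) for c in children)
    return result


# The example from the text: A = {a, b}, a split three ways, b split two ways.
a = frozenset({"c1", "c2", "c3"})
b = frozenset({"c4", "c5"})
a1, a2, a3 = frozenset({"c1"}), frozenset({"c2"}), frozenset({"c3"})
b1, b2 = frozenset({"c4"}), frozenset({"c5"})
refined = split_of_set({a, b}, {a: [a1, a2, a3], b: [b1, b2]})
```

A binary split of `a` is then simply a split for which `len(parts) == 2`.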
[0141] The number of c-channels |C| grows exponentially in the
number of attributes of interest. This number can be pruned by
eliminating any attributes that interest no bidder. A tighter bound
on the number of required channels can be provided by aggregating
c-channels that are indistinguishable to every bidder; this
provides a simple lossless abstraction by grouping sets of
c-channels corresponding to (logically consistent) formulae of the
form $\bigwedge_{i \in \mathrm{Bids}} \pm\varphi^{i}$; i.e., conjunctions over all bid formulae or
their negations.
[0142] However, such simple lossless abstraction is unlikely to
render optimization (whether LP or MIP) practical: exponential
growth would be expected in the number of channels, even when
abstracted in this way. Instead, the use of "approximate" abstract
channels (a-channels) is considered. An abstract channel is any
aggregation of concrete channels, and can be represented as a
logical formula .alpha. over F. An abstraction is any partitioning
of c-channels C into a set A of a-channels, i.e., a set of mutually
exclusive and covering formulae {.alpha..sub.1, . . . ,
.alpha..sub.|A|}. When convenient, an a-channel and its logical
representation .alpha. are treated indistinguishably (e.g., writing
c.di-elect cons..alpha. to emphasize .alpha. as a collection of
c-channels, or c|=.alpha. to focus on its logical form).
Abstractions can also admit overlapping abstract channels, but
preferred embodiments of the method described herein do not
generally use overlapping (nondisjoint) abstract channels.
[0143] Given an abstraction A, the optimization problem becomes one
of assigning ads for a particular advertiser or bid to a-channels
rather than c-channels. To accomplish this, the supply of an
a-channel .alpha. is defined to be
s(.alpha.)=.SIGMA.{s(c):c.di-elect cons.C,c|=.alpha.}. In the
simple LP case discussed above reflecting per-impression value, the
value of an .alpha.-impression to bid i is defined:
$v_{\alpha}^{i} = v^{i} \Pr(\varphi^{i} \mid \alpha)$, where
$\Pr(\varphi^{i} \mid \alpha) = s(\varphi^{i} \wedge \alpha)/s(\alpha)$.
This value reflects the (expected) value of a random dispatch
policy: if i is assigned to an abstract channel .alpha., it will be
assigned randomly to the c-channels that constitute .alpha..
Naturally, the dispatch of ads can be handled more intelligently:
no ad for i will actually be assigned to a channel not satisfying
.phi..sup.i; intelligent dispatch can be used to reassign such
wasted supply to ads that can exploit it. In this sense, the values
v.sub..alpha..sup.i will underestimate true value in practice. This
is discussed further below, wherein methods to assign ads to
abstract channels in a more refined fashion are developed. When bid
i has a budget g.sup.i capping its total expenditures, the optimal
allocation under the random dispatch assumption is given by the
LP:
$$
\begin{aligned}
\max_{x_{\alpha_j}^{i}} \quad & \sum_{i} \sum_{\alpha_j} v_{\alpha_j}^{i} x_{\alpha_j}^{i} \\
\text{s.t.} \quad & \sum_{i} x_{\alpha_j}^{i} \le s(\alpha_j) \quad \forall \alpha_j \in A \\
& \sum_{\alpha_j} v_{\alpha_j}^{i} x_{\alpha_j}^{i} \le g^{i} \quad \forall i \in B
\end{aligned}
$$
[0144] In the case of more general IP expressiveness, value is not
associated directly with impressions, but with arbitrary properties
of the entire allocation; specific impressions satisfying logical
formulae .phi..sup.i "count towards" satisfaction of a bid's
conditions. Thus, the impressions that count toward bid
satisfaction are generally discounted by Pr(.phi..sup.i|.alpha.), rather
than discounting objective function value. Of course, the value
discount in the per-impression LP is just a special case of
this.
[0145] At each iteration t of the algorithm, let A.sup.t denote the
current set of abstract channels. The approach begins with an
initial set of abstract channels A.sup.0--this could be as simple
as the single abstract channel containing all inventory, or it
could reflect the a priori knowledge of the publisher/network
regarding good initial abstract channels. At iteration t, a set of
possible splits of A.sup.t is examined, and each of these is scored
using a suitable (heuristic or exact) measure of the impact of the
split on the ability to better assign ads according to the
objective in question (e.g., revenue or efficiency). This score is
assessed in the context of A.sup.t--that is, it measures the
(relative or absolute) degree to which splitting A.sup.t will
improve objective value. The split with the best score (e.g.,
highest) is the split that is adopted. Scores need not be computed
explicitly for all possible splits. Some splits may be determined
to be dominated by (i.e., worse than) other splits by means of some
logical (or other) relationship between them, without requiring the
explicit computation of the score of the dominated split. In some
cases, a best split can be determined without explicit computation
of its score either. Once adopted, the set of abstract channels is
updated by replacing any split channels in A.sup.t by their splits,
resulting in an updated set of abstract channels A.sup.t+1. At
iteration t+1, the process is repeated unless an appropriate
termination criterion (as discussed above) is satisfied by
A.sup.t+1. Optionally, once the appropriate splits of A.sup.t are
determined at iteration t, merging of channels can be considered.
Merging involves coalescing several abstract channels into a more
abstract channel by replacing the merged set by its union. Merging
of two or more channels may be made attractive by the new splits
entered at iteration t: it can, for example, reduce computational
complexity, while hindering objective value only minimally.
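The iterative procedure of this paragraph can be sketched as a greedy loop. This is a minimal illustration only: the function `refine`, the candidate generator `singleton_splits`, the scoring function `toy_score`, the `weight` table, and the use of a minimum-gain threshold as the quality-based termination test are all assumptions made for the example, and merging of channels is not shown.

```python
def refine(initial, candidate_splits, score, max_channels, min_gain):
    """Greedy refinement: at each iteration adopt the best-scoring split.

    initial          -- iterable of abstract channels (frozensets)
    candidate_splits -- function: abstraction -> list of (parent, children)
    score            -- function: (abstraction, parent, children) -> float
    max_channels     -- stop once this many abstract channels exist
    min_gain         -- stop when the best score falls below this value
    """
    abstraction = set(initial)
    while len(abstraction) < max_channels:
        candidates = candidate_splits(abstraction)
        if not candidates:
            break
        parent, children = max(
            candidates, key=lambda pc: score(abstraction, pc[0], pc[1]))
        if score(abstraction, parent, children) < min_gain:
            break
        # Replace the split channel by its children to obtain A^(t+1).
        abstraction.remove(parent)
        abstraction.update(children)
    return abstraction


def singleton_splits(abstraction):
    """Hypothetical candidate generator: binary splits that isolate one
    concrete channel from any abstract channel with several members."""
    out = []
    for ch in abstraction:
        if len(ch) > 1:
            for c in sorted(ch):
                out.append((ch, [frozenset({c}), ch - {c}]))
    return out


# Hypothetical score: the benefit of isolating a given concrete channel.
weight = {"c1": 5.0, "c2": 1.0, "c3": 3.0}


def toy_score(abstraction, parent, children):
    return weight[next(iter(children[0]))]


two = refine([frozenset(weight)], singleton_splits, toy_score, 2, 0.5)
three = refine([frozenset(weight)], singleton_splits, toy_score, 3, 0.5)
gated = refine([frozenset(weight)], singleton_splits, toy_score, 3, 4.0)
```

In practice the score would be one of the exact or heuristic measures discussed in the text, and dominated splits could be pruned before scoring.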
[0146] A special case of this procedure, in which only a single
round of splits is performed, is described, followed by an algorithm
for doing a single round of splits.
[0147] Criteria that can be included in scoring possible splits
include, but are not limited to: [0148] explicit scoring of a split
by solving for the optimal allocation using that split, and
comparing the resulting optimal objective value with the split
to that obtained without the split; the optimal
objective value (both with and without the split) can be
approximated as well; and [0149] heuristic scoring of a possible
split using a function f that combines into a single score: the
relative (predicted or actual) supply of ad slots within the split
abstract channels; the relative (predicted or actual) demand for ad
slots within the split abstract channels; the (predicted or actual)
prices bid (or posted) for ad slots within the split abstract
channels; the number of bidders/advertisers expressing interest in
ad slots in the split abstract channels; the complexity of the
description (e.g., in a logical language over attributes) of the
abstract channels that result from splitting; distinctions between
the channels within split abstract channels in their ability to
offer more or less precise predictions of user events caused by ad
displays (e.g., clicks or conversions); and other relevant
metrics.
[0150] Channel Abstraction by Column Generation.
[0151] One specific example of the process for channel abstraction
will now be described. This process is utilized for the case of
advertisers expressing interest in ad slots using arbitrary logical
conditions or properties of ad slot attributes, with per-impression
values and budgets. The extension to events (e.g., clicks) induced
by ad impressions is straightforward using standard conversion of
per impression bids to "per event" bids. Assume LP-based
optimization is used to allocate impressions to abstract channels
(though the specific algorithm, whether exact or approximate, is not
critical). Assume the objective is revenue maximization; however,
the generalization to other linear objectives is straightforward.
Additional forms of expressiveness will be described later.
[0152] Bids.
[0153] Assume a set of logically independent attributes
$\Upsilon = \{A, B, C, \ldots\}$, and for simplicity assume these are
boolean and can take values $a, \bar{a}, b, \bar{b}, \ldots$. Each bid takes the
form:
$$B^{i} = \langle \varphi^{i}, v^{i}, g^{i} \rangle$$
where $\varphi^{i}$ is some logical formula over the propositional
variables $a, b, c, \ldots$; $v^{i} > 0$ is bid i's value for any
impression satisfying $\varphi^{i}$; and $g^{i} > 0$ is bid i's
budget limit.
[0154] Channel Supply.
[0155] Let .OMEGA. denote the set of value assignments to variables
in .UPSILON. (i.e., the set of possible worlds, or truth
assignments in the boolean special case). Assume the supply
s(.omega.) is known for each possible world .omega..di-elect
cons..OMEGA.. The supply associated with any formula .phi. is
s(.phi.)=.SIGMA.{s(.omega.):.omega.|=.phi.}.
Each .omega..di-elect cons..OMEGA. corresponds to an unabstracted
channel: if an .omega.-impression is assigned to any bid i, it is
known categorically whether or not the bid's formula .phi..sup.i is
satisfied.
[0156] Unabstracted Optimization.
[0157] The unabstracted optimization problem can be formulated as a
very straightforward LP. Let x.sub..omega..sup.i denote the number
of impressions of channel .omega. assigned to bid i. Define
v.sub..omega..sup.i to be i's value for an .omega.-impression:
$$
v_{\omega}^{i} = \begin{cases} v^{i} & \text{if } \omega \models \varphi^{i} \\ 0 & \text{if } \omega \not\models \varphi^{i} \end{cases}
$$
Then, optimize:
$$
\begin{aligned}
\max_{x_{\omega}^{i}} \quad & \sum_{i} \sum_{\omega} v_{\omega}^{i} x_{\omega}^{i} \\
\text{s.t.} \quad & \sum_{i} x_{\omega}^{i} \le s(\omega) \quad \forall \omega \in \Omega \\
& \sum_{\omega} v_{\omega}^{i} x_{\omega}^{i} \le g^{i} \quad \forall i \in \mathrm{Bids}
\end{aligned}
$$
Obviously, variables corresponding to irrelevant bid-channel pairs
can be removed (i.e., where v.sub..omega..sup.i=0).
[0158] Numbers of Channels.
[0159] Generally speaking, the number of (unabstracted) channels
|.OMEGA.| will grow exponentially in the number of attributes of
interest to bidders. Even with aggressive pruning, the number of
channels will not permit even writing out the LP above, let alone
solving it. The ideal way to construct channels is not actually to
consider attributes of interest, but to manage the logical formulae
associated with bids. The relevant abstract channels are those
corresponding to the formulae given by the logical conjunction of
each bid formula or its negation:
$$\bigwedge_{i \in \mathrm{Bids}} \pm\varphi^{i}$$
Logically inconsistent conjunctions can be removed from this set
(since no property can satisfy their conditions); in addition, any
formula that has (near) zero-supply can also be ignored (we'll call
such potential formulae/channels supply inconsistent). The number
of channels can be bounded by the minimum of 2.sup.|Bids| and
2.sup.|.UPSILON.| (where the latter provides an upper bound on the
number of logically distinct formulae that can be generated by the
(full) conjunction of bid formulae (and their negations)).
Practically, of course, if bids have much in common, there is
likely to be considerable pruning due to logical and supply
inconsistency: how much is an empirical question.
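The construction of channels from bid formulae can be sketched as follows, assuming boolean attributes, bid formulae given as Python predicates over a possible world, and a known supply for each world. The function `signature_channels`, the two example bids, and the uniform supply of 10 per world are hypothetical choices made for illustration.

```python
from itertools import product


def signature_channels(attributes, bids, supply):
    """Group possible worlds by which bid formulae they satisfy.

    attributes -- list of boolean attribute names
    bids       -- list of predicates mapping a world (dict) to bool
    supply     -- function mapping a world (dict) to its supply
    Returns {signature: total supply}. Logically inconsistent signatures
    never occur, and zero-supply (supply inconsistent) ones are dropped.
    """
    channels = {}
    for values in product([True, False], repeat=len(attributes)):
        world = dict(zip(attributes, values))
        sig = tuple(bid(world) for bid in bids)   # +/- for each bid formula
        s = supply(world)
        if s > 0:
            channels[sig] = channels.get(sig, 0) + s
    return channels


attrs = ["a", "b", "c"]
# Hypothetical bids: phi^1 = a, phi^2 = a AND b.
example_bids = [lambda w: w["a"], lambda w: w["a"] and w["b"]]
chans = signature_channels(attrs, example_bids, lambda w: 10)
bound = min(2 ** len(example_bids), 2 ** len(attrs))
```

Here the signature (False, True) is logically inconsistent, since a world cannot satisfy a AND b without satisfying a, so only three of the four possible conjunctions survive.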
[0160] Simple LP Channel Abstraction.
[0161] Each formula .psi. corresponds to an abstract channel: an
assignment of an abstract channel to a bid is interpreted as
randomly assigning impressions from the constituent unabstracted
channels {.omega.:.omega.|=.psi.} to that bid. In other words,
abstract channel .psi. aggregates those unabstracted channels
consistent with it. If a bid i is assigned to abstract channel
.psi., but has bid on impressions satisfying a distinct formula
.phi..sup.i.noteq..psi., then not all of the impressions will have
value to i; instead, it will pay only for a fraction of these
impressions. Specifically, the expected value of a .psi.-impression
is defined as:
$$v_{\psi}^{i} = v^{i} \, \frac{s(\varphi^{i} \wedge \psi)}{s(\psi)}$$
(if $s(\psi) = 0$, let $v_{\psi}^{i} = 0$). Note that:
[0162] 1. if $\psi \models \neg\varphi^{i}$, then $v_{\psi}^{i} = 0$ (logical
inconsistency);
[0163] 2. if $s(\psi \wedge \varphi^{i}) = 0$, then $v_{\psi}^{i} = 0$
(supply inconsistency);
[0164] 3. if $\psi \models \varphi^{i}$, then $v_{\psi}^{i} = v^{i}$
(i.e., channel $\psi$ specializes $\varphi^{i}$; note that the
definition of $v_{\omega}^{i}$ is a special case of this).
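The expected value of an abstract-channel impression and its special cases can be checked numerically with a short sketch. It assumes boolean attributes and formulae written as Python predicates; the helper names `worlds`, `supply_of`, and `abstract_value`, and the uniform supply of 25 per world, are hypothetical.

```python
from itertools import product


def worlds(attributes):
    """Enumerate all truth assignments to the given boolean attributes."""
    for values in product([True, False], repeat=len(attributes)):
        yield dict(zip(attributes, values))


def supply_of(formula, attributes, s):
    """s(formula): total supply of the worlds satisfying `formula`."""
    return sum(s(w) for w in worlds(attributes) if formula(w))


def abstract_value(v_i, phi_i, psi, attributes, s):
    """v_psi^i = v^i * s(phi^i AND psi) / s(psi), or 0 if s(psi) == 0."""
    s_psi = supply_of(psi, attributes, s)
    if s_psi == 0:
        return 0.0
    both = supply_of(lambda w: phi_i(w) and psi(w), attributes, s)
    return v_i * both / s_psi


attrs = ["a", "b"]
uniform = lambda w: 25                      # assumed supply per world
phi = lambda w: w["a"] and w["b"]           # bid formula: a AND b, value 2.0

v_a = abstract_value(2.0, phi, lambda w: w["a"], attrs, uniform)
v_not_a = abstract_value(2.0, phi, lambda w: not w["a"], attrs, uniform)
v_ab = abstract_value(2.0, phi, phi, attrs, uniform)
v_empty = abstract_value(2.0, phi, lambda w: w["a"], attrs, lambda w: 0)
```

Channel a yields half of the bid's value because half of its supply satisfies the bid formula, channel not-a is logically inconsistent with the bid, and channel a AND b specializes the bid formula and so carries its full value.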
[0165] The abstract assignment problem requires the assignment of a
collection $\Psi$ of disjoint (and generally exhaustive)
abstract channels to the bids. The optimization is again
straightforward. Let $x_{\psi}^{i}$ denote the number of
impressions of abstract channel $\psi$ assigned to bid i. Then:
$$
\begin{aligned}
\max_{x_{\psi}^{i}} \quad & \sum_{i} \sum_{\psi \in \Psi} v_{\psi}^{i} x_{\psi}^{i} \\
\text{s.t.} \quad & \sum_{i} x_{\psi}^{i} \le s(\psi) \quad \forall \psi \in \Psi \\
& \sum_{\psi} v_{\psi}^{i} x_{\psi}^{i} \le g^{i} \quad \forall i \in \mathrm{Bids}
\end{aligned}
$$
Again irrelevant bid-channel pairs can be removed (i.e., where
v.sub..psi..sup.i=0).
[0166] It is important to note that no intelligent dispatch is
being considered here. If several bids are assigned to an abstract
channel, an intelligent dispatch protocol could generally make more
effective use of the channel than is indicated by the objective
above. Desirably, this formulation is utilized to help generate
abstractions "quickly" below. Once an abstraction is fixed,
optimization of the channel assignment in a way that accounts for
intelligent dispatch could be used.
[0167] Column Generation for Abstractions.
[0168] Now consider the use of column generation to aid in the
construction of a suitable abstraction: a set of
computationally-manageable size, yet whose optimal solution
provides a good--ideally, optimal or near-optimal--solution to the
original unabstracted LP/MIP. Column generation is naturally
suitable to LP forms of expressiveness, although in practice it can
be shown that it is also effective for IP expressiveness. It can
also be shown how IP expressiveness can be further handled via
constraint generation techniques below.
[0169] The All-Channels Linear Program.
[0170] To begin, consider the LP that allows the assignment of bids
to any channel at any level of abstraction. Let $\Psi$ be the set
of all logical formulae in some canonical representation (that is,
each subset of $\Omega$ is represented by exactly one formula).
Then:
$$
\begin{aligned}
\max_{x_{\psi}^{i}} \quad & \sum_{i} \sum_{\psi \in \Psi} v_{\psi}^{i} x_{\psi}^{i} \\
\text{s.t.} \quad & \sum_{i} x_{\psi}^{i} \le s(\psi) \quad \forall \psi \in \Psi \\
& \sum_{\psi} v_{\psi}^{i} x_{\psi}^{i} \le g^{i} \quad \forall i \in \mathrm{Bids}
\end{aligned}
$$
In matrix form, a reasonably nice structure can be seen. If the
language has the three boolean variables A, B and C, then:
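$$
\begin{array}{*{12}{c}cl}
v_{T}^{1} & v_{T}^{2} & v_{T}^{3} & v_{a}^{1} & v_{a}^{2} & v_{a}^{3} & v_{\bar{a}}^{1} & v_{\bar{a}}^{2} & v_{\bar{a}}^{3} & v_{\overline{abc}}^{1} & v_{\overline{abc}}^{2} & v_{\overline{abc}}^{3} & & \\
\hline
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \le & s(T) \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & \le & s(a) \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & \le & s(\bar{a}) \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & \le & s(\overline{abc}) \\
v_{T}^{1} & 0 & 0 & v_{a}^{1} & 0 & 0 & v_{\bar{a}}^{1} & 0 & 0 & v_{\overline{abc}}^{1} & 0 & 0 & \le & g^{1} \\
0 & v_{T}^{2} & 0 & 0 & v_{a}^{2} & 0 & 0 & v_{\bar{a}}^{2} & 0 & 0 & v_{\overline{abc}}^{2} & 0 & \le & g^{2} \\
0 & 0 & v_{T}^{3} & 0 & 0 & v_{a}^{3} & 0 & 0 & v_{\bar{a}}^{3} & 0 & 0 & v_{\overline{abc}}^{3} & \le & g^{3}
\end{array}
$$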
Of course, this LP allows overallocation of channel supply, since
channels that overlap can be allocated. For example, channel a
might be allocated up to its capacity s(a), while simultaneously
allocating "additional" supply from channel ab despite the fact
that this is included in the former channel. Constraints can be
imposed that prevent this--integer variables recording allocation
to each channel can be constrained to ensure that no overlapping
channels have positive allocations (see below). However, the column
generation approach discussed hereinafter will make such
constraints unnecessary: no overlapping channels will be permitted
in the LP itself.
[0171] The problem with this LP is that it has a number of
variables and constraints that is doubly exponential in the number
of impression attributes of interest: |.OMEGA.| is exponential in
the number of attributes, and |.PSI.|=2.sup.|.OMEGA.|.
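The doubly exponential growth can be made concrete with a small calculation. The sketch below assumes boolean attributes, one variable per channel-bid pair, and one supply row per channel plus one budget row per bid; the function name `all_channels_lp_size` is hypothetical.

```python
def all_channels_lp_size(num_attributes, num_bids):
    """Size of the All-Channels LP for boolean attributes."""
    num_worlds = 2 ** num_attributes        # |Omega|
    num_channels = 2 ** num_worlds          # |Psi| = 2^|Omega|
    variables = num_channels * num_bids     # one x_psi^i per channel and bid
    constraints = num_channels + num_bids   # supply rows plus budget rows
    return num_worlds, num_channels, variables, constraints


small = all_channels_lp_size(3, 3)
larger = all_channels_lp_size(5, 3)
```

With only three attributes there are already 256 channels, and with five attributes the channel count exceeds four billion.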
[0172] Aside: Single IP to find Optimal Abstraction.
[0173] It is interesting to consider how one would find an optimal
abstraction in a single optimization. Each channel corresponds to a
logical formula .phi., or equivalently, the subset of the possible
worlds .parallel..phi..parallel..OR right..OMEGA.. We'll say
channel .phi.' is a (strict) refinement of .phi., written
R(.phi.',.phi.), if .parallel..phi.'.parallel..OR
right..parallel..phi..parallel.. Channel .phi.' is an immediate
refinement of .phi., written IR(.phi.',.phi.) if it is a refinement
of .phi. and there is no .phi.'' that is both a refinement of .phi.
and is refined by .phi.'. (This simply says that .phi.' is a child
of .phi. in the subset lattice over .OMEGA..) A valid abstraction
is any packing of .OMEGA. (i.e., a set of logically disjoint
channels). A complete abstraction is any partitioning of .OMEGA.
(i.e., a set of logically disjoint and exhaustive channels).
[0174] This is encoded as follows: [0175] For any channel .phi.,
let 0-1 variable I.sub..phi. denote that .phi. is a channel in the
abstraction. [0176] Let 0-1 variable D.sub..phi. denote that some
refinement of .phi. is in the abstraction. [0177] Then impose the
following constraints: [0178] (a) I.sub..phi..ltoreq.D.sub..phi.'
for all .phi. and all .phi.' s.t. IR(.phi.,.phi.') (i.e., if .phi.
is in the abstraction, each parent of .phi. has a descendent in the
abstraction). [0179] (b) D.sub..phi..ltoreq.D.sub..phi.' for all
.phi. and all .phi.' s.t. IR(.phi.,.phi.') (i.e., if .phi. has a
descendent in the abstraction, then so does each of its parents).
[0180] (c) D.sub..phi..ltoreq.1-I.sub..phi. for all .phi. (i.e., if
.phi. has a descendent in the abstraction, it cannot be in the
abstraction). [0181] (d) x.sub..phi..sup.i.ltoreq.I.sub..phi. for
all bids i, channels .phi. (i.e., bids can only be assigned supply
from channels in the abstraction).
[0182] This IP is enough to ensure that an optimal abstraction is
found (when added to the all-channels LP). (It has exponentially
many integer variables (in |.OMEGA.|), but only linearly many
constraints per integer variable. Of course, .OMEGA. itself has
exponential size in the number of attributes.) This IP has no bias
toward smaller abstractions: this could be accomplished with the
addition of penalty terms, of suitable magnitude, associated with
finer levels of abstraction.
[0183] Column Generation: General Approach.
[0184] One way to think of generating an abstraction is as column
generation. Column generation is used to solve linear programs with
very large numbers of variables by solving a version of the LP with
very few variables (columns), then adding new variables into the LP
at each iteration and then re-solving, using a heuristic scoring
method to determine which new variables to add at each iteration.
This approach is adopted here, but with some significant
enhancements that account for the fact that multiple columns must
be introduced at once and that exploit the special structure of the
problem. The basic approach is as follows: solve an abstract LP
using some initial level of abstraction (e.g., aggregating all
concrete channels into a single abstract channel T). Refine the
abstraction heuristically by choosing an abstract channel $\alpha$
to split into two by conjoining a formula $\beta$ and its negation,
thus replacing $\alpha$ by $\alpha \wedge \beta$ and $\alpha \wedge \neg\beta$. Solve a
new LP with the new abstract channels, and repeat the process
until the improvement in LP objective value falls below some
threshold or the number of channels reaches a specified limit.
[0185] Suppose the All-Channels LP is solved using only a small
subset of channels corresponding to a particular abstraction.
The LP will have only variables corresponding to these
channels. Refining this abstraction amounts to "splitting" one or
more of the abstract channels into finer-grained channels. This can
be seen as adding variables/columns to the abstracted LP. If it is
desired that channels in the abstraction not overlap, then any
channel (variables) so split will need to be removed from the LP.
Thus, the use or adaptation of column generation methods is
considered to determine which columns to add (equivalently, which
channels to split).
[0186] To motivate the method, consider a simple example. Suppose
the starting point is an abstract LP with two abstract channels, $a$
and $\bar{a}$. By ignoring all other channel variables, the problem is
simplified considerably: only 2|Bids| variables (corresponding to
these two channels) and 2+|Bids| constraints (the two capacity
constraints for the two channels and the budget constraints for
each bid) are needed. For instance, suppose there are three bids,
then the abstract LP is:
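$$
\begin{array}{lcccccccl}
\text{Obj:} & v_{a}^{1} & v_{a}^{2} & v_{a}^{3} & v_{\bar{a}}^{1} & v_{\bar{a}}^{2} & v_{\bar{a}}^{3} & & \\
\text{Constraints:} & 1 & 1 & 1 & 0 & 0 & 0 & \le & s(a) \\
& 0 & 0 & 0 & 1 & 1 & 1 & \le & s(\bar{a}) \\
& v_{a}^{1} & 0 & 0 & v_{\bar{a}}^{1} & 0 & 0 & \le & g^{1} \\
& 0 & v_{a}^{2} & 0 & 0 & v_{\bar{a}}^{2} & 0 & \le & g^{2} \\
& 0 & 0 & v_{a}^{3} & 0 & 0 & v_{\bar{a}}^{3} & \le & g^{3}
\end{array}
$$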
Note that restricting attention to specific abstract columns (that is,
removing columns from the All-Channels LP) effectively
reduces the number of constraints, since those corresponding to the
capacity of the removed, finer-grained channels are vacuous in the
abstract LP.
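The structure of the abstract LP in this example can be generated programmatically. The following sketch assumes a dense list-of-lists representation with columns ordered channel by channel; the function `build_abstract_lp` and all numeric values are hypothetical and serve only to show the 2|Bids| variables and 2+|Bids| constraints.

```python
def build_abstract_lp(channels, values, supply, budgets):
    """Return (objective, rows, rhs) for the abstract LP in dense form.

    channels -- list of channel names, e.g. ["a", "not_a"]
    values   -- values[channel][i] is v_channel^i for bid i
    supply   -- supply[channel] is s(channel)
    budgets  -- budgets[i] is g^i
    Columns are ordered channel by channel: all bids for the first
    channel, then all bids for the second channel, and so on.
    """
    n = len(budgets)
    objective = [values[ch][i] for ch in channels for i in range(n)]
    width = len(objective)
    rows, rhs = [], []
    for k, ch in enumerate(channels):       # one supply row per channel
        rows.append([1 if j // n == k else 0 for j in range(width)])
        rhs.append(supply[ch])
    for i in range(n):                      # one budget row per bid
        rows.append([objective[j] if j % n == i else 0
                     for j in range(width)])
        rhs.append(budgets[i])
    return objective, rows, rhs


# Hypothetical numbers for two channels and three bids.
obj, rows, rhs = build_abstract_lp(
    ["a", "not_a"],
    {"a": [1.0, 0.5, 0.0], "not_a": [0.0, 0.5, 2.0]},
    {"a": 100, "not_a": 50},
    [30, 40, 50],
)
```

The resulting lists can be handed to any LP solver that accepts an objective vector with inequality rows and right-hand sides.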
[0187] Once the LP is solved, determine which of these two channels, $a$
or $\bar{a}$, would add the most value to the LP solution if split. In classic
column generation, the unused variable that has greatest reduced
cost would be utilized. Here things are different, since the
introduction of an arbitrary variable/column to the LP is not
allowed; instead multiple columns are added and multiple columns
are removed at each refinement: [0188] If a channel is added to the
LP, then introduce all of the variables associated with it; for
instance, if channel ab is added, then add the columns
corresponding to x.sub.ab.sup.i for each bid i. [0189] An existing
channel is split, thus adding two new channels that partition an
existing channel. This means two new sets of columns are
introduced. In this example, two sets of columns would need to be
added if a is split: $\{x_{ab}^{i}, x_{a\bar{b}}^{i} : i \in \mathrm{Bids}\}$. [0190] The columns
for the channel that is split must then be removed to ensure no
overlapping channels exist in the refined abstraction. [0191] This
procedure is repeated until a satisfactory abstraction is
obtained.
[0192] Aside: Overlapping Abstraction in Restricted
Circumstances.
[0193] There might be the temptation, when splitting a channel $a$
into $ab$ and $a\bar{b}$, to retain variables for $x_{a}^{i}$ when adding
$x_{ab}^{i}, x_{a\bar{b}}^{i}$. (Here think of $a$, $b$ as arbitrary
formulae, not just literals.) Intuitively, the resulting a-channel
can be thought of as undifferentiated supply: i can be assigned to
either $ab$, to $a\bar{b}$, or to arbitrary $a$ supply which doesn't
distinguish $b$ or $\bar{b}$. This undifferentiated channel is denoted by $a^{*}$
(indicating that at least some of $a$ has been carved off into more
specific channels). The only natural way to define the capacity of
$a^{*}$ is with respect to what's left of $a$ after $ab$ and $a\bar{b}$ are
accounted for:
$$s(a^{*}) = s(a) - \sum_{i} \left( x_{ab}^{i} + x_{a\bar{b}}^{i} \right).$$
This leads to one conceptual problem and one technical problem.
Conceptually, there is no value in allowing all three variables
$x_{a^{*}}^{i}$, $x_{ab}^{i}$, $x_{a\bar{b}}^{i}$ for a single bid i.
So, assume that bids i are partitioned into two classes, those that
can be assigned differentiated supply only, and those that can be
assigned undifferentiated supply only. To be fully expressive, all
bids could be classified as "differentiable" since anything that can
be done with $a^{*}$ can be done with $ab$ and $a\bar{b}$ jointly. The value in
classifying some bids as "undifferentiable" is that an allocation
variable is saved.
[0194] Intuitively, think of a bid i as not caring about the
difference between $ab$ and $a\bar{b}$ if its value for each of the two
channels is the same. For instance, if i doesn't care about $b$ or
anything involving $b$, it would be desirable to assign i supply
from $a^{*}$. Unfortunately, pricing becomes a problem. Recall the value
of the abstract channel $a^{*}$ to i is given by
$v_{a^{*}}^{i} = v^{i} \Pr(\varphi^{i} \mid a^{*})$. But $\Pr(\varphi^{i} \mid a^{*})$ can
depend on the (a priori unknown) assignment to $ab$ and $a\bar{b}$:
$$
\Pr(\varphi^{i} \mid a^{*}) = \frac{s(ab) - \sum_{j} x_{ab}^{j}}{s(a^{*})} \Pr(\varphi^{i} \mid ab) + \frac{s(a\bar{b}) - \sum_{j} x_{a\bar{b}}^{j}}{s(a^{*})} \Pr(\varphi^{i} \mid a\bar{b})
$$
[0195] In other words, the value of v.sub.a*.sup.i cannot be fixed
in advance for an undifferentiated channel that has more specific,
differentiated subchannels. This should be obvious due to the
channel skimming effect. Note that this holds even if i doesn't
"care" about b directly: if b is correlated with some other
attribute that does impact i's value, then skimming of b is a
problem.
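The channel skimming effect can be seen numerically. The sketch below computes the probability that a leftover impression of the undifferentiated channel satisfies a bid's formula, as a supply-weighted mixture of the two subchannels after differentiated allocations are removed. The function name `pr_phi_given_a_star` and all numbers are hypothetical.

```python
def pr_phi_given_a_star(s_ab, s_anb, x_ab, x_anb, pr_ab, pr_anb):
    """Pr(phi | a*) for the undifferentiated remainder of channel a.

    s_ab, s_anb   -- supply of subchannels (a and b) and (a and not b)
    x_ab, x_anb   -- lists of per-bid allocations taken from each subchannel
    pr_ab, pr_anb -- Pr(phi | a and b) and Pr(phi | a and not b)
    """
    left_ab = s_ab - sum(x_ab)
    left_anb = s_anb - sum(x_anb)
    s_a_star = left_ab + left_anb   # s(a*) = s(a) minus differentiated use
    if s_a_star == 0:
        return 0.0
    return (left_ab * pr_ab + left_anb * pr_anb) / s_a_star


# No skimming: the remainder looks like channel a itself.
before = pr_phi_given_a_star(60, 40, [], [], 0.5, 0.25)
# Another bid takes 40 impressions of (a and b): the remainder is worth less.
after = pr_phi_given_a_star(60, 40, [40], [], 0.5, 0.25)
# If phi is independent of b given a, skimming has no effect.
indep = pr_phi_given_a_star(60, 40, [40], [], 0.5, 0.5)
```

The drop from `before` to `after` is why the value of the undifferentiated channel cannot be fixed in advance, while `indep` shows the special case in which it can.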
[0196] This can be made to work if $\varphi^{i}$ is independent of $b$
given $a$: if $\Pr(\varphi^{i} \mid ab) = \Pr(\varphi^{i} \mid a\bar{b})$, then
$\Pr(\varphi^{i} \mid a^{*}) = \Pr(\varphi^{i} \mid a)$ and
$v_{a^{*}}^{i} = v_{a}^{i}$. In this case, replace allocational
variables $x_{ab}^{i}$, $x_{a\bar{b}}^{i}$ with $x_{a^{*}}^{i}$,
since $v_{ab}^{i}$ and $v_{a\bar{b}}^{i}$ are identical in the
objective. Any such bid can be added to the set of bids that use
undifferentiated supply $a^{*}$. A couple of points: [0197] If
$a \models \varphi^{i}$, then this independence is logical, not just
statistical, and will persist with any further splits of $a$, $ab$,
etc. No special tracking is required. [0198] Otherwise, this
independence is purely statistical and can disappear with further
splits: e.g., while $\varphi^{i}$ is independent of $b$ given $a$, it may
not be independent of $c$ given $ab$. In this case, additional splits
are monitored and the set of bids that are differentiable or not
is adjusted. This requires some additional bookkeeping. [0199] The
bookkeeping will be fairly detailed (more so in the latter case
than in the former) because of the existence of multiple levels of
undifferentiated supply. For instance, if $ab$ is split into $abc$ and
$ab\bar{c}$, undifferentiated supply channels $a^{*}$ and $ab^{*}$ will result, and
the definition of $s(a^{*})$ must be modified accordingly.
[0200] Computing Reduced Costs.
[0201] The solution to the abstract LP gives rise to a set of basic
variables (including slack variables), and the basis matrix B. Any
x.sub..phi..sup.i variables not present in the abstract LP can be
viewed as nonbasic variables. Reduced cost measures the increase in
objective value per unit increase in the (nonbasic) variable in
question. Specifically, any variable that has positive reduced cost
may improve the value of the objective. Standard column generation
algorithms solve a pricing subproblem to identify columns absent
from an LP with positive reduced cost, and generally add the column
with maximum reduced cost (for maximization problems), terminating
when no reduced costs are positive. Note though that when applied
to channel abstraction, column generation requires bringing in both
sides of the channel split while, typically, removing the parent
channel. Furthermore, the reduced costs of the split are related to
the dual prices of the parent channel in a special way. These
aspects imply a need for novel approaches to scoring a split.
[0202] To compute the reduced cost is reasonably straightforward.
Let B be the basis matrix corresponding to the abstract LP
solution. Let c.sub.B be the vector of objective coefficients
corresponding to this basis. First, compute the vector of simplex
multipliers .pi.=c.sub.BB.sup.-1. These can be interpreted as
shadow or dual prices corresponding to the constraints in the
current abstract LP. In other words, these are prices associated
with the "consumption" of either abstracted supply or a bid's
budget such that the objective value "produced" by each basis
variable is equal to the cost of the supply/budget that it
consumes. Each simplex multiplier corresponds to a specific
constraint, so they are labeled as follows: .pi..sup..psi. for any
.psi. in the current abstraction (corresponding to the supply
constraint for .psi.); and .pi..sup.i for each bid i (corresponding
to bid i's budget constraint). In this example, the shadow prices
are .pi.=y.sup.a, y.sup. , y.sup.1, y.sup.2, y.sup.3.
[0203] Let .pi..sub..phi..sup.i be the column corresponding to
x.sub..phi..sup.i. The reduced cost of variable x.sub..phi..sup.i
is
c.sub..phi..sup.i=v.sub..phi..sup.i-.pi.u.sub..phi..sup.i.
For instance, in this example, the reduced cost of x.sub.ab.sup.2
would be
x.sub.ab.sup.2-.pi..sup.a,.pi..sup.
,.pi..sup.1,.pi..sup.2,.pi..sup.3u.sub.ab.sup.2.
Examining the structure of column u.sub..phi..sup.i it can be seen
that it is a zero vector except for:
[0204] 1. entry 1 in the supply constraint for .phi.;
[0205] 2. and entry v.sub..phi..sup.i in bid i's budget
constraint.
[0206] Of course, constraint (b) is not actually present in the
abstract LP. In this example,
u.sub.ab.sup.2=v.sub.ab.sup.2010
where the last two entries, 1 and 0, correspond to the proposed split
supplies ab and a b.
[0207] This gives the problem of dimensionality mismatch. More
precisely, the supply associated with the refined channel .phi.
(which is one part of a split of some parent channel) has not been
priced. (Note that had the constraints for ab and a b been included in
the abstract LP, they would have been vacuous, leading to dual
variable values/shadow prices of zero in any case.)
[0208] There are several approaches that could be taken to
computing dual variables for the s(ab) and s(a b) constraints. The
critical point is not to re-solve the LP with the new constraints
in place. Accordingly, consider four approaches: the first makes
intuitive sense, but in the end is incorrect; the second seems to
suggest something reasonable; the third describes a more principled
way to use LP dual prices; and the fourth is the preferred method
for using dual prices. In particular, the fourth method uses dual
prices in a way that exactly computes the improvement of a split
under a restricted model of expressiveness. It is a heuristic
overestimate of the improvement for more general forms of
expressiveness.
[0209] Approach 1: Redefining Variables.
[0210] However, if channel a is split into ab and a b, there is
still an intuitive notion of price that can be applied to these
split channel constraints at the current solution. Recall that .pi.
satisfies c.sub.B=.pi.B: these are prices of the budget and supply
constraints at which the total consumption cost of each basic
variable equals its contribution to objective value. For the column
x.sub.a.sup.i, these simplex multipliers must satisfy
v.sub.a.sup.i=.pi..sup.a+v.sub.a.sup.i.pi..sup.i
(this is simply one equality in the system c.sub.B=.pi.B). The
consumption of channel a by any variable x.sub.a.sup.i can be
broken into the consumption of the subchannels ab and a b. But
within the current abstraction, x.sub.ab.sup.i and x.sub.a
b.sup.i can only be allocated in the fixed ratio Pr(b|a):Pr( b|a).
If:
x.sub.ab.sup.i=Pr(b|a)x.sub.a.sup.i
x.sub.a b.sup.i=Pr( b|a)x.sub.a.sup.i
then the abstract LP can be rewritten in terms of the supply
constraint on s(ab):
$$\sum_{i} \Pr(b \mid a)\, x_{a}^{i} \le s(ab)$$
The optimal solution of the abstract LP with this constraint
replacing the supply constraint dictated by s(a) will be identical
to the original. The shadow price .pi..sup.ab associated with this
constraint satisfies
v.sub.a.sup.i=Pr(b|a).pi..sup.ab+v.sub.a.sup.i.pi..sup.i
This implies that
$$\pi^{ab} = \frac{1}{\Pr(b \mid a)}\, \pi^{a}.$$
Note that the shadow price for the ab supply is at a premium of
$1/\Pr(b \mid a)$
relative to a. For instance, if s(ab)=0.25 s(a), then
.pi..sup.ab=4.pi..sup.a and (since a b makes up the other
three-quarters of a-supply)
$$\pi^{a\bar{b}} = \frac{4}{3}\, \pi^{a}.$$
Intuitively, this makes sense. Differentiated supply is much more
"valuable" than undifferentiated supply since much less of it is
"wasted" in general.
[0211] This enables a reduced cost to be assigned to any new column
x.sub..phi..sup.i:
$$v_{\phi}^{i} - \pi^{\mathrm{Par}(\phi)}\, \frac{1}{\Pr(\phi \mid \mathrm{Par}(\phi))} - \pi^{i} v_{\phi}^{i}$$
(Here Par(.phi.) is the parent channel of .phi., i.e., the channel
in the current abstraction that is split to create .phi..) If the
column has positive reduced cost, then adding the column to the LP
will improve value (ignoring the refinement constraints).
Intuitively, the inequality
.pi.u.sub..phi..sup.i<v.sub..phi..sup.i means that at the
current solution (in current abstraction), there is more marginal
value associated with allocating to this new variable than the
marginal cost of the supply/budget that it consumes at the current
shadow prices.
[0212] Approach 2: Shadow Price Division.
[0213] An alternative approach is to simply "share" the dual price
associated with the constraint on channel supply with the split
supply. For instance, suppose channel a is split into ab and a b.
Recall the dual variable .pi..sup.a induces a total cost of
.pi..sup.as(a) on the use of channel a (recall the constraint is
tight if .pi..sup.a is nonzero). Intuitively, the total price assigned
to the "resources" s(ab) and s(a b) should induce the same total
cost: hence the dual prices on the split channels should satisfy:
.pi..sup.as(a)=.pi..sup.abs(ab)+.pi..sup.a bs(a b) (subject to
.pi..sup.ab, .pi..sup.a b.gtoreq.0). This equality has infinitely
many solutions; but notice that since s(a)=s(ab)+s(a b), a trivial
(and natural) solution is to set .pi..sup.ab=.pi..sup.a
b=.pi..sup.a. Hence, simply use the shadow price of the parent
channel for the split channels when computing reduced costs. More
generally, the reduced cost of any new column x.sub..phi..sup.i
is:
v.sub..phi..sup.i-.pi..sup.Par(.phi.)-.pi..sup.iv.sub..phi..sup.i
This approach has some intuitive appeal. Computationally, it seems
to point in the direction of appropriate splits when tested on some
small numerical examples (three bids, eight channels over three
attributes). However, it does seem to lack any sensitivity to the
differences in split channel capacity. The final reduced cost
derivation will address this directly.
[0214] Approach 3: One Method for Using Dual Prices.
[0215] The abstract LP is then solved to determine an optimal
solution to the abstracted problem--in this example, with channels
a and $\bar{a}$. This solution is also an optimal solution for the less
abstract LP in which one of the original channels is split, e.g.,
where a is split into ab and a b, subject to the constraint that ab
and a b are assigned to each bid in a fixed ratio Pr(b|a):Pr( b|a).
Of course, breaking the abstraction by splitting a in this way, can
be viewed as simply dropping the fixed ratio constraint. The
optimal solution of this fixed-ratio LP is not optimal for the
relaxed version; but it is a feasible solution. As such, one can
compute the dual prices associated with this sub-optimal feasible
solution in the split LP. This would reflect the true value of
dropping the constraint, i.e., adopting the split.
[0216] While this suggests the need to solve the LP associated with
each split, it turns out that the dual prices of the split channels
in the split LP are tightly related to the dual price of the parent
in the abstract LP.
[0217] In the split LP (dropping the fixed ratio constraint), the
dual prices for s(ab) and s(a b) must satisfy (for each bid i):
$$v_{ab}^{i} = \pi^{ab} + \pi^{i} v_{ab}^{i}$$
$$v_{a\bar{b}}^{i} = \pi^{a\bar{b}} + \pi^{i} v_{a\bar{b}}^{i}$$
Solving for .pi..sup.ab produces (again for each bid i):
$$\pi^{ab} = v_{ab}^{i} - \pi^{i} v_{ab}^{i} = \Pr(b \mid a)\,(v_{a}^{i} - \pi^{i} v_{a}^{i}) = \Pr(b \mid a)\, \pi^{a}$$
The first equality is simple term rearrangement; the second follows
from the definition of v.sub.ab.sup.i; and the third follows from
the constraints on shadow prices in the abstract LP, which
satisfies v.sub.a.sup.i=.pi..sup.a+v.sub.a.sup.i.pi..sup.i for all
bids i (see above). Similarly,
.pi..sup.a b=Pr( b|a).pi..sup.a
This is another suitable means of computing shadow prices of the
new split channel constraints without solving a new LP.
[0218] In general, the reduced cost of any new column
x.sub..phi..sup.i is defined as follows:
$$v_{\phi}^{i} - \Pr(\phi \mid \mathrm{Par}(\phi))\, \pi^{\mathrm{Par}(\phi)} - \pi^{i} v_{\phi}^{i}$$
Apart from its principled derivation, computationally, it seems to
point in the direction of appropriate splits when tested on some
small numerical examples (three bids, eight channels over three
attributes).
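By way of illustration only, the three pricing rules discussed above can be compared side by side. The following Python sketch assumes the split-channel prices stated in the text: the parent price divided by Pr(.phi.|Par(.phi.)) for Approach 1, the unchanged parent price for Approach 2, and the parent price multiplied by Pr(.phi.|Par(.phi.)) for Approach 3. The function names and the `rule` strings are hypothetical and are not part of the described method.

```python
def split_supply_price(parent_price, pr_child_given_parent, rule):
    """Shadow price assigned to a split channel under one of three rules."""
    if not 0.0 < pr_child_given_parent <= 1.0:
        raise ValueError("Pr(child | parent) must lie in (0, 1]")
    if rule == "redefine":   # Approach 1: premium of 1 / Pr(child | parent)
        return parent_price / pr_child_given_parent
    if rule == "share":      # Approach 2: reuse the parent's price unchanged
        return parent_price
    if rule == "dual":       # Approach 3: scale by Pr(child | parent)
        return parent_price * pr_child_given_parent
    raise ValueError(f"unknown rule: {rule}")


def reduced_cost(value, budget_price, supply_price):
    """v - pi^phi - pi^i * v for a column with entries 1 (supply) and v (budget)."""
    return value - supply_price - budget_price * value
```

For a parent price of 0.4 and Pr(b|a)=0.25, the three rules price the ab supply at approximately 1.6, 0.4, and 0.1 respectively, which makes their differing sensitivity to split capacity easy to see.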
[0219] Reduced Costs of a Split.
[0220] The reduced cost approaches above score the potential value
of adding specific columns. Splitting a channel .psi. into
.psi..phi. and .psi. .phi. does not introduce a single column, but
the set of columns $\{x_{\psi\phi}^{i},\, x_{\psi\bar{\phi}}^{i} : i \in \mathrm{Bids}\}$. The split of any
channel is heuristically scored by summing the reduced costs of
each column generated by the split, with the exception of any
columns with negative reduced costs (since they do not have to be
used in the solution at the less abstract level).
[0221] One difficulty with a pure positive-sum scoring metric is
that it does not account for the fact that different
columns "compete" for specific constraints (i.e., "resources"). For
instance, if variables/columns are added for both x.sub.ab.sup.i
and x.sub.a b.sup.i, these will both compete for bid i's budget.
Similarly, if both x.sub.ab.sup.1 and x.sub.ab.sup.2 are added,
they each compete for the new channel ab. In both cases, summing
the score has the potential to overestimate the value of a split by
ignoring such interactions. Positive-sum scoring can be refined by
restricting the sum to include at most one column per bid and at
most one column per new channel. This would require some very
trivial maximization: consider the split ab/a b. Let
x.sub.ab.sup.i*=arg max.sub.i x.sub.ab.sup.i and let x.sub.a
b.sup.j*=argmax.sub.j x.sub.a b.sup.j (maximizing reduced costs in
each case). Then, compute the two pseudo-scores:
h.sub.1=rc(x.sub.ab.sup.i*)+max.sub.j.noteq.i* rc(x.sub.a b.sup.j) and
h.sub.2=rc(x.sub.a b.sup.j*)+max.sub.i.noteq.j* rc(x.sub.ab.sup.i), where
rc() denotes the reduced cost of a column, and
score the split using the max of h.sub.1, h.sub.2.
[0222] Alternative scoring measures include simply using the score
of the maximum column reduced cost (and possibly others).
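The two scoring variants described above can be sketched as follows. This is a minimal Python illustration, assuming the reduced costs of the new columns have already been computed and are supplied as dictionaries keyed by bid; the function names are hypothetical. Treating columns with negative reduced cost as contributing zero in the restricted score is an assumption carried over from the positive-sum rule.

```python
def positive_sum_score(rc_ab, rc_abbar):
    """Sum the reduced costs of all new columns, skipping negative ones."""
    all_rc = list(rc_ab.values()) + list(rc_abbar.values())
    return sum(r for r in all_rc if r > 0.0)


def _best(rc, exclude=None):
    """Return (bid, reduced cost) of the best positive column, or (None, 0.0)."""
    best_bid, best_rc = None, 0.0
    for bid, r in rc.items():
        if bid != exclude and r > best_rc:
            best_bid, best_rc = bid, r
    return best_bid, best_rc


def restricted_score(rc_ab, rc_abbar):
    """max(h1, h2): at most one column per new channel and one per bid."""
    i_star, r_i = _best(rc_ab)
    j_star, r_j = _best(rc_abbar)
    h1 = r_i + _best(rc_abbar, exclude=i_star)[1]  # best ab column first
    h2 = r_j + _best(rc_ab, exclude=j_star)[1]     # best a b column first
    return max(h1, h2)
```

With reduced costs {1: 0.5, 2: 0.3, 3: -0.2} on ab and {1: 0.6, 2: 0.1, 3: 0.0} on a b, the positive-sum score is about 1.5, while the restricted score is about 0.9 because bid 1 cannot be counted on both sides.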
[0223] Aside: Greedy Splitting.
[0224] There are simpler versions of this process that are very
easy to optimize. In other words, the search over possible splits
becomes straightforward (in contrast to the process discussed
below). For example, if the supply constraints are ignored and
focus is solely on value, the reduced cost of a column
x.sub..phi..sup.i can be defined as
v.sub..phi..sup.i-.pi..sup.iv.sub..phi..sup.i. This makes it easy
to find the split with max reduced cost by searching over bids: for
any bid with .pi..sup.i<1, the reduced cost will be maximized at
the split .phi..sup.i (i.e., splitting on bid i's formula). Thus
the bid with maximum value, scaled by its budget shadow price, is
the (single) column with max reduced cost.
[0225] This process basically incrementally splits off the formulae
associated with specific bids, doing it in a greedy order
corresponding to adjusted value. Thus, it is a bit more
sophisticated than simply ordering bids by marginal bid value and
splitting off their formulae--this accounts for the fact that
previous channel splits can do a good job of satisfying a bid with
high marginal value, leaving it with very low adjusted value, and
allowing us to focus on other bids. However, while computationally
convenient, this procedure completely ignores supply constraints.
It is unlikely to perform well for this reason. But it might be
worth exploring.
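A minimal sketch of this greedy ordering, in Python, is given below. It assumes each bid's per-impression value and budget shadow price are available as dictionaries keyed by bid; the function name is hypothetical, and bids without a recorded budget price are assumed to have price zero.

```python
def greedy_split_order(values, budget_prices):
    """Order bids by adjusted value v^i * (1 - pi^i), largest first.

    Bids whose adjusted value is not positive are dropped, since splitting
    on their formulae has no positive reduced cost under this rule.
    """
    adjusted = {
        bid: v * (1.0 - budget_prices.get(bid, 0.0))
        for bid, v in values.items()
    }
    candidates = [bid for bid, a in adjusted.items() if a > 0.0]
    return sorted(candidates, key=lambda bid: adjusted[bid], reverse=True)
```

Note that supply never enters the computation, which is exactly the limitation pointed out above.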
[0226] Approach 4: Preferred Method Using True Dual Prices.
[0227] Consider the following LP to allocate bids B={1,2} to a
single abstract channel .alpha. (with no budget or other
constraints). A single channel is illustrated to reduce notational
clutter. Unless .alpha..ident.T, this LP will have a set of a-channels
.alpha..sub.j and allocation variables x.sub.j.sup.i for each bid i
and a-channel .alpha..sub.j:
$$\max\; v_{\alpha}^{1} x_{\alpha}^{1} + v_{\alpha}^{2} x_{\alpha}^{2} \quad \text{s.t.} \quad x_{\alpha}^{1} + x_{\alpha}^{2} \le s(\alpha)$$
[0228] Refining a-channel .alpha. requires introducing the columns
corresponding to .alpha..beta., .alpha. .beta. for some .beta..
[0229] Scoring Abstract Channel Splits.
[0230] Given the solution of the abstract LP above, the value, or
score, of a potential split of .alpha. into two channels .alpha..beta.,
.alpha. .beta. is determined. The preferred method scores a split
by: (a) scoring the new columns introduced by the split using a
form of column generation scoring; and (b) combining the scores of
these new columns in a particular way that exploits the special
structure of the problem.
[0231] Let .pi..sub..alpha. denote the value of the dual variable
corresponding to the supply constraint for a-channel .alpha. in the
dual of the abstract LP (equivalently, the shadow price of the
constraint). The reduced cost of variable $x_{\alpha\beta}^{i}$ is:
$$rc(x_{\alpha\beta}^{i}) = v_{\alpha\beta}^{i} - c\pi$$
where c is the column of $x_{\alpha\beta}^{i}$ and .pi. is the vector of dual variables. The
reduced cost of $x_{\alpha\bar{\beta}}^{i}$ is defined similarly. Unfortunately, the abstract
LP does not include relevant supply constraints for .alpha..beta.
or .alpha. .beta., meaning shadow prices for these constraints
cannot be directly obtained from the LP. Consideration is given to
adding two new rows to the original abstract LP, reflecting split
channel supply, as follows:
$$\begin{aligned}
\max\quad & v_{\alpha}^{1} x_{\alpha}^{1} + v_{\alpha}^{2} x_{\alpha}^{2} \\
\text{s.t.}\quad & x_{\alpha}^{1} + x_{\alpha}^{2} \le s(\alpha) \\
& \Pr(\beta \mid \alpha)\, x_{\alpha}^{1} + \Pr(\beta \mid \alpha)\, x_{\alpha}^{2} \le s(\alpha\beta) \\
& \Pr(\bar{\beta} \mid \alpha)\, x_{\alpha}^{1} + \Pr(\bar{\beta} \mid \alpha)\, x_{\alpha}^{2} \le s(\alpha\bar{\beta})
\end{aligned}$$
[0232] Since s(.alpha..beta.)=Pr(.beta.|.alpha.)s(.alpha.)
(similarly for $\bar{\beta}$), these new constraints are multiples of the
original s(.alpha.) constraint, leaving the optimal solution
unaffected. This allows pricing of the two new constraints: when
the dual of this LP is considered, one optimal solution sets the
dual variable .pi..sub..alpha. to its value in the original
abstract dual LP, and sets the two new dual variables $\pi_{\alpha\beta} = \pi_{\alpha\bar{\beta}} = 0$. As a
result, the reduced costs of the variables corresponding to the
split channels using terms available from the solution of the
original abstract LP can be determined:
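$$rc(x_{\alpha\beta}^{i}) = v_{\alpha\beta}^{i} - c\pi = v_{\alpha\beta}^{i} - \pi_{\alpha}$$
$$rc(x_{\alpha\bar{\beta}}^{i}) = v_{\alpha\bar{\beta}}^{i} - c\pi = v_{\alpha\bar{\beta}}^{i} - \pi_{\alpha}$$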
[0233] In this simple case, with only supply constraints, the
change in objective value resulting from a split can be measured
exactly. Without budget constraints, it is clear that all supply of
the new split channel .alpha..beta. will be allocated to the bid i
that has maximum value $v_{\alpha\beta}^{i}$, giving total objective value improvement
of $rc(x_{\alpha\beta}^{i})\, s(\alpha\beta)$. Here the reduced cost component reflects
the precise difference in objective value if an .alpha.-impression
to a current winning bid is replaced by an .alpha..beta.-impression
to bid i, while the supply component informs how much substitution
is available. Applying the same argument to .alpha. .beta. gives
the following heuristic for scoring the split of any channel
.alpha. into two subchannels .alpha..beta. and .alpha. .beta.:
$$\mathrm{score}(\alpha, \beta, \bar{\beta}) = \max_{i \in B} \{ rc(x_{\alpha\beta}^{i})\, s(\alpha\beta) \} + \max_{i \in B} \{ rc(x_{\alpha\bar{\beta}}^{i})\, s(\alpha\bar{\beta}) \}$$
[0234] This scoring function has the desirable property that the
score of a split is exactly the induced improvement in objective
value when the only constraints are supply constraints. Of course,
almost all natural problems will have other constraints: budget
constraints most certainly, and other expressive forms as well.
However, if limited to LP expressiveness, the reduced cost
calculation remains straightforward, requiring one vector product
(using dual/shadow prices computed in the LP solution). The scoring
function itself becomes heuristic, providing an upper bound on the
possible improvement in objective value (e.g., consider the case
where the maximizing bid i for split .alpha..beta. has a budget
constraint that prevents it from consuming the entire split
supply). Despite this, it provides much better performance than
using reduced costs alone. One could envision more complex scoring
functions that attempt to solve small optimization problems to
better estimate the improvement in objective value for a given
split. Folklore in column generation suggests this is rarely
worthwhile. However, a key advantage is that the scoring function
requires no additional computation over standard reduced cost
calculations (using terms readily available from the LP solve)
apart from a trivial maximization. This is critical, since as
discussed next the number of potential splits is doubly
exponential.
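The scoring function can be sketched in a few lines of Python. The sketch below assumes the supply-only setting, in which the reduced cost of a new column is its value minus the parent's shadow price; the function and parameter names are hypothetical. No clamping at zero is applied, because a negative term on one side is needed for the score to equal the true improvement.

```python
def split_score(values_ab, values_abbar, parent_price, supply_ab, supply_abbar):
    """Score a split of alpha into alpha-beta and alpha-not-beta.

    values_ab / values_abbar map each bid to its per-impression value on the
    respective split channel; parent_price is the shadow price pi_alpha.
    """
    def best_rc(values):
        # Largest reduced cost v - pi_alpha over all bids on this side.
        return max(v - parent_price for v in values.values())

    return best_rc(values_ab) * supply_ab + best_rc(values_abbar) * supply_abbar
```

As a worked check under these assumptions: let s(.alpha.)=100 and Pr(.beta.|.alpha.)=0.3, with bid 1 valuing only .beta.-impressions at 1.0 (so its abstract value is 0.3) and bid 2 valuing every impression at 0.5. The abstract LP gives all supply to bid 2 for an objective of 50 and a shadow price of 0.5. After the split, bid 1 takes the 30 .alpha..beta. impressions and bid 2 the remaining 70, for an objective of 65. `split_score({1: 1.0, 2: 0.5}, {1: 0.0, 2: 0.5}, 0.5, 30, 70)` evaluates to 15, matching the improvement.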
[0235] Choosing a Refinement.
[0236] Unfortunately, the number of potential splits of an
a-channel .alpha. is doubly exponential in n (i.e., 2.sup.k.sup.n
formulae over n attributes with domain size k), and there does not
seem to be much in the way of compositional structure in these
scores that would allow optimal searching over these columns (to
find the split with optimal score). In addition, there is a need to
evaluate splits of each a-channel .alpha..sub.j in the current
abstraction A.
[0237] Evaluating doubly exponentially many splits is
impractical. Instead, a simple greedy search scheme
that will allow construction of a Disjunctive Normal Form (DNF)
formula .beta. describing a split of channel .alpha. into
.alpha..beta. and .alpha. .beta. can be used. The search can be
defined in such a way that any DNF formula can be constructed, thus
giving the ability to find any split, in principle. It is also
possible to introduce other biases into the process by restricting
the form of the splits that can be constructed (e.g., conjunctions
of literals, k-term DNF, etc.)
[0238] Computing the score of any formula .alpha..beta.
representing the refinement of an existing abstract channel is
computationally simple, requiring at most 2|B| reduced cost
calculations. This is in fact an overestimate, since any bid i that
cannot use abstract channel .alpha..beta. (i.e.,
$\alpha\beta \models \neg\phi^{i}$) will not have a variable x.sub.j.sup.i
and will not contribute to the score. The scores of such formulae
can be used to guide the incremental construction of a split. Since
each original abstract channel gives rise to mutually exclusive
formulae when they are split, possible splits are independently
evaluated for each existing channel in the current abstraction. A
process for one channel only will now be described with reference
to FIG. 2.
[0239] Consider a search tree with split formulae .beta. labeling
nodes (the formula .beta. is implicit). Assume each .beta. is
written in DNF. In FIG. 2 the root node is labeled with
$\beta_{c} = cd \vee f$. Note that FIG. 2 depicts only a portion of a
search tree, with .beta..sub.c actually being an intermediate node
in the broader tree rooted at .beta..sub.r=T.
[0240] The children of any node .beta. in the search tree are
defined as those formulae that can be constructed using one of two
classes of operations: [0241] for any literal l, the formula
resulting from conjoining the literal to one of the terms of
.beta.; and [0242] for any literal l, the formula resulting from
disjoining the literal as a new term.
[0243] The first two nodes at level one of the tree in FIG. 2
illustrate the first operation using literal a (which is added to
terms cd and f, respectively). The third node illustrates the
second operation, again with literal a.
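The two classes of operations can be sketched as follows in Python. The representation is an assumption made for illustration: a DNF formula is a frozenset of terms, each term is a frozenset of literal strings, and the function name is hypothetical. The sketch does not filter out contradictory terms (a literal conjoined with its own negation).

```python
def children(dnf, literals):
    """Return the set of DNF formulae reachable from `dnf` by one move."""
    result = set()
    for lit in literals:
        # Operation 1: conjoin the literal to one existing term.
        for term in dnf:
            if lit not in term:
                result.add((dnf - {term}) | {term | {lit}})
        # Operation 2: disjoin the literal as a new single-literal term.
        result.add(dnf | {frozenset({lit})})
    result.discard(dnf)  # a move that changes nothing is not a child
    return result
```

Starting from the formula with terms cd and f and using the single literal a, this yields three children: a added to cd, a added to f, and a as a new term, in line with the three nodes described for FIG. 2.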
[0244] Any search procedure can be used to explore this tree. The
key is that the reduced costs scoring metric can be used as a
heuristic to guide the process. It remains to look more deeply into
the relationship between the reduced costs of different formulae to
determine whether useful bounding and search ordering information
can be derived. For simplicity, a very simple, greedy search
mechanism over this tree will be described below.
[0245] The non-backtracking search process proceeds as follows. Let
.beta. be the current node. For each of .beta.'s children, .beta.',
score .beta.' using the reduced cost metric for those columns
introduced by adding channels .alpha..beta.' and .alpha. .beta.'.
The node with the highest score becomes the current node and the
search continues from that point. Essentially, the process greedily
commits to the literal with the maximal score at each iteration of
the search process; but there is no commitment to this as a
split.
[0246] The search process can terminate when one of several
conditions is met: [0247] The best score falls below some threshold.
[0248] The search tree reaches some depth. [0249] The set of all
improving children (over some threshold) exceeds a complexity
limit.
[0250] The last category might include, for example, restricting
formulae to k-DNF (at most k literals per term) or m-term DNF (at
most m terms) or a combination. Biases such as conjunctive splits
can be imposed by considering only 1-term DNF.
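One possible shape for the non-backtracking search is sketched below in Python. The callbacks `children_fn` and `score_fn`, the parameter names, and the choice to return the best-scoring node seen along the path (rather than the final node) are all assumptions made for illustration; complexity limits such as k-DNF would be enforced inside `children_fn`.

```python
def greedy_formula_search(root, children_fn, score_fn, min_score=0.0, max_depth=5):
    """Greedy, non-backtracking walk down the formula tree.

    At each step the highest-scoring child becomes the current node. The walk
    stops when there are no children, when the best child scores below
    `min_score`, or after `max_depth` moves.
    """
    current = root
    best, best_score = None, float("-inf")
    for _ in range(max_depth):
        scored = [(score_fn(child), child) for child in children_fn(current)]
        if not scored:
            break
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score < min_score:
            break
        current = top
        if top_score > best_score:
            best, best_score = top, top_score
    return best, best_score
```

All scores are computed against the same abstract LP solution, so the loop involves no LP solves.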
[0251] Despite the iterative nature of this formula search, no
additional optimization is required. The scores of all candidate
formulae are computed relative to the solution (basis) of the
original abstract LP. Only once the actual split is determined at
the end of the process is a new, less abstract LP solved.
[0252] Another restriction could be introduced into the search
process: namely, only evaluate branches or moves in the search tree
that lead to a formula that more closely "approximates" the bid
formula .phi..sup.i of some bid i. The only way in which a split
can enhance objective value is if it allows more accurate
discrimination of the value associated with at least one bid. This
could be accomplished in several ways, including methods that look
at the statistical correlations between bid formula and potential
splits. By focusing on splits that create channel formulae that are
"logically" closer to an existing bid formula, it intuitively is
more likely that the bid will be satisfied when assigning the new
channel to it.
[0253] The moves made to construct DNF formulae will not permit
adherence to a strict definition of closeness, but it is worth
providing such a definition: at least semantically. Let
.psi.=.alpha..beta. be the current channel formula (i.e., the DNF
formula labeling the current tree node) and let
.psi.'=.alpha..beta.' be one of its child formulae (i.e., that
obtained by adding a literal to .beta.). It can be said .psi.' is
closer to bid formula .phi. than .psi. iff the symmetric set
difference is reduced; more precisely, iff
$$\mathrm{Diff}(\|\psi'\|, \|\phi\|) \subset \mathrm{Diff}(\|\psi\|, \|\phi\|),$$
where $\mathrm{Diff}(S,T) = (S \setminus T) \cup (T \setminus S)$. In other words, the formula
.psi.' moves closer to .phi. if the set of possible worlds that
satisfy one but not the other becomes smaller. A sequence of moves
that take us (strictly) closer to a target formula .phi. is
guaranteed to converge (monotonically) to .phi..
[0254] This is a very strict definition of closeness and is hard to
enforce using very limited moves (adding literals to terms or
adding disjuncts) as can be seen below. Closeness can be defined
somewhat more weakly by removing the implicit conjunction in the
set difference definition. Specifically, say .psi.' is weakly
closer to .phi. than .psi. if either: (a)
$\|\psi'\| \setminus \|\phi\| \subset \|\psi\| \setminus \|\phi\|$; or (b)
$\|\phi\| \setminus \|\psi'\| \subset \|\phi\| \setminus \|\psi\|$. This
requires only that either: (a) .psi.' falsifies some possible
worlds satisfied by .psi. that are falsified by .phi.; or (b)
.psi.' satisfies some possible worlds falsified by .psi. that are
satisfied by .phi.. Unlike with symmetric difference, this allows
the new formula to "overshoot" while adding or removing worlds from
.psi. to more closely approximate .phi..
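Both notions of closeness can be checked directly when formulae are represented semantically. The Python sketch below assumes each formula is given as the set of possible worlds that satisfy it (here, arbitrary hashable labels); the function names are hypothetical.

```python
def sym_diff(s, t):
    """Diff(S, T): worlds satisfying exactly one of the two formulae."""
    return (s - t) | (t - s)


def strictly_closer(child, parent, target):
    """True iff Diff(child, target) is a proper subset of Diff(parent, target)."""
    return sym_diff(child, target) < sym_diff(parent, target)


def weakly_closer(child, parent, target):
    """True iff condition (a) or condition (b) of the weaker definition holds."""
    removes_bad = (child - target) < (parent - target)      # (a)
    adds_good = (target - child) < (target - parent)        # (b)
    return removes_bad or adds_good
```

For instance, with target worlds {1, 2} and parent {2, 3}, the child {1, 2, 5} is weakly closer (it now covers world 1) but not strictly closer, because it overshoots by admitting world 5.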
[0255] The following restrictions on syntactic moves will ensure
that each move takes us closer to some bid formula in this weaker
sense: [0256] a literal l can be added to a term t if there exists
a bid formula .phi..sup.i containing a term t' such that t'|.noteq.
[0257] a literal l can be added as a new term iff there exists a
bid formula .phi..sup.i containing a term t' such that l is
contained in t'.
[0258] The other syntactic restrictions mentioned above can be
applied (e.g., so cycles are not introduced). It is not hard to
show that each move satisfies the weaker definition of closeness.
Notice that it is not required that each move is closer to a fixed
bid formula, just that each stage moves closer to some bid
formula.
[0259] Although DNF can be a convenient form for manipulating
logical formulae, unfortunately negation of a DNF formula can be
computationally expensive (which is required for one side of the
split .alpha. .beta.). An alternative is to represent formulae via
binary-decision diagrams (BDDs), which can represent many formulae
very compactly, and for which negation is computationally simple.
For BDDs, consider a more restricted form of the formula search
process, wherein the formula .beta..sub.j on which .alpha..sub.j is
split is built up as follows. Denote Dom(F.sup.i)\{f.sub.k.sup.i}
as f.sub.k.sup.i, i.e., the exclusion of the value k for attribute
i (note that this definition generalizes to the case in which
attributes have more than two values). First, consider each
.beta..sub.j.sup.1 consisting of f.sub.k.sup.i for a single i and
k. That is, at the first "level" consider splits that exclude one
attribute-value and "commit" to a single attribute-value exclusion
with the best score score(.alpha..sub.j, .beta..sub.j.sup.1,
.beta..sub.j.sup.1). Then, consider refining .beta..sub.j.sup.1 by
conjoining with some new f.sub.k.sup.i or disjoining with some new
f.sub.k.sup.i (conjoining tightens .beta..sub.j.sup.1, disjoining
relaxes it). Each resulting .beta..sub.j.sup.2 is scored in a
similar fashion, and a commitment is made to the .beta..sub.j.sup.2
with the highest score. This continues for m iterations, where m is
either a fixed threshold or is determined dynamically by requiring
a minimum score improvement be met. The best split of .alpha..sub.j
is determined heuristically as .beta..sub.j, .beta..sub.j, where
.beta..sub.j=.beta..sub.j.sup.m.
[0260] This more restricted formula search can be realized without
using BDDs, but with some other representation of the logical
formulae corresponding to abstract channels.
[0261] Given a current abstraction A, the .alpha..sub.j.di-elect
cons.A with the highest-scoring best split is adopted, creating a
new abstraction A' with .alpha..sub.j replaced by
.alpha..sub.j.beta..sub.j and .alpha..sub.j .beta..sub.j. The LP
resulting from the new abstraction is solved, and the search for a
best split repeated until the score of the best split of A falls
below some threshold .tau..
[0262] Overall Refinement Process.
[0263] The overall refinement process can be visualized with reference
to FIG. 3. Initially, start with a particular
abstraction, or collection of channels, for instance, beginning
with a fully abstract, single channel problem, and solve the LP
corresponding to Abstraction 0. With the optimal solution in hand,
arbitrary splits of this abstraction can now be scored. This
information is used to guide the search process described in the
previous section. Once the DNF (or BDD) formula-split tree is
searched, a specific split of an existing channel is made. This
leads to a refined Abstraction 1. The corresponding abstract LP,
LP1, is then solved and the process repeats. Notice that only one
LP is solved per abstraction. The split-search requires no
reoptimization.
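The outer loop can be summarized by the following Python skeleton. It is a sketch only: `solve_lp` and `best_split` are hypothetical callbacks standing in for the abstract LP solve and the formula search described above, and `max_rounds` is an added safety limit that is not part of the described process.

```python
def refine(abstraction, solve_lp, best_split, tau, max_rounds=100):
    """Iteratively refine a list of channels until no split scores >= tau.

    solve_lp(abstraction) -> solution of the abstract LP
    best_split(abstraction, solution) -> (channel, (left, right), score)
    """
    solution = solve_lp(abstraction)  # exactly one LP solve per abstraction
    for _ in range(max_rounds):
        # Candidate splits are scored from the current solution only.
        channel, (left, right), score = best_split(abstraction, solution)
        if score < tau:
            break
        abstraction = [c for c in abstraction if c != channel] + [left, right]
        solution = solve_lp(abstraction)
    return abstraction, solution
```

The structure makes the cost profile explicit: the only expensive step inside the loop is the single LP solve that follows each adopted split.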
[0264] Adding LP Constraints, LP Expressiveness, IP
Expressiveness.
[0265] The model thus far can accommodate only budget and supply
constraints. In principle, there is no problem accommodating other
forms of constraints that don't involve integer variables. Of
course, there is a significant question as to what types of
constraints can and should actually be modeled this way. Supply
constraints are obvious constraints, as are budget constraints. But
many other forms of expressiveness are not most naturally modeled
as constraints, but perhaps can be usefully approximated this way
as long as they never force infeasibility. Specifically, constraints
(e.g., smoothness) are desirably formulated in such a way
that a trivial option exists to maintain feasibility (such as no
assignment to a bid).
[0266] Forms of expressiveness that can be expressed using LP
constraints can be introduced into the process using the same
column generation process. Reduced cost calculations can be
modified appropriately to handle constraints in addition to budget
constraints.
[0267] Some forms of expressiveness are most naturally expressed
using integer variables, thus requiring a MIP formulation. One
limitation of the column generation model as proposed is its focus
on LP expressiveness. One approach is to use MIP column generation
(branch-and-price) techniques known in the art. However, recall
that the abstraction process is used to create the set of abstract
channels to be used in MIP optimization; i.e., the intended output
of this process is a set of a-channels, not (necessarily) the
allocation itself. Thus, given an allocation problem with IP
expressiveness, column generation with a linear relaxation of the
problem can be used to generate abstract channels. Once the
abstract channels are constructed, the "original" MIP can then be
solved using allocation to the abstract channels created, with
appropriate discounting of impression values or count variables by
the probability of a bid receiving a relevant impression within an
a-channel. If the original problem uses only LP expressiveness,
then the LP solution used to create the final refinement will be
the optimal allocation and no re-solve is needed.
[0268] To illustrate this column generation approach for MIPs,
consider using the column generation process described above
to handle bonus (all-or-nothing) bids. A bonus bid requires an
integer variable to denote whether its target has been met (and
hence whether payment is to be made). This can be approximated in
an LP as follows: replace each bonus bid by a simple item-price bid
with a budget constraint (e.g., if k impressions of a are valued at
d, all or nothing, replace with a per-item bid for a with value d/k
per impression and a budget constraint of d). Record when each such
approximated bid i has a significant influence on the choice of a
split; this can be recognized by examining the contribution of the
x.sup.i variable to the reduced cost computation for a split that
has been acted on. With each LP solution (after each split), it is
expected that the bid should get close to meeting its budget
constraint. If it is close, there is good reason to believe that
the IP based on the existing abstraction will assign it what it
needs to meet its threshold (intuitively, the remaining supply
needed has very high marginal value). However, if the bid is not
"close" to its target, this suggests that its average value per
impression is simply not competitive, and that the IP is unlikely
to satisfy the bid. So any splits where bid i has had undue
influence can be retracted. In this manner, channel capacity is not
wasted trying to satisfy a bid that is unlikely to be satisfied in
the final IP.
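The LP approximation of a bonus bid, and the closeness test applied after each solve, might be sketched as follows in Python. The class and function names are hypothetical, and the `tolerance` used to decide whether a bid is "close" to its budget is an assumed parameter; the text does not specify how closeness is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemPriceBid:
    channel: str
    value_per_impression: float
    budget: float


def relax_bonus_bid(channel, k, d):
    """Replace 'k impressions of channel for d, all or nothing' by a
    per-impression bid of d / k with a budget of d."""
    if k <= 0:
        raise ValueError("k must be positive")
    return ItemPriceBid(channel, d / k, d)


def is_close_to_target(spend, budget, tolerance=0.1):
    """True if the LP solution spends at least (1 - tolerance) of the budget."""
    return spend >= (1.0 - tolerance) * budget
```

Splits on which a relaxed bid had significant influence would be retained only while `is_close_to_target` holds for that bid, and retracted otherwise.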
[0269] If there is a split that improves objective value (for an
all-or-nothing bid) in the MIP, then there is a value-improving
split in the LP as well. LP estimates of split values will be less
than the MIP value in general.
[0270] Using Abstractions in Ad Auction Optimization.
[0271] To explore this column generation approach to channel
abstraction, experiments were run with the column generation model
on a collection of random problems, some with LP expressiveness
only, others with IP expressiveness. The experiments were run on a
computer with a 3.8 GHz Xeon CPU, 2 MB cache, and 16 GB RAM.
[0272] LP Expressiveness.
[0273] The first battery of problems involves bids that use only LP
expressiveness; specifically, each bid has per-impression
valuations for a particular set of attribute-values over a given
time period, along with a total budget. Optimizations are performed
over a time horizon of 30 periods. This battery includes multiple
sets of problem instances, with each set characterized by two
parameters: m binary attributes and n bidders. Sets of instances
were run with n=10 m for m.di-elect cons.{10, 20, 30, . . . ,
100}.
[0274] Supply distribution. The probability of a unit of supply
satisfying attribute-value f.sub.1.sup.i is drawn from U[0,1]:
since Dom(F.sup.i)={f.sub.1.sup.i,f.sub.2.sup.i}, then
Pr(f.sub.2.sup.i)=1-Pr(f.sub.1.sup.i). The total supply of
impressions, over all attribute-values, is 1,000,000 for each time
period.
[0275] Bids. Each bid j has the form .phi..sup.j, v.sup.j, g.sup.j,
w.sup.j. Bid j cares about a set of attributes A.sup.j, with size
|A.sup.j|.about.U[0,10]. The model embodies the assumption that
bidders tend to have a lot of commonality with respect to the
attributes they care about, so a bid's attributes are sampled from
a Zipfian distribution, with the probability of choosing attribute
F.sup.i being
Pr(F.sup.i)=(1/i)/(.SIGMA..sub.1.ltoreq.k.ltoreq.m1/k), without
replacement. For a given attribute F.sup.i.di-elect cons.A.sup.j,
the bid requires that impressions satisfy f.sub.z.sub.i.sup.i, with
z.sub.i.di-elect cons.{1,2} chosen uniformly at random. The bid's
formula is then the conjunction of all required attributes,
$\varphi^j = \bigwedge_{F^i \in A^j} f^i_{z_i}$.
[0276] The bid valuation model reflects the intuition that bidders
tend to place higher value on more specific bids (i.e., with more
attributes), and higher value if the attributes in their bid
formula (i.e., that they care about) are in greater demand. Bidder
j's per-impression value $v^j$ is determined as follows. First,
draw a "base value" $\hat{v}^j$ from U[0.1,1], then
adjust it to account for the aforementioned factors by setting
$v^j = \hat{v}^j \left(1 + 10 \sum_{F^i \in A^j} \Pr(F^i)\right)$.
That is, if the bid cares about no attributes, then
$v^j = \hat{v}^j$, whereas if it were to care about all m
attributes, then $v^j = 11\hat{v}^j$.
[0277] A bid's time window w.sup.j is chosen as follows. Select
numbers t.sub.1 and t.sub.2 from U[-10, 40], set
w.sup.j=[min(t.sub.1,t.sub.2), max(t.sub.1,t.sub.2)], then truncate
w.sup.j to be in [1,30]. This incorporates the idea that some bids
will have time windows that extend beyond the optimization
horizon.
[0278] A bid's budget is set to a fraction of the value of the
total supply that it cares about. Namely, if .sigma..sub.j is the
total supply of formula .phi..sup.j during time window w.sup.j,
then the budget is g.sup.j=.tau..sup.j.sigma..sub.j with
.tau..sup.j.about.U[0.1,1].
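The bid generator described in paragraphs [0275] to [0278] can be sketched as follows. This is a minimal illustration, not the experimental code: the function names, the dictionary layout, integer-valued draws for |A.sup.j| and for t.sub.1, t.sub.2, the renormalized weights used for sampling without replacement, and the independence of attributes when computing the supply of a formula are all assumptions made here.

```python
import random

def sample_attributes(m, k, rng):
    """Draw k distinct attribute indices from 1..m with Pr(i) proportional to 1/i."""
    remaining = list(range(1, m + 1))
    chosen = []
    for _ in range(k):
        # Assumption: weights are renormalized over the attributes still available.
        weights = [1.0 / i for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)
        chosen.append(pick)
    return chosen

def generate_bid(m, pr_f1, rng, horizon=30, supply_per_period=1_000_000):
    """Generate one bid; pr_f1[i] is the probability that supply satisfies f_1 of attribute i."""
    harmonic = sum(1.0 / k for k in range(1, m + 1))
    attrs = sample_attributes(m, rng.randint(0, min(10, m)), rng)
    formula = {i: rng.choice((1, 2)) for i in attrs}      # required value index z_i
    base = rng.uniform(0.1, 1.0)                          # base value
    value = base * (1 + 10 * sum((1.0 / i) / harmonic for i in attrs))
    t1, t2 = rng.randint(-10, 40), rng.randint(-10, 40)
    start, end = max(min(t1, t2), 1), min(max(t1, t2), horizon)
    periods = max(end - start + 1, 0)                     # zero if the window misses the horizon
    share = 1.0
    for i, z in formula.items():                          # assumption: attributes are independent
        share *= pr_f1[i] if z == 1 else 1.0 - pr_f1[i]
    sigma = supply_per_period * share * periods           # supply the bid cares about
    budget = rng.uniform(0.1, 1.0) * sigma                # g = tau * sigma
    return {"formula": formula, "base": base, "value": value,
            "window": (start, end), "sigma": sigma, "budget": budget}
```

A caller would first draw `pr_f1[i]` from U[0,1] for each attribute and then call `generate_bid` once per bidder with a shared `random.Random` instance.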
[0279] In addition to the bids described above, an additional
"market" bid is included with value 0.1, unlimited budget, and no
attribute preferences (i.e., .phi.=True). This is used to account
for value that might be obtained from other sources (e.g., future
bids or a spot market).
[0280] Optimization Parameters.
[0281] During an iteration of column generation, a search is made
for a suitable split so long as a channel refinement can be found
that provides a score that offers a certain minimum improvement
over the previous abstraction. Parameter MI sets this target: if
some refinement offers at least an MI fractional improvement over
the allocation value of the most recent LP, the process continues;
if there is no such refinement on any channel, column generation is
terminated. Note that, even if there is no MI improvement, it does
not necessarily mean the allocation value is within MI fraction of
the true optimal value. Rather, it means there is no myopic
improvement of at least MI that can be obtained within the
restricted channel splitting space considered. Furthermore, it
could be that some sequence of channel refinements could obtain a
higher improvement. The restricted space of channel splits under
consideration can obviously impact the ability to find a suitable
refinement. Even without this restriction (i.e., even if splitting
into arbitrary pairs of subsets is allowed), it can be shown that
myopic splitting is insufficient in general when IP expressiveness
is admitted. For certain forms of LP expressiveness, however, it
can be shown that, if an abstraction is not lossless, there always
exists a two-way split of some channel that improves value. Hence,
a myopic search (over an unrestricted split space) is sufficient to
find an optimal, lossless abstraction.
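The MI stopping rule can be illustrated with a short sketch. It assumes that each candidate split has already been scored with an estimated absolute gain in LP value; the function name `pick_split` and the dictionary of gains are hypothetical and stand in for the scoring procedure described earlier.

```python
def pick_split(estimated_gains, last_lp_value, mi):
    """Return the candidate split with the largest estimated gain, or None to terminate.

    estimated_gains maps a candidate split to its estimated increase in LP value.
    Column generation continues only if the best gain is at least an MI fraction
    of the allocation value of the most recent LP.
    """
    if not estimated_gains:
        return None
    best = max(estimated_gains, key=estimated_gains.get)
    if estimated_gains[best] >= mi * last_lp_value:
        return best
    return None
```

Returning `None` corresponds to terminating column generation; as noted above, that outcome only rules out a myopic improvement of at least MI within the split space considered.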
[0282] Estimating an Upper Bound on the Optimal Value.
[0283] To measure how good an allocation is, an estimate is needed
of the true optimum value that could be achieved if all relevant
columns are generated. An upper bound on the optimal is computed as
follows. When column generation is complete, another optimization
is run using undiscounted values; that is, all
Pr(.phi..sup.i|.alpha..sub.j) terms are removed. This is clearly an
upper bound on the optimum because it assumes that each bid can make
use of the entire amount of a channel it is allocated (rather than
only the Pr(.phi..sup.i|.alpha..sub.j) fraction it actually cares
about for channel j). However, this is a very loose upper bound. It
can be tightened by ensuring that a bid's allocation does not exceed
the supply that it actually cares about. That is, additional
constraints of the form
$x^i_j \le s(\varphi^i \wedge \alpha_j)/s(\alpha_j)$
are added for all bids i and channels j. The allocation value
obtained from this optimization is still an overestimate because it
does not account for interactions between multiple bids. However,
empirically, this upper bound is quite close to an even tighter
upper bound that can be generated via constraint generation.
[0284] Experimental Results.
[0285] Table 1 shows results from runs with parameter MI=0.01,
averaged over 20 instances for each m,n pair. The table shows
several key measures including the number of a-channels generated.
The fraction of the upper bound on the optimal value obtained by
the abstract LP when column generation terminates ("Frac UB") is
also shown (giving us a lower bound on the quality of the abstract
allocation relative to the true optimal allocation). An estimate of
the improvement in the degree of optimality is shown ("Improve").
This is reported as the average of (Final-Initial)/UB, where Final
is the final LP value, Initial is the LP value at the start of
column generation (when a single abstract channel is used), and UB
is the upper bound on the optimal value. Finally, the average and
range of runtimes is presented.
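The two reported measures can be written out as a small sketch. The instance values below are hypothetical and chosen only to show the arithmetic; note that each measure is computed per instance and then averaged, rather than computed from averaged values.

```python
def frac_ub(final_value, upper_bound):
    """Fraction of the upper bound obtained by the final abstract LP."""
    return final_value / upper_bound

def improve(final_value, initial_value, upper_bound):
    """(Final - Initial) / UB for one instance."""
    return (final_value - initial_value) / upper_bound

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical (initial, final, upper bound) triples for three instances.
runs = [(40.0, 90.0, 100.0), (55.0, 80.0, 100.0), (30.0, 85.0, 100.0)]
avg_frac_ub = mean(frac_ub(f, ub) for _, f, ub in runs)
avg_improve = mean(improve(f, i, ub) for i, f, ub in runs)
```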
[0286] Table 2 shows similar results, but for runs with
MI=0.001.
TABLE 1. Average results for column generation with LP
expressiveness and MI = 0.01, m attributes, and n bidders.

    m      n   # channels   Frac UB   Improve   Runtime .mu. (sec)   Runtime range (sec)
   10    100      12.0       0.893     0.447          12             [4, 24]
   20    200      11.0       0.828     0.364          40             [8, 74]
   30    300      10.2       0.841     0.380          75             [35, 150]
   40    400       9.8       0.803     0.334         153             [28, 556]
   50    500      10.0       0.816     0.396         212             [23, 418]
   60    600       8.6       0.827     0.343         245             [33, 470]
   70    700       8.3       0.824     0.304         314             [26, 656]
   80    800       9.2       0.824     0.345         461             [101, 940]
   90    900       8.6       0.806     0.333         566             [75, 1211]
  100   1000       9.3       0.804     0.344         811             [203, 1438]
TABLE 2. Average results for column generation with LP
expressiveness, MI = 0.001, m attributes, and n bidders.

    m      n   # channels   Frac UB   Improve   Runtime .mu. (sec)   Runtime range (sec)
   10    100      32.4       0.965     0.515          53             [10, 112]
   20    200      33.8       0.905     0.439         317             [21, 758]
   30    300      27.1       0.899     0.438         538             [112, 1384]
   40    400      28.6       0.871     0.399        1247             [211, 4159]
   50    500      26.8       0.871     0.450        1543             [153, 4027]
   60    600      22.7       0.877     0.392        1775             [88, 4798]
   70    700      19.3       0.867     0.346        1959             [66, 5878]
   80    800      24.2       0.873     0.393        3746             [469, 8670]
   90    900      24.0       0.858     0.374        4956             [807, 14534]
  100   1000      25.7       0.853     0.392        6687             [1677, 17047]
[0287] As can be seen, with LP expressiveness, column generation
can obtain a significant fraction of the upper bound value for
problems in which it would be impossible to even enumerate the full
unabstracted LP. Setting a lower value for the minimum improvement
parameter MI allows a greater fraction of the upper bound to be
obtained, but with a fairly significant increase in run time. This
suggests adopting a more sophisticated technique that occasionally
computes an upper bound during the course of column generation
(using the current set of channels), then weighs the additional
potential improvement against the amount of time already spent.
[0288] Fortunately, although the number of potential channels
increases exponentially in m and n, the column generation procedure
described herein can obtain high value with very few channels.
Indeed, the number of generated channels, and the resulting quality
of solution, are comparable across all m and n tested. Furthermore,
on average, much of the improvement is obtained early in the
procedure. FIG. 4 shows the fraction of upper bound obtained after
a given number of channels has been generated, averaged over 20
instances, with m=10, n=20, and MI=0.001. A high fraction of the
upper bound was obtained from the first few channels generated,
with additional channel splitting providing more modest
improvement.
[0289] IP Expressiveness.
[0290] The second battery of problems includes some bidders with
all-or-nothing bonus bids, in addition to bidders with only
per-impression values and budgets. Note that all-or-nothing bids
require binary variables, hence the LP relaxation of the column
generation procedure will be only an approximation.
[0291] All problems had 100 attributes, n.sub.b bonus bidders, and
n.sub.i=4n.sub.b per-impression bidders, with n.sub.b.di-elect
cons.{10, 20, . . . , 60}. The preferences of per-impression bidders
are determined as
before. A bonus bidder had .phi..sup.j and w.sup.j chosen
similarly. However, its per-impression value is v.sup.j=0, and
instead the bidder is willing to pay b.sup.j if he receives at
least q.sup.j impressions satisfying .phi..sup.j, but zero
otherwise. q.sup.j is selected to be a fraction .tau..sup.j of the
total supply the bid cares about, namely,
q.sup.j=.tau..sup.j.sigma..sub.j, with .tau..sup.j.about.U[0.1,1],
and .sigma..sub.j the total supply of formula .phi..sup.j during
time window w.sup.j. Then, set b.sup.j={circumflex over
(b)}.sup.jq.sup.j where {circumflex over (b)}.sup.j is chosen as
for a flat bidder, but then multiplied by a factor chosen from
U[0.1,1]. Also included is a "market" bid as described above.
[0292] Table 3 shows results with MI=0.01, averaged over 20
instances from each n.sub.b,n.sub.i pair. Shown are the number of
channels generated, the fraction of the upper bound (on the
optimum) obtained when column generation terminates ("Frac UB"),
the improvement over the fraction of the upper bound obtained
before column generation ("Improve"), and the range of runtimes
over the 20 instances. For these instances, although LP relaxation
is used to determine which channels to split, MIPs are run to
determine the abstract allocation value (hence fraction of the
upper bound) obtained before column generation and at the end of
column generation.
TABLE 3. Average results for column generation with IP
expressiveness, MI = 0.01, 100 attributes, n.sub.b bonus bidders,
and n.sub.i per-impression bidders.

  n.sub.b   n.sub.i   # channels   Frac UB   Improve   Runtime .mu. (sec)   Runtime range (sec)
     10        40        6.6        0.847     0.248          41             [5, 82]
     20        80        6.6        0.815     0.252          66             [15, 129]
     30       120        7.0        0.769     0.264          91             [14, 205]
     40       160        8.5        0.790     0.296         153             [31, 282]
     50       200        8.8        0.823     0.325         188             [39, 613]
     60       240        6.8        0.814     0.289          92             [5, 325]
[0293] Although column generation operates on a relaxation of the
true IP, the scoring function is nevertheless effective in guiding
the procedure to good channel splits. Indeed, the performance with
IP expressiveness compares favorably to that with LP
expressiveness. It is emphasized that these campaign-level
optimizations are run offline, and used to parameterize dispatch
policies that are then implemented in real time. Thus the times
reported here allow frequent, multiple optimizations (and
reoptimization) of offline allocations.
[0294] Overall Architecture and Other Considerations.
[0295] The following details the potential use of abstract channels
created by the column generation approach (or by any other
approach) in a fully integrated optimization-based advertising
system.
[0296] Overall architecture: it is envisioned using the
LP-column generation approach to provide a core abstraction that
will be refined as specific bids are received. More precisely,
statistical summaries could be used to produce a set of
prototypical bids and/or demand profiles from which a concrete
abstraction is generated. However, this abstraction will not use up
all channel capacity. By this is meant: suppose that IP
optimization problems can be solved with up to k channels in the
online setting; then the offline abstraction will include m
channels, with m<k. At optimization time, the actual set of bids
active at that time will be used to refine the core abstraction,
refining channels in a way that accommodates the specific bids that
need to be optimized in the IP. This refinement process could
involve the same column generation algorithm, or it could be quite
different (e.g., by dealing directly with IP expressiveness). The
key is that, with judicious choice of the prototypical bids used to
generate the core abstraction, (hopefully) very few refinements
will be needed to adjust the abstraction to the current set of
bids. There are many refinements on this high-level idea too (e.g.,
focusing only on high-level bids, solving the core abstraction
using only the simplest item-price and demand distributions for all
combinations of attributes, etc.).
[0297] There needn't be a single
core abstraction: there could be different core abstractions for
different parts of the day, reflecting the fact that demand and
supply are likely to be highly variable. Accordingly, the use of
day-part, day-of-week, seasonal, or other core abstractions is
envisioned.
[0298] Suppose T is the optimization time interval (i.e.,
assignments to channels are considered over the course of a day,
and optimization is over some number of periods (i.e., days)).
Subperiods of the default period that bids distinguish are treated
like any other attribute; e.g., day-parts would simply be attributes
that could be used to define channels, just as gender is.
[0299] Defining a
suitable set of attributes is important. Especially important is
how ordered (discrete or continuous) variables are treated. If many
bids care about golfers of a certain age, then it is not
necessarily desirable to split on all possible ages. Instead, the
process could take a "data-driven" approach much like in decision
tree induction. If some bids want age 25-45 and other bids want age
35-60, the ages could be aggregated into three intervals: 25-35,
35-45, 45-60.
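The data-driven aggregation of an ordered attribute can be sketched as cutting the covered range at every endpoint that some bid mentions. The function name `aggregate_intervals` and the representation of a bid's range as a `(low, high)` pair are assumptions made for illustration.

```python
def aggregate_intervals(requested):
    """Split the requested ranges at every endpoint that appears in some bid.

    requested is a list of (low, high) pairs. The result lists the elementary
    intervals, in increasing order, that are covered by at least one request.
    """
    cuts = sorted({end for interval in requested for end in interval})
    pieces = zip(cuts, cuts[1:])
    return [(lo, hi) for lo, hi in pieces
            if any(a <= lo and hi <= b for a, b in requested)]

age_intervals = aggregate_intervals([(25, 45), (35, 60)])
```

For the two age ranges in the text, `age_intervals` holds the three intervals 25-35, 35-45, and 45-60; ranges that no bid covers are dropped rather than turned into channels.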
[0300] Lossless Abstraction.
[0301] To create abstractions using full MIP expressiveness in such
a way that no loss in objective value (e.g., revenue) occurs, the
following process is used with a sequence of LP solves.
[0302] Full Attribute Expansion.
[0303] Assume that there are n distinct binary attributes that have
been specified in bids (although the algorithms in no way rely on
attributes being binary). To correctly encode this in a MIP, every
bid in the ad auction language will expand to up to 2.sup.n binary
variables. The degree of expansion will be greater for less
specific bids. Additionally, up to 2.sup.n binary variables will be
needed to encode the supply constraints.
[0304] As an example, consider attributes for gender and age.
Gender can be male ($m$) or female ($\bar{m}$); age can be young
($y$) or old ($\bar{y}$). Denote a bid by $x(\text{gender},
\text{age})$, where "don't care" is specified by $\perp$. One
advertiser bids for males, $x^1(m,\perp)$; the second bids for young
people, $x^2(\perp,y)$; and the third bids for everything,
$x^3(\perp,\perp)$. The bids expand into MIP variables as follows:
$$x^1(m,\perp):\; x^1(m,\bar{y}),\; x^1(m,y)$$
$$x^2(\perp,y):\; x^2(\bar{m},y),\; x^2(m,y)$$
$$x^3(\perp,\perp):\; x^3(\bar{m},\bar{y}),\; x^3(\bar{m},y),\; x^3(m,\bar{y}),\; x^3(m,y)$$
[0305] Letting $s(\text{gender}, \text{age})$ be the quantity of a
channel available, the following constraints are needed:
$$x^3(\bar{m},\bar{y}) \le s(\bar{m},\bar{y})$$
$$x^2(\bar{m},y) + x^3(\bar{m},y) \le s(\bar{m},y)$$
$$x^1(m,\bar{y}) + x^3(m,\bar{y}) \le s(m,\bar{y})$$
$$x^1(m,y) + x^2(m,y) + x^3(m,y) \le s(m,y)$$
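The expansion of the three example bids and the grouping of their variables into per-channel supply constraints can be sketched as below. The string labels (`"not_m"`, `"not_y"`), the dictionary encoding of a bid, and the function names are illustrative assumptions; a bid simply omits any attribute it does not care about.

```python
from itertools import product

ATTRIBUTES = ("gender", "age")
DOMAIN = {"gender": ("m", "not_m"), "age": ("y", "not_y")}

def expand(bid):
    """List the c-channels (full attribute assignments) that a bid accepts."""
    choices = [(bid[a],) if a in bid else DOMAIN[a] for a in ATTRIBUTES]
    return list(product(*choices))

def supply_constraints(bids):
    """Map each c-channel to the bids whose variables share its supply."""
    rows = {c: [] for c in product(*(DOMAIN[a] for a in ATTRIBUTES))}
    for name, bid in bids.items():
        for c in expand(bid):
            rows[c].append(name)
    return rows

bids = {"x1": {"gender": "m"}, "x2": {"age": "y"}, "x3": {}}
rows = supply_constraints(bids)
```

Each entry of `rows` corresponds to one supply constraint: the listed bids each contribute one variable for that c-channel, and their sum is bounded by the channel's supply. With n binary attributes the dictionary has 2.sup.n keys, which is the blow-up the text describes.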
[0306] Attributes can result in an optimization problem that is too
large to solve. If there are enough attributes, it may even be too
large to represent. One way to deal with this is through attribute
abstraction, that is by ignoring some of the attributes during
optimization, as described above.
[0307] Dispatching with Attribute Abstraction.
[0308] If the optimization is solved with abstracted channels,
there is an issue of how to dispatch. An intelligent dispatch would
consider attributes specified in a bid, even if some of those
attributes were ignored during optimization.
[0309] Consider the simple example above with three bids.
Assume the optimizer determines fractional allocations for
undifferentiated supply. Assume these are: x.sup.1=0.6,
x.sup.2=0.3, and x.sup.3=0.1. The problem is that this allocation,
while feasible in the abstracted space, is not feasible in the
non-abstracted space.
[0310] How should this allocation be translated to differentiated
supply for dispatch? One simple, but unsatisfactory, approach would
be to first determine dispatch eligibility (based on all
attributes), then rescale the probabilities accordingly. For
instance, if a young male arrived, then x.sup.2 and x.sup.3 would
be eligible and the probabilities would be scaled by 0.4, giving
x.sup.2=0.3/0.4 and x.sup.3=0.1/0.4. If an old female arrived, then
x.sup.1 and x.sup.3 would be eligible and the probabilities would be
scaled by 0.7. This approach can lead to a realized allocation that
differs quite a bit from the one computed by the optimizer, even in
a deterministic setting. Assume that the attribute-specific
supplies are as follows (expressed as fractions of the
undifferentiated supply): $s(\bar{m},\bar{y})=0.1$,
$s(\bar{m},y)=0.1$, $s(m,\bar{y})=0.7$,
$s(m,y)=0.1$. In this case, advertiser 3 will get all of the old male
supply, plus some of the other supplies, which will amount to
significantly more than the 0.1 fraction of the overall supply
specified by the optimizer. On the other hand, advertiser 1 will
get significantly less than the 0.6 fraction of the overall supply
specified by the optimizer because 1) the fraction of females is
only 0.2 of the overall supply, and 2) advertisers 2 and 3 will get
some of the female supply.
[0311] The optimization using the abstraction generated by the
above-described column generation process assumes that any ad
allocated to an a-channel .alpha. will be randomly dispatched to
the component c-channels that make up .alpha.. This is reflected in
the MIP (or LP) objective function by replacing the per-impression
value v.sup.i that bid i has by
v.sub..alpha..sup.i=v.sup.iPr(.phi..sup.i|.alpha.). With a well-crafted
abstraction, this may be sufficient to produce an optimal
allocation (e.g., consuming as much of each advertiser's budget as
possible). However, if the number of a-channels is limited for
computational reasons, the "pessimistic" assumption of random
dispatch may leave revenue on the table.
[0312] Allocating with Abstract Channels
[0313] Another means of optimizing with abstract channels will now
be described. It relies on constraint generation in the abstract
MIP (or LP) to allocate the supply of abstract channels to bids
non-uniformly, thus improving the revenue generated by ad
allocation.
[0314] Let A be a channel abstraction and M the optimistic MIP in
which bids are assigned to abstract channels, but in which each
impression to a bid i is assumed to satisfy the bid formula
.phi..sup.i. This assumption is embodied by replacing the
per-impression value v.sub..alpha..sup.i for a-channel .alpha. by
v.sup.i itself: in other words, assume that every ad i assigned to
.alpha. will be dispatched intelligently, thus guaranteeing that
.phi..sup.i is satisfied. In a simple two-bid, two a-channel
example, the resulting MIP (in this case, LP) would be:
$$\begin{aligned}
\max\;& v^1 x^1_{\alpha_1} + v^2 x^2_{\alpha_1} + v^1 x^1_{\alpha_2} + v^2 x^2_{\alpha_2} \\
\text{s.t.}\;& x^1_{\alpha_1} + x^2_{\alpha_1} \le s(\alpha_1) \\
& x^1_{\alpha_2} + x^2_{\alpha_2} \le s(\alpha_2)
\end{aligned}$$
[0315] The optimistic assumption embodied in this formulation is,
of course, unreasonable in general. There is no reason to believe,
for example, that the allocation of bids to .alpha..sub.1 permits
feasible "packing" of their promised supply in such a way that each
bid i gets only .phi..sup.i-impressions. However, this assumption
can be tested by solving an LP that determines whether there is
enough supply to do just this: this example strives to determine if
.alpha..sub.1 contains enough .phi..sup.1 and .phi..sup.2 supply to
meet the "obligations" contained in the solution of the optimistic
MIP. A-channel .alpha..sub.2 can be tested similarly.
[0316] Constraint Generation.
[0317] Next, a constraint generation approach is described wherein
bids are kept in an abstracted form and multiple optimizations are
solved, each time adding constraints to correct infeasibilities
when dispatching in the non-abstracted space. The high-level
algorithm is as follows:
[0318] 1. Solve the abstracted optimistic MIP.
[0319] 2. Solve an LP to determine a dispatch policy in the
non-abstracted space.
[0320] 3. If the LP in Step #2 is feasible, the process is done.
Otherwise, generate a set of mutually unsatisfiable constraints.
From these, generate new constraints for the MIP in Step #1 and
then re-solve.
[0321] Dispatch LP.
[0322] Let $\dot{x} = \{\dot{x}^i_{\alpha_j}\}$ be the solution of
the optimistic MIP (Step 1) with a-channels $\{\alpha_j\}$. Let
$W(j) = \{i : \dot{x}^i_{\alpha_j} > 0\}$ denote the "winners" of
a-channel $\alpha_j$. Solve the following LP for each $\alpha_j$
(with a constant objective, since the aim is only to determine
feasibility):
$$\begin{aligned}
\max\;& 1 \\
\text{s.t.}\;& \sum_{c \in \alpha_j,\; c \models \varphi^i} x^i_c = \dot{x}^i_{\alpha_j} \quad \forall i \in W(j) & (1) \\
& \sum_{i \in W(j)} x^i_c \le s(c) \quad \forall c \in \alpha_j & (2)
\end{aligned}$$
[0323] This LP determines a feasible allocation of bids i that
share .alpha..sub.j to the c-channels that constitute
.alpha..sub.j, thus guaranteeing that every impression given to i
satisfies its bid condition .phi..sup.i. The first set of
constraints ensures there is enough .phi..sup.i supply for each bid
i--call these bid adequacy constraints--while the second
establishes that no constituent c-channel is overallocated--call
these channel supply constraints. There is no objective function,
because the goal is to only determine feasibility.
[0324] If the LP gives a feasible solution, the dispatch policy is
defined. If not, additional constraints are generated for the MIP.
Before describing this procedure, the nature of these constraints
will be described.
[0325] Computationally, the most demanding aspect of this algorithm
is the solution of the LPs used to generate constraints. While the
solution of LP(.alpha..sub.j) could, in principle, require an
exponential number of variables (i.e., the x.sub.c.sup.i
corresponding to all c-channels c.di-elect cons..alpha..sub.j) and
constraints, the use of simple lossless channel abstraction
collapses this number. As such, the number of winners for each
channel (and the interaction of their bids) determines the true
complexity of the required LP solves. The interaction is in fact
even less when one accounts for time windows: a separate
feasibility testing/generation process is invoked for each
a-channel, time-period pair.
[0326] The constraint generation algorithm can be used directly to
solve the ad allocation MIP without relying on column generation.
For example, it can be applied directly to the fully abstract MIP
with a single a-channel (T). It could also be used to optimize over
any heuristically chosen abstraction, using a process other than
column generation.
[0327] Projection Into the Abstracted Space.
[0328] By adding additional supply constraints to the abstract MIP,
it can be transformed into a projection of the non-abstracted
problem, ensuring that any solution to the abstract MIP is
feasible in the non-abstracted space. Herein, two bids $(i,i')$
overlap on an abstract channel $\alpha_j$ if there is a
c-channel $c \in \alpha_j$ where $c \models \varphi^i$ and
$c \models \varphi^{i'}$. Also, a set of bids B is connected on
$\alpha_j$ if, for any pair of bids $(i,i')$ in B, there is a
sequence of bids $(i=b^1, b^2, \ldots, b^n=i')$ in B such
that every $b^k$ and $b^{k+1}$ overlap on $\alpha_j$. For every
abstract channel $\alpha_j$ and every set of bids B that is
connected on $\alpha_j$, add the following constraint:
$$\sum_{i \in B} x^i \le \sum_{c \in \alpha_j,\; c \models \varphi^i,\; i \in B} s(c) \qquad (3)$$
Adding all such constraints makes the MIP a projection of the
non-abstracted space.
[0329] In the worst case, this will require $2^m - 1$ constraints,
where m is the number of bids. Alternatively, the total number of
constraints can be bounded by $2^{2^n}$, where n is the number
of binary attributes. This can be seen by observing that some of
the constraints in Eq. (3) are dominated. In particular,
$\sum_{i \in B} x^i + \sum_{k \in B'} x^k \le s$ dominates (and
hence is tighter than) $\sum_{i \in B} x^i \le s$, for some s.
Thus, at most one constraint needs to be generated for each subset
of the set of attribute-value assignments, giving at most
$2^{2^n}$ constraints.
[0330] To see how the projection works, consider the example from
the above section entitled "Full Attribute Expansion". If a full
abstraction is performed (all attributes are eliminated), then the
bids are $x^1$, $x^2$, and $x^3$. All of the bids overlap. The
complete set of constraints (including dominated ones) is:
$$x^1 \le s(m,\bar{y}) + s(m,y) \qquad (4)$$
$$x^2 \le s(\bar{m},y) + s(m,y) \qquad (5)$$
$$x^3 \le s(\bar{m},\bar{y}) + s(\bar{m},y) + s(m,\bar{y}) + s(m,y) \qquad (6)$$
$$x^1 + x^2 \le s(m,\bar{y}) + s(\bar{m},y) + s(m,y) \qquad (7)$$
$$x^1 + x^3 \le s(\bar{m},\bar{y}) + s(\bar{m},y) + s(m,\bar{y}) + s(m,y) \qquad (8)$$
$$x^2 + x^3 \le s(\bar{m},\bar{y}) + s(\bar{m},y) + s(m,\bar{y}) + s(m,y) \qquad (9)$$
$$x^1 + x^2 + x^3 \le s(\bar{m},\bar{y}) + s(\bar{m},y) + s(m,\bar{y}) + s(m,y) \qquad (10)$$
Only constraints (4), (5), (7), and (10) are undominated;
constraints (6), (8), and (9) share their right-hand side with (10)
and are therefore dominated by it. The four
undominated constraints are fewer than the $2^{2^n}=16$ upper
bound because constraints are not needed for all subsets of the s's
(for instance, a constraint is not needed for s(m,y) alone because
there are no bids for (m,y) only). Nevertheless, in general the
full set of projection constraints will be far too many to
enumerate. Instead, the LP is used to generate only the violated
constraints, because it is believed that far fewer than the entire
set of constraints are needed.
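For a problem this small, the projection constraints (3) and the domination rule can be enumerated directly. The sketch below is illustrative only: the channel labels, the `CHANNELS_OF` encoding of each bid as the set of c-channels satisfying its formula, and the function names are assumptions, and the exhaustive loop over subsets is exactly the enumeration that is impractical in general.

```python
from itertools import combinations

# Each bid is represented by the set of c-channels that satisfy its formula.
CHANNELS_OF = {
    "x1": {("m", "not_y"), ("m", "y")},                                   # males
    "x2": {("not_m", "y"), ("m", "y")},                                   # young
    "x3": {("not_m", "not_y"), ("not_m", "y"), ("m", "not_y"), ("m", "y")},
}

def connected(bid_set, channels_of):
    """True if the bids form a single component under pairwise overlap."""
    bid_list = list(bid_set)
    seen, frontier = {bid_list[0]}, [bid_list[0]]
    while frontier:
        b = frontier.pop()
        for other in bid_list:
            if other not in seen and channels_of[b] & channels_of[other]:
                seen.add(other)
                frontier.append(other)
    return len(seen) == len(bid_list)

def projection_constraints(channels_of):
    """Return (bids, channels) for every connected, nonempty set of bids."""
    names = sorted(channels_of)
    out = []
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            if connected(combo, channels_of):
                union = frozenset().union(*(channels_of[b] for b in combo))
                out.append((frozenset(combo), union))
    return out

def undominated(constraints):
    """Drop a constraint when a strict superset of its bids has the same right side."""
    return [(b, c) for b, c in constraints
            if not any(b < b2 and c == c2 for b2, c2 in constraints)]

all_constraints = projection_constraints(CHANNELS_OF)
kept = {bids for bids, _ in undominated(all_constraints)}
```

Here `all_constraints` has one entry per nonempty set of bids, since every pair of bids overlaps on the young-male channel, and `kept` retains the bid sets whose constraints survive the domination filter.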
[0331] Generating New Constraints.
[0332] If LP(.alpha..sub.j) is feasible for each .alpha..sub.j,
then it provides an optimal dispatch policy that extracts the full
objective value of the optimistic MIP. If not, then constraints are
posted on the solution of the optimistic MIP and the MIP is
re-solved. In
particular, let LP(.alpha..sub.j) be infeasible. Then there must be
some minimal set of constraints that are jointly infeasible. Let
S=S.sub.a.orgate.S.sub.s be such a minimal set, where S.sub.a are
bid adequacy constraints (1) and S.sub.s are channel supply
constraints (2). The solution of the optimistic MIP violates the
inequality:
$$\sum_{i \in S_a} x^i_{\alpha_j} \le \sum_{c \in S_s} s(c) \qquad (11)$$
[0333] The MIP can be re-solved after posting this constraint, which
ensures that overallocation of the channels in S.sub.s does not
occur for the purposes of maximizing value extracted from bids in
S.sub.a. A tighter version of this constraint can be employed by
adding to the sum on the left-hand side any bid i all of whose
relevant channels are included in S.sub.s, i.e., any i s.t.
{c.di-elect cons..alpha..sub.j:c|=.phi..sup.i}.OR right.S.sub.s.
Note that while all minimal sets S will induce a violated
constraint, other sets S may also induce violated constraints.
[0334] At each iteration, sets S leading to violated constraints
are identified for each a-channel and posted. These can be
identified using the facilities of standard solvers, such as the
CPLEX IIS (irreducible inconsistent set) routine run on a computer,
like the computer shown in FIG. 1. The MIP is re-solved until
feasibility is attained (in which case full optimistic objective
value is obtained), or computational or time bounds are
reached.
[0335] Equation 11 is a violated constraint if the set S is
determined by the CPLEX IIS routine. First, it is shown that
$\sum_{i \in S_a} x^i_{\alpha_j} \le \sum_{c \in S_s} s(c)$
is a projection constraint (3) for the MIP. This has two
requirements: 1) the bids in $S_a$ are connected on $\alpha_j$,
and 2) $\sum_{c \in \alpha_j,\; c \models \varphi^i,\; i \in S_a} s(c) = \sum_{c \in S_s} s(c)$.
Consider requirement 1. Assume there are two subsets $S_a^1$ and
$S_a^2$ of $S_a$ that are not connected. It must be that none of the
constraints in $S_s$ includes a bid from both $S_a^1$ and
$S_a^2$. But then the constraints on $S_a^1$ are fully independent
of the constraints on $S_a^2$, so one of the two groups is
infeasible on its own, which violates the irreducibility condition
of the CPLEX IIS routine.
Now consider requirement 2. Note that the LP constraints of type
(1) alone cannot cause infeasibility. In particular, if a
constraint for $x^i$ is in $S_a$, then all constraints of type
(2) corresponding to $x^i$ must appear in $S_s$. It immediately
follows that
$\sum_{c \in \alpha_j,\; c \models \varphi^i,\; i \in S_a} s(c) = \sum_{c \in S_s} s(c)$.
[0336] Next it is shown that
$\sum_{i \in S_a} x^i_{\alpha_j} \le \sum_{c \in \alpha_j,\; c \models \varphi^i,\; i \in S_a} s(c) = \sum_{c \in S_s} s(c)$
is violated in the MIP. The constraints in $S_s$ may contain some
variables that are not in $S_a$. Denote as $S'_s$ the constraints
$S_s$ but with the variables not in $S_a$ removed. It must be that
$S_a \cup S'_s$ constitutes an irreducible inconsistent set
(IIS), because if the removed variables were required to have an
IIS, the original IIS would include constraints in $S_a$
containing those variables. Summing the constraints in $S'_s$
produces
$\sum_{i \in S_a} \sum_{c \models \varphi^i} x^i_c \le \sum_{c \in S_s} s(c)$.
But since
$\sum_{c \models \varphi^i} x^i_c = \dot{x}^i_{\alpha_j}$ is in
$S_a$, and since $S_a$ and $S'_s$ are mutually unsatisfiable, it
must be that
$\sum_{i \in S_a} \dot{x}^i_{\alpha_j} > \sum_{c \in S_s} s(c)$.
Hence,
$\sum_{i \in S_a} x^i_{\alpha_j} \le \sum_{c \in S_s} s(c)$
is violated for allocation $\dot{x}$, which completes the
proof.
[0337] Alternatively a special purpose algorithm can be used to
identify violated constraints.
[0338] In general, Equation 11 may be tightened. Let
$C_i = \{c \in \alpha_j : c \models \varphi^i\}$. If there is
some bid $i \notin S_a$ such that $C_i \subseteq S_s$, then
$x^i_{\alpha_j}$ can be added to the left-hand side of the
constraint to make it tighter. Thus, instead of Equation 11, add
the constraint
$$\sum_{i :\, C_i \subseteq S_s} x^i_{\alpha_j} \le \sum_{c \in S_s} s(c) \qquad (12)$$
to the MIP, then go back to Step #1 of the constraint generation
procedure.
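The tightening step amounts to a subset test on each bid's relevant channels. In the sketch below, the function name `tighten`, the channel labels, and the fourth bid `x4` (for young males only) are hypothetical additions used to show a case where the left-hand side actually grows.

```python
def tighten(s_a, s_s, channels_of):
    """Return the bids on the left-hand side of the tightened constraint.

    s_a: bids whose adequacy constraints are in the infeasible set.
    s_s: c-channels whose supply constraints are in the infeasible set.
    channels_of: maps every bid i to its relevant c-channels C_i.
    """
    extra = {i for i, c_i in channels_of.items()
             if i not in s_a and c_i <= s_s}
    return set(s_a) | extra

channels_of = {
    "x1": {("m", "not_y"), ("m", "y")},
    "x2": {("not_m", "y"), ("m", "y")},
    "x4": {("m", "y")},                      # hypothetical bid for young males
}
lhs = tighten({"x1"}, {("m", "not_y"), ("m", "y")}, channels_of)
```

With these inputs, `x4` joins `x1` on the left-hand side because every channel it can use is already counted on the right, while `x2` is left out because it can also draw on the young-female channel.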
[0339] Example of the Constraint Generation Procedure.
[0340] Consider the three bids from above: $x^1(m,\perp)$,
$x^2(\perp,y)$, and $x^3(\perp,\perp)$. Assume that the
non-abstract channel supply is
$s(\bar{m},\bar{y}) = s(\bar{m},y) = s(m,\bar{y}) = s(m,y) = 25$.
In Step #1, a MIP is specified with a single abstract channel, so
the only channel supply constraint is
$$x^1 + x^2 + x^3 \le 100 \qquad (13)$$
If the MIP allocation is
$(\dot{x}^1, \dot{x}^2, \dot{x}^3) = (50, 50, 0)$, then in Step #2,
an LP is specified with the following constraints:
$$x^1(m,\bar{y}) + x^1(m,y) = 50 \qquad (14)$$
$$x^2(\bar{m},y) + x^2(m,y) = 50 \qquad (15)$$
$$x^3(\bar{m},\bar{y}) + x^3(\bar{m},y) + x^3(m,\bar{y}) + x^3(m,y) = 0 \qquad (16)$$
$$x^3(\bar{m},\bar{y}) \le 25 \qquad (17)$$
$$x^2(\bar{m},y) + x^3(\bar{m},y) \le 25 \qquad (18)$$
$$x^1(m,\bar{y}) + x^3(m,\bar{y}) \le 25 \qquad (19)$$
$$x^1(m,y) + x^2(m,y) + x^3(m,y) \le 25 \qquad (20)$$
The LP is infeasible, and the IIS is {(14), (15), (18), (19),
(20)}. Therefore, the constraint
$$x^1 + x^2 \le 75$$
is added to the MIP and the MIP is resolved. If the next Step #1
MIP allocation is (75, 0, 25), then the Step #2 LP is:
x.sup.1(m, y)+x.sup.1(m,y)=75 (21)
x.sup.2( m,y)+x.sup.2(m,y)=0 (22)
x.sup.3( m, y)+x.sup.3( m,y)+x.sup.3(m, y)+x.sup.3(m,y)=25 (23)
x.sup.3( m, y).ltoreq.25 (24)
x.sup.2( m,y)+x.sup.3( m,y).ltoreq.25 (25)
x.sup.1(m, y)+x.sup.3(m, y).ltoreq.25 (26)
x.sup.1(m,y)+x.sup.2(m,y)+x.sup.3(m,y).ltoreq.25 (27)
This LP is infeasible, and the IIS is {(21), (26), (27)}.
Therefore, add the constraint
x.sup.1.ltoreq.50 (28)
to the MIP and resolve. If the next Step #1 MIP allocation is (50,
25, 25) then the Step #2 LP is:
x.sup.1( m,y)+x.sup.1(m,y)=75
x.sup.2( m,y)+x.sup.2(m,y)=0
x.sup.3( m, y)+x.sup.3( m,y)+x.sup.3(m, y)+x.sup.3(m,y)=25
x.sup.3( m, y).ltoreq.25
x.sup.2( m,y)+x.sup.3( m,y).ltoreq.25
x.sup.1(m, y)+x.sup.3(m, y).ltoreq.25
x.sup.1(m,y)+x.sup.2(m,y)+x.sup.3(m,y).ltoreq.25
A feasible solution to this LP is x.sup.1(m,
y)=x.sup.1(m,y)=x.sup.3( m, y)=25, x.sup.2(m, y)=x.sup.3(
m,y)=x.sup.3(m, y)=x.sup.3(m,y)=0.
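The three rounds of this example can be re-checked with a small sketch. It does not extract an IIS from an LP solver as the procedure above does. Instead it assumes that the projection constraints take the subset form used in the example (the total allocation to a set of bids may not exceed the combined supply of the concrete channels relevant to at least one of them) and simply enumerates bid subsets. The names `SUPPLY`, `BIDS`, `satisfies` and `find_violated_subset` are hypothetical, and the enumeration is exponential in the number of bids, so it is only suitable for toy cases.

```python
from itertools import combinations

# Concrete channels are (m, y) truth-value pairs; each supplies 25 impressions.
SUPPLY = {(m, y): 25 for m in (False, True) for y in (False, True)}

# Bid formulas from the example; None plays the role of "don't care".
BIDS = {1: (True, None), 2: (None, True), 3: (None, None)}


def satisfies(channel, formula):
    """True when the channel agrees with every attribute the bid constrains."""
    return all(want is None or have == want
               for have, want in zip(channel, formula))


def find_violated_subset(alloc):
    """Return (bid subset, supply bound) for a smallest violated subset, else None."""
    for size in range(1, len(BIDS) + 1):
        for subset in combinations(sorted(BIDS), size):
            # Supply of all channels relevant to at least one bid in the subset.
            bound = sum(s for c, s in SUPPLY.items()
                        if any(satisfies(c, BIDS[i]) for i in subset))
            if sum(alloc[i] for i in subset) > bound:
                return subset, bound
    return None


first = find_violated_subset({1: 50, 2: 50, 3: 0})
second = find_violated_subset({1: 75, 2: 0, 3: 25})
third = find_violated_subset({1: 50, 2: 25, 3: 25})
```

Under these assumptions the first allocation violates the bound of 75 on bids 1 and 2, the second violates the bound of 50 on bid 1, and the third violates nothing, which mirrors the constraints generated in the example.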
[0341] In this example, the initial Step #1 MIP started with only
one supply constraint (13) based only on supply in the abstract
space. A larger subset of the constraints (3) could have been
chosen with the desire of reducing the number of MIP and LP solves.
One set that would be natural to include in the initial MIP contains
those constraints involving a single bid variable:
x^{i} \le \sum_{c \models \varphi^{i}} s(c).
In this example, including these constraints would have obviated
the generation of the second constraint (28).
[0342] Early Termination.
[0343] The constraint generation procedure is guaranteed to
terminate because every infeasible Step #2 LP implies a violated
projection constraint (3), and there are a finite number of such
constraints. However, undesirably a very large number of these
constraints could be generated. If the process takes too long, the
process may need to be terminated before a feasible LP is found. In
this case, a relaxed LP can be used to determine a dispatch policy
that is feasible in the non-abstracted space. This relaxed LP keeps
the supply constraints (2) from the original LP but relaxes the MIP
consistency constraints (1) as follows:
\forall\, x^{i}_{\alpha_j} \in x: \quad \sum_{c \in \alpha_j,\; c \models \varphi^{i}} x^{i}_{c} - \varepsilon^{i} = \dot{x}^{i}_{\alpha_j} \qquad (29)
\varepsilon^{i} \ge 0 \qquad (30)
In general, the solution to the relaxed LP will be suboptimal with
respect to the Step #1 MIP with the generated constraints. One
attempt to bound the value loss is to add the objective
function
Minimize.SIGMA..epsilon..sup.i (31)
[0344] Alternatively, a value p.sup.i could be determined on the
bids x.sup.i (accounting for per-impression value, the effective
per-impression value arising from bonuses, etc.) and, rather than
using constraints (29) and (30), replace the MIP consistency
constraints (1) with
\sum_{c \in \alpha_j,\; c \models \varphi^{i}} x^{i}_{c} \le \dot{x}^{i}_{\alpha_j}
and use the objective function
\text{Maximize} \; \sum_{i} p^{i} x^{i}.
If a time limit is imposed on the overall optimization, and if it
is not expensive to compute the original LP or one of the relaxed
LPs above, the following revised constraint generation procedure
may improve the quality of the solution:
[0345] 1. Initialize the best allocation x* to 0.
[0346] 2. Solve the abstracted MIP.
[0347] 3. Attempt to solve the LP defined by constraints (1) and (2). If this is feasible, set x* to the feasible allocation and terminate. If not, generate a violated constraint and add it to the MIP.
[0348] 4. Solve one of the relaxed LPs described above. Compute the value of the allocation using all bid preferences. If the value is greater than the value of x*, set x* to the new allocation.
[0349] 5. If the time limit has been exceeded, terminate. Otherwise repeat steps 2-5.
[0350] When this revised constraint generation procedure
terminates, x* will have the best feasible allocation that has been
generated.
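The control flow of the revised procedure can be sketched as a loop that keeps the best feasible allocation seen so far. This is only a skeleton under stated assumptions: the solver calls are passed in as callables, and every name here (`solve_mip`, `solve_exact_lp`, `solve_relaxed_lp`, `value_of`, `add_cut`, `zero_alloc`) is hypothetical rather than part of any real solver API.

```python
import time


def revised_constraint_generation(solve_mip, solve_exact_lp, solve_relaxed_lp,
                                  value_of, add_cut, zero_alloc, time_limit_s):
    """Anytime loop: return the best feasible allocation found before the deadline."""
    best = zero_alloc                              # step 1: x* starts at 0
    best_value = value_of(best)
    deadline = time.monotonic() + time_limit_s
    while True:
        mip_alloc = solve_mip()                    # step 2: abstracted MIP
        feasible, result = solve_exact_lp(mip_alloc)   # step 3: LP with (1) and (2)
        if feasible:
            return result                          # feasible allocation, terminate
        add_cut(result)                            # result is a violated constraint
        candidate = solve_relaxed_lp(mip_alloc)    # step 4: always-feasible fallback
        candidate_value = value_of(candidate)
        if candidate_value > best_value:
            best, best_value = candidate, candidate_value
        if time.monotonic() >= deadline:           # step 5: time limit check
            return best
```

Passing the solvers in as callables keeps the sketch independent of any particular MIP or LP package; a caller would wrap whichever solver it uses.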
[0351] Empirical Results.
[0352] To evaluate the effectiveness of constraint generation, the
results of experiments on problems with bonus and per-impression
bidders described above are presented. First, perform column
generation, using MI=0.01 as the stopping condition, then extend
with constraint generation.
[0353] To avoid generating an unreasonable number of constraints,
allow a tolerance .epsilon., whereby the feasibility LP allows the
allocations from the MIP to decrease by up to .epsilon.. That is,
replace the first set of constraints in the LP by:
\sum_{c \in \alpha_j,\; c \models \varphi^{i}} x^{i}_{c} \le \dot{x}^{i}_{\alpha_j} \quad \forall i \in W(j)
\sum_{c \in \alpha_j,\; c \models \varphi^{i}} x^{i}_{c} \ge \dot{x}^{i}_{\alpha_j} - \varepsilon \quad \forall i \in W(j)
Thus, when constraint generation terminates, the allocation is
guaranteed to be feasible, but may be suboptimal. In the
experiments, .epsilon.=0.01 was chosen; however, this is not to be
construed as limiting the invention.
[0354] For larger problems, constraint generation did not always
terminate within a reasonable amount of time. In the experiments,
if constraint generation did not terminate within 600 seconds, the
generation of constraints was stopped and a feasible allocation
that minimized the maximum difference from the MIP allocation was
generated. This was accomplished with the following LP:
\min \; \varepsilon
subject to
\sum_{c \in \alpha_j,\; c \models \varphi^{i}} x^{i}_{c} \le \dot{x}^{i}_{\alpha_j} \quad \forall i \in W(j)
\sum_{c \in \alpha_j,\; c \models \varphi^{i}} x^{i}_{c} \ge \dot{x}^{i}_{\alpha_j} - \varepsilon \quad \forall i \in W(j)
\sum_{i \in W(j)} x^{i}_{c} \le s(c) \quad \forall c \in \alpha_j
[0355] As discussed above, the feasibility LP could require an
exponential number of variables. In practice, if W(j) is no greater
than around 20, the size of the LP is quite reasonable (and much
smaller than 2.sup.20). If at any point the MIP gives W(j)>20,
the channel .alpha..sub.j is split. However, rather than using the
scoring function discussed above, the score
\mathrm{score}(\alpha, \beta, \bar{\beta}) = \max\left( |\{ i : \beta \models \varphi^{i} \}|,\; |\{ i : \bar{\beta} \models \varphi^{i} \}| \right)
is minimized in an attempt to reduce the maximum, over the two new
channels, of the number of bids that care about the channel.
[0356] When constraint generation is complete, the value of the
allocation is computed based on the final feasible allocation
generated by the LP (which might be different than that of the
final MIP allocation, due to .epsilon.), but the final (infeasible)
MIP allocation is used as an upper bound on the true optimum value.
This bound is close to, but somewhat tighter than the bound
generated above in the section entitled "Constraint
Generation".
[0357] Table 4 shows the results of experiments on the set of
problems with bonus and per-unit bidders described above in the
section entitled "Projection Into the Abstracted Space". Here, the
results are shown only for the constraint generation portion. The
table shows several key measures, including the number of
constraint generation iterations, the number of additional channels
generated and the number of constraints generated. The fraction of
the upper bound on the optimal value obtained by the MIP when
constraint generation terminates ("Frac UB") is also shown. An
estimate of the improvement in the degree of optimality over the
final column generation value is shown ("Improve"). Finally, the
average and range of runtimes is presented.
TABLE-US-00004
TABLE 4. Average results for column generation followed by constraint
generation with 100 attributes, n.sub.b bonus bidders, and n.sub.i
per-impression bidders.
n.sub.b  n.sub.i  # iterations  # channels  # constraints  Frac UB  Improve  Runtime .mu. (sec)  Runtime range (sec)
10       40       12.7          0.1         221            0.954    0.104    154                 [14, 615]
20       80       12.1          2.3         557            0.939    0.118    636                 [118, 1178]
30       120      9.1           4.4         750            0.965    0.190    850                 [317, 1750]
40       160      5.9           7.5         787            0.954    0.157    1434                [648, 6609]
50       200      5.6           8.0         721            0.967    0.139    1419                [679, 6235]
60       240      8.4           7.6         803            0.964    0.143    1029                [635, 2269]
[0358] It can be seen that the additional phase increased the value
to a high degree of optimality, although obtaining the improvement
can be time consuming for larger problems.
[0359] Other Uses of Constraint Generation.
[0360] One of the bottlenecks in the effective use of constraint
generation is its tendency to scale poorly in the number of
"winners". Specifically, if an a-channel, time-period pair has a
large number of bids that are allocated to it in the initial
abstract MIP solve, the procedure can generate hundreds of
thousands of constraints, causing the MIP to slow down
significantly and to dominate runtime. As mentioned above, the
number of winners in the MIP can be used to suggest further channel
refinements. The development of effective channel splitting
heuristics that attempt to "separate" bids into different channels
could make constraint generation much more effective. Quick
identification of problematic a-channels during constraint
generation is critical as well: whenever a channel is split, all
the constraints on the split channel must be discarded, and new
constraints must be generated on the new channels, further
extending runtime with wasted computational effort. Thus
problematic a-channels should desirably be identified and discarded
before significant constraint generation takes place.
[0361] Constraint generation can also be used selectively. The MIP
can be solved by using the "optimistic" values on some channel-time
pairs--requiring constraint generation to effectively carve up
supply within those segments--while the random dispatch policy can be
assumed in others (e.g., those where constraint generation cannot
scale effectively). This offers a tractable means for improving on
the abstract allocation problem without necessarily accounting for
intelligent dispatch across the entire space.
[0362] Very Large Attribute Spaces.
[0363] The constraint generation approach relies on the assumption
that it is tractable to solve an LP on the non-abstracted space. If
this is not feasible, it will be necessary to fully abstract some
attributes, i.e., not resolve them in the LP. This will lead to the
dispatch issues described above. However, heuristics can be used to
decide which abstraction might limit the degree of dispatch
error.
[0364] Data Representation and Other Issues.
[0365] The implementation and practical deployment of the foregoing
techniques brings to light a number of subsidiary issues that need
to be addressed. Some of the more significant challenges to
practical deployment, and how they are addressed, will now be
described.
[0366] Data Representation.
[0367] The foregoing approach to channel abstraction requires
manipulation of logical formulae describing both abstract channels
and bids. Furthermore, the natural and compact description of both
bids/campaigns and channel supply requires the use of logical
formulae. In problems with dozens or hundreds of channel
attributes, supply distributions cannot be expected to be
explicitly articulated for each concrete channel. Nor should
bidders be expected to specify their interests explicitly over such
concrete channels.
[0368] The foregoing data distributions make specific independence
assumptions that allow them to be represented tractably. While more
general models can be used (e.g., graphical models of distributions
such as Bayesian networks), the following simple clustering model
is adopted. The channel attribute set F is partitioned into a set
H={H.sub.i} of subsets or factors, and it is assumed an explicit
joint distribution (or potential) .psi..sub.i is provided for each
factor H.sub.i (e.g., if H.sub.i={A, B, C}, then .psi..sub.i is a
joint distribution over Dom(A, B, C)). These potentials are
independent, so the probability of any channel is
Pr(c)=.PI..sub.i.psi..sub.i(c[i]), where c[i] is the restriction
of c-channel c to its attribute values in H.sub.i. The supply s(c,t) of
any channel at time t is then s(c,t)=s(t) Pr(c). If impression
distributions are nonstationary, the potentials can be indexed by
time, or by time "attributes" such as day-part or week-part. Note
that the assumption of complete attribute independence is a special
case in which each factor contains a single attribute.
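A minimal sketch of this factored representation follows, assuming that independent potentials combine multiplicatively. The attribute names, factor layout and probabilities in `FACTORS` are invented for illustration only, as are the function names.

```python
from math import prod

# Hypothetical factors: each potential is a joint distribution over the
# attributes in its factor, keyed by a tuple of attribute values.
FACTORS = [
    (("gender",), {("male",): 0.4, ("female",): 0.6}),
    (("site", "daypart"), {("A", "am"): 0.1, ("A", "pm"): 0.2,
                           ("B", "am"): 0.3, ("B", "pm"): 0.4}),
]


def channel_probability(channel):
    """Pr(c): product of each potential evaluated on c restricted to its factor."""
    return prod(psi[tuple(channel[a] for a in attrs)] for attrs, psi in FACTORS)


def channel_supply(channel, total_supply_t):
    """s(c, t) = s(t) * Pr(c) for total supply s(t) at time t."""
    return total_supply_t * channel_probability(channel)


example = {"gender": "male", "site": "B", "daypart": "pm"}
example_supply = channel_supply(example, 100_000)
```

Storing one small table per factor avoids enumerating the full joint distribution over all attributes, which is the point of the clustering model.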
[0369] The implementation of channel abstraction uses ordered
Boolean decision diagrams (OBDDs) to represent logical formulae:
this includes the logical representation of bid formulae
.phi..sup.i and of abstract channels .alpha..sub.i. Given the
specification of probabilities in terms of factors and potentials,
efficient algorithms have been devised for: (a) computing the
probabilities of a formula represented as an OBDD (e.g., to compute
Pr(.alpha.) for some a-channel .alpha. in order to determine its supply);
and (b) computing the conditional probability of one OBDD given
another (e.g., to compute the probability Pr(.phi..sup.i|.alpha.)
that a-channel .alpha. satisfies bid formula .phi..sup.i).
[0370] Channel Attributes and Stochastic Supply.
[0371] The nature of useful channel attributes varies significantly
from one web site to another. This is captured by aggregating
c-channels into groups known as base channels, typically
corresponding to particular sites (or subsections of sites). Each
base channel (e.g., a specific web site) is characterized by its
total amount of supply and by the set of attributes that are
observable (i.e., attributes that are known with certainty to hold
of a particular impression, such as day-part, gender of
subscription users, etc.), stochastically verifiable (i.e.,
attributes for which a probabilistic estimate of satisfaction can
be given), or inapplicable (attributes for which no information is
available). The distribution of channels within a base channel is
specified using the method above.
[0372] It is important to note that c-channels cannot be defined
using inapplicable attributes or stochastically verifiable
attributes: for any base channel, its c-channels are the
instantiations of its observable attributes only. For instance, on
a site A that has statistical data on gender, but no means of
actually observing gender, no c-channel exists with attributes
site=A, gender=male (since one cannot assign an ad to such an
impression with certainty). The distribution of gender is used only
to predict the number of male-impressions (and hence payment) when
an ad is assigned to the c-channel site=A. Similarly, if an attribute
is inapplicable, every attribute value is assumed to go
unsatisfied.
[0373] The abstraction model is presented as if supply is
deterministic. If supply is stochastic, the abstraction techniques
can be generalized using the methods described in Boutilier et al.,
"Expressive Banner Ad Auctions and Model-Based Online Optimization
for Clearing", AAAI, pp. 30-37, Chicago, 2008, where the results
of deterministic optimization are used in a sampling and
reoptimization framework to manage uncertainty and risk. The data
representation can easily be generalized as well: (a) replace the
point estimate of the supply s(b) of a base channel b by a
distribution (e.g., normal, or other parametric form that makes
sense and can easily be sampled from); (b) instead of a simple
multinomial for each observable attribute, specify a Dirichlet,
with hyperparameters for each domain value. This allows simple
computation of expected values for deterministic optimization, and
simple sampling for stochastic optimization.
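The two uses of such a representation (expected values for deterministic optimization, samples for stochastic optimization) might look like the following sketch. It assumes a normal supply distribution floored at zero and draws the Dirichlet by normalizing gamma variates; the function names are hypothetical, and only standard-library calls are used.

```python
import random


def sample_supply(mean, stddev, rng):
    """Draw a base-channel supply from a normal distribution, floored at zero."""
    return max(0.0, rng.gauss(mean, stddev))


def sample_dirichlet(alphas, rng):
    """Draw attribute-value probabilities from a Dirichlet via normalized gammas."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_mean(alphas):
    """Expected attribute-value probabilities, for deterministic optimization."""
    total = sum(alphas)
    return [a / total for a in alphas]


rng = random.Random(0)                             # seeded only for repeatability
sampled_probs = sample_dirichlet([2.0, 3.0, 5.0], rng)
sampled_supply = sample_supply(100_000, 5_000, rng)
expected_probs = dirichlet_mean([2.0, 3.0, 5.0])
```

The floor at zero is a simplification of this sketch, since a normal distribution can produce negative draws that have no meaning as supply.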
[0374] Intelligent Dispatch.
[0375] MIP optimization for ad auctions using the standard
architecture described herein will run into difficulties as the
number of attributes grows. Very roughly, each instantiation of
attribute values can potentially correspond to another channel of
supply; thus the number of channels can grow exponentially with the
number of attributes of interest. In practice a great deal of
aggregation will typically be possible, especially with suitably
expressive bidding languages, but exponential growth can still be
envisioned. Since each bid-channel pair (for which the channel is
relevant to the bid) requires a (real-valued) variable in the MIP
corresponding to the fraction of the supply of that channel allocated
to the bid, this will be impractical in general.
[0376] Channel abstraction provides a means to reduce the
complexity of the decision space by aggregating certain channels of
supply and assigning fractions of these abstract channels to bids
rather than the actual channels. Herein, the term achannels will
refer to the abstract channels, and the term echannels will refer
to the underlying "exact" channels that would allow exact
optimization.
[0377] There are several problems that must be addressed to make
this work in practice without a severe performance hit in terms of
revenue, while maintaining computational feasibility:
[0378] 1. a method for determining suitable abstract channels without compromising decision quality too dramatically (and a means for assessing the tradeoffs between decision quality and computational efficiency);
[0379] 2. an intelligent dispatch method that accounts for the fact that a bid may be allocated some portion of an achannel, but that certain of the echannels that make up the achannel do not contribute to the satisfaction of the bid;
[0380] 3. a MIP optimization routine that accounts for the intelligent dispatch rule in determining fractional allocations to achannels.
Issue 1 has been addressed by the methods for generating
abstractions described herein.
[0381] A Simple Example.
[0382] A simple example to illustrate the points to follow: imagine
three bids b for news banner ads: b.sub.1 wants banner ads on any
major news site (MN); b.sub.2 wants banner ads on the CNN site
(CNN); b.sub.3 wants the NY Times web site (NYT). Both CNN and NYT
are subclasses of MN. Assume that the distribution of hits over
major news sites is as follows: 15% CNN, 25% NYT, 60% OMN (other
major news). These categorizations correspond to the relevant
echannels for this problem. Suppose it is decided to abstract these
echannels into one achannel called MN (Major News).
[0383] Some Terminology.
[0384] The echannels will be created dynamically according to
actual bids in place and the ability to measure the properties in
question, either in a fully or stochastically verifiable fashion.
This last property is important: if some bidder uses property P in
a bid, but this property cannot be verified or the probability with
which it occurs cannot be estimated, it is useless with respect to
channel definition. In this example, CNN, NYT, and OMN are the
appropriate echannels.
[0385] An echannel e is relevant for bid b iff an ad on the channel
contributes to the satisfaction of that bid. In this example, CNN
is relevant to b.sub.2, NYT is relevant to b.sub.3 and
all three echannels (including OMN) are relevant to b.sub.1.
[0386] An achannel a is relevant for bid b iff a contains some
echannel relevant for b. If the three echannels in this example are
abstracted into one achannel MN, then MN would be relevant to all
three bids.
[0387] Simple Intelligent Dispatch for Abstract Channels.
[0388] When optimizing with abstract channels, instead of
fractional allocations of echannels to bids, some percentage of
achannels are assigned to each bid. The qualification that channels
are assigned only to bids for which they are relevant is to be
understood: this generally reduces the number of variables to be
less than one per bid-channel pair, whether achannels or echannels.
Let p.sub.j.sup.a be the percentage or fraction of achannel a
assigned to bid b.sub.j by the optimizer.
[0389] The problem is that an allocation to an achannel must be
realized by an actual allocation to some component echannel, and
certain echannels may not be relevant to the bid even though the
achannel is. A blind dispatch policy is one that fails to
distinguish echannels, and simply assigns to b.sub.j the same
fraction p.sub.j.sup.a of each echannel that makes up achannel a.
For instance, if b.sub.2 (for CNN) was assigned 30% of MN, then it
will receive 30% of CNN, 30% of NYT and 30% of OMN. Thus, write
p.sub.2=0.3 instead of p.sub.2.sup.MN=0.3, since there is only one
achannel in this example.
[0390] Optimization under this blind dispatch policy is
straightforward: namely, treat the echannel properties as if they
were stochastically verifiable and include the stochastic component
in the MIP when "counting" the impressions that satisfy the bid. In
this example, the MIP will reflect that only 15% of the MN
impressions assigned to b.sub.2 are actually relevant and thus only
"count" 0.15p.sub.2 of the total MN supply as contributing to the
satisfaction of b.sub.2.
[0391] The blind dispatch policy clearly wastes supply on bids that
cannot exploit it. The selective dispatch policy offers the obvious
fix: it assigns an echannel only to those bids for which it is
relevant, and does so in proportion to its achannel fractional
allocation. In this example, the following allocations would result
assuming that the overall (i.e., MN) achannel allocation was
p.sub.1, p.sub.2 and p.sub.3 (and a concrete example with values
0.5, 0.3 and 0.2 respectively):
TABLE-US-00005
Bid      Fraction CNN                          Fraction NYT                          Fraction OMN
b.sub.1  p.sub.1/(p.sub.1+p.sub.2) (0.625)     p.sub.1/(p.sub.1+p.sub.3) (0.714)     p.sub.1/p.sub.1 (1.0)
b.sub.2  p.sub.2/(p.sub.1+p.sub.2) (0.375)     0                                     0
b.sub.3  0                                     p.sub.3/(p.sub.1+p.sub.3) (0.286)     0
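The selective dispatch fractions can be computed with a short sketch: each echannel is divided among the bids for which it is relevant, in proportion to their achannel fractions. The dictionary `RELEVANT` encodes the relevance relation of this example, and the function name is hypothetical.

```python
# Which bids each echannel is relevant to, per the news-site example.
RELEVANT = {"CNN": ["b1", "b2"], "NYT": ["b1", "b3"], "OMN": ["b1"]}


def selective_fractions(p):
    """For each echannel, split it among its relevant bids in proportion to p."""
    shares = {}
    for echannel, bids in RELEVANT.items():
        total = sum(p[b] for b in bids)
        shares[echannel] = {b: (p[b] / total if total > 0 else 0.0)
                            for b in bids}
    return shares


shares = selective_fractions({"b1": 0.5, "b2": 0.3, "b3": 0.2})
```

With the concrete values 0.5, 0.3 and 0.2, bid b1 receives 0.5/0.8 of CNN, 0.5/0.7 of NYT and all of OMN, while b3 receives the remaining 0.2/0.7 (about 0.286) of NYT, so each echannel's shares sum to one.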
[0392] There are variants on the above policy as well, such as
those involving monitoring the actual supply and terminating the
allocation to bids online when certain conditions are met (e.g.,
once it reaches some target); neither these variants nor their
impact on offline optimization is considered here. However, such policies could be
implemented to use supply rationally, allowing use of the updated
bid states in the reoptimization phase.
[0393] Optimizing with Abstract Channels.
[0394] Next, consider the (offline) optimization problem for
abstracted channels when the selective dispatch rule is used. One
could obviously optimize using the stochastically verifiable
formulation above, yet still dispatch selectively. This however
means the MIP will (substantially, in practice) underestimate the
true revenue/value of its allocation and as a consequence make poor
allocation decisions. Ideally, the selectivity of the dispatch rule
would be encoded directly in the MIP. Mathematically, this is
straightforward. Suppose for example, that bid b.sub.2 (for CNN)
places value per impression on the total number of impressions up
to a max of 100K. The full echannel optimization would define the
count variable C.sub.2.sup.CNN for b.sub.2's CNN impressions
as:
C.sub.2.sup.CNN=p.sub.2.sup.CNN Size(CNN)
where Size(CNN) is the expected number of CNN impressions to be
allocated. In the abstracted formulation, where every bid gets the
same fixed proportion of each relevant echannel (i.e., the
proportion of the achannel it was allocated), the number of
impressions would be defined as follows:
C_2^{CNN} = \frac{p_2}{p_1 + p_2} \, \mathrm{Size}(CNN)
Similarly, the relevant variables for b.sub.1 and b.sub.3 would
be:
C_1^{MN} = \mathrm{Size}(OMN) + \frac{p_1}{p_1 + p_2} \, \mathrm{Size}(CNN) + \frac{p_1}{p_1 + p_3} \, \mathrm{Size}(NYT)
C_3^{NYT} = \frac{p_3}{p_1 + p_3} \, \mathrm{Size}(NYT)
Note that these count variables do not count the number of
impressions to echannels. Rather, they simply aggregate the
relevant counts for a particular bid, thereby avoiding
combinatorial blow up associated with taking the cross product of
bid attributes. Of course, a complicated bid with multiple clauses
or conditions may have multiple variables counting different
classes of impressions, click throughs, etc. Thus the number of
variables is proportional to the number of bids times the number of
relevant achannels rather than the number of relevant
echannels.
[0395] Of course, these constraints (or in some cases terms in the
objective) are nonlinear. Several approaches can be used
to approximate the solution to this program.
[0396] Iterative Optimization.
[0397] Note that when the usual MIP optimization is used,
allocations to specific echannels can go "unused" (at least from
the perspective of the MIP). For example, in the standard
optimization, only fraction p.sub.1+p.sub.2 of the CNN echannel
would be counted as used. The allocation of fraction p.sub.3 of
echannel CNN to b.sub.3 is "wasted".
[0398] One simple technique to overcome this involves the iterative
solution of a series of MIPs in which the unallocated or "wasted"
fraction of each echannel from the previous iteration is included
as a "bonus" (in multiplicative fashion) to the allocation at the
next iteration. For instance, suppose the MIP solution in this
example gives allocations p.sub.1=0.5, p.sub.2=0.2, p.sub.3=0.3.
This means that 30% of echannel CNN has gone unused (from the point
of view of the optimizer), in particular, the fraction allocated to
b.sub.3. Assume in the next iteration of the MIP, the allocations
p.sub.1 and p.sub.2 are boosted by
\frac{1}{0.7}
when computing the fraction of the CNN channel they actually
obtain. This is equivalent to solving the MIP with the following
constraints (when considering all of the bids):
C_1^{MN} = \mathrm{Size}(OMN) + \frac{p_1}{p_1' + p_2'} \, \mathrm{Size}(CNN) + \frac{p_1}{p_1' + p_3'} \, \mathrm{Size}(NYT)
C_2^{CNN} = \frac{p_2}{p_1' + p_2'} \, \mathrm{Size}(CNN)
C_3^{NYT} = \frac{p_3}{p_1' + p_3'} \, \mathrm{Size}(NYT)
where the p' terms are constants reflecting fractional allocations
at the previous iteration.
[0399] More generally,
[0400] Initialize P.sup.j=1 for each echannel j (P.sup.j refers to total fractional/percentage usage of echannel j).
[0401] Solve the MIP, defining the number of impressions (events) assigned to bid i as
C_i = \sum_{j \,:\, \text{echannel } j \text{ is relevant to bid } i} p_i \, \frac{1}{P^{j}} \, \mathrm{Size}(j)
[0402] (where the count variable(s) or other state variables needed to determine the degree of satisfaction of a bid can be broken down as finely grained as necessary, and the sum over echannels is restricted to echannels that meet the needs of that state variable)
[0403] Set
P^{j} = \sum_{i \,:\, \text{echannel } j \text{ relevant to bid } i} p_i
[0404] Repeat until some convergence criterion is met.
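The two bookkeeping steps of this loop, evaluating the boosted counts for a given allocation and refreshing the usage terms, can be sketched as follows. The MIP solve itself is omitted, so `p` is taken as given rather than optimized. The echannel sizes are those of the news-site example with 100K total impressions; the function names are hypothetical, and the sketch does not guard against a usage term of zero.

```python
SIZES = {"CNN": 15000, "NYT": 25000, "OMN": 60000}
RELEVANT = {"CNN": [1, 2], "NYT": [1, 3], "OMN": [1]}


def boosted_counts(p, usage_prev):
    """C_i = sum over echannels j relevant to bid i of p_i * Size(j) / P^j."""
    counts = {i: 0.0 for i in p}
    for j, bids in RELEVANT.items():
        for i in bids:
            counts[i] += p[i] * SIZES[j] / usage_prev[j]
    return counts


def update_usage(p):
    """P^j = sum of p_i over the bids i for which echannel j is relevant."""
    return {j: sum(p[i] for i in bids) for j, bids in RELEVANT.items()}


usage0 = {j: 1.0 for j in SIZES}                   # initialization: P^j = 1
p_round1 = {1: 0.7, 2: 0.0, 3: 0.3}
counts1 = boosted_counts(p_round1, usage0)
usage1 = update_usage(p_round1)
```

With every P^j initialized to 1 the counts reduce to the blind-dispatch counts (here 70,000 for bid 1 and 7,500 for bid 3); later rounds divide by the previous round's usage, which is what produces the "boost".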
[0405] An important aspect of this optimization that will likely
help is to allow the sum of the fractional allocations to exceed
1.0: specifically, the relevant constraints put on the p.sub.i are,
for each echannel j:
\sum_{i \,:\, \text{echannel } j \text{ relevant to bid } i} p_i \le 1
This can allow the total sum of p.sub.i's to exceed one without
actually causing infeasibility: the selective dispatch policy will
never assign more than 100% of an echannel, even with this
relaxation.
[0406] The aim of this procedure is to determine a somewhat
accurate estimate of revenue/objective value accounting for
selective dispatch. It is unclear whether this will converge, or if
it does, to something interesting. Note several important points,
however:
[0407] The solution obtained is always feasible: that is, the p.sub.i's can always be implemented by the selective dispatch policy. The objective value may be underestimated or overestimated by the MIP (because the "boost" is based on the previous round's p.sub.i's); but once the p.sub.i's are in hand, computing the true objective value is trivial. Thus, should this procedure not converge, the best solution at any round can be stored and used at termination.
[0408] Since the procedure starts with an underestimate of true usage (see how the P.sup.j's are initialized), it is envisioned that this may actually converge to something useful.
[0409] Running this iterative procedure on the example that follows
illustrates its value and its convergence. In general, convergence
to an optimal solution is not guaranteed, but it will generally
improve allocation value significantly relative to optimization
under the "random dispatch" assumption.
[0410] An Illustrative Example.
Consider the example above with the following bids:
TABLE-US-00006
Bid      Conditions
b.sub.1  will pay 0.10 per impression of Major News (MN) to a maximum of 70,000 impressions
b.sub.2  will pay 0.20 per impression of CNN to a maximum of 10,000 impressions
b.sub.3  will pay 0.30 per impression of NYT to a maximum of 18,000 impressions
Recall that the distribution of hits over major news sites is as
follows: 15% CNN, 25% NYT, 60% OMN (other major news). Treating
supply as deterministic, with 100K available for the period in
question, results in:
TABLE-US-00007
eChannel  Impressions available
CNN       15,000
NYT       25,000
OMN       60,000
[0411] The following table shows the convergence behavior of the
iterative algorithm on this simple example. Objective value from
the MIP (actually an LP in this case) is shown as is the actual
revenue realized by the selective dispatch policy (recall that they
are not the same). The p.sub.i terms show the "global" fractional
allocation of the abstract MN channel to bid i and q.sub.i shows
the actual number of relevant impressions realized by bid b.sub.i
under the selective dispatch policy. The optimal allocation
realizes revenue $14,400.
TABLE-US-00008
Round  Obj.   Revenue  p.sub.1   p.sub.2   p.sub.3   q.sub.1  q.sub.2  q.sub.3
1      9250   9250     0.700000  0.000000  0.300000  70000    0        7500
2      10527  10527    0.529730  0.000000  0.470270  70000    0        11756.7
3      11348  11348    0.420214  0.000000  0.579786  70000    0        14494
4      11919  11919    0.344014  0.000000  0.655986  70000    0        16399
5      13026  12619    0.288049  0.229342  0.482609  70000    10000    12065
6      13360  13476    0.259519  0.185607  0.554873  70000    5381     18000
7      13624  13591    0.236812  0.176825  0.586363  70000    5958     18000
8      13788  13767    0.218751  0.188564  0.592686  70000    6838     18000
9      13922  13954    0.204726  0.211040  0.584234  70000    7771     18000
10     14052  14117    0.193991  0.237958  0.568051  70000    8585     18000
11     14165  14244    0.185763  0.265567  0.548671  70000    9222     18000
12     14258  14339    0.179364  0.291844  0.528792  70000    9699     18000
13     14329  14400    0.174661  0.314809  0.510128  70000    10000    18000
14     14353  14400    0.174920  0.326316  0.496300  70000    10000    18000
15     14376  14400    0.175185  0.338425  0.486195  70000    10000    18000
16     14392  14400    0.174977  0.346066  0.478737  70000    10000    18000
17     14400  14400    0.175150  0.351254  0.473362  70000    10000    18000
18     14400  14400    0.175350  0.354823  0.469600  70000    10000    18000
[0412] Note that the MIP objective can both over and under estimate
the actual revenue. Notice also the importance of accounting for
the allocations realized by selective dispatch. For instance,
b.sub.1 nominally gets only 17% of the MN achannel, however, the
fact that no other bids compete for the echannel OMN means that it
in fact obtains all of that echannel. When this is taken into
account, it allows p.sub.1 to drop significantly, freeing up the
other echannels for use by the other two bids. Without this,
b.sub.1 would necessarily consume resources (NYT and CNN) that it
does not need, preventing the other two bids from exploiting them
fully.
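Once the fractions p.sub.i are in hand, the revenue realized under selective dispatch can be evaluated directly. The sketch below does this for the three bids of the illustrative example, assuming that each echannel is fully divided among its relevant bids in proportion to their p.sub.i and that a bid pays only up to its stated maximum. The constant and function names are hypothetical.

```python
SIZES = {"CNN": 15000, "NYT": 25000, "OMN": 60000}
RELEVANT = {"CNN": [1, 2], "NYT": [1, 3], "OMN": [1]}
PRICE = {1: 0.10, 2: 0.20, 3: 0.30}                # payment per relevant impression
CAP = {1: 70000, 2: 10000, 3: 18000}               # maximum paid impressions


def realized_impressions(p):
    """Relevant impressions each bid receives under selective dispatch."""
    got = {i: 0.0 for i in PRICE}
    for j, bids in RELEVANT.items():
        total = sum(p[i] for i in bids)
        if total > 0:
            for i in bids:
                got[i] += SIZES[j] * p[i] / total
    return got


def realized_revenue(p):
    """Revenue with each bid paying for at most CAP[i] impressions."""
    got = realized_impressions(p)
    return sum(PRICE[i] * min(got[i], CAP[i]) for i in PRICE)


revenue_first = realized_revenue({1: 0.7, 2: 0.0, 3: 0.3})
revenue_last = realized_revenue({1: 0.175350, 2: 0.354823, 3: 0.469600})
```

Under these assumptions the round 1 fractions yield revenue of 9,250 and the round 18 fractions yield 14,400, since at that point every bid receives at least its maximum number of paid impressions.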
[0413] Optimizing Using Abstract Channels Under Uncertainty.
[0414] Optimization is important in ad markets, exchanges, and
allocation systems in which expressive preferences or expressive
bidding is used. For the purpose of describing the invention,
banner ads are described. However, this is not to be construed as
limiting the invention since the general principles and specific
techniques described herein can be applied mutatis mutandis to
other forms of advertising, especially online advertising, such as,
without limitation, keyword search auctions, electronic auctions of
TV and radio ads, etc.
[0415] The technical challenge for expressive ad auctions is
optimization, namely, determining the optimal allocation of ad
channels to very large numbers of complex bids in real time. This
is further complicated by the stochastic nature of the domain: both
supply (number of impressions or CTs) and demand (future bids) are
uncertain, which necessitates online allocation. Three models have
been proposed to solve the problem of allocation in the presence of
uncertain supply and demand.
[0416] Solving the Problem as a Markov Decision Process (MDP).
[0417] This provides the optimal solution to the sequential
optimization problem. It has the advantage of accounting for
uncertainty and risk thus induced in precisely the right way,
assuming the goal is revenue maximization (or expected objective
value if some other objective, e.g., efficiency, is used).
Unfortunately, given the sizes of the optimization problems, the
solution of an MDP is intractable in all but the simplest ad
allocation settings.
[0418] Expectation-Based Optimization and Reoptimization.
[0419] An especially simple model is that of expectation-based
optimization. This model applies deterministic optimization to the
expectations of all random variables, exploiting powerful mixed
integer programming (MIP) algorithms for expressive market
clearing. Specifically, all random quantities (e.g., ad slot
supply, click rate, future bids/demand) are fixed at their expected
(mean) values. Once this is done, a deterministic optimization
problem is solved maximizing revenue (or some other objective)
assuming these mean values will be realized. Depending on the forms
of expressiveness used, the deterministic optimization can be
formulated as a linear program (LP), a MIP, or in some other
fashion. State-of-the-art MIP solvers, tuned to economic allocation
problems, can solve very large problems. However, the drawback of
this approach is that it does not adequately account for
uncertainty and risk. Thus, the quality of allocations can suffer
when there is sufficient uncertainty. This can be especially problematic
for bonus (all or nothing) advertiser bids/preferences/demands.
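As a minimal sketch of the expectation-based step, assume a single channel, uncertain supply summarized by a handful of samples, and bids that pay a fixed price per impression up to a cap. The function name, the bid tuples and the numbers are illustrative assumptions, not part of the described system. In this special case the LP reduces to a fractional knapsack, so a greedy fill by price stands in for the LP or MIP solver.

```python
from statistics import mean


def expectation_based_allocation(supply_samples, bids):
    """Allocate one channel's expected supply to per-impression bids.

    bids: list of (name, price_per_impression, max_impressions).
    Assumption: one channel and linear per-impression values, so the LP
    is a fractional knapsack and filling bids in descending price order
    is optimal. Richer expressiveness would need a real LP/MIP solver.
    """
    # Fix the random supply at its expected (mean) value.
    expected_supply = mean(supply_samples)
    remaining = expected_supply
    allocation = {}
    for name, price, cap in sorted(bids, key=lambda b: -b[1]):
        take = min(cap, remaining)
        allocation[name] = take
        remaining -= take
    # Revenue that would result if the mean supply were actually realized.
    revenue = sum(allocation[name] * price for name, price, _ in bids)
    return expected_supply, allocation, revenue
```

For example, with supply samples of 800, 1000 and 1200 impressions and bids `("a", 0.5, 600)` and `("b", 0.3, 600)`, the plan assigns 600 impressions to `a` and 400 to `b`. The plan is built as if exactly 1000 impressions will arrive, which is the source of the risk noted above.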
[0420] The problem of uncertainty can be mitigated somewhat by
periodic reoptimization. Given the realized values of uncertain
quantities (and hence actual allocations to specific advertisers)
at some time period, the optimization can be rerun to update future
allocations. However, this cannot resolve all of the problems.
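Periodic reoptimization can be sketched as follows, assuming a single advertiser with an impression target spread over several periods and a planner that re-solves for the share of supply to dispatch each period from what has actually been delivered so far. The function name and the simple proportional rule are hypothetical and stand in for rerunning the full optimization.

```python
def reoptimized_shares(target, expected_supply_per_period, realized_supplies):
    """Re-plan the share of supply given to one advertiser each period.

    Assumption: the "optimization" is reduced to spreading the remaining
    target evenly over the remaining expected supply.
    """
    delivered = 0.0
    periods = len(realized_supplies)
    plan = []
    for t, realized in enumerate(realized_supplies):
        # Update the problem with what has actually happened so far.
        remaining_target = max(target - delivered, 0.0)
        remaining_expected = expected_supply_per_period * (periods - t)
        share = min(1.0, remaining_target / remaining_expected)
        plan.append(share)
        # Realized supply, not expected supply, determines delivery.
        delivered += share * realized
    return plan, delivered
```

With a target of 300 impressions, an expected supply of 200 per period and realized supplies of 100, 200 and 200, the planned share rises from 0.5 to 0.625 after the first-period shortfall and the target is met. A fixed share of 0.5 would deliver only 250.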
[0421] Online Stochastic Optimization.
[0422] A third approach is that of online stochastic optimization.
In this model, samples are drawn from a distribution of uncertain
events, and a deterministic optimization problem, or scenario, is
constructed using each sampled realization. Each scenario is solved
and the results are aggregated to construct an approximately
optimal decision at the current period in the underlying MDP. The
method is online in that the sample-based optimization is repeated
after the current realization of uncertain events. That is, the
approach determines the expected value of the best next action (or
dispatch decision) rather than an entire policy for the MDP. This
method requires that a "combining" LP or MIP be solved that brings
together the solutions of the deterministic optimizations for each
sample. As a consequence, in the MIP case, k+1 (deterministic) MIPs
need to be solved if k samples are generated. As with
expectation-based optimization, the online stochastic optimization
is rerun periodically.
[0423] This approach has the advantage of accounting for
uncertainty and risk in a much more robust fashion than
expectation-based optimization; thus it tends to provide much
higher quality solutions (e.g., higher revenue) on average.
Furthermore it is much more tractable than a full MDP solution.
However, scalability is still limited to moderately sized problems.
It cannot solve problems approaching the size of those well-tackled
by expectation-based optimization.
[0424] Next, two techniques will be described for solving ad
allocation problems in optimization settings that have the
attractive computational properties of expectation-based
optimization, but deal with uncertainty much more accurately than
expectation-based optimization, thus realizing the attractive
performance advantage of online stochastic optimization.
[0425] Deterministic Overprovisioning.
[0426] The first technique, deterministic overprovisioning, works
as follows: for any advertiser that expresses preferences, bids or
other demand for certain sets of ad slots, an increase, or
overprovision, is applied for that advertiser. Specifically,
expectation-based optimization is run using expected (mean) values
of uncertain quantities (as above), but in the optimization the
quantity of specific ad channels demanded by the advertiser is
replaced by an increased quantity. The optimization then allocates,
in expectation, more ad supply to the advertiser than demanded in
order to extract the relevant payment from the advertiser. This is
the effect on optimization only; payments will be made according to
the actual realized allocation, not this inflated figure. As in
expectation-based optimization, the optimization is rerun
periodically to account for actual realizations of uncertain
supply, demand, etc.
[0427] The benefit of this approach is that uncertainty, which can
cause an expectation-based optimizer running an optimization on a
computer, like computer 2 in FIG. 1, to make allocations that miss
"target levels" with some probability, is now mitigated to some
extent. By overprovisioning in expectation, some "negative"
variance in supply, click rate, or other uncertain quantities can
be tolerated without sacrificing revenue (or other objective value).
This is especially important when advertisers' payments are linked
to meeting specific "all or nothing" targets, when smoothness
conditions are in play, and when other complex forms of
expressiveness are used.
[0428] Apart from handling risk and uncertainty, this approach is
no more computationally demanding than expectation-based
optimization. In fact, the problems are identical in structure;
only the target quantities are altered. Adjusting target quantities can
take into account variance/standard deviation and the target
adjustments can be varied over time, with each reoptimization using
a different adjustment (e.g., to account for increase or decrease
in variance).
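The effect of overprovisioning on an "all or nothing" target can be sketched as follows. The inflation rule used here (target plus a multiple of the standard deviation) is an assumption chosen for illustration; the text above states only that adjustments can take variance or standard deviation into account. All names and numbers are hypothetical.

```python
def overprovisioned_target(target, std_dev, safety_factor=1.0):
    """Inflate a demanded quantity for use inside the optimizer only.

    Assumption: inflation is safety_factor * std_dev; payments would
    still be based on the actual realized allocation.
    """
    return target + safety_factor * std_dev


def hit_rate(planned_quantity, target, mean_supply, supply_realizations):
    """Fraction of supply realizations in which the true target is met.

    The optimizer plans on mean supply, so the advertiser receives a
    fixed share of whatever supply is actually realized.
    """
    share = planned_quantity / mean_supply
    hits = sum(1 for s in supply_realizations if share * s >= target)
    return hits / len(supply_realizations)
```

With supply realizations of 900, 950, 1000, 1050 and 1100 (mean 1000) and a true target of 500, planning for exactly 500 meets the target in 3 of 5 realizations. Planning for `overprovisioned_target(500, 30, 2.0)`, that is 560, meets it in all five.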
[0429] Sample-Based Expected Revenue Optimization (SBERO).
[0430] The second technique, sample-based expected revenue
optimization (SBERO), is similar to online stochastic optimization,
with one difference. It works as follows. As in online stochastic
optimization, k samples are drawn from the distribution of
uncertain events: this gives us k trajectories or scenarios fixing
specific supply, demand, click rates, etc. for all uncertain
quantities affecting the optimization. Given a fixed allocation of
channel supply to advertisers, each scenario determines the revenue
(or value of another objective of interest) associated with the
allocation (as if it were a deterministic problem). An optimization
problem is then solved to determine the allocation to advertisers
that is optimal when averaged over the k scenarios. A weighted
average can also be taken to reflect specific sampling strategies. Other
constructed scenarios can also be accommodated, apart from random
sampling. As in expectation-based optimization, the optimizer is
rerun periodically to account for actual realizations of uncertain
supply, demand, etc.
[0431] SBERO has the following distinction: unlike online
stochastic optimization--in which a distinct allocation of supply
must be determined for each scenario--SBERO computes a single
allocation of supply to advertisers that is used in each scenario.
In other words, SBERO computes an allocation assuming no recourse,
i.e., a policy that is oblivious to the actual realization of past
supply. This provides a computational advantage. Rather than
solving k+1 MIPs, with a total of O(kn) decision variables (where n
is the number of decision variables in a deterministic allocation),
as online stochastic optimization does, SBERO solves only a single
MIP with n decision variables. Thus, it is computationally
extremely effective. In practice, it scales roughly linearly in the
number of scenarios (i.e., if there are k scenarios, it takes
roughly k times as long to solve as a deterministic
optimization).
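The single-allocation idea behind SBERO can be sketched on a toy problem. Assume one channel shared between a hypothetical advertiser A, who pays a bonus only if a delivery target is met, and a hypothetical advertiser B, who pays per impression. One share is chosen for A and evaluated unchanged in every sampled scenario. A grid search replaces the MIP here purely to keep the sketch self-contained; the names and numbers are illustrative.

```python
def scenario_revenue(share_a, supply, target_a, bonus_a, price_b):
    """Revenue of a fixed allocation in one deterministic scenario."""
    delivered_a = share_a * supply
    delivered_b = (1.0 - share_a) * supply
    bonus = bonus_a if delivered_a >= target_a else 0.0
    return bonus + price_b * delivered_b


def sbero_share(scenarios, target_a, bonus_a, price_b, grid=100, weights=None):
    """Pick one share for A that maximizes (weighted) average revenue.

    Assumption: enumeration over a grid of shares stands in for the
    single MIP; the same share is used in every scenario (no recourse).
    """
    if weights is None:
        weights = [1.0] * len(scenarios)
    total_weight = sum(weights)
    best_share, best_value = 0.0, float("-inf")
    for i in range(grid + 1):
        share = i / grid
        value = sum(
            w * scenario_revenue(share, s, target_a, bonus_a, price_b)
            for w, s in zip(weights, scenarios)
        ) / total_weight
        if value > best_value:
            best_share, best_value = share, value
    return best_share, best_value
```

With supply scenarios of 800, 1000 and 1200, a target of 500, a bonus of 400 and a per-impression price of 0.2, the search selects a share of 0.63, which secures the bonus in all three scenarios. Optimizing against the mean scenario alone (`sbero_share([1000], 500, 400, 0.2)`) selects 0.5, which misses the bonus whenever supply falls below its mean.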
[0432] Despite its impressive computational performance, SBERO handles
risk and uncertainty well. While the lack of an explicit recourse
model causes it to suffer marginally relative to online stochastic
optimization--indeed preliminary experiments show little
appreciable difference in revenue in practice--this is more than
offset by its computational performance.
[0433] Herein, a suite of techniques based on column and constraint
generation has been described that effectively tackle the
channel explosion problem in the optimal allocation of online ads.
The techniques apply to both simple, current forms of
expressiveness (e.g., simple budget constraints) and other, richer
forms of campaign-level expressiveness that require the solution of
large-scale integer programs. Experiments demonstrate that
high-quality allocations can be determined using very few abstract
channels in optimization. This illustrates the desirable
sensitivity of the techniques to those channel distinctions that
have the greatest impact on value (e.g., revenue or efficiency).
The techniques scale to problems with hundreds of attributes and
bidders. Given the offline nature of the optimization problem
described herein, the computational results suggest that the
techniques can be run and rerun frequently to determine, for
example, (approximately) optimal allocations in stochastic models
that require sampling.
[0434] The search for channel splits in column generation, while
effective for the problems considered herein, is still quite crude.
It is envisioned that
simple heuristics for variable/literal ordering based on channel
supply and bid properties could dramatically improve the
performance of column generation (since the split search, not
optimization, dominates running time). Dynamic programming may be
used in special cases (e.g., under certain independence
assumptions) to find higher quality splits than the techniques
described herein.
[0435] The invention has been described with reference to the
preferred embodiments. Obvious modifications and alterations will
occur to others upon reading and understanding the preceding
detailed description. It is intended that the invention be
construed as including all such modifications and alterations
insofar as they come within the scope of the appended claims or the
equivalents thereof.
* * * * *