U.S. patent application number 13/666544 was filed with the patent office on 2012-11-01 and published on 2013-05-02 for finding optimum combined plans among multiple sharing arrangements and multiple data sources and consumers.
This patent application is currently assigned to NEC Laboratories America, Inc. The applicant listed for this patent is NEC Laboratories America, Inc. Invention is credited to Vahit Hakan Hacigumus, Jagan Sankaranarayanan, Mohamed Sarwat, Haopeng Zhang.
Application Number | 13/666544 |
Publication Number | 20130110575 |
Family ID | 48173329 |
Publication Date | 2013-05-02 |
United States Patent Application | 20130110575 |
Kind Code | A1 |
Sankaranarayanan; Jagan; et al. | May 2, 2013 |
FINDING OPTIMUM COMBINED PLANS AMONG MULTIPLE SHARING ARRANGEMENTS
AND MULTIPLE DATA SOURCES AND CONSUMERS
Abstract
Systems and methods for data sharing include merging sharing
plans of admissible sharing arrangements to provide a merged
sharing plan. A set of all possible plumbings are determined for
the merged sharing plan. A plumbing with a maximum profit is
iteratively applied, using a processor, to the merged sharing plan
for each plumbing of the set such that a staleness level is
maintained to provide an optimized sharing plan.
Inventors: | Sankaranarayanan; Jagan (Santa Clara, CA); Hacigumus; Vahit Hakan (San Jose, CA); Sarwat; Mohamed (Minneapolis, MN); Zhang; Haopeng (Seattle, WA) |
Applicant: |
Name | City | State | Country | Type |
NEC Laboratories America, Inc. | Princeton | NJ | US | |
Assignee: | NEC Laboratories America, Inc. | Princeton | NJ |
Family ID: |
48173329 |
Appl. No.: |
13/666544 |
Filed: |
November 1, 2012 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61554157 | Nov 1, 2011 | |
Current U.S. Class: | 705/7.25 |
Current CPC Class: | G06Q 10/06 20130101; G06Q 10/0631 20130101; G06Q 10/101 20130101; H04L 41/5003 20130101 |
Class at Publication: | 705/7.25 |
International Class: | G06Q 10/06 20120101 G06Q010/06 |
Claims
1. A method for data sharing, comprising: merging sharing plans of
admissible sharing arrangements to provide a merged sharing plan;
determining a set of all possible plumbings for the merged sharing
plan; and iteratively applying a plumbing with a maximum profit,
using a processor, to the merged sharing plan for each plumbing of
the set such that a staleness level is maintained to provide an
optimized sharing plan.
2. The method as recited in claim 1, wherein the maximum profit
includes a maximum difference between a benefit and a cost.
3. The method as recited in claim 1, wherein the sharing plans of
admissible sharing arrangements include a critical time path that
does not exceed a staleness level and a cost that does not exceed a
capacity.
4. The method as recited in claim 1, wherein merging sharing plans
of admissible sharing arrangements includes identifying
commonalities between sharing plans of admissible sharing
arrangements.
5. The method as recited in claim 1, wherein merging sharing plans
of admissible sharing arrangements includes replacing vertices and
edges performing similar operations of two or more different
sharing plans with a common set.
6. A system for data sharing, comprising: a merging module
configured to merge sharing plans of admissible sharing
arrangements to provide a merged sharing plan; the merging module
further configured to determine a set of all possible plumbings for
the merged sharing plan; and the merging module further configured
to iteratively apply a plumbing with a maximum profit, using a
processor, to the merged sharing plan for each plumbing of the set
such that a staleness level is maintained to provide an optimized
sharing plan.
7. The system as recited in claim 6, wherein the maximum profit
includes a maximum difference between a benefit and a cost.
8. The system as recited in claim 6, wherein the sharing plans of
admissible sharing arrangements include a critical time path that
does not exceed a staleness level and a cost that does not exceed a
capacity.
9. The system as recited in claim 6, wherein the merging module is
further configured to identify commonalities between sharing plans
of admissible sharing arrangements.
10. The system as recited in claim 6, wherein the merging module is
further configured to replace vertices and edges performing similar
operations of two or more different sharing plans with a common
set.
Description
RELATED APPLICATION INFORMATION
[0001] This application claims priority to provisional application
Ser. No. 61/554,157 filed on Nov. 1, 2011, incorporated herein by
reference in its entirety.
[0002] This application is related to commonly assigned U.S.
application Ser. No. ______, entitled "GENERATION AND OPTIMIZATION
OF DATA SHARING AMONG MULTIPLE DATA SOURCES AND CONSUMERS,"
Attorney Docket Number 11067A (449-255), filed concurrently
herewith, which is incorporated by reference herein in its
entirety.
BACKGROUND
[0003] 1. Technical Field
[0004] The present invention relates to data sharing and, more
specifically, to the generation and optimization of data sharing
among multiple data sources and consumers.
[0005] 2. Description of the Related Art
[0006] The ability to share data among a number of different
applications is a desired feature for businesses for many reasons,
such as increased organizational efficiency, targeted advertising,
rich user experience through data enrichment, etc. The different
applications may be hosted on the cloud, where shared data and the
cloud service provider provide computing resources to those
applications to provide seamless data sharing. There may be a large
number of sharing agreements among the data sources, who provide
the data, and the consumers, who pay for the data. Each of these
agreements may be described as a sharing plan. In this setting,
executing a sharing plan incurs a cost due to the use of
infrastructure resources, which is paid by the provider. Also, a
consumer may require a certain level of data freshness, which is
described as a service level agreement (SLA). As such, providers
seek to find sharing plans that minimize cost while satisfying
consumer SLAs.
SUMMARY
[0007] A method for data sharing includes generating at least one
sharing plan with a cheapest cost and/or a shortest execution time
for one or more sharing arrangements. Admissibility of the one or
more sharing arrangements is determined such that a critical time
path of the at least one sharing plan does not exceed a staleness
level and a cost of the at least one sharing plan does not exceed a
capacity. Sharing plans of admissible sharing arrangements are
executed while maintaining the staleness level.
[0008] A system for data sharing includes a generation module
configured to generate at least one sharing plan with a cheapest
cost and/or a shortest execution time for one or more sharing
arrangements. The generation module is further configured to
determine admissibility of the one or more sharing arrangements
such that a critical time path of the at least one sharing plan
does not exceed a staleness level and a cost of the at least one
sharing plan does not exceed a capacity. A sharing executor module
is configured to execute sharing plans of admissible sharing
arrangements while maintaining the staleness level.
[0009] A method for data sharing includes merging sharing plans of
admissible sharing arrangements to provide a merged sharing plan. A
set of all possible plumbings is determined for the merged sharing
plan. A plumbing with a maximum profit is iteratively applied to
the merged sharing plan for each plumbing of the set such that a
staleness level is maintained to provide an optimized sharing
plan.
[0010] A system for data sharing includes a merging module
configured to merge sharing plans of admissible sharing
arrangements to provide a merged sharing plan. The merging module
is further configured to determine a set of all possible plumbings
for the merged sharing plan. The merging module is further
configured to iteratively apply a plumbing with a maximum profit to
the merged sharing plan for each plumbing of the set such that a
staleness level is maintained to provide an optimized sharing
plan.
[0011] These and other features and advantages will become apparent
from the following detailed description of illustrative embodiments
thereof, which is to be read in connection with the accompanying
drawings.
BRIEF DESCRIPTION OF DRAWINGS
[0012] The disclosure will provide details in the following
description of preferred embodiments with reference to the
following figures wherein:
[0013] FIG. 1A is a block/flow diagram showing a system/method of
data sharing among multiple data sources and consumers in
accordance with one embodiment;
[0014] FIG. 2A is a block/flow diagram showing a method for
generation and optimization of data sharing among multiple data
sources and consumers in accordance with one embodiment; and
[0015] FIG. 3A is a block/flow diagram showing a method for
determining optimum combined plans among multiple sharing
arrangements in accordance with one embodiment.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0016] In accordance with the present principles, systems and
methods for the generation and optimization of data sharing among
multiple data sources and consumers are provided. For each sharing
arrangement in a set of sharing arrangements, a sharing plan with a
cheapest dollar cost is generated. It is then determined whether
that particular sharing arrangement is admissible. The sharing
arrangement is admissible where a critical time path of the sharing
plan with the cheapest dollar cost does not exceed a staleness
level (e.g., service level agreement) and a cost of the sharing
plan with the cheapest dollar cost does not exceed a capacity.
[0017] If the sharing arrangement for the sharing plan with the
cheapest dollar cost is not admissible, a sharing plan with a
smallest time path is generated. It is determined whether the
sharing arrangement for the sharing plan with the smallest time
path is admissible. If it is not admissible, the sharing
arrangement is rejected. Sharing plans for admitted sharing
arrangements may be provided to a sharing executor. Advantageously,
multiple sharing plans may be executed simultaneously.
[0018] In one embodiment, the sharing plans for admitted sharing
arrangements may be optimized before being provided to the sharing
executor. The sharing plans are first merged to create a merged
sharing plan. A set of all possible plumbings that may be performed
on the merged sharing plan is determined. The plumbing in the set
with the maximum profit is iteratively applied to the merged
sharing plan for each plumbing in the set. The optimized sharing
plan may be provided to the sharing executor.
[0019] Embodiments described herein may be entirely hardware,
entirely software, or may include both hardware and software elements.
In a preferred embodiment, the present invention is implemented in
software, which includes but is not limited to firmware, resident
software, microcode, etc.
[0020] Embodiments may include a computer program product
accessible from a computer-usable or computer-readable medium
providing program code for use by or in connection with a computer
or any instruction execution system. A computer-usable or
computer-readable medium may include any apparatus that stores,
communicates, propagates, or transports the program for use by or
in connection with the instruction execution system, apparatus, or
device. The medium can be magnetic, optical, electronic,
electromagnetic, infrared, or semiconductor system (or apparatus or
device) or a propagation medium. The medium may include a
computer-readable storage medium such as a semiconductor or solid
state memory, magnetic tape, a removable computer diskette, a
random access memory (RAM), a read-only memory (ROM), a rigid
magnetic disk and an optical disk, etc.
[0021] A data processing system suitable for storing and/or
executing program code may include at least one processor coupled
directly or indirectly to memory elements through a system bus. The
memory elements can include local memory employed during actual
execution of the program code, bulk storage, and cache memories
which provide temporary storage of at least some program code to
reduce the number of times code is retrieved from bulk storage
during execution. Input/output or I/O devices (including but not
limited to keyboards, displays, pointing devices, etc.) may be
coupled to the system either directly or through intervening I/O
controllers.
[0022] Network adapters may also be coupled to the system to enable
the data processing system to become coupled to other data
processing systems or remote printers or storage devices through
intervening private or public networks. Modems, cable modems, and
Ethernet cards are just a few of the currently available types of
network adapters.
[0023] Referring now to the drawings in which like numerals
represent the same or similar elements and initially to FIG. 1A, a
block/flow diagram showing a system for data sharing among multiple
data sources and consumers 100 is illustratively depicted in
accordance with one embodiment. The data sharing system 102
preferably includes one or more processors 106 and memory 104 for
storing programs and applications. It should be understood that the
functions and components of the system 102 may be integrated into
one or more systems.
[0024] The system 102 may include a display 108 for viewing. The
display 108 may also permit a user to interact with the system 102
and its components and functions. This may be further facilitated
by a user interface 110, which may include a keyboard, mouse,
joystick, or any other peripheral or control to permit user
interaction with the system 102.
[0025] Data sharing system 102 receives input 120, which includes a
set of sharing arrangements 122. Memory 104 includes sharing
optimizer module 112, which includes generation module 114. For
each sharing arrangement of the set of sharing arrangements 122,
the generation module 114 is configured to generate several
different sharing plans that implement the sharing arrangement. The
goal of the sharing optimizer module 112 is to produce a sharing
plan that is admissible, has a low cost to set up, and can be
maintained by the system at the desired level of staleness. Sharing
plans are preferably expressed in terms of vertices and edges
forming a directed acyclic graph (DAG).
[0026] For each sharing arrangement in a set of sharing
arrangements 122, the generation module 114 is configured to
generate a sharing plan with the cheapest dollar cost. The cost of
a sharing plan, expressed in dollars per second, is computed as the
amount of machine, network, and disk capacity consumed per second
to keep the sharing arrangement at the desired staleness level. The
cost may be expressed as the sum of a static cost, which represents
an initial investment to set up derived relations, and a dynamic
cost, which represents the expense incurred to move tuples through
the edges of a sharing plan. The static cost of a sharing plan is
converted to dollars per second by dividing each cost component
(e.g., machine, network, disk, etc.) by a recoup constant (e.g. per
hour, per month, per gigabyte, etc.). The dynamic cost is computed
in terms of the number of tuples stored, moved across the network
and the machine capacity consumed in generating and moving the
tuples per second through the edges in the sharing plan.
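The cost model of this paragraph can be illustrated with a minimal sketch. The function names, the dictionary layout, and the choice of seconds as the unit of the recoup constant are assumptions made for illustration only and do not appear in the application.

```python
def static_cost_per_second(setup_costs, recoup_seconds):
    """Convert one-time setup costs (dollars) to dollars per second.

    Each component (e.g., machine, network, disk) is divided by its own
    recoup constant, expressed here in seconds (an assumed unit).
    """
    return sum(setup_costs[name] / recoup_seconds[name] for name in setup_costs)


def plan_cost_per_second(setup_costs, recoup_seconds, edge_costs):
    """Total cost = static cost + dynamic cost, both in dollars per second.

    edge_costs holds the assumed per-second expense of moving tuples
    through each edge of the sharing plan.
    """
    dynamic = sum(edge_costs.values())
    return static_cost_per_second(setup_costs, recoup_seconds) + dynamic
```

Keeping the static and dynamic parts separate lets different cost components be recouped over different periods while the total stays in dollars per second.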
[0027] The generation module 114 then determines whether the
sharing arrangement for the sharing plan with the cheapest dollar
cost is admissible. The admissibility forms a hard constraint in
that the generation module 114 should not admit a sharing
arrangement that cannot be handled by the system 102. Thus, sharing
plans that have a critical time path greater than the staleness
cannot be maintained by the system 102 at the desired staleness
level and are therefore not admissible. The critical time path
represents the longest path in terms of time taken to push tuples
from source vertices of the sharing plan to the destination vertex.
Similarly, if a sharing plan exceeds the capacity of a machine by
virtue of placing too many vertices and edges on it, it is also not
admissible.
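As a hedged illustration of the admissibility test, the sketch below computes the critical time path as the longest weighted path in the plan's DAG and compares it against the staleness level. The edge-time dictionary, the single scalar capacity, and the function names are assumptions of this sketch, not details given in the application.

```python
from graphlib import TopologicalSorter


def critical_time(edges, destination):
    """Longest source-to-destination time in a DAG.

    edges maps (u, v) -> seconds needed to push tuples from u to v.
    """
    preds = {}
    for (u, v), seconds in edges.items():
        preds.setdefault(v, {})[u] = seconds
        preds.setdefault(u, {})
    longest = {}
    # static_order() yields each vertex after all of its predecessors.
    order = TopologicalSorter({v: set(p) for v, p in preds.items()}).static_order()
    for v in order:
        longest[v] = max((longest[u] + t for u, t in preds[v].items()), default=0.0)
    return longest[destination]


def is_admissible(edges, destination, staleness, cost, capacity):
    """Admissible if the cost fits the capacity and the critical time
    path does not exceed the staleness level."""
    return cost <= capacity and critical_time(edges, destination) <= staleness
```

For example, with two sources feeding a join in 2 and 5 seconds and a 1-second edge from the join to the destination, the critical time path is 6 seconds, so the plan would be admissible for a 10-second staleness level but not for a 5-second one.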
[0028] If the sharing arrangement for the sharing plan with the
cheapest dollar cost is admissible, the generation module 114 moves
on to the next sharing arrangement in the set of sharing
arrangements 122. If the sharing plan with the cheapest dollar cost
is not admissible, the generation module 114 generates a sharing
plan with the smallest time path for that sharing arrangement. In
some embodiments, a user may choose whether to generate a sharing
plan with a cheapest dollar cost or a sharing plan with the
smallest time path. The smallest time path is determined based on
the critical time path.
[0029] If the sharing arrangement for the sharing plan with the
smallest time path is admissible, then the generation module 114
moves on to the next sharing arrangement in the set of sharing
arrangements 122. If the sharing arrangement for the sharing plan
with the smallest time path is not admissible, the sharing
arrangement is rejected and the generation module 114 moves on to
the next sharing arrangement of the set 122. Rejected sharing
arrangements may involve further negotiation with the consumer. The
generation module 114 thus provides sharing plans for admitted
sharing arrangements.
[0030] In one embodiment, sharing optimizer module 112 may also
include merging module 116 configured to merge the set of sharing
plans after admittance by taking advantage of the commonalities
between sharing arrangements. Merging module 116 merges the sharing
plans to create a single sharing plan D. A set V of all possible
plumbings that can be performed in D is determined. A plumbing
generally refers to the action of providing an alternate yet
identical input to an operator using a mechanism that is different
from the one currently providing input to it. More specifically, a
plumbing determines commonalities between two or more sharing plans
and merges the two or more sharing plans, discarding all operators
from one or more of the sharing plans prior to the commonality.
[0031] The plumbing operation in V that provides the maximum profit
(i.e., maximum benefit minus cost) while not violating the staleness SLA
of any of the sharing arrangements is performed on the sharing plan
D. When no more plumbing operations in the set V can be applied to
D, the sharing plan is forwarded to the sharing executor module
118. Advantageously, the merging module 116 iteratively optimizes
the commonalities to find a global optimum cost with combined
sharing plans.
[0032] Memory 104 also includes sharing executor module 118. For
the set S of sharing arrangements and the sharing plan D produced
by the merging module 116, the sharing executor module 118 executes
D in the most efficient manner to maximize profit (by reducing
operating cost) for the provider, while maintaining the desired
staleness level. The present principles provide low cost of
delivering data sharing services for the service providers and SLA
guarantees for customers.
[0033] Referring now to FIG. 2A, a block/flow diagram showing a
method for generation and optimization of data sharing among
multiple data sources and consumers 200 is illustratively depicted
in accordance with one embodiment. In block 202, a set of sharing
arrangements S is
provided. In block 204, for each sharing arrangement in the set S,
the sharing plan with the cheapest dollar cost is generated in
block 206. The cost may include the amount of machine, network, and
disk capacity consumed per second to maintain the sharing
arrangement at the desired staleness level.
[0034] In block 208, it is determined whether the sharing
arrangement for the sharing plan with the cheapest dollar cost is
admissible. A sharing arrangement is admissible if, e.g., the cost
of its sharing plan does not exceed the capacity of the machine
(e.g., cost is not ∞) and the critical time of the sharing
plan does not exceed the desired staleness level. The critical time
represents the longest path in terms of time taken to push tuples
from source vertices of the sharing plan to the destination vertex.
Other admissibility constraints are also contemplated. If the
sharing arrangement for the sharing plan with the cheapest dollar
cost is admissible, the method moves on to the next sharing
arrangement in S in block 202.
[0035] If the sharing arrangement for the sharing plan with the
cheapest dollar cost is not admissible, in block 212, the sharing
plan with the smallest time path is generated for the sharing
arrangement. The smallest time path is preferably determined based
on the critical time path. In block 214, it is determined whether
the sharing arrangement for the sharing plan with the smallest time
path is admissible. If the sharing arrangement for the sharing plan
with the smallest time path is admissible, the method moves on to
the next sharing arrangement in S in block 202. If the sharing
arrangement for the sharing plan with the smallest time path is not
admissible, in block 216, the sharing arrangement is rejected and
the method moves on to the next sharing arrangement in S. In some
embodiments, a user may choose whether to generate a sharing plan
with the cheapest dollar cost or a sharing plan with the smallest
time path.
[0036] Once sharing plans for each sharing arrangement in S have
been generated, in block 210, the sharing plans for the admissible
sharing arrangements are provided. In block 218, the sharing plans
are forwarded to the sharing executor. Preferably, the sharing
executor simultaneously executes the sharing plans. In another
embodiment, the sharing plans for the admissible sharing
arrangements in block 210 are combined prior to being sent to the
sharing executor, as will be discussed with respect to FIG. 3A.
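The admission flow described with respect to FIG. 2A can be sketched roughly as follows. The three callables passed in (`cheapest_plan`, `fastest_plan`, `admissible`) are hypothetical placeholders for the plan generators and the admissibility test; the application does not define their interfaces.

```python
def admit_arrangements(arrangements, cheapest_plan, fastest_plan, admissible):
    """Return (admitted plans keyed by arrangement, rejected arrangements).

    cheapest_plan, fastest_plan, and admissible are caller-supplied
    callables; their names and signatures are assumptions of this sketch.
    """
    admitted, rejected = {}, []
    for arrangement in arrangements:
        # First try the plan with the cheapest dollar cost.
        plan = cheapest_plan(arrangement)
        if not admissible(arrangement, plan):
            # Fall back to the plan with the smallest time path.
            plan = fastest_plan(arrangement)
            if not admissible(arrangement, plan):
                # Neither plan is admissible: reject the arrangement.
                rejected.append(arrangement)
                continue
        admitted[arrangement] = plan
    return admitted, rejected
```

Returning the rejected arrangements separately leaves room for the further negotiation with the consumer that the description mentions.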
[0037] Referring now to FIG. 3A, a block/flow diagram showing a
method for determining optimum combined plans among multiple
sharing arrangements 300 is illustratively depicted in accordance
with one embodiment. In block 302, sharing plans for admissible
sharing arrangements are provided. Sharing plans for admissible
sharing arrangements may be generated as discussed with respect to
FIG. 2A. Other methods of sharing plan generation are also
contemplated. In block 304, the sharing plans for admissible
sharing arrangements are merged to create a single sharing plan D.
In block 306, a set V of all possible plumbings that can be
performed in D is computed. Plumbings combine vertices belonging to
different sharing arrangements so that rather than retaining two
separate sets of vertices and edges, a merged set is provided.
Plumbings may include, e.g., copy plumbing and join plumbing. Other
types of plumbings are also contemplated.
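One simplified way to picture the merge of block 304 is sketched below. It assumes that each vertex carries a hypothetical signature string identifying its operation and that vertices with equal signatures perform the same operation; the application does not specify how commonalities are detected, so this is only an illustration.

```python
def merge_plans(plans):
    """Union several plans into one DAG, sharing vertices with equal signatures.

    Each plan is an iterable of (source_signature, target_signature) edges.
    Treating equal signatures as the same operation is an assumption of
    this sketch. Returns the merged edge set and the set of vertices that
    occur in more than one plan.
    """
    merged_edges = set()
    owners = {}
    for index, plan in enumerate(plans):
        for u, v in plan:
            # Identical edges from different plans collapse into one.
            merged_edges.add((u, v))
            for vertex in (u, v):
                owners.setdefault(vertex, set()).add(index)
    common = {vertex for vertex, plan_ids in owners.items() if len(plan_ids) > 1}
    return merged_edges, common
```

The vertices reported as common are natural candidates for plumbings, since they mark places where two sharing plans compute the same intermediate result.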
[0038] In block 308, it is determined whether the set of possible
plumbings V is empty. If the set V is not empty, in block 310, the
plumbing in V with the maximum profit is performed (e.g., maximum
benefit minus cost). In block 312, the plumbing is performed in the
merged sharing plan D. In block 314, D is appropriately fixed by
merging the commonality and discarding operators of one or more
sharing plans. The method then returns to block 306 until the set
of all possible plumbings V is empty in block 308.
[0039] Once the set of all possible plumbings V is empty, the
sharing plan is forwarded to the sharing executor in block 316.
Advantageously, the present principles iteratively optimize the
defined commonalities to find a global optimum cost with combined
sharing plans.
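The iterative loop of FIG. 3A can be sketched as a greedy procedure. All four callables are hypothetical: `enumerate_plumbings` stands in for computing the set V, `keeps_staleness` for the staleness check, `profit` for benefit minus cost, and `apply_plumbing` for performing the plumbing and fixing D. The sketch also assumes that applying a plumbing eventually shrinks the candidate set, so that the loop terminates.

```python
def optimize_merged_plan(plan, enumerate_plumbings, profit, keeps_staleness,
                         apply_plumbing):
    """Greedily apply the most profitable applicable plumbing until none remain.

    All callables are caller-supplied assumptions of this sketch.
    """
    while True:
        # Recompute the candidate set V on the current merged plan D,
        # keeping only plumbings that do not violate any staleness level.
        candidates = [p for p in enumerate_plumbings(plan)
                      if keeps_staleness(plan, p)]
        if not candidates:
            return plan
        # Pick the plumbing with the maximum profit (benefit minus cost).
        best = max(candidates, key=lambda p: profit(plan, p))
        plan = apply_plumbing(plan, best)
```

Recomputing the candidates on every pass mirrors the return to block 306 after each plumbing, since applying one plumbing may change which others remain possible.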
[0040] Having described preferred embodiments of a system and
method for finding optimum combined plans among multiple sharing
arrangements and multiple data sources and consumers (which are
intended to be illustrative and not limiting), it is noted that
modifications and variations can be made by persons skilled in the
art in light of the above teachings. It is therefore to be
understood that changes may be made in the particular embodiments
disclosed which are within the scope of the invention as outlined
by the appended claims. Additional information is provided in
Appendix A to the application. Having thus described aspects of the
invention, with the details and particularity required by the
patent laws, what is claimed and desired protected by Letters
Patent is set forth in the appended claims.
* * * * *