U.S. patent application number 10/768563 was filed with the patent office on 2004-01-30 and published on 2005-08-04 for method and apparatus for utility-based dynamic resource allocation in a distributed computing system.
Invention is credited to Das, Rajarshi, Kephart, Jeffrey Owen, Tesauro, Gerald James, Walsh, William Edward.
United States Patent
Application |
20050172291 |
Kind Code |
A1 |
Das, Rajarshi ; et
al. |
August 4, 2005 |
Method and apparatus for utility-based dynamic resource allocation
in a distributed computing system
Abstract
In one embodiment, the present invention is a method for
allocation of finite computational resources amongst multiple
entities, wherein the method is structured to optimize the business
value of an enterprise providing computational services. One
embodiment of the inventive method involves establishing, for each
entity, a service level utility indicative of how much business
value is obtained for a given level of computational system
performance. The service-level utility for each entity is
transformed into a corresponding resource-level utility indicative
of how much business value may be obtained for a given set or
amount of resources allocated to the entity. The resource-level
utilities for each entity are aggregated, and new resource
allocations are determined and executed based upon the
resource-level utility information. The invention is thereby
capable of making rapid allocation decisions, according to
time-varying need or value of the resources by each of the
entities.
Inventors: |
Das, Rajarshi; (New
Rochelle, NY) ; Kephart, Jeffrey Owen; (Cortlandt
Manor, NY) ; Tesauro, Gerald James;
(Croton-on-Hudson, NY) ; Walsh, William Edward;
(New York, NY) |
Correspondence
Address: |
Moser, Patterson & Sheridan
Suite 100
595 Shrewsbury Avenue
Shrewsbury
NJ
07702
US
|
Family ID: |
34807905 |
Appl. No.: |
10/768563 |
Filed: |
January 30, 2004 |
Current U.S.
Class: |
718/104 |
Current CPC
Class: |
G06F 9/5083 20130101;
G06F 9/5027 20130101 |
Class at
Publication: |
718/104 |
International
Class: |
G06F 009/46 |
Claims
1. An automated method for allocating resources among a plurality
of resource-using computational entities in a data processing
system, the method comprising: establishing a service-level utility
for each of said plurality of resource-using entities; and
transforming said service-level utility into a resource-level
utility for each of said plurality of resource-using entities.
2. The method of claim 1, wherein the service-level utility is
representative of an amount of business value obtained by each of
said plurality of resource-using entities for various levels of
performance and demand associated with the resource-using
entity.
3. The method of claim 1, wherein the resource-level utility is
representative of an amount of business value obtained by each of
said plurality of resource-using entities when a quantity of said
resources is allocated to the resource-using entity.
4. The method of claim 1, further comprising the steps of:
aggregating said resource-level utilities of all of said plurality
of resource-using entities; and computing a resource allocation
from the aggregated utility information.
5. The method of claim 4, further comprising the step of: executing
and conveying to the plurality of resource-using entities said
resource allocation.
6. The method of claim 1, wherein at least one of said plurality of
resource-using entities operates to set its internal parameters, or
an adjustable parameter of the resources the resource-using entity
has been allocated so as to optimize the service-level utility, the
resource-level utility, or both.
7. The method of claim 3, wherein the resource-level utility
indicates, for at least one of said plurality of resource-using
entities, a current utility based on current state descriptions of
said at least one resource-using entity.
8. The method of claim 3, wherein the resource-level utility
indicates, for at least one of said plurality of resource-using
entities, an estimated cumulative discounted or undiscounted future
utility starting from current state descriptions of said at least
one resource-using entity.
9. The method of claim 8, wherein the estimated cumulative
discounted or undiscounted future utility is based, for at least
one of said plurality of resource-using entities, upon predictions
of future state descriptions of said at least one resource-using
entity.
10. The method of claim 8, wherein the estimated cumulative
discounted or undiscounted future utility is trained on a temporal
sequence of observed data using an adaptive machine learning
procedure.
11. The method of claim 10, wherein the machine learning procedure
is a reinforcement learning procedure.
12. The method of claim 11, wherein the reinforcement learning
procedure is Q-Learning, Temporal Difference Learning, R-Learning
or SARSA.
13. The method of claim 8, wherein the estimated cumulative
discounted or undiscounted future utility is trained on a temporal
sequence of observed data using a time-series prediction
method.
14. The method of claim 4, wherein the step of aggregating said
resource-level utilities of all of said plurality of resource-using
entities is initiated by said plurality of resource-using
entities.
15. The method of claim 4, wherein the step of aggregating said
resource-level utilities of all of said plurality of resource-using
entities is initiated by at least one resource arbiter adapted to
compute said resource allocation from the aggregated utility
information.
16. The method of claim 4, wherein the step of computing a resource
allocation from the aggregated utility information comprises
executing an optimization method to maximize a total utility of
said data processing system.
17. The method of claim 16, wherein said optimization method
comprises a standard linear or nonlinear algorithm.
18. The method of claim 17, wherein said optimization method is
hill climbing, simulated annealing, linear programming or
mixed-integer programming.
19. The method of claim 4, wherein the step of computing a resource
allocation from the aggregated utility information comprises
computing a cost that may be incurred in reallocating at least one
of said resources from one of said plurality of resource-using
entities to another.
20. The method of claim 1, wherein the resource-level utility is a
function of client demand received by one of said plurality of
resource-using entities and of a service-level agreement governing
the performance of said one of said plurality of resource-using
entities.
21. A computer readable medium containing an executable program for
allocating resources among a plurality of resource-using
computational entities in a data processing system, where the
program performs the steps of: establishing a service-level utility
for each of said plurality of resource-using entities; and
transforming said service-level utility into a resource-level
utility for each of said plurality of resource-using entities.
22. The computer readable medium of claim 21, wherein said program
further performs the steps of: aggregating said resource-level
utilities of all of said plurality of resource-using entities; and
computing a resource allocation from the aggregated utility
information.
23. The computer readable medium of claim 22, wherein said program
further performs the step of: executing and conveying to the
plurality of resource-using entities said resource allocation.
24. The computer readable medium of claim 21, wherein at least one
of said plurality of resource-using entities operates to set its
internal parameters, or an adjustable parameter of the resources
the resource-using entity has been allocated so as to optimize the
service-level utility, the resource-level utility, or both.
25. The computer readable medium of claim 21, wherein the
resource-level utility indicates an estimated cumulative discounted
or undiscounted future utility starting from current state
descriptions of said plurality of resource-using entities.
26. The computer readable medium of claim 21, wherein the
resource-level utility indicates a current utility based on current
state descriptions of said plurality of resource-using
entities.
27. The computer readable medium of claim 21, wherein the
resource-level utility indicates an estimated cumulative discounted
or undiscounted future utility starting from current state
descriptions of said plurality of resource-using entities.
28. The computer readable medium of claim 23, wherein the step of
computing a resource allocation from the aggregated utility
information comprises executing an optimization algorithm to
maximize a business value of said data processing system.
29. A data processing system, comprising: a plurality of entities
adapted for processing client demands; a plurality of resources
adapted for allocation to said plurality of entities; and at least
one resource arbiter adapted for allocating said plurality of
resources among said plurality of entities in a manner that
optimizes a business value of the data processing system.
30. The data processing system of claim 25, wherein said plurality
of entities are further adapted for transforming a respective
service-level utility function into a corresponding service-level
utility function.
31. The data processing system of claim 25, wherein said plurality
of entities and said at least one resource arbiter are run on a
single computer.
32. The data processing system of claim 25, wherein said plurality
of entities and said at least one resource arbiter are run on
different computers connected by a network.
33. The data processing system of claim 25, wherein said plurality
of entities and said at least one resource arbiter are software
modules comprising autonomic elements.
34. The data processing system of claim 25, wherein the data
processing system is a server, a client computer or a network.
Description
BACKGROUND
[0001] The present invention relates generally to data processing
systems, and relates more particularly to the management of
hardware and software components of data processing systems.
Specifically, the present invention provides a method and apparatus
for automatic allocation of computing resources amongst multiple
entities that obtain value by utilizing the resources to perform
computation.
[0002] The problem of how to optimally allocate a limited set of
resources amongst multiple entities that use or consume the
resources has been extensively studied in disciplines including
economics, manufacturing, telecommunications networks, and
computing systems. Within the latter domain, the recent evolution
of highly interconnected, rapidly changing, distributed computing
systems such as the Internet has made it increasingly important to
be able to rapidly compute and execute resource allocation
decisions in an automated fashion.
[0003] Traditional approaches to provisioning and capacity planning
typically aim to achieve an extremal value of some overall system
performance metric (e.g., maximum average throughput or minimum
average response time). Other conventional techniques employ
market-based mechanisms for resource allocation (e.g., auction
bidding or bilateral negotiation mechanisms). For example, a
commonly used approach has been to anticipate the maximum possible
load on the system, and then perform one-time static allocation of
resources capable of handling the maximum load within a specified
margin of safety. A common problem with such approaches is that,
with modern workloads such as hit rates on Web pages, the demand
rate may vary dynamically and rapidly over many orders of
magnitude, and a system that is statically provisioned for its peak
workload may spend nearly all its time sitting idle.
[0004] Thus, there is a need in the art for a method and apparatus
for dynamic resource allocation in distributed computing
systems.
SUMMARY OF THE INVENTION
[0005] In one embodiment, the present invention is a method for
optimal and automatic allocation of finite resources (e.g.,
hardware or software that can be used within any overall process
that performs computation) amongst multiple entities that can
provide computational services given the resource(s). One
embodiment of the inventive method involves establishing, for each
entity, a service level utility indicative of how much business
value is obtained for a given level of computational system
performance and for a given level of demand for computing service.
Each entity is capable of transforming its respective service-level
utility into a corresponding resource-level utility indicative of
how much business value may be obtained for a given set or amount
of resources allocated to the entity. The resource-level utilities
for each entity are aggregated, and resource allocations are
subsequently determined and executed based upon the dynamic
resource-level utility information established. The invention is
thereby capable of making rapid allocation decisions, according to
time-varying need or value of the resources by each of the
entities. In addition, the inventive method is motivated by the
perspective of an enterprise comprising multiple entities that use
said finite computational resources to provide service to one or
more customers, and is thus structured to optimize the business
value of the enterprise.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] So that the manner in which the above recited embodiments of
the invention are attained and can be understood in detail, a more
particular description of the invention, briefly summarized above,
may be obtained by reference to the embodiments thereof which are
illustrated in the appended drawings. It is to be noted, however,
that the appended drawings illustrate only typical embodiments of
this invention and are therefore not to be considered limiting of
its scope, for the invention may admit to other equally effective
embodiments.
[0007] FIG. 1 is a diagram of a networked data processing system in
which the present invention may be implemented;
[0008] FIG. 2 is an overall view of a resource allocation system in
accordance with one embodiment of the present invention;
[0009] FIG. 3 is a flow chart illustrating one embodiment of a
method for dynamically allocating resources among multiple
application environments;
[0010] FIG. 4 is a diagram illustrating the detailed functionality
of an application environment module which constitutes a component
of the overall system shown in FIG. 2; and
[0011] FIG. 5 is a high level block diagram of the present
invention implemented using a general purpose computing device.
[0012] To facilitate understanding, identical reference numerals
have been used, where possible, to designate identical elements
that are common to the figures.
DETAILED DESCRIPTION
[0013] In one embodiment, the present invention is a method for
optimal and automatic allocation of finite resources amongst
multiple entities that can perform computational work given the
resource(s). For the purposes of the present invention, the term
"resource" may indicate an entire hardware or software component
(e.g., a compute server, a storage device, a RAM circuit or a
database server), or a portion of a component (e.g., bandwidth
access or a fraction of a server). The method may be implemented,
for example, within a data processing system such as a network, a
server, or a client computer. The invention is capable of making
allocation decisions in real time, according to time-varying need
or value of the resources by each of the entities, thereby
resolving the shortcomings associated with typical static resource
allocation techniques. In addition, the method is structured to
optimize the business value of an enterprise that provides
computing services to multiple entities using said finite
computational resources.
[0014] FIG. 1 is a schematic illustration of one embodiment of a
network data processing system 100 comprising a network of
computers (e.g., clients) in which the present invention may be
implemented. The network data processing system 100 includes a
network 102, a server 104, a storage unit 106 and a plurality of
clients 108, 110 and 112. The network 102 is the medium used to
provide communications links between the server 104, storage unit
106 and clients 108, 110, 112 connected together within network
data processing system 100. The network 102 may include
connections, such as wire, wireless communication links, or fiber
optic cables.
[0015] In the embodiment illustrated, the server 104 provides data,
such as boot files, operating system images, and applications to
the clients 108, 110, 112 (i.e., the clients 108, 110, and 112 are
clients to server 104). The clients 108, 110, and 112 may be, for
example, personal computers or network computers. Although the
network data processing system 100 depicted in FIG. 1 comprises a
single server 104 and three clients, 108, 110, 112, those skilled
in the art will recognize that the network data processing system
100 may include additional servers, clients, and other devices not
shown in FIG. 1.
[0016] In one embodiment, the network data processing system 100 is
the Internet, with the network 102 representing a worldwide
collection of networks and gateways that use the Transmission
Control Protocol/Internet Protocol (TCP/IP) suite of protocols to
communicate with one another. In further embodiments, the network
data processing system 100 is implemented as an intranet, a local
area network (LAN), or a wide area network (WAN). Furthermore,
although FIG. 1 illustrates a network data processing system 100 in
which the method of the present invention may be implemented, those
skilled in the art will realize that the present invention may be
implemented in a variety of other data processing systems,
including servers (e.g., server 104) and client computers (e.g.,
clients 108, 110, 112). Thus, FIG. 1 is intended as an example, and
not as an architectural limitation for the present invention.
[0017] FIG. 2 is a schematic illustration of one embodiment of a
data center 200 for executing the method of the present invention.
The data center 200 comprises a plurality of application
environment modules 201, 202, and 203, one or more resource
arbiters 204 and a plurality of resources 205, 206, 207, 208 and
209. Each application environment module 201-203 is responsible for
handling respective demands 213, 214 and 215 (e.g., requests for
information processing services) that may arrive from a particular
customer or set of clients (e.g., clients 108-112 in FIG. 1).
Example client types include: online shopping services, online
trading services, and online auction services.
[0018] In order to process client demands 213, 214 or 215, the
application environments 201-203 may utilize the resources 205-209
within the data center 200. As each application environment 201-203
is independent from the others and provides different services,
each application environment 201-203 has its own set of resources
205-209 at its disposal, the use of which must be optimized to
maintain the appropriate quality of service (QoS) level for the
application environment's clients. An arrow from an application
environment 201-203 to a resource 205-209 denotes that the resource
205-209 is currently in use by the application environment 201-203
(e.g., in FIG. 2, resource 205 is currently in use by application
environment 201). An application environment 201-203 also makes use
of data or software objects, such as respective Service Level
Agreements (SLAs) 210, 211 and 212 with its clients, in order to
determine its service-level utility function U(S,D). An example SLA
210-212 may specify payments to be made by the client based on mean
end-to-end response time averaged over, say, a five-minute time
interval. Additionally the client workload may be divided into a
number of service classes (e.g., Gold, Silver and Bronze), and the
SLA 210-212 may specify payments based on details of response time
characteristics within each service class.
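As a rough illustration of how an SLA of this kind could induce a service-level utility U(S, D), the Python sketch below maps a mean response time per service class to a payment. The class names come from the text above; the response-time targets, payment and penalty amounts, the per-request payment structure, and the name `sla_utility` are hypothetical and are not taken from the application.

```python
# Hypothetical SLA terms per service class:
# (response-time target in seconds, payment per request if met, penalty per request if missed).
# All numbers are illustrative assumptions.
SLA_TERMS = {
    "Gold": (0.5, 10.0, -5.0),
    "Silver": (1.0, 5.0, -1.0),
    "Bronze": (2.0, 1.0, 0.0),
}


def sla_utility(mean_response_time, demand):
    """Sketch of a service-level utility U(S, D).

    mean_response_time: dict mapping class name -> mean response time
        averaged over the measurement interval (the vector S).
    demand: dict mapping class name -> number of requests in the interval
        (the vector D).
    """
    total = 0.0
    for cls, (target, payment, penalty) in SLA_TERMS.items():
        requests = demand.get(cls, 0)
        if requests == 0:
            continue  # no demand in this class, so no payment either way
        met = mean_response_time[cls] <= target
        total += requests * (payment if met else penalty)
    return total
```

Because every class is valued in the same monetary unit, utilities computed this way for different application environments would share a common scale.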
[0019] Each application environment 201-203 is in further
communication with the resource arbiter module 204. Although the
data center 200 illustrated in FIG. 2 utilizes only one resource
arbiter 204, those skilled in the art will appreciate that multiple
resource arbiters may be implemented in the data center 200. The
resource arbiter 204 is responsible for deciding, at any given time
while the data center 200 is in operation, which resources 205-209
may be used by which application environments 201-203. In one
embodiment, the application environments 201-203 and resource
arbiter 204 are software modules consisting of autonomic elements
(e.g., software components that couple conventional computing
functionality with additional self-management capabilities), for
example written in Java™, and communication between modules
201-203 and 204 takes place using standard Java interfaces. The
modules 201-203 and 204 may run on a single computer or on
different computers connected by a network such as the Internet or
a Local Area Network (LAN), e.g., as depicted in FIG. 1. In the
networked case, communication may additionally employ standard
network communication protocols such as TCP/IP and HTTP, and
standard Web interfaces such as OGSA.
[0020] FIG. 3 is a flow chart illustrating the method 300 by which
the resource arbiter 204 makes resource allocation decisions.
Referring simultaneously to FIGS. 2 and 3, the method 300 is
initialized at block 302 and proceeds to block 304, where the
method 300 establishes a service-level utility function U(S, D) for
each application environment 201-203. In one embodiment, the
variable S is a vector that characterizes the multiple performance
measures for multiple service classes, and the variable D is a
vector that characterizes the demand. The service-level utility
indicates how much business value U is obtained by the application
environment 201, 202 or 203 for various levels S of computational
system performance, and for a given level D of demand 213-215 for
computing service.
[0021] In one embodiment, the service-level utility function U(S,
D) is established by the application environment's SLA 210-212.
While each application environment's service-level utility may be
based on different performance metrics, all of the service-level
utility functions U(S, D) share a common scale of valuation.
[0022] In block 306, the method 300 transforms the service-level
utility function U(S, D) into a resource-level utility function
V(R) for each application environment 201-203. The resource-level
utility indicates how much business value V is obtained for a given
actual or hypothetical set or amount of resources R (e.g., selected
from resources 205-209) allocated to the application environment
201-203. In one embodiment, R is a vector. For example, the utility
information may express a utility curve V(m), the utility obtained
from being able to use m compute servers, at various values of m
ranging from 0 to the total number of compute servers within the
data center. Additionally if the servers are of different types,
the utility information may express the value of obtaining m
servers of type A, n servers of type B, etc. More generally the
utility information may express V({x}), the value of assigning a
particular collection or set {x} of resources 205-209, for various
sets {x} ranging over the power set of possible resources 205-209
that could be assigned to the application environment 201-203. The
utility information may be expressed, for example, in a
parameterized functional form, or it may also be expressed in terms
of values at a set of discrete points which may represent a subset
or complete set of all possible resource levels that could be
provided.
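One way to picture the discrete-point representation described above is the following Python sketch, which stores V(m) at a sampled subset of server counts. The class name `UtilityCurve` and the rule for unsampled values of m (reuse the value at the largest sampled count not exceeding m) are assumptions made for illustration; the application does not prescribe how to fill in unsampled points.

```python
from bisect import bisect_right


class UtilityCurve:
    """Resource-level utility V(m) stored at a set of discrete points."""

    def __init__(self, points):
        # points: dict mapping a server count m -> utility V(m)
        self.ms = sorted(points)
        self.vs = [points[m] for m in self.ms]

    def value(self, m):
        # Assumed rule: use the largest sampled count that does not exceed m.
        idx = bisect_right(self.ms, m) - 1
        if idx < 0:
            raise ValueError("m is below the smallest sampled resource level")
        return self.vs[idx]
```

For example, `UtilityCurve({0: 0.0, 2: 50.0, 5: 80.0}).value(3)` looks up the value sampled at m = 2.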
[0023] The transformation may additionally depend on a set of
variables describing the application environment's current state
(e.g., current demand 213-215, system load, throughput or average
response time), or on differences between a hypothetical resource
allocation R and the application environment's current resource
allocation R* (e.g., in a manner that reflects any costs associated
with switching the allocation from R* to R, including delays,
machine downtime, etc.). In one embodiment, the resource-level
utility function is calculated according to the relation
$V_i(R_i) = U_i(S_i, D_i, R_i)$ (EQN. 1)
[0024] such that $S_i \in S_i(R_i, D_i)$, where
$S_i(R_i, D_i)$ is a relation specifying the set of
service levels attainable with resources $R_i$ and demand
$D_i$. In one embodiment, the relation $S_i(R_i, D_i)$
is obtained by standard computer systems modeling techniques (e.g.,
queuing theory). In another embodiment, the relation
$S_i(R_i, D_i)$ may instead or additionally be refined by
training on a collection of observed system performance data
$\{(S_t, R_t, D_t)\}$ using standard machine learning
procedures (e.g., supervised learning methods employing standard
linear or nonlinear function approximators).
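The transformation in EQN. 1 can be sketched in Python under a deliberately simple queuing assumption: demand is split evenly across m identical servers, and each server behaves as an M/M/1 queue. That model, the threshold-style utility, and every name and number below are assumptions chosen for illustration; the application only states that standard modeling techniques such as queuing theory may be used.

```python
import math


def mean_response_time(m, arrival_rate, service_rate):
    """Attainable service level S for m servers under demand D (M/M/1 per server)."""
    if m == 0:
        return math.inf
    per_server = arrival_rate / m  # assumed even split of demand
    if per_server >= service_rate:
        return math.inf  # queue is unstable, response time grows without bound
    return 1.0 / (service_rate - per_server)


def service_utility(response_time, arrival_rate, target=0.5, payment=1.0, penalty=-0.5):
    """Hypothetical U(S, D): pay per request if the target is met, otherwise penalize."""
    rate = payment if response_time <= target else penalty
    return rate * arrival_rate


def resource_utility(m, arrival_rate, service_rate):
    """V(m) = U(S(m, D), D): compose the performance model with the utility."""
    s = mean_response_time(m, arrival_rate, service_rate)
    return service_utility(s, arrival_rate)
```

Evaluating `resource_utility` for m from 0 up to the number of servers in the data center would yield a utility curve that an application environment could report to the arbiter.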
[0025] In one embodiment, the resource-level utility function V(R)
estimates the current value of the current state. In another
embodiment, the resource-level utility function estimates the
expected cumulative discounted or undiscounted future value
starting from the current state. In one embodiment, any one or more
of a number of standard methodologies may be employed in the
process of estimating expected future value, including prediction
and forecasting methodologies such as time-series prediction
methods and machine learning methodologies such as reinforcement
learning algorithms (e.g., Q-Learning, Temporal Difference
Learning, R-Learning or SARSA).
[0026] In block 308, the method 300 communicates the respective
resource-level utility functions for each application environment
201-203 to the resource arbiter 204 and aggregates all
resource-level utility functions. In one embodiment, while the data center
200 is running, from time to time each application environment
201-203 communicates to the resource arbiter 204 information
regarding its current resource-level utility function. Said
communication may take place either synchronously or
asynchronously, and may be initiated by the application
environments 201-203, or may be in response to a prompt or query
issued by the resource arbiter 204.
[0027] In block 310, the method 300, having received resource-level
utility information from each application environment 201-203,
combines said utility information and thereupon decides how to
assign each available resource 205-209 in the data center 200, in a
manner that optimizes the total utility obtained. In other words,
the resource arbiter 204 maximizes the sum of the resource-level
utilities, $\max_{R \in \mathcal{R}} \sum_i V_i(R_i)$.
[0028] Said resource assignment may include the possibility of a
null assignment, (i.e., the resource 205-209 is not assigned to any
application environment 201-203) so that the resource 205-209 may
be kept in reserve to handle future workload. For example, in the
case of undifferentiated compute servers within the data center
200, the resource arbiter 204 may utilize the most recent utility
curves from each application environment 201-203 ($V_1(m)$,
$V_2(m)$ and $V_3(m)$ respectively), and then compute an
integral number of servers $(m_1, m_2, m_3)$ to assign to
each application environment 201-203 so as to maximize the total
$V_1(m_1) + V_2(m_2) + V_3(m_3)$. The
determination of an allocation that optimizes total utility will
generally be made by executing an optimization method. In one
embodiment, the values $(m_1, m_2, m_3)$ are found by
using standard linear or nonlinear algorithms such as hill
climbing, simulated annealing, linear programming, or mixed-integer
programming. Additionally, the objective function optimized by the
resource arbiter 204 may also include any switching costs that are
incurred when a particular resource 205-209 is reallocated from one
application environment 201-203 to another. Said switching costs
may include, for example, machine downtime and/or other costs
related to installing or removing data or software from the machine
when it is reallocated.
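For a small number of application environments and undifferentiated servers, the arbiter's optimization can be sketched as a plain exhaustive search in Python. This is a stand-in for the hill climbing or programming methods named above, not a reproduction of them; the function name `allocate`, the per-server switching cost, and the choice to charge that cost only for servers newly given to an environment are assumptions.

```python
from itertools import product


def allocate(curves, total_servers, current=None, switch_cost=0.0):
    """Search integer allocations (m_1, ..., m_n) for the highest total utility.

    curves: list of callables, one V_i(m) per application environment.
    current: optional current allocation, used to charge switching costs.
    Returns (best_allocation, best_value).
    """
    n = len(curves)
    best_alloc, best_value = None, float("-inf")
    for alloc in product(range(total_servers + 1), repeat=n):
        if sum(alloc) > total_servers:
            continue  # infeasible: more servers than the data center has
        # Allocations that sum to less than the total leave servers in reserve.
        value = sum(v(m) for v, m in zip(curves, alloc))
        if current is not None:
            # Assumed cost model: pay once for each server an environment gains.
            moved = sum(max(0, new - old) for new, old in zip(alloc, current))
            value -= switch_cost * moved
        if value > best_value:
            best_alloc, best_value = alloc, value
    return best_alloc, best_value
```

The search visits (total_servers + 1)^n candidates, so it is only practical for small cases; it is meant to make the objective concrete rather than to be efficient.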
[0029] In block 312, the method 300 executes the resource
allocation decision calculated in block 310, and communicates the
resource allocation decision to the application environments
201-203. In one embodiment, block 312 additionally involves the
causation of manipulations or operations performed upon the
resources 205-209, enabling the resources 205-209 to be used by the
application environments 201-203 to which the resources 205-209
have been assigned, or associated with de-allocating a resource
205-209 from an application environment 201-203 to which the
resource 205-209 is no longer assigned.
[0030] FIG. 4 is a schematic illustration of the basic operations
and functionality of one embodiment of an application environment
module 401 according to the present invention, wherein the
application environment module 401 is any of the application
environments 201-203 depicted in FIG. 2. In one embodiment, the
application environment module 401 comprises an autonomic manager
element 402, a workload router 403, and a system performance
monitoring element 404. Interactions of the application environment
401 with its SLA 410, its client demand 411, its currently
allocated resources (e.g., compute servers 420, 421, and 422), and
with the resource arbiter element 412, are depicted as they were in
FIG. 2.
[0031] While the application environment 401 is in operation, from
time to time client demand 411 is received and transmitted to the
router 403, which thereupon sends said demand 411 to one of the
assigned compute servers 420, 421, or 422, typically based on the
use of a routing or load-balancing method. As client jobs are
processed, their intermediate and final output are returned to the
submitting client. From time to time the performance monitor 404
may observe, request or receive information regarding measures or
statistics of the system performance of the compute servers
420-422, such as CPU/memory usage, average throughput, average
response time, and average queue depth. The autonomic manager 402
combines said performance measures with information regarding the
demand 411, the SLA 410, and the currently allocated resources
420-422, to produce an estimated resource-level utility
function.
[0032] In one embodiment, said utility function indicates V(m), the
value of being allocated an integral quantity m of undifferentiated
compute servers, with the value of m ranging from zero to the total
number of servers in the data center (e.g., data center 200 in FIG.
2). From time to time said utility function is transmitted to the
resource arbiter 412, possibly in response to a prompt or query
sent from the resource arbiter 412. From time to time said resource
arbiter 412 will additionally transmit to the application
environment 401 updated information regarding its set of allocated
resources. The updated information indicates, for example, that
certain compute servers 420-422 are newly available for usage, or
that certain compute servers 420-422 previously used by the
application environment 401 are to be de-allocated and are no
longer available for usage.
[0033] In another embodiment, the autonomic manager module 402 of
FIG. 4 further comprises a capability to model the effect of any
adjustable operational parameters the resources 420-422 may have
(e.g., maximum queue depth, buffer pool sizes, etc.) on the
observed system performance. The autonomic manager 402 further
operates to set said parameters of the resources 420-422, or of the
router 403, or other internal parameters, to values such that the
resulting system performance optimizes the
resource-level utility function.
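As a minimal sketch of such parameter setting, assume a single
server modeled as an M/M/1/K queue whose adjustable parameter is the
maximum queue depth K (counting the job in service), and assume a
hypothetical utility that pays a fixed price per completed job only
while the mean response time meets a target. The model, the utility,
and the names mm1k_metrics and tune_queue_depth are illustrative
assumptions and are not taken from the specification.

```python
def mm1k_metrics(arrival_rate, service_rate, max_queue_depth):
    """Return (throughput, mean response time) for an M/M/1/K queue.

    max_queue_depth is K >= 1, the most jobs the server may hold,
    including the job in service. Arrivals beyond K are rejected.
    """
    k = max_queue_depth
    rho = arrival_rate / service_rate
    if rho == 1.0:
        p_block = 1.0 / (k + 1)
        mean_jobs = k / 2.0
    else:
        p_block = (1 - rho) * rho**k / (1 - rho ** (k + 1))
        mean_jobs = rho / (1 - rho) - (k + 1) * rho ** (k + 1) / (1 - rho ** (k + 1))
    throughput = arrival_rate * (1 - p_block)
    # Little's law: mean response time = mean jobs in system / throughput.
    return throughput, mean_jobs / throughput


def tune_queue_depth(candidates, arrival_rate, service_rate, target_response, price):
    """Pick the candidate queue depth with the highest modeled utility."""

    def utility(k):
        throughput, response = mm1k_metrics(arrival_rate, service_rate, k)
        # Hypothetical SLA: paid per completed job only if the target is met.
        return price * throughput if response <= target_response else 0.0

    best = max(candidates, key=utility)
    return best, utility(best)


if __name__ == "__main__":
    print(tune_queue_depth(range(1, 21), 8.0, 10.0, 0.3, 1.0))
```

In this sketch a deeper queue rejects fewer jobs but lengthens the
mean response time, so the search settles on the deepest queue that
still meets the response-time target.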
[0034] In another embodiment of the invention, the autonomic
manager module 402 of FIG. 4 further comprises a capability to
model or predict the demand at future times given the observed
current demand 411, and a capability to model or predict the system
performance at future times given the current demand 411, current
performance, and future allocated resources, which may be the same
as or different from the currently allocated resources 420-422. The
autonomic manager 402 then computes a resource-level utility
function indicating the cumulative discounted or undiscounted
future utility associated with a hypothetical resource allocation
made at the current time. In one embodiment, the predicted demand
and predicted system performance are deterministic predictions at
each future time. In another embodiment, the predicted demand and
predicted system performance are probability distributions over
possible levels of demand or performance at each future time. In
one embodiment, the cumulative future utility is obtained by
summation over a finite number of discrete future time steps. In
another embodiment, the cumulative future utility is obtained by
integration over a continuous future time interval.
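The discrete-time summation may be sketched as follows for both the
deterministic and the probabilistic case. The sketch assumes, for
simplicity, that the hypothetical allocation m is held fixed over
the horizon and that a per-step utility U(m, d) is available; the
function names, the discount factor argument, and the example
utility are assumptions made for illustration.

```python
def cumulative_utility(m, demand_forecast, utility, discount=1.0):
    """Sum discount**t * U(m, d_t) over a deterministic demand forecast.

    discount=1.0 gives the undiscounted cumulative utility.
    """
    return sum(discount**t * utility(m, d) for t, d in enumerate(demand_forecast))


def expected_cumulative_utility(m, demand_distributions, utility, discount=1.0):
    """Same summation when each step's demand is a probability distribution.

    demand_distributions is a list of dicts mapping demand level to its
    probability at that time step.
    """
    total = 0.0
    for t, distribution in enumerate(demand_distributions):
        expected = sum(p * utility(m, d) for d, p in distribution.items())
        total += discount**t * expected
    return total


def served_demand(m, demand, capacity_per_server=10, price=1.0):
    """Hypothetical per-step utility: revenue from the demand actually served."""
    return price * min(demand, m * capacity_per_server)


if __name__ == "__main__":
    # Evaluate each hypothetical allocation m against a three-step forecast.
    forecast = [10, 20, 30]
    for m in range(4):
        print(m, cumulative_utility(m, forecast, served_demand, discount=0.5))
```

Evaluating the summation for every candidate m yields a
resource-level utility curve of the kind described in connection
with V(m).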
[0035] In another embodiment of the invention, the autonomic
manager module 402 of FIG. 4 does not explicitly predict future
demand or future system performance, but instead uses machine
learning procedures to estimate cumulative discounted or
undiscounted future utility from a temporal sequence of observed
data points, each data point consisting of: an observed demand, an
observed system performance, an observed resource allocation, and
an observed payment as specified by the SLA 410. In one embodiment,
the machine learning procedure consists of a standard reinforcement
learning procedure such as Q-Learning, Temporal Difference
Learning, R-Learning or SARSA.
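A minimal tabular Q-Learning sketch over such a sequence of data
points is given below. It assumes that demand and system performance
have already been discretized into a small number of levels, that
the state is the pair (demand, performance), that the action is the
resource allocation, and that the observed payment serves as the
reward for that step. These choices, the function name, and the
learning rate and discount values are assumptions made for
illustration only.

```python
from collections import defaultdict


def q_learning(data_points, allocations, alpha=0.1, gamma=0.9, sweeps=1):
    """Estimate cumulative discounted future utility from observed data.

    data_points is a temporal sequence of tuples
    (demand, performance, allocation, payment). The returned table maps
    ((demand, performance), allocation) to the estimated utility.
    """
    q = defaultdict(float)
    for _ in range(sweeps):
        # Pair each data point with its successor to form one transition.
        for current, following in zip(data_points, data_points[1:]):
            demand, performance, allocation, payment = current
            next_state = (following[0], following[1])
            best_next = max(q[(next_state, a)] for a in allocations)
            key = ((demand, performance), allocation)
            # Standard Q-Learning update toward payment + gamma * best_next.
            q[key] += alpha * (payment + gamma * best_next - q[key])
    return q


if __name__ == "__main__":
    # Hypothetical observations: (demand, performance, allocation, payment).
    history = [
        ("high", "slow", 1, 0.0),
        ("high", "fast", 2, 5.0),
        ("low", "fast", 1, 3.0),
        ("high", "slow", 1, 0.0),
        ("high", "fast", 2, 5.0),
    ]
    table = q_learning(history, allocations=[1, 2], sweeps=50)
    state = ("high", "fast")
    print({m: table[(state, m)] for m in [1, 2]})
```

For a given state, reading the table across all allocations m gives
an estimate of the resource-level utility without an explicit model
of future demand or performance.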
[0036] FIG. 5 is a high level block diagram of the present dynamic
resource allocation system that is implemented using a general
purpose computing device 500. In one embodiment, a general purpose
computing device 500 comprises a processor 502, a memory 504, a
dynamic resource allocator or module 505 and various input/output
(I/O) devices 506 such as a display, a keyboard, a mouse, a modem,
and the like. In one embodiment, at least one I/O device is a
storage device (e.g., a disk drive, an optical disk drive, a floppy
disk drive). It should be understood that the dynamic resource
allocator 505 can be implemented as a physical device or subsystem
that is coupled to a processor through a communication channel.
[0037] Alternatively, the dynamic resource allocator 505 can be
represented by one or more software applications (or even a
combination of software and hardware, e.g., using Application
Specific Integrated Circuits (ASIC)), where the software is loaded
from a storage medium (e.g., I/O devices 506) and operated by the
processor 502 in the memory 504 of the general purpose computing
device 500. Thus, in one embodiment, the resource allocator 505 for
allocating resources among entities described herein with reference
to the preceding Figures can be stored on a computer readable
medium or carrier (e.g., RAM, magnetic or optical drive or
diskette, and the like).
[0038] The functionalities of the arbiters and the application
environments described with reference to FIGS. 2 and 4 may be
performed by software modules of various types. For example, in one
embodiment, the arbiters and/or application environments comprise
autonomic elements. In another embodiment, the arbiters and/or
application environments comprise autonomous agent software as may
be constructed, for example, using the Agent Building and Learning
Environment (ABLE). The arbiters and/or application environments
may all run on a single computer, or they may run independently on
different computers. Communication between the arbiters and the
application environments may take place using standard interfaces
and communication protocols. In the case of arbiters and
application environments running on different computers, standard
network interfaces and communication protocols may be employed,
such as Web Services interfaces (e.g., those employed in the Open
Grid Services Architecture (OGSA)).
[0039] Thus, the present invention represents a significant
advancement in the field of dynamic resource allocation. A method
and apparatus are provided that enable a finite number of resources
to be dynamically allocated among a number of entities or
application environments capable of performing computational work
given the resources. The allocation is performed in a manner that
optimizes the business value of the enterprise providing the
computing services to a number of clients.
[0040] While the foregoing is directed to the preferred embodiment of
the present invention, other and further embodiments of the
invention may be devised without departing from the basic scope
thereof, and the scope thereof is determined by the claims that
follow.
* * * * *