U.S. patent application number 11/473818 was filed with the patent office on 2007-12-27 for dynamic application instance placement in data center environments.
This patent application is currently assigned to International Business Machines Corporation. Invention is credited to Constantin M. Adam, Michael Joseph Spreitzer, Malgorzata Steinder, Chunqiang Tang.
Application Number | 20070300239 11/473818 |
Document ID | / |
Family ID | 38874916 |
Filed Date | 2007-12-27 |
United States Patent
Application |
20070300239 |
Kind Code |
A1 |
Adam; Constantin M. ; et
al. |
December 27, 2007 |
Dynamic application instance placement in data center
environments
Abstract
Techniques are disclosed for determining placements of
application instances on computing resources in a computing system
such that the application instances can be executed thereon. By way
of example, a method for determining an application instance
placement in a set of machines under one or more resource
constraints includes the following steps. An estimate is computed
of a value of the first metric that can be achieved by a current
application instance placement and a current application load
distribution. A new application instance placement and a new
application load distribution are determined, wherein the new
application instance placement and the new load distribution
optimize the first metric.
Inventors: |
Adam; Constantin M.; (New
York, NY) ; Spreitzer; Michael Joseph;
(Croton-on-Hudson, NY) ; Steinder; Malgorzata;
(Leonia, NJ) ; Tang; Chunqiang; (Ossining,
NY) |
Correspondence
Address: |
Ryan, Mason & Lewis, LLP
90 Forest Avenue
Locust Valley
NY
11560
US
|
Assignee: |
International Business Machines
Corporation
Armonk
NY
|
Family ID: |
38874916 |
Appl. No.: |
11/473818 |
Filed: |
June 23, 2006 |
Current U.S.
Class: |
719/320 |
Current CPC
Class: |
G06F 9/505 20130101;
G06F 9/5066 20130101 |
Class at
Publication: |
719/320 |
International
Class: |
G06F 9/44 20060101
G06F009/44 |
Claims
1. A method for determining an application instance placement in a
set of machines under one or more resource constraints, the method
comprising the steps of: computing an estimate of a value of a
first metric that can be achieved by a current application instance
placement and a current application load distribution; and
determining a new application instance placement and a new
application load distribution that optimizes the first metric.
2. The method of claim 1, wherein the determining step further
comprises the new application instance placement improving upon the
first metric and the new load distribution improving upon a second
metric.
3. The method of claim 1, wherein the determining step further
comprises shifting an application load.
4. The method of claim 3, wherein the determining step further
comprises changing the application instance placement without
pinning to determine a first candidate placement.
5. The method of claim 4, wherein the determining step further
comprises changing the application instance placement with pinning
to determine a second candidate placement.
6. The method of claim 5, wherein the determining step further
comprises selecting a best placement from the first candidate
placement and the second candidate placement as the new application
instance placement.
7. The method of claim 1, wherein the determining step is performed
multiple times.
8. The method of claim 1, further comprising the step of balancing
an application load across the set of machines.
9. The method of claim 1, wherein the first metric comprises a
total number of satisfied demands.
10. The method of claim 1, wherein the first metric comprises a
total number of placement changes.
11. The method of claim 1, wherein the first metric comprises an
extent to which an application load is balanced across the set of
machines.
12. The method of claim 1, wherein one of the one or more resource
constraints comprises a processing capacity.
13. The method of claim 1, wherein one of the one or more resource
constraints comprises a memory capacity.
14. The method of claim 1, wherein the second metric comprises a
degree of correlation between residual resources on each machine of
the set of machines.
15. The method of claim 1, wherein the second metric comprises a
number of underutilized application instances.
16. Apparatus for determining an application instance placement in
a set of machines under one or more resource constraints, the
apparatus comprising: a memory; and at least one processor coupled
to the memory and operative to: (i) compute an estimate of a value
of a first metric that can be achieved by a current application
instance placement and a current application load distribution; and
(ii) determine a new application instance placement and a new
application load distribution that optimizes the first metric.
17. An article of manufacture for determining an application
instance placement in a set of machines under one or more resource
constraints, comprising a machine readable medium containing one or
more programs which when executed implement the steps of: computing
an estimate of a value of a first metric that can be achieved by a
current application instance placement and a current application
load distribution; and determining a new application instance
placement and a new application load distribution that optimizes
the first metric.
Description
FIELD OF THE INVENTION
[0001] The present invention generally relates to computing systems
and, more particularly, to techniques for determining placements of
application instances on computing resources in a computing system
such that the application instances can be executed thereon.
BACKGROUND OF THE INVENTION
[0002] With the rapid growth of the Internet, many organizations
increasingly rely on web (i.e., World Wide Web) applications to
deliver critical services to their customers and partners. An
"application" generally refers to software code (e.g., one or more
programs) which perform one or more functions.
[0003] Over the course of a decade, web applications have evolved
from the early HyperText Transport Protocol (HTTP) servers that
only deliver static HyperText Markup Language (HTML) files, to the
current ones that run in sophisticated distributed environments,
e.g., Java 2 Enterprise Edition (J2EE), and provide a diversity of
services such as online shopping, online banking, and web search.
Modern Internet data centers may run thousands of machines to host
a large number of different web applications. Many web applications
are resource demanding and process client requests at a high rate.
Previous studies have shown that the web request rate is bursty in
nature and can fluctuate dramatically in a short period of time.
Therefore, it is not cost-effective to over provision data centers
in order to handle the potential peak demands of all the
applications.
[0004] To utilize system resources more effectively, modern web
applications typically run on top of a middleware system and rely
on it to dynamically allocate resources to meet the applications'
performance goals. "Middleware" generally refers to the software
layer that lies between the operating system and the applications.
Some middleware systems use a clustering technology to improve
scalability, availability and load balancing, by integrating
multiple instances of the same application, and presenting them to
the users as a single virtual application.
SUMMARY OF THE INVENTION
[0005] Principles of the invention provide techniques for
determining placements of application instances on computing
resources in a computing system such that the application instances
can be executed thereon.
[0006] By way of example, in one aspect of the invention, a method
for determining an application instance placement in a set of
machines under one or more resource constraints includes the
following steps. An estimate is computed of a value of the first
metric that can be achieved by a current application instance
placement and a current application load distribution. A new
application instance placement and a new application load
distribution are determined, wherein the new application instance
placement and the new application load distribution optimize the
first metric.
[0007] The determining step may further include the new application
instance placement improving upon the first metric and the new load
distribution improving upon a second metric. The determining step
may further include shifting an application load, changing the
application instance placement without pinning to determine a first
candidate placement, changing the application instance placement
with pinning to determine a second candidate placement, and
selecting a best placement from the first candidate placement and
the second candidate placement as the new application instance
placement. The determining step may be performed multiple
times.
[0008] The method may also include the step of balancing an
application load across the set of machines.
[0009] The first metric may include a total number of satisfied
demands, a total number of placement changes, or an extent to which
an application load is balanced across the set of machines.
[0010] One of the one or more resource constraints may include a
processing capacity or a memory capacity.
[0011] The second metric may include a degree of correlation
between residual resources on each machine of the set of machines,
or a number of underutilized application instances.
[0012] These and other objects, features and advantages of the
present invention will become apparent from the following detailed
description of illustrative embodiments thereof, which is to be
read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIG. 1 illustrates an example of clustered web applications,
according to an embodiment of the invention.
[0014] FIG. 2 illustrates a control loop and system for solving an
application placement problem, according to an embodiment of the
invention.
[0015] FIG. 3 illustrates symbols used in a description of an
application placement problem, according to an embodiment of the
invention.
[0016] FIG. 4 illustrates a high-level pseudo code implementation
of an application placement algorithm, according to an embodiment
of the invention.
[0017] FIG. 5 illustrates a max-flow problem for use in solving an
application placement problem, according to an embodiment of the
invention.
[0018] FIG. 6 illustrates a pseudo code implementation of an
application placement algorithm, according to an embodiment of the
invention.
[0019] FIG. 7 illustrates a pseudo code implementation of placement
changing function, according to an embodiment of the invention.
[0020] FIG. 8 illustrates a pseudo code implementation of load
shifting function, according to an embodiment of the invention.
[0021] FIG. 9 illustrates a graphical user interface for use with
an application placement algorithm, according to an embodiment of
the invention.
[0022] FIG. 10 illustrates a computing system for implementing an
application placement algorithm, according to an embodiment of the
invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0023] Illustrative principles of the invention will be explained
below in the context of an Internet-based/web application
environment. However, it is to be understood that the present
invention is not limited to such an environment. Rather, the
invention is more generally applicable to any data processing
environment in which it would be desirable to provide improved
processing performance.
[0024] In the illustrative description below, the following problem
is addressed. Given a set of machines (computing systems or
servers) and a set of web applications with dynamically changing
demands (e.g., the number of client requests for use of the
application), an application placement controller decides how many
instances to run for each application and where to put them (i.e.,
which machines to assign them to), while observing a variety of
resource constraints. "Instances" of an application generally refer
to identical copies of the application, but can also refer to
different or even overlapping parts of the application. This
problem is considered non-deterministic polynomial-time (NP) hard.
Illustrative principles of the invention propose an online
algorithm that uses heuristics to efficiently solve this problem.
The algorithm allows multiple applications to share a single
machine, and strives to maximize the total satisfied application
demand, to minimize the number of application starts and stops, and
to balance the load across machines. It is to be understood that
reasonable extensions of the proposed algorithm can also optimize
for other performance goals, for example, maximize or minimize
certain user specified utility functions.
[0025] FIG. 1 is an example of clustered web applications. System
100 includes one front-end request router 102, three back-end
computing nodes 104 (A, B, and C), and three applications 106 (x,
y, and z). The applications, for example, can be a catalog search
application, an order processing application, and an account
management application, for an online shopping site. Request router
102 receives external requests (from client devices, not shown) and
forwards them to the appropriate instances of the three
applications (106-x, 106-y, and 106-z). To achieve the quality of
service (QoS) goals of the applications, the request router may
implement functions such as admission control, flow control, and
load balancing.
[0026] Flow control and load balancing decide how to dynamically
allocate resources to the running application instances.
Illustrative principles of the invention address an equally
important problem. That is, given a set of machines with
constrained resources and a set of web applications with
dynamically changing demands, we determine how many instances to
run for each application and what machine to execute them on.
[0027] We call this problem dynamic application placement. We
assume that not every machine can run all the applications at the
same time due to limited resources such as memory.
[0028] Application placement is orthogonal to flow control and load
balancing, and the quality of a placement solution can have
profound impacts on the performance of the entire system (i.e., the
complete set of machines used for hosting applications). In FIG. 1,
suppose the request rate for application z suddenly surges.
Application z may not meet the demands even if all the resources of
machine C are allocated to application z. A middleware system then
may react by stopping application x on machines A and B, and using
the freed resources (e.g., memory) to start an instance of
application z on both A and B.
[0029] We illustratively formulate the application placement
problem as a variant of the Class Constrained Multiple-Knapsack
Problem (see, e.g., H. Shachnai and T. Tamir, "Noah's bagels--some
combinatorial aspects," In Proc. 1st Int. Conf. on Fun with
Algorithms, 1998; and H. Shachnai and T. Tamir, "On two
class-constrained versions of the multiple knapsack problem,"
Algorithmica, 29(3), pp. 442-467, 2001). Under multiple resource
constraints (e.g., CPU and memory) and application constraints
(e.g., the need for special hardware or software), an automated
placement algorithm strives to produce placement solutions that
optimize multiple objectives: (1) maximizing the total satisfied
application demand, (2) minimizing the total number of application
starts and stops, and (3) balancing the load across machines. It is
to be understood that we can also optimize for other objective
functions, for example, a user specified utility function.
[0030] The placement problem is NP hard. In one embodiment, the
invention provides an online heuristic algorithm that can produce
within 30 seconds high-quality solutions for hard placement
problems with thousands of machines and thousands of application.
This scalability is crucial for dynamic resource provisioning in
large-scale enterprise data centers. Compared with existing
algorithms, for systems with 100 machines or less, the proposed
algorithm is up to 134 times faster, reduces the number of
application starts and stops by up to a factor of 32, and satisfies
up to 25% more application demands.
[0031] The remainder of the detailed description is organized as
follows. Section I formulates the application placement problem.
Section II describes an illustrative placement algorithm.
I. Problem Formulation
[0032] FIG. 2 is a diagram of a control loop and system 200 for
solving the application placement problem. For brevity, we simply
refer to "application placement" as "placement" in the following
illustrative description. Placement controller 202 is the main
placement processing component of the control loop. The set of
machines (data center) 203 includes the machines for which
placement controller 202 determines application placement.
[0033] Inputs 204 to placement controller 202 include the current
placement of applications on machines (matrix I), the resource
capacity of each machine (CPU capacity vector .OMEGA. and memory
capacity vector .GAMMA.), the projected resource demand of each
application (CPU demand vector .omega. and memory demand vector
.gamma.), and the restrictions that specify whether a given
application can run on a given machine (matrix R), e.g., some
application may require machines with special hardware or software.
It is to be appreciated that such inputs are collected by auxiliary
components. That is, placement sensor 205 generates and maintains
current placement matrix I. Application demand estimator 206
generates and maintains the projected resource demand of each
application (CPU demand vector .omega. and memory demand vector
.gamma.). Configuration database 207 maintains the resource
capacity of each machine (CPU capacity vector .OMEGA. and memory
capacity vector .GAMMA.).
[0034] Taking inputs 204, placement controller 202 generates
outputs 208 including new placement matrix I and load distribution
matrix L. That is, placement controller 202 computes a new
placement solution (new matrix I) that optimizes certain objective
functions, and then passes the solution to placement executor 209
to start and stop application instances accordingly. The placement
executor schedules placement changes in such a way that they impose
minimum disturbances to the running system. Periodically every T
minutes, the placement controller produces a new placement solution
based on the current inputs. By way of example only, T=15 minutes
may be a default configuration.
[0035] Estimating application demands is a non-trivial task. In one
embodiment, we use online profiling and linear regression to
dynamically estimate the average CPU cycles needed to process one
web request for a given application. The product of the estimated
CPU cycles per request and the projected request rate gives the CPU
cycles needed by the application per second. However, it is to be
understood that other known techniques for estimating application
demand may be used.
[0036] The remainder of this section presents the formal
formulation of the illustrative placement problem. We first discuss
the system resources and application demands considered in the
placement problem. An application's demands for resources can be
characterized as either load-dependent or load-independent. A
running application instance's consumption of load-dependent
resources depends on the request rate. Examples of such resources
include CPU cycles and network bandwidth. A running application
instance also consumes some load-independent resources regardless
of the offered load, i.e., even if it processes no requests. An
example of such resources is the process control block (PCB)
maintained in the operating system kernel for each running
program.
[0037] In this embodiment, for practical reasons, we treat memory
as a load-independent resource, and conservatively estimate the
memory usage to ensure that every running application has
sufficient memory. It is assumed that the system includes a
component that dynamically estimates the upper limit of an
application's near-term memory usage based on a time series of its
past memory usage. Because the memory usage estimation is updated
dynamically, some load-dependent aspects of memory are indirectly
considered by the placement controller.
[0038] We treat memory as a load-independent resource for several
reasons. First, a significant amount of memory is consumed by an
application instance even if it receives no requests. Second,
memory consumption is often related to prior application usage
rather than its current load. For example, even in the presence of
a low load, memory usage may still be high as a result of data
caching. Third, because an accurate projection of future memory
usage is extremely difficult and many applications cannot run when
the system is out of memory, it is more reasonable to be
conservative in the estimation of memory usage, i.e., using the
upper limit instead of the average.
[0039] Among many load-dependent and load-independent resources, we
choose CPU and memory as the representative ones to be considered
by the placement controller, because we observe that they are the
most common bottleneck resources. For example, our experience shows
that many business J2EE applications require on average 1-2 GB
(gigabyte) real memory to run. For brevity, the description of the
algorithm only considers CPU and memory, but it is to be understood
that the algorithm can consider other types of resources as well.
For example, if the system is network-bounded, we can use network
bandwidth as the load-dependent resource, which introduces no
changes to the algorithm.
[0040] Next, we present the formal formulation of the placement
problem. FIG. 3 lists the symbols used in the description. The
inputs to the placement controller are the current placement matrix
I, the placement restriction matrix R, the CPU and memory capacity
of each machine (.OMEGA..sub.n and .GAMMA..sub.n), the CPU and
memory demand of each application (.omega..sub.m, and
.gamma..sub.m). Note that .omega..sub.m is application m's
aggregated CPU demand throughout the entire system (i.e., the
complete set of machines used for hosting applications), while
.gamma..sub.m is the memory requirement to run one instance of
application m. Due to special hardware or software requirements, an
application m may not be able to run on a machine n. This placement
restriction is represented as R.sub.m,n=0.
[0041] The outputs 208 of placement controller 202 are the updated
placement matrix I and the load distribution matrix L. Placement
executor 209 starts and stops application instances according to
the difference between the old and new placement matrices. The load
distribution matrix L is a byproduct. It helps verify the maximum
total application demand that can be satisfied by the new placement
matrix I. L may or may not be directly used by the placement
executor or the request router. The request router may dynamically
balance the load according to the real received demands rather than
the load distribution matrix L computed based on the projected
demands.
[0042] Placement controller 202 strives to find a placement
solution that maximizes the total satisfied application demand.
Again, it is to be understood that this is just one example of the
optimization goal. That is, principles of the invention may also be
used to optimize for other objective functions instead of
maximizing the total satisfied demand, for example, maximize
certain user-specified utility function. In addition, the placement
controller also tries to minimize the total number of application
starts and stops, because placement changes disturb the running
system and waste CPU cycles. In practice, many J2EE applications
take a few minutes to start or stop, and take some additional time
to warm up their data cache. The last optimization goal is to
balance the load across machines. Ideally, the utilization of
individual machines should stay close to the utilization p of the
entire system:
.rho. = .SIGMA. m .di-elect cons. M .SIGMA. n .di-elect cons. N L m
, n .SIGMA. n .di-elect cons. N .OMEGA. n ( 1 ) ##EQU00001##
[0043] As we are dealing with multiple optimization objectives, we
prioritize them in the formal problem statement below. Let I*
denote the old placement matrix, and I denote the new placement
matrix:
( i ) maximum m .di-elect cons. M n .di-elect cons. N L m , n ( 2 )
( ii ) minimize m .di-elect cons. M n .di-elect cons. N I m , n - I
m , n * ( 3 ) ( iii ) minimize n .di-elect cons. N .SIGMA. m
.di-elect cons. M L n , m .OMEGA. n - .rho. such that ( 4 )
.A-inverted. m .di-elect cons. M , .A-inverted. n .di-elect cons. N
I m , n = 0 or I m , n = 1 ( 5 ) .A-inverted. m .di-elect cons. M ,
.A-inverted. n .di-elect cons. N R m , n = 0 I m , n = 0 ( 6 )
.A-inverted. m .di-elect cons. M , .A-inverted. n .di-elect cons. N
I m , n = 0 L m , n = 0 ( 7 ) .A-inverted. m .di-elect cons. M ,
.A-inverted. n .di-elect cons. N L m , n .gtoreq. 0 ( 8 )
.A-inverted. n .di-elect cons. N m .di-elect cons. M .gamma. m I m
, n .ltoreq. .GAMMA. n ( 9 ) .A-inverted. n .di-elect cons. N m
.di-elect cons. M L m , n .ltoreq. .OMEGA. n ( 10 ) .A-inverted. n
.di-elect cons. M n .di-elect cons. N L m , n .ltoreq. w m ( 11 )
##EQU00002##
[0044] As mentioned above, this optimization problem is a variant
of the Class Constrained Multiple-Knapsack problem. It differs from
the prior formulation mainly in that it also minimizes the number
of placement changes. This problem is NP hard. In the next section,
we present an online heuristic algorithm for solving the
optimization problem.
II. Placement Algorithm
[0045] This section describes an illustrative embodiment of a
placement algorithm, which can efficiently find high-quality
placement solutions even under tight resource constraints. FIG. 4
shows a high-level pseudo code implementation of a placement
algorithm according to an embodiment of the invention. A more
complete version is illustrated in FIGS. 6, 7 and 8.
[0046] The core of the place ( ) function is a loop that
incrementally optimizes the placement solution. Inside the loop,
the algorithm first solves the max-flow problem (see, e.g., R. K.
Ahuja, T. L. Magnanti, and J. B. Orlin, editors, "Network Flows:
Theory, Algorithms, and Applications," Prentice Hall, N.J., 1993,
ISBN 1000499012) in FIG. 5 to compute the maximum total demand that
can be satisfied by the current placement matrix. The algorithm
then invokes the load_shifting ( ) subroutine to move load among
machines (without any placement changes) in preparation for
subsequent placement changes. Finally, the algorithm invokes the
placement_changing ( ) subroutine to start or stop application
instances in order to increase the total satisfied application
demand. Note that "placement change" and "load shifting" in the
algorithm description are all hypothetical. The real placement
changes are executed after the placement algorithm finishes. The
outputs of the placement algorithm are the updated placement matrix
I and the new load distribution matrix L. The load_shifting ( )
subroutine modifies only L whereas the placement_changing ( )
subroutine modifies both I and L.
[0047] Below, we first define some terms that will be used in the
algorithm description (subsection A), and then generally describe
key concepts of the algorithm (subsections B and C). Finally, we
describe in detail the load-shifting subroutine (subsection D), the
placement-changing subroutine (subsection E), and the full
placement algorithm (subsection F) that invokes the two
subroutines.
[0048] A. Definition of Terms
[0049] A machine is fully utilized if its residual CPU capacity is
zero (.OMEGA.*.sub.n=0); otherwise, it is underutilized. An
application instance is fully utilized if it runs on a fully
utilized machine. An instance of application m running on an
underutilized machine n is completely idle if it has no load
(L.sub.m,n=0); otherwise, it is underutilized. The load of an
underutilized instance of application m can be increased if
application m has a positive residual CPU demand
(.omega.*.sub.m>0). Note that the definition of a machine's
utilization is solely based on its CPU usage.
[0050] The CPU-memory ratio of a machine n is defined as its CPU
capacity divided by its memory capacity, i.e.,
.OMEGA..sub.n/.GAMMA..sub.n. Intuitively, it is harder to fully
utilize the CPU of machines with a high CPU-memory ratio. The
load-memory ratio of an instance of application m running on
machine n is defined as the CPU load of this instance divided by
its memory consumption, i.e., L.sub.m,n/.gamma..sub.m. Intuitively,
application instances with a higher load-memory ratio are more
useful.
[0051] B. Load Shifting
[0052] Solving the max-flow problem in FIG. 5 gives the maximum
total demand w that can be satisfied by the current placement
matrix I. Among many possible load distribution matrices L that can
meet this maximum demand w, we employ several load-shifting
heuristics to find the one that makes later placement changes
easier.
[0053] We classify the running instances of an application into
three categories: idle, underutilized, and fully utilized. The idle
instances are preferred candidates to be shut down. We opt for
leaving the fully utilized instances intact.
[0054] Through proper load shifting, we can ensure that every
application has at most one underutilized instance in the entire
system. Reducing the number of underutilized instances simplifies
the placement problem, because the heuristics to handle idle
instances and fully utilized instances are straightforward. The
issue of load balancing will be addressed separately in a later
stage of the algorithm.
[0055] We strive to co-locate the residual memory and the residual
CPU on the same machines so that the residual resources can be used
to start new application instances. For example, if one machine has
only residual CPU and another machine has only residual memory,
neither of them can accept new applications.
[0056] We strive to make idle application instances appear on the
machines with more residual memory. By shutting down the idle
instances, more memory will become available for hosting
applications with a high memory requirement.
[0057] C. Placement Changing
[0058] The load_shifting ( ) subroutine prepares the load
distribution in a way that makes later placement changes easier.
The placement_changing ( ) subroutine further employs several
heuristics to increase the total satisfied application demand, to
reduce placement changes, and to reduce computation time.
[0059] The algorithm walks through the underutilized machines
sequentially and makes placement changes to them one by one in an
isolated fashion. When working on a machine n, the algorithm is
only concerned with the state of machine n and the residual
application demands. The states of other machines do not directly
affect the current decision to be made for machine n. Moreover,
once the applications to run on machine n are decided, later
placement changes on other machines will not affect the decision
already made for machine n. This isolation of machines dramatically
reduces the complexity of the algorithm.
[0060] The isolation of machines, however, may lead to inferior
placement solutions. We address this problem by alternately
executing the load-shifting subroutine and the placement-changing
subroutine for multiple rounds. As a result, the residual
application demands released from the application instances stopped
in the previous round now have the opportunity to be allocated to
other machines in the later rounds.
[0061] When sequentially walking through the underutilized
machines, the algorithm considers machines with a relatively high
CPU-memory ratio first. Because it is harder to fully utilize these
machines' CPU, we prefer to process them first when we still have
abundant options.
[0062] When considering the applications to run on a machine, the
algorithm tries to find a combination of applications that lead to
the highest CPU utilization of this machine. It prefers to stop the
running application instances with a relatively low load-memory
ratio in order to accommodate new application instances.
[0063] To reduce placement changes, the algorithm does not allow
stopping application instances that already deliver a sufficiently
high load. We refer to these instances as pinned instances. The
intuition is that, even if we stop these instances on their hosting
machines, it is likely that we will start instances of the same
applications on other machines. The algorithm dynamically computes
the pinning threshold for each application.
[0064] D. Load-Shifting Subroutine
[0065] Given the current application demands, the placement
algorithm solves a max-flow problem to derive the maximum total
demand that can be satisfied by the current placement matrix I.
FIG. 5 is an example of this max-flow problem, in which we consider
four applications (w, x, y, and z) and three machines (A, B, and
C). Each application is represented as a node in the graph. Each
machine is also represented as a node. In addition, there are a
source node and a sink node. The source node has an outgoing link
to each application m, and the capacity of the link is the CPU
demand of the application (.omega..sub.m). Each machine n has an
outgoing link to the sink node, and the capacity of the link is the
CPU capacity of the machine (.OMEGA..sub.n). The last set of links
are between the applications and the machines that currently run
those applications. The capacity of these links is unlimited. In
FIG. 5, application x currently runs on machines A and B.
Therefore, x has two outgoing links: x.fwdarw.A and x.fwdarw.B.
[0066] When the load distribution problem is formulated as this
max-flow problem, the maximum volume of flows going from the source
node to the sink node is the maximum total demand w that can be
satisfied by the current placement matrix I. Efficient algorithms
to solve max-flow problems are well known (see, e.g., R. K. Ahuja,
T. L. Magnanti, and J. B. Orlin, editors, "Network Flows: Theory,
Algorithms, and Applications," Prentice Hall, N.J., 1993, ISBN
1000499012). If w equals to the total application demand, no
placement changes are needed. Otherwise, some placement changes are
made in order to satisfy more demands. Before doing so, the load
distribution matrix L produced by solving the max-flow problem in
FIG. 5 is first adjusted. A goal of this load shifting process is
to achieve the effects described above, for example, co-locating
the residual CPU and the residual memory on the machines.
[0067] The task of load shifting is accomplished by solving the
min-cost max-flow problem in FIG. 5. We sort all the machines in
increasing order of residual memory capacity .GAMMA.*.sub.n, and
associate each machine n with a rank r.sub.n that reflects its
position in this sorted list. The machine with rank 0 has the
smallest residual memory. In FIG. 5, the link between a machine n
and the sink node is associated with the cost r.sub.n. The cost of
all the other links is zero, which is not shown in the figure for
brevity. In this example, machine C has more residual memory than
machine A, and machine A has more residual memory that machine B.
Therefore, the links between the machines and the sink node have
costs r.sub.B=0, r.sub.A=1, and r.sub.c=2 respectively.
[0068] The load distribution matrix L produced by solving the
min-cost max-flow problem in FIG. 5 has the following properties:
(1) an application has at most one underutilized instance in the
entire system; (2) the residual memory and the residual CPU are
likely to co-locate on the same machines; and (3) the idle
application instances appear on the machines with relatively more
residual memory. That is, in the load distribution matrix L
produced by solving the min-cost max-flow problem in FIG. 5, an
application has at most one underutilized instance in the entire
system. Furthermore, in the load distribution matrix L produced by
solving the min-cost max-flow problem in FIG. 5, if application m
has one underutilized instance running on machine n, then (1)
application m's idle instances must run on machines whose residual
memory is larger than or equal to that of machine n; and (2)
application m's fully utilized instances must run on machines whose
residual memory is smaller than or equal to that of machine n. It
is to be appreciated that these properties make later placement
changes easier.
[0069] E. Placement-Changing Subroutine
[0070] The placement-changing subroutine takes as input the current
placement matrix I, the load distribution matrix L generated by the
load-shifting subroutine, and the residual application demands not
satisfied by L. It tries to increase the total satisfied
application demand by making some placement changes, for instance,
stopping idle application instances and starting useful ones.
Again, note that the "placement changes" in the algorithm
description are all hypothetical.
[0071] As shown in FIG. 4, the main structure of the
placement-changing subroutine includes three nested loops. The
outermost loop iterates over the machines and asks the intermediate
loop to generate a placement solution for one machine n at a time.
Suppose machine n currently runs c not-pinned application instances
(M.sub.1, M.sub.2, . . . , M.sub.c) sorted in increasing order of
load-memory ratio. The intermediate loop iterates over a variable j
(0.ltoreq.j.ltoreq.c). In iteration j, it stops on machine n the j
applications (M.sub.1, M.sub.2, . . . , M.sub.j) while keeping the
other running applications intact, and then asks the innermost loop
to find appropriate applications to consume machine n's residual
resources. The innermost loop walks through the residual
applications, and identifies those that can fit on machine n. As
the intermediate loop varies the number of stopped applications
from 0 to c, it collects c+1 different placement solutions for
machine n, among which it picks the best one as the final
solution.
[0072] In the rest of this subsection, we describe the three nested
loops in more detail.
[0073] The Outermost Loop. Before entering the outermost loop, the
algorithm first computes the residual CPU demand of each
application. We refer to the applications with a positive residual
CPU demand (i.e., w*.sub.m>0) as residual applications. The
algorithm inserts all the residual application into a
right-threaded AVL (Adelson-Velsky Landis) tree called
residual_app_tree. The applications in the tree are sorted in
decreasing order of residual demand. As the algorithm progresses,
the residual demand of applications may change, and the tree is
updated accordingly. The algorithm also keeps track of the minimum
memory requirement .gamma..sub.min of applications in the tree,
.gamma. min = min m .di-elect cons. residual_app _tree .gamma. m ,
( 12 ) ##EQU00003##
where .gamma..sub.m is the memory needed to run one instance of
application m. The algorithm uses .gamma..sub.m to speedup the
computation in the innermost loop. If a machine n's residual memory
is smaller than .GAMMA..sub.min (i.e.,
.GAMMA.*.sub.n<.gamma..sub.min), the algorithm can immediately
infer that this machine cannot accept any applications in the
residual_app_tree.
[0074] The algorithm excludes fully utilized machines from the
consideration of placement changes, and sorts the underutilized
machines in decreasing order of CPU-memory ratio. Starting from the
machine with the highest CPU-memory ratio, it enumerates each
underutilized machine, and asks the intermediate loop to compute a
placement solution for the machine. Because it is harder to fully
utilize the CPU of machines with a high CPU-memory ratio, we prefer
to process them first when we still have abundant options.
[0075] The Intermediate Loop. Taking as input the residual_app_tree
and a machine n given by the outermost loop, the intermediate loop
computes a placement solution for machine n. Suppose machine n
currently runs c not-pinned application instances. Application
instance pinning is described below. We can stop a subset of the c
applications, and use the residual resources to run other
applications. In total, there are 2.sup.c cases to consider. We use
a heuristic to reduce this number to c+1. Intuitively, we prefer to
stop the less "useful" application instances, i.e., those with a
low load-memory ratio (L.sub.m,n/.gamma..sub.m)
[0076] The algorithm first sorts the not-pinned application
instances on machine n in increasing order of load-memory ratio.
Let (M.sub.1, M.sub.2, . . . , M.sub.c) denote this sorted list.
The intermediate loop iterates over a variable j
(0.ltoreq.j.ltoreq.c). In iteration j, it stops on machine n the j
applications (M.sub.1, M.sub.2, . . . , M.sub.j) while keeping the
other running applications intact, and then asks the innermost loop
to find appropriate applications to consume machine n's residual
resources that become available after stopping the j applications.
As the intermediate loop varies the number of stopped applications
from 0 to c, it collects c+1 placement solutions, among which it
picks as the final solution the one that leads to the highest CPU
utilization of machine n.
[0077] We illustrate this through an example. Suppose machine n
currently runs three not-pinned application instances (M.sub.1,
M.sub.2, M.sub.3) sorted in increasing order of load-memory ratio.
Intuitively, M.sub.3 is more useful than M.sub.2, and M.sub.2 is
more useful than M.sub.1. The algorithm tries four placement
solutions. In solution 1, it stops none of M.sub.1, M.sub.2, and
M.sub.3. In solution 2, it stops M.sub.1 but keeps M.sub.2 and
M.sub.3. In solution 3, it stops M.sub.1 and M.sub.2,but keeps
M.sub.3. In solution 4, it stops M.sub.0, M.sub.1, and M.sub.2 .
For each solution, the innermost loop finds appropriate
applications to consume machine n's residual resources that become
available after stopping the applications. Among the four
solutions, the algorithm picks the best one as the final
solution.
[0078] The Innermost Loop. The intermediate loop changes the number
of applications to stop. The innermost loop uses machine n's
residual resources to run some residual applications. Recall that
the residual_app_tree is sorted in decreasing order of residual CPU
demand. The innermost loop iterates over the residual applications,
starting from the one with the largest residual demand. When an
application m is under consideration, it checks two conditions: (1)
if the restriction matrix R allows application m to run on machine
n, and (2) if machine n has sufficient residual memory to host
application m, (i.e., .gamma..sub.m.ltoreq..GAMMA.*.sub.n). If both
conditions are satisfied, it places application m on machine n, and
assigns as much load as possible to this instance until either
machine n's CPU is fully utilized or application m has no residual
demand. After this allocation, application m's residual demand
changes, and the residual_app_tree is updated accordingly.
[0079] The algorithm loops over the residual applications until
either: (1) all the residual applications have been considered
once; or (2) machine n's CPU becomes fully utilized; or (3) machine
n's residual memory is insufficient to host any residual
application (i.e., .GAMMA.*.sub.n<.gamma..sub.min, see Equation
12). Typically, after hosting a few residual applications, machine
n's residual memory quickly becomes too small to host more residual
applications. Therefore, the third condition helps reduce
computation time.
[0080] F. Full Placement Algorithm
[0081] While the placement algorithm is outlined in FIG. 4, a full
placement algorithm is illustrated in detail in FIGS. 6 through 8.
Namely, FIG. 6 illustrates pseudo code for the place function, FIG.
7 illustrates pseudo code for the placement changing function, and
FIG. 8 illustrates pseudo code for the load shifting function.
[0082] The placement algorithm incrementally optimizes the
placement solution in multiple rounds. In one round, it first
invokes the load-shifting subroutine and then invokes the
placement-changing subroutine. It repeats for up to K rounds, but
quits earlier it sees no improvement in the total satisfied
application demand after one round of execution. The last step of
the algorithm balances the load across machines. By way of example
only, we use the load-balancing component from an exiting algorithm
(A. Karve, T. Kimbrel, G. Pacifici, M. Spreitzer, M. Steinder, M.
Sviridenko, and A. Tantawi, "Dynamic Application Placement for
Clustered Web Applications," In the International World Wide Web
Conference (WWW), May 2006). However, other existing load balancing
techniques can be employed. Intuitively, when the algorithm has
choices, it moves the new application instances (started by the
placement-changing subroutine) among machines to balance the load,
while keeping the total satisfied demand and the number of
placement changes the same.
[0083] The placement algorithm deals with multiple optimization
objectives. In addition to maximizing the total satisfied demand,
it also strives to minimize placement changes, because they disturb
the running system and waste CPU cycles. In practice, many J2EE
applications take a few minutes to start or stop, and take some
additional time to warm up their data cache. The heuristic for
reducing unnecessary placement changes is not to stop application
instances whose load (in the load distribution matrix L) is above
certain threshold. We refer to them as pinned instances. The
intuition is that, even if we stop these instances on their hosting
machines, it is likely that we will start instances of the same
applications on other machines.
[0084] Each application m has its own pinning threshold
w.sub.m.sup.pin. If the value of the threshold is too low, the
algorithm may introduce many unnecessary placement changes. If it
is too high, the total satisfied demand may be low due to
insufficient placement changes. The algorithm computes the pinning
thresholds for all the applications from the information gathered
in a single dry-run invocation to the placement-changing
subroutine. The dry run pins no application instances. After the
dry run, the algorithm makes a second invocation to the
placement-changing subroutine, and requires pinning the application
instances whose load is higher than or equal to the pinning
threshold of the corresponding application, i.e.,
L.sub.m,n.gtoreq.w.sub.m.sup.pin. The dry run and the second
invocation use exactly the same inputs: the matrices I and L
produced by the load-shifting subroutine. Between the two placement
solutions produced by the dry run and the second invocation, the
algorithm picks as the final solution the one that has a higher
total satisfied demand. If the total satisfied demands are equal
(e.g., both solutions satisfy all the demands), it picks the one
that has less placement changes.
[0085] Next, we describe how to compute the pinning threshold
w.sub.m.sup.pin for each application m from the information
gathered in the dry run. Intuitively, if the dry run starts a new
application instance, then we should not stop any instance of the
same application whose load is higher than or equal to that of the
new instance. This is because the new instance's load is considered
sufficiently high by the dry run so that it is even worthwhile to
start a new instance. Let w.sub.m.sup.new denote the minimum load
assigned to a new instance of application m in the dry run.
w m new = min { L m , n after the dry run } I m , n .di-elect cons.
{ new instances of app m started in the dry run } ( 13 )
##EQU00004##
[0086] Here I.sub.m,n represents a new instance of application m
started on machine n in the dry run. L.sub.m,n is the load of this
instance. In addition, the pinning threshold also depends the
largest residual demand w*.sub.max not satisfied in the dry
run.
w max * = max w m * m .di-elect cons. { residual_app _tree _after
_the _dry _run } ( 14 ) ##EQU00005##
[0087] Here w*.sub.m is the residual demand of application m after
the dry run. We should not stop the application instances whose
load is higher than or equal to w*.sub.max. If we stop these
instances, they will immediately become the applications that we
try to find a place to run. The pinning threshold for application m
is computed as follows.
w.sub.m.sup.pin=max (1, min(w*.sub.max, w.sub.m.sup.new)) (15)
[0088] Because we do not want to pin completely idle application
instances, Equation 15 stipulates that the pinning threshold
w.sub.m.sup.pin should be at least one CPU cycle per second.
[0089] It is to be appreciated that most of the computation time of
the placement algorithm is spent on solving the max-flow problem
and the min-cost max-flow problem in FIG. 5. One example of an
efficient algorithm for solving the max-flow problem is the
highest-label preflow-push algorithm (R. K. Ahuja, T. L. Magnanti,
and J. B. Orlin, editors, "Network Flows: Theory, Algorithms, and
Applications," Prentice Hall, N.J., 1993, ISBN 1000499012), whose
complexity is O(s.sup.2 t) where s is the number of nodes in the
graph, and t is the number of edges in the graph. One example of an
efficient algorithm for solving the min-cost flow problem is the
enhanced capacity scaling algorithm (also see R. K. Ahuja, T. L.
Magnanti, and J. B. Orlin, editors, "Network Flows: Theory,
Algorithms, and Applications," Prentice Hall, N.J., 1993, ISBN
1000499012), whose complexity is O((s log t)(s+t log t)). Let N
denote the number of machines, and M denote the number of
applications. Due to the high memory requirement of J2EE
applications, we assume that the number of applications that a
machine can run is bounded by a constant. Therefore, in the network
flow graph, both the number s of nodes and the number t of edges
are bounded by O (N). The total number of application instances in
the entire system is also bounded by O (N). Under these
assumptions, the complexity of the placement algorithm is
O(N.sup.2.5).
[0090] FIG. 9 illustrates a graphical user interface that may be
used to visualize the real-time behavior of the placement algorithm
executed by placement controller 202 (FIG. 2).
[0091] FIG. 10 illustrates a computing system in accordance with
which one or more components/steps of the application placement
system (e.g., components and methodologies described in the context
of FIGS. 2 through 9) may be implemented, according to an
embodiment of the present invention. It is to be understood that
the individual components/steps may be implemented on one such
computer system, or more preferably, on more than one such computer
system. In the case of an implementation on a distributed computing
system, the individual computer systems and/or devices may be
connected via a suitable network, e.g., the Internet or World Wide
Web. However, the system may be realized via private or local
networks. The invention is not limited to any particular
network.
[0092] Thus, the computing system shown in FIG. 10 may represent an
illustrative architecture for a computing system associated with
placement controller 202 (FIG. 2). For example, the computing
system in FIG. 10 may be the computing system that performs the
algorithm functions illustrated in the context of FIGS. 4-8 (as
well as any applicable steps discussed in the context of such
figures). Also, the computing system in FIG. 10 may represent the
computing architecture for each of the machines (servers) upon
which application instances are placed. Still further, placement
sensor 205, application demand estimator 206, configuration
database 207, and placement executor 209, may be implemented on one
or more such computing systems.
[0093] As shown, computing system 1000 may be implemented in
accordance with a processor 1002, a memory 1004, I/O devices 1006,
and a network interface 1008, coupled via a computer bus 1010 or
alternate connection arrangement.
[0094] It is to be appreciated that the term "processor" as used
herein is intended to include any processing device, such as, for
example, one that includes a CPU (central processing unit) and/or
other processing circuitry. It is also to be understood that the
term "processor" may refer to more than one processing device and
that various elements associated with a processing device may be
shared by other processing devices.
[0095] The term "memory" as used herein is intended to include
memory associated with a processor or CPU, such as, for example,
RAM, ROM, a fixed memory device (e.g., hard drive), a removable
memory device (e.g., diskette), flash memory, etc.
[0096] In addition, the phrase "input/output devices" or "I/O
devices" as used herein is intended to include, for example, one or
more input devices (e.g., keyboard, mouse, etc.) for entering data
to the processing unit, and/or one or more output devices (e.g.,
speaker, display, etc.) for presenting results associated with the
processing unit. The graphical user interface of FIG. 9 may be
implemented in accordance with such an output device.
[0097] Still further, the phrase "network interface" as used herein
is intended to include, for example, one or more transceivers to
permit the computing system of FIG. 10 to communicate with another
computing system via an appropriate communications protocol.
[0098] Accordingly, software components including instructions or
code for performing the methodologies described herein may be
stored in one or more of the associated memory devices (e.g., ROM,
fixed or removable memory) and, when ready to be utilized, loaded
in part or in whole (e.g., into RAM) and executed by a CPU.
[0099] Accordingly, illustrative principles of the invention
provide many advantages over existing approaches, for example:
[0100] The placement algorithm is an online algorithm that, under
multiple resource constraints, can efficiently produce high-quality
solutions for hard placement problems with thousands of machines
and thousands of applications. By "online," it is meant that the
algorithm has to solve the placement problem in a short period of
time, (e.g., seconds or minutes) because the other computers are
waiting for the decision in real time. By contrast, "offline" means
that, we can run the algorithm for hours, days, or even months to
solve the problem. That is, nobody is waiting for the result right
away. This scalability is crucial for dynamic resource provisioning
in large-scale enterprise data centers.
[0101] A load-lifting mechanism that makes later placement changes
easier. For example, it co-locates different types of residual
resources on the same machines so that they can be used to start
new application instances.
[0102] A mechanism to reduce the number of application starts and
stops by pinning application instances that already deliver a
sufficiently high load. The algorithm dynamically computes an
appropriate pinning threshold for every application through a dry
run of making placement changes.
[0103] A mechanism that does placement changes to the machines one
by one in an isolated fashion. This strategy dramatically reduces
the computation time, and also helps reduce the number of placement
changes. We further address the limitations of this isolation of
machines through multi-round optimizations.
[0104] Although illustrative embodiments of the present invention
have been described herein with reference to the accompanying
drawings, it is to be understood that the invention is not limited
to those precise embodiments, and that various other changes and
modifications may be made by one skilled in the art without
departing from the scope or spirit of the invention.
* * * * *