U.S. patent application number 13/645112, for service level agreement-aware migration for multitenant database platforms, was published by the patent office on 2013-04-04. This patent application is currently assigned to NEC Laboratories America, Inc. The applicant listed for this patent is NEC Laboratories America, Inc. The invention is credited to Sean Barker, Yun Chi, Vahit Hakan Hacigumus, and Hyun Jin Moon.
Application Number | 20130085742 (Appl. No. 13/645112) |
Family ID | 47993405 |
Publication Date | 2013-04-04 |
United States Patent Application |
20130085742 |
Kind Code |
A1 |
Barker; Sean; et al. |
April 4, 2013 |
SERVICE LEVEL AGREEMENT-AWARE MIGRATION FOR MULTITENANT DATABASE PLATFORMS
Abstract
A method for migration from a multitenant database is shown that
includes building an analytical model for each of a set of
migration methods based on database characteristics; predicting
performance of the set of migration methods using the respective
analytical model with respect to tenant service level agreements
(SLAs) and current and predicted tenant workloads, where the
prediction includes a migration speed and an SLA violation
severity; and selecting a best migration method from the set of
migration methods according to the respective predicted migration
speeds and SLA violation severities.
Inventors: |
Barker; Sean; (Sunderland, MA); Chi; Yun; (Monte Sereno, CA); Moon; Hyun Jin; (Newark, CA); Hacigumus; Vahit Hakan; (San Jose, CA) |
Applicant: |
NEC Laboratories America, Inc.; Princeton, NJ, US |
Assignee: |
NEC Laboratories America, Inc.; Princeton, NJ |
Family ID: |
47993405 |
Appl. No.: |
13/645112 |
Filed: |
October 4, 2012 |
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
61542994 | Oct 4, 2011 |
61543012 | Oct 4, 2011 |
Current U.S. Class: |
703/22 |
Current CPC Class: |
G06F 9/5088 20130101; G06F 16/214 20190101; G06F 9/455 20130101 |
Class at Publication: |
703/22 |
International Class: |
G06F 9/455 20060101 G06F009/455 |
Claims
1. A method for migration from a multitenant database, comprising:
building an analytical model for each of a set of migration methods
based on database characteristics; predicting performance of the
set of migration methods using the respective analytical model with
respect to tenant service level agreements (SLAs) and current and
predicted tenant workloads, wherein said prediction includes a
migration speed and an SLA violation severity; and selecting a best
migration method from the set of migration methods according to the
respective predicted migration speeds and SLA violation
severities.
2. The method of claim 1, wherein the set of migration methods
comprises live database migration.
3. The method of claim 1, further comprising updating the
analytical models using historical data.
4. The method of claim 1, wherein a migration method having a
lowest SLA violation severity is selected.
5. The method of claim 4, wherein the migration method having a
highest predicted migration speed is selected if the migration
methods have equal predicted SLA violation severities.
6. The method of claim 1, further comprising selecting a set of
optimal parameters for the selected migration method based on the
selected method's predicted performance and current and predicted
tenant workloads.
7. The method of claim 1, further comprising selecting one or more
tenants to migrate based on tenant resource usage.
8. The method of claim 1, further comprising updating the
analytical models with machine learning.
9. A multitenant database system, comprising: a processor
configured to build an analytical model for each of a set of
migration methods based on database characteristics, to predict
performance of the set of migration methods using the respective
analytical model with respect to tenant service level agreements
(SLAs) and current and predicted tenant workloads, wherein said
prediction includes a migration speed and an SLA violation
severity, and to select a best migration method from the set of
migration methods according to the respective predicted migration
speeds and SLA violation severities.
Description
RELATED APPLICATION INFORMATION
[0001] This application claims priority to provisional application
Ser. No. 61/542,994 filed on Oct. 4, 2011, incorporated herein by
reference. This application further claims priority to provisional
application Ser. No. 61/543,012 filed on Oct. 4, 2011, incorporated
herein by reference.
[0002] This application is related to application serial no. TBD,
Attorney Docket Number 11047B (449-254) entitled "LATENCY-AWARE
LIVE MIGRATION FOR MULTITENANT DATABASE PLATFORMS," filed
concurrently herewith and incorporated herein by reference.
BACKGROUND
[0003] 1. Technical Field
[0004] The present invention relates to database migration and, in
particular, to the migration of multitenant database platforms in
such a way as to preserve tenant service level agreements.
[0005] 2. Description of the Related Art
[0006] Modern cloud platforms are designed with the aim of
compactly servicing many users on a large number of machines. To
increase resource utilization in the presence of smaller customers,
providers employ multitenancy, in which multiple users and/or
applications are collocated on a single server. Ideally, each
tenant on a multitenant server is both unaware of and unaffected by
other tenants operating on the machine and is afforded the same
level of performance it would receive on a dedicated server.
[0007] To maximize profits, providers wish to maximize the number
of tenants on each server. Tenants wish to be guaranteed a certain
level of performance, however, as specified by service level
agreements (SLAs). An SLA may specify metrics of guaranteed
service, such as system uptime and query latency. A provider
balances these competing goals within the resources of a given
multitenant server.
[0008] If a tenant's resource demands exceed the free capacity on a
given server, other tenants on the server may be negatively
impacted, causing SLA violations. Database migration may be used to
relocate one or more tenants to an alternate machine, freeing
resources on the crowded server. Database migration may further be
used to consolidate tenants on a server with free resources,
potentially freeing servers for other purposes or allowing said
servers to be shut down. Migration incurs its own costs, however.
There is a direct cost of copying the tenant's data to another
machine, as well as penalties due to SLA violations during system
downtime and human-related costs.
[0009] Most existing database systems provide tools for data
export/migration in a "stop and copy" fashion. Such a solution is
impractical for migrating large amounts of data, as it incurs long
downtimes. Existing live migration solutions for
database systems fail to take into account the costs that such a
migration may cause to a provider.
SUMMARY
[0010] A method for migration from a multitenant database is shown
that includes building an analytical model for each of a set of
migration methods based on database characteristics; predicting
performance of the set of migration methods using the respective
analytical model with respect to tenant service level agreements
(SLAs) and current and predicted tenant workloads, wherein said
prediction includes a migration speed and an SLA violation
severity; and selecting a best migration method from the set of
migration methods according to the respective predicted migration
speeds and SLA violation severities.
[0011] A multitenant database system is shown that includes a
processor configured to build an analytical model for each of a set
of migration methods based on database characteristics, to predict
performance of the set of migration methods using the respective
analytical model with respect to tenant service level agreements
(SLAs) and current and predicted tenant workloads, wherein said
prediction includes a migration speed and an SLA violation
severity, and to select a best migration method from the set of
migration methods according to the respective predicted migration
speeds and SLA violation severities.
[0012] These and other features and advantages will become apparent
from the following detailed description of illustrative embodiments
thereof, which is to be read in connection with the accompanying
drawings.
BRIEF DESCRIPTION OF DRAWINGS
[0013] The disclosure will provide details in the following
description of preferred embodiments with reference to the
following figures wherein:
[0014] FIG. 1 is a block/flow diagram for migrating a database in a
multitenant system according to the present principles.
[0015] FIG. 2 is a block/flow diagram for predicting migration
performance according to the present principles.
[0016] FIG. 3 is a block/flow diagram for migrating a database in a
multitenant system according to the present principles.
[0017] FIG. 4 is a block/flow diagram for throttling migration
speed according to the present principles.
[0018] FIG. 5 is a block/flow diagram for throttling migration
speed according to the present principles.
[0019] FIG. 6 is a diagram of a migration of databases between
multitenant systems according to the present principles.
[0020] FIG. 7 is a diagram of a proportional-integral-derivative
controller according to the present principles.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0021] A practical migration solution for multitenant database
systems according to the present principles includes a minimum of
downtime, controlled tenant interference, and automatic management.
To achieve this, the present principles employ backup tools to
perform a zero-downtime live backup of a database. Furthermore,
"migration slack" is employed, referring to resources which can be
used for migration without seriously impacting existing workloads.
By taking query latency into account and monitoring system
performance in real-time, system performance can be guaranteed
according to tenant service level agreements (SLAs) even during the
live migration.
[0022] The present principles achieve live migration by using
existing hot backup functionality present in database systems.
Because most contemporary database systems have such hot backup
functions, the present principles may be employed without changes
to existing database engines and operating systems. Within the
bounds of SLA guarantees, the present principles use
control-theory-based strategies to automatically throttle resource
usage in the migration process to achieve the fastest possible
migration that does not interfere with service level
guarantees.
[0023] Referring now in detail to the figures in which like
numerals represent the same or similar elements and initially to
FIG. 1, a high-level method for database migration is shown. At
block 102, information regarding client SLAs, database
characteristics, current workloads, and predicted workloads are
entered as input. This information characterizes the database
system as it will appear during migration, such that block 104 can
predict how a migration will perform according to each of a set of
migration methods. As an example, the set of migration methods may
include a "stop and copy" method, while another method in the set
may include a live database migration as described herein. Block
106 chooses a best method by selecting a method that will perform
the migration in the shortest time without generating any SLA
violations. If all migration methods tested in block 104 will
generate SLA violations, then block 106 selects the migration
method which generates the fewest violations. Block 106 further
selects optimal migration parameters that correspond to the
selected migration method. The migration method and parameters are
employed in block 108 to migrate, e.g., a tenant from a first
database system to a second database system.
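The selection rule of block 106 (lowest predicted SLA violation severity, ties broken in favor of the highest predicted migration speed) can be sketched as follows; the method names and the prediction tuples are illustrative assumptions, not values from the disclosure:

```python
def select_migration_method(predictions):
    """Pick the method with the lowest predicted SLA violation severity;
    break ties by preferring the highest predicted migration speed.

    predictions maps method name -> (sla_violation_severity, migration_speed).
    """
    return min(predictions,
               key=lambda m: (predictions[m][0], -predictions[m][1]))

# Hypothetical predictions from block 104:
predictions = {
    "stop_and_copy": (3.0, 120.0),   # violations during downtime, but fast copy
    "live_migration": (0.0, 40.0),   # no violations, slower copy
}
best = select_migration_method(predictions)
```

With these numbers, live migration is chosen despite its lower speed, because avoiding SLA violations takes precedence.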
[0024] Embodiments may include a computer program product
accessible from a computer-usable or computer-readable medium
providing program code for use by or in connection with a computer
or any instruction execution system. A computer-usable or computer
readable medium may include any apparatus that stores,
communicates, propagates, or transports the program for use by or
in connection with the instruction execution system, apparatus, or
device. The medium can be magnetic, optical, electronic,
electromagnetic, infrared, or semiconductor system (or apparatus or
device) or a propagation medium. The medium may include a
computer-readable storage medium such as a semiconductor or solid
state memory, magnetic tape, a removable computer diskette, a
random access memory (RAM), a read-only memory (ROM), a rigid
magnetic disk and an optical disk, etc.
[0025] A data processing system suitable for storing and/or
executing program code may include at least one processor coupled
directly or indirectly to memory elements through a system bus. The
memory elements can include local memory employed during actual
execution of the program code, bulk storage, and cache memories
which provide temporary storage of at least some program code to
reduce the number of times code is retrieved from bulk storage
during execution. Input/output or I/O devices (including but not
limited to keyboards, displays, pointing devices, etc.) may be
coupled to the system either directly or through intervening I/O
controllers.
[0026] Network adapters may also be coupled to the system to enable
the data processing system to become coupled to other data
processing systems or remote printers or storage devices through
intervening private or public networks. Modems, cable modems, and
Ethernet cards are just a few of the currently available types of
network adapters.
[0027] Embodiments described herein may be entirely hardware,
entirely software or including both hardware and software elements.
In a preferred embodiment, the present invention is implemented in
software, which includes but is not limited to firmware, resident
software, microcode, etc.
[0028] Referring now to FIG. 2, a method for modeling migration
methods is shown. Block 202 builds an analytical model for each
migration method to represent the resources consumed by that model.
In the case of stop-and-copy, the analytical model may simply
represent an amount of downtime proportional to the volume of data
to be copied, using the characteristics of the database system to
determine a maximum transfer rate. In the case of live migration,
the analytical model may include a measurement of "migration
slack," resources that may be used for migration without impacting
SLA guarantees.
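The stop-and-copy model described above reduces to a simple proportional estimate; the data volume and transfer rate used here are illustrative figures, not values from the disclosure:

```python
def stop_and_copy_downtime(data_bytes, max_transfer_rate):
    """Predicted downtime is proportional to the volume of data to be
    copied, bounded by the system's maximum transfer rate (bytes/s)."""
    return data_bytes / max_transfer_rate

# e.g., a 20 GiB tenant over an effective 100 MiB/s transfer rate:
downtime_s = stop_and_copy_downtime(20 * 1024**3, 100 * 1024**2)  # 204.8 s
```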
[0029] As an example of determining migration slack, consider a
system where the available resources R must reliably exceed the
combined needs of the n tenants, such that
R.gtoreq..SIGMA..sub.i=1.sup.nT.sub.i. If this relation does not
hold, then the server becomes overloaded and begins incurring SLA
violations. Migrating a tenant also consumes some
resources--usually in disk input/output for reading or writing
data, but also including processing overhead, network throughput,
etc. If the server is handling m migrations, the resources needed
to maintain SLA guarantees become
R.gtoreq..SIGMA..sub.i=1.sup.nT.sub.i+.SIGMA..sub.j=1.sup.mM.sub.j.
Given the constant resource allocation R and a set of workloads T,
it is now possible to allocate slack resources S without incurring
violations, where S=R-.SIGMA..sub.i=1.sup.nT.sub.i. While R remains
fixed, the workloads T may change over time, such that the
migration workloads M should be adjustable in real-time. Thus, it
is often best to allocate resources below the slack value S to
provide stability in tenant workload processing.
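The slack computation above, with a held-back margin for workload stability, might look like the following minimal sketch; the headroom fraction is an assumed parameter, not a value from the disclosure:

```python
def migration_slack(total_resources, tenant_workloads, headroom=0.1):
    """S = R - sum(T_i); hold a fraction back, since allocating below
    the full slack S stabilizes tenant workload processing as the
    workloads T change over time."""
    slack = total_resources - sum(tenant_workloads)
    return max(0.0, slack) * (1.0 - headroom)

# R = 100 units, two tenants consuming 30 and 40 units:
usable = migration_slack(100.0, [30.0, 40.0])  # 27.0 of the 30 units of slack
```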
[0030] Block 204 uses machine learning and historic data to update
the analytical models. Each model is then employed in block 206 to
predict migration performance for each of the methods based on
current data and predicted future workloads. This predicted
performance includes the cost of SLA violations as well as
migration costs such as resource usage and migration duration.
[0031] The machine learning (ML) models are used to predict the
overhead cost of each migration method. For example, if the
stop-and-copy method is chosen, then the ML model will look at the
size of the files that need to be transferred (migrated) to the
other server, the current network bandwidth, the IOPS (IO per
second) that can be used for the transfer, etc. The output of the
ML model is a prediction of the overhead cost of migration, such as
the duration of the migration (e.g., 10 minutes) or the number of
queries that are dropped during the migration (because during
stop-and-copy migration, the server will be shut down).
[0032] As another example, if live migration is used, then the ML
model may examine the same (or a somewhat different) set of system
characteristics and predict the migration overhead, e.g., in terms
of the duration of the migration and the impact of the migration
process on query latency among all tenants on the current server.
Then, based on the predictions, an appropriate (lower-cost)
migration method is chosen.
[0033] ML methods have two stages: a training stage and a testing
stage. During the training stage, which is usually done offline,
historic data are used to learn a prediction model, such as the
linear regression model. During the testing stage, which is usually
done online, the prediction is made based on real-time data and the
model trained offline. According to the present principles, the
predictive models are constantly updated by incorporating real-time
data into historic data and by repeating the training stage of
machine learning methods in real time. In other words, the ML model
is updated (trained again) whenever new data is available. Such
real-time updates can improve the model over time.
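As a hedged sketch of such a continually retrained model, a one-variable least-squares fit of migration duration against data size can be refit whenever a new observation arrives; the feature choice and class interface are illustrative assumptions, not the disclosed model:

```python
class OnlineDurationModel:
    """Refit a 1-D least-squares model (duration ~ data size) whenever
    new data is available, mirroring the repeated training stage."""

    def __init__(self):
        self.xs, self.ys = [], []
        self.slope, self.intercept = 0.0, 0.0

    def observe(self, data_gb, duration_s):
        # Incorporate the new observation and retrain immediately.
        self.xs.append(data_gb)
        self.ys.append(duration_s)
        n = len(self.xs)
        mean_x = sum(self.xs) / n
        mean_y = sum(self.ys) / n
        var = sum((x - mean_x) ** 2 for x in self.xs)
        if var > 0:
            self.slope = sum((x - mean_x) * (y - mean_y)
                             for x, y in zip(self.xs, self.ys)) / var
            self.intercept = mean_y - self.slope * mean_x

    def predict(self, data_gb):
        return self.slope * data_gb + self.intercept
```

A production system would add further features (network bandwidth, available IOPS) as described above; the single-feature fit keeps the sketch self-contained.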
[0034] Referring now to FIG. 3, a method for live migration is
provided that allows maintenance of SLA guarantees. Block 302
designates the database or databases to be migrated and selects a
target server for the migration(s). Block 304 starts a hot backup
of the databases to be migrated, creating a snapshot of each
database. Each snapshot is transferred to its respective target
server, and block 306 creates a new database from each snapshot at
its respective target server. Block 308 transfers the query log
that accumulated during the hot backup to the target server and
replays the query log to synchronize the new database with the
state of the old database. Block 310 then starts servicing queries
at the new database and stops queries at the old database. At this
point the old database may be purged, freeing resources at the
originating server.
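The sequence of blocks 302-310 can be sketched as an orchestration routine; the `source`/`target` interfaces below are hypothetical placeholders, not the API of any real database or backup tool:

```python
def live_migrate(source, target, tenant):
    """Hypothetical orchestration of the live migration steps; each
    method on source/target is a placeholder for the corresponding
    database operation."""
    snapshot, log_position = source.hot_backup(tenant)      # block 304: zero-downtime snapshot
    target.restore(tenant, snapshot)                        # block 306: new database from snapshot
    target.replay(source.query_log(tenant, log_position))   # block 308: catch up from the query log
    source.redirect_queries(tenant, target)                 # block 310: serve queries at the target
    source.purge(tenant)                                    # free resources at the originating server
```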
[0035] Referring now to FIG. 4, a method for allocating resources
to migration processes M.sub.j is shown. Block 402 derives an
acceptable performance level based on existing SLAs. Based on this
acceptable performance level, block 404 calculates slack resources.
This may be performed using the formulas for S shown above. Block
406 allocates slack resources to migration processes M.sub.j. Block
408 monitors the system during migration to track changes in tenant
workloads T.sub.i. Block 410 adjusts the slack resources
accordingly--as tenant workloads increase, slack resources will
decrease and vice versa. Processing loops back to block 406 to
allocate an appropriate amount of resources to the migration
processes and continues to loop in this manner until the migration
has completed.
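The monitoring loop of blocks 406-410 might be sketched as follows; the callback interface and polling interval are illustrative assumptions:

```python
import time

def adjust_migration_budget(total_resources, get_tenant_workloads,
                            set_migration_budget, migration_done,
                            interval_s=1.0):
    """Repeatedly re-derive slack as tenant workloads change and
    re-budget the migration processes until migration completes."""
    while not migration_done():
        # Block 408: sample current tenant workloads T_i.
        slack = total_resources - sum(get_tenant_workloads())
        # Blocks 406/410: allocate at most the available slack.
        set_migration_budget(max(0.0, slack))
        time.sleep(interval_s)
```

As tenant workloads rise, the budget handed to the migration processes shrinks toward zero, and it grows again when load subsides.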
[0036] Referring now to FIG. 5, a method for controlling the
migration process is shown, allowing for adaptive throttling of the
migration processes in response to changing resource availability.
Block 502 determines available slack resources that may be employed
in migration, as described above. Block 504 uses a
proportional-integral-derivative (PID) controller to determine a
speed of migration based on system performance. The PID controller
is used to adjust system resource consumption of the migration
process in block 506 by throttling disk input/output (I/O) and
network bandwidth. As slack resources become available, additional
resources are allocated to migration to speed the process. As
tenant workloads increase and fewer resources are available,
resources are taken away from the migration process to preserve SLA
guarantees. Examples of throttling embodiments may include using
the Linux "pv" utility to limit the amount of data passing through
a given pipe. This effectively limits CPU usage as well as I/O (both
disk and network), because the backup process can only proceed as
quickly as the "pv" utility allows.
[0037] A PID controller is a tool in control theory for driving a
system actuator such that the system stabilizes around a particular
setpoint. At a high level, a PID controller operates as a
continuous feedback loop that, at each timestep, adjusts the output
variable (the actuator) such that a dependent variable (the process
variable) converges toward a desired value (the setpoint
value).
[0038] Referring now to FIG. 6, a system of database servers is
shown that includes an originating server 602 and a target server
604. Each server includes a processor 606, a memory storage unit
608 (including one or more of random access memory and non-volatile
storage), a multitenant database system 610, a PID controller 612,
and a system monitor 614. The multitenant database system 610 at
the originating server 602 includes one or more tenant databases
that are to be migrated to the target server 604. System monitors
614 track the resources being used at each server 602/604,
providing PID controller 612 and processor 606 with information
regarding processor usage, memory usage, and network usage.
[0039] Referring now to FIG. 7, greater detail on the PID
controller 612 is provided. At a high level, a PID controller 612
operates as a continuous feedback loop that, at each timestep,
adjusts the output such that a dependent variable (the process
variable 716) converges toward a desired value (the set point value
702). At each timestep, the current process variable 716 is
compared to the desired setpoint 702. The new output of the
controller 612 is determined by three component paths of the error
706, defined as the degree to which the process variable 716
differs from the setpoint 702 at comparator 704. The three paths
are the proportional path 708, the integral path 710, and the
derivative path 712. Each path is scaled by coefficients K.sub.p,
K.sub.i, and K.sub.d respectively. In general terms, the
proportional path 708 uses the current error 706, the integral path
710 uses past errors, and the derivative path 712 predicts future
error. The outputs of the paths are added at summer 714. The output
at time t with error e(t) is given by the formula:
output(t) = K.sub.p e(t) + K.sub.i .intg..sub.0.sup.t e(.tau.) d.tau. +
K.sub.d de(t)/dt ##EQU00001##
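A discrete-time sketch of this controller follows, with the comparator 704, the three paths, and the summer 714 marked in comments; the gains, timestep, and sign convention are illustrative assumptions:

```python
class PIDController:
    """Minimal discrete-time PID controller sketch."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, process_variable, dt):
        error = self.setpoint - process_variable        # comparator 704
        self.integral += error * dt                     # integral path 710: past errors
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt  # derivative path 712: error trend
        self.prev_error = error
        # summer 714: proportional path 708 plus the scaled I and D terms
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)
```

In the migration setting described below, the process variable would be the current average transaction latency, the setpoint a target latency, and the output a throttling speed adjustment.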
[0040] The PID controller 612 is in charge of determining the
proper migration speed, so the output variable is used as the
throttling speed adjustment (either speeding up or slowing down the
migration). For the process variable and setpoint, one option is to
target available slack using a setpoint of zero unused slack.
However, to save on computation, current transaction latency may be
used as an indicator of available slack. As migration speed
increases, the total server load also increases. As long as the
migration remains under the available slack, average transaction
latency will increase only modestly as migration speed increases.
However, as the slack runs out, latency will begin to display more
significant increases. Thus, to ensure that slack is used without
exceeding available resources, the PID controller 612 may be
configured to target a given average transaction latency. The
process variable 716 may therefore be set to the current average
transaction latency and the setpoint 702 may be set to a target
latency. This setpoint 702 represents an efficient use of available
slack while still maintaining acceptable query performance. The
three parameters of the different paths, K.sub.p, K.sub.i, and
K.sub.d, may be tuned manually. A small K.sub.i and a large K.sub.d
are used to set a reaction speed to changes in migration speed to
prevent overshooting the optimal speed, allowing the overall
latency to stabilize.
[0041] Having described preferred embodiments of a system and
method for SLA-aware migration for multitenant database platforms
(which are intended to be illustrative and not limiting), it is
noted that modifications and variations can be made by persons
skilled in the art in light of the above teachings. It is therefore
to be understood that changes may be made in the particular
embodiments disclosed which are within the scope of the invention
as outlined by the appended claims. Having thus described aspects
of the invention, with the details and particularity required by
the patent laws, what is claimed and desired protected by Letters
Patent is set forth in the appended claims.
* * * * *