U.S. patent application number 13/112,096 was filed with the patent office on 2011-05-20 and published on 2012-06-14 as publication number 20120151490 for system positioning services in data centers.
This patent application is currently assigned to NEC Laboratories America, Inc. Invention is credited to Guofei Jiang, Ya-Yunn Su, Kenji Yoshihira, and Hui Zhang.
Application Number: 20120151490 (Appl. No. 13/112,096)
Family ID: 46200821
Publication Date: 2012-06-14

United States Patent Application 20120151490
Kind Code: A1
ZHANG; HUI; et al.
June 14, 2012
SYSTEM POSITIONING SERVICES IN DATA CENTERS
Abstract
A system and method are disclosed for managing a data center in
terms of power and performance. The system includes at least one
system positioning application for managing power costs and
performance costs at a data center. The at least one system
positioning application may determine a status of a data center in
terms of power costs and performance costs or generate
configurations to automatically implement a desired target state at
the data center. A system configuration compiler is configured to
receive a request from the system positioning application
associated with a data center management task, convert the request
into a set of subtasks, and schedule execution of the subtasks to
implement the data center management task.
Inventors: ZHANG; HUI (New Brunswick, NJ); Yoshihira; Kenji (Princeton Junction, NJ); Su; Ya-Yunn (Taipei, TW); Jiang; Guofei (Princeton, NJ)
Assignee: NEC Laboratories America, Inc. (Princeton, NJ)
Family ID: 46200821
Appl. No.: 13/112,096
Filed: May 20, 2011
Related U.S. Patent Documents

Application Number: 61/421,675
Filing Date: Dec 10, 2010
Patent Number: (none)
Current U.S. Class: 718/102
Current CPC Class: Y02D 10/00 (20180101); G06F 9/5094 (20130101); Y02D 10/22 (20180101)
Class at Publication: 718/102
International Class: G06F 9/46 (20060101) G06F 9/46
Claims
1. A system for managing a data center, comprising: a system
positioning module stored on a computer readable storage medium
comprising: a position reporting module configured to determine a
status of a data center under specified configuration parameters; a
destination searching module configured to receive a desired target
state and automatically determine which configuration parameters
are to be adjusted if the desired target state is implemented
at the data center; and a system configuration compiler configured
to receive a request from the system positioning module associated
with a data center management task, convert the request into a set
of subtasks, and schedule execution of the subtasks to implement
the data center management task.
2. The system as recited in claim 1, further comprising a system
simulator configured to simulate resource utilization data of a
server using a set of input time-series, and further configured to
output a predicted status of a data center.
3. The system as recited in claim 2, wherein the predicted status
of the data center indicates performance, power and operational
costs at the data center under a given set of parameters associated
with a target state of the data center.
4. The system as recited in claim 1, further comprising a workload
generator configured to receive a specified time period and
transform data during the time period using a data reshaping
scheme.
5. The system as recited in claim 1, wherein the position reporting
module is capable of indicating a historical, present and predicted
future status of the data center.
6. The system as recited in claim 1, wherein the system positioning
module further comprises an auto-piloting module configured to
automatically apply configuration settings to the data center using
a sensitivity-based optimization technique to control the data
center in terms of power and performance.
7. The system as recited in claim 1, further comprising a user
interface configured to indicate a current status and a plurality
of possible future statuses of the data center in terms of power,
performance and operation costs.
8. The system as recited in claim 7, wherein the plurality of
possible future statuses indicate optimal configurations of the
data center in terms of power, performance and operation costs
while satisfying constraints imposed by the configuration
parameters.
9. The system as recited in claim 1, further comprising a set of
mash-up applications which perform subtasks that can be used by the
system positioning module, wherein the subtasks include
functions for performing at least one of feasibility zone analysis
and map generation.
10. A method for managing a data center, comprising: sending a
request associated with a system positioning application stored on
a computer readable storage medium for at least one of: determining a
status of a data center under specified configuration parameters;
determining configuration parameters that are to be adjusted if a
desired target state is implemented at the data center; and
implementing an auto-piloting service for automatically controlling
the data center; converting the request into a set of subtasks; and
scheduling execution of the subtasks to implement the request.
11. The method as recited in claim 10, further comprising
simulating resource utilization data of a server using a set of
input time-series and outputting a predicted status of the data
center based on simulated data.
12. The method as recited in claim 11, wherein the predicted status
of the data center indicates performance, power and operational
costs at the data center under a given set of configuration
parameters associated with a target state of the data center.
13. The method as recited in claim 10, further comprising receiving
a specified time period and transforming data during the time
period using a data reshaping scheme.
14. The method as recited in claim 10, wherein the system
positioning application comprises a position reporting application
which predicts the power, performance and operational costs imposed
on the data center under a specified set of parameters.
15. The method as recited in claim 10, wherein the system
positioning application comprises a destination searching
application which automatically determines configuration parameters
that are to be adjusted if the desired target state is implemented
at the data center.
16. The method as recited in claim 10, wherein the system
positioning application comprises an auto-piloting application
which automatically applies configuration settings to the data
center using a sensitivity-based optimization technique to control
the data center in terms of power and performance.
17. The method as recited in claim 10, further comprising
outputting a current status and predicted future status of the data
center in terms of power, performance and operation costs.
18. The method as recited in claim 10, wherein the system
positioning application utilizes a set of mash-up applications to
implement the subtasks, and the subtasks comprise functions for
performing at least one of feasibility zone analysis and map
generation.
19. A computer readable storage medium comprising a computer
readable program, wherein the computer readable program when
executed on a computer causes the computer to perform the method
recited in claim 10.
20. A system for managing a data center, comprising: a system
positioning module stored on a computer readable storage medium
comprising: an auto-piloting module configured to automatically
apply configuration settings to a data center using a
sensitivity-based optimization technique to control the data center
in terms of power and performance; a system configuration compiler
configured to receive a request from the system positioning module
associated with a data center management task, convert the request
into a set of subtasks, and schedule execution of the subtasks to
implement the data center management task.
Description
RELATED APPLICATION INFORMATION
[0001] This application claims priority to provisional application
Ser. No. 61/421,675 filed on Dec. 10, 2010, the entirety of which
is herein incorporated by reference.
BACKGROUND
[0002] 1. Technical Field
[0003] The present invention relates to virtualized data center
management, and more particularly, to a middleware architectural
scheme which provides integrated power and performance management
in a virtualized data center.
[0004] 2. Description of the Related Art
[0005] While there has been significant industry investment and
much effort expended on improving techniques for managing data
centers, prior attempts have been insufficient for a number of
reasons. One of the primary pitfalls associated with prior art data
center management techniques relates to the fact that a number of
separate solutions have been designed in isolation. For example,
while solutions may have been proposed to handle platform
management optimizations (e.g., server configuration optimizations)
and virtualization optimizations (e.g., optimizing virtual machine
provisioning), there has been no integration among these different
solutions. Consequently, prior art data center management
techniques often produce redundant, or even conflicting,
operational decisions. This decreases the efficiency and stability
of such systems.
[0006] Other deficiencies associated with prior art data center
management systems stem from the fact that these systems are not
declarative in nature. Providing a data center management system
with this type of capability proves difficult for a number of
reasons. There has been no suitable model developed for such a data
center management scheme. In addition, implementing such a system
requires more than merely focusing on the target state or target
requirements. Rather, the system must also consider the
transitional states leading up to the target state, and account for
potential errors which may arise during the transitional
period.
SUMMARY
[0007] In accordance with the present principles, a system is
provided for managing a data center. The system includes a system
positioning module stored on a computer readable storage medium.
The system positioning module is comprised of a position reporting
module which is configured to determine a status of a data center
under specified configuration parameters, and a destination
searching module configured to receive a desired target state and
automatically determine configuration parameters that are to be
adjusted if the desired target state is implemented at the data
center. The system further comprises a system configuration
compiler configured to receive a request from the system
positioning module associated with a data center management task,
convert the request into a set of subtasks, and schedule execution
of the subtasks to implement the data center management task.
[0008] In accordance with the present principles, a method is also
disclosed for managing a data center. A request associated with a
system positioning application is sent. The request may be for one
of determining a status of a data center under specified
configuration parameters, determining configuration parameters that
are to be adjusted if a desired target state is implemented at the
data center, or implementing an auto-piloting service for
automatically controlling the data center. The request is converted
into a set of subtasks and the subtasks are scheduled for execution
to implement the request.
[0009] In accordance with the present principles, another system is
provided for managing a data center. The system includes a system
positioning module stored on a computer readable storage medium.
The system positioning module is comprised of an auto-piloting
module configured to automatically apply configuration settings to
a data center using a sensitivity-based optimization technique to
control the data center in terms of power and performance. The
system further comprises a system configuration compiler configured
to receive a request from the system positioning module associated
with a data center management task, convert the request into a set
of subtasks, and schedule execution of the subtasks to implement
the data center management task.
[0010] These and other features and advantages will become apparent
from the following detailed description of illustrative embodiments
thereof, which is to be read in connection with the accompanying
drawings.
BRIEF DESCRIPTION OF DRAWINGS
[0011] The disclosure will provide details in the following
description of preferred embodiments with reference to the
following figures wherein:
[0012] FIG. 1 is a block/flow diagram illustrating a data center
management system in accordance with the present principles.
[0013] FIG. 2 is a block/flow diagram illustrating a more detailed
view of the system configuration compiler depicted in FIG. 1.
[0014] FIG. 3 is a graphical depiction of an exemplary API-Get( )
function in accordance with the present principles.
[0015] FIG. 4 is a graphical depiction of an exemplary API-Put( )
function in accordance with the present principles.
[0016] FIG. 5 is a graphical user interface illustrating a position
reporting function in accordance with the present principles.
[0017] FIG. 6 is a graphical user interface illustrating a
destination searching function in accordance with the present
principles.
[0018] FIG. 7 is a block/flow diagram illustrating a method for
managing a data center in accordance with the present
principles.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0019] In accordance with the present principles, an integrated
solution is disclosed for managing the power and performance
configurations of a data center. A middleware system is situated
between end-users (e.g., data center operators) and a set of
components for controlling the power and performance of the data
center. A set of system positioning services provides reporting
data on the state of the system, permits end-users to configure and
control the system (e.g., by specifying system performance and
power cost targets) and determines the appropriate management
configurations and settings that are able to drive the system to a
desired input state which may be specified by the end-user. The
results of the system positioning services are presented to the end
user via an interface, e.g., a global positioning system (GPS)-like
user interface.
[0020] The middleware solution described herein has a layered
design which may be comprised of three different layers: a first
layer comprising a system configuration compiler, a second layer
comprising a set of mash-up applications, and a third layer
comprising a set of system positioning services. The system
configuration compiler interacts with components for controlling
the power and performance of a data center. In one embodiment, two
primary application programming interface (API) functions are
provided for positioning of the system. However, additional API
functions may also be provided.
[0021] A first API function, API-Get(), provides a report on where
the system would be located (e.g., in terms of power, performance
and operational cost) if certain workload and system settings were
implemented at the data center. A second API function, API-Put(),
determines a set of management policies and configurations that
permit the system to reach a specified input status.
[0022] The mash-up applications may represent management subtasks
that can be built on top of the API functions to implement certain
functions. For example, exemplary mash-up applications may utilize
the API functions to provide functionality relating to system
status prediction, feasibility zone analysis, impact analysis
applications, and map generation (each of which is explained in
further detail below).
[0023] The results generated by the mash-up applications are then
used by the system positioning services layer to provide a
graphical user interface (GUI) to the end-user which allows the
end-user to visualize the current and predicted positioning of the
system, and to configure and control the system. An exemplary
system positioning service that may be derived from the results of
the mash-up functions may include a position reporting service
which indicates the power, performance and operational costs
imposed on the system under given parameters. Other system
positioning services may include destination searching services
which query the system to automatically determine the management
configurations that would lead to a user-specified status point, or
auto-piloting services which automatically apply optimal management
configurations to the system using sensitivity based optimization
techniques described in further detail below.
[0024] The layered architecture described herein for managing a
virtualized data center provides declarative data center management
capabilities to an end-user. It permits the end-user to specify
some new requirement or desired state in a declarative manner and
have the data center management system automatically modify the
appropriate configuration and processes to achieve the specified
state. This type of declarative data center management
functionality significantly reduces the complexity associated with
operating a data center, and enables faster operation for
administrators by providing decision supporting information. It
further allows for the enforcement of Service Level Agreements
(SLAs) through performance management, and serves as an important
technology component for green information technology (IT) which
tends to utilize private clouds to consolidate old IT systems in
enterprise data centers.
[0025] Embodiments described herein may be entirely hardware,
entirely software or including both hardware and software elements.
In a preferred embodiment, the present invention is implemented in
software, which includes but is not limited to firmware, resident
software, microcode, etc.
[0026] Embodiments may include a computer program product
accessible from a computer-usable or computer-readable medium
providing program code for use by or in connection with a computer
or any instruction execution system. A computer-usable or computer
readable medium may include any apparatus that stores,
communicates, propagates, or transports the program for use by or
in connection with the instruction execution system, apparatus, or
device. The medium can be magnetic, optical, electronic,
electromagnetic, infrared, or semiconductor system (or apparatus or
device) or a propagation medium. The medium may include a
computer-readable storage medium such as a semiconductor or solid
state memory, magnetic tape, a removable computer diskette, a
random access memory (RAM), a read-only memory (ROM), a rigid
magnetic disk and an optical disk, etc.
[0027] A data processing system suitable for storing and/or
executing program code may include at least one processor coupled
directly or indirectly to memory elements through a system bus. The
memory elements can include local memory employed during actual
execution of the program code, bulk storage, and cache memories
which provide temporary storage of at least some program code to
reduce the number of times code is retrieved from bulk storage
during execution. Input/output or I/O devices (including but not
limited to keyboards, displays, pointing devices, etc.) may be
coupled to the system either directly or through intervening I/O
controllers.
[0028] Network adapters may also be coupled to the system to enable
the data processing system to become coupled to other data
processing systems or remote printers or storage devices through
intervening private or public networks. Modems, cable modem and
Ethernet cards are just a few of the currently available types of
network adapters.
[0029] Referring now to the drawings in which like numerals
represent the same or similar elements and initially to FIG. 1, a
block/flow diagram illustratively depicts a data center management
system 100 in accordance with the present principles. As
illustrated therein, a middleware system 110 is situated between an
end-user 101 (e.g., data center administrator) and a data center
160. The middleware system 110 can estimate the status of the data
center 160 under certain conditions. The middleware system can also
determine a set of configuration settings that will drive the data
center 160 to a specified input state, and further implement the
configuration settings to control the operation of the data center
160 in terms of power and performance.
[0030] The middleware system 110 can control the data center 160 by
manipulating the power management component 152 and the performance
management component 153. This may include reading in and utilizing
data from the configuration and monitoring database 151. This
database 151 stores settings and parameters associated with the
current performance and power levels of the machines running at the
data center 160. For example, the database 151 can store data
indicating the CPU utilization of all virtual machines running at
the data center 160.
[0031] The middleware system 110 comprises a set of system
positioning services 120, a set of mash-up applications 130 and a
system configuration compiler 140. The system configuration
compiler 140 may implement two primitive functions, e.g., API-Get()
and API-Put(), that may be exploited to provide system positioning
services 120 to end-users 101. The API-Get( ) function receives
workload and system parameters as input and determines where the
system would be located in terms of power, performance and
operational cost if the workload and system settings were
implemented at the data center 160. The API-Put( ) function permits
a desired service configuration (e.g., a desired level of service)
to be specified and then determines how the system should be
configured to implement the specified service configuration at the
data center 160. A more detailed explanation of these two
illustrative functions is provided with reference to FIGS. 3 and 4
below.
[0032] The mash-up applications 130 represent management subtasks
that can be built on top of the two API functions to perform a
variety of operations and to make various types of determinations.
These functions are designed to ask "what-if" questions that may be
used in providing system positioning services 120. A mash-up
application is not limited to using the information obtained from
the two API functions, but may also use the results produced by
other mash-up applications. Exemplary mash-up applications 130 may
perform the following functions: system status prediction,
feasibility zone analysis, impact analysis applications, and map
generation.
[0033] A system status prediction (SSP) mash-up application makes
predictions as to the expected status of the system in different
workload scenarios. In making the predictions, the application may
utilize the API-Get( ) function in conjunction with simulation
components (e.g., the performance management component simulator
260 and power management component simulator 270 depicted in FIG.
2) embedded in the system configuration compiler 140. In certain
embodiments, the system status prediction mash-up application may
also enable a multi-expert workload prediction service to be
offered to end users 101.
[0034] A multi-expert workload prediction service predicts the
future workload of a data center 160 utilizing opinions from
multiple experts. The plurality of expert opinions can be specified
through user inputs or by offering workload forecasting procedures.
While the experts may give different opinions at the workload
level, the SSP application enables visualization of those opinions
in terms of system performance and power cost, and therefore makes
the opinions more intuitive and understandable to end users
101.
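The multi-expert mechanism can be sketched as follows. The `predict_status` function is a toy stand-in for the SSP application, and the forecasts, capacity, and cost formulas are hypothetical; the point is only that each expert's workload opinion is mapped onto common power/performance axes.

```python
def predict_status(forecast, capacity=100.0):
    """Toy SSP stand-in: average demand drives the power cost,
    peak demand above capacity drives the performance cost."""
    avg = sum(forecast) / len(forecast)
    peak_overload = max(0.0, max(forecast) - capacity)
    return {"power_cost": avg, "performance_cost": peak_overload}

# Each "expert" supplies a workload forecast (a CPU-demand time series),
# either from user input or from a forecasting procedure.
expert_forecasts = {
    "trend-following": [80, 85, 90, 95],
    "seasonal":        [60, 120, 60, 120],
}

# Visualize every opinion as a predicted system status point.
positions = {name: predict_status(f) for name, f in expert_forecasts.items()}
```

Two experts who disagree at the workload level (steady trend versus seasonal swings) become directly comparable once both are rendered as status points.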
[0035] A feasibility zone analysis (FZA) mash-up application may
also be provided which determines an acceptable or permitted status
space that the system may operate in given a workload scenario and
specified SLAs. The application may return a two-dimensional (in
terms of performance and power costs) square-shaped area defined by
four coordinate points on a graph to indicate the feasibility zone.
In this case, the status points inside the square would represent
the feasibility zone, while the status points outside the square
would indicate status points that are not reachable, either due to
a performance constraint (e.g., <5% server overload time) or a
physical resource limitation (e.g., maximally 200 servers in the
resource pool).
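A membership test against such a zone might look like the sketch below; the tuple encoding of the four corner points is an assumed representation, and the example bounds echo the constraints mentioned above.

```python
def in_feasibility_zone(status, zone):
    """Check whether a (performance_cost, power_cost) status point lies
    inside the rectangular feasibility zone returned by the FZA
    application. `zone` = (perf_min, perf_max, power_min, power_max)
    is an assumed encoding of the four corner coordinates."""
    perf, power = status
    perf_min, perf_max, power_min, power_max = zone
    return perf_min <= perf <= perf_max and power_min <= power <= power_max

# Illustrative zone: <5% server overload time, power cost capped by
# a 200-server resource pool (units are hypothetical).
zone = (0.0, 0.05, 0.0, 200.0)
```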
[0036] A map generation (MG) mash-up application can indicate the
system position in terms of power, performance and operation costs
for some configuration setting and a specified workload. More
specifically, this application can utilize the API-Put( ) function
and the feasibility zone application to generate a map of an
(m×n)-sized grid, where each grid point corresponds to the system
position for some configuration setting and a specified workload.
In one embodiment, the feasibility zone is partitioned equally into
(m-1) ranges between the minimum and maximum power cost defined in
the feasibility zone. For each power cost on a range point, the
API-Put( ) function is used to get n points each having different
performance and operation costs.
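The partitioning step can be sketched as below. The `put_fn` callback is a hypothetical wrapper around the API-Put() function that returns one candidate status point per (power level, index) pair; its signature is an assumption for illustration.

```python
def generate_map(power_min, power_max, m, n, put_fn):
    """Sketch of the MG application: partition [power_min, power_max]
    into (m - 1) equal ranges, i.e., m power-cost points, then ask the
    assumed API-Put() wrapper `put_fn` for n points at each level."""
    step = (power_max - power_min) / (m - 1)
    return [[put_fn(power_min + i * step, j) for j in range(n)]
            for i in range(m)]
```

For example, `generate_map(0.0, 100.0, 5, 3, put_fn)` yields a 5×3 grid whose rows sit at power costs 0, 25, 50, 75 and 100.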
[0037] Even further, other mash-up applications 130 may include
impact analysis (IA) applications which post-process results from
the API-Put( ) function to ascertain further information for
end-users. For example, an IA mash-up application can post-process
the operation cost report from the API-Put function that includes
the resulting Virtual Machine (VM) migrations and the involved VMs,
servers, and applications running on the VMs. It can further output
an analysis report indicating how those VM migrations will impact
the data center(s) 160. For example, the analysis report may
indicate the network traffic caused by the migrations, the service
downtime of applications running in the VMs, and other related
factors. Various other types of mash-up applications 130 may also
be employed with the present principles.
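The IA post-processing described above can be sketched as follows. The migration-triple encoding and per-VM memory sizes are assumed inputs, and using VM memory footprint as a proxy for migration network traffic is an illustrative simplification.

```python
def analyze_migrations(migrations, vm_memory_gb):
    """Sketch of an IA application: given planned VM migrations as
    (vm, source_server, destination_server) triples and per-VM memory
    sizes (assumed inputs), estimate the network traffic the migrations
    would cause and list the servers involved."""
    traffic_gb = sum(vm_memory_gb[vm] for vm, _, _ in migrations)
    servers = sorted({s for _, src, dst in migrations for s in (src, dst)})
    return {"migrations": len(migrations),
            "network_traffic_gb": traffic_gb,
            "servers_involved": servers}
```

A fuller report would also estimate per-application service downtime, which requires knowing which applications run in each migrated VM.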
[0038] The results generated by the various mash-up applications
130 can then be used by the system positioning services 120 to
provide a graphical user interface (GUI) to the end-user 101 which
allows the end-user 101 to visualize the historical, current and
predicted positioning of the system, and to configure and control
the system (e.g., by controlling the power and performance settings
at the data center 160). The system positioning services 120 may be
used in real time to actively control and manage the data center
160, or can be used as offline decision supporting tools that make
determinations or predictions using resource utilization data from
a separate source comprising history data or a public data center
trace.
[0039] The system positioning services 120 depicted in FIG. 1
comprise a position reporting module 121, a destination searching
module 122 and an auto-piloting module 123. However, other related
positioning services may also be included.
[0040] The position reporting service 121 determines the power,
performance and operational costs imposed on the system under a
given set of parameters. It is similar to the tracking function in
GPS devices, and is built on top of the system status prediction
mash-up application described above. The position reporting service
121 may provide a visual report which indicates how the system
status has changed over history, the present status of the system,
and a few different possibilities as to where the system may
proceed in the future. This information may be presented to the
end-user 101 via a three-dimensional map reflecting the power,
performance and operation costs imposed on the system (e.g., as
illustrated in FIG. 5).
[0041] The destination searching function 122 queries the system to
automatically determine the management configurations and settings
that are to be adjusted if a specified status state is to be
implemented at the data center 160. Hence, this function permits an
end-user 101 to specify a desired status state of the system
without having to specify the configurations and settings that are
needed to reach the desired state. Upon specifying the status
state, the system automatically determines an appropriate set of
configuration settings that can be used to drive the system to the
status state. This function can be built on top of the API-Put( )
function and utilize data ascertained from the feasibility zone and
impact analysis mash-up applications.
[0042] The auto-piloting service 123 can be used when the system is
operated as a run-time management engine which automatically
applies optimal management configurations to the system.
Specifically, the auto-piloting service 123 applies optimal
configuration settings to the system at the end of each
consolidation epoch using, e.g., a sensitivity based optimization
technique. These optimal configuration settings may be defined in
the context of the following performance/power optimization
problem: minimize Energy(configurations) subject to
Performance(configurations) ≤ P_th, where P_th is the upper bound
of the performance cost (possibly specified by end users 101).
[0043] The auto-piloting service requires a map of an (m×n)-sized
grid. As stated in paragraph [0036], each grid point g corresponds
to the system status position for a configuration setting candidate
(CPU_low^g, CPU_high^g) under a specified workload. Specifically,
the feasibility zone is partitioned equally by the destination
searching service into (m-1) sub-ranges along the power cost
dimension, and for each of the m power cost points, the API-Put()
function is applied to get n points which have different
performance and operation costs.
[0044] The pseudocode presented below illustrates an exemplary
manner of implementing the auto-piloting service 123.

[0045] Pseudocode for Auto-Piloting Service

Input: an (m×n)-grid map, where each grid node g represents a
configuration setting candidate (CPU_low^g, CPU_high^g); migration
cost threshold t.
Output: management configurations (CPU_low^*, CPU_high^*).
Procedure:
1. Prune all grid nodes with migration cost > t in the map.
2. If no node remains, return the current configuration.
3. Else, for the remaining grid nodes, calculate the cost
sensitivity on each node x with the configuration
(CPU_low^x, CPU_high^x) as:
Sensitivity(x) = | (ΔPerformance_cost / ΔCPU_low^x) / (ΔPower_cost / ΔCPU_low^x) - (ΔPerformance_cost / ΔCPU_high^x) / (ΔPower_cost / ΔCPU_high^x) |
4. Pick the grid node x with the minimal Sensitivity(x) value and
return (CPU_low^x, CPU_high^x).
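A minimal Python rendering of this procedure is sketched below. The dictionary keys used to encode each grid node (the configuration candidate, its migration cost, and finite-difference estimates of the partial derivatives) are an assumed representation, not one specified in the disclosure.

```python
def sensitivity(node):
    """Sensitivity(x) = |(dPerf/dCPU_low)/(dPower/dCPU_low)
                       - (dPerf/dCPU_high)/(dPower/dCPU_high)|"""
    return abs(node["d_perf_low"] / node["d_power_low"]
               - node["d_perf_high"] / node["d_power_high"])

def auto_pilot(grid_nodes, migration_threshold, current_config):
    # Step 1: prune nodes whose migration cost exceeds the threshold t.
    candidates = [g for g in grid_nodes
                  if g["migration_cost"] <= migration_threshold]
    # Step 2: if no node remains, keep the current configuration.
    if not candidates:
        return current_config
    # Steps 3-4: return the configuration of the node whose
    # performance/power trade-off is least sensitive to perturbation.
    best = min(candidates, key=sensitivity)
    return (best["cpu_low"], best["cpu_high"])
```

Selecting the minimal-sensitivity node favors configurations whose cost trade-off is stable, so small workload shifts during the next consolidation epoch are less likely to invalidate the chosen setting.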
[0046] The above-identified variables can be defined as
follows:
[0047] t: a threshold value reflecting the maximum allowable
migration cost (e.g, number of VM migrations) of the data
center.
[0048] CPU*.sub.low: the optimal value of the configuration
parameter CPU.sub.low which solves the above performance-power
optimization problem.
[0049] CPU*.sub.high: the optimal value of the configuration
parameter CPU.sub.high which solves the above performance-power
optimization problem.
[0050] x: indicates a particular node in the grid map (e.g., all
grid nodes may be indexed 1, 2, . . . m.times.n), which corresponds
to a configuration setting candidate (CPU.sub.low.sup.x,
CPU.sub.high.sup.x).
[0051] ΔPerformance_cost/ΔCPU_low^x: indicates the value of the
partial derivative of the function Performance.sub.cost with respect
to the variable CPU.sub.low when CPU.sub.low=CPU.sub.low.sup.x.
[0052] ΔPerformance_cost/ΔCPU_high^x: indicates the value of the
partial derivative of the function Performance.sub.cost with respect
to the variable CPU.sub.high when CPU.sub.high=CPU.sub.high.sup.x.
[0053] ΔPower_cost/ΔCPU_low^x: indicates the value of the partial
derivative of the function Power.sub.cost with respect to the
variable CPU.sub.low when CPU.sub.low=CPU.sub.low.sup.x.
[0054] ΔPower_cost/ΔCPU_high^x: indicates the value of the partial
derivative of the function Power.sub.cost with respect to the
variable CPU.sub.high when CPU.sub.high=CPU.sub.high.sup.x.
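As a concrete illustration, the selection procedure above can be sketched as follows. The cost and migration-cost functions here are hypothetical stand-ins for the simulator-backed estimates described elsewhere in this disclosure, and the partial derivatives are approximated by finite differences:

```python
def sensitivity(perf_cost, power_cost, low, high, step=5):
    """Finite-difference estimate of
    |(dPerf/dCPU_low)/(dPower/dCPU_low) - (dPerf/dCPU_high)/(dPower/dCPU_high)|."""
    d_perf_low = (perf_cost(low + step, high) - perf_cost(low, high)) / step
    d_pow_low = (power_cost(low + step, high) - power_cost(low, high)) / step
    d_perf_high = (perf_cost(low, high + step) - perf_cost(low, high)) / step
    d_pow_high = (power_cost(low, high + step) - power_cost(low, high)) / step
    return abs(d_perf_low / d_pow_low - d_perf_high / d_pow_high)

def auto_pilot(grid, migration_cost, perf_cost, power_cost, t, current):
    # Step 1: prune candidates whose migration cost exceeds the threshold t.
    survivors = [g for g in grid if migration_cost(*g) <= t]
    if not survivors:  # Step 2: nothing feasible; keep the current setting.
        return current
    # Steps 3-4: return the candidate with the minimal cost sensitivity.
    return min(survivors, key=lambda g: sensitivity(perf_cost, power_cost, *g))
```

With linear toy cost functions the sensitivity is constant, so the procedure simply returns the first surviving candidate.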
[0055] Using the above procedure, the auto-piloting service 123 can
automatically control the power and performance levels at the data
center 160.
[0056] As explained above, the power and performance levels at the
data center 160 can be controlled by the performance and power
management components 152 and 153. To determine when power or
performance adjustments should be made at the data center 160, the
management components may store an over-utilized machine list,
which indicates all machines that are in violation of a
parameter specified by a service-level agreement (SLA), and an
under-utilized machine list, which indicates all machines whose
total CPU utilization is below a target lower bound (referred to
herein as CPU.sub.low). These lists can be used by the management
components 152 and 153 to enforce power and performance
configurations at the data center 160.
[0057] Enforcing power and performance configurations at the data
center 160 may include migrating virtual machines running on the
over-utilized machines to machines in the under-utilized list. To
prevent an unnecessary virtual machine migration due to a transient
glitch, an SLA violation may be defined to occur when more than 5%
of the CPU utilization readings in a previous window are higher
than a load threshold (e.g., 90%). The performance management
component 153 may ensure that the total CPU utilization on each
physical server is below the load threshold. It can periodically
read in data from the database 151 which indicates the CPU
utilization of all the virtual machines running at the data center
160, and check if the total utilization on a physical host is under
the SLA threshold. If the performance manager 153 detects an SLA
violation, it may resolve the violation by migrating virtual
machines to physical machines that are included in the
under-utilized machine list. In the case that the performance
manager 153 determines that the available processing power from the
under-utilized machines is insufficient, it can turn on additional
machines at the data center 160.
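The SLA-violation test described above can be sketched as follows; the 5% tolerance and 90% load threshold mirror the example figures in the text, and the data shapes are illustrative:

```python
def sla_violation(readings, load_threshold=90.0, tolerance=0.05):
    """readings: CPU-utilization samples (%) for one host over the
    previous window. A violation occurs when more than `tolerance`
    of the readings exceed the load threshold."""
    if not readings:
        return False
    over = sum(1 for r in readings if r > load_threshold)
    return over / len(readings) > tolerance

def over_utilized(hosts, **kw):
    """hosts: {host_name: readings}; returns the over-utilized machine list."""
    return [h for h, r in hosts.items() if sla_violation(r, **kw)]
```

For example, 2 high readings out of a 20-sample window (10%) is a violation, while exactly 1 out of 20 (5%) is not, since the text requires *more than* 5%.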
[0058] In resolving an SLA violation, the performance manager 153
iterates through the virtual machines executing on each of the
over-utilized machines until all of the over-utilized machines are
under the SLA threshold. After iterating through all of the
over-utilized machines, the power manager component 152 can
determine a VM migration plan which indicates the destination for
each VM. If there is not enough CPU processing power available to
accommodate the VMs on the over-utilized machines, the performance
manager 153 calculates the number of additional machines that will
be turned on and powers on these extra machines via a
wake-on-LAN.
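The shortfall computation can be sketched as follows; the per-machine capacity parameter and the function name are assumptions for illustration, not from the disclosure:

```python
import math

def machines_to_wake(vm_loads_to_move, spare_capacity, capacity_per_machine):
    """vm_loads_to_move: CPU loads of VMs awaiting migration;
    spare_capacity: headroom on each under-utilized host;
    capacity_per_machine: assumed usable CPU of one woken machine."""
    shortfall = sum(vm_loads_to_move) - sum(spare_capacity)
    if shortfall <= 0:
        return 0  # the under-utilized list can absorb everything
    return math.ceil(shortfall / capacity_per_machine)
```

The result would then drive the wake-on-LAN step, one magic packet per machine to be powered on.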
[0059] A goal of the power manager 152 is to ensure that the total
utilization for each physical server stays above some threshold to
prevent excessive waste. Similar to the performance
manager 153, the power manager 152 periodically checks the physical
hosts to find machines whose total CPU utilization is lower than a
threshold. The power manager 152 also maintains a list of
under-utilized machines and tries to resolve the under-utilization
by consolidating VMs and powering off machines. The power manager
152 iterates through the machines in the under-utilization list
starting with the least utilized machine, and finds a destination
host for each VM executing on that host. This component 152 then
executes the VM migration plan and powers off machines that do not
have any running VMs.
[0060] The power manager 152 can utilize two particularly useful
parameters: minimal machines, which indicates a minimum number of
physical machines that must be turned on at all times, and maximal
VMs per machine, which indicates a maximum number of VMs that can
be executing at once on a physical machine. The power manager 152
ensures that a minimal number of machines are always turned on so
that some machines will always be running even when there is low
overall CPU demand. When consolidating VMs to save power, the power
manager 152 will also ensure that the number of VMs running on a
machine does not exceed a maximum number.
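A minimal sketch of the consolidation pass, under assumed data shapes, might look like this; `min_machines` and `max_vms` correspond to the minimal-machines and maximal-VMs-per-machine parameters, and `cpu_low`/`cpu_high` to the CPU load control range:

```python
def consolidate(hosts, cpu_low, cpu_high, min_machines, max_vms):
    """hosts: {host: {vm: cpu_load}} for powered-on hosts.
    Returns (migration_plan, hosts_to_power_off)."""
    util = lambda m, h: sum(m[h].values())
    # Iterate under-utilized hosts starting with the least utilized.
    under = sorted((h for h in hosts if util(hosts, h) < cpu_low),
                   key=lambda h: util(hosts, h))
    plan, off = [], []
    for src in under:
        if len(hosts) - len(off) <= min_machines:
            break                                   # minimal-machines floor
        trial = {h: dict(hosts[h]) for h in hosts}  # tentative placement
        staged, ok = [], True
        for vm, load in hosts[src].items():
            dest = next((d for d in trial
                         if d != src and d not in off
                         and len(trial[d]) < max_vms           # maximal-VMs cap
                         and util(trial, d) + load <= cpu_high), None)
            if dest is None:
                ok = False            # cannot fully empty src; leave it on
                break
            trial[dest][vm] = load
            staged.append((vm, src, dest))
        if ok:                        # commit the migrations, power off src
            for vm, s, d in staged:
                hosts[d][vm] = hosts[s].pop(vm)
            plan += staged
            off.append(src)
    return plan, off
```

The all-or-nothing staging mirrors the text's requirement that a host is powered off only once every VM on it has found a destination.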
[0061] Referring now to FIG. 2, a block/flow diagram 200 is
disclosed which illustrates a more detailed view of the system
configuration compiler 140 depicted in FIG. 1. As shown therein,
the mash-up applications and system positioning services 210
request the results of the API-Get( ) and API-Put( ) functions from
the system configuration compiler 140 (specifically, the
configuration generator 220) to implement higher level tasks. Upon
receiving these requests, the system configuration compiler 140
utilizes the input parameters associated with these functions to
implement the functions, and subsequently returns the results of
these functions to the mash-up applications and system positioning
services 210 for further processing.
[0062] The configuration generator 220 drives the configuration
compiler engine. The configuration generator 220 receives the calls
to the two API functions, decides how to transfer them into
internal subtasks in the compiler 140, and schedules the execution
of the subtasks to implement the called functions.
[0063] The workload generator 230 receives a specified time period
from the configuration generator 220 and reshapes the information
in the configuration and monitoring database 151 during the time
period. The output of the workload generator is a set of time
series for virtual machine load information. One time series for
one virtual machine x is in the format (X.sub.1, X.sub.2, . . . ,
X.sub.i, . . . ), where X.sub.i is a load value (e.g., CPU
utilization) of x at some time point. The system simulator will
replay the VM load by reading the time series one point after
another, in the same manner that the performance and power
components 152 and 153 read the monitoring data in the production
system.
[0064] A variety of different reshaping schemes may be
used by the workload generator 230 to transform the data in the
database 151. A first reshaping scheme adjusts the load of each
executing virtual machine by a specified percentage. Thus, a
specified percentage of 0% leads to no change in the data, while a
specified percentage of +20% would increase the load on each
virtual machine by 20% within the time period. The second reshaping
scheme does not take any input parameters, but runs a
regression-based load prediction procedure which outputs a
predicted load.
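The two reshaping schemes can be sketched as follows; the ordinary-least-squares predictor is one plausible reading of "regression-based load prediction," as the text does not fix the exact model:

```python
def scale_load(series, percent):
    """Scheme 1: adjust each VM load sample by a fixed percentage
    (0 leaves the data unchanged, +20 raises each sample by 20%)."""
    return [x * (1 + percent / 100.0) for x in series]

def predict_load(series, horizon):
    """Scheme 2 (assumed model): extrapolate the load via ordinary
    least squares over (time index, load) pairs."""
    n = len(series)
    mean_t, mean_x = (n - 1) / 2.0, sum(series) / n
    cov = sum((t - mean_t) * (x - mean_x) for t, x in enumerate(series))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_x - slope * mean_t
    return [intercept + slope * t for t in range(n, n + horizon)]
```

Either output is a time series in the same (X_1, X_2, ...) format the simulator replays point by point.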
[0065] The system simulator 240 is a discrete-event simulator which
outputs the estimated system status on the server-level, as well as
detailed configuration settings which would enable a specified
input state to be achieved. To generate this information, the
system simulator 240 includes data structures and logic functions
to simulate per-server resource utilization information (e.g.,
utilization of CPU, memory, disk, I/O, etc.) which is based on the
time-series input from the workload generator 230. In a preferred
embodiment, the system status is simulated on the server-level. The
simulation output component 250 records the output of the system
simulator 240.
[0066] The system simulator 240 also includes event registers which
are used to indicate whether the management component simulators
(i.e., performance management component simulator 260 and power
management component simulator 270) should be called upon to
implement changes in performance and/or power. For example, if a
server overload is detected during the system simulation process,
the performance management component simulator 260 may be called
upon to determine corresponding management actions that the system
simulator 240 can execute to rectify the problem. Independently, a
timer register could trigger the call to the power management
component simulator 270 to execute periodic server consolidations.
Throughout these consolidations, the server performance and
behavior information is collected and analyzed, and then output as
the status of the system (e.g., the average percentage of time
there is server overloading at the data center 160, or the average
estimated power consumption of the data center 160).
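The event-register dispatch can be sketched as follows, with handler callbacks standing in for the performance and power management component simulators (names and shapes are illustrative):

```python
def simulate(load_series, threshold, period, on_overload, on_timer):
    """load_series: per-step total server load; on_overload stands in
    for the performance-component simulator, on_timer for the
    power-component simulator. Returns the fraction of overloaded
    steps (the performance-cost metric used in the text)."""
    overloaded = 0
    for step, load in enumerate(load_series):
        if load > threshold:                  # overload event register fires
            overloaded += 1
            on_overload(step, load)           # performance mgmt simulator
        if step % period == 0 and step > 0:   # timer register fires
            on_timer(step)                    # periodic server consolidation
    return overloaded / len(load_series)
```

Each handler would, in the full system, return management actions for the simulator to execute before the next step.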
[0067] As indicated above, the performance management component
simulator 260 and power management component simulator 270 adjust
the power and performance levels of the simulation produced by
system simulator 240, thus imitating the role that the performance
management component 153 and power management component 152 play
with respect to altering the actual power and performance levels at
the data center 160. These components 260 and 270 encode the logic
functions of the physical management components 152 and 153 based
on domain knowledge, and interact with the system simulator 240 to
reproduce the impact of management operations on the system
status.
[0068] A more detailed description of the API-Get( ) and API-Put( )
functions will now be given with reference to FIGS. 3 and 4. As
indicated above, the API-Get( ) function 300 provides a report on
where the system would be located (e.g., in terms of power,
performance and operational cost) if certain workload and system
settings were implemented at the data center 160. Three input
parameters are provided to the API-Get( ) function, with an
optional fourth input parameter indicated by the dotted arrow:
[0069] (1) Time Period (t.sub.start, t.sub.end): This parameter set
specifies the time period of interest during which the virtual
machine workload will be used for the system status calculation. The
time period comprises a duration in the past for which the
system/workload monitoring information is available.
[0070] (2) Workload Reshaping Scheme: This parameter provides the
option to apply different forecasting schemes on the original
workload data during the time period. For example, this parameter
may indicate the particular reshaping scheme which is implemented
by the workload generator 230 described above.
[0071] (3) Management Configurations: This parameter set specifies
the management policy settings of interest. The configurations may
include the CPU load control range [CPU.sub.low, CPU.sub.high]. The
actual system configurations during the time period (t.sub.start,
t.sub.end) may be used as default values.
[0072] (4) VM-Server Map and Resource Inventory: This is an
optional parameter set which specifies the physical resource
information. The VM-server map specifies the initial virtual
machine hosting information, and the resource inventory includes
information which indicates the available resources. The system
information at time t.sub.start may be used as a set of default
values if this parameter is not specified.
[0073] Based on the above-identified input parameters, the API-Get(
) function outputs the system status. The output system status
parameter may describe the system location in a virtual coordinate
space, similar to the manner in which a GPS device indicates a
longitude/latitude coordinate space in the geographical world
(e.g., as illustrated in FIG. 5). For integrated power and
performance management, the system status can be defined using four
metrics:
[0074] (1) Performance Cost: This metric reflects the service
performance when the system is configured by the input parameters.
While many cost metrics can be applied, the server overload time is
the preferred cost metric in certain embodiments.
[0075] (2) Power Cost: This metric reflects the power consumption
when the system is configured by the input parameters (e.g., the
total power in kilowatts consumed by servers).
[0076] (3) Operation cost: This metric reflects the impact on the
system due to management operations when the system is configured
by the input parameters. In one embodiment, the number of VM
migrations may be used to represent the operation cost. In addition
to operation cost, detailed operation actions may also be returned
indicating the scheduling of virtual machine migrations and server
on-off activities.
[0077] (4) Stability: This metric reflects the effectiveness of the
management configurations from another angle. In a preferred
embodiment, the operation stability is defined as the percentage of
time that the server workload falls within the CPU load control
range [CPU.sub.low, CPU.sub.high] given the specified input
parameters. High operation stability implies efficient workload
distribution and low operation cost as few management events are
triggered.
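The four status metrics can be sketched as a single function; the inputs are illustrative, with power cost and migration count taken as given values rather than simulated:

```python
def system_status(server_loads, cpu_low, cpu_high, power_kw, migrations):
    """server_loads: per-time-step lists of per-server CPU loads (%).
    Returns the four status metrics described in the text."""
    loads = [l for step in server_loads for l in step]
    # Performance cost: fraction of time steps with at least one overload.
    performance = sum(any(l > cpu_high for l in step)
                      for step in server_loads) / len(server_loads)
    # Stability: fraction of readings inside the CPU load control range.
    stability = sum(cpu_low <= l <= cpu_high for l in loads) / len(loads)
    return {"performance": performance, "power": power_kw,
            "operation": migrations, "stability": stability}
```

A status tuple like this is what API-Get( ) would report as the system's position in the virtual coordinate space.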
[0078] Moving on to FIG. 4, the second API function 400, i.e.,
API-Put( ), is illustratively depicted. As explained earlier, the
API-Put( ) function determines a set of management policies and
configuration settings that will permit the system to reach a
specified input status. Many of the same parameters that were
utilized in implementing the API-Get( ) function are also used in
implementing the API-Put( ) function. The above descriptions of
these parameters remain the same for the API-Put( ) function, and
therefore will not be reproduced in describing the API-Put( )
function.
[0079] The API-Put( ) function receives four input parameters, with
an optional fifth input parameter indicated by the dotted line.
Like the API-Get( ) function, the API-Put( ) function receives
input parameters indicating a time period (t.sub.start, t.sub.end),
a workload reshaping scheme, and possibly a VM-server map and/or
resource inventory. This function also receives two other
inputs:
[0080] (1) Performance and Power Targets: These parameters indicate
the system performance and power cost targets that the API-Put( )
function generates configuration settings to achieve. In other
words, these targets represent the specified status that the
API-Put( ) function attempts to drive the system toward by
generating the appropriate settings.
[0081] (2) Error Tolerance (ε, ά): This parameter set specifies the
allowable errors that may occur when the API-Put( ) function is
searching for the destination status state.
[0082] Using the above input parameters, the API-Put( ) function
generates two separate outputs. The first output is the management
configurations and settings (described above) that will drive the
system to the specified system status. The second output is the
operation costs and actions (described above) reflecting the impact
on the system due to management operations when the system is
configured by the input parameters.
[0083] The configuration generator 220 in the system configuration
compiler 140 includes an efficient destination searching procedure
for finding the solution to API-Put( ) calls. The pseudocode
presented below illustrates this procedure:
[0084] Pseudocode for Destination Searching Procedure
Input: (t_start, t_end), workload reshaping scheme,
Performance_target, Power_target, error tolerance (ε, ά),
[VM-server map, resource inventory]
Output: management configurations (CPU_low, CPU_high), operation
cost and actions
Procedure:
1. Assign CPU_low = CPU_max, CPU_high = CPU_max;
2. (Performance_cost, Power_cost) = get_position(CPU_low, CPU_high,
   t_start, t_end, workload reshaping scheme, [VM-server map,
   resource inventory]);
3. If Power_cost > Power_target, then a subset of servers must be
   forcibly turned off to meet Power_target:
   (a) (VM-server map, resource inventory) =
       forced_down(Power_target, t_start, [VM-server map,
       resource inventory]);
   (b) Power_cost = Power_target, CPU_low = CPU_max;
4. Else: // binary search for CPU_low
   (a) CPU_left = CPU_min, CPU_temp = CPU_low, CPU_right = CPU_low;
   (b) while (CPU_left < CPU_right)
   (c)   CPU_temp = (CPU_left + CPU_right) / 2;
   (d)   (Performance_cost, Power_cost) = get_position(CPU_temp,
         CPU_high, t_start, t_end, workload reshaping scheme,
         [VM-server map, resource inventory]);
   (e)   If (Power_cost > Power_target + (ε/2))
   (f)     CPU_left = CPU_temp + 1; CPU_low = CPU_temp;
   (g)   Else if (Power_cost < Power_target - (ε/2))
   (h)     CPU_right = CPU_temp - 1; CPU_low = CPU_temp;
   (i)   Else: // found the configuration CPU_low
   (j)     CPU_low = CPU_temp; break;
5. CPU_left = CPU_low, CPU_temp = CPU_high, CPU_right = CPU_high;
   (a) while (CPU_left < CPU_right) // binary search for CPU_high
   (b)   CPU_temp = (CPU_left + CPU_right) / 2;
   (c)   (Performance_cost, Power_cost) = get_position(CPU_low,
         CPU_temp, t_start, t_end, workload reshaping scheme,
         [VM-server map, resource inventory]);
   (d)   If (Performance_cost < Performance_target - (ά/2))
   (e)     CPU_left = CPU_temp + 1; CPU_high = CPU_temp;
   (f)   Else if (Performance_cost > Performance_target + (ά/2))
   (g)     CPU_right = CPU_temp - 1; CPU_high = CPU_temp;
   (h)   Else: // found the configuration CPU_high
   (i)     CPU_high = CPU_temp; break;
6. Return (CPU_low, CPU_high), and the corresponding operation cost
   and actions.
[0085] The steps include a procedure for finding CPU.sub.low that
leads to the desired Power.sub.target, and another procedure for
finding CPU.sub.high that leads to the desired
Performance.sub.target. Note that the (Performance.sub.target,
Power.sub.target) specification might not be a feasible status. In
this case, the input is first filtered using the feasibility zone
information, and the feasible status point closest to the input
target is returned. These details are not included in the pseudocode
example provided above for purposes of reducing complexity.
[0086] The above-identified variables which have not yet been
defined may be defined as follows:
[0087] CPU.sub.max: indicates the maximum CPU utilization that is
permissible under the constraints imposed by the inputs listed
above.
[0088] CPU.sub.min: indicates the minimum CPU utilization that is
permissible under the constraints imposed by the inputs listed
above.
[0089] Performance.sub.target: indicates the performance level
which the destination searching procedure is trying to drive the
data center towards.
[0090] Power.sub.target: indicates the power level which the
destination searching procedure is trying to drive the data center
towards.
[0091] CPU.sub.left: a temporary variable used in the searching
procedure.
[0092] CPU.sub.right: a temporary variable used in the searching
procedure.
[0093] CPU.sub.temp: a temporary variable used in the searching
procedure.
[0094] .epsilon.: indicates the acceptable level of error for power
costs.
[0095] {acute over (.alpha.)}: indicates the acceptable level of
error for performance costs.
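Step 4 of the procedure can be sketched as a standalone binary search; `get_power` is a stand-in for the simulator-backed get_position call, and the second branch is read as the power cost falling *below* target, mirroring the symmetric test in step 5:

```python
def search_cpu_low(get_power, cpu_min, cpu_max, power_target, eps):
    """Binary search for a CPU_low that drives the power cost to
    within eps/2 of the target. Validity rests on Property 2:
    the power cost is non-increasing in CPU_low."""
    left, right, cpu_low = cpu_min, cpu_max, cpu_max
    while left < right:
        mid = (left + right) // 2
        cost = get_power(mid)
        if cost > power_target + eps / 2:      # too much power: raise CPU_low
            left, cpu_low = mid + 1, mid
        elif cost < power_target - eps / 2:    # too little: lower CPU_low
            right, cpu_low = mid - 1, mid
        else:                                  # within tolerance: done
            return mid
    return cpu_low
```

Each probe halves the remaining range [CPU_min, CPU_max], which is the O(log R) bound of Theorem 1.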
[0096] To understand the correctness and convergence of the
searching procedure, consider the three properties described below
and Theorem 1.
[0097] Property 1: In a homogeneous system,
Performance.sub.cost(CPU.sub.high) is a non-decreasing function of
CPU.sub.high.
[0098] This property states that in a homogeneous system, the
performance cost typically increases with higher CPU.sub.high given
the same workload. Hence, the lower the value of CPU.sub.high, the
more load-balanced the system will be, leading to fewer performance
violations.
[0099] Property 2: In a homogeneous system,
Power.sub.cost(CPU.sub.low) is a non-increasing function of
CPU.sub.low.
[0100] This property states that in a homogeneous system, the power
cost typically decreases with higher CPU.sub.low given the same
workload. Hence, the higher the value of CPU.sub.low, the more
aggressive the load consolidation in the system will be, leading to
lower power consumption.
[0101] Property 3: In a homogeneous system,
Operation.sub.cost(CPU.sub.high-CPU.sub.low) is a non-increasing
function of (CPU.sub.high-CPU.sub.low) for a fixed CPU.sub.high or
CPU.sub.low.
[0102] This property states that in a homogeneous system, the
operation cost typically decreases as the load control range
(CPU.sub.high-CPU.sub.low) widens, given the same workload. Hence,
the more tolerant the system is to load variation, the less
frequently the management operations will need to take effect.
[0103] Theorem 1: The destination searching procedure finds the
status destination in O(logR) steps, where
R=CPU.sub.max-CPU.sub.min is the allowable load control range.
[0104] Note that the algorithm starts with the maximal CPU.sub.low
and CPU.sub.high, and searches for the target CPU*.sub.low followed
by CPU*.sub.high due to the constraint CPU.sub.low ≤ CPU.sub.high.
[0105] FIGS. 5 and 6 illustrate graphical user interfaces that can
be provided to an end-user 101 in accordance with the present
principles.
[0106] FIG. 5 illustrates a graphical user interface 500 for use
with the position reporting function 121. The vertical bars 510,
520, 530, 540, 550 and 560 represent the status of the system at
different points in time. The position of the vertical bars with
respect to the axes represents the power and performance costs,
while the height of the vertical bars represents the operational
costs.
[0107] Vertical bar 530 represents the current status of the
system, while vertical bars 510 and 520 represent previous states
of the system derived from historical data (e.g., which may be
stored in the configuration and monitoring database 151 described
above). The solid arrows indicate how the system transitioned from
previous state 510 to the current state 530.
[0108] As explained above, the position reporting function 121 can
predict the status of the system under a given set of parameters.
Since there may be a plurality of system states which satisfy the
specified input parameters, the graphical user interface 500 may
display multiple predicted future states for the system. Vertical
bars 540, 550 and 560 represent three predicted future states of
the system. The non-solid arrows indicate the transition from the
current state to each of the future states. In preferred
embodiments, the three future states displayed on the graphical
user interface represent predictions on how the system can be
optimized in terms of power, performance and operation costs while
still satisfying the constraints imposed by the input
parameters.
[0109] FIG. 6 illustrates a graphical user interface 600 for use
with the destination searching function 122. A target system status
may be a system state with a different power consumption and/or
performance violation which the end-user 101 would like the system
to reach. However, the end-user 101 may not know exactly how to
configure the system to reach this state. Thus, the graphical user
interface can accept input information describing the target system
state and forward this information to the system configuration
compiler described above with reference to FIG. 2. Upon receiving
the input, the system configuration compiler searches the
configuration space to find possible configuration parameters which
satisfy the target system state.
[0110] Since there may be multiple configuration parameters that
satisfy the target system status, the GUI displays the possible
choices. The square labeled 610 represents the feasible
configuration parameters that can achieve the target system state.
All status points which fall within this square 610 satisfy the
target system state, while all status points outside the square 610
represent non-feasible states.
[0111] Vertical bars 620 and 630 represent two particularly notable
status points which satisfy the target system status. Vertical bar
620 represents the system status which is minimized in terms of
performance cost, while vertical bar 630 represents the status
point which is minimized in terms of operational cost. Other
vertical bars may be provided (e.g., a vertical bar indicating how
the system can be configured to minimize power costs). Each choice
includes information regarding the details of migration steps from
the current state that are provided by the system simulator
engine.
[0112] In the case when a target system status cannot be reached,
the GUI shows the closest state the system can reach and the
corresponding configuration parameters that are associated with the
state.
[0113] Moving on to FIG. 7, a block/flow diagram illustrates a
method for managing a data center in accordance with the present
principles. The method begins at the start block and proceeds to block 710
where a request is issued from a system positioning application
(e.g., a position reporting application 121, a destination
searching application 122 or an auto-piloting application 123) to
implement a data center management task. The request may be issued
by the system positioning application indirectly via the mash-up
application layer 130. Exemplary data center management tasks may
include determining the status of a data center 160 in terms of
power, performance and operational costs given specified input
parameters as described above, or may include generating a set of
management policies that can drive the data center 160 to a
particular state in terms of these costs.
[0114] Next, the request is received by a configuration generator
in block 720, and converted into a plurality of subtasks (block
730). The conversion of the request into subtasks may involve
generating or determining the set of subtasks that, when combined,
accomplish the data center management task for the end-user 101.
The subtasks may represent the functions performed by the
components of the system configuration compiler 140 described above
with reference to FIG. 2. For example, the subtasks may represent
the data reshaping operations implemented by the workload generator
230 to transform the data in the configuration and monitoring
database 151, or the server simulation operations performed by
system simulator 240 for estimating the system status.
[0115] The set of subtasks generated in block 730 is scheduled for
execution in block 740. Executing the subtasks accomplishes the
data center management task for the end-user 101, and the results
may be displayed via a graphical user interface
(e.g., using the graphical user interfaces depicted in FIGS. 5 and
6) in block 750. The results may also be applied to implement
changes at the data center 160 by applying them to the
configurations and settings of the power and performance management
components 152 and 153.
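The flow of blocks 710-750 can be sketched as follows; the subtask bodies and the plain sequential scheduler are illustrative stand-ins for the compiler components of FIG. 2, which the text leaves unspecified:

```python
def handle_request(request, workload_gen, simulator, display):
    """request: dict with the API parameters; workload_gen and
    simulator stand in for the workload generator 230 and system
    simulator 240; display renders results to the end-user."""
    # Blocks 720/730: convert the request into an ordered set of subtasks.
    subtasks = [
        lambda ctx: ctx.update(
            series=workload_gen(request["period"], request["reshaping"])),
        lambda ctx: ctx.update(
            status=simulator(ctx["series"], request["configs"])),
    ]
    ctx = {}
    for task in subtasks:     # Block 740: scheduled (here, sequential) execution
        task(ctx)
    display(ctx["status"])    # Block 750: show the results to the end-user
    return ctx["status"]
```

The returned status could equally be applied to the management components' configurations rather than only displayed.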
[0116] Having described preferred embodiments of a system and
method for providing system positioning services in a data center
(which are intended to be illustrative and not limiting), it is
noted that modifications and variations can be made by persons
skilled in the art in light of the above teachings. It is therefore
to be understood that changes may be made in the particular
embodiments disclosed which are within the scope of the invention
as outlined by the appended claims. Having thus described aspects
of the invention, with the details and particularity required by
the patent laws, what is claimed and desired protected by Letters
Patent is set forth in the appended claims.
* * * * *