Database Redistribution Utilizing Virtual Partitions Jain; Saurabh ; et al. [IBM CORPORATION]

Database Redistribution Utilizing Virtual Partitions

Jain; Saurabh ; et al.

Patent Application Summary

U.S. patent application number 13/595566 was filed with the patent office on 2012-12-20 for database redistribution utilizing virtual partitions. This patent application is currently assigned to IBM CORPORATION. Invention is credited to Saurabh Jain, Neeraj S. Sharma.

Application Number	20120323852 13/595566
Document ID	/
Family ID	46163197
Filed Date	2012-12-20

United States Patent Application	20120323852
Kind Code	A1
Jain; Saurabh ; et al.	December 20, 2012

DATABASE REDISTRIBUTION UTILIZING VIRTUAL PARTITIONS

Abstract

In some embodiments, a partitioned database is stored in a plurality of logical or physical partitions on at least a logical or physical first data storage node, and a subset of a first partition among the plurality of logical partitions is configured as a virtual partition. An input indicating an allocation of a second physical data storage node to store the partitioned database is received. A second partition is configured on the second data storage node. In response to the input, the partitioned database is redistributed over the first and second data storage nodes by moving data within the virtual partition on the first partition to the second partition.

Inventors:	Jain; Saurabh; (Bangalore, IN) ; Sharma; Neeraj S.; (New Delhi, IN)
Assignee:	IBM CORPORATION Armonk NY
Family ID:	46163197
Appl. No.:	13/595566
Filed:	August 27, 2012

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
12961544	Dec 7, 2010
13595566

Current U.S. Class:	707/640 ; 707/609; 707/E17.005
Current CPC Class:	G06F 16/22 20190101; G06F 16/278 20190101
Class at Publication:	707/640 ; 707/609; 707/E17.005
International Class:	G06F 17/30 20060101 G06F017/30; G06F 12/16 20060101 G06F012/16

Claims

1. A method of data processing, comprising: storing a partitioned database in a plurality of partitions on at least a first data storage node of a data processing system; configuring a subset of a first partition among the plurality of partitions as a virtual partition; receiving an input indicating an allocation of a second data storage node to store the partitioned database; configuring a second partition of the partitioned database on the second data storage node; and in response to the input, redistributing the partitioned database over the first and second data storage nodes by moving data within the virtual partition on the first partition to the second partition.

2. The method of claim 1, wherein the virtual partition comprises one or more data blocks of the first partition.

3. The method of claim 2, wherein the virtual partition includes only data within the first partition.

4. The method of claim 1, wherein: the method includes establishing a partition configuration data structure associating the virtual partition and the first partition; and the redistributing includes updating the partition configuration data structure to indicate that the data moved to the second partition does not reside in a virtual partition.

5. The method of claim 1, wherein: the method includes establishing a partition map mapping data within the partitioned database to particular ones of the plurality of partitions and the virtual partition; and the redistributing includes updating the partition map to indicate that data moved to the second partition is not mapped to a virtual partition.

6. The method of claim 1, wherein the redistributing includes: creating a backup of the data within the virtual partition; restoring the data from the backup to the second partition.

7. The method of claim 1, wherein: the data within the virtual partition is first data; prior to the redistributing, the first partition stores the first data and second data that is not within the virtual partition; and creating the backup comprises creating a backup including the first data and excluding the second data.

Description

[0001] This application is a continuation of U.S. patent application Ser. No. 12/961,544 entitled "DATABASE REDISTRIBUTION UTILIZING VIRTUAL PARTITIONS" by Saurabh Jain et al. filed Dec. 7, 2010, the disclosure of which is hereby incorporated herein by reference in its entirety for all purposes.

BACKGROUND OF THE INVENTION

[0002] 1. Technical Field

[0003] The present invention relates in general to data processing, and in particular, to redistribution of a partitioned database.

[0004] 2. Description of the Related Art

[0005] In computing environments in which a large volume of data is stored, the data are commonly managed by a relational database management system (RDBMS), which can be utilized to instantiate one or more databases for storing, accessing and manipulating the data. Each databases includes one or more table spaces, which in turn store table data in accordance with the relational data model. As implied by tabular organization, the table data is logically arranged in rows and columns, with each table row having an associated row key.

[0006] To provide enhanced manageability, performance and/or availability, a relational database is commonly partitioned into multiple logical or physical partitions (hereinafter, simply referred to as a "partition" unless a more definite meaning is required), each having its own data, indexes, configuration files, and transaction logs. Table data of any given table can be located in one or more of the partitions, with the partition on which the table data resides typically being determined by a hash function. Because data is distributed across database partitions, the power of multiple processors, possibly on multiple computers, can be harnessed in tandem to store, retrieve, process and manage the data in the database.

[0007] Enterprises that manage large data volumes, such as online transaction processing (OLTP) systems, data warehousing enterprises, insurance and financial companies, etc., are frequently required to expand their data storage and processing capacities as the volume of stored data grows. For example, an enterprise may add one or more additional servers and their associated storage nodes to the existing information technology (IT) infrastructure of the enterprise in order to handle an increased volume of data while avoiding a degradation in query response times.

[0008] To make use of the additional servers, the RDBMS must redistribute and reorganize one or more database instances so that the database instance(s) reside not only on the storage nodes of the existing servers, but also on the storage nodes of the newly installed servers. A conventional process by which a RDBMS redistributes and reorganizes a database in accordance with the prior art is depicted in FIG. 1.

[0009] The conventional process of redistributing and reorganizing a database begins at block 100 and thereafter proceeds to block 102, which depicts the RDBMS making a backup of the entire database that is to be redistributed. Depending upon the size of the database, making a backup of the database can consume significant processing time (e.g., days or weeks). The process then enters an iterative loop including blocks 104-118 in which the database is redistributed row by row across the existing and new storage nodes. The redistribution begins at block 104, which depicts the RDBMS reading a key value of the next database row to be processed. The RDBMS then rehashes the key value of the database row to determine a target partition number on which the database row will reside following the redistribution (block 106). At block 110, the RDBMS determines whether the target partition number is the same as the existing partition number, meaning that the database row will not be moved. If the target partition number matches the existing partition number, the process passes to block 118, which is described below. If, however, the target partition number does not match the existing partition number, the process proceeds to blocks 112-116.

[0010] At blocks 112-116, the RDBMS reads the complete database row from the preexisting storage node, inserting the database row in a new partition on a newly added storage node, and then deleting the database row from the preexisting storage node. Thereafter, at block 118, the RDBMS determines whether or not all rows of the database have been processed. If not, the process returns to block 104, which has been described. If, however, RDBMS determines at block 118 that all rows of the database have been processed, the process proceeds to block 120.

[0011] As will be appreciated, the movement of selected database rows from the preexisting storage nodes to the newly installed storage nodes via the redistribution depicted at block 104-118 leaves the preexisting storage nodes sparsely populated and thus inefficiently utilized. Consequently, at block 120 the RDBMS reorganizes the database rows in the preexisting storage nodes to return the database to a compact storage organization. If the reorganization completes successfully, the RDBMS then makes a second backup of the entire database at block 122. In addition, as depicted at block 124, the RDBMS executes a utility to gather statistics regarding the database, to recharacterize the table spaces, indexes, and partitions, and to record these statistics in a catalog. Finally, at block 126, the RDBMS notifies any partition-aware applications (e.g., Microsoft.RTM. Internet Information Services (IIS)) of the reorganization of the database across the newly added storage nodes. Thereafter, the conventional process for redistributing and reorganizing the database ends at block 130.

[0012] FIGS. 2A-2C depict the redistribution and reorganization of a database over newly added data storage nodes in accordance with the prior art. In particular, FIG. 2A depicts a data storage system 200 including four database partitions 202a-202d that are populated with a database. Because the size of the database is nearing the capacity of the currently installed data storage nodes, a data warehousing enterprise may add one or more additional storage nodes to data storage system 200 in order to support additional database partitions.

[0013] In the example depicted in FIG. 2B, the data warehousing enterprise adds one or more additional storage nodes to data storage system 200 in order to support four additional database partitions 202e-202h. FIG. 2B further illustrates that, following the conventional row-by-row redistribution of the database depicted at blocks 104-118 of FIG. 1, the portion of the database moved to new database partitions 202e-202h is tightly compacted, but the portion of the database remaining on original database partitions 202a-202d is sparsely populated and therefore makes poor utilization of the storage capacity of data storage system 200. Accordingly, as discussed above with reference to block 120 of FIG. 1, the RDBMS must also reorganize the portion residing on database partitions 202a-202d to achieve the compact, well distributed database illustrated in FIG. 2C.

SUMMARY OF THE INVENTION

[0014] In some embodiments, a partitioned database is stored in a plurality of logical or physical partitions on at least a logical or physical first data storage node, and a subset of a first partition among the plurality of logical partitions is configured as a virtual partition. An input indicating an allocation of a second physical data storage node to store the partitioned database is received. A second partition is configured on the second data storage node. In response to the input, the partitioned database is redistributed over the first and second data storage nodes by moving data within the virtual partition on the first partition to the second partition.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] FIG. 1 is a high level flowchart of a conventional process for redistributing and reorganizing a database in accordance with the prior art;

[0016] FIGS. 2A-2C depicts the conventional redistribution and reorganization of a database over new data storage nodes in accordance with the prior art;

[0017] FIG. 3 illustrates an exemplary data processing environment in accordance with one embodiment;

[0018] FIG. 4 shows an exemplary data storage node of the data processing enterprise of FIG. 3;

[0019] FIG. 5 illustrates an exemplary embodiment of a partition configuration data structure in accordance with one embodiment;

[0020] FIG. 6 depicts an exemplary embodiment of a partition map in accordance with one embodiment;

[0021] FIG. 7 is a high level logical flowchart of a first exemplary embodiment of a method of redistributing a database;

[0022] FIG. 8 illustrates an exemplary redistribution of a database in accordance with the first exemplary method depicted in FIG. 7;

[0023] FIG. 9 is a high level logical flowchart of a second exemplary embodiment of a method of redistributing a database; and

[0024] FIG. 10 depicts an exemplary redistribution of a database in accordance with the second exemplary method illustrated in FIG. 9.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENT

[0025] With reference now to the figures and with particular reference to FIG. 3, there is illustrated a high level block diagram of an exemplary data processing environment 300 in accordance with one embodiment. As shown, exemplary data processing environment 300 includes an data processing enterprise 310, which can be operated or on behalf of an organization, such as a business, governmental agency, non-profit association, educational institution or the like, that manages a large volume of data. Data processing enterprise 310 is coupled for communication to one or more circuit-switched or packet-switched communication networks 304, such as wired or wireless local area or wide area network(s), cellular telephony network(s), and/or public switched telephone network(s) (PSTNs). Thus, data processing enterprise 310 may communicate with devices 302a-302d (e.g., server computer systems, personal computer systems, portable computer systems, mobile telephones, smart phones, landline telephones) via communication network(s) 304.

[0026] The communication between devices 302a-302d and data processing system 110 can include voice communication, for example, via a PSTN or voice over Internet Protocol (VoIP) connection, and/or data communication, for example, via instant messaging, Simple Mail Transport Protocol (SMTP) or Hypertext Transfer Protocol (HTTP). For example, the communication between data processing enterprise 310 and devices 302a-302d can include the transmission of data requests from devices 302a-302d to data processing enterprise 310 and the transmission of responsive data (e.g., formatted as program code, images, graphics, text, audio, video, and/or files containing such data) from data processing enterprise 310 to devices 302a-302d.

[0027] Still referring to FIG. 3, data processing enterprise 310 can include one or more physical computer systems, such as servers 312a-312n, which are coupled for communication by a communication fabric 314, which can include, for example, cabling and/or network connections, such as an intranet, virtual private network (VPN) or socket connection. In the illustrated exemplary embodiment, server 312a includes one or more network interfaces 316 that permit server 312a to communicate via communication networks 304 and communication fabric 314. Server 312a additionally includes one or more processors 320 that process data and program code, for example, to manages, access and manipulate data organized in one or more databases. Server 312a also includes input/output (I/O) devices 322, such as ports, displays, and attached devices, etc., which receive inputs and provide outputs of the processing performed by server 312a. Finally, server 312a includes data storage 330, which may include one or more volatile or non-volatile storage devices, including memories, solid state drives, optical or magnetic disk drives, tape drives, etc.

[0028] In the depicted embodiment, data storage 330 stores an operating system (OS) 332 that manages the hardware resources of server 312a and provides common services to other software executing on server 312a. For example, OS 332 may be implemented with one of the AIX.RTM., Linux.RTM., Android.RTM., or Windows.RTM. operating systems. Data storage 330 also includes a database manager 122, such as the DB2.RTM. relational database management system (RDBMS) available from IBM Corporation of Armonk, N.Y., which manages, accesses and manipulates data within one or more databases, such as exemplary database 350. In some embodiments, database manager 340 may be integrated with OS 332 or another software program. In addition to database 350, database manager 340 maintains one or more partition configuration data structures 352 that define the various logical partitions of database 350 and map the partitions to the physical storage resources of data processing enterprise 310. Database manager 340 may optionally also maintain a partition map 354 that maps virtual partitions of database 350 to the logical partitions of database 350, as discussed further below.

[0029] In various embodiments, database manager 340 and/or OS 332 may include code to support communication of server 312a with other servers 312 and devices 302a-302d via communication fabric 314 and communication network(s) 304. Should appropriate communication capabilities not be integrated within OS 332 and/or database manager 340 in some embodiments, data storage 330 may additionally include communication code 342, such as a web server (e.g., Apache, IIS, etc.), Interactive Voice Response (IVR) and/or other program code, that enables server 312a to communicate with other servers 312 and devices 302a-302d via communication fabric 314 and communication network(s) 304. In particular, if implemented, communication code 342 supports the communication of database queries to database manager 340 and the communication of responsive data from database manager 340 to a requester.

[0030] It should be appreciated that the contents of data storage 330 can be localized on server 312a in some embodiments and will be distributed across the data storage 330 of multiple of servers 312a-312n in other embodiments. In addition, the contents depicted in data storage 330 of server 312a may optionally partially or fully reside on a storage area network (SAN) 360 of data processing enterprise 310. As shown, SAN 360 includes a switch/controller (SW/C) 362 that receives and services storage requests and multiple data storage nodes 370a-370k, each of which may comprise one or more physical non-volatile memory drives, hard disk drives, optical storage drives, tape drives, etc. In some embodiments, data storage nodes 370a-370k may be logical entities presenting virtualized abstractions of such physical storage resources.

[0031] It will be appreciated upon review of the foregoing description that the form in which data processing enterprise 312 is realized can vary between embodiments based upon one or more factors, for example, the type of organization, the size of database 350, the number of devices 302a-302d that can query database 350, etc. All such implementations, which may include, for example, one or more handheld, notebook, desktop, or server computer systems, are contemplated as embodiments of the inventions set forth in the appended claims.

[0032] FIG. 4 depicts a more detailed view of a data storage node 400 (e.g., a data storage node 370 of SAN 370 or a data storage node within data storage 330 of a server 312) within data processing enterprise 310 of FIG. 3. In the depicted example, data storage node 400 hosts eight logical or physical partitions, which are hereinafter assumed to be logical partitions numbered LP0-LP7, respectively. Logical partitions LP0-LP7 store a database 350, which includes sixteen data blocks numbered B0-B15, respectively. In a RDBMS, each of data blocks B0-B15 may correspond to one or more database rows having a common row key hash.

[0033] In accordance with the present disclosure, database manager 340 assigns a subset of data blocks B0-B15 to virtual partitions. For example, database manager 340 may assign each of data blocks B8-B15 to a respective one of eight virtual partitions numbered VP8-VP15. In various scenarios, each virtual partition can include one or more data blocks, which preferably all reside on a common logical partition. As discussed further below with reference to FIGS. 7-10, database manager 340 can efficiently redistribute database 350 by reference to the virtual partitions.

[0034] With reference now to FIG. 5, there is illustrated an exemplary embodiment of a partition configuration data structure 352 in accordance with one embodiment. In the depicted embodiment, partition configuration data structure 352, which may be implemented, for example, in one or more database configuration files, includes a plurality of configuration entries 500 defining a plurality of logical partitions of database 350 and mapping the logical partitions to the physical storage resources of data processing enterprise 310.

[0035] In an exemplary embodiment, each configuration entry 500 of partition configuration data structure 352 comprises a number of fields, including a node number field 502, a hostname field 504, a logical partition number field 506, and a virtual partition flag 508. Node number field 502 specifies an integer number uniquely identifying a partition of database 350. In contrast to conventional partitioned databases that restrict node numbers to logical partitions, node number field 502 preferably contains a unique node number for each logical and virtual partition of database 350. Hostname field 504 identifies the TCP/IP hostname (e.g., "ServerA") of the database partition identified in node number field 502. In addition, logical port field 506 specifies the logical port (e.g., logical partition) assigned to the database partition identified in node number field 502, and virtual partition flag 508 identifies whether or not the partition specified in node number field 502 is a virtual partition. It should be appreciated that configuration entries 500 may include one or more additional fields providing additional configuration information, such as a communication path to a logical partition and/or operating system-specific information.

[0036] Given the exemplary embodiment of partition configuration data structure 352 depicted in FIG. 5, the portion of partition configuration data structure 352 describing data storage node 400 of FIG. 4 can be given as shown in Table I below.

TABLE-US-00001 TABLE I Node No. Hostname Logical Port No. VP 0 ServerA 0 -- 1 ServerA 1 -- 2 ServerA 2 -- 3 ServerA 3 -- 4 ServerA 4 -- 5 ServerA 5 -- 6 ServerA 6 -- 7 ServerA 7 -- 8 ServerA 0 V 9 ServerA 1 V 10 ServerA 2 V 11 ServerA 3 V 12 ServerA 4 V 13 ServerA 5 V 14 ServerA 6 V 15 ServerA 7 V

[0037] With reference now to FIG. 6, there is illustrated an exemplary partition map 354 in accordance with one embodiment. In the depicted embodiment, database manager 340 implements partition map 354 as a lookup table including a plurality of rows 500, each of which includes a hash value field 502, a virtual partition number field 504, and a logical partition number field 506. Thus, each row 500 associates a respective hash value (e.g., derived via a hash function from a row key of a row of database 350) with a logical partition number, and if applicable, a virtual partition number. For example, assuming hash values ranging between 0 and 4095 and a data storage node 400 implementing eight logical partitions LP0-LP7 as shown in FIG. 4, partition map 354 can include 4096 rows 500 storing the values summarized in Table II below.

TABLE-US-00002 TABLE II Hash Value Virtual Partition No. Logical Partition No. 0 -- 0 1 -- 1 2 -- 2 3 -- 3 . . . . . . . . . 7 -- 7 8 8 0 9 9 1 10 10 2 . . . . . . . . . 15 15 7 16 -- 0 . . . . . . . . . 23 -- 7 24 8 0 . . . . . . . . . 31 15 7 . . . . . . . . . 4095 -- 7

[0038] With reference now to FIG. 7, there is illustrated a high level logical flowchart of an exemplary method of redistributing a database in accordance with a first embodiment. The depicted method may be performed, for example, through the execution of database manager 340 by one or more processors 320 of a server 312. As with the other logical flowcharts presented herein, it should be understood that steps are depicted in a logical rather than strictly chronological order and that, in at least some embodiments, one or more steps may be performed contemporaneously or in a different order than illustrated.

[0039] The process depicted in FIG. 7 begins at block 700 and thereafter proceeds to block 702, which illustrates database manager 340 configuring a desired number of virtual partitions in database 350, for example, in response to an administrator input or automatically based upon predetermined defaults. In the exemplary partitioned database 350 described by Table I and depicted in FIG. 4, database manager 340 may enter the last eight entries 500 of partition configuration data structure 352 at block 702 in order to establish virtual partitions VP8-VP15 within logical partitions LP0-LP7, respectively. As noted above, the virtual partitions contain the data of database 350 that will be redistributed as the physical storage capacity allocated to store database 350 scales. With the number and location of virtual partitions configured, database manager 340 optionally establishes partition map 354 in order to quickly map between hash values (e.g., of row keys) of data and the logical and virtual partitions configured by partition configuration data structure 352 (block 704). Block 704 is optional in that database manager 340 could alternatively compute the logical and virtual partition associated with each hash value as needed.

[0040] The process proceeds from block 704 to block 710, which depicts database manager 340 determining whether or not an input has been received indicating that database 350 is to be redistributed over an expanded physical storage capacity. As will be appreciated, the expanded physical storage capacity available to store database 350 may become available through the addition of a server 312 to data processing enterprise 310, the addition of an additional data storage node 370 to SAN 360, and/or the reallocation of existing data storage node(s) of data processing enterprise 310 to store database 350. If database manager 340 does not detect an input indicating that database 350 is to be redistributed over an expanded physical storage capacity, the process remains at block 710. While the process remains at block 710, database manager 340 performs conventional database processing, including providing data responsive to structured query language (SQL) queries of database 350 and performing any requested management or configuration functions, etc., as is known in the art. In response to a determination by database manager 340 at block 710 that an input (e.g., a user command) has been received indicating that database 350 is to be redistributed over an expanded physical storage capacity, the process passes to block 712.

[0041] Block 712 depicts database manager 340 establishing logical partitions on the new physical storage node(s) allocated to store database 350. The process then enters a loop including blocks 720-730 in which virtual partitions are redistributed from the preexisting logical partitions to the new logical partitions established at block 712. Referring first to block 720, database manager 340 determines, for example, by reference to partition configuration data structure 352, whether or not all virtual partitions of database 350 have been processed. In response to database manager 350 determining at block 720 that all virtual partitions of database 350 have been processed, the process proceeds from block 720 to block 740, which is described below. If, however, database manager 350 determines at block 720 that not all virtual partitions of database 350 have been processed, database manager 350 selects a virtual partition for processing, for example, the next virtual partition listed in partition configuration data structure 352 (block 722).

[0042] At block 724, database manager 350 determines whether or not to move the virtual partition selected for processing, for example, by determining whether or not the virtual partition number matches a logical partition number assigned to one of the logical partitions established on the newly allocated storage node(s). In response to a determination not to move the currently selected virtual partition, the process returns to block 720, which has been described. If, however, database manager 350 determines at block 724 that the selected virtual partition is to be moved, the process passes to block 726. Block 726 depicts database manager 350 moving the data of the virtual partition using sequential access operations from the existing logical partition to the logical partition having a matching logical partition number. Database manager 350 then updates the metadata stored in association with the moved partition on the data storage node (block 728) and clears the virtual partition flag 508 of the associated configuration entry 500 in partition configuration data structure 352 (block 730). As a result, the moved partition is no longer a virtual partition and is converted into a data block of one of the logical partitions on the newly allocated data storage node. The process returns from block 730 to block 720, which depicts database manager 340 processing the next virtual partition, if any.

[0043] In response to database manager 340 determining at block 720 that all virtual partitions have been processed, database manager 340 updates partition map 354 to reflect the modified relationship between hash values and logical and virtual partition numbers (block 740). Following block 740, the process depicted in FIG. 7 ends at block 742.

[0044] FIG. 8 illustrates an exemplary redistribution of a database 350 from in accordance with the first exemplary method depicted in FIG. 7. In the example shown in FIG. 8, database 350 is originally stored on only the eight logical partitions (i.e., LP0-LP7) of data storage node 400, as previously discussed with reference to FIG. 4. On logical partitions LP0-LP7, database manager 340 configures data blocks B8-B15 as virtual partitions VP8-VP15, respectively.

[0045] The physical data storage capacity of data processing environment 310 available to house database 350 is then expanded to include an additional data storage node 800. As noted with respect to block 712, database manager 340 configures data storage node 800 with eight logical partitions numbered LP8-LP15. In addition, in accordance with blocks 720-730 of FIG. 7, database manager 340 redistributes each of virtual partitions VP8-VP15 (corresponding to data blocks B8-B15, respectively) to a respective one of logical partitions LP8-LP15 on data storage node 800, leaving data blocks B0-B7 on logical partitions LP0-LP7 of data storage node 400.

[0046] Assuming data storage node 800 resides on a server 312 having the hostname "ServerB," database manager 340 updates partition configuration data structure 352 from the state summarized above in Table I to that given in Table III below.

TABLE-US-00003 TABLE III Node No. Hostname Logical Port No. VP 0 ServerA 0 -- 1 ServerA 1 -- 2 ServerA 2 -- 3 ServerA 3 -- 4 ServerA 4 -- 5 ServerA 5 -- 6 ServerA 6 -- 7 ServerA 7 -- 8 ServerB 0 -- 9 ServerB 1 -- 10 ServerB 2 -- 11 ServerB 3 -- 12 ServerB 4 -- 13 ServerB 5 -- 14 ServerB 6 -- 15 ServerB 7 --

In addition, database manager 350 updates partition map 354 from the state summarized above in Table II to that given in Table IV below.

TABLE-US-00004 TABLE IV Hash Value Virtual Partition No. Logical Partition No. 0 -- 0 1 -- 1 2 -- 2 3 -- 3 . . . . . . . . . 7 -- 7 8 -- 0 9 -- 1 10 -- 2 . . . . . . . . . 15 -- 7 16 -- 0 . . . . . . . . . 23 -- 7 24 -- 0 . . . . . . . . . 31 -- 7 . . . . . . . . . 4095 -- 7

[0047] It should be noted by comparison of FIG. 1 with FIGS. 7-8 that the exemplary process depicted in FIG. 7 renders unnecessary many of the processing-intensive steps of FIG. 1. For example, in FIG. 7, there is no need to rehash the rows of database 350, as depicted at block 106. In addition, there is no need to backup database 350 before or after the redistribution of database 350, as depicted at blocks 102 and 122 of FIG. 1. Further, there is no need to reorganize database 120, as depicted at block 120, or to update database statistics, as depicted at block 124. Finally, there is no need to update partition-aware applications, as shown at block 126.

[0048] With reference to FIG. 9, there is illustrated a high level logical flowchart of an exemplary method of redistributing a database in accordance with a second embodiment. In particular, the process depicted in FIG. 9 redistributes database 350 onto one or more newly allocated physical storage nodes via a backup and restore of the virtual partitions of database 350.

[0049] The process depicted in FIG. 9 begins at block 900 and thereafter proceeds to block 902, which illustrates database manager 340 configuring a desired number of virtual partitions in database 350, for example, in response to an administrator input, as described previously. With the number and location of virtual partitions configured, database manager 340 optionally establishes partition map 354 in order to quickly map between hash values (e.g., of row keys) of data and the logical and virtual partitions configured by partition configuration data structure 352 (block 904). The process proceeds from block 904 to block 910, which depicts database manager 340 determining whether or not an input has been received indicating that database 350 is to be redistributed over an expanded physical storage capacity. If database manager 340 does not detect an input indicating that database 350 is to be redistributed over an expanded physical storage capacity, the process remains at block 910 (during which time, database manager 340 may perform other conventional database management operations).

[0050] In response to a determination by database manager 340 at block 910 that an input has been received indicating that database 350 is to be redistributed over an expanded physical storage capacity, the process passes to block 912. Block 912 depicts database manager 340 establishing logical partitions on the new physical storage node(s) allocated to store database 350. The process then enters a loop including blocks 920-930 in which virtual partitions are backed up from the preexisting logical partitions established at block 912. Referring first to block 920, database manager 340 determines, for example, by reference to partition configuration data structure 352 whether or not all virtual partitions of database 350 have been processed. In response to database manager 350 determining at block 920 that all virtual partitions of database 350 have been processed, the process proceeds from block 920 to block 940, which is described below. If, however, database manager 350 determines at block 920 that not all virtual partitions of database 350 have been processed, database manager 350 selects a virtual partition for processing, for example, the next virtual partition listed in partition configuration data structure 352 (block 922). Next, database manager 350 makes a backup of the selected virtual partition, but preferably excludes from the backup the remainder of the logical partition hosting the virtual partition (block 926). Database manager 350 then clears the virtual partition flag 508 associated with the selected virtual partition in partition configuration data structure 352 (block 930). The process returns from block 930 to block 920, which depicts database manager 340 processing the next virtual partition, if any.

[0051] In response to database manager 340 determining at block 920 that all virtual partitions of database 350 on the preexisting physical data storage node(s) have been processed, database manager 340 restores each of the virtual partitions from the backup made at block 926 to a respective logical partition of the newly allocated physical storage node(s) of data processing enterprise 310 (e.g., the logical partition having a logical partition number matching the virtual partition number of the backed up virtual partition). As a result, the moved partition is no longer a virtual partition and is converted into a data block on a logical partition of the newly allocated data storage node(s). Database manager 350 then updates the metadata stored in association with the restored partition on the data storage node (block 942) and deletes the moved partitions from the preexisting physical storage node(s) (block 944). Database manager 340 additionally updates partition map 354, if present, to reflect the modified relationship between hash values and logical and virtual partition numbers (block 946). Following block 946, the process depicted in FIG. 9 ends at block 950.

[0052] FIG. 10 depicts an exemplary redistribution of a database in accordance with the second exemplary method illustrated in FIG. 9. In the example shown in FIG. 10, database 350 is originally stored on only the eight logical partitions (i.e., LP0-LP7) of data storage node 400, as previously discussed with reference to FIGS. 4 and 8. On logical partitions LP0-LP7, database manager 340 configures data blocks B8-B15 as virtual partitions VP8-VP15, respectively.

[0053] The physical data storage capacity of data processing environment 310 allocated to house database 350 is then expanded to include an additional data storage node 800, on which database manager 340 configures eight logical partitions numbered LP8-LP15. In accordance with blocks 920-930 of FIG. 9, database manager 340 creates a virtual partition backup 1000 containing a backup of each of virtual partitions VP8-VP15 (corresponding to data blocks B8-B15, respectively), and preferably excluding other data residing on logical partitions LP0-LP7. Rather than performing a conventional restore back to the host logical partitions, database manager 340 then restores each virtual partition from virtual partition backup 1000 to a respective one of logical partitions LP8-LP15 on data storage node 800. In addition, database manager 340 deletes the corresponding virtual partitions from data storage node 400, leaving data blocks B0-B7 on logical partitions LP0-LP7 of data storage node 400. In this manner, database manager 340 redistributes the virtual partitions of database 350 from preexisting physical data storage node 400 onto newly allocated physical data storage node 800 by leveraging its backup capabilities, rather than by directly moving the data as depicted in FIG. 8. The resulting partition configuration data structure 352 and partition map 354 will, however, be the same as summarized above in Tables III and IV.

[0054] As has been described, in at least some embodiments a partitioned database is stored in a plurality of logical or physical partitions on at least a logical or physical first data storage node, and a subset of a first partition among the plurality of logical partitions is configured as a virtual partition. An input indicating an allocation of a second physical data storage node to store the partitioned database is received. A second partition is configured on the second data storage node. In response to the input, the partitioned database is redistributed over the first and second data storage nodes by moving data within the virtual partition on the first partition to the second partition.

[0055] While the present invention has been particularly shown as described with reference to one or more preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention. For example, although aspects have been described with respect to a computer system executing program code that directs the functions of the present invention, it should be understood that present invention may alternatively be implemented as a program product including a tangible, non-transient data storage medium (e.g., an optical or magnetic disk or memory) storing program code that can be processed by a data processing system to perform the functions of the present invention.

* * * * *