U.S. patent application number 13/402197 was filed with the patent office on 2012-02-22 and published on 2013-08-22 for a system and method for managing virtual hard disks in cloud environments.
This patent application is currently assigned to Computer Associates Think, Inc. The applicant listed for this patent is SUBRAHMANYA S. YELLAPRAGADA. The invention is credited to SUBRAHMANYA S. YELLAPRAGADA.
Publication Number | 20130219069 |
Application Number | 13/402197 |
Family ID | 48983216 |
Filed Date | 2012-02-22 |
Publication Date | 2013-08-22 |
United States Patent Application | 20130219069 |
Kind Code | A1 |
YELLAPRAGADA; SUBRAHMANYA S. |
August 22, 2013 |
SYSTEM AND METHOD FOR MANAGING VIRTUAL HARD DISKS IN CLOUD ENVIRONMENTS
Abstract
A system, method, and computer-readable storage medium for
managing virtual hard disks in a cloud computing/storage
environment are provided. The method includes associating, using a
virtual hard disk (VHD) management system of a server device, a
plurality of data blocks of a virtual hard disk stored at a cloud
vendor to a corresponding plurality of cloud objects. A plurality
of cloud object identifiers associated with the plurality of cloud
objects are stored in a first cloud allocation table. Changes to
one or more data blocks are determined. Corresponding new cloud
allocation tables for every data block in the plurality of data
blocks that has changed are formed, the new cloud allocation tables
having corresponding new cloud object identifiers. The first and
the new cloud allocation tables are downloaded and merged to form
an updated cloud allocation table. The updated cloud allocation
table is uploaded to the cloud vendor.
Inventors: | YELLAPRAGADA; SUBRAHMANYA S. (Hyderabad, IN) |
Applicant: | YELLAPRAGADA; SUBRAHMANYA S. | Hyderabad | IN |
Assignee: | Computer Associates Think, Inc. | Islandia, NY |
Family ID: | 48983216 |
Appl. No.: | 13/402197 |
Filed: | February 22, 2012 |
Current U.S. Class: | 709/226 |
Current CPC Class: | G06F 3/0664 20130101; G06F 3/064 20130101; G06F 3/067 20130101; G06F 3/0605 20130101; H04L 67/10 20130101; H04L 67/1097 20130101; G06F 9/5077 20130101; G06F 16/188 20190101; G06F 3/0643 20130101; G06F 3/0631 20130101; G06F 3/0665 20130101 |
Class at Publication: | 709/226 |
International Class: | G06F 15/173 20060101 G06F015/173 |
Claims
1. A method for managing virtual hard disks in a cloud
computing/storage environment, comprising: associating, using a
virtual hard disk (VHD) management system of a server device, a
plurality of data blocks of a virtual hard disk stored at a cloud
vendor to a corresponding plurality of cloud objects; storing,
using the VHD management system, a plurality of cloud object
identifiers associated with the plurality of cloud objects in a
first cloud allocation table; determining, using the VHD management
system, changes to one or more data blocks in the plurality of data
blocks; forming, using the VHD management system, corresponding new
cloud allocation tables for every data block in the plurality of
data blocks that has changed, the new cloud allocation tables
having corresponding new cloud object identifiers; downloading,
using the VHD management system, the first and the new cloud
allocation tables; merging, at the VHD management system, the first
and the new cloud allocation tables to form an updated cloud
allocation table; and uploading, using the VHD management system,
the updated cloud allocation table to the cloud vendor such that
the updated cloud allocation table includes information regarding
the changed data blocks in the plurality of data blocks.
2. The method of claim 1, wherein the merging comprises: detecting
updated cloud object identifiers in the first and the new cloud
allocation tables to form updated cloud object identifiers in the
updated cloud allocation table.
3. The method of claim 1, wherein the determining the changes is
based upon determining which sectors of the one or more data blocks
have changed.
4. The method of claim 3, wherein the cloud object identifiers are
Boolean indicators in the first and the new cloud allocation
tables.
5. The method of claim 1, wherein for each data block that is
changed in the one or more data blocks, the new cloud object
identifiers are created.
6. The method of claim 1, wherein the plurality of data blocks
store respective snapshots of virtual machine images.
7. The method of claim 1, wherein the first cloud allocation table and
the new cloud allocation tables are smaller in size than the data
blocks such that a time taken for the downloading is less than a
time taken for downloading the data blocks.
8. A tangible computer-readable storage medium having one or more
computer-readable instructions thereon for managing virtual hard
disks in a cloud computing/storage environment, which when executed
by one or more processors cause the one or more processors to:
associate, using a virtual hard disk (VHD) management system of a
server device, a plurality of data blocks of a virtual hard disk
stored at a cloud vendor to a corresponding plurality of cloud
objects; store, using the VHD management system, a plurality of
cloud object identifiers associated with the plurality of cloud
objects in a first cloud allocation table; determine, using the VHD
management system, changes to one or more data blocks in the
plurality of data blocks; form, using the VHD management system,
corresponding new cloud allocation tables for every data block in
the plurality of data blocks that has changed, the new cloud
allocation tables having corresponding new cloud object
identifiers; download, using the VHD management system, the first
and the new cloud allocation tables; merge, at the VHD management
system, the first and the new cloud allocation tables to form an
updated cloud allocation table; and upload, using the VHD
management system, the updated cloud allocation table to the cloud
vendor such that the updated cloud allocation table includes
information regarding the changed data blocks in the plurality of
data blocks.
9. The tangible computer-readable storage medium of claim 8,
wherein the one or more processors are caused to merge by:
detecting updated cloud object identifiers in the first and the new
cloud allocation tables to form updated cloud object identifiers in
the updated cloud allocation table.
10. The tangible computer-readable storage medium of claim 8,
wherein the one or more processors are caused to determine the
changes based upon determining which sectors of the one or more
data blocks have changed.
11. The tangible computer-readable storage medium of claim 10,
wherein the cloud object identifiers are Boolean indicators in the
first and the new cloud allocation tables.
12. The tangible computer-readable storage medium of claim 8,
wherein for each data block that is changed in the one or more data
blocks, the new cloud object identifiers are created.
13. The tangible computer-readable storage medium of claim 8,
wherein the plurality of data blocks store respective snapshots of
virtual machine images.
14. The tangible computer-readable storage medium of claim 8,
wherein the first cloud allocation table and the new cloud allocation
tables are smaller in size than the data blocks such that a time
taken for the downloading is less than a time taken for downloading
the data blocks.
15. A system for managing virtual hard disks in a cloud
computing/storage environment, the system comprising: one or more
processors configured to: associate, using a virtual hard disk
(VHD) management system of a server device, a plurality of data
blocks of a virtual hard disk stored at a cloud vendor to a
corresponding plurality of cloud objects; store, using the VHD
management system, a plurality of cloud object identifiers
associated with the plurality of cloud objects in a first cloud
allocation table; determine, using the VHD management system,
changes to one or more data blocks in the plurality of data blocks;
form, using the VHD management system, corresponding new cloud
allocation tables for every data block in the plurality of data
blocks that has changed, the new cloud allocation tables having
corresponding new cloud object identifiers; download, using the VHD
management system, the first and the new cloud allocation tables;
merge, at the VHD management system, the first and the new cloud
allocation tables to form an updated cloud allocation table; and
upload, using the VHD management system, the updated cloud
allocation table to the cloud vendor such that the updated cloud
allocation table includes information regarding the changed data
blocks in the plurality of data blocks.
16. The system of claim 15, wherein the one or more processors are
caused to merge by: detecting updated cloud object identifiers in
the first and the new cloud allocation tables to form updated cloud
object identifiers in the updated cloud allocation table.
17. The system of claim 15, wherein the one or more processors are
caused to determine the changes based upon determining which
sectors of the one or more data blocks have changed.
18. The system of claim 17, wherein the cloud object identifiers
are Boolean indicators in the first and the new cloud allocation
tables.
19. The system of claim 15, wherein for each data block that is
changed in the one or more data blocks, the new cloud object
identifiers are created.
20. The system of claim 15, wherein the plurality of data blocks
store respective snapshots of virtual machine images.
21. The system of claim 15, wherein the first cloud allocation table and
the new cloud allocation tables are smaller in size than the data
blocks such that a time taken for the downloading is less than a
time taken for downloading the data blocks.
Description
FIELD
[0001] The invention relates generally to the field of cloud
computing/storage systems and more particularly to efficiently and
cost-effectively managing virtual hard disks in cloud
environments.
BACKGROUND
[0002] Cloud computing/storage environments have transformed the way
business organizations assess their data processing requirements and
their capacity to meet them. In a cloud computing/storage
environment, the cloud provider hosts the hardware and related items
and provides systems and computational power as a service to a
customer (e.g., a business organization). By meeting its data
processing needs through a cloud vendor, a customer can acquire the
required computational resources at a reasonable cost without
bearing the cost of space, energy, and maintenance, and can back up
data to the cloud vendor's storage facility or device.
[0003] Cloud computing/storage environments support virtual
machines (VMs), which may be defined as emulations of physical
machines in software, hardware, or a combination of both. A set of
services or resources may form a virtual machine image that has
associated recovery points or snapshots. A recovery point or
snapshot of a virtual machine (VM) is a point-in-time copy of the
virtual machine. In a typical scenario, recovery points or snapshots
of a virtual machine can be copied and stored in a cloud
computing/storage environment. Recovery points are created at
regular intervals, and each recovery point contains one or more
virtual hard disks (VHDs) that serve as the hard disks of the
virtual machine and are stored as files in the cloud
computing/storage environment. Conventionally, to merge or
consolidate these recovery points or snapshots (for example, when
two or more different recovery points are to be merged), the VHDs of
the virtual machine associated with the recovery points are
downloaded from the cloud environment and merged locally. The merged
VHDs are then uploaded to the cloud environment again.
Unfortunately, such downloading and uploading of snapshots in the
form of VHDs is expensive and time consuming. These and other
drawbacks exist.
SUMMARY
[0004] In some implementations, these and other drawbacks of
existing systems are addressed by a system, method, and
computer-readable storage medium having one or more
computer-readable instructions thereon for managing virtual hard
disks in a cloud computing/storage environment. The method includes
associating, using a virtual hard disk (VHD) management system of a
server device, a plurality of data blocks of a virtual hard disk
stored at a cloud vendor with a corresponding plurality of cloud
objects. A plurality of cloud object identifiers associated with
the plurality of cloud objects are stored in a first cloud
allocation table. Changes to one or more data blocks in the
plurality of data blocks are determined. Corresponding new cloud
allocation tables are formed for every data block in the plurality
of data blocks that has changed, the new cloud allocation tables
having corresponding new cloud object identifiers. The first and
the new cloud allocation tables are downloaded and merged to form an
updated cloud allocation table. The updated cloud allocation table
is uploaded to the cloud vendor such that the updated cloud
allocation table includes information regarding the changed data
blocks in the plurality of data blocks.
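The merge step summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes that a cloud allocation table (CAT) is a list indexed by block number whose entries are cloud object identifiers (strings) or `None` where a table has no entry, and that newer tables are applied in chronological order so that the newest identifier for each block wins. The name `merge_cats` and the sample identifiers are hypothetical.

```python
from typing import List, Optional

# Assumed CAT layout: index = data block number, value = cloud object
# identifier (e.g. an S3 object name or Azure blob name), or None when the
# table has no entry for that block.
CloudAllocationTable = List[Optional[str]]


def merge_cats(first: CloudAllocationTable,
               new_tables: List[CloudAllocationTable]) -> CloudAllocationTable:
    """Return an updated CAT built from the first CAT and newer CATs.

    Tables in new_tables are applied oldest to newest; any entry that is not
    None replaces the identifier already recorded for that block. Only
    identifiers are touched, so no data block has to be downloaded.
    """
    updated = list(first)  # copy so the first CAT is left unchanged
    for table in new_tables:
        if len(table) != len(updated):
            raise ValueError("all CATs must describe the same number of blocks")
        for block, object_id in enumerate(table):
            if object_id is not None:
                updated[block] = object_id
    return updated


base = ["blk0-v0", "blk1-v0", None, "blk3-v0"]
child = [None, "blk1-v1", "blk2-v1", None]
print(merge_cats(base, [child]))
# ['blk0-v0', 'blk1-v1', 'blk2-v1', 'blk3-v0']
```

In this sketch only the small updated table would need to be uploaded to the cloud vendor; the cloud objects holding unchanged blocks are reused as they are.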
[0005] Various other objects, features, and advantages of the
invention will be apparent through the detailed description and the
drawings attached hereto. It is also to be understood that both the
foregoing general description and the following detailed
description are exemplary and not restrictive of the scope of the
invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] FIG. 1 is an illustration of an example environment for
managing virtual hard disks in a cloud computing/storage
environment, according to various implementations of the
invention.
[0007] FIG. 2 is an illustration of a conventional format for a
dynamic virtual hard disk.
[0008] FIG. 3 is an illustration of a modified format of a dynamic
virtual hard disk (VHD), according to
various implementations of the invention.
[0009] FIG. 4 is an illustration of a format for a header field for
a recovery point, according to various implementations of the
invention.
[0010] FIG. 5 is an illustration of a disk information field of the
dynamic VHD, according to various implementations of the
invention.
[0011] FIG. 6 is an illustration of a structure of a cloud allocation
table, according to various implementations of the invention.
[0012] FIG. 7 illustrates a flowchart of a process for merging
cloud allocation tables, according to various implementations of
the invention.
[0013] FIG. 8 illustrates a flowchart of a process for creating a
cloud allocation table at a new or latest recovery point or
snapshot, according to various implementations of the
invention.
[0014] FIG. 9 illustrates an exemplary scenario of when a merge
happens, according to various implementations of the invention.
[0015] FIGS. 10A and 10B illustrate examples of cloud allocation
tables of a base virtual hard disk and a child virtual hard disk,
according to various implementations of the invention.
[0016] FIG. 10C illustrates an example of a cloud allocation table
of a consolidated base disk after a merge operation, according to
various implementations of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0017] It is to be noted that the following definitions are
included solely for illustration purposes, and these definitions
are indicative with respect to the implementations described herein
and are not meant to be exhaustive or restrictive in nature.
[0018] In some implementations, a virtual machine (VM) is defined
as an emulation or implementation of an actual machine, e.g., a
computer. In some implementations, the VM may be software that
simulates the computer or any other machine.
[0019] In some implementations, a VM has an image that is a
snapshot of a resource or service provided and managed in a cloud
computing/storage environment. A virtual machine image
may include one or more VHDs.
[0020] In some implementations of this invention, a VHD is defined
as a file format that may contain what is found on a physical hard
disk drive, such as disk partitions and a file system, which in
turn can contain files and folders. A VHD is typically used as the
hard disk of a virtual machine.
[0021] In some implementations, a cloud object defines a basic unit
of storage in a cloud computing/storage environment.
[0022] In some implementations, a recovery point or a snapshot is
defined as a point-in-time copy of the VM that may include a
point-in-time state of the VM.
[0023] In some implementations, an image is defined as a Virtual
Machine image that is a collection of resources or services
available for use by a customer in a cloud computing/storage
environment.
[0024] In some implementations, a block allocation table is defined
as a table of absolute sector offsets into a file backing a hard
disk of a computer system.
[0025] In some implementations, a data block is defined as a
sequence of bytes or bits.
[0026] In some implementations, a cloud object identifier is
defined as a variable or field that identifies a cloud object. For
example, a blob name for AZURE® provided by Microsoft
Corporation of Redmond, Wash. and an object name for S3® provided
by Amazon.com, Inc. of Seattle, Wash. are cloud object
identifiers.
[0027] In some implementations, in the context of a VHD, a dynamic
disk at a given time may be only as large as the actual data written
to it plus the size of the header and footer. Allocation of data may
be done in blocks such that, as more data is written, a file
associated with the dynamic disk dynamically increases in size by
allocating more blocks.
[0028] In some implementations, a differencing disk is defined as a
virtual hard disk used to isolate changes to the VHD or a guest
operating system by storing them in a separate file.
[0029] In some implementations, a sector map for dynamic disks is
defined as a bitmap that indicates which sectors contain valid data
(indicated by binary 1's) and which sectors have never been
modified (indicated by binary 0's). For differencing disks, the
sector bitmap indicates which sectors are located within the
differencing disk (indicated by binary 1's) and which sectors are
in the parent (indicated by binary 0's).
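The sector bitmap semantics above can be illustrated with a small lookup function. This is a sketch under stated assumptions: the bitmap is passed as raw bytes, bits are assumed to be ordered most-significant-bit first within each byte (sector 0 is the high bit of byte 0), and the function name `sector_state` is hypothetical.

```python
def sector_state(bitmap: bytes, sector_in_block: int, differencing: bool) -> str:
    """Interpret one bit of a block's sector bitmap.

    Assumes MSB-first bit order: sector 0 maps to bit 7 of byte 0.
    """
    byte_index, bit_index = divmod(sector_in_block, 8)
    bit = (bitmap[byte_index] >> (7 - bit_index)) & 1
    if differencing:
        # 1: sector is stored in the differencing disk; 0: read it from the parent.
        return "differencing disk" if bit else "parent"
    # Dynamic disk: 1 means valid data, 0 means the sector was never modified.
    return "valid data" if bit else "never modified"


bitmap = bytes([0b10000001, 0b00000000])
print(sector_state(bitmap, 0, differencing=True))   # differencing disk
print(sector_state(bitmap, 1, differencing=True))   # parent
print(sector_state(bitmap, 7, differencing=False))  # valid data
```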
[0030] FIG. 1 is an exemplary illustration of an environment 100
having a system for managing virtual hard disks in a cloud
computing/storage environment. In
some implementations, environment 100 includes, among other things,
a server device 104 (interchangeably referred to herein as server
104), one or more client devices 110, one or more cloud storage
service providers C1-Cn (interchangeably referred to herein as
cloud vendors C1-Cn), and a network 108 for communication between
various components of environment 100 via wired, wireless, optical,
or other types of communication links, known to one of ordinary
skill in the art.
[0031] In some implementations, server device 104 may include a
hardware computing device having an operating system, disk drives,
interfaces/ports, memory, buses, cooling sub-systems, and various
software stored therein on tangible computer readable media.
Specifically, in some implementations, server device 104 may
include a virtual hard disk (VHD) management system 106, as
described below, for managing virtual hard disks stored at various
cloud vendors C1-Cn in a cloud computing/storage environment, such
as environment 100. In some implementations, server device 104,
although shown separate from cloud vendors C1-Cn in environment
100, may be a part of one of cloud vendors C1-Cn. In some
implementations, server device 104 may be a server supporting a
plurality of jobs/applications for one or more client devices 110.
In some implementations, server device 104 includes electronic and
electrical circuitry such as processors and memory and/or other
hardware operable to execute computer-readable instructions using,
for example, an operating system (OS). In some implementations,
server device 104 may include a security device that monitors
various security aspects for an organization in environment 100. In
some implementations, server device 104 may include one or more
tangible computer-readable storage media configured to store one or
more software modules, wherein the software modules include
computer-readable instructions that when executed by one or more
processors in server device 104 may cause the processors to perform
the functions related to managing virtual hard disks in a cloud
computing/storage environment, as described herein. In some
implementations, server device 104 may comprise computer hardware
programmed with a computer application having one or more software
modules that enable the various features and functions related to
managing virtual hard disks in a cloud computing/storage
environment (e.g., environment 100), as described herein. It will
be appreciated that in some implementations server device 104 may
be located remotely from a physical location of the organization
(e.g., on a home computer of a user within the organization's
network), and various implementations of the present invention are
not limited by the location of server device 104. Further, although
one server device 104 is shown, in some implementations, cloud
vendors C1-Cn and/or client devices 110 may communicate in parallel
or in series with a plurality of different types of server devices
including but not limited to mobile and desktop client
computing/storage devices.
[0032] In some implementations, VHD management system 106 may be
implemented, for example, using one or more programming languages
such as C, Java, or other programming languages known to one of
ordinary skill in the art. In some implementations, VHD management
system 106 forms a system with electronic files stored in one or
more memory devices of server 104 to manage snapshots of images
executing at server 104. In some implementations, VHD management
system 106 includes code or instructions stored on computer
readable medium or computer readable storage device, which when
executed by a processor cause the processor to implement various
features and functionalities including managing, storing,
retrieving, and merging VHDs in cloud storage devices provided,
e.g., by cloud vendors C1-Cn. In some implementations, VHD
management system 106 is part of one or more memory devices in
server device 104. In some implementations, VHD management system
106 is a hardware module implemented in server device 104 as an
Application Specific Integrated Circuit (ASIC) with various logic
circuitry integrated thereupon to implement the functionalities of
VHD management system 106 discussed in FIGS. 3-10. In some
implementations, VHD management system 106 is implemented using a
Field Programmable Gate Array (FPGA) device. It will be appreciated
that implementations of VHD management system 106 may be carried
out using a combination of hardware and software, as can be
contemplated by one of ordinary skill in the art in view of this
disclosure. In some implementations, VHD management system 106
includes one or more cloud allocation tables (CATs) 102 (also
referred to as CAT tables 102), discussed with respect to FIGS.
3-10.
[0033] In some implementations, client devices 110 interact,
directly or indirectly through server device 104, with a plurality
of cloud storage service providers C1-Cn via wired, wireless,
optical, or other types of communication links over network 108
known to one of ordinary skill in the art. Client devices 110 are
computing devices known to those of ordinary skill in the art
(e.g., mobile or desktop computing devices). In some
implementations, one or more client devices 110 may access
resources provided by cloud vendors C1-Cn through server device
104.
[0034] In some implementations, network 108 may be the Internet or
the World Wide Web ("www"). In some implementations, network 108
may be a switching fabric that is part of a Wide Area Network
(WAN), a Local Area Network (LAN), or other types of networks known
to those of ordinary skill in the art (e.g., a TCP/IP network). In
some implementations, network 108 routes requests from server 104
and/or client devices 110 for accessing various resources.
[0035] In some implementations, a plurality of cloud vendors C1-Cn
may include storage devices and hardware that may be part of or
separate from one or more servers (e.g., servers S1-S6 in cloud
vendor C1, servers S7-S10 in cloud vendor C2, and servers S12-S17
in cloud vendor Cn). The storage devices and hardware may store
data on respective memory devices therein. The servers may be
accessed by server device 104 for providing applications/services
to customers at client devices 110, although other servers or
devices may access servers S1-S17 for other purposes. Further, any
number of servers communicably connected in known ways may be used
as appropriate for cloud vendors C1-Cn and the number and types of
connections shown for the servers S1-S17 in FIG. 1 are by way of
example only and not by way of limitation. An example of cloud
vendors C1-Cn includes cloud computing/storage services provided by
Amazon.com, Inc. of Seattle, Wash., although other vendors may be
used.
[0036] FIG. 2 illustrates a basic format of a dynamic virtual hard
disk stored in the cloud storage environment, for example, in
environment 100 at one of cloud vendors C1-Cn. In some
implementations, for example in dynamic disk images, the VHD file
format is represented as a file 200 that includes one or more
dynamic disk header fields 202 including a copy of hard disk footer
204, a dynamic disk header 206, a block allocation table (BAT) 208,
and one or more data blocks 210. For example, the virtual hard
disk formats supported by Microsoft Virtual PC® and Virtual
Server®, provided by Microsoft Corporation of Redmond, Wash.,
include fixed hard disk image, dynamic hard disk image, and
differencing hard disk image formats, although other formats
provided by other vendors may be used, as will be apparent to one
of ordinary skill in the art. It is to be noted that sizes of
various fields of file 200 are exemplary in nature, and are not
intended to be limiting, as will be appreciated by one of ordinary
skill in the art. In some implementations, virtual hard disks of
respective virtual machines are not stored as a single cloud unit
storage object.
[0037] Conventionally, BAT 208 is a table of absolute sector
offsets into file 200 backing a virtual hard disk. BAT 208 is
pointed to by a "Table Offset" field (not shown) of dynamic disk
header 206. The size of BAT 208 is calculated during creation of
the virtual hard disk. The number of entries in BAT 208 is the
number of data blocks 210 needed to store the contents of the
virtual hard disk when fully expanded. For example, in some
implementations, a 2-GB disk image that uses 2 MB blocks requires
1024 BAT entries, where each entry is four bytes long. In some
implementations, unused table entries are initialized to the value
0xFFFFFFFF in a memory of server 104 or other storage
devices. In some implementations, BAT 208 is extended to a sector
boundary, and a field (not shown) within dynamic disk header 206
indicates how many entries are valid. Each entry in BAT 208 refers
to one or more data blocks in data blocks 210 in the virtual hard
disk image. In some implementations, one or more data blocks in
data blocks 210 may be contiguously stored with pointers to such
contiguous memory locations stored in BAT 208. Since details of the
format of file 200 are known to one of ordinary skill in the art,
they will not be described in detail.
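The BAT arithmetic in this paragraph, including the 2-GB/2-MB example, can be checked with a short sketch. The constants and function names are hypothetical. The detail that each data block begins with a sector bitmap padded to a whole sector is taken from the published Microsoft VHD image format and is an assumption as far as this document is concerned.

```python
SECTOR_SIZE = 512          # bytes per sector
BAT_ENTRY_SIZE = 4         # bytes per BAT entry
UNUSED_ENTRY = 0xFFFFFFFF  # marker for a block that has not been allocated


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def bat_entry_count(disk_size: int, block_size: int) -> int:
    """One BAT entry per data block of the fully expanded disk."""
    return ceil_div(disk_size, block_size)


def bat_size_on_disk(entries: int) -> int:
    """BAT size in bytes after extending it to a sector boundary."""
    return ceil_div(entries * BAT_ENTRY_SIZE, SECTOR_SIZE) * SECTOR_SIZE


def sector_file_offset(bat, block_size: int, virtual_sector: int):
    """Byte offset of a virtual sector inside the VHD file, or None if unallocated."""
    sectors_per_block = block_size // SECTOR_SIZE
    block_index, sector_in_block = divmod(virtual_sector, sectors_per_block)
    block_start = bat[block_index]  # absolute sector offset of the block
    if block_start == UNUSED_ENTRY:
        return None
    # Assumed layout: a sector bitmap (one bit per sector, padded to whole
    # sectors) precedes the block's data sectors.
    bitmap_sectors = ceil_div(ceil_div(sectors_per_block, 8), SECTOR_SIZE)
    return (block_start + bitmap_sectors + sector_in_block) * SECTOR_SIZE


MB, GB = 1024 ** 2, 1024 ** 3
print(bat_entry_count(2 * GB, 2 * MB))  # 1024
print(bat_size_on_disk(1024))           # 4096
```

With 2 MB blocks each block holds 4096 sectors, so its bitmap needs 512 bytes, exactly one sector.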
[0038] In conventional systems, changes to virtual hard disks
include changes to one or more data blocks in data blocks 210.
Every time a data block in data blocks 210 is changed or modified,
an additional VHD (or a "differencing disk") having the updated
data block is created. The time taken to consolidate or merge the
changes to data blocks 210 is the sum of the time taken to download
one or more data blocks 210 in different VHDs, the time taken to
modify one or more data blocks 210 based upon the changes, and the
time taken to upload one or more data blocks 210 after the changes
have been performed and merging of redundant data blocks is
completed. However, such downloading and uploading between a local
space (e.g., client devices 110 or server device 104) and cloud
vendors C1-Cn is expensive and time consuming in direct proportion
to the number of data blocks 210 involved and their respective
sizes. Generally, the term downloading, in
some implementations, refers to data received from cloud vendors
C1-Cn at server device 104 and/or client devices 110 either as a
response to a request from server device 104 and/or client devices
110, or otherwise. Likewise, the term uploading, in some
implementations, refers to sending data (e.g., merged CAT tables)
from server device 104 and/or client devices 110 to one or more of
cloud vendors C1-Cn, e.g., after merging the CAT tables.
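The cost relationship described in this paragraph can be written as a rough model. The symbols below are introduced only for illustration and are not defined in the original: N is the number of data blocks transferred, S is the size of each block, and B_down and B_up are assumed effective download and upload bandwidths.

```latex
T_{\text{merge}} = T_{\text{download}} + T_{\text{modify}} + T_{\text{upload}},
\qquad
T_{\text{download}} \approx \frac{N S}{B_{\text{down}}},
\qquad
T_{\text{upload}} \approx \frac{N S}{B_{\text{up}}}
```

Under this model both transfer terms grow linearly with the product N S, which is the proportionality to block count and block size stated above.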
[0039] Instead, in some implementations, every block in data blocks
210 of the VHD is stored in a separate cloud unit storage object
(e.g., an S3® object), for example in a distributed fashion. As
noted above, each cloud object is a file that now stores data in
data blocks 210. The distributed data blocks are then addressed
using one or more cloud allocation tables 102 (or, CAT tables 102)
instead of BAT 208. As discussed below with respect to FIG. 3,
operations are then performed on CAT tables 102, which are
substantially smaller than data blocks 210 themselves, thereby
eliminating the need for expensive downloading and overwriting of
blocks in data blocks 210.
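To make "substantially smaller" concrete, the arithmetic below compares downloading every data block with downloading only a CAT holding one identifier per block. The 2 MB block size and 1024-block disk follow the example in paragraph [0037]; the 64-byte identifier size is an assumption chosen for illustration, since the document does not fix the length of a cloud object identifier.

```python
# Illustrative figures only: BLOCK_SIZE follows the 2 MB example in [0037];
# IDENTIFIER_SIZE is an assumed length for one stored cloud object identifier.
BLOCK_SIZE = 2 * 1024 * 1024  # bytes per data block
IDENTIFIER_SIZE = 64          # assumed bytes per identifier


def data_bytes(num_blocks: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes moved if every data block must be downloaded."""
    return num_blocks * block_size


def cat_bytes(num_blocks: int, id_size: int = IDENTIFIER_SIZE) -> int:
    """Bytes moved if only the CAT (one identifier per block) is downloaded."""
    return num_blocks * id_size


blocks = 1024  # a fully expanded 2 GB disk with 2 MB blocks
print(data_bytes(blocks))                       # 2147483648
print(cat_bytes(blocks))                        # 65536
print(data_bytes(blocks) // cat_bytes(blocks))  # 32768
```

The exact ratio depends on the identifier format a given cloud vendor uses, but any identifier much shorter than a block yields a similar gap.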
[0040] FIG. 3 is an illustration of a format of file 200 modified
to include one or more cloud allocation tables 102 to form an
electronic file 300 to represent a dynamic virtual hard disk. In
some implementations, a CAT table may be a data structure, stored
in a storage device or memory, that includes one or more identifiers
or pointers to data blocks stored as distributed cloud objects.
Such a representation of a dynamic VHD that includes CAT tables 102
may be stored at server 104, for example, or at any other storage
device such as those provided by cloud vendors C1-Cn. In some
implementations, electronic file 300 includes dynamic disk header
fields 202, copy of hard disk footer 204, dynamic disk header 206,
and one or more data blocks 210, each stored in a unique cloud
object. However, in some implementations, conventional BAT 208 of
file 200 is replaced in file 300 by one or more cloud allocation
tables 102 (referred to hereinafter in singular form as cloud
allocation table 102). In some implementations, cloud allocation
table (CAT) 102 itself is stored as one or more cloud objects
depending upon a size of CAT table 102.
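Because CAT 102 may itself span one or more cloud objects depending on its size, a sketch of splitting and reassembling it follows. The JSON serialization, the per-object size limit, and the function names `split_cat` and `join_cat` are assumptions for illustration; the document does not specify an encoding for CAT 102.

```python
import json
from typing import List, Optional


def split_cat(cat: List[Optional[str]], max_object_bytes: int) -> List[bytes]:
    """Serialize a CAT and cut it into pieces of at most max_object_bytes."""
    if max_object_bytes <= 0:
        raise ValueError("max_object_bytes must be positive")
    payload = json.dumps(cat).encode("utf-8")
    return [payload[i:i + max_object_bytes]
            for i in range(0, len(payload), max_object_bytes)]


def join_cat(pieces: List[bytes]) -> List[Optional[str]]:
    """Reassemble the CAT from its pieces, in order."""
    # Join before decoding so a multi-byte character split across pieces is restored.
    return json.loads(b"".join(pieces).decode("utf-8"))


cat = ["obj-0", None, "obj-2"]
pieces = split_cat(cat, max_object_bytes=8)
print(len(pieces))              # 3
print(join_cat(pieces) == cat)  # True
```

Each piece would be uploaded as its own cloud object; a real layout would also need to record the order of the pieces, which here is simply the list order.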
[0041] FIG. 4 is an illustration of a format of a header 400 for a
recovery point, according to various implementations of the
invention. In some implementations, header 400 includes a
validation data structure 402, one or more recovery points 404, a
next object identifier field 406, and a disk information field 408
having one or more pointers to one or more disk information
identifiers 410. In one implementation, for every recovery point on
cloud vendors C1-Cn, there exists a corresponding file 300 and
therefore, a unique header 400.
[0042] In some implementations, validation data structure 402 includes
data structures for calculating a checksum of header 400, a
modification time field, and other reserved fields. In some
implementations, one or more recovery points 404 each include at
least one VHD image. The number of recovery points is calculated as
a sum of a number `n` of child disks of the VHD and the base disk
(i.e., a total of n+1 recovery points). In some implementations,
next object identifier field 406 is used in a scenario when header
400 spans across multiple objects, in which scenario next object
identifier field 406 points to next object identifier in CAT 102.
In some implementations, disk information field 408 includes
pointers 408(1)-408(n) that point to corresponding to disk
information fields 410, as indicated by arrows in FIG. 4. Disk
information field 408 is described in more detail in FIG. 5. In
some implementations, disk information identifiers include
information related to most recent child disks in reverse
chronological order, although other arrangements may be
possible.
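A minimal Python sketch of header 400, assuming a JSON body and a CRC-32 checksum (the disclosure names a checksum but not its algorithm or encoding). The class name `CloudVHDHeader` echoes the "CloudVHDHeader" term used later in this description; the field names are hypothetical. It shows the n+1 recovery-point count and checksum validation over every field except the checksum itself.

```python
import json
import zlib
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class CloudVHDHeader:
    """Sketch of header 400; field names and encoding are assumptions."""
    modification_time: int
    # One disk-information object id per disk, most recent child first,
    # with the base disk last.
    disk_info_ids: List[str] = field(default_factory=list)
    next_object_id: Optional[str] = None  # used only if the header spans objects
    checksum: int = 0

    @property
    def recovery_point_count(self) -> int:
        # n child disks plus the base disk, i.e. n + 1 recovery points.
        return len(self.disk_info_ids)

    def _body(self) -> bytes:
        fields = asdict(self)
        del fields["checksum"]  # the checksum covers everything except itself
        return json.dumps(fields, sort_keys=True).encode("utf-8")

    def seal(self) -> None:
        self.checksum = zlib.crc32(self._body())

    def is_valid(self) -> bool:
        return self.checksum == zlib.crc32(self._body())


header = CloudVHDHeader(modification_time=1, disk_info_ids=["child2", "child1", "base"])
header.seal()
assert header.recovery_point_count == 3 and header.is_valid()
```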
[0043] FIG. 5 illustrates disk information field 408 in more
detail. In some implementations, disk information field 408
includes a header footer object identifier 502, an object
identifiers number field 504, and object identifiers 504(1)-504(n).
Header footer object identifier 502 includes an object identifier
corresponding to a VHD's header and footer. Object identifiers
504(1)-504(n) facilitate storage of CAT table 102. On a host file
system, a VHD is stored as one file. At cloud vendors C1-Cn,
however, a VHD is stored in multiple cloud objects, which are the
units of data in a cloud storage environment. For example, an
object can be an Amazon Object® provided by Amazon.com, Inc. of
Seattle, Wash., or a blob in Azure® provided by Microsoft
Corporation of Redmond, Wash. For each VHD in the parent chain, the
VHD headers (header and footer) are stored in one cloud object.
Each data block in the VHD is saved as a cloud object, and the
cloud object identifiers 504(1)-504(n) are stored in CAT 102 at the
corresponding entries.
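The storage layout just described (header and footer in one object, each data block in its own object, identifiers recorded per CAT entry) can be sketched in Python as follows. This is an illustration under stated assumptions: a dictionary stands in for a cloud vendor, block identifiers follow the "VMname_UUID_DISKID_1" pattern shown in FIG. 6, and the `_HF` suffix for the header/footer object is a hypothetical choice.

```python
from typing import Dict, List, Optional, Tuple


def upload_vhd(vm_name: str, vm_uuid: str, disk_id: str, header_footer: bytes,
               blocks: List[Optional[bytes]],
               store: Dict[str, bytes]) -> Tuple[str, List[Optional[str]]]:
    """Store one VHD as cloud objects.

    `store` is an in-memory stand-in for a cloud vendor; `None` in `blocks`
    marks a block with no BAT entry. Returns the header/footer object id and
    the per-block identifiers that would populate the CAT.
    """
    prefix = f"{vm_name}_{vm_uuid}_{disk_id}"
    header_footer_id = f"{prefix}_HF"  # hypothetical naming for the header/footer object
    store[header_footer_id] = header_footer
    cat_entries: List[Optional[str]] = []
    for block_number, data in enumerate(blocks, start=1):
        if data is None:
            cat_entries.append(None)  # NULL: no cloud object is created
            continue
        object_id = f"{prefix}_{block_number}"
        store[object_id] = data
        cat_entries.append(object_id)
    return header_footer_id, cat_entries
```

For example, uploading three blocks where the third is unallocated yields two block objects plus one header/footer object, and a CAT whose third entry is NULL.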
[0044] FIG. 6 is an illustration of an exemplary structure of cloud
allocation table 102. In some implementations, a structure of CAT
102 contains as many entries as there are entries in conventional
VHD BAT 208 shown in FIG. 2. The structure shown in FIG. 6
indicates a one-to-one mapping between cloud object identifiers 610
and one or more data blocks 210. For example, a first data block
("Data Block #1") in one or more data blocks 210 corresponds to a
first cloud object identifier ("VMname_UUID_DISKID_1") among cloud
object identifiers 610. If there is no entry in BAT 208 in file 200
for a given block number among one or more data blocks 210 in file
200 of FIG. 2, then a corresponding cloud object identifier in
cloud object identifiers 610 is not created, as indicated, for
example, by a "NULL" indicator for "Data Block #3" in FIG. 6. In
some implementations, cloud object identifiers 610 each have
corresponding Boolean indicators 612 that show whether or not a
corresponding data block in data blocks 210 was changed in a
particular recovery point or snapshot of the VM image. In some
implementations, a sector map 614 indicates the sectors of the VHD,
corresponding to each of cloud object identifiers 610, where such
changes occurred. In some implementations, sector map 614 is
optional. When data for all sectors of a specific block in data
blocks 210 is present, sector map 614 may be set to all binary 1's
when a VHD is restored.
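One way to model a single CAT entry from FIG. 6 in Python, as a sketch only: a nullable cloud object identifier, a Boolean change indicator, and a sector map held as an integer bitmask. The sector count assumes the VHD format's default 2 MiB block size with 512-byte sectors; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumes the VHD default of 2 MiB blocks made of 512-byte sectors.
SECTORS_PER_BLOCK = (2 * 1024 * 1024) // 512
FULL_SECTOR_MAP = (1 << SECTORS_PER_BLOCK) - 1


@dataclass
class CatEntry:
    object_id: Optional[str]   # None plays the role of "NULL" in FIG. 6
    changed: bool = False      # Boolean indicator 612
    sector_map: int = 0        # sector map 614: bit i set => sector i present

    def mark_sector(self, sector: int) -> None:
        if not 0 <= sector < SECTORS_PER_BLOCK:
            raise ValueError(f"sector {sector} outside block")
        self.sector_map |= 1 << sector

    def has_sector(self, sector: int) -> bool:
        return bool((self.sector_map >> sector) & 1)

    def is_full(self) -> bool:
        # All binary 1's: every sector of the block is present.
        return self.sector_map == FULL_SECTOR_MAP


cat: List[CatEntry] = [
    CatEntry("VMname_UUID_DISKID_1", changed=True),
    CatEntry("VMname_UUID_DISKID_2", changed=True),
    CatEntry(None),  # Data Block #3 has no BAT entry
]
cat[0].mark_sector(42)
```

An integer bitmask keeps the optional sector map compact and makes unions of parent and child maps a single bitwise OR.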
[0045] FIGS. 7-9 are flowcharts of methods 700-900, respectively,
depicting operations performed by one or more components of
environment 100. The described operations of processes 700-900 may
be accomplished using one or more of the modules/sub-modules
described herein and, in some implementations, various operations
may be performed in different sequences. In some implementations,
additional operations may be performed along with some or all of
the operations shown in FIGS. 7-9. In some implementations, one or
more operations may be performed simultaneously. In some
implementations, one or more of the operations may not be
performed. Accordingly, the operations described are exemplary in
nature and, as such, should not be viewed as limiting. In some
implementations, processes 700-900 are performed using instructions
stored on tangible computer readable media (e.g., memory devices in
server 104) that, when executed by one or more processors in server
104 or elsewhere, cause the one or more processors to carry out the
operations of processes 700-900.
[0046] FIG. 7 illustrates a process 700 for merging recovery points
(RPs). In an operation 702, a user requests a merge of RPs 3 and 2
among RPs 710 of a cloud vendor Ck (where k is an integer). RPs are
point-in-time snapshots of images provided by cloud vendors C1-Cn
and store one or more VHDs. In an operation 704, VHD management
system 106 downloads CAT tables 710a and 710b corresponding to the
VHDs of RPs 3 and 2, respectively. It is to be noted that, although
the logical representation of RPs 710 shows one VHD per recovery
point, such representation is by way of example only and not by way
of limitation. For example, one or more of RPs 710 can each include
two or more VHDs. Further, RP1 is denoted as an RP corresponding to
a base VHD, and RPs 2-5 are subsequently created RPs of incremental
child VHDs storing point-in-time snapshots of images used by a
customer (e.g., one or more client devices 110), although such
representation is solely illustrative and is not meant to be
limiting. In an operation 706, the CAT tables of RPs 3 and 2 are
merged, forming a new set of RPs 712 in which merged RP 3,2 has a
new CAT table 712a, as discussed below with respect to FIGS.
10A-10C by way of example only.
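The bookkeeping side of process 700 (replacing two adjacent RPs with a single merged RP labelled "3,2") can be sketched in Python as below. This is an illustrative sketch: RPs are plain string labels, a dictionary stands in for the CAT tables downloaded in operation 704, and the table merge of operation 706 is passed in as a hypothetical `merge_cats` callable rather than implemented here.

```python
from typing import Callable, Dict, List, Optional

Cat = List[Optional[str]]


def merge_recovery_points(chain: List[str], cats: Dict[str, Cat], child: str,
                          parent: str, merge_cats: Callable[[Cat, Cat], Cat]) -> List[str]:
    """Replace adjacent RPs `parent` and `child` with one merged RP.

    `cats` stands in for the CAT tables downloaded in operation 704;
    `merge_cats(parent_cat, child_cat)` performs the table merge of operation 706.
    """
    i = chain.index(parent)
    if i + 1 >= len(chain) or chain[i + 1] != child:
        raise ValueError("child must directly follow parent in the RP chain")
    merged = f"{child},{parent}"  # mirrors the "3,2" label of FIG. 7
    cats[merged] = merge_cats(cats.pop(parent), cats.pop(child))
    return chain[:i] + [merged] + chain[i + 2:]
```

Only table entries move through `merge_cats`; no block data is read, which is the point of the approach.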
[0047] FIG. 8 illustrates a process 800 for a scenario where five
RPs are stored in cloud vendors C1-Cn and a sixth RP is to be
created, although such numbers of RPs are presented by way of
example only and not by way of limitation. RPs in FIG. 8 are
represented logically by VHDs 810. For example, VHDs 810 may be
part of RPs 710. In an operation 802, to create a new CAT table,
the CAT table of a VHD-5 in VHDs 810 of a virtual machine 808 is
retrieved and its Boolean indicators 612 are set to a logical
"FALSE" (binary "0") value. A template corresponding to new CAT
table 6 is created using the template of the CAT table of VHD-5, as
discussed below with respect to FIGS. 10A-10C. In an operation 804,
for every block changed in VHD-6, a cloud object is created and its
identifier is written to the corresponding entry of the template of
new CAT table 6. Accordingly, CAT table 6 for VHD-6 in VHDs 810 is
overwritten by VHD management system 106. Because CAT table 6 is
built from the template of CAT table 5, the other values in the
template of CAT table 6 remain unchanged. In an operation 806, VHD
management system 106 uploads the new CAT table 6 of VHD-6 to cloud
vendors C1-Cn. As a result, only CAT table 6, and not the actual
data blocks 210, is downloaded, merged, and uploaded.
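Operations 802 and 804 can be sketched in Python as follows, assuming a simplified CAT entry (identifier plus change flag) and a hypothetical `put_object` callable that uploads one block and returns its new cloud object identifier. The sketch is illustrative, not the described system's code: it copies the previous CAT as a template with every indicator cleared, then overwrites only the entries for changed blocks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CatEntry:
    object_id: Optional[str]  # cloud object identifier, or None for NULL
    changed: bool             # Boolean indicator 612


def derive_child_cat(parent_cat: List[CatEntry], changed_blocks: Dict[int, bytes],
                     put_object: Callable[[bytes], str]) -> List[CatEntry]:
    # Operation 802: copy the previous CAT as a template, all indicators FALSE.
    child_cat = [CatEntry(e.object_id, False) for e in parent_cat]
    # Operation 804: one new cloud object per changed block; its identifier
    # overwrites the inherited entry and the indicator becomes TRUE.
    for index, data in changed_blocks.items():
        child_cat[index] = CatEntry(put_object(data), True)
    # Unchanged entries keep the previous RP's identifiers; no block data moves.
    return child_cat
```

Building new entries rather than mutating the template leaves CAT table 5 untouched, so the earlier recovery point stays valid.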
[0048] FIG. 9 illustrates a process 900 for determining when a
merging of RPs corresponding to a base/parent VHD and a child VHD
may happen. In an operation 902, if a user at client devices 110
explicitly selects two RPs among RPs 906 for merging, then merging
is carried out based on such a merge trigger obtained from the
user. In another operation 904, a user at client devices 110
configures a maximum number of RPs (e.g., 5). Once this limit is
reached, a merge operation is carried out for every new RP that is
to be uploaded to cloud vendors C1-Cn. For example, a new RP6 in
RPs 908 causes RPs 1 and 2 in RPs 906 to merge, as shown by RP 2,1
in RPs 908. Because only the RPs, and therefore only the
corresponding CAT tables in those RPs, are merged, significant cost
savings are achieved compared with conventional merging of the
actual snapshot data in data blocks 210.
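The retention trigger of operation 904 can be sketched in Python as a small function over RP labels. It assumes, as in the FIG. 9 example, that the two oldest RPs (the base and its first child) are the ones merged when the limit is reached; the function name is hypothetical and the CAT merge itself is out of scope here.

```python
from typing import List


def add_recovery_point(chain: List[str], new_rp: str, max_rps: int) -> List[str]:
    """Append `new_rp`; if the chain already holds `max_rps` RPs, first merge the two oldest.

    Only the RP labels are tracked here; the CAT merge itself is out of scope.
    """
    if max_rps < 2:
        raise ValueError("max_rps must be at least 2")
    chain = list(chain)  # do not mutate the caller's list
    if len(chain) >= max_rps:
        base, first_child = chain[0], chain[1]
        chain[:2] = [f"{first_child},{base}"]  # e.g. "2,1" as in FIG. 9
    chain.append(new_rp)
    return chain
```

With a limit of 5, adding RP6 to RPs 1-5 yields "2,1", 3, 4, 5, 6, so the chain length stays at the configured maximum.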
[0049] FIGS. 10A-10C illustrate an example implementation where
cloud allocation tables of a base disk (or, parent disk) and a
child disk are merged to result in an updated cloud allocation
table with changes from both the base and the child disks
incorporated therein. FIG. 10A illustrates the CAT table of a base
VHD with cloud object identifiers (similar to cloud object
identifiers 610) and Boolean indicators (similar to Boolean
indicators 612) for the corresponding cloud object identifiers.
Assume, by way of example only and not by way of limitation, that
the base VHD contains Block #1, Block #2, and Block #3 (similar to
data blocks 210) and that only sector #42 is valid in Block #1.
Block #1, Block #2, and Block #3 are replicated to ObjectID #1,
ObjectID #2, and ObjectID #3 (similar to cloud object identifiers
610), respectively, in cloud vendors C1-Cn. For the base VHD, CAT
102, with metadata storing cloud object identifiers 610 for each
valid block of that disk, is created at VHD management system 106
of server 104, or at other storage devices such as those in cloud
vendors C1-Cn. CAT 102 contains ObjectID #1, ObjectID #2, and
ObjectID #3 and corresponding Boolean indicators 612. For the base
VHD disk, the parameter ("IsChangedinThisrecoveryPoint") that
indicates changes in this VHD is set to a "true" (or, "T") value for
all valid entries of CAT 102.
[0050] FIG. 10B illustrates CAT 102 of a child VHD. Assume that in
the child VHD only sector #1 of Block #1 is modified. When the child
disk is generated in one of cloud vendors C1-Cn, each block in the
child disk is a consolidation of all of its parent's changes and its
own changes. Therefore, Block #1 should contain both sector #1 and
sector #42. To generate CAT 102 of a child disk, CAT 102 of the
parent VHD is taken and all Boolean indicators 612 are marked as
"False" (i.e., the "IsChangedinThisrecoveryPoint" variables are
changed to "F"). A new cloud object is created with the data of
Block #1, its newly created cloud object identifier 610 overwrites
the existing entry, and the corresponding Boolean indicator is
changed to "True" (i.e., "IsChangedinThisrecoveryPoint" is set to
"T"). Therefore, for each valid block of the child disk, the
corresponding cloud object identifiers 610 are obtained and written
over the corresponding entries of CAT 102. For example, ObjectID #1
is overwritten with ObjectID #4, sector map 614 is updated to
contain both sector #1 and sector #42, and the corresponding Boolean
indicator in Boolean indicators 612 is changed to "true" (i.e.,
"IsChangedinThisrecoveryPoint" is changed to "T"). The resulting CAT
102 of the child disk contains ObjectID #4, ObjectID #2, and
ObjectID #3, as shown in FIG. 10B.
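The consolidation of Block #1 in this example (the parent's sector #42 plus the child's sector #1) can be sketched in Python as below, assuming 512-byte sectors and a sector map held as an integer bitmask. The function name and in-memory byte handling are illustrative assumptions, not the described system's implementation.

```python
from typing import Dict, Tuple

SECTOR_SIZE = 512  # bytes per sector, as in the VHD format


def consolidate_block(parent_data: bytes, parent_map: int,
                      child_writes: Dict[int, bytes]) -> Tuple[bytes, int]:
    """Build the child's copy of a block: parent sectors plus the child's writes.

    Returns the consolidated block bytes and the updated sector map
    (bit i set => sector i holds valid data).
    """
    data = bytearray(parent_data)
    sector_map = parent_map
    for sector, payload in child_writes.items():
        start = sector * SECTOR_SIZE
        if sector < 0 or len(payload) != SECTOR_SIZE or start + SECTOR_SIZE > len(data):
            raise ValueError(f"bad write to sector {sector}")
        data[start:start + SECTOR_SIZE] = payload  # child's change wins
        sector_map |= 1 << sector
    return bytes(data), sector_map
```

The consolidated bytes would then be uploaded as the new object (ObjectID #4 in FIG. 10B), so a restore never has to consult the parent's object for that block.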
[0051] FIG. 10C illustrates the resulting CAT 102 after a merging
operation performed according to implementations of methods
700-900. In FIG. 10C, only metadata in the form of CAT 102 is
merged, not the actual data. Further, delete operations generally
may not involve any cost and therefore can be freely performed by
VHD management system 106. After downloading only the CATs (e.g.,
CAT 102) from cloud vendors C1-Cn, VHD management system 106
modifies the CAT table of the child disk, which is the later of the
two disks (i.e., parent VHD and child VHD) being merged. VHD
management system 106 iterates over the CAT tables of both the
base/parent VHD and the child VHD, and deletes all cloud objects of
the parent disk whose cloud object identifiers 610 have been
overwritten with the child VHD's cloud object identifiers. That is,
if at a given iteration "IsChangedinThisrecoveryPoint" is "T" for
both the child and the parent VHDs, then the object belonging to the
parent VHD's data blocks 210 is deleted. For the example above in
FIGS. 10A and 10B, "IsChangedinThisrecoveryPoint" is "T" in both
the child disk and the base disk only for data block #1. Therefore,
the object corresponding to data block #1, which is ObjectID #1, is
deleted.
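The deletion rule above can be sketched in Python by walking the parent and child CATs in parallel and deleting the parent's object wherever both "IsChangedinThisrecoveryPoint" flags are "T". CAT entries are modelled as (identifier, flag) tuples and a dictionary stands in for the cloud store; both are assumptions for illustration, as is the extra identifier-equality guard noted in the comments.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[Optional[str], bool]  # (cloud object identifier, IsChangedinThisrecoveryPoint)


def delete_superseded_objects(parent_cat: List[Entry], child_cat: List[Entry],
                              store: Dict[str, bytes]) -> List[str]:
    """Delete parent objects replaced by the child; return their identifiers."""
    deleted = []
    for (p_id, p_changed), (c_id, c_changed) in zip(parent_cat, child_cat):
        # Both flags "T": the child wrote its own object for this block,
        # so the parent's object is no longer referenced. The id check is an
        # extra safety guard, not part of the described rule.
        if p_changed and c_changed and p_id is not None and p_id != c_id:
            store.pop(p_id, None)
            deleted.append(p_id)
    return deleted


base_cat = [("ObjectID #1", True), ("ObjectID #2", True), ("ObjectID #3", True)]
child_cat = [("ObjectID #4", True), ("ObjectID #2", False), ("ObjectID #3", False)]
```

Applied to the FIG. 10A and 10B tables above, only ObjectID #1 is selected for deletion.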
[0052] When a child disk is merged into the base disk, all
"IsChangedinThisrecoveryPoint" parameters are then marked "T" by
VHD management system 106. When two child disks are merged,
"IsChangedinThisrecoveryPoint" is changed to "T" only when either of
the child disks has "T". Subsequently, the modified CAT table of the
child disk is uploaded to cloud vendors C1-Cn, overwriting the older
CAT table. Finally, the CAT table and headers of the parent disk are
deleted, and a "CloudVHDHeader" (e.g., header 400) is updated
accordingly by removing the corresponding entry in disk information
fields 408 for that disk. Accordingly, the number of recovery points
(RPs) for the disk is decremented. For every VHD recovered from
cloud storage environment 100 (e.g., from cloud vendors C1-Cn), the
CAT table of the corresponding child disk for a particular RP or
snapshot is taken, and the VHD is created locally from the CAT table
by reading the cloud objects referenced by its object identifiers
(e.g., cloud object identifiers 610).
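The two flag rules and the header update in this paragraph can be sketched in Python as follows. Flags are plain Booleans ("T" is `True`), and the header's disk-information entries are modelled as a list of disk labels; both representations and the function names are hypothetical.

```python
from typing import List, Tuple


def merged_flags(parent_flags: List[bool], child_flags: List[bool],
                 parent_is_base: bool) -> List[bool]:
    if parent_is_base:
        # Child merged into the base disk: every entry becomes "T".
        return [True] * len(child_flags)
    # Two child disks: "T" only where either disk has "T".
    return [p or c for p, c in zip(parent_flags, child_flags)]


def drop_parent_from_header(disk_entries: List[str], parent_disk: str) -> Tuple[List[str], int]:
    """Remove the parent's disk-information entry; return entries and the new RP count."""
    remaining = [d for d in disk_entries if d != parent_disk]
    return remaining, len(remaining)
```

Because the RP count is derived from the remaining entries, removing the parent's entry and decrementing the count cannot drift apart.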
[0053] Cloud computing and virtualization are emerging markets for
the coming decades and are changing the basic paradigm of
information technology infrastructure. Various implementations of
the invention disclosed above advantageously optimize the way VHDs
(virtual hard disks) are stored and managed in the cloud. In some
implementations, such optimization can be part of the Hyper-V®
virtual machine manager provided by Microsoft Corporation of
Redmond, Wash. Various implementations of the invention reduce the
cost and time required to recover a VHD from a given snapshot.
Further, merging two VHD disks in cloud environments (e.g.,
environment 100) is fast and incurs very little cost. By way of
example only and not by way of limitation, in a scenario where
snapshots are changed and have to be updated and/or merged, the
time taken for merging = time taken to download metadata files +
time taken to manipulate metadata files + time taken to upload
metadata files. Since metadata files (e.g., file 300 having CAT
102) are substantially smaller than the actual data files,
performing operations (e.g., those disclosed in processes 700-900)
on such files allows fast merging of recovery points and snapshot
data. Processes 700-900 may be applied to any virtual disks that
adhere to the VHD specification, although the implementations may
be modified for other forms of virtual hard disks.
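The merge-time relation above (download + manipulate + upload) can be made concrete with a short Python calculation. Every input here is hypothetical and chosen only for illustration: a 10 GiB disk with 2 MiB blocks, a 64-byte CAT entry, a 100 Mbit/s link, a fixed manipulation time, and a data-level merge assumed to download two disks and upload one. The output is whatever those assumptions imply, not a measured result.

```python
def transfer_time(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    return num_bytes / bandwidth_bytes_per_s


def merge_time(download_bytes: float, upload_bytes: float,
               manipulate_s: float, bandwidth: float) -> float:
    # time for merging = download time + manipulation time + upload time
    return (transfer_time(download_bytes, bandwidth) + manipulate_s
            + transfer_time(upload_bytes, bandwidth))


# Hypothetical inputs, for illustration only.
GIB = 1024 ** 3
disk_bytes = 10 * GIB
blocks = disk_bytes // (2 * 1024 ** 2)   # 5120 blocks of 2 MiB
cat_bytes = blocks * 64                  # hypothetical 64-byte CAT entries
bandwidth = 12.5e6                       # 100 Mbit/s expressed in bytes/s

# Metadata merge: download two CATs, upload one.
metadata_merge = merge_time(2 * cat_bytes, cat_bytes, manipulate_s=0.5, bandwidth=bandwidth)
# Assumed data-level merge: download two disks, upload one.
data_merge = merge_time(2 * disk_bytes, disk_bytes, manipulate_s=0.5, bandwidth=bandwidth)
```

Under any such assumptions the transfer terms scale with the size of what is moved, which is why merging only the CATs is expected to be much faster.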
[0054] Instead of or in addition to cloud storage, various
implementations of this invention can be used, for example, with
the Hadoop Distributed File System (HDFS), where a file is used
instead of a cloud object. In HDFS, an object identifier may be a
file name. Various implementations of this invention can also be
used for file systems where files are used as cloud objects, such
that object identifiers are filenames. In some implementations, the
disclosure may be extended to store multiple objects in the same
file, where an object identifier includes both a filename and an
offset.
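The file-based variant in which several objects share one file can be sketched in Python with an identifier made of a filename and an offset. A length field is added here as an assumption so that an object can be read back without a separate index; the class and function names are hypothetical, and a local file stands in for HDFS or any other file system.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileObjectId:
    filename: str
    offset: int
    length: int  # assumption: stored alongside filename and offset


def append_object(path: str, data: bytes) -> FileObjectId:
    """Append one object to a shared file and return its identifier."""
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_END)  # make the current end of file explicit
        offset = f.tell()
        f.write(data)
    return FileObjectId(path, offset, len(data))


def read_object(oid: FileObjectId) -> bytes:
    """Read an object back using its filename and offset."""
    with open(oid.filename, "rb") as f:
        f.seek(oid.offset)
        return f.read(oid.length)
```

Frozen dataclass identifiers are hashable, so they can be stored directly in a CAT-like list or used as dictionary keys.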
[0055] Implementations of the invention may be made in hardware,
firmware, middleware, software, or various combinations thereof.
The invention may also be implemented as computer-readable
instructions stored on a tangible computer-readable storage medium
which may be read and executed by one or more processors. A
computer-readable storage medium may include various mechanisms for
storing information in a form readable by a computing device. For
example, a tangible computer-readable storage medium may include
optical storage media, flash memory devices, and/or other storage
media. Further, firmware, software, routines, or instructions may
be described in the above disclosure in terms of specific exemplary
aspects and implementations of the invention, and performing
certain actions. However, it will be apparent that such
descriptions are merely for convenience, and that such actions may
in fact result from computing devices, processors, controllers, or
other devices executing firmware, software, routines or
instructions.
[0056] Other implementations, uses, and advantages of the invention
will be apparent to those skilled in the art from consideration of
the specification and practice of the invention disclosed herein.
The specification should be considered exemplary only, and the
scope of the invention is accordingly intended to be limited only
by the following claims.