U.S. patent application number 10/259,237 was filed with the patent office on 2002-09-27 and published on 2003-04-03 as publication number 2003/0065780 for a data storage system having data restore by swapping logical units.
The invention is credited to Thomas L. Dings, Charles F. Maurer III, Sujit Suresh Naik, Ananthan K. Pillai, John E. Stockenberg, and Michael H. Wright.
United States Patent Application 20030065780
Kind Code: A1
Maurer, Charles F. III; et al.
April 3, 2003

Data storage system having data restore by swapping logical units
Abstract
This invention is a system and method for managing replication
of data distributed over one or more computer systems. A data
storage system can perform computer-executed steps of establishing
one or more mirrored copies of data that are copies of one or more
volumes of standard data (e.g. a database) on a first computer
system. In one embodiment, the mirrored data is restored using
logical volume swapping.
Inventors: Maurer, Charles F. III (Bellingham, MA); Naik, Sujit Suresh (Northborough, MA); Pillai, Ananthan K. (Shrewsbury, MA); Dings, Thomas L. (Hopkinton, MA); Wright, Michael H. (Franklin, MA); Stockenberg, John E. (Newport, RI)
Correspondence Address:
DALY, CROWLEY & MOFFORD, LLP
SUITE 101
275 TURNPIKE STREET
CANTON, MA 02021-2310
US
Family ID: 25403060
Appl. No.: 10/259,237
Filed: September 27, 2002
Related U.S. Patent Documents

Application Number    Filing Date     Patent Number
10/259,237            Sep 27, 2002
09/894,422            Jun 28, 2001
Current U.S. Class: 709/225; 709/203
Current CPC Class: G06F 11/1456 20130101; G06F 11/1469 20130101; G06F 2201/80 20130101
Class at Publication: 709/225; 709/203
International Class: G06F 015/173; G06F 015/16
Claims
What is claimed is:
1. A method of performing a data restore by swapping logical units,
comprising: providing a first logical unit in a storage array to a
client; creating a mirror of the first logical unit on a second
logical unit of the storage array; splitting the mirror from the
first logical unit; and providing the second logical unit to the
client without notifying the client such that the client can access
the mirrored data.
2. The method according to claim 1, further including recreating
the first logical unit from the second logical unit.
3. The method according to claim 1, further including restoring the
mirrored data from the second logical unit to the first logical
unit; and returning client access to the first logical unit.
4. The method according to claim 3, further including waiting for
synchronization of the mirror of the second logical unit.
5. The method according to claim 3, further including determining
characteristics of a physical storage device corresponding to the
first logical unit.
6. The method according to claim 5, wherein the storage device
characteristics include world wide name and subsystem name.
7. The method according to claim 1, further including creating the
mirror of the first logical unit in a format selected from the
group consisting of JBOD, RAID 0, RAID 0+1, and RAID 1.
8. The method according to claim 1, further including selecting the
second logical unit from a plurality of logical units in the
storage array based on visibility to the client.
9. The method according to claim 1, further including converting a
host physical device location to an identification of the first
logical unit.
10. The method according to claim 9, further including unmounting a
filesystem associated with the first logical unit.
11. The method according to claim 10, further including disabling
client access to the first logical unit.
12. The method according to claim 1, wherein the storage array
corresponds to one of a Symmetrix™ system, a Clarion™ system, and an HSG80™ system.
13. A data storage system, comprising computer-executable logic
that enables the method steps of: providing a first logical unit in
a storage array to a client; creating a mirror of the first logical
unit on a second logical unit of the storage array; splitting the
mirror from the first logical unit; and providing the second
logical unit to the client without notifying the client such that
the client can access the mirrored data.
14. The system according to claim 13, further including recreating
the first logical unit from the second logical unit.
15. The system according to claim 13, further including restoring
the mirrored data from the second logical unit to the first logical
unit; and returning client access to the first logical unit.
16. The system according to claim 15, further including waiting for
synchronization of the mirror of the second logical unit.
17. The system according to claim 15, further including determining
characteristics of a physical storage device corresponding to the
first logical unit.
18. The system according to claim 17, wherein the storage device
characteristics include world wide name and subsystem name.
19. The system according to claim 13, further including creating
the mirror of the first logical unit in a format selected from the
group consisting of JBOD, RAID 0, RAID 0+1, and RAID 1.
20. The system according to claim 13, further including selecting
the second logical unit from a plurality of logical units in the
storage array based on visibility to the client.
21. The system according to claim 13, further including converting
a host physical device location to an identification of the first
logical unit.
22. The system according to claim 21, further including unmounting
a filesystem associated with the first logical unit.
23. The system according to claim 22, further including disabling
client access to the first logical unit.
24. The system according to claim 13, further including creating a
map of logical information on the first logical unit to one or more
physical devices.
25. The system according to claim 13, wherein the storage array corresponds to one of a Symmetrix™ system, a Clarion™ system, and an HSG80™ system.
26. A computer readable medium for use with a data storage system
comprising code to enable the steps of: providing a first logical
unit in a storage array to a client; creating a mirror of the first
logical unit on a second logical unit of the storage array;
splitting the mirror from the first logical unit; and providing the
second logical unit to the client without notifying the client such
that the client can access the mirrored data.
27. The medium according to claim 26, further including restoring
the mirrored data from the second logical unit to the first logical
unit; and returning client access to the first logical unit.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims the benefit of U.S.
application Ser. No. 09/894,422, filed on Jun. 28, 2001, which is
incorporated herein by reference.
[0002] A portion of the disclosure of this patent document contains
command formats and other computer language listings, all of which
are subject to copyright protection. The copyright owner, EMC
Corporation, has no objection to the facsimile reproduction by
anyone of the patent document or the patent disclosure, as it
appears in the Patent and Trademark Office patent file or records,
but otherwise reserves all copyright rights whatsoever.
FIELD OF THE INVENTION
[0003] The invention relates generally to managing data in a data
storage environment, and more particularly to a system and method
for managing replication of data distributed over one or more
computer systems.
BACKGROUND OF THE INVENTION
[0004] As is known in the art, computer systems which process and
store large amounts of data typically include one or more
processors in communication with a shared data storage system in
which the data is stored. The data storage system may include one
or more storage devices, usually of a fairly robust nature and
useful for storage spanning various temporal requirements, e.g.
disk drives. The one or more processors perform their respective
operations using the storage system. To minimize the chance of data
loss, the computer systems also can include a backup storage system
in communication with the primary processor and the data storage
system. Often the connection between the one or more processors and
the backup storage system is through a network in which case the
processor is sometimes referred to as a "backup client."
[0005] The backup storage system can include a backup storage
device (such as tape storage or any other storage mechanism),
together with a system for placing data into the storage device and
recovering the data from that storage device. To perform a backup,
the client copies data from the shared storage system across the
network to the backup storage system. Thus, an actual data file may
be communicated over the network to the backup storage device.
[0006] The shared storage system corresponds to the actual physical
storage. For the client to write the backup data over the network
to the backup storage system, the client first converts the backup
data into file data, i.e., the client retrieves the data from the
physical storage system level, and converts the data into
application level format (e.g. a file) through a logical volume
manager level, a file system level and the application level. When
the backup storage device receives the data file, the backup
storage system can take the application level data file, and
convert it to its appropriate file system level format for the
backup storage system. The data can then be converted through the
logical volume manager level and into physical storage.
[0007] The EMC Data Manager (EDM) is capable of such backup and
restore over a network, as described in numerous publications
available from EMC of Hopkinton, Mass., including the EDM User
Guide (Network) "Basic EDM Product Manual". For performance
improvements, a backup storage architecture in which a direct
connection is established between the shared storage system and the
backup storage system was conceived. Such a system is described in
U.S. Pat. No. 6,047,294, assigned to the assignee of the present invention, entitled Logical Restore from a Physical Backup in a Computer Storage System, and herein incorporated by reference.
[0008] Today much of the data processing and storage environment is
dedicated to the needs of supporting and storing large databases,
which only get larger. Although data storage systems, such as the
EMC Symmetrix Integrated Cache Disk Array, and some of its
supporting software such as TimeFinder, have made general advancements in the data storage art through the advanced use of disk mirroring, much of the capability of such technology is beyond
the grasp of most entities. This is because of an ever-increasing
shortage of skilled computer professionals. Typically, an entity
such as a company might employ or contract a data storage
administrator to take care of data storage needs, a database programmer to take care of database needs, and general network administrators and other information technology professionals to
take care of general computing needs.
[0009] If one of these skilled professionals leaves or is difficult to hire, then the task of storing a database and taking care of its backup and restore needs may be neglected or never happen in the first place. What is needed is a computer-based tool, such as a system or program, that could automate many of these tasks and reduce the complexity so that such a wide array or depth of skill sets is not needed. Further, it would be an advantage if such a tool provided solutions for disaster recovery of data.
[0010] Prior art systems have allowed for restoration of source or standard data from replicated copies, but there has been no straightforward, simple way to get logical information related to the source so that another computer could take over the role of a failed computer (i.e., serve as a surrogate for the failed computer). There is a long-felt need for a technique to enable extraction of such logical information in a straightforward, non-complex, and fast manner so that a surrogate computer could work with replicated copies in substantially the same manner as the original source computer that had operated with the standard data. This would be an advancement in the art with particular relevance in the field of disaster recovery.
SUMMARY OF THE INVENTION
[0011] The present invention is a system and method for management
of data replicated across one or more computer systems.
[0012] The method of this invention allows management of data that
may be replicated across one or more computer systems. The method
includes the computer-executed steps of establishing one or more
mirrored copies of data that are copies of one or more volumes of
data that are part of a first volume group on a first computer
system. The mirrored copies of data are separated or split from the
respective one or more volumes of data. The steps also include discovering logical information related to the one or more
volumes of data that are part of the volume group on the first
computer system.
[0013] A map is created from the discovered information to map
logical information to physical devices on the first computer
system. Then a duplicate of the one or more mirrored copies of data
is mounted on a second computer system by using the map to create
a second volume group that is substantially identical to the first
volume group.
[0014] In an alternative embodiment, the invention includes a
system for carrying out the method steps. In another alternative
embodiment, the invention includes a program product for carrying
out method steps.
[0015] In a further aspect of the invention, a storage system
performs a restore by utilizing logical volume or unit
swapping.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The above and further advantages of the present invention
may be better understood by referring to the following description taken in conjunction with the accompanying drawings, in which:
[0017] FIG. 1 is a block diagram of a data storage network
including host computer systems, a data storage system, and a backup system, and also including logic for enabling the method of the
present invention;
[0018] FIG. 2 is an exemplary representation of a computer-readable
medium encoded with the logic of FIG. 1 for enabling the method of
the present invention;
[0019] FIG. 3 is a schematic representation of the data storage
network of FIG. 1 in which the invention may be configured to
operate with standard and BCV devices for implementing the method
of this invention;
[0020] FIG. 4 is a representation of an embodiment of the logic of
FIG. 1 and showing a preferred functional structure;
[0021] FIG. 5 is a representation of a general overview of the
method steps of this invention;
[0022] FIG. 6 is a flow logic diagram illustrating some method
steps of the invention carried out by the logic of this invention
that are generally part of the steps shown in FIG. 5;
[0023] FIG. 7 is another flow logic diagram illustrating some
method steps of the invention carried out by the logic of this
invention that are generally part of the steps shown in FIG. 5;
[0024] FIG. 8 is a flow logic diagram illustrating some method
steps of the invention carried out by the logic of this invention
that are generally part of the steps shown in FIG. 5;
[0025] FIG. 9 is a flow logic diagram illustrating some method
steps of the invention carried out by the logic of this invention
that are generally part of the steps shown in FIG. 5;
[0026] FIG. 10 is a flow logic diagram illustrating some method
steps of the invention carried out by the logic of this invention
that are generally part of the steps shown in FIG. 5;
[0027] FIG. 11 is a flow logic diagram illustrating some method
steps of the invention carried out by the logic of this invention
which are generally part of the steps shown in FIG. 5;
[0028] FIG. 12 is a flow logic diagram illustrating some method
steps of the invention carried out by the logic of this invention
that are generally part of the steps shown in FIG. 5;
[0029] FIG. 13 is a schematic block diagram of a storage system
having logical unit swapping in accordance with the present
invention shown in an initial configuration;
[0030] FIG. 14 is a schematic block diagram of the storage system
of FIG. 13 shown during creation of a logical unit mirror;
[0031] FIG. 15 is a schematic block diagram of the storage system
of FIG. 13 shown after creation of the mirror;
[0032] FIG. 16 is a schematic block diagram of the storage system
of FIG. 13 shown with logical unit swapping;
[0033] FIG. 17 is a schematic block diagram of the storage system
of FIG. 13 shown after restoration of data from the mirror of FIG.
13;
[0034] FIG. 18 is a flow diagram showing an exemplary top level
sequence of steps for implementing logical unit swapping in a
storage array in accordance with the present invention;
[0035] FIG. 19 is a flow diagram showing further details for a
portion of the flow diagram of FIG. 18;
[0036] FIG. 20 is a flow diagram showing a further portion of the
flow diagram of FIG. 18;
[0037] FIG. 21 is a flow diagram showing additional details for a
portion of the flow diagram of FIG. 20; and
[0038] FIG. 22 is a flow diagram showing further details for a
portion of the flow diagram of FIG. 20.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0039] The methods and apparatus of the present invention are
intended for use with data storage systems, such as the Symmetrix
Integrated Cache Disk Array system available from EMC Corporation
of Hopkinton, Mass. Specifically, this invention is directed to
methods and apparatus for use in systems of this type that include
transferring a mirrored set of data from a standard device to a
redundant device for use in applications such as backup or error
recovery, but which is not limited to such applications.
[0040] The methods and apparatus of this invention may take the
form, at least partially, of program code (i.e., instructions)
embodied in tangible media, such as floppy diskettes, CD-ROMs, hard
drives, random access or read-only memory, or any other
machine-readable storage medium. When the program code is loaded
into and executed by a machine, such as a computer, the machine
becomes an apparatus for practicing the invention. The methods and
apparatus of the present invention may also be embodied in the form
of program code that is transmitted over some transmission medium,
such as over electrical wiring or cabling, through fiber optics, or
via any other form of transmission, and may be implemented such that, when the program code is received and loaded into and
executed by a machine, such as a computer, the machine becomes an
apparatus for practicing the invention. When implemented on a
general-purpose processor, the program code combines with the
processor to provide a unique apparatus that operates analogously
to specific logic circuits.
[0041] The logic for carrying out the method is embodied as part of
the system described below beginning with reference to FIG. 1. One
aspect of the invention is embodied as a method that is described
below with detailed specificity in reference to FIGS. 5-12.
Data Storage Environment Including Logic for This Invention
[0042] Referring now to FIG. 1, reference is made to a data storage network 100 in which the invention is particularly useful and which includes a data storage system 119, host computer systems 113a and 113b, and a backup system 200.
[0043] In a preferred embodiment the data storage system is a Symmetrix Integrated Cache Disk Array available from EMC Corporation of Hopkinton, Mass. Such a data storage system and its implementation is fully described in U.S. Pat. No. 6,101,497 issued Aug. 8, 2000, and also in U.S. Pat. No. 5,206,939 issued Apr. 27, 1993, each of which is assigned to EMC, the assignee of this invention, and each of which is hereby incorporated by reference.
Consequently, the following discussion makes only general
references to the operation of such systems.
[0044] The invention is useful in an environment wherein
replicating to a local volume denoted as a business continuance
volume (BCV) is employed (FIG. 2). Such a local system which
employs mirroring for allowing access to production volumes while
performing backup is also described in the '497 patent incorporated
herein.
[0045] The data storage system 119 includes a system memory 114 and
sets or pluralities 115 and 116 of multiple data storage devices or
data stores. The system memory 114 can comprise a buffer or cache
memory; the storage devices in the pluralities 115 and 116 can
comprise disk storage devices, optical storage devices and the
like. However, in a preferred embodiment the storage devices are
disk storage devices. The sets 115 and 116 represent an array of
storage devices in any of a variety of known configurations.
[0046] A host adapter (HA) 117 provides communications between the
host system 113 and the system memory 114; disk adapters (DA) 120
and 121 provide pathways between the system memory 114 and the
storage device pluralities 115 and 116. A bus 122 interconnects the
system memory 114, the host adapters 117 and 118 and the disk
adapters 120 and 121. Each system memory 114 and 141 is used by
various elements within the respective systems to transfer
information and interact between the respective host adapters and
disk adapters.
[0047] A backup storage system 200 is connected to the data storage
system 119. The backup storage system is preferably an EMC Data
Manager (EDM) connected to the data storage system as described in
Symmetrix Connect User Guide, P/N 200-113-591, Rev. C, December
1997, available from EMC Corporation of Hopkinton, Mass. The direct
connection between the shared storage system and the backup storage
system may be provided as a high-speed data channel 123 such as a
SCSI cable or one or more fiber-channel cables. In this system, a
user may be permitted to back up data over the network or the direct
connection.
[0048] Backup system 200 includes a backup/restore server 202,
Logic 206 as part of the server, and a tape library unit 204 that
may include tape medium (not shown) and a robotic picker mechanism
(also not shown) as is available on the preferred EDM system.
[0049] Logic 206 is installed and becomes part of the EDM for
carrying out the method of this invention and the EDM becomes at
least part of a system for carrying out the invention. Logic 206 is
preferably embodied as software for carrying out the methods of
this invention and is preferably included at least as part of a
backup/restore server 202 in communication with the data storage
system 119 through an adapter 132 (e.g., a SCSI adapter) along
communication path 123. Substantially identical logic may also be
installed as software on any host computer system such as 113a or
113b, shown as logic 206a and 206b, respectively. In a preferred
embodiment the software is Unix-based and daemons are launched by
the software for execution where needed on the backup system, or
host computers. The daemons on each of these computers communicate
through sockets.
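As an illustration of this arrangement only, a minimal Perl daemon might accept a connection on a socket and dispatch a request to the functions described below; the port number, the request vocabulary, and the dispatch table here are assumptions for the sketch, not the actual EDM protocol.

#!/usr/bin/perl
# Minimal sketch of a socket-listening daemon; the port and the request
# names are hypothetical and stand in for the EDM's actual protocol.
use strict;
use warnings;
use IO::Socket::INET;

my $listener = IO::Socket::INET->new(
    LocalPort => 7777,    # assumed port
    Listen    => 5,
    Reuse     => 1,
) or die "cannot listen: $!";

my %dispatch = (
    DISCOVER => sub { 'discovery/mapping would run here' },
    SPLIT    => sub { 'establish/split would run here' },
    MOUNT    => sub { 'build/mount would run here' },
);

while (my $peer = $listener->accept) {
    chomp(my $request = <$peer> // '');
    my $handler = $dispatch{ uc $request };
    my $reply   = $handler ? $handler->() : 'unknown request';
    print $peer "$reply\n";
    close $peer;
}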
[0050] The Logic of this invention, in a preferred embodiment is
computer program code in the Perl programming language. As shown in
FIG. 2, it may be carried out from a computer-readable medium such
as CD-ROM 198 encoded with Logic 206 that acts in cooperation with
normal computer electronic memory as is known in the art. Perl is a
Unix-based language (see e.g. Programming Perl, 2nd Edition by
Larry Wall, Randal L. Schwartz, and Tom Christiansen, published by
O'Reilly and Associates). Nevertheless, one skilled in the computer
arts will recognize that the logic, which may be implemented interchangeably as hardware or software, may be implemented in
various fashions in accordance with the teachings presented
now.
[0051] Generally speaking, the data storage system 119 operates in
response to commands from one or more computer or host systems,
such as the host systems 113a and 113b, that are each connected via
a host adapter, such as host adapters 117 and 118. The host
adapters 117 and 118 transfer commands to a command buffer that is
part of system memory 114. The command buffer stores data
structures and write requests that the disk adapters generate. The
disk adapters, such as the disk adapters 120 or 121, respond by
effecting a corresponding operation using the information in a
command buffer. The selected disk adapter then initiates a data
operation. Reading operations transfer data from the storage
devices to the system memory 114 through a corresponding disk
adapter and subsequently transfer data from the system memory 114
to the corresponding host adapter, such as host adapter 117, when the host system 113a initiates the data reading operation.
[0052] The computer systems 113a and 113b may be any conventional
computing system, each having an operating system, such as a system
available from Sun Microsystems, and running the Solaris operating
system (a version of Unix), an HP system running HP-UX (a
Hewlett-Packard client, running a Hewlett-Packard version of the
Unix operating system) or an IBM system running the AIX operating
system (an IBM version of Unix) or any other system with an
associated operating system such as the WINDOWS NT operating
system.
[0053] A short description of concepts useful for understanding
this invention and known in the art is now given. A physical disk
is formatted into a "physical volume" for use by management software such as Logical Volume Manager (LVM) software available from
EMC. Each physical volume is split up into discrete chunks, called
physical partitions or physical extents. Physical volumes are
combined into a "volume group." A volume group is thus a collection
of disks, treated as one large storage area. A "logical volume"
consists of some number of physical partitions/extents, allocated
from a single volume group. A "filesystem" is, simply stated, a structure or a collection of files. In Unix, a filesystem can refer
to two very distinct things: the directory tree or the arrangement
of files on disk partitions.
[0054] Below is a short description of other useful terminology
which may be understood in more detail with reference to the
incorporated '497 patent. When a mirror is "established" the data
storage system 119 creates a mirror image (copy or replication) of
a source or standard volume. When using the preferred Symmetrix
such a mirror is denoted as a business continuance volume (BCV),
also referred to in general terms as a mirrored disk, and in such a context specifically as a BCV device. If data on the standard
volume changes, the same changes are immediately applied to the
mirrored disk. When a mirror is "split" the preferred Symmetrix
data storage system 119 isolates the mirrored version of the disk
and no further changes are applied to the mirrored volume. After a
split is complete, the primary disk can continue to change but the
mirror maintains the point-in-time data that existed at the time of
the split.
[0055] Mirrors can be "synchronized" in either direction (i.e.,
from the BCV to the standard or vice versa). For example, changes
from the standard volume that occurred after a split to the mirror
can be applied to the BCV or mirrored disk. This brings the
mirrored disk current with the standard. If you synchronize in the
other direction you can make the primary disk match the mirror.
This is often the final step during a restore.
[0056] The operation of a BCV device and its corresponding BCV
volume or volumes is more readily understood in terms of data sets
stored in logical volumes and is useful for understanding the
present invention. As known, any given logical volume may be stored
on a portion or all of one physical disk drive or on two or more
disk drives.
[0057] Referring to FIG. 3, in this particular embodiment, disk
adapter 120 (FIG. 1) controls the operations of a series of
physical disks 115 that are shown in FIG. 3 in terms of logical
volumes 212. The segmentation or hypering of physical disks into
logical volumes is well known in the art.
[0058] Similarly a disk adapter interfaces logical volumes 214 to
the data storage system bus 122 (FIG. 1). Each of these volumes 214
is defined as a Business Continuation Volume and is designated a
BCV device. The concept of BCV's is described in detail in the incorporated '497 patent so will be only generally described herein.
Each BCV device comprises a standard disk controller and related
disk storage devices as shown in FIG. 1 especially configured to
independently support applications and processes. The use of these
BCV devices enables a host such as host 113a, described from here
on as the "source" host computer system to utilize instantaneous
copies of the data in the standard volumes 212. In conventional operations there typically will be at least one BCV volume assigned
to each host device that will operate on a data set concurrently.
However, as will be explained below, this invention, in particular
logic 206 and its counterparts 206a and 206b, adds additional
function so that the BCV volumes established for use by one host
may be used by another host, such as host 113b, described from here
on as the "target" host computer system.
[0059] Although the invention has particular advantages when the
target and source host computer system are separate distinct
computers, there may also be advantages in having the two combined
together. Thus, the target and source computer may be integrally
combined as one computer.
[0060] Referring again to FIG. 3, host 113a may continue online
transaction processing (such as database transaction processing) or
other processing without any impact or load on the volumes 212,
while their respective mirror images on BCV's 214 are used to back
up data in cooperation with backup system 200. However, using the Logic of this invention, the BCV's may be established for use on
another host substantially automatically under control of a
computer program, rather than requiring intervention of an operator
all along the way. The advantages and details associated with such
an operation are described below.
[0061] The direction of data flow for backup is from the data
storage system 119 to the backup system 200 as represented by arrow
211. The direction of data flow for restore is to the data storage
system (opposite from arrow 211), but the BCV's may be mounted on a host other than the one for which they were originally established, in accordance with the method of this invention.
[0062] The EDM normally offers several options for controlling
mirror behavior before and after a backup or restore, which are
incorporated with this invention and are therefore discussed now at
a high level. (Further detail about such known policies may be found
in a white paper available from EMC: Robert Boudrie and David
Dysert, EMC Best Practices: Symmetrix Connect and File Level
Granularity.)
Pre-Backup Mirror Policy
[0063] Bring Mirrors Down--This option expects to find the mirrors
established and it will split the mirrors automatically before the
backup. If the mirrors are down already, the backup will fail and
report an error. The error is designed to prevent the system from
backing up mirrors that are in an unexpected state.
[0064] Verify Mirrors are Down--This option expects to find the
mirrors split and it will leave them split and perform the backup.
If the mirrors are established at the time of backup, the backup
will fail and report an error. This error is designed to ensure
that the backup is taken for the specific point in time that the
mirrored data represents.
[0065] Bring Mirrors Down if Needed--This option checks whether the
mirrors are established or split and it will split the mirrors if
they are established. If you select this option, the backup will
not fail regardless of the state of the mirrors.
[0066] Bring Mirrors Down after Establishing--This option checks
the mirrors and if they are not established, the EDM first
establishes the mirror to ensure that it is an exact copy of data
on the primary volume. Then the EDM splits the mirrors to perform
the backup.
Post-Backup Mirror Policy
[0067] During post-backup processing, mirror management can be
configured to do any of the following:
[0068] Bring Mirrors Up--After the restore is complete, the EDM
automatically resynchronizes the mirror to the primary disk.
[0069] Leave Mirrors Down--After the restore is complete, the EDM
leaves the mirrors split from the primary disk.
[0070] Leave Mirrors as Found--After the restore is complete, the
EDM resynchronizes the mirrors to the primary disk if they were
established to begin with. If not, the EDM leaves the mirrors
split.
[0071] The invention includes a method for managing data that may
be replicated across one or more computer systems. The method is
carried out in the above-described environment by the Logic of this
invention, which in a preferred embodiment is program code in the
Perl programming language as mentioned above.
[0072] The method includes the computer-executed steps of
establishing one or more mirrored copies of data (BCV's) that are
copies of one or more volumes of data (Standard Volumes). The BCV's
are established in a conventional manner as described in the
incorporated '497 patent. The BCV's are separated or split from the
respective one or more volumes of data in a conventional manner, which is also described in the incorporated '497 patent.
[0073] The Standard volumes are part of a volume group on the
source computer system 113a that has an operating system 210a (FIG.
3). The operating system is preferably a Unix operating system,
such as Solaris from Sun Microsystems of California, AIX from IBM
of New York, or HP-UX from Hewlett Packard of California.
[0074] The method further includes discovering logical information
related to the Standard volumes that are part of the volume group
on the source computer system 113a. A map of the logical
information to physical devices on the source computer system is
created, preferably in the form of a flat file that may be
converted into a tree structure for fast verification of the
logical information. That map is used to build a substantially
identical logical configuration on the target computer system 113b,
preferably after the logical information has been verified by using
a tree structure configuration of the logical information.
[0075] The logical configuration is used to mount a duplicate of
the BCV's on the target computer system (denoted as mounted target
BCV's). The newly mounted target BCV's then become part of a second
volume group on the target computer system 113b that has an
operating system 210b. The operating system is preferably a Unix
operating system, such as Solaris from Sun Microsystems of
California, AIX from IBM of New York, or HP-UX from Hewlett Packard
of California.
[0076] The invention is particularly useful when data on the
standard volumes and BCV's represents data related to an
application 208a and/or application 208b, and especially if the application is a database,
such as an Oracle database available from Oracle Corporation of
Redwood, Calif.
[0077] Referring to FIG. 4, the logic 206 includes program code
that enables certain functions and may be thought of as code
modules, although the code may or may not be actually structured or
compartmentalized in modular form, i.e., this illustrated concept
is more logical than physical. Accordingly, D/M module 300 serves a
discovery/mapping function; E/S module 302 serves an
establish/split function; B/M module 304 serves a build/mount
function; B/R module 306 serves a backup/restore function; and D/C
module 308 serves a dismount/cleanup function. Any of the functions
may be accomplished by calling a procedure for running such a
function as part of the data storage system and the backup
system.
[0078] The discovery/mapping function, discovers and maps logical
to physical devices on the source host 113a, and includes such
information as physical and logical volumes, volume groups, and
file system information. The establish/split function establishes
BCV's or splits such from standard volumes, depending on the pre-
and post-mirror policies in effect on source host 113a.
[0079] The build/mount function substantially exports the BCV's
established on the source host 113a to the target host 113b. It
creates volume group, logical volume, and file system objects on
the target host computer system.
[0080] The backup/restore function performs backup of the target
host BCV data that has been exported or migrated from the source
host. The dismount/cleanup function removes all volume group,
logical volume, and filesystem objects from the target host.
Method Steps of the Invention
[0081] Now, for a better understanding of the method steps of this invention, the steps are described in detail with reference to FIGS.
5-12.
[0082] FIG. 5 shows an overview of the entire process. In step 400
the logic 206 maps logical to physical devices on the source host.
In step 402, the logic establishes or splits standard volumes to BCV's (which may be accomplished by a call to another function on the data storage system) in accordance with the mirror policy in effect at the source host. In step 404, the logic builds and mounts on the target host so that the BCV's are exported or migrated to the target host. Step 406 is a step for Backup and/or Restore, as described in more detail below. Step 408 is a cleanup step in which all volume group, logical volume, and filesystem objects are removed from the target server.
[0083] FIG. 6 is an overview of the steps of the mapping and
discovery process. Such processing begins in step 500. The
filesystem is discovered on the source host in step 502. The
logical volume is discovered in step 504. The volume group
information is discovered on the source host in step 506. In step
508, the map is created preferably as a flat file because that is
an efficient data structure for compiling and using the
information.
[0084] For mapping purposes, in general, the method uses a data
storage system input file. Preferably the input file is a
three-column file that contains a list of the standard and BCV
device serial numbers containing the data and the data copies
respectively, and the physical address of the BCV devices.
[0085] The following is an example of this file:
1TABLE 1 Example of data storage system input file Standard (STD)
Device BCV Dev 7902C000 790B4000 7902D000 790B5000 7902E000
790B6000 7902F00 790B7000
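As an illustrative sketch only, in Perl (the preferred implementation language of the Logic), such an input file might be read as follows; the command-line handling and the treatment of an optional third column are assumptions, not part of the actual implementation.

#!/usr/bin/perl
# Sketch: read standard (STD) and BCV device serial numbers from a
# data storage system input file laid out as in Table 1.
use strict;
use warnings;

my $input = shift @ARGV or die "usage: $0 <input-file>\n";
open my $fh, '<', $input or die "cannot open $input: $!";

my %bcv_for_std;    # STD device serial number => BCV device serial number
while (my $line = <$fh>) {
    next if $line =~ /^\s*(?:#|$)/;       # skip blank and comment lines
    my ($std, $bcv) = split ' ', $line;   # a third column (BCV address) may follow
    $bcv_for_std{$std} = $bcv if defined $bcv;
}
close $fh;

printf "STD %s is mirrored by BCV %s\n", $_, $bcv_for_std{$_}
    for sort keys %bcv_for_std;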
[0086] An example of how such a map is created in the preferred
embodiment for each of the preferred operating systems: Solaris,
AIX, and HP-UX is now shown in tables 2-4, in the respective order
mentioned.
TABLE 2
Mapping information for Sun Solaris: Mapping file (for SUN Solaris)

The .std Mapping file is generated by a Unix-based call with the -std option flag. The .std Mapping file is a multi-columned file of information about the Standard devices. The columns may include:
1. Device Serial Number - from the Data Storage system input file
2. Physical Address (i.e., c0d0t1)
3. Volume Group
4. Logical Volume Name
5. File Type
6. Mount Point
7. Serial Number
8. Device Type
[0087] The following is an example of a flat file using such
information for a Solaris operating system:
[0088] 3701F000 c1t0d0s2 testvg1 vol01 ufs /mir1 947015961.1105.sunmir2 sliced
[0089] 3701F000 c1t0d0s2 testvg1 vol02 ufs /mir1 947015961.1105.sunmir2 sliced
[0090] 37020000 c1t0d1s2 testvg1 vol01 ufs /mir1 947015961.1105.sunmir2 sliced
[0091] 37020000 c1t0d1s2 testvg1 vol02 ufs /mir1 947015961.1105.sunmir2 sliced
TABLE 3
Mapping information for IBM AIX: Mapping file (for IBM AIX)

The .std Mapping file is generated by a Unix-based call with the -std option flag. The .std Mapping file is a multi-columned file of information about the Standard devices. The columns may include:
1. Device Serial Number - from the Data Storage System
2. Physical Address (i.e., hdisk1)
3. Volume Group
4. Logical Volume Name
5. Volume Group Partition Size
6. File Type
7. Mount Point
8. Logical Volume Partition size
9. Logical Volume source journal log
10. Logical Volume number of devices striped over
11. Logical Volume Stripe size
[0092] The following is an example of a flat file using such
information for an AIX operating system:
[0093] 37006000 hdisk1 testvg2-2 testvg2-lv01 4 jfs /testvg2/mntpt1 25 loglv02 N/A N/A
[0094] 37006000 hdisk1 testvg2-2 testvg2-lv02 4 jfs /testvg2/mntpt1/mntpt2 25 loglv02 N/A N/A
[0095] 37006000 hdisk1 testvg2-2 testvg2-lv03 4 jfs /testvg2-3 25 loglv02 N/A N/A
[0096] 37006000 hdisk1 testvg2-2 testvg2-lv04 4 jfs /testvg2-4 25 loglv02 N/A N/A
TABLE 4
Mapping information for HP-UX: Mapping file (for HP-UX)

The .std Mapping file is generated by a Unix-based call with the -std option flag. The .std Mapping file is a multi-columned file of information about the Standard devices. The columns may include:
1. Device Serial Number - from the data storage system input file
2. Physical Address (i.e., c0d0t1)
3. Volume Group
4. Logical Volume Name
5. Logical Volume Number
6. File Type
7. Mount Point
[0097] The following is an example of a flat file using such
information for an HP-UX operating system:
[0098] 7903A000 c3t8d2 vgedm2 lvt8d2 1 vxfs /t8d2
[0099] 7903B000 c3t8d3 vgedm2 lvt8d3 2 vxfs /t8d3
[0100] 7903C000 c3t8d4 vgedm2 lvt8d4 3 vxfs /t8d4
[0101] 7903D000 c3t8d5 vgedm2 lvt8d5 4 vxfs /t8d5
[0102] Referring now to FIG. 7, step 600 uses the flat file to
create a tree structure. This structure is preferably built by a
Unix function call from information in the mapping files described
above. It may be built on both the target host computer system and
the source host computer system. It is referred to as a tree
because the Volume group information may be placed as the root of
the tree and the branches represent the device information within
the group and the logical volumes within the group. It is used in
step 602 to verify the accuracy of the map file before the map file
is sent to the target host. The tree is converted to a map
preferably as a flat file in step 604. This flat file map is then
sent back to the target in step 606.
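A sketch of how such a tree might be built and verified from the Solaris-style mapping lines shown above; the field order follows Table 2, and the nested-hash data structure and the verification checks are assumptions for illustration only.

#!/usr/bin/perl
# Sketch: build a volume-group-rooted tree from .std mapping lines and
# run a simple verification pass before the map is sent to the target.
use strict;
use warnings;

my %tree;    # volume group => { devices => {...}, volumes => {...} }
while (my $line = <>) {
    my ($serial, $device, $vg, $lvol, $fstype, $mount) = split ' ', $line;
    next unless defined $mount;
    $tree{$vg}{devices}{$device} = $serial;
    $tree{$vg}{volumes}{$lvol}   = { fstype => $fstype, mount => $mount };
}

for my $vg (sort keys %tree) {
    die "volume group $vg has no devices\n"
        unless keys %{ $tree{$vg}{devices} };
    die "volume group $vg has no logical volumes\n"
        unless keys %{ $tree{$vg}{volumes} };
    printf "%s: %d device(s), %d logical volume(s)\n", $vg,
        scalar keys %{ $tree{$vg}{devices} },
        scalar keys %{ $tree{$vg}{volumes} };
}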
[0103] Referring to FIG. 8, the process of establishing/splitting
with a backup system is started in step 700. The mirror policy is
checked in step 702. An inquiry is posed in step 704 to determine
if BCV's are established in accordance with the mirror policy. If
the answer is no, then BCV's are established in step 706. The BCV's
are split from the source host in step 708. The BCV's are made not
ready to the host in step 710.
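The following is a sketch only of how these steps might be scripted, assuming the preferred Symmetrix TimeFinder command-line interface (symmir) and an already-defined device group; the group name, the policy values, and the interpretation of the query output are assumptions.

#!/usr/bin/perl
# Sketch: establish and split BCV mirrors according to a pre-backup
# mirror policy, using the TimeFinder symmir CLI (hypothetical group name).
use strict;
use warnings;

my $group  = 'backupdg';                              # assumed device group
my $policy = $ENV{MIRROR_POLICY} || 'bring_down_if_needed';

sub mirrors_established {
    # Treat "Synchronized" pairs reported by 'symmir query' as established;
    # the exact output format is an assumption for this sketch.
    my $out = `symmir -g $group query 2>&1`;
    return $out =~ /Synchronized/;
}

unless (mirrors_established()) {
    die "mirror policy expects established mirrors\n" if $policy eq 'bring_down';
    system('symmir', '-g', $group, 'establish') == 0
        or die "establish failed\n";
    sleep 60 until mirrors_established();             # crude synchronization wait
}

system('symmir', '-g', $group, 'split') == 0 or die "split failed\n";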
[0104] Referring to FIG. 9, the process of beginning to build/mount
logical information so the BCV's can be mounted on the target is
begun in step 800. The volume groups are created on the target in step 802. Logical volumes are created on the target in step 804.
The filesystem is created on the target in step 806. The device
mount may now be completed with this logical information related to
the BCV's on the target host in step 808.
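A sketch of these build/mount steps on the target host, assuming HP-UX-style LVM commands (vgimport, vgchange, mount); the volume group, device, and mount point names are placeholders taken from the HP-UX mapping example above, and creation of the volume group's /dev/<vg>/group special file is omitted.

#!/usr/bin/perl
# Sketch: recreate a volume group from the map on the target host and
# mount the BCV filesystem (HP-UX-style commands; placeholder names).
use strict;
use warnings;

my $vg      = 'vgedm2';                  # from the HP-UX mapping example
my @devices = ('/dev/dsk/c3t8d2');       # BCV device(s) visible on the target
my $lvol    = "/dev/$vg/lvt8d2";
my $mount   = '/t8d2';

sub run {
    my @cmd = @_;
    print "+ @cmd\n";
    system(@cmd) == 0 or die "command failed: @cmd\n";
}

run('vgimport', $vg, @devices);          # step 802: create the volume group
run('vgchange', '-a', 'y', $vg);         # activate it (logical volumes appear)
run('mkdir', '-p', $mount);
run('mount', $lvol, $mount);             # steps 806/808: filesystem and mount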
[0105] Referring to FIG. 10, the newly mounted target BCV's may now
be backed up in step 900. The application is then shut down on the
target in step 902. Following the backup of the target BCV's, cleanup steps as described in FIG. 12 and notification take place in step 904.
[0106] If the software application on the target host and the source host is a database, then information related to the data may also
be backed up, with the effect that essentially the entire database
is backed up. Important information from the database includes any
transactional data performed by the database operations, and
related control files, table spaces, and archives/redo logs.
[0107] Regarding databases, these and other terminology are now discussed. The terminology is described with reference to an Oracle
database because that is the preferred embodiment but one skilled
in the art will recognize that other databases may be used with
this invention.
[0108] Control files contain important information in the Oracle
database, including information that describes the instance where
the datafiles and log files reside. Datafiles may be files on the
operating system filesystem. A related term is tablespace, which is the lowest logical layer of the Oracle data structure. The
tablespace consists of one or more datafiles. The tablespace is
important in that it provides the finest granularity for laying out
data across datafiles.
[0109] In the database there are archive files known as redo log
files or simply as the redo log. This is where information that
will be used in a restore operation is kept. Without the redo log
files a system failure would render the data unrecoverable. When a
log switch occurs, the log records in the filled redo log file are
copied to an archive log file if archiving is enabled.
[0110] Referring now to FIG. 11, the process for restoring source
standard volumes is shown beginning at step 1000. Step 1002 poses an inquiry to determine if the restore is to be from the BCV's on the target or from tape. In accordance with the answer, the standard volumes are synchronized or restored from the target-mounted BCV's or tape, respectively, in steps 1004 and 1006. Step 1008 begins the
notification and cleanup steps that are generally described in FIG.
12.
[0111] The cleanup/dismount process begins in step 1100 as shown in FIG. 12. The BCV's are dismounted from the target in step 1102. This may be accomplished, for example, with the Unix umount command. The objects related to the volume group, logical volumes, and filesystem are removed from the target in steps 1104 and 1106. The cleanup is completed and the BCV's are re-established on the source (i.e., made ready to the host) in step 1108.
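A corresponding sketch of the dismount/cleanup steps, using the same placeholder names as the build/mount sketch above; the re-establish step again assumes the TimeFinder symmir CLI and a hypothetical device group.

#!/usr/bin/perl
# Sketch: dismount the target BCV's, remove the imported LVM objects,
# and re-establish the mirrors on the source (placeholder names).
use strict;
use warnings;

my $vg    = 'vgedm2';
my $mount = '/t8d2';
my $group = 'backupdg';                      # assumed TimeFinder device group

sub run { system(@_) == 0 or die "command failed: @_\n" }

run('umount', $mount);                       # step 1102: dismount the filesystem
run('vgchange', '-a', 'n', $vg);             # deactivate the imported volume group
run('vgexport', $vg);                        # steps 1104/1106: remove VG/LV objects
run('symmir', '-g', $group, 'establish');    # step 1108: re-establish the mirrors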
[0112] In another aspect of the invention, a data storage system
includes a storage array having logical volumes or units that can
be accessed by one or more clients via a switch. A first logical
unit can be replicated to create a copy, i.e., a mirrored BCV, of
the first logical unit within the storage array. At a given time,
the mirrors are split so that write operations to disk no longer
affect the copy. In the case where the first logical unit is no
longer accessible, such as due to disk failure, the storage array
can provide access to the copy of the first logical unit by the
client by swapping the logical unit accessed by the host. In one
embodiment, the client and/or client application is not aware that
the first logical unit, e.g., the original or source logical unit, is no longer being accessed. If desired, a restore can be performed
from the copy to the first logical unit and application access to
the first logical unit can be provided after mirror synchronization
for the restore is complete.
[0113] FIG. 13 shows an exemplary system 1200 including a storage
array 1202 having a series of logical units 1204a-N and a client
1206 that can access the storage array 1202 via a switch 1208, such
as a Fiber Channel Switch. In an exemplary embodiment, the client
1206, such as a Unix-based workstation, includes an adapter 1206a,
a disk 1206b, and an application 1206c, e.g., an Oracle database.
The storage array 1202 includes a host/disk adapter 1203 for
providing access to the logical volumes or units, as described
above.
[0114] In an initial configuration, the client 1206 and/or the application 1206c access the first logical unit 1204a as described above. That is, the first logical unit 1204a is presented to the client 1206 as a SCSI device having SCSI attributes, e.g., a port (e.g., port 0), a target (e.g., target 0), and a LUN address (e.g., LUN 0).
[0115] As shown in FIG. 14, a copy of the first logical unit 1204a
can be created on a second logical unit 1204b of the storage array
1202. That is, mirror synchronization occurs and subsequent writes
to the first logical unit 1204a are also updated on the second
logical unit 1204b. As shown in FIG. 15, at a given time, after
synchronization, the mirror is split so that writes to the first
logical unit 1204a are no longer made to the copy on the second
logical unit 1204b. At this point, the first logical unit 1204a
contains the same data as the client disk 1206b, for example, and
the second logical unit 1204b is a point-in-time copy of the first
logical unit 1204a that can be restored to the first logical unit
1204a at a later time.
[0116] Due to some type of disk failure or application failure in
the application 1206c, for example, the first logical unit 1204a
may no longer be available/reliable so that it would be desirable
to access data from the copy on the second logical unit 1204b, as
shown in FIG. 16. In one particular embodiment, the storage array
1202 provides access to the second logical unit 1204b, i.e., the
copy, instead of the first logical unit 1204a without the client's
knowledge. That is, the storage array 1202 swaps client access from
the first logical unit 1204a to the second logical unit 1204b, as
described more fully below. With this arrangement, a disk-based
"instant restore" by logical unit swapping can be provided.
[0117] The first logical unit 1204a, which can be provided as a new
disk, can be restored from the second logical unit 1204b after
client application access to disk 1206b is stopped during the
mirror synchronization.
[0118] As shown in FIG. 17, the storage array 1202 can optionally
again provide a connection to the first logical unit 1204a for the
client 1206 and retain the copy on the second logical unit 1204b.
The first logical unit 1204a now contains the restored contents of the copy on the second logical unit 1204b, which is available for
subsequent restore operations as desired.
[0119] FIG. 18 shows a high level flow diagram having an exemplary
sequence of steps for implementing data restores by logical unit
swapping in accordance with the present invention. In step 2000, a
solve for storage module, which is described in FIG. 19, obtains
host, storage set, logical unit (LUN) info, and the like, that is
associated with the restore. The data replication for a first
logical unit is performed in step 2002. In step 2004, the data
is restored, as described in detail below.
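At this top level, the sequence might be driven by a short script such as the following sketch, in which solve_for_storage, replicate, and restore merely stand in for the modules described in connection with FIGS. 19-22; all three subroutine bodies are placeholders, not the actual implementation.

#!/usr/bin/perl
# Sketch of the top-level restore-by-logical-unit-swap sequence of FIG. 18.
use strict;
use warnings;

# Step 2000: obtain host, storage set, and LUN information for the restore.
sub solve_for_storage {
    my ($host, $object) = @_;
    # Placeholder: a real module would discover the raw device, the array,
    # and the STD/BCV units as described in FIG. 19.
    return { host => $host, object => $object, std => 'D1', bcv => 'D2' };
}

# Step 2002: perform the data replication for the first logical unit.
sub replicate {
    my ($s) = @_;
    print "replicating $s->{std} to $s->{bcv} for $s->{object}\n";
}

# Step 2004: restore the data (or swap client access to the replica).
sub restore {
    my ($s) = @_;
    print "restoring $s->{std} from $s->{bcv}\n";
}

my $storage = solve_for_storage('client1206', '/oradata');
replicate($storage);
restore($storage);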
[0120] FIG. 19 shows further details of the solve for storage
module of FIG. 18. The solve for storage module obtains information for various storage types, such as the original host device, storage set, and LUN, which define the STD and the BCV. In step 2102, for each OSO
(operating system object) that is a file in a file system or raw
device, the underlying filesystem is identified. For each
filesystem name, the physical device (raw device) node(s) is
discovered in step 2104. In step 2106, the type of storage the
physical device resides on, e.g., Symmetrix, is determined.
[0121] In step 2108, the storage array details are obtained, such
as WWN, subsystem name, and any other information that would be
required to identify and communicate to the particular storage
array. In step 2110, the system determines the storage array device
unit name to which the physical name corresponds. This converts a
host physical device (e.g. /dev/rdsk/c?t?d?s?) to the corresponding
LUN on the array. In step 2112, the system checks the STD unit type
to see if it can be replicated. In general, the logical unit can
contain data in one of the following formats: JBOD (just a bunch of
disks), RAID 0 or RAID 1 or RAID 0+1. Unless otherwise specified,
RAID 1 is assumed in this description. The unit should not be a
partition of a storage set.
[0122] These formats are well known to one of ordinary skill in the
art. In general, RAID 0 (Redundant Array of Independent Disks,
level 0) refers to a storageset, which is known as a stripeset,
that includes striped data across an array of disks. A single
logical disk can span a number of physical disks. Note that RAID 0
does not provide redundancy. RAID 1 refers to a storageset, which
is known as a mirrorset, of at least two physical disks that
provide an independent copy of the virtual disk. RAID 0+1 refers to
a storageset that stripes data across an array of disks and mirrors
the data to a BCV.
[0123] In step 2114, the system identifies possible "quick"
replication disks if the STD unit is RAID 1 or RAID (0+1). In an
exemplary embodiment, the presence of three or more member mirrors indicates that there is at least one mirror that can be used for a quick
replication. Without three member mirrors, the replication will
take longer. In step 2116, the system obtains storageset details
for the STD unit. Exemplary information of interest includes a list
of disk members for the stripeset, mirrorset or striped mirrorset,
and the size of each member.
[0124] FIG. 20 shows a top level sequence of steps for implementing
a restore with logical unit swapping in accordance with the present
invention. In step 2200, the system prepares for the replication.
In an exemplary embodiment, preparation includes adding a
replication disk to the STD mirrorset if the mirrorset has less
than a predetermined number, e.g., three, members. If a mirrorset
has three or more members when the replication is run, one of the
disks already in the mirrorset is selected for its solution.
Exemplary commands are set forth below:
[0125] SET <MIRRORSETNAME>
[0126] MEMBERSHIP=<CURRENT_MEMBERS>+1
[0127] SET <MIRRORSETNAME> NOPOLICY
[0128] SET <MIRRORSETNAME> REPLACE=<CLONE_DISK>
[0129] SET <MIRRORSETNAME> POLICY=<PREVIOUS POLICY>
[0130] The system then checks the state of the STD mirrorset
members and waits until all members are in a normalized
(synchronized) state.
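A sketch of such a wait loop follows; run_cli is a hypothetical helper standing in for whatever transport reaches the controller CLI, and the SHOW command text and the NORMAL/NORMALIZING output format are assumptions.

#!/usr/bin/perl
# Sketch: poll a mirrorset until every member reports a normalized state.
use strict;
use warnings;

sub run_cli {
    my ($command) = @_;
    # Placeholder transport returning canned output so the sketch runs
    # standalone; a real implementation would reach the controller CLI.
    return "MIRR_STD mirrorset\n  DISK10000 NORMAL\n  DISK20000 NORMAL\n";
}

sub mirrorset_normalized {
    my ($mirrorset) = @_;
    my $out = run_cli("SHOW $mirrorset");
    # Ready only when no member is still NORMALIZING and at least one
    # member reports NORMAL (assumed report format).
    return $out !~ /NORMALIZING/ && $out =~ /\bNORMAL\b/;
}

my $mirrorset = $ARGV[0] || 'MIRR_STD';    # placeholder mirrorset name
until (mirrorset_normalized($mirrorset)) {
    print "waiting for $mirrorset members to normalize...\n";
    sleep 30;
}
print "$mirrorset members are synchronized\n";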
[0131] In step 2202, the system executes the desired replication.
First, the cloned BCV disk is split or reduced from the STD mirror
set using a split or reduce command, for example. Alternatively,
the system immediately replaces the reduced disks with additional
disks so that the mirrorset is always in a position for a quick
replication. It is understood that a quick replication refers to a
process in which the split can occur without having to wait for
normalization. The system then creates a one-member BCV mirrorset
from the reduced disk. The BCV mirrorset is then initialized so
that it is not destroyed and the name of the just-created mirrorset
is saved. In step 2204, the system executes the mount further
details of which are described below in connection with FIG.
21.
[0132] Referring now to FIG. 21, in step 2300, the system obtains
mount host connection information. It is assumed that the mount
host has a valid connection(s). The system also obtains mount host HBA information, such as the WWN(s). In general, the host will
have multiple connections if it has multiple HBAs. The mount host
information can be determined by the connection data.
[0133] In step 2302, the system defines the BCV LUN for mount host
connection(s). It is understood that the BCV LUN is assigned a LUN
based on the offset value of the target connection(s). Offset
control refers to the LUN number of the unit that will be made
available to the host. In one embodiment, the initial visibility of
the LUN is to no device. In an exemplary embodiment, client
connections have decimal offsets for logical units. A logical unit number within a client's offset lets that client "see" the storage defined by that logical unit.
[0134] The system, in step 2304, then assigns the BCV LUN to a
mount host connection(s). For hosts with multi-paths, such as
SecurePath or other host-based device failover software, the system
assigns the LUN to all paths involved in the multi-path of the
host where the system is to mount the replica. The system should
also be aware of the mode in which the controller is configured,
e.g., transparent failover or multiple bus failover. The system can
also consider the connection offset in order to determine the LUNs
that are visible on the connection to the HBA. The preferred path
can also be considered.
[0135] The system makes dynamic BCV LUNs visible on the mount host
OS in step 2306. In an exemplary embodiment, this occurs without rebooting the mount host, with the host OS supporting so-called bus rescans. The BCV LUN should be mounted as a specific node.
Alternatively, the system can determine the node assigned to the
LUN. In step 2308, the system imports volume groups and mounts the
filesystem.
[0136] It will be readily apparent to one of ordinary skill in the
art that the invention is applicable to a range of other storage
configurations including other RAID configurations. For example,
steps to handle a RAID 0+1 configuration are similar to those
described above for RAID 1. In this case, there is a restoration of
a replication of a filesystem built on a striped mirrorset (RAID
0+1). It is assumed that the Solve for Storage routine has run
successfully. A striped mirrorset is a stripeset where each of the
members of the stripe has one or more mirrors.
[0137] Preparation for the replication is the same as that
described for RAID 1 above except a) that there will be more than
one disk to add, since it is a stripe and each column of the stripe
needs to have the mirror disk added; and b) there will be more than
one set of mirrors to check, since it is a stripe and each column
of the stripe needs to have its mirrors normalized. To execute the
replication, the BCV disks are reduced from the striped mirrorset;
the mirrors of a striped mirrorset can be reduced in one
operation. A BCV stripeset is created from the reduced disks. The
order of the disks should be the same as the stripeset members that
the clone disks came from. The remaining replication steps are
substantially similar to those of RAID 1 except that a) the system
initializes the replication or clone stripeset that was just
created, and b) the system saves the name of the just-created
stripeset. The execute mount procedure is also substantially
similar to RAID 1 except that the system defines the LUN from the
previously created clone stripeset.
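By way of illustration, the per-column handling for a striped
mirrorset (RAID 0+1) replication might be sketched as follows; the
wrapper functions are hypothetical, and the essential point is that
the reduced clone disks keep the same order as the stripeset
columns they came from.

    def reduce_from_mirrorset(mirrorset, disk):
        raise NotImplementedError  # array-specific reduce operation

    def create_stripeset(name, disks_in_order):
        raise NotImplementedError  # array-specific stripeset creation

    def initialize_nodestroy(container):
        raise NotImplementedError  # initialize without destroying the data

    def replicate_striped_mirrorset(stripe_columns, catalog):
        # stripe_columns: ordered list of (column_mirrorset, clone_disk)
        # pairs, one per column of the STD striped mirrorset.
        reduced = []
        for column_mirrorset, clone_disk in stripe_columns:
            reduce_from_mirrorset(column_mirrorset, clone_disk)
            reduced.append(clone_disk)
        # Build the BCV stripeset from the reduced disks, preserving the
        # original column order so striping matches the STD unit.
        bcv_stripeset = create_stripeset("BCV_STRIPE", reduced)
        initialize_nodestroy(bcv_stripeset)
        catalog["bcv_stripeset"] = bcv_stripeset
        return bcv_stripeset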
[0138] Replication of a filesystem on a stripeset (RAID 0) is
similar to that described above for the RAID 1 and RAID 0+1
formats. For RAID 0 replications, the STD unit is a stripeset of
two or more disks. There are no mirrors of the unit, so the system
creates a temporary striped mirrorset (RAID 0+1) to create a
replication or
clone of the stripeset. After the cloning has taken place, the
system deletes any temporary containers that were created. To
prepare the RAID 0 replication, the system converts a STD stripeset
to a striped mirrorset. Each disk member of the stripeset is
temporarily turned into a mirrorset. Execution of the replication
is substantially similar to RAID 0+1 and execution of the mount is
substantially similar to RAID 1.
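The temporary conversion used to prepare a RAID 0 replication can
be illustrated as below; create_mirrorset() and delete_container()
are hypothetical wrappers, and the temporary one-member mirrorsets
exist only long enough for clone disks to be added and split.

    def create_mirrorset(name, disks):
        raise NotImplementedError  # array-specific mirrorset creation

    def delete_container(name):
        raise NotImplementedError  # array-specific container deletion

    def convert_stripeset_to_striped_mirrorset(stripeset_members):
        # Temporarily wrap each disk member of the RAID 0 stripeset in a
        # one-member mirrorset so a clone disk can later be added to
        # each column (hypothetical temporary naming).
        temporary = []
        for disk in stripeset_members:
            name = "TMP_MIR_%s" % disk
            create_mirrorset(name, [disk])
            temporary.append(name)
        return temporary

    def remove_temporary_containers(temporary_mirrorsets):
        # After the cloning has taken place, delete the temporary
        # containers that were created for the conversion.
        for name in temporary_mirrorsets:
            delete_container(name)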
[0139] FIG. 22 shows an exemplary implementation of restoring a
RAID 1 replication with logical unit swapping in accordance with
the present invention. The below description assumes that the unit
being restored to already exists. For RAW (raw device) backup and
replication, the system needs the device node to exist just as it
was when the replica was taken. The information can come from
catalog information created at replication time and can be managed
by an instant restore daemon (IRD). In general, filesystems
will be created as necessary. It is understood that the raw device
is how the OS perceives the LUN. Filesystems are built on top of
raw devices. An application can use a raw device or filesystem to
store data.
[0140] In step 2400, the system discovers, for the physical device
node(s), the filesystem name being restored to. If the replication
was a RAW device, the instant restore daemon (IRD) provides the
device node to which the system is restoring. In step 2402, the
system determines the STD storage type, e.g., Symmetrix. Storage
array details for the STD physical device are obtained in step
2404. Exemplary details include world wide name (WWN), subsystem
name, and any other information that would be required to
communicate to the particular storage array. In step 2406, the
system determines the array device unit name to which the STD
physical name corresponds. This converts a host physical device
(e.g., /dev/rdsk/c?t?d?s? for a Solaris system raw device) to the
corresponding LUN on the array. In one embodiment, information from
a SymAPI call is used.
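The host-device-to-array-unit conversion of step 2406 might be
sketched as follows; the regular expression covers the Solaris
controller/target/disk/slice naming shown above, and the lookup
table standing in for the SymAPI (or other array-specific) call is
an assumption made only for illustration.

    import re

    # Hypothetical mapping from (controller, target, disk) to an array
    # unit name; in practice this information would come from a SymAPI
    # or similar array-specific call.
    DEVICE_TO_UNIT = {("c1", "t0", "d2"): "D105"}

    def array_unit_for_device(raw_device_path):
        # Parse a Solaris raw device path such as /dev/rdsk/c1t0d2s6
        # into its controller/target/disk components, then look up the
        # corresponding array unit.
        m = re.match(r"^/dev/rdsk/(c\d+)(t\d+)(d\d+)(s\d+)?$",
                     raw_device_path)
        if m is None:
            raise ValueError("not a Solaris raw device path: %s"
                             % raw_device_path)
        controller, target, disk = m.group(1), m.group(2), m.group(3)
        return DEVICE_TO_UNIT.get((controller, target, disk))

    # Example: /dev/rdsk/c1t0d2s6 maps to array unit D105 here.
    assert array_unit_for_device("/dev/rdsk/c1t0d2s6") == "D105"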
[0141] In step 2408, the system obtains STD mirrorset details for
the unit, such as a list of disk members for the mirrorset, the
size of each member (which may differ), and the status of the
mirrorset; the operation fails if a restore is not possible. The
filesystem is unmounted in
step 2410, such as by a function call. In step 2412, the system
disables volume groups residing on discovered nodes and in step
2414, the system renders the STD device "not ready" on the host OS.
The "not ready" status of the STD device is preferred since the
system may delete the unit. In one embodiment, the STD is made "not
ready" after disabling access to the LUN (step 2416 ).
[0142] In step 2416, the system saves the STD LUN information and
the STD mirrorset name. In step 2418, the system disables host
access to the STD LUN and in step 2420 the system deletes the
unit(s) associated with the STD LUN. The system then deletes the
STD mirrorset in step 2422. It is understood that the mirrorset
will be recreated after the restore.
[0143] One member of the STD mirrorset is added to the BCV
mirrorset in step 2424. The BCV mirrorset has the data the system
is to restore from. The system can reuse the spindles/disks that
belonged to the units that were just deleted. Note that the user
should be responsible for creating a LUN with the same RAID
configuration and same number of members as the replication or
clone. Also note that in the case of a RAID 1 or RAID 0+1 restore,
there is still at least one copy of the STD device until the
restore from the BCV has completed. This copy is preserved from
steps 2420 and 2422 and can be retrieved if the restore fails.
[0144] In step 2426, the system waits during the synchronization
process for the BCV mirrorset to normalize. In step 2428, the
system reduces the STD disk from the BCV mirrorset. In one embodiment,
the system reduces the disks that were added to the mirrorset. The
STD mirrorset is then re-created in step 2430, such as by using the
restored STD disk and the other members of the original STD
mirrorset.
[0145] It is understood that steps 2426 through 2434 are
unnecessary to perform a so-called instant restore, for which
waiting, normalization, and preservation of the BCV are not needed.
[0146] The system then waits for the STD mirrorset to normalize in
step 2432. Note that it may be possible to skip this step since the
data has been restored, but it is currently unprotected by mirrors
while the synchronization is taking place. It may be possible to
initiate the mirror synchronization process and then continue. In
step 2434, the restored mirrorset is initialized so that it is not
destroyed. In step 2436, the STD LUN is re-created using the re-created
STD mirrorset.
[0147] In step 2438, the system assigns the STD LUN to a
connection(s). In one embodiment, the system assigns the same unit
numbers that were discovered previously. It is assumed that
assigning the same unit/LUN number will make it accessible to the
same OS node. In step 2440, the OS recognizes the new LUN, and in
step 2442 the filesystem is mounted or a logical volume is created
to complete the restore.
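The restore-by-swapping sequence of FIG. 22 can be summarized in
the following sketch. Every operation is routed through a
placeholder object whose methods are no-ops, because each one
stands in for a host- or array-specific action described in the
corresponding step; the step numbers appear in the comments and
none of the method names are part of the embodiment.

    class _Ops:
        # Placeholder for host/array specific operations; every
        # attribute access yields a no-op so the sketch is
        # self-contained and purely illustrative.
        def __getattr__(self, name):
            return lambda *args, **kwargs: None

    ops = _Ops()

    def restore_raid1_by_lun_swap(std, bcv, host):
        # std/bcv: dictionaries describing the STD and BCV containers
        # (LUN, mirrorset name, member disks); purely illustrative.
        ops.unmount_filesystem(host, std["filesystem"])          # step 2410
        ops.disable_volume_groups(host, std["nodes"])            # step 2412
        ops.set_not_ready(host, std["nodes"])                    # step 2414
        saved_lun, saved_name = std["lun"], std["mirrorset"]     # step 2416
        ops.disable_host_access(saved_lun)                       # step 2418
        ops.delete_unit(saved_lun)                               # step 2420
        ops.delete_mirrorset(saved_name)                         # step 2422
        ops.add_member(bcv["mirrorset"], std["members"][0])      # step 2424
        ops.wait_for_normalization(bcv["mirrorset"])             # step 2426
        ops.reduce_member(bcv["mirrorset"], std["members"][0])   # step 2428
        ops.create_mirrorset(saved_name, std["members"])         # step 2430
        ops.wait_for_normalization(saved_name)                   # step 2432
        ops.initialize_nodestroy(saved_name)                     # step 2434
        ops.create_unit(saved_lun, saved_name)                   # step 2436
        ops.assign_unit(saved_lun, host["connections"])          # step 2438
        ops.rescan_bus(host)                                     # step 2440
        ops.mount_filesystem(host, std["filesystem"])            # step 2442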
[0148] It is understood that replication in other configurations,
such as RAID 0 and RAID 0+1, is well within the scope of the
invention. For a RAID 0 (non-mirrored stripes) restoration, it can
be assumed that the unit being restored to already exists. For RAW
backup and replication, the system may need the device node to
exist just as it was when the replica was taken. Filesystems will
be created as necessary.
[0149] Some of the differences between a RAID 0 restoration and a
RAID 1 restoration are described below in conjunction with a
comparable RAID 1 step in FIG. 22. Referring to step 2416, in
the RAID 0 restoration, the saved STD stripeset information can
include the stripeset name and the stripeset disk member order,
which can be re-created after the restore. In contrast to step
2424, in a RAID 0 format the system makes each BCV stripeset member
into a one-member mirrorset. This creates a striped mirrorset (RAID
0+1) from the BCV stripeset. In the equivalent step 2426, for RAID
0 restoration, the system adds each member of the STD stripeset to
the corresponding BCV mirrorset. The BCV striped mirrorset has the
data from which the restoration is performed. Here the system can
reuse the spindles/disks that belonged to the units of the STD
stripeset that was just deleted. Note that unlike the case of a
RAID 1 restore, there are no valid copies of any STD device once
the restore from the BCV has started.
[0150] In the step comparable to step 2430 of the RAID 1
restoration process, the RAID 0 restoration re-creates the STD
stripeset using the restored STD disks and the other members of the
original STD stripeset. The order of the disks added to the
stripeset should be preserved.
The restored stripeset is then initialized so that it will not be
destroyed, the STD LUN is re-created, and the STD LUN is assigned
to a connection. The comparable steps of 2440 and 2442 are then
performed.
[0151] Restoration of a RAID 0+1 (striped mirrors) replication is
similar to a RAID 0 restoration with some differences discussed
below. It is assumed that the unit being restored to already
exists. For RAW backup and replication, the system may need the
device node to exist just as it was when the replica was taken.
Filesystems will be created as necessary. In the equivalent of RAID
1 step 2424, the system adds one member of the STD stripeset to the
corresponding BCV mirrorset, which contains the data from which the
restoration is performed. The system can reuse the spindles/disks
that belonged to the units of the STD stripeset that were just
deleted. Note that, as in the case of a RAID 1 restore, there is a
valid copy of each STD device once the restore from the BCV has
started. The remaining steps are substantially similar to those
described above in conjunction with RAID 0 and/or RAID 1 and will
be readily apparent to one of ordinary skill in the art in view of
the description contained herein.
[0152] It is understood that the invention is applicable to a
variety of known storage systems including Symmetrix and Clarion
systems by EMC Corporation and HP/Compaq StorageWorks systems, such
as the HSG80 system.
[0153] A system and method has been described for managing data
that may be replicated across one or more computer systems. Having
described a preferred embodiment of the present invention, it may
occur to skilled artisans to incorporate these concepts into other
embodiments. Nevertheless, this invention should not be limited to
the disclosed embodiment, but rather only by the spirit and scope
of the following claims and their equivalents. One skilled in the
art will appreciate further features and advantages of the
invention based on the above-described embodiments. All
publications and references cited herein are expressly incorporated
herein by reference in their entirety.
* * * * *