U.S. patent application number 14/133795 was filed with the patent office on 2014-09-18 for storage system, storage apparatus, and computer product.
This patent application is currently assigned to FUJITSU LIMITED. The applicant listed for this patent is FUJITSU LIMITED. Invention is credited to Kenichi Fujita, Hiroshi Murayama, Tsuyoshi Uchida.
Application Number | 20140281337 14/133795 |
Document ID | / |
Family ID | 49916886 |
Filed Date | 2014-09-18 |
United States Patent
Application |
20140281337 |
Kind Code |
A1 |
Fujita; Kenichi ; et
al. |
September 18, 2014 |
STORAGE SYSTEM, STORAGE APPARATUS, AND COMPUTER PRODUCT
Abstract
A storage system includes a storage apparatus having a first
storage unit having first storage and a first storage control unit
controlling access to the first storage, and a first control unit
controlling storage units including the first storage unit; a
second storage unit having second storage and a second storage
control unit controlling access to the second storage; and a second
control unit controlling storage units including the second storage
unit. The second storage unit and second control unit are added to
the storage apparatus. The first control unit includes a memory
unit storing allocation information including an allocation state
of storage areas of the first and second storage, and a processor
configured to execute rearrangement control of an allocated storage
area based on the allocation information corresponding to
unevenness between a storage capacity of an allocated storage area
in the first storage and that in the second storage.
Inventors: |
Fujita; Kenichi; (Nagoya,
JP) ; Murayama; Hiroshi; (Fuji, JP) ; Uchida;
Tsuyoshi; (Kawasaki, JP) |
|
Applicant: |
Name |
City |
State |
Country |
Type |
FUJITSU LIMITED |
Kawasaki-shi |
|
JP |
|
|
Assignee: |
FUJITSU LIMITED
Kawasaki-shi
JP
|
Family ID: |
49916886 |
Appl. No.: |
14/133795 |
Filed: |
December 19, 2013 |
Current U.S.
Class: |
711/170 |
Current CPC
Class: |
G06F 3/0647 20130101;
G06F 3/0607 20130101; G06F 12/023 20130101; G06F 3/0689 20130101;
G06F 3/0631 20130101; G06F 3/0665 20130101 |
Class at
Publication: |
711/170 |
International
Class: |
G06F 12/02 20060101
G06F012/02 |
Foreign Application Data
Date |
Code |
Application Number |
Mar 18, 2013 |
JP |
2013-055602 |
Claims
1. A storage system comprising: a storage apparatus that includes:
a first storage unit that has a first storage and a first storage
control unit that controls access to the first storage, and a first
control unit that controls accessible storage units including the
first storage unit; a second storage unit that has a second storage
and a second storage control unit that controls access to the
second storage; and a second control unit that controls accessible
storage units including the second storage unit, wherein the second
storage unit and the second control unit are added to the storage
apparatus, the first control unit includes: a memory unit that
stores allocation information including an allocation state of a
storage area of the first storage and an allocation state of a
storage area of the second storage, and a first processor that is
configured to execute rearrangement control of a currently
allocated storage area based on the allocation information
corresponding to a degree of unevenness occurring between a storage
capacity of a currently allocated storage area in the first storage
and a storage capacity of a currently allocated storage area in the
second storage.
2. The storage system according to claim 1, wherein the first
processor is further configured to execute the rearrangement
control to rearrange a portion of a currently allocated storage
area in the first storage to an unallocated storage area in the
second storage, when an expansion process for a storage capacity is
executed by connecting the second storage unit and the second
control unit to the first storage unit and the first control unit,
and when predefined unevenness is detected.
3. The storage system according to claim 1, wherein the first
processor of the first control unit is further configured to:
calculate based on the allocation information, a difference in a
storage capacity of the currently allocated storage area in plural
memory apparatuses included in the first storage and a storage
capacity of the currently allocated storage area in plural memory
apparatuses included in the second storage, and determine based on
the calculated difference, whether predefined unevenness is present
in allocation states of the storage areas of the first and the
second storage, and the first processor executes the rearrangement
control of the currently allocated storage area based on the
allocation information, when the determining unit determines that
the predefined unevenness is present.
4. The storage system according to claim 3, wherein the first
processor calculates a difference in a storage capacity of the
currently allocated storage area in a memory apparatus whose
storage capacity of the currently allocated storage area is largest
and a storage capacity of the currently allocated storage area in a
memory apparatus whose storage capacity of the currently allocated
storage area is smallest, among the plural memory apparatuses,
based on the allocation information, and the first processor
determines that the predefined unevenness is present in the
allocation state of the storage areas of the first and the second
storage, when the calculated difference is greater than or equal to
a predetermined rate of the storage capacity of the storage area
currently allocated to a memory apparatus whose storage capacity is
largest.
5. The storage system according to claim 3, wherein the first
processor determines that the predefined unevenness is present in
the allocation state of the storage areas of the first and the
second storage, when the calculated difference is greater than or
equal to a predetermined size.
6. The storage system according to claim 1, wherein the first
processor executes the rearrangement control of the currently
allocated storage areas based on the allocation information such
that storage capacities of the currently allocated storage areas of
the plural storage apparatuses among the plural storage apparatuses
included in the first and the second storage are equalized.
7. The storage system according to claim 6, wherein the first
processor of the first control unit is further configured to create
based on the allocation information, a rearrangement plan for
rearranging between the first and the second storage, the currently
allocated storage areas, and the first processor executes the
rearrangement control of the currently allocated storage areas in
the first and the second storage according to the created
rearrangement plan.
8. The storage system according to claim 7, wherein the first
processor creates the rearrangement plan based on the allocation
information such that copying processes are reduced for
transferring data consequent to rearrangement in each of the first
and the second storage.
9. The storage system according to claim 1, wherein the first
processor executes the rearrangement control to rearrange a
portion of the currently allocated storage area in the first
storage, into an unallocated storage area in the second storage, in
a case where the expansion process is executed for the storage
capacity by connecting the second storage unit and the second
control unit to the first storage unit and the first control unit
during a data transfer process from another storage unit to the
first storage unit, when the predefined unevenness is detected.
10. The storage system according to claim 3, wherein the first
processor periodically determines whether the predefined unevenness
is present in the allocation state of the storage areas of the
first and the second storage.
11. The storage system according to claim 1, wherein the first and
the second control units are respectively connected to the first
and the second storage units, and the first and the second control
units are able to directly access respectively the second and the
first storage units.
12. The storage system according to claim 3, wherein the allocation
information includes information related to allocation of segments
allocated to the storage areas of the first and the second storage,
and the first processor calculates based on a count of currently
allocated segments, a difference in the storage capacity of the
currently allocated storage areas of the memory apparatuses
included in the first storage and the storage capacity of the
currently allocated storage areas of the memory apparatuses
included in the second storage.
13. A storage apparatus that includes a first storage unit that
includes a first storage and a first storage control unit that
controls access to the first storage, and a first control unit that
executes control of accessible storage units including the first
storage unit, and to which is added a second storage unit that
includes a second storage and a second storage control unit that
controls access to the second storage, and a second control unit
that executes control of accessible storage units including the
second storage unit, the storage apparatus comprising: a memory
unit that stores allocation information that includes allocation
states of storage areas of the first and the second storage in a
state where a storage capacity expansion process is executed by
connecting the second storage unit and the second control unit to
the first storage unit and the first control unit; and a processor
that executes rearrangement control of currently allocated storage
areas, based on the allocation information corresponding to a
degree of unevenness occurring between a storage capacity of the
currently allocated storage areas in the first storage and a
storage capacity of the currently allocated storage areas in the
second storage.
14. A non-transitory, computer-readable recording medium storing a
control program of a storage apparatus that includes a first
storage unit that includes a first storage and a first storage
control unit that controls access to the first storage, and a first
control unit that executes control of accessible storage units
including the first storage unit, and to which is added a second
storage unit that includes a second storage and a second storage
control unit that controls access to the second storage, and a
second control unit that executes control of accessible storage
units including the second storage unit, the control program
causing the first control unit to execute a process comprising:
acquiring allocation information that includes allocation states of
storage areas of the first and the second storage in a state where
a storage capacity expansion process is executed by connecting the
second storage unit and the second control unit to the first
storage unit and the first control unit; and executing
rearrangement control of currently allocated storage areas, based
on the allocation information corresponding to a degree of
unevenness occurring between a storage capacity of the currently
allocated storage areas in the first storage and a storage capacity
of the currently allocated storage areas in the second storage.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is based upon and claims the benefit of
priority of the prior Japanese Patent Application No. 2013-055602,
filed on Mar. 18, 2013, the entire contents of which are
incorporated herein by reference.
FIELD
[0002] The embodiment discussed herein is related to a storage
system, a storage apparatus, and a computer product.
BACKGROUND
[0003] Storage for a virtualized environment, i.e., a so-called
virtualized storage apparatus, has conventionally been present as a
storage system capable of realizing a memory apparatus having a
free volume configuration and a free storage capacity without being
bound by the volume configuration and the storage capacity of a
physical memory apparatus. The virtualized storage apparatus
internally includes a real storage apparatus that controls access
to the physical memory apparatus. The virtualized storage apparatus
creates a virtual volume by a processor that manages the real
storage apparatus.
[0004] The virtualized storage apparatus performs data access by,
for example, wide-striping. "Wide-striping" is a technique of
distributing data access of one volume, to plural logical unit
numbers (LUNs) and performing access according to units
respectively referred to as "strip" and having a fixed length.
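As a rough illustration of the wide-striping described above, the following Python sketch maps a byte offset within one volume to a LUN and to an offset within that LUN. It assumes that fixed-length strips are laid out round-robin over the LUNs; the function name, the round-robin layout, and the parameter names are assumptions made for illustration only and are not taken from the conventional techniques.

```python
def locate_strip(volume_offset, strip_size, lun_count):
    """Map a byte offset in a volume to (lun_index, offset_in_lun).

    Assumption: strips of a fixed length are distributed round-robin
    over the LUNs. This layout is illustrative, not prescribed.
    """
    if strip_size <= 0 or lun_count <= 0:
        raise ValueError("strip_size and lun_count must be positive")
    if volume_offset < 0:
        raise ValueError("volume_offset must be non-negative")
    strip_no = volume_offset // strip_size   # index of the strip within the volume
    lun_index = strip_no % lun_count         # LUN that holds this strip
    strip_in_lun = strip_no // lun_count     # position of the strip within that LUN
    offset_in_lun = strip_in_lun * strip_size + volume_offset % strip_size
    return lun_index, offset_in_lun
```

Under this assumed layout, consecutive strips of one volume fall on different LUNs, which is what spreads the access load.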
[0005] The storage area of the overall virtualized storage
apparatus may be expanded with increases in the storage capacity
demanded of the virtualized storage apparatus. A method of
expanding the storage area of the overall virtualized storage
apparatus may be, for example, addition of a real storage apparatus
or an increase of the number of memory apparatuses loaded on the
real storage apparatus.
[0006] For example, according to a related technique, plural disks
configure groups; a storage area is allocated from each of the
plural groups to a virtualized volume; and the storage area of each
of the groups used by the virtualized volume is rearranged based on
external operation. A virtualized file system is present that
includes plural storage processor nodes including a managing node;
a backbone switch; a disk drive array; and a virtualized file
manager executed at the managing node. According to another
technique, when virtualized volumes are rearranged among plural
pools, time periods for the pools to be depleted before and after
the rearrangement are estimated based on information in a database,
and execution or cancellation of the rearrangement is determined,
or a preferable rearrangement plan is determined, based on the
result of the estimation. For example, refer to Japanese Laid-Open
Patent Publication Nos. 2008-234158 and 2008-112276, and Published
Japanese-Translation of PCT Application, Publication No.
2007-513429.
[0007] Nonetheless, according to the conventional techniques, when
the storage area of the overall system is expanded, the access
performance with respect to the data stored before the change of
the system configuration remains the same as that corresponding to
the performance of the storage apparatus before the change of the
system configuration.
SUMMARY
[0008] According to an aspect of an embodiment, a storage system
includes a storage apparatus that includes a first storage unit
that has a first storage and a first storage control unit that
controls access to the first storage, and a first control unit that
controls accessible storage units including the first storage unit;
a second storage unit that has a second storage and a second
storage control unit that controls access to the second storage;
and a second control unit that controls accessible storage units
including the second storage unit. The second storage unit and the
second control unit are added to the storage apparatus. The
first control unit includes a memory unit that stores allocation
information including an allocation state of a storage area of the
first storage and an allocation state of a storage area of the
second storage, and a first processor that is configured to execute
rearrangement control of a currently allocated storage area based
on the allocation information corresponding to a degree of
unevenness occurring between a storage capacity of a currently
allocated storage area in the first storage and a storage capacity
of a currently allocated storage area in the second storage.
[0009] The object and advantages of the invention will be realized
and attained by means of the elements and combinations particularly
pointed out in the claims.
[0010] It is to be understood that both the foregoing general
description and the following detailed description are exemplary
and explanatory and are not restrictive of the invention.
BRIEF DESCRIPTION OF DRAWINGS
[0011] FIG. 1 is an explanatory diagram of an example of a storage
system SM according to an embodiment;
[0012] FIG. 2 is a block diagram of an example of a hardware
configuration of a first control unit 111, etc.;
[0013] FIG. 3 is a block diagram of an example of a functional
configuration of the first control unit 111;
[0014] FIG. 4A is a flowchart of an example of a procedure for a
first rearrangement control process executed by the first control
unit 111;
[0015] FIG. 4B is a flowchart of an example of a procedure for a
second rearrangement control process executed by the first control
unit 111;
[0016] FIG. 5 is an explanatory diagram of an example of system
configuration of the storage system SM according to a first
example;
[0017] FIG. 6 is an explanatory diagram of an example of
configuration of a VDISK;
[0018] FIG. 7 is an explanatory diagram of an example of functional
configuration of a PU according to the first example;
[0019] FIG. 8 is an explanatory diagram of an example of the
contents of a volume index table 800;
[0020] FIG. 9 is an explanatory diagram of an example of the
contents of a mirror volume index table 900;
[0021] FIG. 10 is an explanatory diagram of an example of the
contents of a volume segment table 1000;
[0022] FIG. 11 is an explanatory diagram of an example of the
contents of a rearrangement plan table 720;
[0023] FIG. 12 is an explanatory diagram of an example of a
rearrangement plan for a volume;
[0024] FIG. 13 is an explanatory diagram (Part I) of an example of
rearrangement of volumes;
[0025] FIG. 14 is an explanatory diagram of an example of
arrangement of the volumes;
[0026] FIG. 15 is an explanatory diagram of an example of updating
of the volume index table 800;
[0027] FIG. 16 is an explanatory diagram of an example of updating
of the volume segment table 1000;
[0028] FIG. 17 is an explanatory diagram of an example of updating
of the rearrangement plan table 720;
[0029] FIG. 18 is an explanatory diagram (Part II) of the example
of rearrangement of the volumes;
[0030] FIGS. 19 and 20 are sequence diagrams of an example of a
procedure for a node addition process for the storage system
SM;
[0031] FIGS. 21, 22, 23, and 24 are sequence diagrams of an example
of a procedure for a rearrangement process for the storage system
SM;
[0032] FIG. 25 is a sequence diagram of an example of a procedure
for a first rearrangement suspension process for the storage system
SM;
[0033] FIG. 26 is a sequence diagram of an example of a procedure
for a second rearrangement suspension process of the storage system
SM;
[0034] FIG. 27 is a sequence diagram of an example of a procedure
for a temporary rearrangement suspension process of the storage
system SM;
[0035] FIG. 28 is a sequence diagram of an example of a procedure
for a rearrangement restart process for the storage system SM;
[0036] FIG. 29 is an explanatory diagram of an example of system
configuration of the storage system SM according to a second
example;
[0037] FIG. 30 is an explanatory diagram of an example of
functional configuration of the PU according to the second
example;
[0038] FIG. 31 is an explanatory diagram of an example of the
contents of a transfer source/destination volume correspondence
table 3100; and
[0039] FIGS. 32 and 33 are sequence diagrams of an example of a
procedure for a data transfer process of the storage system SM.
DESCRIPTION OF EMBODIMENTS
[0040] Embodiments of a storage system, a storage apparatus, and a
control program will be described in detail with reference to the
accompanying drawings.
[0041] FIG. 1 is an explanatory diagram of an example of a storage
system SM according to an embodiment. In FIG. 1, the storage system
SM includes first and second storage housings 101 and 102. The
first storage housing 101 includes a first control unit 111 and a
first storage unit 112. The first storage unit 112 includes first
storage 113 and a first storage control unit 114. The first storage
control unit 114 is a computer that controls access to the first
storage 113. The first storage housing 101 operates independently
as a storage apparatus.
[0042] The first control unit 111 is a computer that controls the
first storage unit 112 subordinate thereto, and has a function of
causing a second storage 123 to be available, expanding the storage
capacity of the overall storage system SM when the second storage
housing 102 is connected to the first storage housing 101.
[0043] The first control unit 111 manages a second storage unit 122
as a subordinate storage unit when the second storage unit 122 is
connected to the first control unit 111 and becomes accessible;
accepts access to the first and the second storage 113 and 123; and
manages other control units and controls the overall system as a
master control unit after another control unit (for example, a
second control unit 121) is added.
[0044] The second storage housing 102 includes the second control
unit 121 and the second storage unit 122. The second storage unit
122 includes the second storage 123 and a second storage control
unit 124. The second storage control unit 124 is a computer that
controls access to the second storage 123. The second control unit
121 and the second storage unit 122 are "components" used when
system expansion is performed and, for example, are incorporated in
the storage system SM to function as storage apparatuses.
[0045] The second control unit 121 is a computer that controls the
storage unit subordinate thereto; manages the first and the second
storage 113 and 123 as subordinate storage units when the second
storage housing 102 is connected to the first storage housing 101;
and accepts access to the first and the second storage 113 and
123.
[0046] The first and the second storage 113 and 123 each include
one or more memory apparatus(es) D. The memory apparatus D may be a
physical memory apparatus such as, for example, a hard disk, an
optical disk, a flash memory, and magnetic tape, or may be a
logical memory apparatus, such as a LUN.
[0047] The first and the second control units 111 and 121 are
respectively connected to the first and the second storage units
112 and 122 by a communication path 130 for connecting the storage
housings. Thus, the first control unit 111 can directly access the
second storage unit 122, and the second control unit 121 can
directly access the first storage unit 112.
[0048] The storage system SM accesses data by, for example,
wide-striping. Based on the wide-striping, degradation of the
performance consequent to concentration of access can be suppressed
and stable performance can be secured without executing complicated
performance design taking into consideration the amount of access
by the server, etc., and the physical position of the volume.
[0049] The storage area of the overall storage system SM may be
expanded with increases of the storage capacity demanded of the
storage system SM. It is assumed that the second storage unit 122
is added to the existing first storage housing 101 and the expansion of the
storage area of the overall storage system SM (i.e., "scale out")
is executed.
[0050] In this case, data stored after the system configuration has
been changed may be stored in the plural storage units (in the
example of FIG. 1, the first and the second storage units 112 and
122) based on the wide-striping and therefore, access performance
corresponding to the plural storage units can be expected.
[0051] On the other hand, the access performance for the data
stored before the change of the system configuration stays as the
performance of the storage unit (in the example of FIG. 1, the
first storage unit 112) before the change of the system
configuration. As above, unbalanced access performance for data
stored before and after the change of the system configuration is
not desirable for managing the performance of the storage system
SM.
[0052] In this embodiment, the first control unit 111 of the first
storage housing 101 executes rearrangement control for the
currently allocated storage areas, according to the degree of
unevenness of the storage capacity of the currently allocated
storage areas, occurring between the first and the second storage
113 and 123.
[0053] For example, when a storage capacity expansion process is
executed by connecting the second control unit 121 and the second
storage unit 122 to the first storage housing 101 and predefined
unevenness is detected, the first control unit 111 executes the
rearrangement control for the currently allocated storage areas.
Thereby, when the system configuration is changed by adding the
second control unit 121 and the second storage unit 122 to the
storage system SM, optimization of the access performance for the
data stored before and after the change can be facilitated.
[0054] An example of a hardware configuration of the computer of
the first and the second control units 111 and 121, and the first
and the second storage control units 114 and 124 (herein, simply
"the first control unit 111, etc.") will be described.
[0055] FIG. 2 is a block diagram of an example of a hardware
configuration of the first control unit 111, etc. As depicted in
FIG. 2, the first control unit 111, etc. includes a central
processing unit (CPU) 201, memory 202, and an interface (I/F) 203.
The components are respectively connected by a bus 210.
[0056] The CPU 201 governs overall control of the first control
unit 111, etc. The memory 202 includes, for example, read only
memory (ROM), random access memory (RAM), and flash ROM. For
example, the flash ROM stores programs such as an OS and firmware;
the ROM stores application programs; and the RAM is used as a work
area of the CPU 201. Programs stored in the memory 202 are loaded
onto the CPU 201, whereby encoded processes are executed by the CPU
201.
[0057] The I/F 203 controls the input and output of data from other
computers. For example, the I/F 203 is connected to a network such
as a local area network (LAN), a wide area network (WAN), and the
Internet, via a communication line and is connected to other
apparatuses through the network. The I/F 203 administers an
internal interface with the network and controls the input and
output of data from other computers.
[0058] FIG. 3 is a block diagram of an example of a functional
configuration of the first control unit 111. As depicted in FIG. 3,
the first control unit 111 includes a memory unit 301, a
calculating unit 302, a determining unit 303, a creating unit 304,
and a rearrangement control unit 305. Functions of the units from
the calculating unit 302 to the rearrangement control unit 305 are
implemented, for example, by causing the CPU 201 to execute
programs stored in the memory 202 depicted in FIG. 2, or by using
the I/F 203. The results of the processing by the functional units
are stored in, for example, the memory 202.
[0059] The memory unit 301 stores configuration information that
indicates the configurations of the storage areas of the first and
the second storage 113 and 123. The configuration information
includes allocation information that includes the allocation states
of the storage areas of the first and the second storage 113 and
123. The "allocation information" is information indicating, for
example, to which memory apparatus D, the volume is allocated in
the first and the second storage 113 and 123.
[0060] The "volume" is a storage area; the storage system SM is
managed in units of volumes. For example, the volume may be a
logical volume formed by grouping plural physical memory
apparatuses or partitions in a memory apparatus (e.g., a hard disk)
to virtually be one volume.
[0061] Although detailed description will be made with reference to
FIG. 6, for example, the volume is an aggregate of plural segment
sets and each "segment set" is an aggregate of plural segments. In
this case, the configuration information is information that
indicates to which memory apparatus D of the first and the second
storage 113 and 123, each of the segments constituting the volume
is allocated.
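The hierarchy described above (a volume as an aggregate of segment sets, each segment set as an aggregate of segments, and each segment allocated to a memory apparatus D) might be modeled as in the following Python sketch. The class names, field names, and the string identifiers for the memory apparatuses are hypothetical; the actual tables are described later with reference to FIGS. 8 to 10.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    # Hypothetical identifier of the memory apparatus D holding this segment.
    memory_apparatus: str


@dataclass
class SegmentSet:
    segments: List[Segment] = field(default_factory=list)


@dataclass
class Volume:
    name: str
    segment_sets: List[SegmentSet] = field(default_factory=list)


def segments_per_apparatus(volumes):
    """Count the allocated segments per memory apparatus over all volumes."""
    counts = Counter()
    for volume in volumes:
        for segment_set in volume.segment_sets:
            for segment in segment_set.segments:
                counts[segment.memory_apparatus] += 1
    return dict(counts)
```

A per-apparatus segment count of this kind is the only input needed to derive the allocated amounts used for the unevenness determination.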
[0062] The configuration information is stored in, for example, the
memory 202 of the first storage control unit 114 or the first
storage 113. The first control unit 111 reads the configuration
information from the memory 202 of the first storage control unit
114 and stores the configuration information into the memory unit
301. The configuration information is updated, for example,
according to the allocation state of the storage areas of the first
and the second storage 113 and 123.
[0063] The configuration information may also be stored in the
memory 202 of the second storage control unit 124 or the second
storage 123 for redundancy. A specific example of the configuration
information will be described later with reference to FIGS. 8 to
10. The memory unit 301 is implemented by, for example, the memory
202 of the first control unit 111.
[0064] Based on the configuration information stored in the memory
unit 301, the calculating unit 302 calculates a difference "d"
between the first and the second storage 113 and 123, i.e., in the
storage capacity of the currently allocated storage area of the
memory apparatuses D included in the first storage 113 and the
storage capacity of the currently allocated storage area of the
memory apparatuses D included in the second storage 123. In the
description below, the storage capacity of the currently allocated
storage areas may be represented by "allocated amount q".
[0065] For example, the calculating unit 302 calculates the maximal
allocated amount "q.sub.max" of the memory apparatus D whose
allocated amount q is the greatest among the plural memory
apparatuses D included in the first and the second storage 113 and
123; and also calculates the minimal allocated amount "q.sub.min"
of the memory apparatus D whose allocated amount q is the least
among the plural memory apparatuses D. The calculating unit 302 may
calculate the difference d of the maximal allocated amount
"q.sub.max" and the minimal allocated amount "q.sub.min".
[0066] The allocated amount q of each of the memory apparatuses D
can be acquired from, for example, the number of segments of the
volume allocated to the memory apparatus D. For example, when the
capacity of each of the segments is 256 [MB] and the number of
segments allocated to a memory apparatus D is "two", the allocated
amount q of this memory apparatus D is 512 [MB]. Management is
performed in units of segments. A "segment" is a storage area
defined by a predetermined capacity and managed based on position
information, such as a logical block address (LBA), that the host
uses to instruct recording to or reproduction from the volume.
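The calculation of the allocated amount q and of the difference d can be sketched in Python as follows, using the 256 [MB] segment capacity from the example above. The function names and the dictionary keyed by memory apparatus name are assumptions made for illustration.

```python
SEGMENT_SIZE_MB = 256  # segment capacity used in the example in the text


def allocated_amount_mb(segment_count, segment_size_mb=SEGMENT_SIZE_MB):
    """Allocated amount q of one memory apparatus, in MB."""
    return segment_count * segment_size_mb


def allocation_difference(segment_counts):
    """Return (q_max, q_min, d) for a mapping of apparatus name -> segment count.

    d is the difference between the maximal and the minimal allocated amount.
    """
    if not segment_counts:
        raise ValueError("at least one memory apparatus is required")
    amounts = [allocated_amount_mb(n) for n in segment_counts.values()]
    q_max, q_min = max(amounts), min(amounts)
    return q_max, q_min, q_max - q_min
```

For instance, two allocated segments give an allocated amount of 512 MB, which matches the example in the text.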
[0067] Based on the difference d calculated by the calculating unit
302, the determining unit 303 determines whether predefined
unevenness is present in the allocation state of the storage areas
of the first and the second storage 113 and 123. A state where
"predefined unevenness is present" refers to a state where the
allocation state of the storage areas is uneven to the extent that
rearrangement of the currently allocated storage areas (for
example, the segments) in the first and the second storage 113 and
123 is desirable.
[0068] For example, when the difference d between the maximal
allocated amount "q.sub.max" and the minimal allocated amount
"q.sub.min" is greater than or equal to a predetermined rate
.alpha. of the maximal allocated amount "q.sub.max", the
determining unit 303 may determine that the predefined unevenness
is present in the allocation state of the storage areas of the
first and the second storage 113 and 123. When the calculated
difference d between the maximal allocated amount "q.sub.max" and
the minimal allocated amount "q.sub.min" is greater than or equal
to a predetermined size .beta., the determining unit 303 may
determine that the predefined unevenness is present in the
allocation state of the storage areas of the first and the second
storage 113 and 123.
[0069] When the difference d between the maximal allocated amount
"q.sub.max" and the minimal allocated amount "q.sub.min" is greater
than or equal to the predetermined rate .alpha. of the maximal allocated
amount "q.sub.max" and is greater than or equal to the
predetermined size .beta., the determining unit 303 may determine
that the predefined unevenness is present in the allocation state
of the storage areas of the first and the second storage 113 and
123.
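The three determination variants described above (rate only, size
only, or both) may be sketched as a single predicate. The following
Python code is a hedged illustration only: the function name
is_uneven, its parameters, and the decision to treat a difference of
zero as "not uneven" are assumptions made here, not details given in
the embodiment.

```python
from typing import Optional, Sequence


def is_uneven(q: Sequence[float],
              alpha: Optional[float] = None,
              beta: Optional[float] = None) -> bool:
    """Return True when the predefined unevenness is judged to be present.

    q:     allocated amounts of the memory apparatuses D (same unit as beta)
    alpha: predetermined rate as a fraction of q_max, or None to skip it
    beta:  predetermined size, or None to skip it
    """
    if alpha is None and beta is None:
        raise ValueError("specify alpha, beta, or both")
    if not q:
        return False
    q_max, q_min = max(q), min(q)
    d = q_max - q_min
    if d == 0:
        return False  # assumption: identical amounts are never uneven
    rate_ok = alpha is None or d >= alpha * q_max
    size_ok = beta is None or d >= beta
    return rate_ok and size_ok
```

Passing only alpha or only beta corresponds to the two conditions
that may each be used alone, and passing both corresponds to the
combined condition.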
[0070] When a storage unit is added whose storage capacity is
greater than that of the existing storage unit, more segments may
be allocated to the added storage unit compared to the existing
storage unit. Therefore, as described, the determining unit 303
determines whether the predefined unevenness is present in the
allocation state of the storage areas of the first and the second
storage 113 and 123, by using a comparison of the allocated
amounts, i.e., the absolute amounts used, rather than a usage ratio
of the amount used to the available storage capacity.
[0071] Immediately after the second storage unit 122 is added, the
allocation of the storage areas has not yet been executed for the
second storage 123 and the allocated amount of the second storage
123 (the storage capacity of the currently allocated storage area)
is zero. Therefore, the determining unit 303 detects the difference
d of the allocated amounts q and thereby, can easily detect the
unevenness of the allocated amounts of the storage areas of the
first and the second storage 113 and 123.
[0072] The determination process executed by the determining unit
303 may be executed, for example, periodically at time intervals
set in advance or may be executed at an arbitrary timing according
to an operational input by a manager of the storage system SM, etc.
The rate .alpha. and the size .beta. are stored in, for example,
the memory 202. Specific values of the rate .alpha. and the size
.beta. will be described later.
[0073] The creating unit 304 creates a rearrangement plan to
equalize the allocated amounts q between the first and the second
storage 113 and 123, based on the configuration information stored
in the memory unit 301. The "rearrangement plan" represents, for
example, the memory apparatuses D and storage into which the
segments constituting the volume are rearranged. For example, the
creating unit 304 creates the rearrangement plan to equalize the
allocated amounts q between the memory apparatuses D included in
the first and the second storage 113 and 123.
[0074] The creating unit 304 may create, based on the configuration
information stored in the memory unit 301, a rearrangement plan to
reduce copying processes for data transfer executed consequent to
the rearrangement in each of the first and the second storage 113
and 123. The created rearrangement plan is stored in, for example,
a rearrangement plan table 720 depicted in FIG. 11 and described
later.
[0075] The rearrangement control unit 305 executes rearrangement
control of the currently allocated storage areas based on the
configuration information stored in the memory unit 301, according
to the degree of unevenness between the allocated amounts q of the
storage areas of the first and the second storage 113 and 123. For
example, the rearrangement control unit 305 controls the first and
the second storage units 112 and 122 and thereby, rearranges the
arrangement of the currently allocated storage areas in the first
and the second storage 113 and 123, according to the rearrangement
plan created by the creating unit 304.
[0076] For example, the rearrangement control unit 305 executes the
rearrangement control of the currently allocated storage areas when
the rearrangement control unit 305 detects that the second control
unit 121 and the second storage unit 122 are connected to the first
storage housing 101, whereby the storage capacity expansion process
is executed. The "expansion process" is, for example, a process of
setting the second storage 123 of the second storage unit 122
connected to the first storage housing 101, to be available. In
this case, the rearrangement control unit 305 executes, for
example, the rearrangement control to rearrange a portion of the
arrangement of the currently allocated storage area in the first
storage 113, into an unallocated storage area of the second storage
123.
[0077] The rearrangement control unit 305 may execute, for example,
the rearrangement control of the currently allocated storage areas
when the rearrangement control unit 305 detects that the memory
apparatus D has been added to the first or the second storage 113
or 123; or may execute control to rearrange the arrangement of the
currently allocated storage areas in the first and the second
storage 113 and 123 when the determining unit 303 determines that
the predefined unevenness is present.
[0078] The rearrangement control unit 305 may execute the
rearrangement control of the currently allocated storage areas when
the rearrangement control unit 305 detects that, during a process
of transferring data from another storage unit to the first storage
unit 112, the storage capacity expansion process is executed by
connecting the second control unit 121 and the second storage unit
122. In this case, the rearrangement control unit 305 executes, for
example, the rearrangement control to rearrange a portion of the
arrangement of the currently allocated storage areas in the first
storage 113, into an unallocated storage area of the second storage
123.
[0079] A procedure for a rearrangement control process executed by
the first control unit 111 will be described. A procedure for a
first rearrangement control process executed by the first control
unit 111 will be described with reference to FIG. 4A. The first
rearrangement control process is an example of the rearrangement
control process executed when scale out is executed for the storage
system SM.
[0080] FIG. 4A is a flowchart of an example of the procedure for
the first rearrangement control process executed by the first
control unit 111. In the flowchart of FIG. 4A, the first control
unit 111 determines whether the second storage housing 102 (the
second control unit 121 and the second storage unit 122) is
connected to the first storage housing 101 and the second storage
123 is set to be available (step S411).
[0081] The first control unit 111 waits for the second storage 123
to become available (step S411: NO). When the first control unit
111 determines that the second storage 123 has become available
(step S411: YES), the first control unit 111 reads the
configuration information from the memory 202 of the first storage
control unit 114 (step S412).
[0082] Based on the read configuration information, the first
control unit 111 creates a rearrangement plan to equalize the
allocated amounts q between the memory apparatuses D included in
the first and the second storage 113 and 123 (step S413).
[0083] The first control unit 111 controls the first and the second
storage units 112 and 122 and thereby, rearranges the arrangement
of the allocated storage areas in the first and the second storage
113 and 123 according to the created rearrangement plan (step S414)
and causes the series of operations according to this flowchart to
come to an end.
[0084] Thus, when scale out is executed for the storage system SM,
rearrangement control of the currently allocated storage areas can
be executed in the first and the second storage 113 and 123.
[0085] A procedure for a second rearrangement control process
executed by the first control unit 111 will be described with
reference to FIG. 4B. The second rearrangement control process is a
rearrangement control process executed at an arbitrary timing or
periodically.
[0086] FIG. 4B is a flowchart of an example of a procedure for the
second rearrangement control process executed by the first control
unit 111. In the flowchart of FIG. 4B, the first control unit 111
reads the configuration information from the memory 202 of the
first storage control unit 114 (step S421).
[0087] Based on the read configuration information, the first
control unit 111 calculates the maximal allocated amount q.sub.max
of the memory apparatus D whose allocated amount q is the greatest
among the plural memory apparatuses D included in the first and the
second storage 113 and 123 (step S422) and, based on the read
configuration information, calculates the minimal allocated amount
q.sub.min of the memory apparatus D whose allocated amount q is the
least among the plural memory apparatuses D (step S423).
[0088] The first control unit 111 calculates the difference d of
the maximal allocated amount q.sub.max and the minimal allocated
amount q.sub.min (step S424) and determines if the calculated
difference d is greater than or equal to the rate .alpha. of the
maximal allocated amount q.sub.max (step S425). If the first
control unit 111 determines that the difference d is less than the
rate .alpha. of the maximal allocated amount q.sub.max (step S425:
NO), the first control unit 111 causes the series of operations
according to this flowchart to come to an end.
[0089] On the other hand, if the first control unit 111 determines
that the difference d is greater than or equal to the rate .alpha.
of the maximal allocated amount q.sub.max (step S425: YES), the
first control unit 111 determines if the difference d is greater
than or equal to the size .beta. (step S426). If the first control
unit 111 determines that the difference d is smaller than the size
.beta. (step S426: NO), the first control unit 111 causes the
series of operations according to this flowchart to come to an
end.
[0090] On the other hand, if the first control unit 111 determines
that the difference d is greater than or equal to the size .beta.
(step S426: YES), the first control unit 111, based on the
configuration information, creates the rearrangement plan to
equalize the allocated amounts q between the memory apparatuses D
included in the first and the second storage 113 and 123 (step
S427).
[0091] The first control unit 111 controls the first and the second
storage units 112 and 122 and thereby, rearranges the arrangement
of the currently allocated storage areas in the first and the
second storage 113 and 123, according to the created rearrangement
plan (step S428) and causes the series of operations according to
this flowchart to come to an end.
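The flow of FIG. 4B can be summarized in the following Python sketch.
The three callables (read_allocated_amounts, create_plan, and
execute_plan) are hypothetical stand-ins for reading the configuration
information, creating the plan, and executing the rearrangement; only
the order of steps S421 to S428 and the two early exits follow the
text.

```python
from typing import Any, Callable, Sequence


def second_rearrangement_control(
    read_allocated_amounts: Callable[[], Sequence[float]],
    create_plan: Callable[[], Any],
    execute_plan: Callable[[Any], None],
    alpha: float,
    beta: float,
) -> bool:
    """Run one pass of the flow; return True if rearrangement was executed.

    Assumes at least one memory apparatus D is reported.
    """
    q = read_allocated_amounts()   # step S421: amounts from configuration info
    q_max = max(q)                 # step S422
    q_min = min(q)                 # step S423
    d = q_max - q_min              # step S424
    if d < alpha * q_max:          # step S425: NO, end of flow
        return False
    if d < beta:                   # step S426: NO, end of flow
        return False
    plan = create_plan()           # step S427
    execute_plan(plan)             # step S428
    return True
```

Such a function could be invoked periodically or on an operational
input, which matches the two kinds of timing mentioned for the second
rearrangement control process.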
[0092] Thus, the allocation state of the storage areas of the first
and the second storage 113 and 123 can be determined at an
arbitrary timing or periodically, and the rearrangement control of
the currently allocated storage areas can be executed in the first
and the second storage 113 and 123. For example, immediately after
the second storage unit 122 is added, the unevenness of the
allocated amounts is determined between the first and the second
storage 113 and 123. However, during operation after the addition,
the unevenness of the allocated amounts can be determined among the
memory apparatuses in the storage of the overall system including
both the first and the second storage 113 and 123. For example, a
case can also be determined where unevenness is present among the
memory apparatuses in the first storage 113.
[0093] As described, according to the first control unit 111 of the
first storage housing 101 of the embodiment, the rearrangement
control of the currently allocated storage areas can be executed
according to the degree of unevenness of the allocated amounts q in
the first and the second storage 113 and 123. Thereby, access
performance can be optimized for accessing data stored in the storage
system SM.
[0094] According to the first control unit 111, when the second
storage housing 102 (the second control unit 121 and the second
storage unit 122) is connected to the first storage housing 101 and
thereby, the storage capacity expansion process is executed, the
rearrangement control may be executed to rearrange a portion of the
arrangement of the currently allocated storage area in the first
storage 113, into an unallocated storage area in the second storage
123. Thus, when scale out is executed for the storage system SM,
access performance can be optimized for accessing data stored
before the change of the system configuration.
[0095] According to the first control unit 111, the rearrangement
control of the currently allocated storage areas can be executed in
the first and the second storage 113 and 123 to equalize the
allocated amounts q between the memory apparatuses D included in
the first and the second storage 113 and 123. Thereby, accesses to
the data can be distributed evenly between the first and
the second storage units 112 and 122.
[0096] According to the first control unit 111, the difference d is
calculated between the maximal allocated amount "q.sub.max" and the
minimal allocated amount "q.sub.min" of the memory apparatuses D
included in the first and the second storage 113 and 123; and it
can be determined that the predefined unevenness is present in the
allocation state of the storage areas of the first and the second
storage 113 and 123 if the calculated difference d is greater than
or equal to a predetermined rate .alpha. of the maximal allocated
amount "q.sub.max" and is greater than or equal to the
predetermined size .beta.. Thereby, it can be determined whether
the allocation state of the storage areas is uneven to the extent
that rearrangement of the currently allocated storage areas in the
first and the second storage 113 and 123 is desirable.
[0097] According to the first control unit 111, when, during a
process of transferring data from another storage unit to the first
storage unit 112, the storage capacity expansion process is
executed by connecting the second control unit 121 and the second
storage unit 122, the rearrangement control can be executed to
rearrange a portion of the arrangement of the currently allocated
storage area in the first storage 113, into an unallocated storage
area in the second storage 123. Thereby, even when scale out is
executed for the storage system SM during data transfer, access
performance can be optimized for accessing the data stored before
the change of the system configuration.
[0098] A first example of the storage system SM according to the
embodiment will be described.
[0099] FIG. 5 is an explanatory diagram of an example of system
configuration of the storage system SM according to the first
example. In FIG. 5, the storage system SM includes processor units
(PUs) #1 and #2, switches (SWs) #1 and #2, and a storage unit (SU)
#1.
[0100] The PUs #1 and #2 are computers that control SUs #1 and #2.
The PUs #1 and #2 are, for example, each a server accessible by a
business server BS and a management server MS described later. The
first control unit 111 depicted in FIG. 1 corresponds to, for
example, the PU #1. The SWs #1 and #2 are computers each having a
switching function.
[0101] The SU #1 includes redundant arrays of independent disks
(RAIDs) #1 to #4 and is a computer that controls access to the
RAIDs #1 to #4. The first storage unit 112 depicted in FIG. 1
corresponds to, for example, the SU #1.
[0102] The RAIDs #1 to #4 form a RAID group by combining plural
memory apparatuses (for example, hard disks) as one memory
apparatus. For example, each of the RAIDs #1 to #4 is configured by
two LUNs. The first storage 113 depicted in FIG. 1 corresponds to,
for example, the RAIDs #1 to #4. The memory apparatus D depicted in
FIG. 1 corresponds to, for example, a LUN.
[0103] The description has been made taking the example of a case
where the two PUs #1 and #2 are connected to the SU #1 for
redundancy. However, one PU (for example, the PU #1 or #2) may be
connected to the SU #1.
[0104] For the storage system SM, the storage area of the overall
storage system SM can be expanded with increases in the storage
capacity demanded of the storage system SM. For example, for the
storage system SM, the storage area of the overall storage system
SM may be expanded using a PU and an SU as one set.
[0105] In the description below, addition of expansion sets
(PUs+SUs) each including a PU and an SU as one set, to the storage
system SM may be written as "scale out"; the PUs #1 and #2, the SWs
#1 and #2, and the SU #1 included in the storage system SM may each
be written as "base node"; the expansion set added to the storage
system SM may be written as "additional node"; and a virtual volume
supplied by the storage system SM may be written as "VDISK".
[0106] FIG. 6 is an explanatory diagram of an example of
configuration of the VDISK. In FIG. 6, the VDISK is an aggregate of
plural segment sets. Each of the segment sets is an aggregate of
eight segments #1 to #8. The capacity of the segment set is, for
example, 2 [GB] and the capacity of the segment is, for example,
256 [MB].
[0107] Taking the storage system SM depicted in FIG. 5 as an
example, the segments #1 to #8 are allocated to LUNs #1 to #8 in
the SU #1. Data of a user is recorded in strips each having a fixed
length (1 [MB]) as units. The strips are striped in a manner of
using the segments #1 to #8 in this order.
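One possible reading of this striping is a round-robin layout, in
which consecutive 1 [MB] strips go to the segments #1, #2, ..., #8
and then return to the segment #1. The Python sketch below maps a
byte offset in the VDISK to a segment under that assumption. The
function name locate, the 1-based numbering of segment sets, and the
round-robin interpretation itself are assumptions made here for
illustration.

```python
MB = 1024 * 1024
STRIP_SIZE = 1 * MB                                   # fixed strip length
SEGMENTS_PER_SET = 8                                  # segments #1 to #8
SEGMENT_SIZE = 256 * MB                               # capacity of one segment
SEGMENT_SET_SIZE = SEGMENTS_PER_SET * SEGMENT_SIZE    # 2 [GB]


def locate(volume_offset: int):
    """Map a VDISK byte offset to (segment set, segment, offset in segment).

    Segment sets and segments are numbered from 1.
    """
    if volume_offset < 0:
        raise ValueError("volume_offset must be non-negative")
    set_index, offset_in_set = divmod(volume_offset, SEGMENT_SET_SIZE)
    strip_index, offset_in_strip = divmod(offset_in_set, STRIP_SIZE)
    row, column = divmod(strip_index, SEGMENTS_PER_SET)
    offset_in_segment = row * STRIP_SIZE + offset_in_strip
    return set_index + 1, column + 1, offset_in_segment


print(locate(0))        # (1, 1, 0): first strip, segment #1
print(locate(1 * MB))   # (1, 2, 0): second strip, segment #2
print(locate(8 * MB))   # (1, 1, 1048576): ninth strip wraps to segment #1
```

Under this layout, sequential accesses spread over all eight LUNs
holding the segment set.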
[0108] FIG. 7 is an explanatory diagram of an example of functional
configuration of the PU according to the first example. In FIG. 7,
the storage system SM includes a base node N1 and an additional
node N2. The first storage housing 101 depicted in FIG. 1
corresponds to, for example, the base node N1. The second storage
housing 102 depicted in FIG. 1 corresponds to the additional node
N2. However, FIG. 7 depicts the state of the storage system SM
immediately after scale out is executed therefor. The base node N1
includes the PUs #1 and #2, and the SU #1. The additional node N2
includes a PU #3 and an SU #2.
[0109] The PUs #1 and #2 in the base node N1, the PU #3 in the
additional node N2, and the management server MS are connected to
each other through a management LAN. The management server MS is a
computer used by the manager of the storage system SM and includes
an apparatus management graphical user interface (GUI).
[0110] The PUs #1 and #2 in the base node N1, the PU #3 in the
additional node N2, and the business server BS are connected to
each other through an I/O LAN. The business server BS is a computer
having business applications installed therein.
[0111] The PUs #1, #2, and #3, and the SUs #1 and #2 are connected
to each other through the I/O LAN and an internal management LAN.
The SUs #1 and #2 include configuration management DBs #1 and #2,
and storage control units #1 and #2. The storage control units #1
and #2 are implemented by executing a storage control program on
the CPU. The "storage control program" is a program to control
access to the storage in the SUs #1 and #2.
[0112] The configuration management DBs #1 and #2 each include a
volume management table 710 and the rearrangement plan table 720.
The volume management table 710 and the rearrangement plan table
720 are read by the PU #1 from the configuration management DB #1
(or the configuration management DB #2) and are used. The volume
management table 710 includes a volume index table 800 (see FIG.
8), a mirror volume index table 900 (see FIG. 9), and a volume
segment table 1000 (see FIG. 10).
[0113] The volume index table 800 is a table to manage the volumes
(VDisks). The mirror volume index table 900 is a table to manage
mirror volumes. The volume segment table 1000 is a table to manage
the segments of the volumes. The tables 800, 900, and 1000 are
correlated with each other by volume numbers as indexes.
[0114] The rearrangement plan table 720 is a table to manage, for
each of the segments constituting the VDISK, the storage apparatus
(SU) and the LUN number of the arrangement destination determined
by the rearrangement plan, together with the rearrangement state of
the segment. The rearrangement state of the segments whose
rearrangement has been completed (or whose rearrangement is
unnecessary) will be represented as "rearrangement completed (or
rearrangement unnecessary)". The rearrangement state of the
segments currently under rearrangement will be represented as
"under rearrangement". The rearrangement state of the segments to
be rearranged in the future will be represented as "awaiting
rearrangement".
[0115] During temporary suspension of the rearrangement, the PU #1
retains the rearrangement plan table 720 without deleting it. If
the rearrangement is suspended, the PU #1 discards the
rearrangement plan table 720. If the volume (VDISK) is deleted
during the temporary suspension of the rearrangement, the PU #1
deletes the corresponding record from the volume index table 800 of
the volume management table 710 and also deletes the corresponding
record from the rearrangement plan table 720. Detailed description
of the volume management table 710 and the rearrangement plan table
720 will be made later with reference to FIGS. 8 to 11.
[0116] The PU #1 includes an I/O control unit #1, a PU control unit
#1, a cluster control M, an apparatus management GUI control unit
#1, a PU load monitoring unit #1, an SU control unit #1, and volume
managers M and A #1. The calculating unit 302, the determining unit
303, the creating unit 304, and the rearrangement control unit 305
depicted in FIG. 3 correspond to, for example, the volume manager
M.
[0117] The PU #2 includes an I/O control unit #2, a PU control unit
#2, a cluster control unit #2, a PU load monitoring unit #2, an SU
control unit #2, and a volume manager A #2. The PU #3 includes an
I/O control unit #3, a PU control unit #3, a cluster control unit
#3, a PU load monitoring unit #3, an SU control unit #3, and a
volume manager A #3.
[0118] The I/O control units #1 to #3 each accept an I/O request
from the business server BS and each process the I/O request. The
PU control units #1 to #3 respectively control the PUs #1 to #3.
The cluster control M clusters the PUs. The PUs #1, #2, and #3 form
a cluster. The cluster control units #2 and #3 each recognize the
PUs #1 to #3 clustered by the cluster control M.
[0119] The apparatus management GUI control unit #1 determines the
state of the storage system SM and creates a new volume, according
to instructions from the management server MS. The PU load
monitoring units #1 to #3 respectively monitor the loads on the PUs
#1 to #3. The SU control units #1 to #3 control the SUs #1 and
#2.
[0120] The volume manager M controls the volume managers A #1 to
#3. For example, the volume manager M starts up a rearrangement
control thread and causes the volume managers A #1 to #3 to execute
the rearrangement control thread. The volume managers A #1 to #3
manage the volumes according to the control by the volume manager
M.
[0121] When the PU #1 fails in the storage system SM, for example,
the PU #2 or #3 takes over the function of the PU #1. The hardware
configuration of each of the business server BS and the management
server MS is implemented by, for example, a CPU, a memory, a
magnetic disk drive, a magnetic disk, a display, an I/F, a
keyboard, a mouse, etc.
[0122] The contents of the volume management table 710 will be
described with reference to FIGS. 8 to 10. The configuration
information corresponds to, for example, the volume management
table 710.
[0123] FIG. 8 is an explanatory diagram of an example of the
contents of the volume index table 800. In FIG. 8, the volume index
table 800 has fields for the volume number, the volume name, the
operating PU number, the volume attribute, the volume size, the
volume state, and the rearrangement state. By setting information
in each of the fields, volume information 800-1 to 800-n is stored
as records.
[0124] The volume number is an identifier of the VDISK. The volume
name is the name of the VDISK. The operating PU number is an
identifier of the PU on which the VDISK operates. The volume attribute is
the attribute of the VDISK. The volume size is the size (GB) of the
VDISK for the business server BS. The volume state is a state
representing whether the VDISK is accessible. The rearrangement
state represents the rearrangement state of the VDISK.
[0125] For example, the volume information 800-1 indicates the
volume name "Vdisk 1", the operating PU number "1", the volume
attribute "thin-provisioning volume", the volume size "500", the
volume state "normal", and the rearrangement state "under
rearrangement" of the VDISK 1. Volume information having therein
the volume name "NULL", such as the volume information 800-n, is
information concerning an uncreated VDISK.
[0126] FIG. 9 is an explanatory diagram of an example of the
contents of the mirror volume index table 900. In FIG. 9, the
mirror volume index table 900 has fields for the volume number, the
number of mirrors, and mirror volume numbers 1 and 2. By setting
information in each of the fields, mirror volume information (for
example, mirror volume information 900-1 and 900-2) is stored as
records.
[0127] The volume number is an identifier of the mirror volume. The
number of mirrors is the number of mirroring volumes. The mirror
volume numbers 1 and 2 are each an identifier of the VDISK that is
the substance of the mirror volume. For example, the mirror volume
information 900-1 indicates the number of mirrors "two", the mirror
volume number 1 "127", and the mirror volume number 2 "128" of the
VDISK 5.
[0128] FIG. 10 is an explanatory diagram of an example of the
contents of the volume segment table 1000. In FIG. 10, the volume
segment table 1000 has fields for the volume number, the segment
set number, the segment number, the storage apparatus, the LUN
number, the segment state, and the rearrangement state. By setting
information in each of the fields, segment information (for
example, segment information 1000-1 and 1000-2) is stored as
records.
[0129] The volume number is an identifier of the VDISK. The segment
set number is an identifier of the segment set constituting the
VDISK. The segment number is an identifier of a segment
constituting the segment set. The storage apparatus is an
identifier of the SU to which the segment belongs. The LUN number
is an identifier of the LUN to which the segment is allocated. The
segment state is a state representing whether the segment is
accessible. The rearrangement state represents the rearrangement
state of the segments.
[0130] For example, the segment information 1000-1 indicates the
storage apparatus "1", the LUN number "1", the segment state
"valid", and the rearrangement state "blank (empty)" of a segment 1
of a segment set 1 of the VDISK 1.
[0131] The contents of the rearrangement plan table 720 will be
described with reference to FIG. 11.
[0132] FIG. 11 is an explanatory diagram of an example of the
contents of the rearrangement plan table 720. In FIG. 11, the
rearrangement plan table 720 has fields for the volume number, the
segment set number, the segment number, the current storage
apparatus, the current LUN number, the rearranged storage
apparatus, the rearranged LUN number, and the rearranged state. By
setting information in each of the fields, rearrangement plan
information (for example, rearrangement plan information 1100-1 to
1100-5) is stored as records.
[0133] The volume number is an identifier of the VDISK. The segment
set number is an identifier of the segment set constituting the
VDISK. The segment number is an identifier of a segment
constituting the segment set. The current storage apparatus is an
identifier of the SU to which the segment before the rearrangement
belongs. The current LUN number is an identifier of the LUN to
which the segment before the rearrangement is allocated. The
rearranged storage apparatus is an identifier of the SU to which
the segment after the rearrangement belongs. The rearranged LUN
number is an identifier of the LUN to which the segment after the
rearrangement is allocated. The rearranged state represents the
rearrangement state of the segments.
[0134] For example, the rearrangement plan information 1100-1
indicates the current storage apparatus "1", the current LUN number
"1", the rearranged storage apparatus "1", the rearranged LUN
number "1", and the rearrangement state "rearrangement unnecessary"
of the segment 1 of the segment set 1 of the VDISK 1.
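For illustration, one record of the rearrangement plan table 720 may
be modeled as follows. This Python sketch is an assumption-laden
model rather than the actual table layout: the class names, field
names, and the needs_move helper are hypothetical, while the fields
and state labels themselves are those listed in the text.

```python
from dataclasses import dataclass
from enum import Enum


class RearrangementState(Enum):
    """Rearrangement states named in the description."""
    UNNECESSARY = "rearrangement unnecessary"
    COMPLETED = "rearrangement completed"
    UNDER = "under rearrangement"
    AWAITING = "awaiting rearrangement"


@dataclass
class RearrangementPlanRecord:
    volume_number: int
    segment_set_number: int
    segment_number: int
    current_storage_apparatus: int
    current_lun_number: int
    rearranged_storage_apparatus: int
    rearranged_lun_number: int
    state: RearrangementState

    def needs_move(self) -> bool:
        """True when the planned destination differs from the current one."""
        current = (self.current_storage_apparatus, self.current_lun_number)
        planned = (self.rearranged_storage_apparatus, self.rearranged_lun_number)
        return current != planned


# The example record 1100-1: segment 1 of segment set 1 of the VDISK 1.
record_1100_1 = RearrangementPlanRecord(
    1, 1, 1, 1, 1, 1, 1, RearrangementState.UNNECESSARY)
print(record_1100_1.needs_move())  # False
```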
[0135] An example of determination of the unevenness of the
allocated amounts q of the segments of each LUN in the storage
system SM will be described. The PU #1 monitors the allocated
amounts q of the segments of each LUN in the storage system SM. If
the PU #1 detects that "unevenness is present", using the apparatus
management GUI control unit, the PU #1 gives notification that the
unevenness has occurred. The trigger of the monitoring may be, for
example, a change of the system configuration associated with the
addition of a node, arrival of the time for periodic monitoring, or
an increase of the load capacity of the SU.
[0136] For example, the PU #1 refers to the volume management table
710; calculates the allocated amount q of the segments of each LUN
in the storage system SM; and identifies the maximal allocated
amount q.sub.max of the LUN whose allocated amount q of the
segments is the greatest and the minimal allocated amount q.sub.min
of the LUN whose allocated amount q of the segments is the least,
among all the LUNs in the storage system SM.
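The per-LUN calculation described in this paragraph can be sketched
in Python as below. The record layout (dictionaries keyed by
"storage_apparatus" and "lun_number"), the function name
allocated_per_lun, and the sample data are assumptions made for
illustration; the 256 [MB] segment capacity is the example value
given for FIG. 6. LUNs without any segment are listed explicitly so
that a newly added, empty LUN counts toward the minimum.

```python
from collections import Counter

SEGMENT_SIZE_MB = 256  # example segment capacity given for FIG. 6


def allocated_per_lun(segment_records, all_luns):
    """Return {(SU, LUN): allocated amount q [MB]}, including empty LUNs."""
    counts = Counter(
        (r["storage_apparatus"], r["lun_number"]) for r in segment_records
    )
    return {lun: counts[lun] * SEGMENT_SIZE_MB for lun in all_luns}


# Hypothetical records in the spirit of the volume segment table 1000.
records = [
    {"volume": 1, "segment_set": 1, "segment": 1,
     "storage_apparatus": 1, "lun_number": 1},
    {"volume": 1, "segment_set": 1, "segment": 2,
     "storage_apparatus": 1, "lun_number": 2},
    {"volume": 2, "segment_set": 1, "segment": 1,
     "storage_apparatus": 1, "lun_number": 1},
]
luns = [(1, 1), (1, 2), (2, 1)]  # (SU, LUN); the LUN of SU #2 is still empty

q = allocated_per_lun(records, luns)
q_max, q_min = max(q.values()), min(q.values())
print(q_max, q_min, q_max - q_min)  # 512 0 512
```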
[0137] When the difference d of the maximal allocated amount
q.sub.max and the minimal allocated amount q.sub.min is greater
than or equal to the predetermined rate .alpha. of the maximal
allocated amount q.sub.max and is greater than or equal to the
predetermined size .beta., the PU #1 determines that the unevenness
is present in the allocated amount q of the segments in each LUN in
the storage system SM. The predetermined rate .alpha. and the
predetermined size .beta. can each be arbitrarily set. The rate
.alpha. is set to be, for example, a value of about 5 to 10 [%].
The size .beta. is set to be, for example, a value of about 64 or
128 [GB].
[0138] The rearrangement plan of the volume (VDISK) will be
described. The PU #1 formulates the rearrangement plan based on the
allocated amount q of the segments in the LUN constituting the SU.
Therefore, equalizing rearrangement can be executed even when the
load capacity of the SU to be added is different from that of the
existing SU.
[0139] FIG. 12 is an explanatory diagram of an example of a
rearrangement plan for a volume. As depicted for item 1 of FIG. 12,
the case is assumed where an expansion set (the SU #2) of 8.4 [TB]
is added to a basic set (the SU #1) of 8.4 [TB]. In this case, the
PU #1 performs distribution and rearrangement such that the
allocated amount q of the segments of each LUN is equalized between
the SUs #1 and #2.
[0140] As depicted for item 2 of FIG. 12, the case is assumed where
an expansion set (the SU #2) of 16.8 [TB] is added to the basic set
(the SU #1) of 8.4 [TB]. In this case, the PU #1 performs
distribution and rearrangement such that the allocated amount q of
the segments of each LUN is equalized between the SUs #1 and
#2.
[0141] As depicted for item 3 of FIG. 12, the case is assumed where
an expansion set (the SU #3) of 16.8 [TB] is added to the basic set
(the SU #1) of 8.4 [TB] and the expansion set (the SU #2) of 8.4
[TB]. In this case, the PU #1 performs distribution and
rearrangement such that the allocated amount q of the segments of
each LUN is equalized among the SUs #1, #2, and #3. Although a case
has been described where the expansion set including the PU and the
SU as one set is added, the SU alone may be added as an expansion
set.
[0142] How the existing volumes are rearranged for the SU #2 added
for scale out will be described with reference to FIG. 13. A case
will be described where the rearrangement process is automatically
started up after scale out. Nonetheless, a rearrangement
instruction can manually be issued from a GUI screen on the
management server MS.
[0143] FIG. 13 is an explanatory diagram (Part I) of an example of
rearrangement of the volumes. In FIG. 13, segments A0 to A31
constituting a volume 1 and segments B0 to B15 constituting a
volume 2 are arranged in the SU #1 (in FIG. 13, "before
rearrangement"). In FIG. 13, the cylindrical columns in the SUs #1
and #2 represent the LUNs in the SUs #1 and #2.
[0144] The PU #1 creates the rearrangement plan table 720 such that
the allocated amount q of the segments in each LUN is equalized
between the SUs #1 and #2 (in FIG. 13, "rearrangement proposal").
The disposed positions of the segments are tentative positions.
[0145] The PU #1 refers to the rearrangement plan table 720 and
rearranges the segments A0 to A31 of the volume 1. In this case,
the segments A8 to A15 and A24 to A31 of the volume 1 are
rearranged in the SU #2 (in FIG. 13, "under rearrangement").
[0146] The PU #1 refers to the rearrangement plan table 720 and
rearranges the segments B0 to B15 of the volume 2. In this case,
the segments B8 to B15 of the volume 2 are rearranged into the SU
#2 (in FIG. 13, "after rearrangement"). Thereby, the physical
capacities are equalized between the SUs #1 and #2.
[0147] Although the use state of the LUNs appears to be discrete, this
discreteness does not affect performance when the volume is
configured by segments that are wide-striped. Therefore, the
segments A16 to A23 and B0 to B7 are not transferred, which avoids
unnecessary transfers and reduces wasteful processing by the
apparatus.
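The outcome depicted in FIG. 13 can be reproduced with a short sketch. It assumes that a segment set consists of eight segments and that whole segment sets are assigned to the SUs in round-robin order. This rule is only one assumption that yields the same result as the figure and is not stated to be the method of the PU #1; the function name is hypothetical.

```python
SEGMENTS_PER_SET = 8  # assumed size of one segment set


def plan_by_segment_set(segments, sus):
    """Assign whole segment sets to the SUs in round-robin order."""
    plan = {}
    for i, seg in enumerate(segments):
        set_no = i // SEGMENTS_PER_SET
        plan[seg] = sus[set_no % len(sus)]
    return plan


volume1 = [f"A{i}" for i in range(32)]
plan = plan_by_segment_set(volume1, ["SU#1", "SU#2"])
# Every segment starts in the SU #1, so only those planned for the SU #2 move.
moved = [seg for seg in volume1 if plan[seg] != "SU#1"]
```

With these inputs, `moved` holds the segments A8 to A15 and A24 to A31, and the segments A0 to A7 and A16 to A23 stay where they are.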
[0148] For a case where temporary suspension of the rearrangement
process is desired due to maintenance and inspection, etc., of the
PUs or the SUs, the storage system SM has a temporary suspension
function and a restart function for the rearrangement process, and
also has a suspension function for the rearrangement process.
However, when the rearrangement process is suspended, the created
rearrangement plan has to be discarded, and when the rearrangement
is executed again, the processes of determining unevenness of the
allocated amount q of the segments of each LUN in the storage
system SM and of creating the rearrangement plan table have to be
executed again.
[0149] In contrast, when the temporary suspension function is used,
the PU #1 retains, rather than discards, the rearrangement plan
established for the rearrangement of the volumes during the
temporary suspension. When the PU #1 receives a restart
instruction, the PU #1 refers to the volume management table 710
and the rearrangement plan table 720, and continues the
rearrangement process from the entry at which the process is to be
restarted. Thereby, temporary suspension during rearrangement and
restart from the temporary suspension are enabled.
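The difference between suspension and temporary suspension can be sketched as a small state holder. The class, its method names, and the "suspended" state label are hypothetical; the sketch only illustrates that suspension discards the plan while temporary suspension keeps both the plan and the entry at which to restart.

```python
class RearrangementJob:
    def __init__(self, plan):
        self.plan = list(plan)   # ordered entries of a (hypothetical) plan table
        self.next_index = 0      # entry at which the process is to be restarted
        self.state = "under rearrangement"

    def step(self):
        """Process one entry and return it, or None if nothing is processed."""
        if self.state != "under rearrangement" or self.next_index >= len(self.plan):
            return None
        entry = self.plan[self.next_index]
        self.next_index += 1
        return entry

    def suspend(self):
        # Suspension: the plan is discarded and has to be re-created later.
        self.plan = []
        self.next_index = 0
        self.state = "suspended"

    def suspend_temporarily(self):
        # Temporary suspension: the plan and the position are retained.
        self.state = "temporarily suspended"

    def restart(self):
        if self.state == "temporarily suspended":
            self.state = "under rearrangement"


job = RearrangementJob(["A8", "A9", "A10"])
first = job.step()
job.suspend_temporarily()
while_paused = job.step()
job.restart()
resumed = job.step()
```

In this sketch the job continues with the entry following the last processed one after the restart, instead of starting over from a new plan.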
[0150] An example of updating of each of the tables 710 and 720
will be described with reference to FIGS. 14 to 17.
[0151] FIG. 14 is an explanatory diagram of an example of
arrangement of the volumes. In FIG. 14, the segments #1 to #16
constituting the VDISK 1 are arranged in the SU #1. The additional
node N2 including the PU #3 and the SU #2 as the set is added to
the base node N1. In this case, unevenness occurs in the allocated
amounts q of the segments of each LUN in the storage system SM and
therefore, the rearrangement process is executed for the VDISK
1.
[0152] FIG. 15 is an explanatory diagram of an example of updating
of the volume index table 800. In FIG. 15, when the additional node
N2 is added to the base node N1, the rearrangement state in volume
information 1500-1 in the volume index table 800 is updated from
"awaiting rearrangement" to "under rearrangement".
[0153] FIG. 16 is an explanatory diagram of an example of updating
of the volume segment table 1000. In (16-1) of FIG. 16, the volume
segment table 1000 stores the segment information on the VDISK 1 of
the volume number "1". Eight segments constituting the segment set
basically are sequentially arranged in eight different LUNs (for
example, indicated by thick-lined frames in FIG. 16).
[0154] However, when a volume is present that was created before
the VDISK 1 is created, a LUN may be allocated to the VDISK 1 at
the timing at which the LUN allocated to the volume is deleted. In
this case, when LUNs with serial numbers are not empty, the LUNs to
be arranged with the segments constituting the segment set may not
be equalized (for example, indicated by a dotted line frame in FIG.
16).
[0155] In (16-2) of FIG. 16, when the rearrangement plan table 720
(for example, see FIG. 17 described later) is created, the volume
manager M of the PU #1 sets the rearrangement state in the volume
segment table 1000 to be "during rearrangement process", refers to
the rearrangement plan table 720, creates the rearrangement control
thread for each PU for the segments whose rearrangement states are
each "awaiting rearrangement", and executes the rearrangement.
[0156] The rearrangement control thread of each PU instructs the
volume manager A to execute a copying process for the segments to
be rearranged in the rearrangement. When the volume manager A
responds to the rearrangement control thread indicating the
completion of the copying, the volume manager M changes the
rearrangement state in the rearrangement plan table 720 to
"rearrangement completed" and also changes the rearrangement state
of the volume segment table 1000 to "blank".
[0157] FIG. 17 is an explanatory diagram of an example of updating
of the rearrangement plan table 720. In (17-1) of FIG. 17, the PU
#1 allocates the segment sets of the VDISK 1 to all the SUs using
each segment set as a unit; also allocates the segments thereto;
and establishes a plan to allocate the LUN numbers using serial LUN
numbers. For example, the PU #1 plans to allocate the even-numbered
segments to consecutive LUNs of the SU #2 and also plans to
allocate the odd-numbered segments to consecutive LUNs of the SU
#1.
[0158] In (17-2) of FIG. 17, the PU #1 mechanically establishes a
rearrangement plan as above and thereafter, reviews the
rearrangement plan to reduce the copying processes for data
transfer consequent to the rearrangement. For example, the PU #1
compares for each segment set, the current state (the current
storage apparatuses and the current LUN numbers) and the state
after the rearrangement (the rearranged storage apparatuses and the
rearranged LUN numbers).
[0159] For example, for the segment set number "3", the current LUN
numbers do not match the rearranged LUN numbers; however, in the
current state, the segments are allocated to mutually different
LUNs except for two segments that are allocated to the same LUN
number. Therefore, the PU #1 revises the plan such that the two
segments allocated to the same LUN are allocated to different LUNs
(although the order of the LUN numbers is not the same as that of
the segments, it is determined that the performance is not affected
when the segments are each allocated to a different LUN).
[0160] For the segments to be rearranged, the PU #1 sets the
rearrangement state thereof to be "awaiting rearrangement" and sets
the rearrangement state in the volume segment table 1000 to be
"during rearrangement process". For the segments whose
rearrangement is unnecessary, the PU #1 sets the rearrangement
state in the rearrangement plan table 720 to be "rearrangement
unnecessary".
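The review of the plan described for FIG. 17 can be sketched as follows. The sketch assumes that a segment set is represented simply by the list of LUN numbers currently holding its segments and that unused LUNs are taken in ascending order; the function name and the sample LUN numbers are hypothetical.

```python
def review_segment_set(current_luns, all_luns):
    """Return {segment index: new LUN} for only the segments that must move.

    current_luns[i] is the LUN currently holding segment i of one segment set.
    """
    used = set()
    duplicates = []
    for i, lun in enumerate(current_luns):
        if lun in used:
            duplicates.append(i)   # shares a LUN with an earlier segment
        else:
            used.add(lun)
    free = [lun for lun in all_luns if lun not in used]
    # If fewer free LUNs than duplicates exist, zip leaves the rest unmoved.
    return dict(zip(duplicates, free))


# Two segments (indexes 4 and 5) currently share LUN 5.
moves = review_segment_set([1, 2, 3, 4, 5, 5, 7, 8], all_luns=range(1, 9))
states = ["awaiting rearrangement" if i in moves else "rearrangement unnecessary"
          for i in range(8)]
```

Only one segment is planned for copying in this example; the other seven keep their current LUNs even though their order differs from the mechanically created plan.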
[0161] The rearrangement plan that reduces the copying processes to
transfer the segments consequent to rearrangement in the same SU
will be described. The plan for rearrangement in the same SU is
created, for example, after the rearrangement plan for SUs is
executed to equalize the allocated amounts q of the segments of
each LUN among the SUs.
[0162] FIG. 18 is an explanatory diagram (Part II) of the example
of rearrangement of the volumes. In FIG. 18, it is assumed that a
segment set constituted of segments "a" to "p" is disposed as that
"before the rearrangement" in an SU (for example, the SU #1). In
this case, for example, the PU #1 can line up the segments a to h
using an unused area of segments (white squares in FIG. 18) and
segments made unused by the transfer of the segments, as that of
"rearrangement proposal". In FIG. 18, the black squares each
represent a used area of the segment.
[0163] However, the RAID groups to which the segments "a" to d and
e to g are arranged differ from each other and therefore, when only
the segment h is transferred to the other RAID group, the
performance is sufficiently improved from the viewpoint of the IO
access performance. Therefore, the PU #1 establishes a
rearrangement plan to transfer only the segment h.
[0164] As a result, only the segment h is transferred to another
LUN as represented by "after rearrangement" and the segments are
equalized among the LUNs. In this manner, the rearrangement plan is
established to reduce the copying processes to transfer the
segments consequent to the rearrangement in the same SU and
thereby, the access performance can be improved while suppressing
extra transfers of the segments. Access between the PU and SU for
the rearrangement can also be reduced.
[0165] An example of the details of the rearrangement process of
the volumes will be described. Each PU controls the rearrangement
process such that the business operation is not obstructed by the
use of internal line bandwidth for moving segments and the
influence of the CPU loads of the PUs, caused by the rearrangement
process.
[0166] For example, the PU (for example, the PU #1) determines
whether the number of accesses per second from the business server
BS (input output per second: IOPS) reaches the maximal IOPS that
can be processed by the PU. If the PU determines that the IOPS from
the business server BS reaches the maximal IOPS, the PU does not
execute the rearrangement process and prioritizes the business
IOs.
[0167] On the other hand, if the PU determines that the IOPS from
the business server BS has not reached the maximal IOPS, the PU
executes the rearrangement process using an unused portion of the
IOPS. The "unused portion of the IOPS" refers to the portion
acquired by subtracting the current IOPS from the maximal IOPS.
Thereby, the rearrangement of the volumes can be executed while
minimizing the influence on the business operation and without
discontinuing the business operation.
[0168] However, it can be considered that the IOPS of the business
IO is reduced by the use of the bandwidth between the PU and SU,
and by the increase of the CPU loads of the PU caused by the
execution of the rearrangement process. The PU may enable the
business IOPS to be maintained by not only monitoring the
difference between the current IOPS and the maximal IOPS of the PU
but also by thinning the rearrangement process when the reduction
rate of the current IOPS exceeds a predetermined rate (for example,
15[%]) due to the rearrangement process.
[0169] For example, as below, when the current IOPS is greater than
or equal to 95% of the maximal IOPS, the PU may insert a sleep (for
example, waiting for about one to five [sec]) into the process, to
suppress the rearrangement process. "x" represents the maximal IOPS
of the PU and "y" represents the current IOPS. The maximal IOPS of
the PU is set in advance.
[0170] 0.95x ≤ y: The rearrangement process is caused to sleep.
0.95x > y: The rearrangement process is operated.
[0171] When the PU causes the rearrangement to operate and thereby,
"(the current y) ≤ 0.85 × (immediately previous y)" is
established, that is, the current IOPS has decreased by 15[%] or
more, the PU inserts the sleep into the rearrangement process and
thereby, suppresses the effect on the business operation. The
"immediately previous y" is, for example, the current IOPS acquired
immediately before the rearrangement process.
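The two throttling conditions can be combined into a single decision, sketched below. The function name and the parameter names are hypothetical. The second condition is read here as "the current IOPS has dropped to 85% or less of the immediately previous IOPS", which is an interpretation chosen to agree with the 15[%] reduction rate mentioned earlier and with step S2114.

```python
def should_sleep(max_iops, current_iops, previous_iops=None,
                 busy_percent=95, drop_percent=85):
    """Decide whether the rearrangement process should sleep.

    max_iops      : x, the maximal IOPS of the PU (set in advance)
    current_iops  : y, the current IOPS of the business IO
    previous_iops : y acquired immediately before the rearrangement process
    """
    # Business IO is at or above 95% of the maximum: 0.95x <= y.
    if current_iops * 100 >= busy_percent * max_iops:
        return True
    # Business IO fell to 85% or less of the immediately previous value.
    if previous_iops is not None and current_iops * 100 <= drop_percent * previous_iops:
        return True
    return False


busy = should_sleep(1000, 950)
idle = should_sleep(1000, 900)
dropped = should_sleep(1000, 500, previous_iops=600)
steady = should_sleep(1000, 580, previous_iops=600)
```

Integer percentages are used instead of 0.95 and 0.85 so that the boundary cases are not affected by floating-point rounding.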
[0172] Procedures for various processes of the storage system SM
according to the first example will be described. The procedure for
a node addition process for the storage system SM will be
described. The procedure for the node addition process for the
storage system SM will be described taking an example of a case
where the additional node N2 is added to the base node N1.
[0173] FIGS. 19 and 20 are sequence diagrams of an example of the
procedure for the node addition process for the storage system SM.
In the sequence diagram of FIG. 19, a customer engineer (CE)
physically connects the additional node N2 to the base node N1 and
turns on the power of the SU #2 (step S1901).
[0174] The cluster control M detects the addition of the SU #2
(step S1902) and notifies the apparatus management GUI control unit
#1 of the detection of the addition of the SU #2 (step S1903). The
apparatus management GUI control unit #1 outputs an SU addition
detection event to the GUI of the management server MS (step
S1904).
[0175] The cluster control M instructs the storage control unit #2
of the SU #2 to allocate a new management IP address (step S1905).
The storage control unit #2 of the SU #2 sets the value of the
management IP address to be the instructed value (step S1906). The
cluster control M instructs the SU control units #1 and #2
respectively of the PUs #1 and #2 to establish connections to the
SU #2 (step S1907).
[0176] The SU control unit #1 detects the LUN for the management DB
and the LUN for user data of the SU #2 (step S1908). The SU control
unit #2 detects the LUN for the management DB and the LUN for the
user data of the SU #2 (step S1909). The SU control unit #1
executes a log-in process for the detected LUNs (step S1910). The
SU control unit #2 executes the log-in process for the detected
LUNs (step S1911).
[0177] The SU control unit #1 notifies the cluster control M of the
completion of the connection to the SU #2 (step S1912). The SU
control unit #2 notifies the cluster control M of the completion of
the connection to the SU #2 (step S1913). The cluster control M
notifies the apparatus management GUI control unit #1 of the
completion of the addition of the SU #2 (step S1914). The apparatus
management GUI control unit #1 outputs an SU addition completion
event to the GUI of the management server MS (step S1915).
[0178] The CE turns on the power of the PU #3 of the additional
node N2 (step S1916). When the cluster control M detects the
addition of the PU #3, the cluster control M notifies the apparatus
management GUI control unit #1 of the detection of the addition of
the PU #3 (step S1917) and outputs a PU detection event to the GUI
of the management server MS (step S1918).
[0179] In the sequence diagram of FIG. 20, the cluster control M
instructs the PU control unit #3 to set an IP address for the
detected PU #3 (step S1919). The PU control unit #3 changes the IP
address to a management IP address (step S1920). The cluster
control M instructs the SU control unit #3 of the PU #3 to
establish a connection to the SUs #1 and #2 (step S1921).
[0180] The SU control unit #3 detects the LUN for the management DB
and the LUN for the user data of the SU #1 (step S1922), executes
the log-in process for the detected LUNs (step S1923), and detects
the LUN for the management DB and the LUN for the user data of the
SU #2 (step S1924).
[0181] The SU control unit #3 executes the log-in process for the
detected LUNs (step S1925) and notifies the cluster control M of
the completion of the connection to the SUs #1 and #2 (step S1926).
The cluster control M instructs the cluster control units #2 and #3
respectively of the PUs #2 and #3 to change the cluster (step
S1927).
[0182] The cluster control M incorporates the PU #3 into the
cluster management information and thereby, updates the cluster
configuration to that including the PUs #1, #2, and #3 (step
S1928). The cluster control unit #2 incorporates the PU #3 into the
cluster management information and thereby, updates the cluster
configuration to that including the PUs #1, #2, and #3 (step
S1929). The cluster control unit #3 incorporates the PU #3 into the
cluster management information and thereby, updates the cluster
configuration to that including the PUs #1, #2, and #3 (step
S1930).
[0183] The cluster control M notifies the apparatus management GUI
control unit #1 of the completion of the addition of the PU #3
(step S1931). The apparatus management GUI control unit #1 outputs
a PU addition completion event to the GUI of the management server
MS (step S1932) and outputs a scale out button to the GUI of the
management server MS (step S1933).
[0184] When a user clicks the "scale out button" on the GUI of the
management server MS, meaning that the user approves the completion
of the connection and the internal apparatus incorporation, the
addition process is completed. Consequent to the completion
instruction for the scale out, the storage capacity of the storage
system SM can be increased by the capacity of the SU #2 and new
data can also be stored in the SU #2.
[0185] A procedure for the rearrangement process of the storage
system SM will be described. The rearrangement process is executed,
for example, after the scale out of the storage system SM is
completed or when a rearrangement instruction is issued from the
GUI screen on the management server MS.
[0186] FIGS. 21, 22, 23, and 24 are sequence diagrams of an example
of the procedure for the rearrangement process for the storage
system SM. In the sequence diagram of FIG. 21, the apparatus
management GUI of the management server MS notifies the apparatus
management GUI control unit #1 of the PU #1 of a scale out
instruction or a rearrangement instruction (step S2101).
Notification of a scale out instruction is given, for example, when
the "scale out button" on the GUI screen is clicked. Notification
of a rearrangement instruction is given, for example, when a
"rearrangement button" on the GUI screen is clicked.
[0187] The apparatus management GUI control unit #1 of the PU #1
determines whether the apparatus management GUI control unit #1 has
received a scale out instruction (step S2102). If the apparatus
management GUI control unit #1 determines that the apparatus
management GUI control unit #1 has received a scale out instruction
(step S2102: YES), the apparatus management GUI control unit #1
notifies the volume manager M of the scale out instruction and the
volume manager M adds a capacity corresponding to that of the added
SU #2 to the overall capacity of the storage system SM and thereby,
sets the area of the SU #2 to also be available (step S2103).
[0188] On the other hand, if the apparatus management GUI control
unit #1 determines that the apparatus management GUI control unit
#1 has received a rearrangement instruction (step S2102: NO), the
apparatus management GUI control unit #1 notifies the volume
manager M of the rearrangement instruction (step S2104). The volume
manager M reads the volume management table 710 from the
configuration management DB (step S2105).
[0189] In the sequence diagram of FIG. 22, the volume manager M
refers to the volume management table 710 and calculates the
allocated amount q of the segments of each LUN in the storage
system SM (step S2106). The volume manager M determines whether
unevenness of the allocated amount q of the segments of each LUN in
the storage system SM is present, based on the calculated allocated
amount q of the segments of each LUN (step S2107).
[0190] If the volume manager M determines that no unevenness is
present (step S2107: NO), the volume manager M progresses to the
process at step S2119 of FIG. 24. On the other hand, if the volume
manager M determines that unevenness is present (step S2107: YES),
the volume manager M establishes a rearrangement plan and creates
the rearrangement plan table 720 (step S2108). In this case, the
volume manager M sets the rearrangement state in the rearrangement
plan table 720 to be "awaiting rearrangement" and also sets the
rearrangement state in the volume segment table 1000 to be "during
rearrangement process", for the segments to be rearranged.
[0191] The volume manager M refers to the rearrangement plan table
720 and creates the rearrangement control threads for the PUs #1,
#2, and #3 (step S2109). The rearrangement control threads for the
PUs #1, #2, and #3 instruct the volume managers A #1, #2, and #3 of
the PUs #1, #2, and #3 to rearrange segment-by-segment, the volumes
instructed by the volume manager M (step S2110).
[0192] For example, the rearrangement control threads for the PUs
#1, #2, and #3 respectively notify the volume managers A #1, #2,
and #3 of the PUs #1, #2, and #3 of information concerning disks
that are to be rearranged (information to identify the disks, and
the segments to be transferred) and information concerning the
destinations to which the segments are to be transferred.
[0193] In the sequence diagram of FIG. 23, the volume managers A
#1, #2, and #3 determine whether the current IOPS of the business
IO is greater than or equal to 95% of the maximal IOPS (step
S2111). If the volume managers A #1, #2, and #3 determine that the
current IOPS is greater than or equal to 95% of the maximal IOPS
(step S2111: YES), the volume managers A #1, #2, and #3 sleep for a
specific time period (step S2112) and return to the process at step
S2111.
[0194] On the other hand, if the volume managers A #1, #2, and #3
determine that the current IOPS is lower than 95% of the maximal
IOPS (step S2111: NO), the volume managers A #1, #2, and #3 execute
copying of the segments according to the instruction and thereby,
update the instructed segments (step S2113).
[0195] The volume managers A #1, #2, and #3 determine whether the
IOPS of the business IO decreases by 15% (step S2114). If the
volume managers A #1, #2, and #3 determine that the IOPS decreases
by 15% (step S2114: YES), the volume managers A #1, #2, and #3
sleep for a specific time period (step S2115) and return to the
process at step S2114.
[0196] On the other hand, if the volume managers A #1, #2, and #3
determine that the IOPS does not decrease by 15% (step S2114: NO),
the volume managers A #1, #2, and #3 respectively notify the
rearrangement control threads of the PUs #1, #2, and #3 that issued
the instruction, of the completion of the copying (step S2116).
The rearrangement control threads receive the notification of the
completion of the copying and, for the segments whose copying is
completed, set the rearrangement state in the rearrangement plan
table 720 to be "rearrangement completed" and the rearrangement
state in the volume segment table 1000 to be blank.
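The per-segment flow of steps S2111 to S2116 can be sketched as a loop over the plan entries. The tables are modeled as plain dictionaries, and `copy_segment` and `is_busy` are hypothetical callables standing in for the copying process of the volume manager A and for the check at step S2111. The check after copying (steps S2114 and S2115) is omitted for brevity.

```python
import time


def rearrange(plan_table, segment_table, copy_segment, is_busy, sleep_sec=0.0):
    """Process every entry that is awaiting rearrangement, one segment at a time."""
    for segment, state in list(plan_table.items()):
        if state != "awaiting rearrangement":
            continue                     # e.g. "rearrangement unnecessary"
        while is_busy():                 # corresponds to steps S2111 and S2112
            time.sleep(sleep_sec)
        copy_segment(segment)            # corresponds to step S2113
        # State updates made after the completion notification (step S2116).
        plan_table[segment] = "rearrangement completed"
        segment_table[segment] = ""      # blank


plan_table = {"A0": "rearrangement unnecessary", "A8": "awaiting rearrangement"}
segment_table = {"A0": "", "A8": "during rearrangement process"}
copied = []
busy_answers = iter([True, False])       # busy once, then free
rearrange(plan_table, segment_table, copied.append, lambda: next(busy_answers))
```

Because the states are updated entry by entry, the plan table always shows which segments remain, which is what allows the check at step S2117 to be made from the table alone.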
[0197] In the sequence diagram of FIG. 24, the rearrangement
control threads of the PUs #1, #2, and #3 refer to the
rearrangement plan table 720 and determine whether any segment
remains that has not been rearranged (step S2117). If the
rearrangement control threads determine that such a segment is
present (step S2117: YES), the rearrangement control threads of the
PUs #1, #2, and #3 return to the process at step S2110 depicted in
FIG. 22.
[0198] On the other hand, if the rearrangement control threads
determine that no such segment is present (step S2117: NO), the
rearrangement control threads of the PUs #1, #2, and #3 notify the
volume manager M of the completion of the rearrangement of the
volumes. The volume manager M refers to the rearrangement plan
table 720 and determines whether any unprocessed volume is present
(step S2118).
[0199] If the volume manager M determines that an unprocessed
volume is present (step S2118: YES), the volume manager M returns
to the process at step S2109 depicted in FIG. 22. On the other
hand, if the volume manager M determines that no unprocessed volume
is present (step S2118: NO), the volume manager M determines
whether the storage system SM has started operation triggered by
the rearrangement instruction (step S2119).
[0200] If the volume manager M determines that the storage system
SM has started operation triggered by the rearrangement instruction
(step S2119: YES), the storage system SM causes the series of
operations to come to an end. On the other hand, if the volume
manager M determines that the storage system SM has started
operation triggered by the scale out instruction (step S2119: NO),
the volume manager M sleeps for a specific time period (step S2120)
and returns to the process at step S2105 depicted in FIG. 21.
[0201] Thus, rearrangement of the volumes can be executed such that
the allocated amount q of the segments of each LUN is equalized
among the SUs. When the storage system SM starts operation
triggered by a scale out instruction, it can be periodically
determined whether any unevenness of the allocated amount q of the
segments of each LUN in the storage system SM is present, and
rearrangement of the volumes can be executed.
[0202] A procedure for a rearrangement suspension process for the
storage system SM will be described. A procedure for the
rearrangement suspension process will be described that is executed
when the user of the management server MS issues a suspension
instruction for the rearrangement process.
[0203] FIG. 25 is a sequence diagram of an example of the procedure
for a first rearrangement suspension process for the storage system
SM. In the sequence diagram of FIG. 25, when the apparatus
management GUI of the management server MS receives a suspension
instruction for the rearrangement process, the apparatus management
GUI notifies the apparatus management GUI control unit #1 of the PU
#1 of the suspension instruction for the rearrangement process
(step S2501).
[0204] When the apparatus management GUI control unit #1 receives
the suspension instruction for the rearrangement process, the
apparatus management GUI control unit #1 notifies the volume
manager M of the suspension instruction for the rearrangement
process (step S2502). The volume manager M changes the
rearrangement states in the volume management table 710 and the
rearrangement plan table 720 to "rearrangement completed" (step
S2503).
[0205] The volume manager M gives to the rearrangement control
threads of the PUs #1, #2, and #3 executing the rearrangement
process, notification of the suspension instruction for the
rearrangement process (step S2504). The rearrangement control
threads of the PUs #1, #2, and #3 suspend the rearrangement process
currently under execution (step S2505). The volume manager M
discards the rearrangement plan table 720 (step S2506) and the
storage system SM causes the series of operations to come to an
end. Thus, the user of the management server MS can suspend, at an
arbitrary timing, the rearrangement process currently under
execution.
[0206] A procedure will be described for the rearrangement
suspension process for the storage system SM executed when an event
for suspension of the rearrangement occurs. An event for suspension
of the rearrangement can be, for example, execution of a new scale
out session, stoppage of the RAID group, or deletion of a LUN in
the SU.
[0207] FIG. 26 is a sequence diagram of an example of a procedure
for a second rearrangement suspension process of the storage system
SM. In the sequence diagram of FIG. 26, when an event occurs for
suspension of the rearrangement, the volume manager M changes the
rearrangement states in the volume management table 710 and the
rearrangement plan table 720 to "rearrangement completed" (step
S2601).
[0208] The volume manager M gives to the rearrangement control
threads of the PUs #1, #2, and #3 currently executing the
rearrangement process, notification of the suspension instruction
for the rearrangement process (step S2602). The rearrangement
control threads of the PUs #1, #2, and #3 suspend the rearrangement
process currently under execution (step S2603). The volume manager
M discards the rearrangement plan table 720 (step S2604) and the
storage system SM causes the series of operations to come to an
end. Thus, when an event for suspension of the rearrangement
occurs, the rearrangement process currently under execution can be
suspended.
[0209] A procedure for a temporary rearrangement suspension process
for the storage system SM will be described. The temporary
rearrangement suspension process is executed, for example, when
temporary suspension of the rearrangement process is desired
consequent to maintenance and inspection, etc., of the PUs or the
SUs.
[0210] FIG. 27 is a sequence diagram of an example of the procedure
for the temporary rearrangement suspension process of the storage
system SM. In the sequence diagram of FIG. 27, when the apparatus
management GUI of the management server MS receives a temporary
suspension instruction for the rearrangement process, the apparatus
management GUI gives to the apparatus management GUI control unit
#1 of the PU #1, notification of the temporary suspension
instruction for the rearrangement process (step S2701).
[0211] When the apparatus management GUI control unit #1 receives
the temporary suspension instruction for the rearrangement process,
the apparatus management GUI control unit #1 notifies the volume
manager M of the temporary suspension instruction for the
rearrangement process (step S2702). The volume manager M changes
the rearrangement state to "temporarily suspended" for the entry
whose rearrangement state is "under rearrangement" in each of the
volume management table 710 and the rearrangement plan table 720
(step S2703).
[0212] The volume manager M gives to the rearrangement control
threads of the PUs #1, #2, and #3 currently executing the
rearrangement process, notification of the suspension instruction
for the rearrangement process (step S2704). The rearrangement
control threads of the PUs #1, #2, and #3 suspend the rearrangement
process currently under execution (step S2705) and the storage
system SM causes the series of operations to come to an end. Thus,
at an arbitrary timing, the user of the management server MS can
temporarily suspend the rearrangement process currently under
execution.
[0213] A procedure for a rearrangement restart process for the
storage system SM will be described. The rearrangement restart
process is executed when the rearrangement is restarted after the
rearrangement process is temporarily suspended consequent to
maintenance and inspection, etc., of the PUs or the SUs.
[0214] FIG. 28 is a sequence diagram of an example of the procedure
for the rearrangement restart process for the storage system SM. In
the sequence diagram of FIG. 28, when the apparatus management GUI
of the management server MS receives a restart instruction for the
rearrangement process, the apparatus management GUI gives to the
apparatus management GUI control unit #1 of the PU #1, notification
of the restart instruction for the rearrangement process (step
S2801).
[0215] When the apparatus management GUI control unit #1 receives
the restart instruction for the rearrangement process, the
apparatus management GUI control unit #1 notifies the volume
manager M of the restart instruction for the rearrangement process
(step S2802). The volume manager M searches for an entry whose
rearrangement state is "temporarily suspended" in both the volume
management table 710 and the rearrangement plan table 720 (step
S2803) and progresses to the process at step S2109 depicted in FIG.
22. Thereby, the user of the management server MS can restart the
temporarily suspended rearrangement process at an arbitrary
timing.
[0216] As described, according to the storage system SM of the
first example, data stored before the scale out can also be
reallocated across all the SUs in the storage system SM. Thereby,
the access performance can be improved to a level that corresponds
to the potential of the storage system SM after the scale out.
[0217] A second example of the storage system SM according to the
embodiment will be described. Portions identical to those described
in the first example will not again be depicted or described.
[0218] FIG. 29 is an explanatory diagram of an example of system
configuration of the storage system SM according to the second
example. In FIG. 29, the storage system SM includes a transfer
source storage apparatus 2901 and a transfer destination storage
apparatus 2902. The transfer destination storage apparatus 2902
corresponds to, for example, the base node N1 (or the base node N1
and the additional node N2) depicted in FIG. 7, and is connected to
the business server BS. The transfer source storage apparatus 2901
and the transfer destination storage apparatus 2902 are connected
to each other through, for example, an I/O LAN.
[0219] For example, a connection port of the transfer source
storage apparatus 2901 for the business server BS is connected to a
data transfer port of the transfer destination storage apparatus
2902. Thereby, data exchanges can be executed between the transfer
source storage apparatus 2901 and the transfer destination storage
apparatus 2902 without adding any special I/O port to the transfer
source storage apparatus 2901.
[0220] For example, the user of the management server MS sets the
volume to be transferred of the transfer source storage apparatus
2901 to be accessible from the transfer destination storage
apparatus 2902, such that the transfer destination storage apparatus
2902 can access the transfer source storage apparatus 2901. The
transfer destination storage apparatus 2902 accesses the volume to
be transferred of the transfer source storage apparatus 2901,
autonomously creates a volume corresponding to the volume to be
transferred in the transfer destination, and copies the pieces of
data between the volumes.
[0221] The business server BS executes the business IO for the
volume of the transfer destination storage apparatus 2902. If the
transfer destination storage apparatus 2902 receives a read
instruction for data not present in the volume of the transfer
destination, the transfer destination storage apparatus 2902 reads
the corresponding data from the corresponding volume of the
transfer source storage apparatus 2901, transmits the data to the
business server BS, and stores the data into the corresponding
volume of the transfer destination storage apparatus 2902.
[0222] When the data is updated, the transfer destination storage
apparatus 2902 updates the data for the volume of the transfer
destination storage apparatus 2902. In this case, the transfer
destination storage apparatus 2902 may reflect the data to be
updated also on the transfer source storage apparatus 2901.
Thereby, the data transfer between the corresponding volumes is
finally completed.
[0223] FIG. 30 is an explanatory diagram of an example of
functional configuration of the PU according to the second example.
In FIG. 30, the storage system SM includes the base node N1 and the
additional node N2. The base node N1 includes the PUs #1 and #2,
and the SU #1. The additional node N2 includes the PU #3 and the SU
#2.
[0224] The PUs #1 and #2 in the base node N1, the PU #3 in the
additional node N2, and the management server MS are connected to
each other through the management LAN. The PUs #1 and #2 in the
base node N1, the PU #3 in the additional node N2, and the business
server BS are connected to each other through the I/O LAN.
[0225] The PUs #1, #2, and #3, and the SUs #1 and #2 are connected
to each other through the I/O LAN and the internal management LAN.
The SUs #1 and #2 respectively include the configuration management
DBs #1 and #2, and the storage control units #1 and #2. The
configuration management DBs #1 and #2 each include the volume
management table 710 and the rearrangement plan table 720.
[0226] The PU #1 includes the I/O control unit #1, the PU control
unit #1, the cluster control M, the apparatus management GUI
control unit #1, a transfer VOL control unit #1, a data transfer
control unit #1, the PU load monitoring unit #1, the SU control
unit #1, and the volume managers M and A #1. The PU #2 includes the
I/O control unit #2, the PU control unit #2, the cluster control
unit #2, the PU load monitoring unit #2, the SU control unit #2,
and the volume manager A #2. The PU #3 includes the I/O control
unit #3, the PU control unit #3, the cluster control unit #3, the
PU load monitoring unit #3, the SU control unit #3, and the volume
manager A #3.
[0227] The transfer VOL control unit #1 reads the volume
information concerning the transfer source storage apparatus 2901
and creates a volume of the transfer destination. In this creation,
the transfer VOL control unit #1 arranges volumes such that the
number of created volumes is equalized taking into consideration,
for example, the load balance among the PUs #1 to #3 and the SUs #1
and #2. When data are transferred during the rearrangement, the
transfer VOL control unit #1, for example, refers to the
rearrangement plan table 720 and arranges the volumes to equalize
the number of created volumes. The data transfer control unit #1
controls the transfer of data between the storage apparatuses.
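One possible way to equalize the number of created volumes is sketched below. It is only an illustration under the assumption that the balance is judged by a per-PU volume count; the function name, the parameters, and the tie-breaking rule are hypothetical, and the load balance among the SUs mentioned above is not modeled.

```python
def assign_volumes(volume_ids, pu_ids, existing_counts=None):
    """Assign each volume to the PU that currently holds the fewest volumes.

    existing_counts is an optional, hypothetical mapping from a PU to the
    number of volumes it already holds. Ties go to the PU listed first.
    """
    counts = {pu: 0 for pu in pu_ids}
    if existing_counts:
        counts.update(existing_counts)
    placement = {}
    for vol in volume_ids:
        target = min(pu_ids, key=lambda pu: counts[pu])
        placement[vol] = target
        counts[target] += 1
    return placement
```

For example, six new volumes spread over the PUs #1 to #3 end up as two volumes per PU, and a PU that already holds more volumes than the others is skipped until the others catch up.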
[0228] The contents of a transfer source/destination volume
correspondence table 3100 used by the transfer destination storage
apparatus 2902 will be described. The transfer source/destination
volume correspondence table 3100 is information indicating which
volume of the transfer destination storage apparatus 2902, a volume
of the transfer source storage apparatus 2901 corresponds to. The
transfer source/destination volume correspondence table 3100 is
correlated with the volume management table 710 and the
rearrangement plan table 720 using the volume numbers as
indexes.
[0229] FIG. 31 is an explanatory diagram of an example of the
contents of the transfer source/destination volume correspondence
table 3100. In FIG. 31, the transfer source/destination volume
correspondence table 3100 has fields for the transfer source target
ID, the transfer source LUN number, the transfer source size, the
transfer destination target ID, the transfer destination LUN
number, and the volume number. By setting information in each of
the fields, transfer source/destination correspondence information
(for example, transfer source/destination correspondence
information 3100-1 to 3100-n) is stored as records.
[0230] The transfer source target ID is an identifier of the volume
of the transfer source. The transfer source LUN number is the LUN
number of the volume of the transfer source. The transfer source
size is the size (bytes) of the volume of the transfer source. The
transfer destination target ID is an identifier of the volume of
the transfer destination. The transfer destination LUN number is
the LUN number of the volume of the transfer destination. The
volume number is the volume (VDISK) number. The volume number may
be included in the transfer destination target ID.
[0231] The transfer destination storage apparatus 2902 makes an
inquiry to the transfer source storage apparatus 2901 about the
volumes that are present therein and stores the inquiry result into
the transfer source/destination volume correspondence table 3100,
with each volume as one entry. For example, the transfer
destination storage apparatus 2902 stores into the transfer
source/destination volume correspondence table 3100, the transfer
destination target ID, the transfer destination LUN number, and the
volume number that correspond to the transfer source target ID, the
transfer source LUN number, and the transfer source size.
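The fields of the transfer source/destination volume correspondence table 3100 can be modeled, for illustration only, as the record below. The six fields follow FIG. 31; the two-stage filling (source fields first, destination fields once the volume exists), the function names, and the use of the pair of target ID and LUN number as the lookup key are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Correspondence:
    """One record (3100-1 to 3100-n) of the correspondence table."""
    source_target_id: str
    source_lun: int
    source_size: int                      # bytes
    dest_target_id: Optional[str] = None  # unset until the volume is created
    dest_lun: Optional[int] = None
    volume_number: Optional[int] = None   # volume (VDISK) number


def register_sources(inquiry_result):
    """Create one entry per source volume from (target ID, LUN, size) tuples."""
    return [Correspondence(t, lun, size) for (t, lun, size) in inquiry_result]


def bind_destination(table, source_target_id, source_lun,
                     dest_target_id, dest_lun, volume_number):
    """Fill in the destination fields of the entry matching the source volume."""
    for rec in table:
        if (rec.source_target_id, rec.source_lun) == (source_target_id, source_lun):
            rec.dest_target_id = dest_target_id
            rec.dest_lun = dest_lun
            rec.volume_number = volume_number
            return rec
    raise KeyError((source_target_id, source_lun))
```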
[0232] When the volumes of the transfer destination are created,
the volume information concerning each of the volumes is registered
into the volume index table 800. As to a segment allocated to any
one SU among the segments constituting the volume, segment
information concerning the segment is registered into the volume
segment table 1000. Therefore, when scale out is executed for the
transfer destination storage apparatus 2902 during data transfer,
the data are also rearranged to establish capacity balance among
the real storage apparatus added by the scale out and the existing
real storage.
[0233] A procedure for a data transfer process of the storage
system SM according to the second example will be described. The
description will be made with reference to FIGS. 32 and 33 that
depict only the portions corresponding to the base node N1 as the
transfer destination storage apparatus 2902.
[0234] FIGS. 32 and 33 are sequence diagrams of an example of the
procedure for the data transfer process of the storage system SM.
In the sequence diagram of FIG. 32, the CE connects the transfer
destination storage apparatus 2902 to the I/O LAN to which the
transfer source storage apparatus 2901 is connected (step S3201).
As a result, the storage control unit #3 of the transfer source
storage apparatus 2901 connects the transfer source storage
apparatus 2901 and the transfer destination storage apparatus
2902.
[0235] The CE disconnects the logical connection to the business
server BS from the transfer source storage apparatus 2901 (step
S3202). The storage control unit #3 of the transfer source storage
apparatus 2901 changes access privilege for the volume that is to
be transferred of the transfer source storage apparatus 2901, from
the business server BS to the transfer destination storage
apparatus 2902 (step S3203).
[0236] The management server MS instructs the transfer destination
storage apparatus 2902 through the apparatus management GUI, to
read the volume information of the transfer source storage
apparatus 2901 (step S3204). The transfer VOL control unit #1 of
the transfer destination storage apparatus 2902 reads the volume
information of the transfer source storage apparatus 2901 (step
S3205).
[0237] The transfer VOL control unit #1 creates the transfer
source/destination volume correspondence table 3100 based on the
read volume information (step S3206). In FIG. 32, the transfer
source/destination volume correspondence table 3100 is simply
labeled as "correspondence table". The transfer VOL control unit #1
refers to the transfer source/destination volume correspondence
table 3100 and instructs the volume manager M to create a volume
having the same size as that of the volume of the transfer source
(step S3207).
[0238] The volume manager M evenly allocates the volumes to be
created to the PUs #1 and #2 (step S3208). As a result, the volume
manager M gives to the volume managers A #1 and #2 of the PUs #1
and #2, notification of a creation instruction for the volumes. The
volume managers A #1 and #2 instruct the storage control unit #1 of
each SU #1 to allocate the segments of the volumes to be created
(step S3209).
[0239] Consequently, the storage control unit #1 of each SU #1
writes the designated segment data into the LUN #n and gives to the
volume managers A #1 and #2 of the request source, notification of
the allocation completion. The volume managers A #1 and #2 notify
the volume manager M of the VOL creation completion, and the volume
manager M notifies the transfer VOL control unit #1 of the transfer
VOL creation completion.
[0240] The transfer VOL control unit #1 determines whether the
creation of the volumes of the transfer destination has been
completed (step S3210). If the transfer VOL control unit #1
determines that the creation of the volumes of the transfer
destination has not been completed (step S3210: NO), the transfer
VOL control unit #1 returns to the process at step S3207. On the
other hand, if the transfer VOL control unit #1 determines that the
creation of the volumes of the transfer destination has been
completed (step S3210: YES), the transfer VOL control unit #1
publishes the transfer source/destination volume correspondence
table 3100 such that the table 3100 can be referred to from the
apparatus management GUI of the management server MS (step
S3211).
[0241] The transfer VOL control unit #1 instructs each of the PUs
#1 and #2 to which the volumes of the transfer destination are
allocated, to read the data to be transferred from the transfer
source storage apparatus 2901 (step S3212). As a result, the data
transfer control unit #1 (not depicted) starts a process of reading
the data from the volumes of the transfer source of the transfer
source storage apparatus 2901 and writing the read data into the
volumes of the transfer destination.
[0242] The data length (chunk size) used in the data transfer
process is, for example, 256 [KB]. However, the chunk size does not
need to be fixed at 256 [KB] and may be variable in each case
corresponding to the transfer efficiency, or may be a size such as
1 [MB].
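A chunk-wise copy of this kind might look like the following minimal sketch. The volumes are represented by in-memory byte buffers purely for illustration; the function name, the parameters, and the returned list of chunk indexes are hypothetical, and only the default chunk size of 256 [KB] comes from the description.

```python
CHUNK_SIZE = 256 * 1024  # 256 [KB], the example chunk size in the description


def copy_volume(source: bytes, destination: bytearray, chunk_size: int = CHUNK_SIZE):
    """Copy source into destination one chunk at a time.

    Returns the indexes of the chunks that were copied, in order. The
    destination buffer is assumed to be at least as long as the source.
    """
    transferred = []
    for offset in range(0, len(source), chunk_size):
        chunk = source[offset:offset + chunk_size]  # last chunk may be short
        destination[offset:offset + len(chunk)] = chunk
        transferred.append(offset // chunk_size)
    return transferred
```

Passing a different chunk_size, for example 1024 * 1024 for 1 [MB], corresponds to the variable chunk size mentioned above.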
[0243] The transfer VOL control unit #1 determines whether the data
transfer to the volumes of the transfer destination has been
completed (step S3213). If the transfer VOL control unit #1
determines that the data transfer to the volumes of the transfer
destination has not been completed (step S3213: NO), the transfer
VOL control unit #1 returns to the process at step S3212. On the
other hand, if the transfer VOL control unit #1 determines that the
data transfer to the volumes of the transfer destination has been
completed (step S3213: YES), the transfer VOL control unit #1
connects the business server BS to the transfer destination storage
apparatus 2902 in the sequence diagram of FIG. 33 (step S3214).
[0244] The management server MS makes a setting in the business
server BS to enable access to the volumes of the transfer
destination, based on the transfer source/destination volume
correspondence table 3100 that can be referred to from the
apparatus management GUI (step S3215). This setting refers to a
setting for the business server BS to normally access the volumes
(for example, a setting to recognize a device, to bundle recognized
devices using a multi-path, etc.).
[0245] When the transfer destination storage apparatus 2902
receives from the business server BS, a read access for the volumes
of the transfer destination in the transfer destination storage
apparatus 2902, the transfer destination storage apparatus 2902
determines whether the IO area is a transferred area (step S3216).
If the transfer destination storage apparatus 2902 determines that
the IO area is a transferred area (step S3216: YES), the transfer
destination storage apparatus 2902 reads the data from the volumes
of the transfer destination and responds to the business server
BS.
[0246] On the other hand, if the transfer destination storage
apparatus 2902 determines that the IO area is not a transferred
area (step S3216: NO), the transfer destination storage apparatus
2902 reads the data from the corresponding volumes of the transfer
source storage apparatus 2901 (step S3217), writes the read data
into the volumes of the transfer destination, and responds to the
business server BS.
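The read handling of steps S3216 and S3217 can be sketched as follows, assuming that the transferred areas are tracked per chunk. The class, its attributes, and the per-chunk read interface are hypothetical; the source volume is stood in for by a byte buffer.

```python
class DestinationVolume:
    """Illustrative stand-in for a volume of the transfer destination."""

    def __init__(self, source: bytes, chunk_size: int = 256 * 1024):
        self.source = source                 # stands in for the source volume
        self.data = bytearray(len(source))   # destination volume contents
        self.chunk_size = chunk_size
        self.transferred = set()             # indexes of transferred chunks

    def read_chunk(self, index: int) -> bytes:
        start = index * self.chunk_size
        end = min(start + self.chunk_size, len(self.source))
        if index not in self.transferred:    # step S3216: NO
            # Step S3217: fetch from the transfer source and keep a copy.
            self.data[start:end] = self.source[start:end]
            self.transferred.add(index)
        return bytes(self.data[start:end])   # response to the business server
```

After the first read of an untransferred chunk, later reads of the same chunk are served from the destination volume alone.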
[0247] When the transfer destination storage apparatus 2902
receives from the business server BS, a write access for the
volumes of the transfer destination in the transfer destination
storage apparatus 2902, the transfer destination storage apparatus
2902 determines whether the IO area is a transferred area (step
S3216). If the transfer destination storage apparatus 2902
determines that the IO area is a transferred area (step S3216:
YES), the transfer destination storage apparatus 2902 writes the
data into the volumes of the transfer destination and responds to
the business server BS.
[0248] On the other hand, in a case where the transfer destination
storage apparatus 2902 determines that the IO area is not a
transferred area (step S3216: NO), if data complement is necessary,
the transfer destination storage apparatus 2902 reads from the
transfer source storage apparatus 2901, data of the portion for
which the complement is necessary, merges the read data with the
data from the business server BS, and writes the merged data into
the volumes of the transfer destination. If the data complement is
unnecessary, the transfer destination storage apparatus 2902 writes
the data from the business server BS into the volumes of the
transfer destination.
[0249] The data complement is necessary when the data length of the
data to be written is smaller than the chunk size of 256 [KB]
employed for the data transfer from the transfer source storage
apparatus 2901 to the transfer destination storage apparatus 2902.
The data complement is also necessary for the data areas that are
not determined when the data length of the data to be written
exceeds the chunk size.
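The write handling with data complement can be sketched as below. This is one interpretation under stated assumptions: a chunk needs the complement when it has not yet been transferred and the write covers it only partially, which applies both to a write shorter than one chunk and to the partially covered chunks at the edges of a longer write. The function name, the parameters, and the set of transferred chunk indexes are hypothetical.

```python
def write_with_complement(source: bytes, dest: bytearray, transferred: set,
                          offset: int, payload: bytes,
                          chunk_size: int = 256 * 1024):
    """Write payload at offset; return the indexes of complemented chunks."""
    if not payload:
        return []
    end = offset + len(payload)
    if offset < 0 or end > len(dest):
        raise ValueError("write exceeds the volume")
    complemented = []
    for index in range(offset // chunk_size, (end - 1) // chunk_size + 1):
        c_start = index * chunk_size
        c_end = min(c_start + chunk_size, len(source))
        fully_covered = offset <= c_start and end >= c_end
        if index not in transferred and not fully_covered:
            # Complement: read the whole chunk from the transfer source first.
            dest[c_start:c_end] = source[c_start:c_end]
            complemented.append(index)
        transferred.add(index)
    dest[offset:end] = payload  # merge the data from the business server
    return complemented
```

A chunk that the write covers completely is never read from the transfer source, since every byte of it is overwritten anyway.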
[0250] As described, with the storage system SM according
to the second example, when the data is transferred, the transfer
destination storage apparatus 2902 can autonomously create the
volumes that correspond to the volumes of the transfer source in
the transfer destination storage apparatus 2902. Thereby, when the
data is transferred, the work load necessary for the data transfer
can be reduced without any need to manually prepare in the transfer
destination storage apparatus 2902, the volumes that correspond to
the volumes of the transfer source.
[0251] According to the storage system SM, when scale out is
executed for the storage system SM during the data transfer, the
data stored before the scale out and the data to be transferred can
be reallocated overall in the SUs in the storage system SM.
Thereby, improvement can be facilitated of the access performance
that corresponds to the potential of the storage system SM after
the execution of the scale out.
[0252] Although description has been made taking an example of a
case where the SUs are incorporated earlier than the PUs and the
PUs are incorporated later when the scale out is executed for the
storage system SM, the PUs may be incorporated earlier than the SUs
and the SUs may be incorporated later.
[0253] The control method described in the present embodiment may
be implemented by executing a prepared program on a computer such
as a personal computer and a workstation. The program is stored on
a non-transitory, computer-readable recording medium such as a hard
disk, a flexible disk, a CD-ROM, an MO, and a DVD, read out from
the computer-readable medium, and executed by the computer. The
program may be distributed through a network such as the
Internet.
[0254] According to an aspect of the present embodiments,
optimization of the access performance with respect to data stored
before and after a system configuration change is enabled.
[0255] All examples and conditional language provided herein are
intended for pedagogical purposes of aiding the reader in
understanding the invention and the concepts contributed by the
inventor to further the art, and are not to be construed as
limitations to such specifically recited examples and conditions,
nor does the organization of such examples in the specification
relate to a showing of the superiority and inferiority of the
invention. Although one or more embodiments of the present
invention have been described in detail, it should be understood
that the various changes, substitutions, and alterations could be
made hereto without departing from the spirit and scope of the
invention.
* * * * *