U.S. patent number 11,449,402 [Application Number 16/835,749] was granted by the patent office on 2022-09-20 for handling of offline storage disk.
This patent grant is currently assigned to EMC IP Holding Company LLC. The grantee listed for this patent is EMC IP Holding Company LLC. Invention is credited to Jibing Dong, Jian Gao, Jianbin Kang, Baote Zhuo.
United States Patent 11,449,402
Zhuo, et al.
September 20, 2022
Handling of offline storage disk
Abstract
Techniques for storage management involve: in response to a
first disk becoming offline and remaining offline until a first
time point, selecting a second storage slice in a second disk as a
backup storage slice for a first storage slice in the first disk,
the first storage slice being one of slices forming a redundant
array of independent disks (RAID), the slices being located in
different disks. The techniques further involve: writing, between
the first time point and a second time point, data to be written
into the first storage slice in the RAID to the second storage
slice, the second time point being later than the first time point.
The techniques further involve: in response to the first disk
remaining offline until the second time point, replacing the first
storage slice in the RAID with the second storage slice. Such
techniques may improve performance of a RAID-based storage
system.
Inventors: Zhuo; Baote (Beijing, CN), Dong; Jibing (Beijing, CN), Gao; Jian (Beijing, CN), Kang; Jianbin (Beijing, CN)
Applicant: EMC IP Holding Company LLC (Hopkinton, MA, US)
Assignee: EMC IP Holding Company LLC (Hopkinton, MA)
Family ID: 1000006570328
Appl. No.: 16/835,749
Filed: March 31, 2020
Prior Publication Data
Document Identifier: US 20200341874 A1
Publication Date: Oct 29, 2020
Foreign Application Priority Data
Apr 29, 2019 [CN] 201910357279.1
Current U.S. Class: 1/1
Current CPC Class: G06F 3/0659 (20130101); G06F 3/0689 (20130101); G06F 11/2094 (20130101); G06F 3/0619 (20130101); G06F 2201/82 (20130101)
Current International Class: G06F 11/20 (20060101); G06F 3/06 (20060101)
Field of Search: ;714/6.22
Primary Examiner: Leibovich; Yair
Attorney, Agent or Firm: BainwoodHuang
Claims
We claim:
1. A method of storage management, comprising: in response to a
first storage disk becoming offline and remaining offline until a
first time point, selecting a second storage slice in a second
storage disk as a backup storage slice for a first storage slice in
the first storage disk, the first storage slice being one of a
plurality of storage slices forming a redundant array of
independent disks (RAID), the plurality of storage slices being
located in different storage disks; writing, between the first time
point and a second time point, data to be written into the first
storage slice in the RAID to the second storage slice, the second
time point being later than the first time point; and in response
to the first storage disk remaining offline until the second time
point, replacing the first storage slice in the RAID with the
second storage slice.
2. The method of claim 1, wherein replacing the first storage slice
with the second storage slice comprises: rebuilding data in the
first storage slice into the second storage slice, using data in
other storage slices in the RAID.
3. The method of claim 2, wherein rebuilding data in the first
storage slice into the second storage slice comprises: determining
a stripe in the RAID into which data is written before the first
storage disk becoming offline; for the determined stripe, reading,
from other storage slices in the RAID, data associated with the
stripe; calculating, based on the read data, data stored in the
first storage slice associated with the stripe; and writing the
calculated data into the second storage slice.
4. The method of claim 1, further comprising: in response to the
first storage disk restoring online between the first time point
and the second time point, copying data in the second storage slice
into the first storage slice.
5. The method of claim 4, further comprising: recording a stripe in
the RAID into which data is written between the first time point
and the second time point, to perform the copying.
6. The method of claim 4, wherein copying data in the second
storage slice into the first storage slice comprises: determining a
stripe in the RAID into which data is written between the first
time point and the second time point; and copying data in the
second storage slice associated with the stripe into the first
storage slice.
7. The method of claim 1, wherein selecting the second storage
slice in the second storage disk as the backup storage slice
comprises: selecting the second storage disk from a plurality of
storage disks, such that the plurality of storage disks are evenly
used to form a plurality of RAIDs; and selecting a free storage
slice in the second storage disk as the second storage slice.
8. The method of claim 1, further comprising: suspending a write
operation to the RAID during a period from the first storage slice
becoming offline to the first time point.
9. An electronic device, comprising: at least one processor; and at
least one memory storing computer program instructions, the at
least one memory and the computer program instructions are
configured, with the at least one processor, to cause the
electronic device to: in response to a first storage disk becoming
offline and remaining offline until a first time point, selecting a
second storage slice in a second storage disk as a backup storage
slice for a first storage slice in the first storage disk, the
first storage slice being one of a plurality of storage slices
forming a redundant array of independent disks (RAID), the
plurality of storage slices being located in different storage
disks; writing, between the first time point and a second time
point, data to be written into the first storage slice in the RAID
to the second storage slice, the second time point being later than
the first time point; and in response to the first storage disk
remaining offline until the second time point, replacing the first
storage slice in the RAID with the second storage slice.
10. The electronic device of claim 9, wherein the at least one
memory and the computer program instructions are configured, with
the at least one processor, to cause the electronic device to:
rebuilding data in the first storage slice into the second storage
slice, using data in other storage slices in the RAID.
11. The electronic device of claim 10, wherein the at least one
memory and the computer program instructions are configured, with
the at least one processor, to cause the electronic device to:
determining a stripe in the RAID into which data is written before
the first storage disk becoming offline; for the determined stripe,
reading, from other storage slices in the RAID, data associated
with the stripe; calculating, based on the read data, data stored
in the first storage slice associated with the stripe; and writing
the calculated data into the second storage slice.
12. The electronic device of claim 9, wherein the at least one
memory and the computer program instructions are configured, with
the at least one processor, to cause the electronic device to: in
response to the first storage disk restoring online between the
first time point and the second time point, copying data in the
second storage slice into the first storage slice.
13. The electronic device of claim 12, wherein the at least one
memory and the computer program instructions are configured, with
the at least one processor, to cause the electronic device to:
recording a stripe in the RAID into which data is written between
the first time point and the second time point, to perform the
copying.
14. The electronic device of claim 12, wherein the at least one
memory and the computer program instructions are configured, with
the at least one processor, to cause the electronic device to:
determining a stripe in the RAID into which data is written between
the first time point and the second time point; and copying data in
the second storage slice associated with the stripe into the first
storage slice.
15. The electronic device of claim 9, wherein the at least one
memory and the computer program instructions are configured, with
the at least one processor, to cause the electronic device to:
selecting the second storage disk from a plurality of storage
disks, such that the plurality of storage disks are evenly used to
form a plurality of RAIDs; and selecting a free storage slice in
the second storage disk as the second storage slice.
16. The electronic device of claim 9, wherein the at least one
memory and the computer program instructions are configured, with
the at least one processor, to cause the electronic device to:
suspending a write operation to the RAID during a period from the
first storage slice becoming offline to the first time point.
17. A computer program product having a non-transitory computer
readable medium which stores a set of instructions to perform
storage management; the set of instructions, when carried out by
computerized circuitry, causing the computerized circuitry to
perform a method of: in response to a first storage disk becoming
offline and remaining offline until a first time point, selecting a
second storage slice in a second storage disk as a backup storage
slice for a first storage slice in the first storage disk, the
first storage slice being one of a plurality of storage slices
forming a redundant array of independent disks (RAID), the
plurality of storage slices being located in different storage
disks; writing, between the first time point and a second time
point, data to be written into the first storage slice in the RAID
to the second storage slice, the second time point being later than
the first time point; and in response to the first storage disk
remaining offline until the second time point, replacing the first
storage slice in the RAID with the second storage slice.
Description
CROSS REFERENCE TO RELATED APPLICATION
This application claims priority to Chinese Patent Application No.
CN201910357279.1, on file at the China National Intellectual
Property Administration (CNIPA), having a filing date of Apr. 29,
2019, and having "HANDLING OF OFFLINE STORAGE DISK" as a title, the
contents and teachings of which are herein incorporated by
reference in their entirety.
FIELD
Embodiments of the present disclosure generally relate to a
computer system or a storage system, and more particularly, to a
storage management method, an electronic device and a computer
program product.
BACKGROUND
In a conventional redundant array of independent disks (RAID)-based
storage system, if a storage disk (drive) or a storage slice (drive
slice) in a RAID is removed or fails, the storage system will send
an event notification to a backend, making the backend update its
storage disk information. After that, the backend can send an event
notification to a RAID module, to indicate that the storage disk or
the storage slice is offline. Next, the RAID module will update its
storage disk information or storage slice information, wait for a
predetermined duration (e.g., 5 minutes), then trigger a backup
operation wherein the offline storage disk or storage slice is
replaced by a backup storage disk or storage slice, and perform
data rebuilding in the backup storage disk or storage slice.
However, the above-mentioned backup operation and rebuild operation
in the conventional RAID-based storage system have various
shortcomings and deficiencies, which cannot meet the performance
requirements of storage systems in many scenarios, resulting in a
poor user experience.
SUMMARY
Embodiments of the present disclosure relate to a storage
management method, an electronic device and a computer program
product.
In the first aspect of the present disclosure, a storage management
method is provided. The method includes: in response to a first
storage disk becoming offline and remaining offline until a first
time point, selecting a second storage slice in a second storage
disk as a backup storage slice for a first storage slice in the
first storage disk, the first storage slice being one of a
plurality of storage slices forming a redundant array of
independent disks (RAID), the plurality of storage slices being
located in different storage disks. The method further includes:
writing, between the first time point and a second time point, data
to be written into the first storage slice in the RAID to the
second storage slice, the second time point being later than the
first time point. The method further comprises: in response to the
first storage disk remaining offline until the second time point,
replacing the first storage slice in the RAID with the second
storage slice.
In the second aspect of the present disclosure, an electronic
device is provided. The electronic device includes: at least one
processor; and at least one memory storing computer program
instructions, the at least one memory and the computer program
instructions are configured, with the at least one processor, to
cause the electronic device to: in response to a first storage disk
becoming offline and remaining offline until a first time point,
selecting a second storage slice in a second storage disk as a
backup storage slice for a first storage slice in the first storage
disk, the first storage slice being one of a plurality of storage
slices forming a redundant array of independent disks (RAID), the
plurality of storage slices being located in different storage
disks. The at least one memory and the computer program
instructions are further configured to, with the at least one
processor, cause the electronic device to: writing, between the
first time point and a second time point, data to be written into
the first storage slice in the RAID to the second storage slice,
the second time point being later than the first time point. The at
least one memory and the computer program instructions are further
configured to, with the at least one processor, cause the
electronic device to: in response to the first storage disk
remaining offline until the second time point, replacing the first
storage slice in the RAID with the second storage slice.
In the third aspect of the present disclosure, a computer program
product is provided. The computer program product is tangibly
stored on a non-volatile computer-readable medium and includes
machine-executable instructions that, when executed, cause a
machine to perform steps of the method according to the first
aspect.
It will be appreciated that the contents described in the Summary
part are not intended to limit the key or important features of
embodiments of the present disclosure, nor are they used to limit
the scope of the present disclosure. Other features of the present
disclosure will be readily appreciated from the following
description.
BRIEF DESCRIPTION OF THE DRAWINGS
Through the reading of the following detailed description with
reference to the accompanying drawings, the above and other
objectives, features, and advantages of embodiments of the present
disclosure will become more apparent. Several embodiments of the
present disclosure will be illustrated by way of example but not
limitation in the drawings in which:
FIG. 1 is a diagram illustrating an example storage system in which
embodiments of the present disclosure may be implemented;
FIG. 2 is a schematic flow chart illustrating a storage management
method according to an embodiment of the present disclosure.
FIG. 3 is a schematic diagram illustrating that a plurality of
storage slices from a plurality of storage disks form a RAID in
accordance with an embodiment of the present disclosure.
FIG. 4 is a schematic diagram illustrating that selecting a second
storage slice as a backup storage slice for a first storage slice
after the first storage disk remains offline until a first time
point, according to an embodiment of the present disclosure.
FIG. 5 is a schematic diagram illustrating writing data to be
written to a first storage slice into a backup storage slice, in
accordance with an embodiment of the present disclosure.
FIG. 6 is a schematic diagram illustrating rebuilding data in a
first storage slice to a backup storage slice after the first
storage disk remains offline until a second time point, in
accordance with an embodiment of the present disclosure.
FIG. 7 is a schematic diagram illustrating copying data from a
backup storage slice to a first storage slice when a first storage
disk restores online according to an embodiment of the present
disclosure.
FIG. 8 is a schematic block diagram illustrating a device that can
be used to implement embodiments of the present disclosure.
Throughout the drawings, the same or similar reference numerals are
used to refer to the same or similar components.
DETAILED DESCRIPTION
The individual features of the various embodiments, examples, and
implementations disclosed within this document can be combined in
any desired manner that makes technological sense. Furthermore, the
individual features are hereby combined in this manner to form all
possible combinations, permutations and variants except to the
extent that such combinations, permutations and/or variants have
been explicitly excluded or are impractical. Support for such
combinations, permutations and variants is considered to exist
within this document.
It should be understood that the specialized circuitry that
performs one or more of the various operations disclosed herein may
be formed by one or more processors operating in accordance with
specialized instructions persistently stored in memory. Such
components may be arranged in a variety of ways such as tightly
coupled with each other (e.g., where the components electronically
communicate over a computer bus), distributed among different
locations (e.g., where the components electronically communicate
over a computer network), combinations thereof, and so on.
Principles and spirit of the present disclosure will now be
described with reference to various example embodiments illustrated
in the drawings. It should be appreciated that those embodiments
are described merely for the purpose of better understanding and
further implementing the present disclosure for those skilled in
the art and not intended for limiting the scope of the present
disclosure in any manner.
FIG. 1 is a diagram illustrating an example storage system 100 in
which embodiments of the present disclosure may be implemented. As
shown in FIG. 1, the storage system 100 can include a processor 110
and a storage structure 120, wherein the processor 110 is used for
performing operations related to the storage system 100, such as
input/output (I/O) operations, control operations, management
operations, and the like. More generally, the processor 110 can
perform any operation related to the storage system 100, with
cooperation with necessary hardware and software. In addition, the
storage structure 120 is used to organize and manage storage
resources of the storage system 100, such as various physical
storage disks and the like, in an appropriate manner.
As shown in FIG. 1, the storage structure 120 includes a storage
slice pool 130 that can be used to manage all of the storage disks
in storage structure 120. The storage disks in the storage slice
pool 130 will be organized into a plurality of RAID Recovery Sets
(RRS) 132, RRS 134, RRS 136, RRS 138, and the like. Each RRS is a
failure domain, meaning that if one storage disk in one RRS (e.g.,
RRS 132) fails, this failure will not impact the recovery of other
RRSs. In a typical storage system, one RRS could include up to 25
storage disks. However, embodiments of the present disclosure are
not limited to the specific numerical value described above, and in
other embodiments, RRS may include any suitable number of storage
disks.
In the storage slice pool 130, each storage disk can be divided
into storage slices with fixed size. The size of each storage slice
may be set as 4 gigabytes (GB). However, embodiments of the present
disclosure are not limited to the specific numerical value
described above, and in other embodiments, the size of the storage
slices in the storage disk may be set as any suitable size.
Therefore, the storage slice pool 130 can be considered as being
constituted of sets of storage slices, and as such, it is referred
to as storage slice pool.
In the storage slice pool 130, a plurality of storage slices from
different storage disks may form a small RAID (compared to a large
RAID constituted of a plurality of physical storage disks). The
RAID may be of any suitable type. For example, if the RAID type is
4+1 RAID-5 to create one RAID group, the processor 110 can allocate
5 free storage slices from different storage disks and combine the
5 storage slices into a small RAID. In some cases, it can be
required that all storage slices within one RAID come from a same
RRS. In addition, each RAID may include a plurality of RAID
stripes. The size of each RAID stripe may be 2 megabytes (MB),
which can also be referred to as a physical large block (PLB). It
will be appreciated that embodiments of the present disclosure are
not limited to the specific numerical value described above, and in
other embodiments, the size of the RAID stripe can be set as any
suitable size.
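The slice allocation described above can be illustrated with a minimal sketch. It assumes a hypothetical `Disk` record and a hypothetical `allocate_raid` helper (neither name comes from the disclosure), and it assumes a simple policy of preferring disks with the most free slices. The only property taken from the text is that the 5 slices of a 4+1 RAID-5 come from 5 different storage disks.

```python
from dataclasses import dataclass, field

RAID_WIDTH = 5  # 4 data + 1 parity slices for a 4+1 RAID-5


@dataclass
class Disk:
    """Hypothetical bookkeeping record for one storage disk."""
    disk_id: int
    free_slices: list = field(default_factory=list)  # free slice indices


def allocate_raid(disks, width=RAID_WIDTH):
    """Pick one free slice from each of `width` different disks.

    Returns a list of (disk_id, slice_index) pairs and removes the chosen
    slices from the free lists. Raises RuntimeError if fewer than `width`
    disks still have a free slice.
    """
    candidates = [d for d in disks if d.free_slices]
    if len(candidates) < width:
        raise RuntimeError("not enough disks with free slices")
    # Assumed policy (not from the text): prefer disks with most free slices.
    candidates.sort(key=lambda d: len(d.free_slices), reverse=True)
    return [(d.disk_id, d.free_slices.pop(0)) for d in candidates[:width]]
```

Because each disk contributes at most one slice, the loss of a single disk removes at most one member from any RAID built this way.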
Furthermore, as shown in FIG. 1, the storage slice pool 130 may
expose or provide some tiers used by other components, including
the first tier 140, the second tier 142, the third tier 144, . . .
, the Nth tier 146 (N is a natural number), and so on. Each tier
can be constituted of a plurality of RAID groups. For each tier,
different RAID policies can be applied based on the type of the
data stored thereon. Typically, all RAIDs within the same tier can
have the same RAID policy, meaning the same RAID width and RAID
type. The tiers can be expanded on demand; in other words, the
processor 110 may dynamically allocate a new RAID and add it to
a tier.
The storage structure 120 may further include a mapper 150. The
mapper 150 is a core component in the storage structure 120 that
treats each tier as a flat linear physical address space. The
mapper 150, on the other hand, provides a single flat linear
logical address space to a namespace module 170. In some cases,
this logical address space can be up to 8 exabytes (EB). As an
example, the mapper 150 may use a B+ tree data structure to
maintain mapping between logical addresses and physical addresses
at a granularity of 4 KB pages. It will be appreciated that
embodiments of the present disclosure are not limited to specific
numerical values and specific data structures described above. In
other embodiments, the size of the logical address space and the
granularity of the mapper may be set as any suitable value, and the
mapper 150 may employ other suitable data structures to maintain
mapping between logical addresses and physical addresses.
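As a rough illustration of page-granular address mapping, the following sketch translates logical addresses to physical addresses in 4 KB pages. The class name `PageMapper` and its methods are hypothetical, and a plain dictionary stands in for the B+ tree mentioned above, so this shows only the granularity idea and not the data structure of the mapper 150.

```python
PAGE_SIZE = 4 * 1024  # 4 KB page granularity, as in the example above


class PageMapper:
    """Toy logical-to-physical mapper; a dict stands in for the B+ tree."""

    def __init__(self):
        self._table = {}  # logical page number -> physical page number

    def map_page(self, logical_addr, physical_addr):
        """Record that one logical page lives at one physical page."""
        if logical_addr % PAGE_SIZE or physical_addr % PAGE_SIZE:
            raise ValueError("addresses must be page aligned")
        self._table[logical_addr // PAGE_SIZE] = physical_addr // PAGE_SIZE

    def translate(self, logical_addr):
        """Translate any logical byte address inside a mapped page."""
        page, offset = divmod(logical_addr, PAGE_SIZE)
        return self._table[page] * PAGE_SIZE + offset
```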
The storage structure 120 may further include a caching module 160.
The caching module 160 can provide caching function within a memory
(not shown). It may have two instances in the storage system 100,
one of which is for user data and the other is for metadata. The
caching module 160 can provide transactional operation function to
the mapper 150. It will be appreciated that embodiments of the
present disclosure are not limited to the specific examples
described above, and in other embodiments, the caching module 160
may have any other suitable number and use of instances. In
addition, the storage structure 120 may further include the
namespace module 170 mentioned above. As noted above, the namespace
module 170 can consume and manage the flat linear logical space
provided by the mapper 150 (e.g., 8 EB in size). On the other hand,
the namespace module 170 can create and provide a storage volume to
a host (not shown) of the storage system 100, for use by the
host.
In some embodiments, storage disks organized and managed by the
storage structure 120 may include various types of devices with
storage capabilities including, but not limited to, hard disk
drives (HDDs), solid state disks (SSDs), removable disks, compact
disks (CDs), laser disks, optical disks, digital versatile disks
(DVDs), floppy disks, Blu-ray disks, serial attached small computer
system interface (SAS) storage disks, serial advanced technology
attachment (SATA) storage disks, any other magnetic storage device,
and any other optical storage device, or any combination thereof.
Similarly, the processor 110 may include any device that implements
control functions including, but not limited to, a special purpose
computer, a general purpose computer, a general purpose processor,
a microprocessor, a microcontroller, or a state machine. The
processor 110 may also be implemented as an individual computing
device or combination of computing devices, e.g., a combination of
a DSP and a microprocessor, a plurality of microprocessors, one or
more microprocessors in conjunction with a DSP core, or any other
such configuration.
It should be appreciated that FIG. 1 only schematically illustrates
units, modules or components in the storage system 100 that are
related to embodiments of the present disclosure. In practice, the
storage system 100 may further include other units, modules, or
components for other functions. Thus, embodiments of the present
disclosure are not limited to the specific devices, units, modules
or components depicted in FIG. 1, but are generally applicable to
any RAID technology based storage system.
As noted above, backup operations and rebuild operations in the
conventional RAID storage system still have various shortcomings
and deficiencies, which cannot meet the performance requirements of
storage systems in many cases, resulting in a poor user experience.
For example, a conventional RAID storage system widely employs an
"incremental rebuild" method to optimize the rebuild process. That
is, when a storage disk or a storage slice in a RAID becomes
offline and restores online within a short period of time (e.g., 5
minutes), the "incremental rebuild" method rebuilds only the data
newly written during the period from the storage disk or storage
slice going offline to its coming back online.
Specifically, if write operations need to be performed on a
degraded RAID (a RAID including an offline storage disk or storage
slice) within 5 minutes of the storage disk or storage slice
becoming offline, the conventional way is to apply a "degraded
write" algorithm to the degraded RAID. That is, data are written
only to the remaining online storage disks or storage slices in the
degraded RAID, and the stripes in the degraded RAID into which data
are written are recorded in metadata (e.g., a Virtual Large Block,
VLB).
If the offline storage disk or storage slice restores online within
5 minutes, the RAID performs an "incremental rebuild" algorithm,
that is, it rebuilds only the stripes into which data were written
after the storage disk or storage slice went offline. As described
previously, the information about which stripes were written is
recorded in metadata. In contrast, if the offline storage disk or
storage slice does not restore online within 5 minutes, the RAID
including the offline storage disk or storage slice will perform
a backup operation, that is, replace the offline storage disk or
storage slice in the RAID with a backup storage disk or storage
slice and perform full data rebuilding in the new storage disk or
storage slice.
The above-mentioned processing method of the conventional RAID for
the offline storage disk or storage slice has at least the
following problems and defects. First, components such as storage
systems, mappers, and RAID modules need to support the "degraded
write", where data written to stripes of a degraded RAID is easily
lost due to lack of redundancy protection. Second, in the
incremental rebuilding process, the mapper will read all of the
RAID stripes. For example, in the case of the 4+1 RAID-5 type, the
RAID will read data in the other 4 online storage disks or storage
slices, rebuild the data of the offline storage disk or storage
slice, and then return all of the stripes to the mapper.
Next, the mapper sends a "fix write" indication to the RAID to
cause the RAID to write the data stored in the degraded RAID, while
the RAID writes only to the restored storage disk or storage
slice. Therefore, when rebuilding a RAID stripe, the mapper will
perform at least 4 read operations and 1 write operation, and
normal I/O operation of the storage system is thus affected because
the access bandwidth of the storage disk and the processing
bandwidth of the processor are occupied. In addition, in the case
where the offline storage disk or storage slice does not restore
online, the RAID needs to perform full data rebuilding, which also
occupies the bandwidth of the storage disk and the processor,
thereby affecting the normal I/O operation of the storage system.
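The cost of rebuilding one stripe in a 4+1 RAID-5 can be made concrete with a small sketch. It relies on the standard RAID-5 property that the parity chunk is the byte-wise XOR of the data chunks, so any single missing chunk equals the XOR of the four surviving chunks. The function names `xor_blocks` and `rebuild_missing` are hypothetical and are not taken from the disclosure.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe, missing_index):
    """Recompute the chunk at `missing_index` from the surviving chunks.

    `stripe` is a list of 5 chunks (4 data + 1 parity); the entry at
    `missing_index` is ignored. Returns (rebuilt_chunk, reads, writes),
    where reads and writes count chunk-level I/O operations.
    """
    survivors = [c for i, c in enumerate(stripe) if i != missing_index]
    rebuilt = xor_blocks(survivors)
    return rebuilt, len(survivors), 1
```

For a 5-chunk stripe the function reports 4 reads and 1 write, which matches the per-stripe cost attributed to the conventional rebuild above.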
In view of the above problems and other potential problems in the
conventional scheme, embodiments of the present disclosure propose
a storage management method, an electronic device, and a computer
program product for improving the handling of storage disks or
storage slices that become offline in a RAID storage system. The
basic idea of embodiments of the present disclosure is as follows.
When a storage slice from a certain storage disk in the RAID
becomes offline and remains offline until a first time point (for
example, after a few seconds), a backup storage slice in a backup
storage disk is selected to temporarily store data to be written to
the offline storage slice. If the offline storage disk restores
online before a second time point (e.g., after a few minutes), it
is only required to copy the data in the backup storage slice into
the offline storage slice. If the offline storage disk does not
restore online before the second time point, a RAID backup process
is performed in which it is only required to rebuild existing data
in the offline storage slice into the backup storage slice.
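The two-time-point decision described above can be sketched as follows. The concrete values of 5 seconds and 300 seconds are assumptions chosen only to stand for "a few seconds" and "a few minutes", and the names `Handling`, `write_target` and `final_handling` are hypothetical. The case of a disk returning before the first time point is also treated here, on the assumption that no backup slice is needed in that case.

```python
from enum import Enum

T1_SECONDS = 5.0    # assumed value for the first time point
T2_SECONDS = 300.0  # assumed value for the second time point


class Handling(Enum):
    NO_BACKUP_NEEDED = "disk returned before the first time point"
    COPY_BACK = "copy backup slice data into the restored slice"
    REPLACE_AND_REBUILD = "replace the offline slice with the backup slice"


def write_target(elapsed, t1=T1_SECONDS, t2=T2_SECONDS):
    """Where a write aimed at the offline slice goes, `elapsed` seconds
    after the disk went offline, while the disk is still offline."""
    if elapsed < t1:
        return "suspended"      # in some embodiments writes are suspended
    if elapsed < t2:
        return "backup_slice"   # redirected to the backup storage slice
    return "replacement_slice"  # backup slice has replaced the original


def final_handling(restored_after, t1=T1_SECONDS, t2=T2_SECONDS):
    """Outcome given when the disk came back (seconds), or None if never."""
    if restored_after is not None and restored_after < t1:
        return Handling.NO_BACKUP_NEEDED
    if restored_after is not None and restored_after < t2:
        return Handling.COPY_BACK
    return Handling.REPLACE_AND_REBUILD
```

For example, `final_handling(60.0)` selects the copy-back path, while `final_handling(None)` selects replacement and rebuild.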
Embodiments of the present disclosure may improve performance of a
RAID-based storage system both when a storage slice restores after
being offline and when a storage slice is permanently offline. For
example, when a storage slice restores after being offline,
embodiments of the present disclosure only need to copy data in the
backup storage slice into the restored storage slice, that is, one
read operation and one write operation. Compared to four read
operations and one write operation in the conventional method (in
the case of 4+1 RAID-5), this saves processor and storage disk
bandwidth and reduces the impact on normal I/O performance of the
storage system.
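The saving can be expressed as simple arithmetic. The sketch below assumes that each affected stripe costs one chunk-level operation per read or write, and the function name `recovery_io_ops` is hypothetical.

```python
def recovery_io_ops(stripes_written, data_width=4):
    """Chunk-level I/O needed to bring a restored slice up to date.

    `stripes_written` is the number of stripes written while the slice was
    offline; `data_width` is the number of surviving chunks that the
    conventional method reads per stripe (4 for a 4+1 RAID-5).
    Returns (conventional_ops, copy_back_ops).
    """
    conventional = stripes_written * (data_width + 1)  # N reads + 1 write
    copy_back = stripes_written * 2                    # 1 read + 1 write
    return conventional, copy_back
```

Under these assumptions, 1000 affected stripes cost 5000 operations with the conventional incremental rebuild and 2000 operations with the copy from the backup storage slice.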
In addition, in the case that an offline storage slice is
permanently offline, a backup storage slice will replace the
offline storage slice in the RAID. For this purpose, it is only
required to rebuild, in the backup storage slice, the data that was
written into the offline storage slice before it went offline. This
avoids the full data rebuilding of the conventional method and
thereby accelerates the rebuilding process. Furthermore,
embodiments of the present disclosure will not affect the
implementation of other potential rebuilding optimization methods,
including rebuild based on valid data perception (in which only the
area containing valid data in the offline storage slice is rebuilt
if the storage slice is offline and does not restore), the "thin
rebuild" method, and so on.
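One way to picture the bookkeeping behind both outcomes is to track two sets of stripes: those written before the disk went offline and those redirected to the backup storage slice afterwards. The class `SliceTracker` and its methods are hypothetical and only sketch this idea; the disclosure does not prescribe this structure.

```python
class SliceTracker:
    """Hypothetical per-slice record of which stripes hold data."""

    def __init__(self, stripes_written_before_offline):
        # Stripes that held data in the first slice before it went offline.
        self.old_stripes = set(stripes_written_before_offline)
        # Stripes written to the backup slice between the two time points.
        self.redirected_stripes = set()

    def record_redirected_write(self, stripe):
        """Note that a write for `stripe` went to the backup slice."""
        self.redirected_stripes.add(stripe)

    def stripes_to_copy_back(self):
        """Disk restored in time: copy only these stripes to the original."""
        return sorted(self.redirected_stripes)

    def stripes_to_rebuild(self):
        """Disk stayed offline: rebuild only these stripes into the backup."""
        return sorted(self.old_stripes)
```

In either outcome the work is bounded by the stripes that actually hold data, rather than by the full size of the slice.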
Some example embodiments of the present disclosure are described in
detail below in conjunction with FIGS. 2-7.
FIG. 2 is a flow chart illustrating a storage management method 200
according to an embodiment of the present disclosure. In some
embodiments, the method 200 can be implemented by a processor 110
or a processing unit of the storage system 100, or by various
functional modules of the storage system 100. In other embodiments,
the method 200 can also be implemented by a computing device
independent from the storage system 100, or can be implemented by
other units or modules in the storage system 100.
As described above with reference to FIG. 1, the storage disks in
the storage system 100 can be divided into storage slices with
fixed size, and the storage slices from different storage disks can
form a RAID. In other words, based on the large number of storage
slices provided in the storage slice pool 130, the storage system
100 can form a plurality of RAIDs. In some embodiments, the method
200 of FIG. 2 can be performed for an individual RAID constituted
of a plurality of storage slices. An example scenario in which a
plurality of storage slices from a plurality of different storage
disks form a RAID will be described below first with reference to
FIG. 3.
FIG. 3 is a schematic diagram illustrating that a plurality of
storage slices 315-355 from a plurality of storage disks 310-350
form a RAID 305 in accordance with an embodiment of the present
disclosure. As shown in FIG. 3, a storage disk 310 (also referred
to as a first storage disk hereinafter) is divided into a plurality
of storage slices including a storage slice 315 (also referred to
as a first storage slice hereinafter). Similarly, a storage disk
320 is divided into a plurality of storage slices including the
storage slice 325, a storage disk 330 is divided into a plurality
of storage slices including the storage slice 335, a storage disk
340 is divided into a plurality of storage slices including the
storage slice 345, a storage disk 350 is divided into a plurality
of storage slices including the storage slice 355, and a storage
disk 360 is divided into a plurality of storage slices including
the storage slice 365.
As shown, the storage slices 315, 325, 335, 345, and 355 form a 4+1
RAID 5 type RAID 305, in which the storage slice 315 stores data-1,
the storage slice 325 stores data-2, the storage slice 335 stores
data-3, the storage slice 345 stores data-4, and the storage slice
355 stores data-5. According to the property of RAID 5, any one of
data-1 to data-5 can be rebuilt from the other four through RAID
algorithms. In other words, the first storage slice 315 in the
first storage disk 310 is one of a plurality of storage slices
forming the RAID 305, and the plurality of storage slices forming
the RAID 305 are located in different storage disks. Further,
although not shown, a certain number of free storage slices may be
reserved in each of the storage disks as backup storage slices, for
replacement when a storage slice in the RAID fails. For example,
FIG. 3 shows that a storage disk 360 (also referred to as a second
storage disk hereinafter) includes a backup storage slice 365 (also
referred to as a second storage slice hereinafter).
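The rebuild property mentioned above can be illustrated with a
minimal sketch, assuming that one block of a five-block stripe is
the byte-wise XOR parity of the other four. The helper name
xor_blocks and the sample byte values are hypothetical; a real RAID
5 implementation additionally rotates the parity position between
stripes, which is omitted here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four hypothetical blocks; the fifth block is their XOR parity.
first_four = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xa0\x0a"]
stripe = first_four + [xor_blocks(first_four)]  # data-1 .. data-5

# Treat data-1 (index 0) as unavailable and rebuild it from the other four.
rebuilt_data_1 = xor_blocks(stripe[1:])
print(rebuilt_data_1 == stripe[0])  # True
```

Because the XOR of all five blocks is zero, the same call recovers
whichever single block is missing, not only data-1.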
It should be appreciated that although FIG. 3 shows, by way of
example, a particular number of storage disks, each storage disk
including a particular number of storage slices, and the RAID 305
consisting of a particular number of storage slices and having a
particular RAID type, this is intended to be illustrative only, and
is not intended to limit the scope of the present disclosure in any
way. In other cases, embodiments of the present disclosure may be
applicable to any number of storage disks or storage slices, as
well as to any type of RAID. For ease of discussion, the storage
management method 200 of FIG. 2 will be discussed below in
conjunction with FIG. 3.
Referring back to FIG. 2, at 205, the processor 110 of the storage
system 100 determines whether the first storage disk 310 becomes
offline and remains offline until a first time point. When the
first storage disk 310 becomes offline, all of its storage slices
(including the storage slice 315) become offline. That is, the
storage slice 315, a component of the RAID 305, becomes offline and
unavailable. In this case, the RAID 305 will not be able to perform
normal RAID operations. In addition, it is worth noting that a
storage disk may experience a brief glitch in practice, in which
the storage disk is temporarily offline but is restored within a
short period of time. In order to avoid excessive remedial
operations due to such a glitch,
embodiments of the present disclosure set a first time point, which
may also be referred to as a storage disk offline overriding time
point. In some embodiments, the first time point may be set as
several seconds (e.g., 5 seconds) after the first storage disk 310
becomes offline. In other embodiments, the first time point may be
predetermined by a technician in accordance with a particular
technical environment and design requirements. Hereinafter, for
ease of description, a time point at which the first storage disk
310 becomes offline may be referred to as t0, and the
above-described first time point may be referred to as t1.
At 210, if the first storage disk 310 becomes offline and remains
offline until the first time point t1, the processor 110 selects
the second storage slice 365 in the second storage disk 360 as a
backup storage slice for the first storage slice 315 in the first
storage disk 310. Hereinafter, for ease of description, the second
storage slice 365 may also be referred to as a backup storage slice
365. The selected backup storage slice 365 is used to temporarily
store data to be written to the first storage slice 315 in the RAID
305 while the first storage slice 315 is offline. As used herein,
"temporarily store" means that the backup storage slice 365 cannot
completely replace the functions of the first storage slice 315 in
the RAID 305, because the backup storage slice 365 does not have
the data stored in the first storage slice 315 before the first
storage disk 310 went offline. An example of
selecting the second storage slice 365 as the backup storage slice
of the first storage slice 315 will be described below with
reference to FIG. 4.
FIG. 4 is a schematic diagram illustrating the selection of the
second storage slice 365 as the backup storage slice for the first
storage slice 315 after the first storage disk 310 remains offline
until a first time point t1, according to an embodiment of the
present disclosure. In FIG. 4, state 405 represents the state of
RAID 305 when the first storage disk 310 is not offline (i.e.,
before t0), wherein the storage slices 315 to 355 store data-1 to
data-5, respectively. State 415 represents the state of RAID 305
when the first storage disk 310 is offline at time t0. As shown at
407, the first storage slice 315 becomes offline at time t0, making
data-1 unavailable. State 425 represents that the first storage
slice 315 has not restored at the first time point t1, and the
processor 110 may select the second storage slice 365 as the backup
storage slice for the first storage slice 315.
As shown, in some embodiments, information indicating that the
second storage slice 365 acts as the backup storage slice for the
first storage slice 315 can be recorded in the metadata 410. That
is, the metadata 410 is used to record which storage slice is
selected as the backup storage slice for the first storage slice
315. As shown at 417, in the particular example of FIG. 4, the
metadata 410 points to the second storage slice 365 acting as a
backup storage slice. In some embodiments, the metadata 410 may
also be referred to as RAID geometry metadata, which may reuse
existing metadata in the current RAID storage system to
additionally record information of the backup storage slice 365. As
an example, this information can be recorded in a "copy position"
information element of the RAID geometry metadata. Reusing existing
metadata avoids introducing additional metadata into the storage
disk or storage slice. Furthermore, it is
noted that the copy operation from the backup storage slice 365 to
the first storage slice 315, discussed in detail below, may have
lower priority than temporary backup operations and the permanent
backup operations and thus, it is safe to reuse the existing RAID
geometry metadata. However, it will be appreciated that in other
embodiments, the metadata 410 may also be a metadata that is newly
set in the storage system 100.
In some embodiments, selection of a backup storage slice may be
similar to the selection of a backup storage slice in the RAID
system for rebuilding. For example, the processor 110 may select
the second storage disk 360 from a plurality of storage disks, such
that the plurality of storage disks are evenly used to form a
plurality of RAIDs. For example, among a plurality of storage disks
of the storage system 100, it is assumed that the storage slices in
the second storage disk 360 are currently the least likely to be in
the same RAID as the storage slices in the storage disks 320 to
350. In
this case, in order to make the respective RAIDs distributed as
evenly as possible in all of the storage disks, the processor 110
may determine that a storage slice in the second storage disk 360
is selected as the backup storage slice for the first storage slice
315 of the RAID 305 to facilitate replacing the first storage slice
315 in the RAID 305 after the first storage slice 315 is
permanently offline. After the determination of selecting the
second storage disk 360, the processor 110 may further select a
free storage slice of the second storage disk 360 as the second
storage slice 365.
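One possible way to express the even-distribution criterion
described above is sketched below. The function
select_backup_disk, its parameters, and the sample layout
(including a storage disk 370) are hypothetical; the sketch simply
picks, among disks that have a free storage slice, the disk that
currently shares the fewest RAIDs with the surviving disks, and is
not asserted to be the selection algorithm of any particular
storage system.

```python
def select_backup_disk(raids, surviving_disks, free_slices, offline_disk):
    """Pick a disk for the backup slice so that RAIDs stay evenly spread.

    raids: list of sets, each holding the disk ids used by one RAID.
    surviving_disks: set of disk ids still serving the degraded RAID.
    free_slices: dict mapping disk id -> number of free slices.
    offline_disk: id of the disk that went offline (never selected).
    """
    candidates = [
        disk
        for disk, free in free_slices.items()
        if free > 0 and disk != offline_disk and disk not in surviving_disks
    ]
    if not candidates:
        return None

    def shared_raids(disk):
        # How often this disk already appears together with the survivors.
        return sum(len(members & surviving_disks) for members in raids if disk in members)

    # Ties are broken by disk id so that the result is deterministic.
    return min(candidates, key=lambda disk: (shared_raids(disk), disk))


raids = [
    {310, 320, 330, 340, 350},
    {320, 330, 340, 350, 370},
    {310, 320, 330, 360, 370},
]
chosen = select_backup_disk(
    raids,
    surviving_disks={320, 330, 340, 350},
    free_slices={310: 4, 360: 3, 370: 2},
    offline_disk=310,
)
print(chosen)  # 360
```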
Referring back to FIG. 2, at 215, between the first time point t1
and a second time point (also referred to as t2 hereinafter, for
ease of description), the processor 110 writes data to be written
to the first storage slice 315 in the RAID 305 into the second
storage slice 365. In other words, as the backup storage slice for
the first storage slice 315, the second storage slice 365 will be
used to temporarily store data to be originally written to the
first storage slice 315, thereby avoiding "degraded write" in the
conventional scheme. This process will be described below with
reference to FIG. 5. In addition, the second time point t2 here can
be considered as a time point after which the storage disk 310 is
no longer expected to return online. Therefore, the second time
point t2 can also be considered as a time point at which the use of
the backup storage slice 365 to completely replace the offline
first storage slice 315 is triggered.
FIG. 5 is a schematic diagram illustrating writing data to be
written to the first storage slice 315 into the backup storage
slice 365, in accordance with an embodiment of the present
disclosure. As shown in FIG. 5, it is assumed that between the
first time point t1 and the second time point t2, the processor 110
receives an I/O request to write new data 510 to the RAID 305. Due
to the presence of the backup storage slice 365, the processor 110
can write, as normal, new data-2 to new data-5 associated with the
data 510 into the storage slices 325 to 355 in accordance with a
RAID algorithm. Hereinafter, for ease of description, data stored
in the RAID 305 associated with the data 510 is referred to as new
data and data stored in the RAID 305 before the first storage slice
315 went offline is referred to as old data.
Specifically, the processor 110 may write 512 new data-2 to the
storage slice 325, write 514 new data-3 to the storage slice 335,
write 516 new data-4 to the storage slice 345, and write 518 new
data-5 to the storage slice 355. Unlike the case when the first
storage slice 315 is online, since the first storage slice 315
cannot be written at this time and there is a backup storage slice
365, the new data-1 to be written to the storage slice 315 will be
written 520 to the backup storage slice 365. Therefore, after the
new data 510 is written, the old data-2 and the new data-2 are
stored in the storage slice 325, the old data-3 and the new data-3
are stored in the storage slice 335, the old data-4 and new data-4
are stored in the storage slice 345, the old data-5 and the new
data-5 are stored in the storage slice 355, and the new data-1 is
stored in the second storage slice 365. In this way, the new data
510 can be written using a secure RAID algorithm, thereby avoiding
the risk of data loss caused by the use of "degraded write" in
conventional schemes.
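The redirection of writes described above may be sketched as
follows, under the assumption that each storage slice is modeled as
a dictionary mapping a stripe number to a chunk of bytes. The
function write_stripe, the stripe number 7 and the chunk contents
are hypothetical and serve only to show that the chunk destined for
the offline storage slice 315 lands in the backup storage slice
365.

```python
def write_stripe(storage, raid_slices, offline, backup_of, stripe_no, chunks):
    """Write one chunk per RAID position and return the slices written.

    A chunk destined for an offline slice is written to the backup slice
    recorded for it in backup_of instead of being dropped.
    """
    written = []
    for slice_id, chunk in zip(raid_slices, chunks):
        target = backup_of[slice_id] if slice_id in offline else slice_id
        storage.setdefault(target, {})[stripe_no] = chunk
        written.append(target)
    return written


storage = {}
raid_305 = [315, 325, 335, 345, 355]
targets = write_stripe(
    storage,
    raid_305,
    offline={315},
    backup_of={315: 365},
    stripe_no=7,
    chunks=[b"new data-1", b"new data-2", b"new data-3", b"new data-4", b"new data-5"],
)
print(targets)  # [365, 325, 335, 345, 355]
```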
Referring back to FIG. 2, at 220, the processor 110 determines if
the first storage disk 310 remains offline until time point t2. As
mentioned above, the second time point t2 can be considered as the
time point at which the use of the backup storage slice 365 is
triggered to completely replace the offline first storage slice
315. In other words, at 220, the processor 110 can determine if the
first storage disk 310 needs to be replaced. In some embodiments,
based on practical experience regarding the storage system, the
second time point t2 can be set as several minutes (e.g., 5
minutes). In other embodiments, the second time point t2 can be
predetermined by the technician according to the specific technical
environment and design requirements.
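The role of the two time points can be summarized in a small
sketch. It assumes that both t1 and t2 are measured in seconds from
the time t0 at which the disk went offline, and it uses the example
values of 5 seconds and 5 minutes from the text as defaults; the
names OfflinePhase and offline_phase are hypothetical.

```python
from enum import Enum


class OfflinePhase(Enum):
    GLITCH_WINDOW = "before t1: wait, no remedial action"
    TEMPORARY_BACKUP = "between t1 and t2: redirect writes to the backup slice"
    PERMANENT_REPLACE = "at or after t2: replace the slice and rebuild old data"


# Example values only; the text allows a technician to choose other values.
T1_SECONDS = 5.0
T2_SECONDS = 300.0


def offline_phase(seconds_since_t0, t1=T1_SECONDS, t2=T2_SECONDS):
    """Classify how a still-offline disk is handled at a given elapsed time."""
    if seconds_since_t0 < t1:
        return OfflinePhase.GLITCH_WINDOW
    if seconds_since_t0 < t2:
        return OfflinePhase.TEMPORARY_BACKUP
    return OfflinePhase.PERMANENT_REPLACE


print(offline_phase(1.0).name)    # GLITCH_WINDOW
print(offline_phase(60.0).name)   # TEMPORARY_BACKUP
print(offline_phase(300.0).name)  # PERMANENT_REPLACE
```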
At 225, in the case that the first storage disk 310 remains
offline until the second time point t2, the processor 110 replaces
the first storage slice 315 in the RAID 305 with the second storage
slice 365. This means that the second storage slice 365 will
permanently replace the first storage slice 315 as part of the RAID
305. However, as described above, only the new data-1 associated
with the new data 510 is stored in the second storage slice 365,
and the old data-1 already stored in the first storage slice 315
before the storage slice went offline is not stored. Due to the
nature of the redundant storage of the RAID 305, the old data-1 in
the first storage slice 315 can be rebuilt from the old data-2 to
the old data-5 in the other storage slices 325 to 355. In other
words, the processor 110 can rebuild data in the first storage
slice 315 into the second storage slice 365 using data of other
storage slices in the RAID 305. This rebuilding process will be
described below in conjunction with FIG. 6.
FIG. 6 is a schematic diagram illustrating rebuilding data in the
first storage slice 315 into the backup storage slice 365 after the
first storage disk 310 remains offline until a second time point
t2, in accordance with an embodiment of the present disclosure.
Specifically, as shown in FIG. 6, the processor 110 may copy 612
the old data-2 in the storage slice 325, copy 614 the old data-3 in
the storage slice 335, copy 616 the old data-4 in the storage slice
345 and copy 618 the old data-5 in the storage slice 355. Next, the
processor 110 may restore the old data-1 stored in the first
storage slice 315 from the old data-2 to the old data-5 using a
RAID rebuilding algorithm. Then, the processor 110 may store 620
the rebuilt old data-1 into the second storage slice 365. In this
manner, the second storage slice 365 will have data previously
stored in the first storage slice 315, and thus the first storage
slice 315 in the RAID 305 may be completely replaced.
In some embodiments, the rebuilding of the old data-1 can be
performed by stripes. For example, the processor 110 may determine
a stripe in the RAID 305 into which data was written before the
first storage slice 315 became offline. As an example, this
information
can be recorded in the metadata 410 mentioned above. For the
determined stripe, the processor 110 may read data associated with
the determined stripe from other storage slices 325 to 355 in the
RAID 305. Based on the read data, the processor 110 can calculate
data stored in the first storage slice 315 associated with the
determined stripe. As an example, for a RAID 5 type, the processor
110 may derive the data stored in the first storage slice 315 by
calculating an XOR value of the read data. The processor 110 can
then write the calculated data into the second storage slice 365.
In the case that a plurality of stripes have been written, the
processor 110 may perform the above-described rebuilding process
for each of the stripes written. In this way, the old data-1
previously stored in the first storage slice 315 can be rebuilt
into the second storage slice 365 in a more effective manner. In
some embodiments,
after the completion of the above-described rebuilding process, the
processor 110 may clear the relevant information in the metadata
410 and re-mark the RAID 305 as normal operation. Moreover, the
locking processing for the metadata 410 may be required to be
performed in a usual manner.
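The stripe-by-stripe rebuilding described above may be sketched as
follows for the RAID 5 case. The sketch assumes that each storage
slice is modeled as a dictionary mapping a stripe number to a chunk
of bytes and that the set of stripes written before the disk went
offline is already known; the names xor_blocks and
rebuild_into_backup, the stripe numbers and the byte values are
hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_into_backup(storage, surviving, backup, old_stripes):
    """Rebuild the missing chunk of each old stripe into the backup slice."""
    for stripe_no in sorted(old_stripes):
        peers = [storage[slice_id][stripe_no] for slice_id in surviving]
        storage.setdefault(backup, {})[stripe_no] = xor_blocks(peers)
    return len(old_stripes)


# Old data-1 is kept here only so that the result can be checked.
old_data_1 = {0: b"\x11\x22", 1: b"\x33\x44"}
storage = {
    325: {0: b"\x01\x01", 1: b"\x02\x02"},
    335: {0: b"\x04\x04", 1: b"\x08\x08"},
    345: {0: b"\x10\x10", 1: b"\x20\x20"},
}
# Slice 355 holds the XOR of the other four chunks of each old stripe.
storage[355] = {
    n: xor_blocks([old_data_1[n]] + [storage[s][n] for s in (325, 335, 345)])
    for n in old_data_1
}
# The backup slice already holds new data-1 written after t1.
storage[365] = {7: b"new data-1"}

rebuilt = rebuild_into_backup(storage, [325, 335, 345, 355], 365, old_stripes={0, 1})
print(rebuilt, storage[365][0] == old_data_1[0])  # 2 True
```

Only the stripes listed in old_stripes are touched, so the new
data-1 already held by the backup storage slice is left in place.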
On the other hand, at 220, if the first storage disk 310 restores
online between the first time point t1 and the second time point
t2, the processor 110 may copy the data in the second storage slice
365 to the first storage slice 315. As shown in the foregoing, the
copy involves only one read operation and one write operation,
thereby avoiding the "incremental rebuild" in the conventional
scheme that requires four read operations and one write operation,
which greatly saves the bandwidths of processors and storage slices
and improves performance of the storage system. This copy process
will be described below in conjunction with FIG. 7.
FIG. 7 is a schematic diagram illustrating copying data from a
backup storage slice 365 to a first storage slice 315 when a first
storage disk 310 restores online, according to an embodiment of the
present disclosure. As shown in FIG. 7, at the time point when the
first storage disk 310 restores online, the first storage slice 315
has the old data-1 stored before the first storage disk 310 is
offline, the storage slice 325 has the old data-2 stored before
first storage disk 310 is offline and the new data-2 stored after
the first storage disk 310 is offline, the storage slice 335 has
the old data-3 stored before the first storage disk 310 is offline
and the new data-3 stored after the first storage disk 310 is
offline, the storage slice 345 has the old data-4 stored before the
first storage disk 310 is offline and the new data-4 stored after
the first storage disk 310 is offline, the storage slice 355 has
old data-5 stored before the first storage disk 310 is offline and
the new data-5 stored after the first storage disk 310 is offline,
and the second storage slice 365 has the new data-1 stored after
the first storage disk 310 is offline.
It can be seen that, at the time point when the first storage disk
310 restores online, the new data-1 after the first storage disk
310 is offline is absent in the first storage slice 315.
Accordingly, in order that the first storage slice 315 continues to
form the RAID 305 with the storage slices 325 to 355, the processor
110 may copy 710 the new data-1 in the second storage slice 365
into the first storage slice 315. In this way, the first storage
slice 315 restored online has not only the old data-1 but also the
new data-1 through only one copy operation, so that it can become a
part of the RAID 305 again. As noted above, this effectively avoids
the complicated "incremental rebuild" operation performed by
conventional RAID storage systems when offline storage slices
restore online.
In some embodiments, the copy of the new data-1 can also be
performed by stripes. For example, the processor 110 may determine
a stripe in the RAID 305 into which data is written between the
first time point t1 and the second time point t2. The processor 110
may then copy the data in the second storage slice 365 associated
with the determined stripe into the first storage slice 315. In the
case that
there are a plurality of stripes into which new data are written,
the above-described copy operation by stripes may be sequentially
performed for a plurality of stripes. In some embodiments, after
the completion of the copy operation described above, the processor
110 may clear the relevant information in metadata 410 and re-mark
the RAID 305 as operating normally. In addition, to make the
copying of the new data-1 more convenient, the processor 110 may
record the stripes in the RAID 305 into which data is written
between the first time point t1 and the second time point t2, for
example, in the metadata 410.
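The copy by stripes and the subsequent clearing of the recorded
information may be sketched as follows. The layout of the metadata
dictionary (the keys "backup_slice" and "written_stripes") and the
function copy_back are assumptions made for illustration; they are
not the actual format of the RAID geometry metadata.

```python
def copy_back(storage, metadata, restored_slice):
    """Copy every recorded stripe from the backup slice to the restored slice.

    Returns the number of stripes copied and clears the record afterwards.
    """
    backup = metadata["backup_slice"]
    for stripe_no in sorted(metadata["written_stripes"]):
        # One read from the backup slice and one write to the restored slice.
        storage[restored_slice][stripe_no] = storage[backup][stripe_no]
    copied = len(metadata["written_stripes"])
    metadata["backup_slice"] = None
    metadata["written_stripes"] = set()
    return copied


storage = {
    315: {0: b"old data-1"},
    365: {7: b"new data-1 (stripe 7)", 9: b"new data-1 (stripe 9)"},
}
metadata = {"backup_slice": 365, "written_stripes": {7, 9}}

copied = copy_back(storage, metadata, restored_slice=315)
print(copied, sorted(storage[315]))  # 2 [0, 7, 9]
```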
Further, in some embodiments, during a period from t0 when the
first storage disk 310 becomes offline to a first time point t1,
the processor 110 may suspend a write operation to the RAID 305.
That is, the write I/O operations generated for the RAID 305 during
this period can be safely performed by the normal RAID algorithm
after the first time point t1, that is, after the backup storage
slice 365 is ready. In this way, the safe write of the data to be
written can be achieved at the expense of a short delay (several
seconds). This is advantageous because a few seconds of I/O delay
is not noticeable to users in many technical scenarios. On the
other hand, since the RAID 305 lacking
the first storage slice 315 can still derive the stored data from
other storage slices, the processor 110 may not suspend read
operation from the RAID 305.
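The suspension of write operations between t0 and t1 may be
sketched as a simple gate that queues writes while the backup
storage slice is not yet ready. The class RaidWriteGate and its
method names are hypothetical, and the sketch records requests in
lists instead of performing real I/O.

```python
from collections import deque


class RaidWriteGate:
    """Queue writes between t0 and t1 while reads continue to be served."""

    def __init__(self):
        self.suspended = False
        self.pending = deque()
        self.applied = []

    def on_disk_offline(self):
        # Time t0: stop issuing writes to the RAID.
        self.suspended = True

    def on_backup_ready(self):
        # Time t1: the backup slice is selected, so queued writes proceed in order.
        self.suspended = False
        while self.pending:
            self.applied.append(self.pending.popleft())

    def write(self, request):
        if self.suspended:
            self.pending.append(request)
        else:
            self.applied.append(request)

    def read(self, request):
        # Reads are never suspended; missing data can be derived from the
        # other slices of the RAID.
        return f"read {request}"


gate = RaidWriteGate()
gate.write("w1")
gate.on_disk_offline()
gate.write("w2")
gate.write("w3")
print(gate.read("r1"), gate.applied, list(gate.pending))
gate.on_backup_ready()
print(gate.applied)  # ['w1', 'w2', 'w3']
```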
FIG. 8 is a block diagram illustrating a device 800 that can be
used to implement embodiments of the present disclosure. As shown
in FIG. 8, the device 800 includes a central processing unit (CPU)
801 that may perform various appropriate actions and processing
based on computer program instructions stored in a read-only memory
(ROM) 802 or computer program instructions loaded from a storage
unit 808 to a random access memory (RAM) 803. The RAM 803 further
stores various programs and data needed for operations
of the device 800. The CPU 801, ROM 802 and RAM 803 are connected
to each other via a bus 804. An input/output (I/O) interface 805 is
also connected to the bus 804.
The following components in the device 800 are connected to the I/O
interface 805: an input unit 806 such as a keyboard, a mouse and
the like; an output unit 807 including various kinds of displays, a
loudspeaker, etc.; a storage unit 808 including a magnetic disk, an
optical disk, etc.; and a communication unit 809 including a
network card, a modem, a wireless communication transceiver, etc.
The communication unit 809 allows the device 800 to exchange
information/data with other devices through a computer network
such as the Internet and/or various kinds of telecommunications
networks.
Various processes and processing described above, e.g., the method
200, may be executed by the CPU 801. For example, in
some embodiments, the method 200 may be implemented as a computer
software program that is tangibly embodied on a machine readable
medium, e.g., the storage unit 808. In some embodiments, part or
all of the computer programs may be loaded and/or mounted onto the
device 800 via ROM 802 and/or communication unit 809. When the
computer program is loaded to the RAM 803 and executed by the CPU
801, one or more steps of the method 200 as described above may be
executed.
As used herein, the term "includes" and its variants are to be read
as open-ended terms that mean "includes, but is not limited to."
The term "based on" is to be read as "based at least in part on."
The terms "one example embodiment" and "an example embodiment" are
to be read as "at least one example embodiment." The terms "first",
"second" and the like may refer to different or identical objects.
Other explicit and implicit definitions may also be included
herein.
As used herein, the term "determining" encompasses a wide variety
of actions. For example, "determining" can include calculating,
computing, processing, deriving, investigating, looking up (e.g.,
looking up in a table, database, or another data structure),
ascertaining, and the like. Further, "determining" can include
receiving (e.g., receiving information), accessing (e.g., accessing
data in memory), and the like. Further, "determining" may include
parsing, selecting, choosing, establishing, and the like.
It will be noted that the embodiments of the present disclosure can
be implemented in software, hardware, or a combination thereof. The
hardware part can be implemented by special-purpose logic; the
software part can be stored in a memory and executed by a suitable
instruction execution system such as a microprocessor or specially
designed hardware. Those of ordinary skill in the art may
understand that the above method and system may be implemented with
computer executable instructions and/or in processor-controlled
code, for example, such code being provided on a carrier medium
such as an optical or electronic signal bearer.
Further, although operations of the present methods are described
in a particular order in the drawings, this does not require or
imply that these operations are necessarily performed according to
this particular order, or that a desired outcome can only be
achieved by performing all operations shown. On the contrary, the
execution
order for the steps as depicted in the flowcharts may be varied.
Alternatively, or in addition, some steps may be omitted, a
plurality of steps may be merged into one step, and/or a step may
be divided into a plurality of steps for execution. In practice,
according to the embodiments of the present invention, the features
and functions of two or more units described above may be embodied
in one unit. In turn, the features and functions of one unit
described above may be further embodied in more units.
Although the present disclosure has been described with reference
to various embodiments, it should be appreciated that the present
disclosure is not limited to the disclosed embodiments. The present
disclosure is intended to cover various modifications and
equivalent arrangements included in the spirit and scope of the
techniques disclosed herein.
* * * * *