U.S. patent application number 11/924782 was filed with the patent office on 2007-10-26 and published on 2008-02-21 for a method, apparatus and computer program product providing storage network dynamic tuning of I/O flow with queue depth.
This patent application is currently assigned to International Business Machines Corporation. Invention is credited to Lucy Ya Mei Kung, Limei Shaw, Che Lui SHUM, Rong Zeng.
Publication Number | 20080046609 |
Application Number | 11/924782 |
Family ID | 37737836 |
Filed Date | 2007-10-26 |
Publication Date | 2008-02-21 |
United States Patent Application | 20080046609 |
Kind Code | A1 |
SHUM; Che Lui; et al. | February 21, 2008 |
Method, Apparatus and Computer Program Product Providing Storage Network Dynamic Tuning of I/O Flow with Queue Depth
Abstract
In accordance with a computer program product, apparatus and a
method there is provided a redundant network wherein a host
computer operates with a plurality of storage devices by monitoring
conditions of the multipath storage network and controlling a
storage multipath device driver in conjunction with an associated
storage multipath device input/output (I/O) pending queue to
increase I/O throughput to a storage device driver, such as a disk
device driver, when I/O demand increases, and to decrease I/O
throughput to the storage device driver in the event of an I/O
error condition.
Inventors: |
SHUM; Che Lui (San Jose, CA); Shaw; Limei (San Jose, CA);
Kung; Lucy Ya Mei (Milpitas, CA); Zeng; Rong (San Jose, CA) |
Correspondence
Address: |
HARRINGTON & SMITH, PC
4 RESEARCH DRIVE
SHELTON
CT
06484-6212
US
|
Assignee: |
International Business Machines
Corporation
|
Family ID: |
37737836 |
Appl. No.: |
11/924782 |
Filed: |
October 26, 2007 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11206938 | Aug 17, 2005 | |
11924782 | Oct 26, 2007 | |
Current U.S. Class: | 710/33 |
Current CPC Class: | H04L 69/40 20130101; H04L 67/1097 20130101; H04L 43/0852 20130101; H04L 41/5003 20130101; H04L 69/14 20130101; H04L 43/00 20130101 |
Class at Publication: | 710/033 |
International Class: | G06F 13/38 20060101 G06F013/38 |
Claims
1. A computer program product comprising a computer useable medium
including a computer readable program, wherein the computer
readable program when executed on the computer causes the computer
to operate with at least one storage device in a redundant storage
network by operations comprising: monitoring operation of the
redundant storage network; and controlling a storage multipath
device driver in conjunction with an associated storage multipath
device input/output (I/O) pending queue to increase I/O throughput
to a storage device driver when I/O demand increases, and to
decrease I/O throughput to the storage device driver in the event
of an I/O error condition.
2. The computer program product as in claim 1 where, during a
normal I/O operating condition, controlling the storage multipath
device driver comprises calculating an amount of I/O sent to the
storage device driver using: total amount of I/O sent to storage
device driver = queue depth × global factor for queue depth × total
number of functioning paths, where global factor for queue
depth = 1.
3. The computer program product as in claim 2 where, during a
stress I/O operating condition, controlling the storage multipath
device driver comprises monitoring a length of the I/O pending
queue and adjusting the global factor for queue depth value
accordingly to allow more I/O to be sent to the storage device
driver.
4. The computer program product as in claim 2 where, during an
error condition, controlling the storage multipath device driver
comprises monitoring a number of functioning paths and adjusting an
individual factor for queue depth correspondingly to reduce the
amount of I/O sent to the storage device driver, where if a
percentage of functioning paths is reduced to less than 100%, the
storage multipath device driver switches from using the global
factor for queue depth value to an individual factor of a storage
device for controlling I/O pending queue depth.
5. The computer program product as in claim 1, where the at least
one storage device comprises a disk storage device.
6. A system comprising a redundant storage network that includes a
host coupled via a plurality of paths to at least one storage
device, said host comprising a storage multipath device driver
coupled with a storage multipath device input/output (I/O) pending
queue that is coupled to a storage device driver comprised of a
storage device driver I/O queue, said storage multipath device
driver operable for monitoring conditions of the redundant storage
network to increase I/O throughput to the storage device driver
when I/O demand increases, and to decrease I/O throughput to the
storage device driver in the event of an I/O error condition.
7. The system as in claim 6 where, during a normal I/O operating
condition, the storage multipath device driver calculates an amount
of I/O sent to the storage device driver using: total amount of I/O
sent to storage device driver = queue depth × global factor for
queue depth × total number of functioning paths, where global
factor for queue depth = 1.
8. The system as in claim 7 where, during a stress I/O operating
condition, the storage multipath device driver monitors a length of
the I/O pending queue and adjusts the global factor for queue depth
value accordingly to allow more I/O to be sent to the storage
device driver.
9. The system as in claim 7 where, during an error condition, the
storage multipath device driver monitors the number of functioning
paths and adjusts an individual factor for queue depth
correspondingly to reduce the amount of I/O sent to the storage
device driver, where if a percentage of functioning paths is
reduced to less than 100%, the storage multipath device driver
switches from using the global factor for queue depth value to an
individual factor of a storage device for controlling I/O pending
queue depth.
10. The system as in claim 6, where the at least one storage device
comprises a disk storage device.
11. A host comprising a multipath interface for coupling via a
plurality of paths to at least one storage device, said host
comprising a storage multipath device driver comprising a storage
multipath device input/output (I/O) pending queue, said storage
multipath device driver coupled to a storage device driver that
comprises a storage device driver I/O queue, said storage multipath
device driver operable to increase I/O throughput to the storage
device driver when I/O demand increases, and to decrease I/O
throughput to the storage device driver in the event of an I/O
error condition.
12. The host as in claim 11 where, during a normal I/O operating
condition, the storage multipath device driver calculates an amount
of I/O sent to the storage device driver using: total amount of I/O
sent to storage device driver = queue depth × global factor for
queue depth × total number of functioning paths, where global
factor for queue depth = 1.
13. The host as in claim 12 where, during a stress I/O operating
condition, the storage multipath device driver monitors a length of
the I/O pending queue and adjusts the global factor for queue depth
value accordingly to allow more I/O to be sent to the storage
device driver.
14. The host as in claim 12 where, during an error condition, the
storage multipath device driver monitors the number of functioning
paths and adjusts an individual factor for queue depth
correspondingly to reduce the amount of I/O sent to the storage
device driver, where if a percentage of functioning paths is
reduced to less than 100%, the storage multipath device driver
switches from using the global factor for queue depth value to an
individual factor of a storage device for controlling I/O pending
queue depth.
15. The host as in claim 11, where the at least one storage device
comprises a disk storage device.
16. A method to operate a host with at least one storage device in
a redundant storage network, comprising: monitoring operation of
the redundant storage network; and operating a storage multipath
device driver in conjunction with an associated storage multipath
device input/output (I/O) pending queue to increase I/O throughput
to a storage device driver when I/O demand increases, and to
decrease I/O throughput to the storage device driver in the event
of an I/O error condition.
17. The method as in claim 16 where, during a normal I/O operating
condition, operating the storage multipath device driver comprises
calculating an amount of I/O sent to the storage device driver
using: total amount of I/O sent to storage device driver = queue
depth × global factor for queue depth × total number of
functioning paths, where global factor for queue depth = 1.
18. The method as in claim 17 where, during a stress I/O operating
condition, operating the storage multipath device driver comprises
monitoring a length of the I/O pending queue and adjusting the
global factor for queue depth value accordingly to allow more I/O
to be sent to the storage device driver.
19. The method as in claim 17 where, during an error condition,
operating the storage multipath device driver comprises monitoring
the number of functioning paths and adjusting an individual factor
for queue depth correspondingly to reduce the amount of I/O sent to
the storage device driver, where if a percentage of functioning
paths is reduced to less than 100%, the storage multipath device
driver switches from using the global factor for queue depth value
to an individual factor of a storage device for controlling I/O
pending queue depth.
20. The method as in claim 16, where the at least one storage
device comprises a disk storage device.
Description
TECHNICAL FIELD
[0001] These teachings relate generally to data storage networks,
systems and methods and, more specifically, relate to data queue
management apparatus and methods that are useful in a storage area
network (SAN) type architecture.
BACKGROUND
[0002] It is common practice for a data storage device, such as a
disk-based data storage device, to have a recommended value on the
number of requests that it can handle under good conditions and
under error conditions. This value is typically referred to as a
Queue Depth, and can be used by a disk device driver to control the
input/output (I/O) flow to the storage device.
[0003] In a multipath configuration environment there are multiple
paths to provide I/O with the disk device driver. Since the number
of I/O sent from the disk device driver to the storage device is
still limited by the Queue Depth value, there can be many jobs that
are queued at a disk device driver pending queue. This can cause a
problem during error recovery, as the disk device driver will
typically retry all the requests on the queue for some number of
times (e.g., five retries per queued request). Since the pending
queue could become much longer in a multipath configuration
environment, this can result in a significant performance
degradation during error recovery, or in worst case, the system
hanging resulting in an application timeout.
[0004] In order to address this problem, a storage multipath device
driver can implement Queue Depth control at its level to limit the
amount of I/O sent to the disk device driver. This process can aid
in solving the performance degradation problem at the level of the
disk device driver during an error recovery procedure.
[0005] However, a further problem can then be introduced during
normal (non-error) conditions with heavy or stress I/O at the
storage multipath device driver level, especially with certain
types of applications that flood very heavy I/O to a small number
of storage devices. Under this condition, a large number of jobs
can be enqueued at a pending queue of the storage multipath device
driver, which can result in severe performance degradation and/or a
system hanging event.
[0006] It can be appreciated that absent a Queue Depth limit at the
storage multipath device driver level, the disk device driver can
become a bottleneck in the error recovery situation. However, if
the storage multipath device driver uses Queue Depth to limit I/O
flow, then the storage multipath device driver can become the
bottleneck during normal (non-error) conditions with stress I/O.
[0007] In US 2004/0194095 A1, "Quality of Service Controller and
Method for a Data Storage System", Lumb et al. disclose that
requests for each of a plurality of storage system workloads are
prioritized. The requests are selectively forwarded to a storage
device queue according to their priorities so as to maintain the
device queue at a target queue depth. The target queue depth is
adjusted in response to a latency value for the requests, where the
latency value is computed based on a difference between an arrival
time and a completion time of the requests for each workload.
Prioritizing the requests can be accomplished by computing a target
deadline for a request based on a monitored arrival time of the
request and a target latency for its workload. To reduce latencies,
it is said that the target queue depth may be reduced when the
target latency for a workload is less than its computed latency
value, and to increase throughput the target queue depth may be
increased when the target latency for each workload is greater than
each computed latency value.
[0008] In U.S. Pat. No. 6,636,909 B1, "Adaptive Throttling for
Fiber Channel Disks", Kahn et al. disclose a method that sends a
write request to a disk and, in response to receiving a queue full
signal from the disk if the disk queue is full, sets a throttle
value. The method is said to seek to avoid triggering a queue full
status for a storage device by queueing commands that would
overload the storage device in a local software disk driver queue.
Since a predefined limit on command issuance is said to not be
feasible, initiator devices instead must be able to recognize
potential error producing situations and thereafter limit or
throttle the number of commands issued. Accordingly, a method
operates by sending a write request to a disk, receiving a queue
full signal from the disk if the disk queue is full, and responsive
to receiving the queue full signal setting a throttle value and
thereafter dynamically adjusting the throttle value to maintain the
storage device in a steady state.
[0009] In U.S. Pat. No. 6,170,042 B1, "Disc Drive Data Storage
System and Method for Dynamically Scheduling Queued Commands",
Gaertner et al. disclose a data storage system and method of
scheduling commands in which commands are stored in a command sort
queue and a scheduled command queue. Commands in the command sort
queue are sorted and assigned a priority. Eventually, commands in
the command sort queue are transferred to the scheduled command
queue, where commands in the scheduled command queue are executed
without further sorting. The desired queue depth or size of the
scheduled command queue is determined as a function of both the
queue depth of the command sort queue and a command execution rate
value indicative of the rate at which commands in the scheduled
command queue are executed. The desired queue depth can be
dynamically determined using the queue depth of the command sort
queue and the command execution rate value as inputs to a look-up
table. The data storage system, it is said, may include a small computer
system interface (SCSI) disc (or "disk") drive that executes
commands from a host system. These various U.S. Patents and the
U.S. Patent Publication do not address the specific problems
discussed above, and thus do not provide a solution for these
problems.
SUMMARY OF THE PREFERRED EMBODIMENTS
[0010] The foregoing and other problems are overcome, and other
advantages are realized, in accordance with the presently preferred
embodiments of these teachings.
[0011] In accordance with a computer program product, apparatus and
a method there is provided a redundant network wherein a host
computer operates with at least one storage device by monitoring
conditions of the multipath storage network and controlling a
storage multipath device driver in conjunction with an associated
storage multipath device input/output (I/O) pending queue to
increase I/O throughput to a storage device driver, such as a disk
device driver, when I/O demand increases, and to decrease I/O
throughput to the storage device driver in the event of an I/O
error condition.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] The foregoing and other aspects of these teachings are made
more evident in the following Detailed Description of the Preferred
Embodiments, when read in conjunction with the attached Drawing
Figures, wherein:
[0013] FIG. 1 is a block diagram of a Storage Area Network (SAN)
system that is suitable for practicing this invention; and
[0014] FIG. 2 is a logic flow diagram that illustrates a method in
accordance with exemplary embodiments of the invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0015] FIG. 1 is a block diagram of a Storage Area Network (SAN)
system 10 that is suitable for practicing this invention. The SAN
system 10 includes a host 12 coupled to at least one storage
device, such as a disk drive 16, via at least one bus, also
referred to herein as a cable or data path 14. The host 12 may be a
computer, such as a mainframe computer, a workstation, a personal
computer, or any type of computing device. The disk drive 16 may in
practice comprise a plurality of disk drives, such as an array of
disk drives 15, and may be embodied as a Redundant Array of
Inexpensive Disks (RAID). There may be a disk drive controller 19
having overall responsibility for storing data in and reading data
from the disk drives 16. The path 14 may be a Fiber Channel (FC)
type bus, and may implement a SCSI-type of interface. The exact
nature of the path 14 and/or the specifics of the path protocol are
not germane to the practice of the exemplary embodiments of this
invention.
[0016] For the purposes of describing the exemplary embodiments of
this invention the host 12 is assumed to include a storage
multipath device driver (SMDD) 12A that operates with a storage
multipath device I/O pending queue 12B. The host 12 is further
assumed to include a disk device driver (DDD) 18A that operates
with a disk device driver I/O pending queue 18B.
[0017] One or both of the storage multipath device driver 12A and
disk device driver 18A can be implemented using a data processor
that executes a stored software program, or with hardware logic
elements, or with a combination of software programs and hardware
logic elements. The I/O pending queues 12B and 18B may be
implemented using read/write memory of any suitable type, such as
semiconductor random access memory (RAM).
[0018] The host 12 may be assumed to include or be coupled to at
least one application (APP), and more typically a plurality of
applications (APP1, APP2, . . . , APPn), at least some of which
perform disk-based I/O via the storage multipath device driver 12A.
Typically the storage multipath device driver 12A is coupled to the
applications (APP1, APP2, . . . , APPn) via an operating system
13.
[0019] For the purposes of this invention the bus 14 may be
considered to be a path, and in practice there may be a plurality of
paths (i.e., multipaths) between the host 12 and the storage
devices. This can be implemented using at least one host adapter
(HA) 12C coupled to at least two paths and to a switching fabric 20
from which multiple paths emanate to the storage devices. The use
of multiple paths between the host 12 and the storage devices
provides redundancy and avoids the generation of a single point of
failure (POF). Through the use of the plurality of paths 14 the SAN
10 may be considered to be a redundant SAN. The storage multipath
device driver 12A is assumed to have knowledge of the operational
status of the various paths 14 connecting the host 12 to the
storage devices 16. Another HA 12C can be used to couple to another
storage device or devices 22 either directly or via another switch
fabric (not shown).
[0020] By way of example, there may be ten storage device LUNs (Logical
Unit Numbers), each a disk drive, and there may be eight paths 14
to each LUN.
[0021] In accordance with exemplary embodiments of this invention
the problems discussed above are solved by the addition of
intelligence into the workload management at the level of the
storage multipath device driver 12A. As opposed to using a constant
depth of the I/O pending queue 12B, regardless of the workload
change, the storage multipath device driver 12A dynamically adjusts
the amount of I/O sent to the disk device driver 18A depending on
the change of workload. This technique aids in balancing the size
of the storage multipath device driver I/O pending queue 12B, as
well as the disk device driver I/O pending queue 18B, under various
conditions of normal (non-error) operation and error recovery
operation.
[0022] By the use of the exemplary embodiments of this invention
the I/O throughput is increased when demand from the application(s)
is increasing, thus avoiding performance degradation and preventing
system hanging caused by the queue depth control implemented by the
storage multipath device driver 12A. In the event of an I/O
failure, the storage multipath device driver 12A is sensitive to
the change and effectively "tunes" the amount of I/O sent to the
disk device driver 18A to a smaller value to prevent I/O hanging or
performance degradation at the level of the disk device driver 18A,
as it would typically retry some number of times for each I/O
request.
[0023] At the level of the storage multipath device driver 12A
multiple retries are not performed to the same degree (if at all)
as the disk device driver 18A for each job queued at the I/O
pending queue 18B. Once a particular path 14 receives some certain
number of continuous errors it is taken offline. If all the paths
14 are taken offline, the storage multipath device driver 12A may
return all of the I/O requests on the I/O pending queue 12B to the
application(s), without any retries. Therefore, the storage
multipath device driver 12A does not typically encounter the same
performance degradation during an error recovery procedure as the
disk device driver 18A does.
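The path handling described in this paragraph can be sketched in Python as follows. This is a minimal illustration only, not the driver's actual implementation: the class name `MultipathState`, its methods, and the default threshold of three consecutive errors are all assumptions, since the text says only "some certain number of continuous errors".

```python
from collections import deque


class MultipathState:
    """Hypothetical model of path status tracking in a multipath driver."""

    def __init__(self, path_ids, error_threshold=3):
        # error_threshold is an assumed value; the source does not give one.
        self.error_threshold = error_threshold
        self.consecutive_errors = {p: 0 for p in path_ids}
        self.online = set(path_ids)
        self.pending = deque()  # stands in for the I/O pending queue 12B

    def record_success(self, path):
        """A successful I/O resets the path's consecutive-error count."""
        self.consecutive_errors[path] = 0

    def record_error(self, path):
        """Count an error on a path and take it offline at the threshold.

        Returns the requests handed back to the application, which is
        non-empty only when every path has gone offline. No retries are
        attempted at this level.
        """
        if path not in self.online:
            return []
        self.consecutive_errors[path] += 1
        if self.consecutive_errors[path] >= self.error_threshold:
            self.online.discard(path)
        if not self.online:
            returned = list(self.pending)
            self.pending.clear()
            return returned
        return []
```

Because the queue is drained without retries once no path remains, the cost of a total failure in this sketch is proportional to the queue length rather than to the queue length multiplied by a retry count.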
[0024] By adding intelligence in the storage multipath device
driver 12A it becomes capable of dynamically sensing and responding
to a changing I/O volume, and to an occurrence of I/O errors, so as
to efficiently handle both normal or good conditions and error
conditions. The non-limiting embodiments of this invention can be
practiced with any storage multipath device driver on any platform
through the use of the Queue Depth of the I/O queue 12B to
dynamically control I/O flow.
[0025] For the implementation of the exemplary embodiments of this
invention a set of rules is established to control the I/O flow in
order to avoid performance degradation and/or system hanging during
stress I/O and/or error recovery. The set of rules is established
in consideration of at least the following elements:
[0026] (a) a Queue Depth value recommended by the storage device
controller 19 (e.g., the controller of the disk drive(s) 16);
[0027] (b) a Length of the storage multipath device driver I/O
pending queue 12B (where all unprocessed I/O requests are queued)
when the storage multipath device driver 12A begins to experience a
performance degradation during a stress I/O (high volume)
condition;
[0028] (c) a Length of the disk device driver I/O pending queue 18B
when disk device driver 18A begins to experience a performance
degradation during error recovery; and
[0029] (d) a Factor of the Queue Depth used by the storage
multipath device driver 12A to control the I/O flow to disk device
driver 18A when a performance degradation begins to be experienced
during error recovery.
[0030] After determining the above elements, the storage multipath
device driver 12A uses these elements to implement the following
logic. Reference is also made to the logic flow diagram of FIG.
2.
[0031] During a normal condition, at Block A the storage multipath
device driver 12A calculates an amount of I/O sent to the disk
device driver 18A using the following formula: Total amount of I/O
sent to disk device driver 18A on a device = Queue Depth × Global
Factor for Queue Depth × Total Number of Functioning Paths;
where the Global Factor for Queue Depth = 1. Note that the Global
Factor for Queue Depth is preferably a factor of the Queue Depth
value used by all of the multipath storage devices 16.
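The normal-condition formula can be written as a short Python sketch. The function name `total_io_allowed` and the sample Queue Depth of 20 are assumptions made for illustration; the text does not give a concrete Queue Depth value.

```python
def total_io_allowed(queue_depth, functioning_paths, global_factor=1):
    """Total I/O the multipath driver may send to the disk device driver
    for one device: Queue Depth x Global Factor x Functioning Paths."""
    if queue_depth < 0 or functioning_paths < 0 or global_factor < 1:
        raise ValueError("inputs must be non-negative and the factor at least 1")
    return queue_depth * global_factor * functioning_paths


# Assumed values: Queue Depth of 20, eight functioning paths, normal condition.
print(total_io_allowed(20, 8))  # 160
```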
[0032] The "normal condition" may be considered to be one where
there is an absence of stress I/O and/or an error condition. As
employed herein "stress I/O" may be considered to be an amount of
application initiated storage device activity that exceeds a normal
amount of activity by some predetermined amount. The predetermined
amount may be fixed, or it may be variable depending on system
conditions. For example, on a host 12 system equipped with an AIX™
operating system 13 (AIX™ is an open operating system, based on
UNIX™, that is available from the assignee of this patent
application), the following table defines a suitable rule for this
implementation (n is the number of I/O requests):
Length of Pending Queue (n) | Global Factor for Queue Depth (for all storage devices) |
n >= 1200 | 3 |
800 < n < 1200 | 2 |
n < 800 | 1 |
[0033] For example, if the number of I/O requests on the I/O
pending queue 12B is 1100, the calculation performed by the storage
multipath device driver 12A using the formula recited above uses a
value of 2 for the Global Factor for Queue Depth.
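The table lookup can be sketched in Python as below. The function name `global_factor_for_queue_depth` is hypothetical. The table as printed does not assign a factor to n = 800 exactly; the sketch assumes that this boundary falls in the lowest band (factor 1), which is a choice made here and is not stated in the text.

```python
def global_factor_for_queue_depth(pending_len):
    """Map the length of the multipath I/O pending queue to a global factor."""
    if pending_len < 0:
        raise ValueError("pending queue length cannot be negative")
    if pending_len >= 1200:
        return 3
    if pending_len > 800:
        return 2
    # n < 800 per the table; n == 800 is ASSUMED to map to 1 as well.
    return 1


print(global_factor_for_queue_depth(1100))  # 2
```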
[0034] In the stress I/O environment, and at Block B, the storage
multipath device driver 12A monitors the length of the I/O pending
queue 12B and adjusts the Global Factor for Queue Depth value
accordingly to allow more I/O to be sent to the disk device driver
18A.
[0035] In the error condition, at Block C, the storage multipath
device driver 12A monitors the number of functioning paths and
adjusts the individual factor for Queue Depth correspondingly to
reduce the amount of I/O sent to the disk device driver 18A. If a
percentage of functioning paths of a multipath device (referred to
herein as m) is reduced to less than 100%, the storage multipath
device driver 12A switches from using the global factor for normal
or stress I/O conditions to an individual factor of this disk 16
for controlling the queue depth during the error condition.
[0036] For example, and assuming again the non-limiting case of an
AIX™ operating system 13 installed on the host 12, the following
illustrates a suitable rule for use in the implementation: if
50% < m < 100%, then the Individual Factor for Queue Depth (per
multipath device), denoted as f, is given by: if Global Factor
for Queue Depth >= 2, f = Global Factor for Queue Depth - 1; else if
m <= 50%, then f = 1.
[0037] For example, assume in an exemplary case that the total
number of paths to the storage devices 16 is eight, that the number
of functioning paths is six, and that the global factor for queue
depth is three. In this case m = 6/8 = 75%, so the condition of
50% < m < 100% is satisfied. Therefore, the following calculation is
performed by the storage multipath device driver 12A: Total amount
of I/O sent to disk device driver 18A = Queue Depth × (Global Factor
for Queue Depth - 1) × Total number of functioning paths.
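The error-condition rule and this worked example can be sketched in Python as follows. The function names and the Queue Depth of 20 are assumptions for illustration. The rule as written does not say what f should be when 50% < m < 100% and the global factor is already 1; the sketch assumes f = 1 in that case.

```python
def effective_factor(total_paths, functioning_paths, global_factor):
    """Pick the Queue Depth factor for one multipath device."""
    if total_paths <= 0 or not 0 <= functioning_paths <= total_paths:
        raise ValueError("invalid path counts")
    if functioning_paths == total_paths:
        return global_factor  # m = 100%: no error, use the global factor
    if 2 * functioning_paths > total_paths:  # 50% < m < 100%
        # f = global factor - 1 when the global factor is >= 2;
        # f = 1 otherwise is an ASSUMPTION, the text leaves it open.
        return global_factor - 1 if global_factor >= 2 else 1
    return 1  # m <= 50%


def total_io(queue_depth, total_paths, functioning_paths, global_factor):
    """Total I/O sent to the disk device driver under the chosen factor."""
    factor = effective_factor(total_paths, functioning_paths, global_factor)
    return queue_depth * factor * functioning_paths


# Eight paths, six functioning, global factor 3, assumed Queue Depth of 20.
print(effective_factor(8, 6, 3))  # 2
print(total_io(20, 8, 6, 3))      # 240
```

Comparing `2 * functioning_paths` with `total_paths` keeps the 50% test in integer arithmetic, so the boundary case m = 50% is handled exactly.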
[0038] Based on the foregoing discussion it should be appreciated
that by implementing the described methods in the storage multipath
device driver 12A, the storage multipath device driver 12A is
enabled to dynamically adjust the I/O flow to the disk device
driver 18A based on the workload and the presence or absence of I/O
errors to avoid performance degradation or system hanging in stress
I/O and in error conditions.
[0039] A feature of the exemplary embodiments of this invention is
that the bandwidth between the host 12 and the storage devices 16
can be adjusted corresponding to I/O conditions to avoid the
generation of a bottleneck at either the storage multipath device
driver 12A or the disk device driver 18A.
[0040] A further feature of the exemplary embodiments of this
invention is that overloading of the storage multipath device
driver 12A is avoided during non-error conditions (including during
stress I/O conditions), and the overloading of the disk device
driver 18A is avoided during error conditions.
[0041] The foregoing description has provided by way of exemplary
and non-limiting examples a full and informative description of the
embodiments of this invention. However, various modifications and
adaptations may become apparent to those skilled in the relevant
arts in view of the foregoing description, when read in conjunction
with the accompanying drawings and the appended claims. For
example, while the storage devices have been described as the disk
drives 16 coupled to the disk device driver 18A, in other
embodiments other types of storage devices may be used, such as
tape storage devices and semiconductor memory-based storage
devices. The DDD 18A may thus be referred to more generally as a
storage device driver, and the associated I/O queue 18B as a
storage device driver I/O pending queue. Further, the disk drives
15 may be based on magnetic technology, or on optical technology,
and may use fixed or removable storage media. Still further, it
can be appreciated that the SMDD 12A may be responsive to a
plurality of different error conditions, such as errors arising in
one or more of the disk drives 16, disk drive controller 19, the
switch fabric 20 and/or the HA 12C. Further in this regard the
error condition processing performed by the SMDD 12A may be
tailored, if desired, in accordance with the source of the error
and may thus be adaptive in nature. However, all such modifications
of the teachings of this invention will still fall within the scope
of the embodiments of this invention.
[0042] Furthermore, some of the features of the embodiments of this
invention may be used to advantage without the corresponding use of
other features. As such, the foregoing description should be
considered as merely illustrative of the principles, teachings and
embodiments of this invention, and not in limitation thereof.
* * * * *