U.S. patent application number 11/965274 was filed with the patent office on 2007-12-27 and published on 2008-05-01 for system and method for selecting fibre channel switched fabric frame paths.
This patent application is currently assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Invention is credited to Jonathan Wade Ain, Robert George Emberty, Craig Anthony Klein, Peter Connley Lancaster.
Application Number | 20080101256 11/965274 |
Family ID | 39329980 |
Publication Date | 2008-05-01 |
United States Patent
Application |
20080101256 |
Kind Code |
A1 |
Ain; Jonathan Wade ; et
al. |
May 1, 2008 |
SYSTEM AND METHOD FOR SELECTING FIBRE CHANNEL SWITCHED FABRIC FRAME
PATHS
Abstract
A system and method for measuring data transmission activity
through a port of a switch device interconnecting nodes of a
storage area network, the port transmitting data as words of
predetermined length, one data word indicating idle port activity.
The method includes steps of counting a number of transmitted words
received from the port in a first counter device; and, for each
word counted, comparing that word with a predetermined word
indicating no (idle) port transmission activity. In response to the
comparing, a number of matches are counted in a second counter
device. In this manner, a ratio of a number of counted matches with
a total amount of words counted indicates available bandwidth for
transmitting additional data over that link. Preferably, this
available bandwidth information is included in a link state record
that the switch communicates to other switch devices
interconnecting that link. Processing devices at the switches
determine a link cost factor, based on the available bandwidth of
that link and, in addition, the link speed, the cost factor being
used to optimize path selection over links in the network according
to a path routing algorithm.
Inventors: |
Ain; Jonathan Wade; (Tucson,
AZ) ; Klein; Craig Anthony; (Tucson, AZ) ;
Emberty; Robert George; (Tucson, AZ) ; Lancaster;
Peter Connley; (Tucson, AZ) |
Correspondence
Address: |
SCULLY, SCOTT, MURPHY, & PRESSER, P.C.
400 GARDEN CITY PLAZA
SUITE 300
GARDEN CITY
NY
11530
US
|
Assignee: |
INTERNATIONAL BUSINESS MACHINES
CORPORATION
New Orchard Road
Armonk
NY
|
Family ID: |
39329980 |
Appl. No.: |
11/965274 |
Filed: |
December 27, 2007 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10317765 | Dec 12, 2002 | 7327692 |
11965274 | Dec 27, 2007 | |
10238751 | Sep 10, 2002 | 7339896 |
10317765 | Dec 12, 2002 | |
Current U.S.
Class: |
370/252 ;
370/232; 370/395.1; 370/401; 370/468 |
Current CPC
Class: |
H04L 69/329 20130101;
H04L 25/4908 20130101; H04L 47/10 20130101; H04L 47/125 20130101;
H04L 49/351 20130101; H04L 67/322 20130101; H04L 43/0882 20130101;
H04L 49/357 20130101; H04L 67/1097 20130101; H04L 29/06 20130101;
H04L 47/11 20130101 |
Class at
Publication: |
370/252 ;
370/232; 370/395.1; 370/468; 370/401 |
International
Class: |
H04L 12/26 20060101
H04L012/26; G08C 15/00 20060101 G08C015/00 |
Claims
1. A system for optimizing data transmission activity through ports
of a switch device interconnecting nodes of a network, the port
transmitting data as words of predetermined length, one data word
indicating idle port activity, said system comprising: first
counter means for counting a fixed amount of transmitted words
received from said port; means for comparing each received word of
said fixed amount with a predetermined word indicating said idle
port transmission activity; and, means for counting a number of
matches in a second counter means, a processing device for
computing a ratio of a number of counted matches with said fixed
amount of words counted, said ratio indicating available bandwidth
for transmitting additional data through said port, said processing
device communicating said available bandwidth information to other
switch devices to thereby optimize transmission of data through
ports interconnecting said switch devices.
2. The system as claimed in claim 1, wherein said network is a
fiber channel network comprising switch devices interconnecting
nodes by communication links, said links carrying data in serial
form between switch devices in a switch fabric of said fiber
channel network, said processing device further computing a cost of
transmission over a link interconnecting said port in the network
as a basis for determining transmission of data over a path
including said interconnected link, wherein said link cost
considers a speed of said link and said available bandwidth
information.
3. The system as claimed in claim 2, wherein said data words are
communicated over said link in serial form and received as a serial
stream, said system further comprising: a means for synchronizing
receipt of said data words from said serial stream and generating a
clock signal indicating receipt of a transmitted word in said
serial stream; and, a means for de-serializing said data stream and
converting each received word to a parallel format.
4. The system as claimed in claim 2, wherein said link cost forming
a basis for routing data in said network is calculated according
to: Link Cost=S*(n/Baud Rate) with S and n being pre-defined
values, and Baud Rate indicating said link speed.
5. The system as claimed in claim 4, wherein said ratio indicating
available bandwidth for transmitting additional data through said
port is defined as a variable w' of byte length, said processing
device further computing a used bandwidth, w, of a link according
to: w=1-w'/255.
6. The system as claimed in claim 5, wherein said link cost forming
a basis for routing data in said network is calculated according
to: Link Cost=S*w*(n/baud rate) with S and n being pre-defined
values, and Baud Rate indicating said link speed.
7. The system as claimed in claim 5, wherein said processing device
generates a Link State Record (LSR) for communicating said
available bandwidth information to other switch devices in said
network, said available bandwidth information inserted in said LSR
as said byte w'.
8. The system as claimed in claim 7, wherein said network
implements a Fabric Shortest Path First algorithm for determining
frame routing through said network based on link speed and said
available bandwidth information provided in said LSR.
9. A switch device for routing data over links interconnecting
nodes of a network, each switch including a port interfaced to a
link for communicating data along paths including one or more links
in the network, each port transmitting data as words of
predetermined length, one data word indicating idle port activity,
the switch device comprising: first counter means for counting a
fixed amount of transmitted words received from said port; means
for comparing each received word of said fixed amount with a
predetermined word indicating said idle port transmission activity;
and, means for counting a number of matches in a second counter
means, a processing device for computing a ratio of a number of
counted matches with said fixed amount of words counted, said ratio
indicating available bandwidth for transmitting additional data
through said port, said processing device communicating said
available bandwidth information to other switch devices to thereby
optimize transmission of data through ports interconnecting said
switch devices.
10. The switch device as claimed in claim 9, wherein said network
is a fiber channel network comprising switch devices
interconnecting nodes by said links, said links carrying data in
serial form between switch devices in a switch fabric of said fiber
channel network, said processing device further computing a cost of
transmission over a link interconnecting said port in the network
as a basis for determining transmission of data over a path
including said interconnected link, wherein said link cost
considers a speed of said link and said available bandwidth
information.
11. The switch device as claimed in claim 10, wherein said
processing device calculates said link cost as: Link Cost=S*(n/Baud
Rate) with S and n being pre-defined values, and Baud Rate
indicating said link speed.
12. The switch device as claimed in claim 11, wherein said ratio
indicating available bandwidth for transmitting additional data
through said port is defined as a variable w' of byte length, said
processing device further computing a used bandwidth, w, of a link
according to: w=1-w'/255.
13. The switch device as claimed in claim 12, wherein said link
cost forming a basis for routing data in said network is calculated
according to: Link Cost=S*w*(n/baud rate) with S and n being
pre-defined values, and Baud Rate indicating said link speed.
14. The switch device as claimed in claim 12, wherein said
processing device generates a Link State Record (LSR) for
communicating said available bandwidth information to other switch
devices in said network, said available bandwidth information
inserted in said LSR as said byte w'.
15. A storage area network comprising: a plurality of network nodes
each capable of receiving and transmitting data; one or more switch
devices for routing data over links interconnecting said nodes,
each switch including a port interfaced to a link for communicating
data along paths including one or more links in the network, each
port transmitting data as words of predetermined length, one data
word indicating idle port activity, wherein the switch device
comprises: first counter means for counting a fixed amount of
transmitted words received from said port; means for comparing each
received word of said fixed amount with a predetermined word
indicating said idle port transmission activity; and, means for
counting a number of matches in a second counter means, a
processing device for computing a ratio of a number of counted
matches with said fixed amount of words counted, said ratio
indicating available bandwidth for transmitting additional data
through said port, and said processing device communicating said
available bandwidth information to other switch devices to thereby
optimize path selection over links in the network according to a
path routing algorithm.
Description
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application is a continuation application of U.S.
patent application Ser. No. 10/317,765 filed Dec. 12, 2002, which
is a continuation-in-part application based upon and claiming the
benefit of the filing of commonly-owned, co-pending U.S. patent
application Ser. No. 10/238,751 filed Sep. 10, 2002 entitled
"AVAILABLE BANDWIDTH DETECTOR FOR SAN SWITCH PORTS," the contents
and disclosure of which are fully incorporated herein by
reference.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention relates to fibre channel switched
networks and particularly to a system and method for selecting
frame paths in a fibre channel switched network that takes into
account available bandwidth considerations.
[0004] 2. Description of the Prior Art
[0005] FIG. 1 depicts generally a Storage Area Network (SAN) 10
which is a dedicated high performance network capable of moving
data between heterogeneous servers 16a, 16b, . . . , 16n and
storage resources such as disk drives and arrays (RAIDS) 18 or tape
storage devices and/or libraries 20. As shown in FIG. 1, a Local
Area Network (LAN) 12 is provided which enables the sharing of data
files among groups of user clients, such as desktop computers 14a,
14b, . . . , 14n. The LAN 12 may comprise an Internet Protocol (IP)
network such as Ethernet and provides client/server connectivity
between the desktop client 14a, and SAN server devices 16a, 16b, .
. . , 16n using messaging communications protocols like TCP/IP. The
SAN 10 includes a separate dedicated network, such as a Fiber
Channel network 25, that preferably comprises a switched topology
or "fabric" including fiber channel interconnect devices such as
switches, 30, routers 22 and high speed serial links 26
interconnecting the servers 16a, 16b, . . . ,16n to the storage
subsystems 18, 20 for storage networking. As known, such a SAN
architecture 10 advantageously minimizes any traffic conflicts and
provides for increased scalability, availability, and file
transfers over longer distances as compared to SANs of traditional
messaging networks comprising bus architectures. The Fiber Channel
based SAN, such as shown in FIG. 1, combines the high performance
of an I/O channel and the advantages of a network (connectivity and
distance of a network) using similar network technology components
like routers 22, switches 30 and gateways (not shown). Thus, SAN
products do not function like a server. Rather, the SAN product
processes block I/O protocols, such as Fiber Channel Protocol
(SCSI-FCP) or Fiber Connection (FICON), for some other system,
e.g., a server.
[0006] As known, the fiber channel switching fabric 25 is organized
into logical entities including ports, nodes and platforms. For
instance, fiber channel "nodes" are physical devices, e.g., disk
drive or disk arrays, workstations, storage devices, etc., that may
be a source or destination of information to/from other nodes. Each
node comprises one or more "ports" which are the hardware
interfaces that connect all fiber channel devices to the topology
via links, i.e., electrical or optical transmit fibers, e.g. cables
of copper or optical fiber. Ports are designated and have different
attributes depending upon the switch topology in which they are
implemented, e.g., point-to-point, arbitrated loop, fabric.
[0007] In Fibre Channel networks comprising a switching fabric,
such as shown in FIG. 1, switches 30 communicate to each other over
switch-to-switch links via Expansion or "E"-ports. A part of each
switch's function in the network is to generate a Link State Record
("LSR") 99 that completely describes the connectivity of a switch
to all switches to which it is directly attached. The LSR 99
generated at a switch is communicated to all other switches
connected to that switch to provide the switch fabric with
information such as the status of each switch port. The ANSI Fibre
Channel Switch Fabric-3 (FC-SW-3) rev 6.01 (NCITS) working draft
proposed American National Standard for Information Technology
(Jun. 1, 2002), incorporated herein by reference, describes in
greater detail the composition of the LSR that is communicated. For
instance, as described in the proposed ANSI Fibre Channel Switch
Fabric-3 standard, basic information included in the LSR includes,
but is not limited to: whether a particular port is up, the speed
of a link connected to the port, e.g., 1 Gbit/sec, 2 Gbit/sec,
etc., the LSR age, an options field, a length, checksum bytes,
etc.
[0008] Typically, the LSR header is 24 bytes having a configuration
as follows:
[0009] byte 1 . . . Type
[0010] byte 2 . . . Reserved
[0011] bytes 3-4 . . . LSR Age
[0012] bytes 5-8 . . . Options
[0013] bytes 9-12 . . . Link State ID
[0014] bytes 13-16 . . . Advertising Domain ID
[0015] bytes 17-20 . . . Link State Incarnation
[0016] bytes 21-22 . . . Checksum
[0017] bytes 23-24 . . . LSR Length
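As an illustrative sketch only, the 24-byte header layout listed above can be unpacked in software as follows. The field names, the big-endian byte order, and the names `LsrHeader` and `parse_lsr_header` are assumptions made for this example and are not taken from the FC-SW-3 text.

```python
import struct
from typing import NamedTuple

# Field widths follow the byte listing above: 1+1+2+4+4+4+4+2+2 = 24 bytes.
# Big-endian byte order is an assumption of this sketch.
LSR_HEADER = struct.Struct(">BBHIIIIHH")


class LsrHeader(NamedTuple):
    lsr_type: int
    reserved: int
    age: int
    options: int
    link_state_id: int
    advertising_domain_id: int
    incarnation: int
    checksum: int
    length: int


def parse_lsr_header(raw: bytes) -> LsrHeader:
    """Unpack the first 24 bytes of an LSR into named fields."""
    if len(raw) < LSR_HEADER.size:
        raise ValueError("an LSR header needs at least 24 bytes")
    return LsrHeader(*LSR_HEADER.unpack_from(raw))
```

The one-byte `reserved` field in this layout is the same Reserved byte that the detailed description later names as one possible home for the available bandwidth value.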
[0018] From this information, whenever a switch comes up in the
Fibre Channel network, it may then look at the speed of the link
and the number of hops to determine the cost of a particular path,
the proposed cost being a combination of the speed of the links
versus the number of switches it goes to. From this information, a
shortest path may be calculated using a well known algorithm, e.g.,
a Fabric Shortest Path First (FSPF) path selection protocol. A more
detailed description of the FSPF algorithm may be found at the T11
standards website at section (8) of the D Switch Fabric-2
specification, revision 5.4, incorporated by reference herein.
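To make the shortest path step concrete, the following is a minimal sketch of a generic Dijkstra shortest-path computation over per-link costs. It is not an implementation of the FSPF specification; the function name `shortest_path_costs`, the dictionary-based fabric description, and the switch names are hypothetical.

```python
import heapq
from typing import Dict

# Hypothetical fabric description: switch name -> {neighbor switch: link cost}.
Fabric = Dict[str, Dict[str, float]]


def shortest_path_costs(fabric: Fabric, source: str) -> Dict[str, float]:
    """Return the lowest total link cost from source to every reachable switch."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, link_cost in fabric.get(node, {}).items():
            candidate = cost + link_cost
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


# Two routes from A to D: via B over links of cost 1000 each,
# or via C over links of cost 500 each.
fabric = {
    "A": {"B": 1000.0, "C": 500.0},
    "B": {"A": 1000.0, "D": 1000.0},
    "C": {"A": 500.0, "D": 500.0},
    "D": {"B": 1000.0, "C": 500.0},
}
print(shortest_path_costs(fabric, "A")["D"])  # 1000.0, the route via C
```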
[0019] Within a Storage Area Network (SAN) a path selection process
for routing frames only considers the link cost in the fibre
channel switched fabric to determine the best path for routing
frames through fibre switches. The link cost is a measurement that
is calculated by the following formula: Link Cost=S*(1.0625e12/Baud
Rate)
[0020] By default, S is an administrative value, typically set to
one. The number 1.0625e12 is exemplary and for purposes of
discussion is equal to 1000 times 1.0625e9 (which represents a 1
Gb/s link speed). Thus, for example, when the Link Cost is
calculated for a 1.0625 Gb/s Fibre Channel Link, this calculation
yields (with S set to 1.0): 1.0 *(1.0625e12/1.0625e9)=1000. It
should be understood that the 1.0625e12 number is configurable and
may change in accordance with link speed. Currently, link cost only
considers link speed (i.e., the Baud rate). However, while link
speed is one important measurement to consider in best frame path
selection, there are several other factors that may be considered
as well. One of these additional factors would be the current
congestion or amount of available bandwidth for each link along
each available path through the fabric.
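The speed-only link cost formula above can be sketched as a small function. The names `link_cost` and `REFERENCE_RATE` are chosen for this example, and the 2.125e9 rate is simply an example of a faster link.

```python
REFERENCE_RATE = 1.0625e12  # configurable numerator from the formula above


def link_cost(baud_rate: float, s: float = 1.0, n: float = REFERENCE_RATE) -> float:
    """Link Cost = S * (n / Baud Rate), with S the administrative factor."""
    if baud_rate <= 0:
        raise ValueError("baud rate must be positive")
    return s * (n / baud_rate)


print(link_cost(1.0625e9))  # 1000.0
print(link_cost(2.125e9))   # 500.0
```

Because the cost depends only on the baud rate, two links of equal speed receive the same cost regardless of how heavily each one is loaded.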
[0021] It would be highly desirable to provide a frame path
selection system and method that takes into account available
bandwidth of each port (link) and the link cost, in real time.
SUMMARY OF THE INVENTION
[0022] It is an object of the present invention to provide a system
and method for determining an amount of available bandwidth at each
switch port, in real-time, and utilizing this available bandwidth
information in a manner to provide for more accurate path selection
and frame routing algorithms.
[0023] It is a further object of the present invention to provide a
system and method for determining an amount of available bandwidth
at each switch port, in real-time, and inserting this available
bandwidth information in the Link State Record for propagation to
all other switches in the fabric, so that each switch will know
the available bandwidth for all ports within the network to
optimize routing decisions.
[0024] It is another object of the present invention to provide a
system and method for determining an amount of available bandwidth
at each switch port, in real-time, and inserting this available
bandwidth information in the Link State Record and utilizing this
added bandwidth information to influence frame routing
decisions.
[0025] The invention particularly comprises adding a definition of
a value for placement in a defined byte field in the Link State
Record (LSR) that would reflect the amount of bandwidth available
for each link. Using this value, fibre channel network switches may
take not only link speed into consideration but also consider
current traffic and congestion on the associated link. Thus, the
percentage of bandwidth available or current congestion found on
the fibre link may be factored in along with the link speed.
[0026] Thus, according to the principles of the invention, there is
provided a system and method for measuring data transmission
activity through a port of a switch device interconnecting nodes of
a storage area network, the port transmitting data as words of
predetermined length, one data word indicating idle port activity.
The method includes steps of: counting a number of transmitted
words received from the port in a first counter device; and, for
each word counted, comparing that word with a predetermined word
indicating no (idle) port transmission activity. In response to the
comparing, a number of matches are counted in a second counter
device. In this manner, a ratio of a number of counted matches with
a total amount of words counted indicates available bandwidth for
transmitting additional data over that link. Preferably, this
available bandwidth information is included in a Link State Record
that the switch communicates to other switch devices
interconnecting that link. Processing devices at the switches
determine a link cost factor, based on the available bandwidth of
that link and, in addition, the link speed, the cost factor being
used to optimize path selection over links in the network according
to a path routing algorithm.
[0027] It is understood that the system and method of the present
invention may be implemented at switch nodes in many types of SANs,
including Gigabit Ethernet, Infiniband, and iSCSI. Furthermore, the
present invention may be implemented for determining available
bandwidth for other types of Fiber Channel node ports. That is,
other ports interconnected by links in a switch fabric may benefit
from the system and method.
BRIEF DESCRIPTION OF THE DRAWINGS
[0028] Further features, aspects and advantages of the apparatus
and methods of the present invention will become better understood
with regard to the following description, appended claims, and
accompanying drawings where:
[0029] FIG. 1 depicts generally a Storage Area Network (SAN) 10
including a dedicated high performance network capable of moving
data between heterogeneous servers and storage resources such as
disk drives and arrays (RAIDS) or tape storage devices and/or
libraries; and,
[0030] FIG. 2 illustrates the state machine for measuring the
activity through the various ports of switches in a switch fabric
of a Fibre Channel Network.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0031] The fibre bandwidth available at a port is measured
according to a technique that includes counting the number of idle
state words found at any one time on the fibre link. Details
concerning this measurement technique are disclosed in
commonly-owned, co-pending U.S. patent application Ser. No.
10/238,751 filed Sep. 10, 2002 entitled "AVAILABLE BANDWIDTH
DETECTOR FOR SAN SWITCH PORTS," the whole content and disclosure of
which is fully incorporated herein by reference.
[0032] Briefly, in view of FIG. 2, there is depicted a novel state
machine for measuring the activity through the various ports of the
switches in a switch fabric of a Storage Area Network according to
the present invention. As shown in FIG. 2, a data stream 110
communicated from a node is received at a switch port (not shown)
along link 100. The data stream is received and processed by the
SERDES module 102 which provides link control for a fiber channel
port. The SERDES deserializer receives the serial stream and
generates 10-bit wide data bytes (encoded characters), and a word
clock 130, indicating a word is available.
[0033] The received 10-bit wide data byte is tapped off the output
of the SERDES module 102 and clocked into a 10-bit wide × 4
deep shift FIFO register 112 with parallel access to accumulate a
transmitted ordered set comprising 40 bits, i.e., four
ten-bit characters. The resulting 40-bit data word is compared with the
"IDLE" ordered set, which is a special ordered set (40-bit word)
specified by the Fiber Channel protocol to be transmitted when a
port (of a node) has no valid data to send. Preferably, the special
40-bit IDLE word is hard-wired in a register 114 or equivalent data
storage structure. When the FIFO register 112 has received four
characters in succession (i.e., the 40-bit word), a comparator
device 116 is triggered and compares the received ordered set with
the IDLE ordered set (word) to determine whether the two
match. Each time an IDLE word is detected by
comparator 116, a comparator output signal is generated to
increment a counter device 120 for counting IDLE words.
Simultaneously with the detection and counting of received IDLE
words, a word counter device 122 is provided to count the total
number of words received. Particularly, as shown in FIG. 2, the
word clock 130 that clocks the received 10-bit wide data words into
the shift FIFO register 112, is additionally implemented to count
the total number of received words in the word counter device 122.
Reset logic circuit 124 is provided to generate a reset signal 132
when the counter device 122 has counted a pre-determined number of
words. The reset logic word count is configurable depending upon
the type of network implemented, and for purposes of explanation,
may be set to reach a value of 25×10^6, for example. The
value of 25×10^6 words, in the example system illustrated
in FIG. 2, would correspond to one second of traffic on a link 100
with a data rate of 1.0 Gbit/sec, as there are 4 characters/word and
10 bits/character (according to the 8b/10b encoding scheme), i.e.,
40 bits/word multiplied by the 25×10^6 words received and counted. Thus,
when the amount of words received (and counted) has reached the
value specified by the reset logic circuit 124 (e.g.,
25×10^6), the reset signal 132 is generated to latch the
value of the IDLE counter register 120 by a counter latch device
122. Additionally, at that moment, the reset signal 132 resets the
IDLE counter 120 and word counter 122, so that continuous bandwidth
activity at a switch port may be ascertained. Preferably, the
latched IDLE counter value is communicated to a processor device,
e.g., provided in the switch, via a bus 140. In this manner, the
switch processor may thus compute a percentage comprising a ratio
of the number of IDLE ordered sets (words) received for a fixed
number of transmission words (e.g., 25×10^6), which
translates into available bandwidth.
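The counting scheme described above is a hardware state machine, but its behaviour can be modelled in software. The following is a minimal sketch under stated assumptions: `IDLE_WORD` is a placeholder 40-bit pattern and not the real 8b/10b encoding of the IDLE ordered set, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Placeholder 40-bit pattern; NOT the actual encoded IDLE ordered set.
IDLE_WORD = 0x0F0F0F0F0F


def assemble_words(chars: List[int]) -> List[int]:
    """Group 10-bit characters four at a time into 40-bit words (the shift FIFO)."""
    words = []
    for i in range(0, len(chars) - len(chars) % 4, 4):
        word = 0
        for char in chars[i:i + 4]:
            word = (word << 10) | (char & 0x3FF)
        words.append(word)
    return words


@dataclass
class IdleMonitor:
    window: int            # words per measurement window (reset logic threshold)
    idle_count: int = 0    # models the IDLE counter
    word_count: int = 0    # models the total word counter
    latched: List[int] = field(default_factory=list)  # one latched IDLE count per window

    def on_word(self, word: int) -> None:
        """Process one received 40-bit word."""
        self.word_count += 1
        if word == IDLE_WORD:          # comparator match
            self.idle_count += 1
        if self.word_count == self.window:
            self.latched.append(self.idle_count)  # latch, then reset both counters
            self.idle_count = 0
            self.word_count = 0


# Six IDLE words followed by two data words, in a window of eight words.
idle_chars = [(IDLE_WORD >> shift) & 0x3FF for shift in (30, 20, 10, 0)]
data_chars = [0x155, 0x2AA, 0x155, 0x2AA]
monitor = IdleMonitor(window=8)
for w in assemble_words(idle_chars * 6 + data_chars * 2):
    monitor.on_word(w)
print(monitor.latched[0] / monitor.window)  # 0.75 of the window was idle
```

A real window would be far larger (25×10^6 words in the example above); the small window here only keeps the demonstration short.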
[0034] Thus, in one embodiment, as the fibre link bandwidth
available is measured by counting the number of idles found at any
one time in the fibre link, this measurement value may be inserted
in the Link State Record (LSR), for example, in the defined Link
Options field within the LSR which field is large enough to count
up to 4 Gbyte of idles on each link. Presently, this Link Options
field has no options defined, and is set to 0x00 0x00
0x00 0x00.
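As a rough illustration of this embodiment, a raw idle-word count can be packed into a four-byte field as shown below. The function name `encode_idle_count` and the big-endian byte order are assumptions of this sketch; the text above does not specify an encoding.

```python
import struct


def encode_idle_count(idle_words: int) -> bytes:
    """Pack an idle-word count into a 4-byte field (big-endian assumed)."""
    if not 0 <= idle_words <= 0xFFFFFFFF:
        raise ValueError("count does not fit in four bytes")
    return struct.pack(">I", idle_words)


print(encode_idle_count(0).hex())           # 00000000
print(encode_idle_count(25_000_000).hex())  # 017d7840
```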
[0035] In an embodiment that avoids the use of the entire Link
Options field, the unused bandwidth may be computed as a percentage
of the total bandwidth of the associated link. In this manner, the
switch processor device may compute a percentage comprising a ratio
of the number of IDLE ordered sets (words) received for a fixed
number of transmission words (e.g., 25×10^6), which
translates into available bandwidth, referred to herein as a
variable w'. Preferably, the available bandwidth w' is computed for
each link subsection and may comprise a one byte number having
values 1-255, for example.
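One way to derive the one-byte value w' from the latched counters is sketched below. The linear scaling to 255 and the clamping of the result to a minimum of 1 are assumptions made to match the example range of 1-255; the function name is hypothetical.

```python
def available_bandwidth_byte(idle_words: int, total_words: int) -> int:
    """Scale the idle ratio to a one-byte value w' in the range 1..255."""
    if total_words <= 0 or not 0 <= idle_words <= total_words:
        raise ValueError("need 0 <= idle_words <= total_words and total_words > 0")
    scaled = round(255 * idle_words / total_words)
    return max(1, min(255, scaled))  # clamping to 1 is an assumption of this sketch


print(available_bandwidth_byte(5_000_000, 25_000_000))   # 51
print(available_bandwidth_byte(25_000_000, 25_000_000))  # 255
```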
[0036] Once the amount of available bandwidth w' is determined,
this value is inserted in the Link State Record (LSR), for example,
in the defined Reserved field (one byte) within the LSR, or, may be
provided in a new defined byte field provided in the LSR. For
example, this new field may reside in byte 0x45 of the FSPF
(Fabric Shortest Path First) Information Unit, i.e. word 3, byte 1
of the link descriptor. Accordingly, based on the available
bandwidth information provided in the LSR, the bandwidth of any
selected path is determined to be equal to the bandwidth of the
link having the least available bandwidth within that path.
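The bottleneck rule stated above, in which a path is only as good as its least available link, can be sketched as follows. The function name and the list-of-bytes representation of a path are hypothetical.

```python
from typing import Sequence


def path_available_bandwidth(link_w_primes: Sequence[int]) -> int:
    """Return the path's available bandwidth: the smallest w' among its links."""
    if not link_w_primes:
        raise ValueError("a path needs at least one link")
    return min(link_w_primes)


# A three-link path whose middle link is nearly saturated.
print(path_available_bandwidth([200, 12, 180]))  # 12
```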
[0037] Link Cost may then be computed using this additional factor,
and thus to some degree, reflect actual link usage. Thus, with the
available bandwidth information w' (a number from 1 to 255, for
example), the used bandwidth, w, of a link may be computed as
follows: w=1-w'/255
[0038] Link Cost for each link can then be calculated using the
current administratively defined factor S, the baud rate and the
percentage of used bandwidth: Link Cost=S*w*(1.0625e12/Baud
rate)
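The two formulas above can be combined in a short sketch. The function names are chosen for this example, and the default values of S and n simply mirror the exemplary figures in the text.

```python
def used_fraction(w_prime: int) -> float:
    """w = 1 - w'/255, where w' is the one-byte available bandwidth value."""
    if not 0 <= w_prime <= 255:
        raise ValueError("w' must fit in one byte")
    return 1.0 - w_prime / 255.0


def weighted_link_cost(w_prime: int, baud_rate: float,
                       s: float = 1.0, n: float = 1.0625e12) -> float:
    """Link Cost = S * w * (n / Baud Rate)."""
    if baud_rate <= 0:
        raise ValueError("baud rate must be positive")
    return s * used_fraction(w_prime) * (n / baud_rate)


# A link with w' = 51 (20% available, 80% used) on a 1.0625 Gb/s link.
print(round(weighted_link_cost(51, 1.0625e9), 6))  # 800.0
```

Under this formula as stated, a completely idle link (w' = 255) has a cost of zero, and a heavily loaded link approaches the speed-only cost of 1000 for the same baud rate.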
[0039] In an exemplary embodiment, the switch that owns the LSR
record will transmit an update of the LSR including the available
bandwidth information for each LSR Refresh Time-Out Value
(L_R_TOV), which is 30 minutes by default. In this way, each
additional switch will have the current Link Cost as well as the
amount of available bandwidth for each link to which it is attached,
and can select the optimum paths for subsequent frames. This method would
result in better performance and control over the Storage Area
Network (SAN), preventing bottlenecks due to overused links and
paths from the switch.
[0040] It is understood that the system and method of the present
invention may be implemented at switch nodes in many types of SANs,
including Gigabit Ethernet, Infiniband, and iSCSI. Furthermore, the
present invention may be implemented for determining available
bandwidth for other types of Fiber Channel node ports. That is,
other ports interconnected by links in a switch fabric may benefit
from the system and method.
[0041] While the invention has been particularly shown and
described with respect to illustrative and preferred embodiments
thereof, it will be understood by those skilled in the art that the
foregoing and other changes in form and details may be made therein
without departing from the spirit and scope of the invention which
should be limited only by the scope of the appended claims.
* * * * *