U.S. patent application number 10/827930 was filed with the patent office on 2004-04-19 and published on 2004-10-07 for a centralized technique for assigning I/O controllers to hosts in a cluster.
This patent application is currently assigned to Intel Corporation. Invention is credited to Rajesh R. Shah.
Application Number | 20040199680 10/827930 |
Family ID | 32298396 |
Filed Date | 2004-04-19 |
United States Patent Application | 20040199680 |
Kind Code | A1 |
Shah, Rajesh R. | October 7, 2004 |
Centralized technique for assigning I/O controllers to hosts in a cluster
Abstract
A technique is provided for assigning an I/O controller to a
host in a cluster. The cluster includes one or more hosts and one
or more I/O controllers connected by a cluster interconnection
fabric. In an example embodiment, an I/O controller is connected to
the cluster interconnection fabric. The I/O controller connected to
the fabric is detected and a network address is assigned to the I/O
controller. An administrative agent is used to assign the I/O
controller to a host that is connected to the cluster
interconnection fabric. A message is sent to the host informing the
host that the I/O controller is assigned to the host and providing
the network address of the I/O controller.
Inventors: | Shah, Rajesh R.; (Portland, OR) |
Correspondence Address: | SCHWEGMAN, LUNDBERG, WOESSNER & KLUTH, P.A., P.O. BOX 2938, MINNEAPOLIS, MN 55402, US |
Assignee: | Intel Corporation |
Family ID: | 32298396 |
Appl. No.: | 10/827930 |
Filed: | April 19, 2004 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10827930 | Apr 19, 2004 | |
09472445 | Dec 27, 1999 | 6738818 |
Current U.S. Class: | 710/36 |
Current CPC Class: | H04L 61/2038 20130101; H04L 29/12254 20130101; H04L 29/12839 20130101; H04L 61/6022 20130101 |
Class at Publication: | 710/036 |
International Class: | G06F 003/00 |
Claims
1-19. (Canceled)
20. A fabric manager coupled to a cluster interconnection fabric,
the fabric manager comprising instructions to assign an I/O
controller coupled to the cluster interconnection fabric to one or
more hosts coupled to the cluster interconnection fabric.
21. The fabric manager of claim 20 wherein the fabric manager
further comprises instructions to send messages to hosts coupled to
the cluster interconnection fabric indicating which hosts are
allowed access to the I/O controller.
22. The fabric manager of claim 20, wherein the fabric manager
comprises: a fabric services to detect I/O units coupled to the
cluster interconnection fabric, to assign a network address to each
I/O unit and to each port of each I/O unit, and to initialize each
I/O unit; and an I/O controller manager to identify each I/O
controller coupled to a port of each of the I/O units, to control
the I/O controllers, to identify each I/O device coupled to each
I/O controller, to assign each I/O controller to one or more hosts
coupled to the cluster interconnection fabric to allow each I/O
controller assigned to a selected host to be accessed by the
selected host, the I/O controller manager to send messages to the
hosts over the cluster interconnection fabric to report each I/O
controller assigned to each host.
23. The fabric manager of claim 22 wherein the I/O controller
manager comprises a software agent running on one of the hosts or a
distributed software agent running on a plurality of the hosts.
24. The fabric manager of claim 20 wherein the fabric manager is
coupled to a computer and is controlled by a human administrator
through the computer.
25. The fabric manager of claim 20 wherein the fabric manager, the
I/O controller, the hosts, and the cluster interconnection fabric
are compatible with a Next Generation Input/Output (NGIO)
specification.
26. A method of operating a cluster comprising: detecting an I/O
controller in a cluster; assigning the I/O controller to a host in
the cluster; and informing the host that the I/O controller is
assigned to the host.
27. The method of claim 26, further comprising: assigning the I/O
controller to two or more selected hosts in the cluster; and
informing the selected hosts that the I/O controller is assigned to
the selected hosts.
28. The method of claim 26, further comprising: assigning a network
address to the I/O controller; and providing the network address of
the I/O controller to the host.
29. The method of claim 26 wherein: detecting an I/O controller
comprises detecting an I/O controller selected from one or more I/O
controllers coupled to a cluster interconnection fabric in the
cluster; assigning the I/O controller further comprises assigning
the selected I/O controller to a host selected from one or more
hosts coupled to the cluster interconnection fabric; and informing
the host further comprises sending a message to the selected host
over the cluster interconnection fabric.
30. The method of claim 29, further comprising operating the
cluster according to a Next Generation Input/Output (NGIO)
specification.
31. The method of claim 29 wherein: detecting an I/O controller
further comprises detecting the selected I/O controller using an
interrupt mechanism or by sweeping the cluster interconnection
fabric; and assigning the I/O controller further comprises
assigning the selected I/O controller to the selected host by
matching the selected I/O controller to the selected host using a
database or based upon an assignment or information received from a
human administrator.
32. The method of claim 29, further comprising: sending a message
over the cluster interconnection fabric to the selected host
identifying the selected I/O controller according to a type of I/O
controller; and sending a message over the cluster interconnection
fabric to a host-fabric adapter of the selected host, the
host-fabric adapter being coupled to the cluster interconnection
fabric and having an address.
33. The method of claim 29 wherein: detecting an I/O controller
further comprises detecting the selected I/O controller with a
fabric manager coupled to the cluster interconnection fabric;
assigning the I/O controller further comprises assigning the
selected I/O controller using the fabric manager to allow the
selected host to access the selected I/O controller; and informing
the host further comprises sending a message to the selected host
over the cluster interconnection fabric from the fabric manager to
report the selected I/O controller to the selected host.
34. The method of claim 29, further comprising: detecting one or
more I/O units coupled to the cluster interconnection fabric, each
I/O unit being coupled between the cluster interconnection fabric
through an I/O controller-fabric adapter and a port for each of one
or more I/O controllers; identifying each I/O controller coupled
between a port of one of the I/O units and one or more I/O devices
to exchange information between the one or more I/O devices and the
cluster interconnection fabric; sending a message from a fabric
manager over the cluster interconnection fabric to each I/O unit to
identify I/O controllers coupled to each I/O unit; assigning an
address to each I/O unit or to each port of each I/O unit; and
assigning each I/O controller to one or more of the hosts coupled
to the cluster interconnection fabric, each of the hosts comprising
a personal computer or a server.
35. A cluster comprising: one or more hosts; an I/O controller; and
a fabric manager constructed and arranged to assign the I/O
controller to a selected one of the hosts comprising a personal
computer and to send a message to the personal computer indicating
that the I/O controller has been assigned to the personal
computer.
36. The cluster of claim 35 wherein the fabric manager comprises: a
fabric services to detect an I/O unit coupled to the I/O controller
in the cluster and to assign a network address to the I/O unit; and
an I/O controller manager coupled to the fabric services to
identify the I/O controller coupled to the I/O unit, to assign the
I/O controller to the personal computer, and to send a message to
the personal computer to report the I/O controller.
37. The cluster of claim 35 wherein: at least one of the one or
more hosts are coupled to a cluster interconnection fabric; the I/O
controller is coupled to the cluster interconnection fabric; and
the fabric manager is coupled to the cluster interconnection fabric
to assign the I/O controller and to send messages to the hosts over
the cluster interconnection fabric.
38. The cluster of claim 37 wherein the hosts, the I/O controller,
the cluster interconnection fabric, and the fabric manager are
compatible with a Next Generation Input/Output (NGIO)
specification.
39. The cluster of claim 37, further comprising: a plurality of I/O
controllers coupled to the cluster interconnection fabric; one or
more I/O devices coupled to each I/O controller; and wherein the
fabric manager is coupled to the cluster interconnection fabric to
assign each I/O controller to one or more hosts.
40. The cluster of claim 39 wherein: the cluster interconnection
fabric appears to each of the hosts in the cluster as an I/O bus;
each host comprises: a processor; a memory coupled to the
processor; a host-fabric adapter coupled between the processor and
the cluster interconnection fabric to interface the host to the
cluster interconnection fabric; and an operating system comprising
a kernel, a device driver for the host-fabric adapter, an I/O
controller driver that is specific for each assigned I/O
controller, and a fabric bus driver to provide a bus abstraction
for the cluster interconnection fabric; and further comprising one
or more I/O units, each I/O unit being coupled between one or more
I/O controllers and the cluster interconnection fabric through an
I/O controller-fabric adapter to interface between the cluster
interconnection fabric and each I/O controller coupled to the I/O
unit, each I/O controller being coupled between an I/O unit and one
or more I/O devices to control the I/O devices.
41. The cluster of claim 37 wherein the cluster interconnection
fabric comprises a collection of routers, switches, and
communication links that connect hosts and I/O units.
42. A machine-readable medium that provides instructions, which
when executed by one or more processors, cause the processors to
perform operations comprising: detecting an I/O controller in a
cluster; assigning the I/O controller to a host in the cluster; and
informing the host that the I/O controller is assigned to the
host.
43. The machine-readable medium of claim 42 wherein the
instructions, when further executed by one or more of the
processors, cause the processors to perform operations further
comprising: detecting each of one or more I/O controllers coupled
to a cluster interconnection fabric in the cluster; assigning each
I/O controller to one or more hosts coupled to the cluster
interconnection fabric to allow each I/O controller assigned to a
selected host to be accessed by the selected host; and sending
messages to the hosts over the cluster interconnection fabric to
report each I/O controller assigned to each host.
44. The machine-readable medium of claim 42 wherein the
instructions, when further executed by one or more of the
processors, cause the processors to perform operations further
comprising: detecting I/O units coupled to a cluster
interconnection fabric coupling elements of the cluster together;
assigning a network address to each I/O unit; and initializing each
I/O unit.
45. The machine-readable medium of claim 44 wherein the
instructions, when further executed by one or more of the
processors, cause the processors to perform operations further
comprising: identifying each I/O controller coupled to a port of
each of the I/O units; and identifying each I/O device coupled to
each I/O controller.
46. The machine-readable medium of claim 42 wherein the
instructions, which when executed by one or more of the processors,
are executed according to a Next Generation Input/Output (NGIO)
specification.
Description
FIELD
[0001] The invention generally relates to computers and more
particularly to a technique for assigning I/O controllers to hosts
in a cluster.
BACKGROUND
[0002] A cluster may include one or more hosts connected together
by an interconnection fabric. In traditional clusters, hosts have
locally attached I/O controllers connected to local buses. FIG. 1
illustrates a typical bus-based computer 100, which includes a
processor 102 connected to a host bus 103 and an I/O and memory
controller (or chipset) 104. A local I/O bus 105 is connected to an
I/O bridge 108. Several I/O devices are attached to the I/O bus,
including I/O controllers 110 and 112 and a Local Area Network
(LAN) Network Interface Card (NIC) 114. The I/O controllers 110 and
112 may be connected to one or more I/O devices, such as storage
devices, hard disk drives, or the like. I/O bus 105 is a
traditional I/O bus, such as a Peripheral Component Interconnect
(PCI) bus, an Industry Standard Architecture (ISA) bus or an Extended
ISA (EISA) bus, etc. A traditional I/O bus provides attachment
points to which I/O controllers can be attached.
[0003] A bus-based computer, such as that shown in FIG. 1, has a
number of disadvantages and drawbacks. All of the I/O controllers on
the I/O bus share the same power and clock domain and share a
common address space. Due to physical and electrical load
limitations, only a relatively small number of I/O controllers may
be attached to an I/O bus, and they must be physically located within
the same cabinet. Thus, the entire I/O bus is physically attached
to a single computer system. Also, in traditional clusters, I/O
controllers are not directly connected to the network or cluster.
Thus, the I/O controllers on the I/O bus of a computer system are
directly visible (or detectable) and addressable only by that
computer system or host, but are not directly visible or
addressable to any other host in the cluster. For example, the I/O
controllers 110 and 112 are visible only to computer 100, and are
not visible or addressable to any other host which may be connected
to LAN 120. Therefore, bus-based computer systems provide a very
inflexible arrangement for I/O resources.
[0004] As a result, there is a need for a technique that provides a
much more flexible arrangement for I/O devices for computer
systems. In addition, under such a flexible arrangement of I/O
resources, a mechanism should be provided that allows for the
efficient and effective coordination and assignment between
controllers and hosts.
SUMMARY
[0005] According to an embodiment of the present invention, a
method is provided for assigning an I/O controller to a host in a
cluster. The cluster includes one or more hosts and one or more I/O
controllers connected by a cluster interconnection fabric. In an
example embodiment, an I/O controller is connected to the cluster
interconnection fabric. The I/O controller connected to the fabric
is detected and a network address is assigned to the I/O
controller. An administrative agent is used to assign the I/O
controller to a host that is connected to the cluster
interconnection fabric. A message is sent to the host informing the
host that the I/O controller is assigned to the host and providing
the network address of the I/O controller.
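The sequence summarized above (detect, assign an address, assign to a host, inform the host) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the implementation described in this application: the class names `FabricManager`, `Host` and `IOController`, the integer address scheme, and the dictionary message format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    inbox: List[dict] = field(default_factory=list)  # messages received over the fabric


@dataclass
class IOController:
    name: str
    address: Optional[int] = None  # network address, set when the controller is detected


class FabricManager:
    """Hypothetical administrative agent for one cluster."""

    def __init__(self) -> None:
        self._next_address = 1
        self.assignments: Dict[str, str] = {}  # controller name -> host name

    def detect(self, controller: IOController) -> None:
        # Detection step: give the newly attached controller a network address.
        controller.address = self._next_address
        self._next_address += 1

    def assign(self, controller: IOController, host: Host) -> None:
        # Assignment step: record the decision, then inform the host,
        # including the controller's network address in the message.
        if controller.address is None:
            raise ValueError("controller must be detected before assignment")
        self.assignments[controller.name] = host.name
        host.inbox.append({
            "type": "controller_assigned",
            "controller": controller.name,
            "address": controller.address,
        })


manager = FabricManager()
host = Host("host-210")
ioc = IOController("ioc-222")
manager.detect(ioc)
manager.assign(ioc, host)
print(host.inbox[0])
```

In this sketch the host learns about a controller only through the message it receives, which mirrors the centralized character of the technique: the assignment decision is made in one place and then reported.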
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] The foregoing and a better understanding of the present
invention will become apparent from the following detailed
description of exemplary embodiments and the claims when read in
connection with the accompanying drawings, all forming a part of
the disclosure of this invention. While the foregoing and following
written and illustrated disclosure focuses on disclosing example
embodiments of the invention, it should be clearly understood that
the same is by way of illustration and example only and the invention
is not limited thereto. The spirit and scope of the present invention
are limited only by the terms of the appended claims.
[0007] The following represents brief descriptions of the drawings,
wherein:
[0008] FIG. 1 is a block diagram illustrating a typical bus-based
computer.
[0009] FIG. 2 is a block diagram illustrating an example network
according to an embodiment of the present invention.
[0010] FIG. 3 is a block diagram of a host according to an example
embodiment of the present invention.
[0011] FIG. 4 is a block diagram of a host according to another
example embodiment of the present invention.
[0012] FIG. 5 is a block diagram illustrating an example software
stack for a traditional computer having bus-based I/O.
[0013] FIG. 6 is a block diagram illustrating a software driver
stack for a computer having fabric-attached I/O resources according
to an example embodiment of the present invention.
[0014] FIG. 7 is a flow chart identifying the steps performed by a
host during host initialization according to an example embodiment
of the present invention.
[0015] FIG. 8 is a block diagram of a network including a fabric
manager according to an example embodiment of the present
invention.
[0016] FIG. 9 is a flow chart illustrating the steps performed when
an I/O controller is connected or attached to the cluster
interconnection fabric according to an embodiment of the present
invention.
DETAILED DESCRIPTION
[0017] Network Architecture
[0018] Referring to the figures in which like numerals indicate
like elements, FIG. 2 is a block diagram illustrating an example
network according to an embodiment of the present invention. A
network is shown and may be a storage area network, a system area
network or other type of network. The network includes several
hosts, including host 210 and host 212, which may be personal
computers, servers or other types of computers. A host generally is
capable of running or executing one or more application-level (or
user-level) programs. Also, a host is generally capable of
initiating an I/O request (e.g., I/O reads or writes). By contrast,
many I/O controllers or devices themselves do not typically run
user-level programs and do not usually initiate I/O requests.
Rather, I/O controllers and devices usually only perform some task
or function in response to an I/O command or other request from a
host.
[0019] The network includes one or more input/output units (I/O
units) including I/O unit 1 and I/O unit 2. I/O unit 1 includes one
or more I/O controllers connected thereto, including I/O controller
222. I/O unit 2 includes I/O controllers 232 and 242 connected
thereto. The I/O units include components to interface the I/O
controllers to the fabric 202. Each I/O controller operates to
control one or more I/O devices. For example, I/O controller 222 of
I/O unit 1 is connected to I/O devices 223 and 224. For I/O unit 2,
I/O controller 232 is connected to I/O device 233, while I/O
controller 242 is connected to I/O device 243. The I/O devices may
be any of several types of I/O devices, such as storage devices
(e.g., a hard disk drive, tape drive) or other I/O device.
[0020] The hosts and I/O units (and their attached I/O controllers
and devices) may be organized into groups known as clusters, with
each cluster including one or more hosts and typically one or more
I/O units (each I/O unit including one or more I/O controllers).
The hosts and I/O units are interconnected via a cluster
interconnection fabric 202. Cluster interconnection fabric 202 is a
collection of routers, switches and communication links (such as
wires, connectors, cables, etc.) that connects a set of nodes
(e.g., connects a set of hosts and I/O units) of one or more
clusters. As shown in the example network of FIG. 2, the example
fabric 202 includes switches A, B and C, and links connected
between the switches.
[0021] In addition, each I/O unit includes an I/O controller-fabric
(IOC-fabric) adapter for interfacing between the fabric 202 and the
I/O controllers. For example, IOC-fabric adapter 220 interfaces the
controllers of I/O unit 1 to the fabric 202, while IOC-fabric
adapter 230 interfaces the controllers of I/O unit 2 to the fabric
202.
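The example topology of FIG. 2, as described in the preceding paragraphs, can be written down as a small data structure. The following Python sketch is illustrative only; the class names and the dictionary layout are assumptions made for the example, while the reference numerals are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IOControllerNode:
    number: int                       # reference numeral of the I/O controller
    devices: List[int] = field(default_factory=list)  # attached I/O devices


@dataclass
class IOUnit:
    name: str
    adapter: int                      # reference numeral of the IOC-fabric adapter
    controllers: List[IOControllerNode] = field(default_factory=list)


io_unit_1 = IOUnit("I/O unit 1", adapter=220,
                   controllers=[IOControllerNode(222, [223, 224])])
io_unit_2 = IOUnit("I/O unit 2", adapter=230,
                   controllers=[IOControllerNode(232, [233]),
                                IOControllerNode(242, [243])])

# Hosts and I/O units are peers on the fabric; no I/O unit belongs to a host.
cluster = {
    "hosts": [210, 212],
    "switches": ["A", "B", "C"],
    "io_units": [io_unit_1, io_unit_2],
}


def all_devices(units: List[IOUnit]) -> List[int]:
    """Return every I/O device reachable through the fabric-attached units."""
    return [dev for unit in units for ctrl in unit.controllers for dev in ctrl.devices]


print(all_devices(cluster["io_units"]))
```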
[0022] The number and arrangement of hosts, I/O units, I/O
controllers, I/O devices, switches and links illustrated in FIG. 2
is provided only as an example. A wide variety of implementations
and arrangements are possible.
[0023] Two embodiments of an example host (e.g., host 210) are
illustrated in FIGS. 3 and 4. FIG. 3 is a block diagram of a host
according to an example embodiment of the present invention.
Referring to FIG. 3, a host 210A includes a processor 302 coupled
to a host bus 303. An I/O and memory controller 304 (or chipset) is
connected to the host bus 303. A main memory 306 is connected to
the controller 304. An I/O bridge 308 operates to bridge or
interface between the I/O and memory controller 304 and an I/O bus
305. Several I/O controllers are attached to I/O bus 305, including
I/O controllers 310 and 312. I/O controllers 310 and 312
(including any I/O devices connected thereto) are traditional
bus-based I/O resources.
[0024] A host-fabric adapter 325 is also connected to the I/O bus
305. Host-fabric adapter 325 may be considered to be a type of a
network interface card (e.g., usually including hardware and
firmware) for interfacing the host 210A to cluster interconnection
fabric 202. The host-fabric adapter 325 provides fabric
communication capabilities for the host 210A. For example, the
host-fabric adapter 325 converts data between a host format and a
format that is compatible with the fabric 202. For data sent from
the host 210A, host-fabric adapter 325 formats the data into one or
more packets, including a header. The host-fabric adapter 325 may
provide reliability guarantees that the packets have reached the
intended target or destination through a series of zero or more
switches (in the fabric 202). In the embodiment shown in FIG. 3,
the host-fabric adapter 325 is attached to a slot of I/O bus 305.
I/O bus 305 may be any type of I/O bus, such as a PCI bus for
example.
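The packet formatting performed by a host-fabric adapter can be sketched as follows. The text does not specify a packet layout, so everything concrete here is an assumption made for illustration: the four-field header (source, destination, sequence number, payload length), the tiny maximum payload size, and the function names `to_packets` and `from_packets`.

```python
import struct

# Assumed header layout: source, destination, sequence number, payload length.
HEADER = struct.Struct("!HHHH")
MAX_PAYLOAD = 8  # deliberately tiny so the example produces several packets


def to_packets(src: int, dst: int, data: bytes) -> list:
    """Split host data into packets, each prefixed with a small header."""
    packets = []
    for seq, start in enumerate(range(0, len(data), MAX_PAYLOAD)):
        chunk = data[start:start + MAX_PAYLOAD]
        packets.append(HEADER.pack(src, dst, seq, len(chunk)) + chunk)
    return packets


def from_packets(packets: list) -> bytes:
    """Reassemble the payload, ordering packets by sequence number."""
    parsed = []
    for pkt in packets:
        _src, _dst, seq, length = HEADER.unpack(pkt[:HEADER.size])
        parsed.append((seq, pkt[HEADER.size:HEADER.size + length]))
    return b"".join(chunk for _seq, chunk in sorted(parsed))


message = b"read block 42 from disk"
packets = to_packets(src=1, dst=2, data=message)
# Reassembly works even if the packets arrive out of order.
print(from_packets(packets[::-1]) == message)
```

A sequence number in the header is one simple way to let the receiver restore ordering; the reliability guarantees mentioned in the text would need additional mechanisms (such as acknowledgements) that this sketch does not attempt to model.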
[0025] FIG. 4 is a block diagram of a host according to another
example embodiment of the present invention. Referring to FIG. 4, a
host 210B is illustrated and includes many of the same components
as the host 210A of FIG. 3. Only the differences will be described.
In FIG. 4, the host-fabric adapter 325 is connected directly to the
chipset or I/O and memory controller 304, rather than being
attached to an existing I/O bus. Connecting the host-fabric
adapter 325 to the chipset or I/O and memory controller can free
or relieve the host-fabric adapter 325 of the limitations of the
I/O bus 305. There are different ways in which the host-fabric
adapter 325 can be connected to host 210. FIGS. 3 and 4 illustrate
two examples of how this may be done.
[0026] According to one example embodiment or implementation, the
components or units of the present invention are compatible with
the Next Generation Input/Output (NGIO) Specification. Under such
specification, the cluster interconnection fabric 202 is an NGIO
fabric, the host-fabric adapter 325 is a Host Channel Adapter
(HCA), and the IOC-fabric adapters are Target Channel Adapters
(TCA). However, NGIO is merely one example embodiment or
implementation of the present invention, and the invention is not
limited thereto. Rather, the present invention is applicable to a
wide variety of networks, hosts and I/O controllers.
[0027] As noted above, in traditional clusters the I/O controllers
are not directly connected to the network or fabric, but are only
attached as part of a host computer. However, according to an
embodiment of the present invention (e.g., as shown in FIGS. 3 and
4), the I/O units and their I/O controllers are not connected to
the fabric as a part of a host. Rather, the I/O units and I/O
controllers are directly and separately connected to the cluster
interconnection fabric 202 (and typically not as part of a host).
For example, I/O unit 1 including controller 222 and I/O unit 2
including I/O controllers 232 and 242 are directly (or
independently) connected to fabric 202. In other words, the I/O
units (and their connected I/O controllers and I/O devices) are
attached as separate and independent I/O resources to fabric 202 as
shown in FIGS. 2-4, rather than as part of a host.
[0028] According to an embodiment, this provides a very flexible
approach in which I/O units, I/O controllers (and I/O devices)
connected to a cluster interconnection fabric can be assigned to
one or more hosts in the cluster (rather than having a
predetermined or fixed host assignment based upon being physically
connected to the host's local I/O bus). The I/O units, I/O
controllers and I/O devices which are attached to the cluster
interconnection fabric 202 may be referred to as fabric-attached
I/O resources (i.e., fabric-attached I/O units, fabric-attached I/O
controllers and fabric-attached I/O devices) because these are
directly attached to the fabric 202 rather than being connected
through (or as part of) a host.
[0029] In addition, according to an embodiment, the hosts in a
cluster can detect and then directly address I/O units and I/O
controllers (and attached I/O devices) which are directly attached
to the cluster interconnection fabric 202 (i.e., the
fabric-attached I/O controllers). However, a mechanism must be
provided that allows a host to detect and address fabric-attached
I/O controllers and devices, while preferably being compatible with
many currently available operating systems.
[0030] A Fabric Bus Driver: Providing a Bus Abstraction to the OS
for the Cluster Interconnection Fabric
[0031] In many current operating systems, such as Windows 2000, all
I/O controllers are assumed to be attached to an I/O bus. In
Windows 2000, for example, there is a separate kernel-mode software
driver for each I/O bus, known as an I/O bus driver, which
understands the specific characteristics, syntax, commands (or
primitives), format, timing, etc. of that particular I/O bus. Under
Windows 2000, all bus drivers provide an interface or translation
between the host operating system and the I/O controllers connected
to the I/O bus for detecting or identifying the I/O controllers
which are connected to the I/O bus, and reporting the I/O
controllers to the operating system.
[0032] The operating system kernel uses one standard set of
primitives or commands and syntax for communicating with each I/O
bus driver, for example, to identify or enumerate the I/O
controllers connected to each I/O bus, to configure the connected
I/O controllers on each I/O bus, and other control functions. For
example, the I/O bus drivers assist the host operating system in
managing dynamic addition and removal of I/O controllers on that
bus if the underlying bus hardware supports it. In addition, the
I/O bus drivers assist the operating system in power managing
(e.g., powering down I/O devices during non-use) the I/O
controllers on that bus if the underlying bus and I/O controllers
support it.
[0033] To allow communication between the operating system kernel
and each of several different I/O buses, each of the I/O bus
drivers translates between the I/O bus specific primitives and syntax
to a standard set of primitives and syntax used by the operating
system. The operating system can invoke or call specific or well
known or standard commands or entry points in the bus driver to
query the capabilities of the I/O bus and the attached I/O
controllers (e.g., to request a list of attached I/O controllers
and devices and to configure the I/O controllers) and to power
manage the I/O controllers.
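The idea of one standard set of entry points shared by all bus drivers can be sketched in Python as an abstract interface with a bus-specific implementation behind it. This is a simplified illustration, not the Windows 2000 driver model: the names `BusDriver`, `ToyPciBusDriver`, `os_scan` and the slot table are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BusDriver(ABC):
    """Standard entry points the operating system calls on every bus driver."""

    @abstractmethod
    def enumerate_controllers(self) -> List[str]: ...

    @abstractmethod
    def configure(self, controller: str) -> None: ...

    @abstractmethod
    def set_power(self, controller: str, on: bool) -> None: ...


class ToyPciBusDriver(BusDriver):
    """Stand-in for a bus-specific driver; the slot contents are made up."""

    def __init__(self) -> None:
        self._slots: Dict[int, str] = {0: "storage-110", 1: "storage-112", 2: "lan-nic-114"}
        self.configured: List[str] = []
        self.powered: Dict[str, bool] = {name: True for name in self._slots.values()}

    def enumerate_controllers(self) -> List[str]:
        # Bus-specific work (scanning slots) is hidden behind the standard call.
        return [self._slots[slot] for slot in sorted(self._slots)]

    def configure(self, controller: str) -> None:
        self.configured.append(controller)

    def set_power(self, controller: str, on: bool) -> None:
        self.powered[controller] = on


def os_scan(bus: BusDriver) -> List[str]:
    """What a kernel might do with any bus driver: enumerate, then configure."""
    found = bus.enumerate_controllers()
    for controller in found:
        bus.configure(controller)
    return found


pci = ToyPciBusDriver()
print(os_scan(pci))
pci.set_power("lan-nic-114", False)  # power management through the same interface
```

Because `os_scan` depends only on the abstract interface, it never needs to know which kind of bus sits behind the driver.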
[0034] FIG. 5 is a block diagram illustrating an example software
stack for a traditional computer having bus-based I/O. A host
operating system 500 is in communication with an I/O bus, such as a
PCI bus 525 for example. Several I/O devices are attached to the
slots of the PCI bus 525, including I/O controller 110 for a
storage device S1, an I/O controller 112 for a storage device S2,
and a LAN NIC 114.
[0035] The host operating system 500 includes a kernel 504 and an
I/O manager 507 for managing the I/O buses and attached I/O
resources (I/O controllers and devices). The operating system 500
also includes a PCI bus driver 520 (as an example I/O bus driver)
which translates between the PCI specific primitives and syntax to
a standard set of primitives and syntax used by the kernel 504 or
I/O manager 507. The PCI bus driver 520 is provided for detecting
or enumerating the I/O controllers and devices attached to the PCI
bus 525, configuring the attached I/O controllers and devices,
informing the I/O manager 507 when controllers or devices have been
added or removed, and handling power management commands issued
from the operating system to power manage the PCI controllers and
devices (if power management is supported by those devices).
[0036] However, the PCI bus driver 520 is not aware of the
different features and capabilities of the different I/O
controllers. Therefore, operating system 500 includes an I/O
controller driver (or function driver) for each I/O controller,
including an S1 I/O controller driver 505 (for storage device S1
connected to I/O controller 110), an S2 I/O controller driver 510
(for storage device S2 connected to controller 112) and a LAN
driver 515, as examples. Each I/O controller driver is provided for
translating I/O requests (e.g., reads and writes to the I/O device)
from a common or standard set of primitives and syntax used by the
host operating system to the primitives and syntax used by each I/O
controller (e.g., after the I/O bus driver is used to identify and
configure the I/O controller). Thus, an I/O controller driver is
provided to handle reads and writes to the I/O devices connected to
an I/O controller. There may typically be a different type of I/O
controller driver for each type of I/O controller.
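The role of an I/O controller driver, translating a standard request into the commands of one particular controller, can be sketched as below. The two command formats are invented purely to show that the translation differs per controller type; the class and function names are likewise hypothetical.

```python
class StorageDriverS1:
    """Hypothetical driver for a controller that accepts text commands."""

    def read(self, block: int) -> str:
        return f"S1:READ {block}"

    def write(self, block: int, data: bytes) -> str:
        return f"S1:WRITE {block} len={len(data)}"


class StorageDriverS2:
    """Hypothetical driver for a controller that accepts opcode tuples."""

    READ_OP, WRITE_OP = 0x01, 0x02

    def read(self, block: int) -> tuple:
        return (self.READ_OP, block)

    def write(self, block: int, data: bytes) -> tuple:
        return (self.WRITE_OP, block, data)


def os_read(driver, block: int):
    # The operating system issues the same standard request for every
    # controller; each driver translates it into controller-specific form.
    return driver.read(block)


print(os_read(StorageDriverS1(), 7))
print(os_read(StorageDriverS2(), 7))
```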
[0037] As noted above, a limitation with this current operating
system and software driver stack is that all I/O controllers are
assumed to be attached to a local I/O bus. The host can only detect
and address I/O devices that are attached to the local I/O bus. In
other words, even if one provides one or more fabric-attached I/O
controllers, current operating systems do not allow a host to
detect the presence of, or directly communicate with, such
fabric-attached I/O controllers and devices because all I/O
controllers are presumed to be attached to a local I/O bus of the
host, and current operating systems also do not support direct
communication with a remote (or fabric-attached) I/O
controller.
[0038] FIG. 6 is a block diagram illustrating a software driver
stack for a computer having fabric-attached I/O resources according
to an example embodiment of the present invention. Referring to
FIG. 6, the host operating system 600 includes a kernel 504, an I/O
manager 507, and a plurality of I/O controller drivers for
interfacing to various I/O controllers, including I/O controller
drivers 605 and 610. These components are the same as or similar to
the currently available operating system illustrated in FIG. 5.
According to an example embodiment, the host operating system 600
is Windows 2000, and the I/O manager 507 is a Plug-n-Play
manager.
[0039] In addition, according to an embodiment of the invention, a
fabric bus driver 620 (or pseudo bus driver) is provided for the
cluster interconnection fabric 202. A traditional bus driver
translates between the I/O bus specific primitives and syntax to a
standard set of primitives and syntax used by the operating system.
Likewise, the fabric bus driver 620 accepts the same set of
standard operating system commands or primitives and syntax and
translates them into fabric specific primitives, syntax and format,
etc. The fabric bus driver 620 also provides the same set of
services to the operating system as provided by other I/O bus
drivers, and communicates with the kernel 504 or I/O manager 507
through the same common or standard set of primitives and syntax
used by the operating system. Therefore, the fabric bus driver 620
abstracts or generalizes or presents the fabric 202 to the
operating system 600 as a locally attached I/O bus, even though the
cluster interconnection fabric 202 is not a bus and is not local.
Thus, it can be said that the fabric bus driver 620 provides a bus
abstraction to the operating system 600 (or to the I/O manager 507)
for the cluster interconnection fabric 202. Thus, the fabric bus
driver 620 may be thought of as a bus abstraction component.
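As an illustration only, the bus abstraction described above can be sketched in Python. The class and method names (BusDriver, enumerate_controllers, io_manager_scan) and the controller identifiers are hypothetical and are not taken from any operating system; the sketch only shows that a fabric bus driver and a local bus driver can answer the same query through one common interface.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class BusDriver(ABC):
    """Hypothetical stand-in for the standard interface the OS expects of a bus driver."""

    @abstractmethod
    def enumerate_controllers(self) -> list[str]:
        """Return identifiers of the I/O controllers reported on this bus."""


class LocalBusDriver(BusDriver):
    """A traditional local I/O bus driver (illustrative only)."""

    def __init__(self, attached: list[str]) -> None:
        self._attached = attached

    def enumerate_controllers(self) -> list[str]:
        return list(self._attached)


class FabricBusDriver(BusDriver):
    """Pseudo bus driver: same query, answered from fabric assignments."""

    def __init__(self, assigned_to_host: list[str]) -> None:
        self._assigned = assigned_to_host

    def enumerate_controllers(self) -> list[str]:
        return list(self._assigned)


def io_manager_scan(drivers: list[BusDriver]) -> list[str]:
    """The I/O manager treats every bus driver identically."""
    found: list[str] = []
    for driver in drivers:
        found.extend(driver.enumerate_controllers())
    return found
```

Because the caller depends only on the common interface, it needs no knowledge of whether a given driver fronts a local bus or the fabric.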
[0040] A device driver 625 is provided for the host-fabric adapter
325. The device driver 625 controls the host-fabric adapter 325
(which is usually a card or hardware). The fabric bus driver 620
uses the communication services provided by the device driver 625
for the host-fabric adapter 325 to send and receive commands and
information over the fabric 202. Thus, the host processor 302 can
issue I/O requests (e.g., I/O reads, writes) to fabric-attached I/O
controllers through the fabric bus driver 620, the device driver
625 and the host-fabric adapter 325. The host-fabric adapter 325
translates I/O requests between a host compatible format and a
fabric compatible format. In this manner, the host processor 302
can communicate with fabric attached I/O controllers. The host
processor 302 and the host operating system 600 do not have to be
aware that the fabric-attached I/O controllers are not attached to
a local I/O bus since the I/O bus abstraction and the host/fabric
translations are transparent to the processor 302 and operating
system 600.
[0041] Although the fabric 202 is not a "bus" in the traditional
sense, it is also advantageous for the fabric bus driver 620 to
provide a bus abstraction to the I/O manager 507 so that the fabric
attached I/O controllers can participate in the overall Plug-n-Play
procedures (e.g., dynamic addition and removal of I/O controllers)
and power management functions implemented by the host operating
system. In order to provide the bus abstraction, the fabric bus
driver 620 (like the other bus drivers) communicates to the kernel
504 or I/O manager 507 using the standard set of primitives and
syntax used and expected by the kernel and I/O manager 507 of
operating system 600. The fabric bus driver 620 provides to the
operating system 600 the standard set of services provided by bus
drivers. Thus, the fabric bus driver 620 presents the cluster
interconnection fabric 202 to the I/O manager 507 as a local I/O
bus, and presents one or more fabric-attached I/O controllers as
local (or bus-based) I/O controllers. In this manner, the operating
system does not have to be aware that the I/O resource (or fabric)
behind the fabric bus driver 620 is not a local bus, but rather is
a cluster interconnection fabric 202 including one or more remote
fabric attached I/O controllers. The existence of the cluster
interconnection fabric 202 and the remote location of the fabric
202 and fabric-attached I/O controllers are preferably transparent
to the host operating system 600. Thus, according to an embodiment,
no changes are necessary in the kernel 504 or I/O manager 507 to
allow the host 210 to identify and communicate with fabric-attached
I/O controllers.
[0042] The operating system 600 (or the I/O manager 507 of OS 600)
uses a standard set of primitives and syntax to query each I/O bus
driver to identify the I/O controllers attached to the bus. In the
same fashion, using this same standard set of primitives and
syntax, the I/O manager 507 can query the fabric bus driver 620 to
identify the fabric-attached I/O controllers that are assigned to
the host (as if the fabric bus driver 620 were just another I/O bus
driver). Many I/O controllers may be attached to the fabric 202.
However, according to an embodiment of the invention, the
fabric-attached I/O controllers (and their I/O devices) may be
allocated or assigned to different hosts. According to one
embodiment, each I/O controller can be assigned or allocated to
only one host. Alternatively, an I/O controller can be assigned to
multiple hosts (or shared among hosts). The fabric bus driver 620
then identifies the fabric-attached I/O controllers that are
assigned to the host and reports this list of I/O controllers to
the I/O manager 507 using the standard set of primitives and syntax
used by other local I/O bus drivers to communicate with the I/O
manager 507. Thus, the fabric bus driver 620 presents the fabric
202 as a local I/O bus to the operating system 600 (or to I/O
manager 507) and presents the fabric-attached I/O controllers as
local I/O controllers.
[0043] The fabric bus driver 620 can identify the list of
fabric-attached I/O controllers assigned to the host in many
different ways. A list of I/O controllers assigned to the host may
be locally stored and accessed by the fabric bus driver 620, or the
fabric bus driver 620 may query an external database or other host
attached to the fabric 202 to obtain the list of I/O controllers
assigned to this particular host, as examples.
[0044] FIG. 7 is a flow chart identifying the steps performed by a
host during host initialization according to an example embodiment
of the present invention. At block 705, kernel 504 is loaded into
main memory (e.g., at power-on with the assistance of the execution
of the Basic Input Output System or BIOS). The kernel 504 then
executes to perform several other tasks or functions for
initialization. At block 710, the I/O manager 507 is loaded into
main memory and executed.
[0045] At block 715, each of the I/O bus drivers is loaded into
main memory and executed. These "I/O bus drivers" loaded into main
memory include the I/O bus drivers for the local I/O buses (e.g., a
PCI bus driver) and the fabric bus driver 620. As noted above, the
fabric bus driver 620 is presented to the operating system 600 as a
local I/O bus driver.
[0046] At block 720, the I/O manager 507 queries (or requests) each
bus driver (including the fabric bus driver 620) to identify any
connected I/O controllers.
[0047] At block 725, each "bus driver" identifies each connected
I/O controller. The local I/O bus drivers identify each I/O
controller connected to the corresponding local I/O buses.
Similarly, the fabric bus driver 620 identifies each
fabric-attached I/O controller which is assigned to the host.
[0048] At block 730, each I/O bus driver (including the fabric bus
driver 620) reports to the I/O manager 507 a list of the connected
I/O controllers as requested.
[0049] At block 735, the I/O manager 507 loads an I/O controller
driver (specific to each type of I/O controller) into main memory
for each type of I/O controller reported to the I/O manager 507.
This allows the processor 302 to communicate with each reported I/O
controller to issue I/O requests (e.g., reads and writes) to one or
more I/O devices connected to the I/O controller, etc. According to
an embodiment, where there are several I/O controllers of one type,
an instance of the corresponding I/O controller driver may be
loaded into main memory for each instance of the I/O
controller.
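Blocks 720 through 735 of FIG. 7 can be sketched, under simplifying assumptions, as follows. Each bus driver is modeled as a plain callable that returns (controller identifier, controller type) pairs, and "loading a driver" is modeled as recording a driver name. The function name initialize_io and the "_driver" naming scheme are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def initialize_io(
    bus_drivers: dict[str, Callable[[], list[tuple[str, str]]]],
) -> dict[str, str]:
    """Query every bus driver and record one driver instance per controller."""
    loaded: dict[str, str] = {}
    # Block 720: the I/O manager queries each bus driver in turn.
    for _bus_name, query in bus_drivers.items():
        # Blocks 725-730: the bus driver identifies and reports its controllers.
        for controller_id, controller_type in query():
            # Block 735: a type-specific driver instance is associated
            # with each reported controller.
            loaded[controller_id] = f"{controller_type}_driver"
    return loaded
```

Two controllers of the same type each receive their own entry, which corresponds to the embodiment in which one driver instance is loaded per controller instance.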
[0050] Assigning I/O Controllers to Hosts
[0051] A cluster includes one or more hosts and one or more I/O
units (each including one or more I/O controllers) connected
together by a common cluster interconnection fabric. According to
an embodiment of the present invention, the fabric-attached I/O
units and fabric-attached I/O controllers are visible or detectable
to all hosts (or to at least multiple hosts) in the cluster that
are in the same cluster membership (i.e., which are a part of the
same cluster). Also, according to an embodiment, the cluster
interconnection fabric appears as a large I/O "bus" that runs or
extends through all (or at least multiple) hosts in the cluster.
This differs from the typical host model in which I/O controllers
are physically attached to a single host through one or more local
I/O buses and are visible or detectable only to that single host
(i.e., a traditional I/O bus does not span multiple hosts). With
such an arrangement, a mechanism is needed by which hosts can
determine which I/O controllers they are allowed to access so that
all do not attempt to use all I/O controllers that are visible or
detectable to them. An ownership conflict or data conflict could
arise if two hosts are using (i.e., reading and writing to) the
same I/O controller, but are unaware that another host is using the
I/O controller.
[0052] FIG. 8 is a block diagram of a network including a fabric
manager according to an example embodiment of the present
invention. The network includes an example cluster, including a
host 210 and an I/O unit 2 which includes connected I/O controllers
232 and 242. The host 210 and the I/O unit are coupled via a
cluster interconnection fabric 202. Although a cluster can include
multiple hosts and multiple I/O units, only one host (host 210) and
one I/O unit (I/O unit 2) are illustrated in this example
cluster.
[0053] The network in FIG. 8 also includes a fabric manager 810 for
managing the cluster interconnection fabric and a human
administrator 802 which may control certain aspects of the network
using a computer 803. The fabric manager 810 includes fabric
services 815 for performing several administrative functions or
services for the fabric or network, and an I/O controller manager
820 for controlling the fabric-attached I/O controllers in the
network.
[0054] Fabric services 815 (of fabric manager 810) is responsible
for detecting I/O units attached to the fabric 202 and then
assigning them a network address (such as a Medium Access Control
or MAC address). According to an embodiment, each different port of
an I/O unit is assigned a unique MAC address, with each I/O
controller being connected to a different port of the I/O unit.
Thus, under such an embodiment, the hosts can address the I/O
controllers using the MAC address of their connected I/O unit and
the port number identifying the port on the I/O unit where the I/O
controller is attached (for example). According to another
embodiment of the invention, the fabric services 815 can assign a
different network address or MAC address to each I/O controller.
Fabric services also initializes the I/O unit, which may include
transitioning the ports to an active state.
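A minimal sketch of the per-port address assignment performed by fabric services might look like the following. The function name, the starting address, and the sequential allocation scheme are assumptions made purely for illustration; the description does not specify how addresses are chosen.

```python
from __future__ import annotations

from itertools import count


def assign_port_addresses(
    unit_ports: dict[str, int], start: int = 0x020000000001
) -> dict[tuple[str, int], str]:
    """Give every (I/O unit, port number) pair a unique MAC-style address.

    `start` is a hypothetical first address; addresses are handed out
    sequentially, which is an assumption of this sketch.
    """
    next_raw = count(start)
    table: dict[tuple[str, int], str] = {}
    for unit, port_count in unit_ports.items():
        for port in range(port_count):
            raw = next(next_raw)
            # Format the 48-bit value as six colon-separated hex octets.
            octets = [f"{(raw >> shift) & 0xFF:02x}" for shift in range(40, -8, -8)]
            table[(unit, port)] = ":".join(octets)
    return table
```

A host can then reach a controller by looking up the address recorded for the unit and port where that controller is attached.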
[0055] Once the I/O unit has been initialized and has been assigned
a MAC address, the I/O controller manager 820 enumerates or
identifies the I/O controllers connected to the I/O unit. The I/O
controller manager 820 can identify the I/O controllers attached to
each I/O unit by sending a query (or query message) to each I/O
unit (or to the new I/O unit), and with each I/O unit responding
with a list of I/O controllers identifying the type of controller,
the port number to which the controller is attached, etc. The response
from the I/O unit to the I/O controller manager 820 may also
include an identification and type of each I/O device connected to
each I/O controller.
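The query and response exchange described above could be modeled roughly as follows. The ControllerInfo fields mirror the items the response is said to carry (controller type, port number, attached devices); the class names, method names, and sample values are hypothetical, and the fabric transport is replaced by a direct method call.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ControllerInfo:
    controller_type: str                               # e.g. "scsi"
    port: int                                          # port on the I/O unit
    devices: list[str] = field(default_factory=list)   # optional device list


class IOUnit:
    """Illustrative I/O unit that answers an enumeration query."""

    def __init__(self, name: str, controllers: list[ControllerInfo]) -> None:
        self.name = name
        self._controllers = controllers

    def handle_query(self) -> list[ControllerInfo]:
        # In a real fabric this would be a reply message; here it is a return value.
        return list(self._controllers)


def enumerate_controllers(units: list[IOUnit]) -> dict[str, list[ControllerInfo]]:
    """Query each I/O unit and collect its controller list, keyed by unit name."""
    return {unit.name: unit.handle_query() for unit in units}
```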
[0056] After enumerating or identifying the I/O controllers for
each I/O unit, the I/O controller manager 820 then decides which
hosts are allowed to access each of the fabric-attached I/O
controllers (e.g., assigns I/O controllers to hosts). This decision
can be made with input from one or more sources. For example, the
decision can be made using a database that stores persistent
associations between controllers and hosts, by a human administrator
assigning I/O controllers to hosts, or by using some other
mechanism. A human
administrator 802 making the I/O controller assignment decisions
can input the decisions on a computer 803 which may be attached to
the fabric 202 (either within the cluster or in another cluster),
or may be remotely located from the network.
[0057] After the I/O controller assignment decision is made for
each I/O controller (e.g., after the I/O controllers have been
assigned to hosts), the I/O controller manager 820 sends a message
to the affected hosts (i.e., sends a message to each host that will
have access to a fabric-attached I/O controller). If sharing is
desired, the I/O controller manager 820 reports the I/O controller
to more than one host, and may indicate that the I/O controller is
shared among multiple hosts. If sharing is not desired for an I/O
controller, the I/O controller manager 820 reports the I/O
controller to a single host attached to the fabric 202 and may
indicate that the access to the I/O controller is exclusive.
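The reporting rule described above (one report per affected host, marked shared or exclusive) can be sketched as a small function. The function name, the tuple layout, and the host and controller labels used with it are assumptions for illustration.

```python
from __future__ import annotations


def build_notifications(
    assignments: dict[str, list[str]],
) -> list[tuple[str, str, str]]:
    """Return (host, controller, access) tuples, one per affected host.

    `assignments` maps a controller identifier to the hosts allowed to
    access it; a controller with more than one host is reported as shared.
    """
    notifications: list[tuple[str, str, str]] = []
    for controller, hosts in assignments.items():
        access = "shared" if len(hosts) > 1 else "exclusive"
        for host in hosts:
            notifications.append((host, controller, access))
    return notifications
```

Hosts that are not listed for any controller receive no message at all, which matches the statement that only affected hosts are notified.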
[0058] FIG. 9 is a flow chart illustrating the steps performed when
an I/O controller is connected or attached (i.e., plugged in) to
the cluster interconnection fabric. The steps 1-6 described in the
flow chart of FIG. 9 are also illustrated by arrows in FIG. 8.
Steps 2, 3 and 5 include sending messages over the fabric 202. As a
result, the arrows in FIG. 8 for these steps are shown as
passing through the fabric 202.
[0059] At step 1 of FIG. 9, an I/O unit (including one or more
connected I/O controllers) is connected or attached to the
cluster interconnection fabric 202.
[0060] At step 2 of FIG. 9, fabric services 815 detects the new I/O
unit attached to the fabric 202. This can be done in several
different ways, including using a fabric trap mechanism or during
the next sweep of the fabric 202 by the fabric services 815, as
examples. In an example trap mechanism, the switch connected to a
new I/O unit sends a message to the fabric manager 810 to indicate
that a change has occurred (e.g., the I/O unit has been added to
the fabric). A message would also be sent from an existing I/O unit
to fabric manager 810 when an I/O controller is added or removed.
The fabric manager 810 may then send a message to the I/O unit to
enumerate or identify the I/O controllers connected to the new I/O
unit (including the new I/O controller). The fabric manager 810 may
periodically sweep the fabric by sending out messages to all I/O
units and hosts connected to the fabric requesting that each I/O
unit or host reply with a message indicating its presence or
connection to the fabric. The fabric manager could then send
messages to enumerate the I/O controllers connected to each I/O
unit.
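The two detection mechanisms described above, a trap sent by a switch and a periodic sweep, can be sketched as follows. The class and method names are hypothetical, and fabric messages are reduced to method calls and sets of node names.

```python
from __future__ import annotations


class FabricServices:
    """Illustrative tracker of which nodes are known to be on the fabric."""

    def __init__(self) -> None:
        self.known: set[str] = set()

    def on_trap(self, node: str) -> bool:
        """Handle a change report from a switch; return True if the node is new."""
        is_new = node not in self.known
        self.known.add(node)
        return is_new

    def sweep(self, responders: set[str]) -> tuple[set[str], set[str]]:
        """Compare sweep replies with the known set; return (added, removed)."""
        added = responders - self.known
        removed = self.known - responders
        self.known = set(responders)
        return added, removed
```

The trap path reacts as soon as a switch reports a change, while the sweep path finds the same changes, and also departures, on the next pass.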
[0061] After detecting the new I/O unit attached to the fabric,
fabric services 815 then assigns a MAC or network address to the
new I/O unit and initializes the I/O unit. Initialization may
include setting ports of the I/O unit to an active state. According
to an embodiment, a MAC or network address is assigned to each I/O
unit port. One or more I/O controllers can be reached through an
I/O unit port.
[0062] At step 3 of FIG. 9, the I/O controller manager 820
enumerates or identifies the I/O controllers connected to the I/O
unit or contained in the I/O unit. A standard technique can be used
to identify the I/O controllers connected to each I/O unit. For
example, the I/O controller manager 820 may query each I/O unit to
obtain the I/O unit profile identifying each I/O controller
connected to the I/O unit.
[0063] At step 4 of FIG. 9, the new I/O controller is assigned to
one or more hosts in the cluster (i.e., to one or more hosts
connected to the cluster interconnection fabric 202). According to
an embodiment of the invention, the I/O controller manager 820
assigns controllers to one or more hosts by: a) looking up the I/O
controllers or I/O units in a database lookup table to identify or
match the corresponding host(s); or, b) consulting a default policy
for I/O controller assignment; or c) consulting a human
administrator 802. Assigning I/O controllers to hosts using a
database lookup method allows an administrator to store persistent
(e.g., permanent or semi-permanent) mappings or correspondence
between I/O units or I/O controllers and hosts and allows this I/O
controller assignment to occur quickly and automatically without
human intervention. If a human administrator is consulted, the
human administrator may be located within the network or cluster,
or may be remotely located from the cluster or cluster
interconnection fabric 202 (i.e., not directly connected to the
cluster interconnection fabric 202).
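The three assignment sources listed above (database lookup, default policy, human administrator) can be combined in a simple sketch. The description presents them as alternatives; trying them in this particular order as a fallback chain is an assumption of the sketch, as are the function and parameter names.

```python
from __future__ import annotations

from typing import Callable, Optional


def assign_controller(
    controller: str,
    lookup_table: dict[str, list[str]],
    default_policy: Optional[Callable[[str], list[str]]] = None,
    ask_admin: Optional[Callable[[str], list[str]]] = None,
) -> list[str]:
    """Return the hosts a controller is assigned to.

    Order tried (an assumption): persistent lookup table, then default
    policy, then the administrator callback.
    """
    hosts = lookup_table.get(controller)
    if hosts:
        return list(hosts)  # persistent mapping found; no human involved
    if default_policy is not None:
        hosts = default_policy(controller)
        if hosts:
            return list(hosts)
    if ask_admin is not None:
        return list(ask_admin(controller))
    return []  # unassigned until some source supplies a decision
```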
[0064] At step 5 of FIG. 9, the I/O controller manager 820 sends a
message to the host-fabric adapter 325 of the host to which the new
I/O controller has been assigned. This message to the host informs
the host of the presence of the new I/O controller, and will
(explicitly or implicitly) provide authorization for the host to
access the new I/O controller. In the event that an I/O controller
has been removed from the cluster or fabric or has been reassigned
to another host, a similar message will be sent to the host (the
previous owner) indicating that the I/O controller is not available
or is no longer assigned to the host. This allows the administrator
802 and/or the I/O controller manager 820 to dynamically add,
remove or reassign I/O controllers in the cluster and quickly
inform the affected hosts of this change in the assignment or
ownership of the I/O controllers.
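On the host side, handling of the assignment and removal messages described above might be sketched as follows. The message kinds "assigned" and "revoked", the class name, and the table layout are hypothetical; the description does not define a message encoding.

```python
from __future__ import annotations

from typing import Optional


class HostControllerTable:
    """Illustrative per-host record of the fabric-attached controllers it may use."""

    def __init__(self) -> None:
        self.controllers: dict[str, str] = {}  # controller id -> network address

    def handle(self, kind: str, controller: str, address: Optional[str] = None) -> None:
        if kind == "assigned":
            if address is None:
                raise ValueError("an assignment must carry the controller's address")
            # The message both announces the controller and authorizes access.
            self.controllers[controller] = address
        elif kind == "revoked":
            # Removed from the fabric or reassigned: stop using the controller.
            self.controllers.pop(controller, None)
        else:
            raise ValueError(f"unknown message kind: {kind}")
```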
[0065] According to an embodiment, the host-fabric adapter 325 has
its own unique network address or MAC address. Thus, the I/O
controller manager 820 (or the fabric manager 810) can send
messages to a host that is connected to the fabric 202 by sending a
message or packet to the address of the host's host-fabric adapter
325. The device driver (or control software) 625 for the
host-fabric adapter 325 at the host receives the message from the
I/O controller manager 820. The message from the I/O controller
manager 820 reports the new I/O controller to the host (e.g.,
implicitly assigning the new controller to the host), and includes
enough information to allow the host to use or access the I/O
controller and the connected I/O devices.
[0066] For example, the message from the I/O controller manager 820
will typically provide the network address or MAC address of the
I/O unit, or the network address of the IOC-fabric adapter for the
I/O unit. Each IOC-fabric adapter will typically include a unique
network or MAC address (e.g., the network address of the IOC-fabric
adapter 230 for I/O unit 2). This network address, in effect, is
the address of the I/O unit, and provides the address where the I/O
controllers and I/O devices for the I/O unit can be accessed
(according to one embodiment of the invention). As an example
message or report from the I/O controller manager 820 to host 210,
the message could include information that identifies the address
of the I/O unit, and information identifying the I/O controllers
connected to the I/O unit which may be accessed or used by the host
210. This information may, for example, indicate that there are two
I/O controllers connected to the I/O unit (including I/O
controllers 242 and 232) and identify the type of each controller.
The message will also typically identify whether the I/O controller
is exclusively assigned to (or owned by) host 210, or whether the
I/O controller is shared among multiple hosts. Using this
identification information, the host 210 will be able to access the
new fabric-attached I/O controller (e.g., read or write to the I/O
controllers and I/O devices).
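One possible shape for the message content described above is sketched below. The field names, the record layout, and the sample address are assumptions; the description states only what information the message typically carries (the I/O unit address, the controllers and their types, and whether access is exclusive or shared).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AssignmentMessage:
    unit_address: str                        # network or MAC address of the I/O unit
    controllers: tuple[tuple[str, str], ...]  # (controller id, controller type) pairs
    exclusive: bool                          # True if owned by one host, False if shared


def describe(message: AssignmentMessage) -> list[str]:
    """Render one human-readable line per controller named in the message."""
    mode = "exclusive" if message.exclusive else "shared"
    return [
        f"{controller_id} ({controller_type}) at {message.unit_address}, {mode}"
        for controller_id, controller_type in message.controllers
    ]
```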
[0067] At step 6 of FIG. 9, the device driver 625 on the host
causes the host operating system to load the I/O controller
specific driver 605 for the new I/O controller into main memory of
the host 210.
[0068] I/O controller manager 820 is shown in FIG. 8 as being a
part of the fabric manager 810. Alternatively, if centralized
administration is suitable or desirable for the cluster, the I/O
controller manager 820 can be implemented as a software agent
running on one of the hosts in the cluster. If centralized
administration is not desirable (or if a distributed administration
is desired), the I/O controller manager 820 can be provided as a
distributed set of software agents residing on several hosts in the
cluster or attached to the cluster interconnection fabric 202.
Under either arrangement, not all hosts are required to participate
in the administrative function of assigning fabric-attached I/O
controllers to hosts.
[0069] Therefore, according to an embodiment of the invention, a
centralized technique is provided for assigning I/O controllers to
hosts in a cluster. The present invention allows I/O controllers to
be dynamically added, removed or reassigned to a new host. If a new
I/O controller is connected or attached to the cluster
interconnection fabric, the I/O controller manager 820 assigns the
new I/O controller to one or more hosts. This assignment decision
can be made with input from one or more sources, including a
database or lookup table, input from a human administrator, or some
other mechanism. The I/O controller manager 820 then sends a message
to each affected host informing the host of the new I/O controller
that has been assigned to it. In the event that an I/O controller
has been removed or reassigned, the I/O controller manager 820 will
send messages to the affected host or hosts
informing them that the I/O controller is no longer assigned to the
host (or is unavailable).
[0070] An advantage of the centralized technique of the present
invention is that the technique or mechanism does not require the
involvement of all hosts in the cluster or network to decide
controller assignment. This technique also saves network bandwidth
if there are a large number of hosts in the cluster and many I/O
controllers being added to or removed from the fabric because a central
agent (e.g., the I/O controller manager 820) assigns or reassigns
the I/O controllers. In addition, network bandwidth is also saved
because messages are sent regarding the new controller assignments
only to the affected hosts. A more distributed approach that
requires all (or many) hosts in the network to communicate with
each other to negotiate controller assignments may use a
non-negligible percentage of the available cluster or network
bandwidth. The centralized controller assignment of the present
invention reduces the complexity of the software or code necessary
to implement the assignment process. Much more complicated
algorithms would be necessary if all or many hosts were required to
negotiate the I/O controller assignments. Detection and recovery
from failures of individual hosts is also simpler using a
centralized administrative agent. In addition, the centralized
technique of the present invention is faster than the use of a
distributed mechanism, and also allows the administrator or
administrative agent (such as the I/O controller manager 820) to be
located or hosted on a separate system or network. Thus, one
administrative agent can remotely manage multiple clusters at the
same time.
[0071] Several embodiments of the present invention are
specifically illustrated and/or described herein. However, it will
be appreciated that modifications and variations of the present
invention are covered by the above teachings and within the purview
of the appended claims without departing from the spirit and
intended scope of the invention. For example, while the present
invention has been described with reference to a network, the
various aspects of the present invention are applicable to a wide
variety of networks, including system area networks, storage area
networks, Local Area Networks (LANs), Wide Area Networks (WANs),
the Internet, etc.
* * * * *