U.S. patent application number 11/847568 was filed with the patent office on 2007-08-30 and published on 2009-03-05 for arrangements for auto-merging processing components.
This patent application is currently assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Invention is credited to Marcus A. Baker, Marlene J. Gillingham, Mark V. Kapoor, Sheldon J. Sigrist, Karen A. Taylor, Steven J. Zaharias.
Application Number | 20090063662 11/847568 |
Document ID | / |
Family ID | 40409221 |
Filed Date | 2007-08-30 |
United States Patent Application | 20090063662 |
Kind Code | A1 |
Baker; Marcus A.; et al. | March 5, 2009 |
Arrangements for Auto-Merging Processing Components
Abstract
In some embodiments, a method for auto-configuring a network is
disclosed. The method can include communicating with at least one
node in a processing complex, receiving node connection data from
the at least one node, querying a node to verify at least a portion
of the node connection data, and auto-configuring system
partitioning in response to the node connection data. In some
embodiments the method can also include verifying the node
connection data by transmitting a request for a universally unique
identifier and a node identifier. The node identifier can be
associated with the universally unique identifier. The node
identifier can be utilized in data transmitted between nodes.
Inventors: |
Baker; Marcus A.; (Apex,
NC) ; Gillingham; Marlene J.; (Bellevue, WA) ;
Kapoor; Mark V.; (Durham, NC) ; Sigrist; Sheldon
J.; (Cary, NC) ; Taylor; Karen A.; (Cary,
NC) ; Zaharias; Steven J.; (Issaquah, WA) |
Correspondence
Address: |
IBM CORPORATION (RTP);C/O SCHUBERT OSTERRIEDER & NICKELSON PLLC
6013 CANNON MOUNTAIN DRIVE, S14
AUSTIN
TX
78749
US
|
Assignee: |
INTERNATIONAL BUSINESS MACHINES
CORPORATION
Armonk
NY
|
Family ID: |
40409221 |
Appl. No.: |
11/847568 |
Filed: |
August 30, 2007 |
Current U.S.
Class: |
709/220 |
Current CPC
Class: |
H04L 41/0886 20130101;
H04L 41/0806 20130101; H04L 41/0866 20130101 |
Class at
Publication: |
709/220 |
International
Class: |
G06F 15/177 20060101
G06F015/177 |
Claims
1. A method for configuring a network comprising: communicating
with at least one node in a processing complex; receiving node
connection data from the at least one node; querying another node
to verify at least a portion of the node connection data; and
auto-configuring a scalable processing in response to the node
connection data.
2. The method of claim 1, further comprising verifying the node
connection data by transmitting a request and receiving a
universally unique identifier from the another node.
3. The method of claim 2, further comprising associating a node
identifier to the universally unique identifier.
4. The method of claim 3, further comprising transmitting data
utilizing the node identifier.
5. The method of claim 1, further comprising flagging node
information associated with unverifiable node connection data.
6. The method of claim 1, wherein auto-configuring comprises
reconfiguring a basic input output system (BIOS) system.
7. The method of claim 1, further comprising determining a change
in a connection and revising the node connection data.
8. The method of claim 1, wherein the node connection data
comprises port connection data.
9. The method of claim 1, further comprising storing node
connection data at each node.
10. The method of claim 1, further comprising querying ports that
are indirectly connected to the at least one node.
11. The method of claim 1, further comprising sorting universal
unique identifiers into a hierarchal order.
12. An apparatus comprising: a first processing component; a first
communication port coupled to the first processing component; and a
controller coupled to the first communication port, the controller
to discover connection data of other controllers and to utilize the
connection data to set up a communication structure with multiple
controllers where the communication supports different operating
systems.
13. The apparatus of claim 12, wherein the controller is a
baseboard management controller.
14. The apparatus of claim 12, further comprising a sorter to sort
the connection data.
15. The apparatus of claim 12, further comprising a compare module
to compare the self discover connection data with the existing
connection data.
16. A machine-accessible medium containing instructions to
configure a processing system which, when the instructions are
executed by a machine, cause said machine to perform operations,
comprising: communicating with at least one node in a processing
complex; receiving node connection data from the at least one node;
querying a node to verify at least a portion of the node connection
data; and configuring system partitioning in response to the node
connection data.
17. The machine-accessible medium of claim 16, wherein the
operations further comprise: verifying the node connection data by
transmitting a request for a universally unique identifier.
18. The machine-accessible medium of claim 16, wherein the
operations further comprise: associating a node identification
number to the universally unique identifier.
19. The machine-accessible medium of claim 16, wherein the
operations further comprise: transmitting data utilizing the node
identifier.
20. The machine-accessible medium of claim 16, wherein the
operations further comprise: flagging node information that is
unverifiable.
Description
FIELD
[0001] The present disclosure relates generally to computing
systems and more particularly to auto-configuring a scalable
computing system.
BACKGROUND
[0002] As computing needs for organizations have increased, and as
organizations plan for growth, one common way to plan for, and
obtain, economical computing is to purchase computing systems that
are scalable. A system or architecture is scalable when it can be
upgraded or increased in size or reconfigured to accommodate
changing conditions. For example, a company that plans to set up a
client/server network may want to have a system that not only works
with the number of people who will immediately use the system, but
can also be easily and economically expanded to accommodate the
number of employees who may be using the system in one year, five
years, or ten years. In another example, a company that runs a server farm
and hosts web pages or applications via the Internet may continue
to grow, and this company would desire a scalable system where they
can economically add servers as needed to accommodate growth and
can re-partition as needed.
[0003] Accordingly, a scalable system can typically merge or
integrate a number of scalable servers or chassis having one or
more processors to create a "larger" unitary system having
processing nodes. Thus, a collection of scalable servers can
function like a single larger server when properly merged. Although
multiple servers are merged they can also be partitioned using
hardware partitioning. A system with a single partition can run a
single instance of an operating system (OS) and all the nodes of
the system are thus conceptually combined. Thus, in effect the user
will experience a single, more powerful computing system
functioning as one "scaled up" node, instead of a number of less
powerful nodes running independently.
[0004] A traditional approach to combining multiple nodes of a
system into a single-partition merged system running a single
instance of an OS is to have a trained technician manually
integrate and configure each node as a system is built or as
computing resources (nodes) are added. Traditionally, a trained
technician or administrator must configure each node with the
proper partition configuration information, specifying one of the
nodes as the primary, or boot node, and the other nodes as
secondary nodes to the primary node. This approach is cumbersome,
and requires trained technicians to build and configure such a
system. When there are more than a few nodes to manually configure,
configuring can get complex and such configuring is prone to
connection and configuration errors and omissions.
[0005] Another approach is to have dedicated hardware that is
responsible for configuring the nodes as a single-partition merged
system running a single instance of an OS. In this approach an
administrator can interact with the dedicated hardware, which may
be, for example, a dedicated management console. The dedicated
hardware can be responsible for ensuring that the nodes operate as
a single-partition merged system. It can be appreciated that this
approach requires costly dedicated hardware, and may require
modification to preexisting systems that do not allow for the
addition of such functionality.
[0006] Power up for a scalable system also can create difficulties.
One approach to address this issue is to have a "luck-of-the-draw"
or timing-based approach programmed into the nodes of the system.
When a node boots up, it can determine whether a single-partition
merged system is already running, and if so, join the system. If
the node does not find a preexisting system to join, it starts one,
and becomes the primary node for the new system. The node thus
becomes the primary node due to timing issues and the luck of the
draw. Such an approach, however, can be complex, and does not
provide the administrator with control over which node becomes the
primary node. Generally, systems in a scalable environment don't
automatically know that they are cabled together and can work as
one system. These scalable systems have to be told (i.e., configured
by a technician) such that they know that they are cabled to other
nodes and must be configured regarding how they can communicate
with other nodes. There are many current designs available that
utilize this manual configuration approach. One design uses a
network such as an Ethernet connection between nodes and utilizes a
Remote Supervisor Adapter (RSA) to facilitate set up of the
system.
[0007] The RSAs can communicate with each other on Ethernet
(embedded service on each RSA) and can instruct the scalable
components to work together with a set partitioning. This system,
among other things, requires a user to input the Internet Protocol
(IP) addresses of each RSA in the RSA interface before the scalable
systems can work as a single entity. Discovering and entering the
RSA IP address for each component can be cumbersome for a user.
This IP detection process can include booting each
scalable component and after the component is connected to the
network the user can request and find the IP address in the BIOS
menu of the component. Another traditional arrangement uses a
Service Location Protocol (SLP) discovery routine to detect all
scalable components via the system's RSA Ethernet network
connections. Then the arrangement can iterate through the list of
SLP scalable systems and an application can send a message (ping)
through each scalable system port and detect received messages on
another scalable system port. Each scalable system relies on RSA
Ethernet protocol to initiate and detect how other scalable systems
interconnect. In the end, all scalable connections are determined
for all SLP scalable systems.
[0008] This Ethernet based arrangement does not get comprehensive
system information and such a detection depends on
intercommunication of scalable components via RSA Ethernet. This
solution uses SLP to discover the communication mechanism, which
can find a large number of systems. Only the number of RSAs
connected to that network limits the number of discovered systems.
Often not all of these detected systems can operate as a single
entity. Extra time can then be spent filtering out the systems
that are not capable of scaling.
[0009] Another approach is to connect the servers and configure
virtual building blocks. These building blocks can be broken down
to any level, but this is only supported in a Linux environment.
Traditional systems require the user to intensively configure the
system utilizing a remote service administrator (RSA). Traditional
systems utilize a relatively complex set-up. Such a set-up can also
require extensive hardware and software overhead. For example, such
a system can require an Ethernet communication system to
communicate set up commands to all of the subsystems. Further, it
is expensive to require a trained technician be present at every
installation or every system expansion.
SUMMARY OF THE INVENTION
[0010] The problems identified above are in large part addressed by
the systems, arrangements, methods and media disclosed herein to
provide auto-configuring of a multi-node processing system. In some
embodiments the method can include communicating with at least one
node in a processing complex, receiving node connection data from
the at least one node, querying a node to verify at least a portion
of the node connection data, and auto-configuring system
partitioning in response to the node connection data. In some
embodiments the method can also include verifying the node
connection data by transmitting a request for a universally unique
identifier and a node identifier. The node identifier can be
associated with the universally unique identifier. The node
identifier can be utilized in data transmitted between nodes.
[0011] In some embodiments an apparatus is disclosed that has a
first processing component, a first communication port coupled to
the first processing component and a baseboard management
controller coupled to the first communication port. The baseboard
management controller can query other baseboard management
controllers regarding existing connection data and query other
communication ports to self discover connection data. The apparatus
can also include a sorter to sort the connection data such that all
apparatus that make up a system can sort a table in a similar way.
The system can also include a compare module to compare the self
discover connection data with the existing connection data.
[0012] In yet another embodiment a computer program product is
disclosed that has a computer useable medium with a computer
readable program. The computer readable program, when executed on a
computer, causes the computer to communicate with at least one node
in a processing complex, receive node connection data from the at
least one node, query a node to verify at least a portion of the
node connection data, and configure system partitioning in response
to the node connection data.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] Aspects of the invention will become apparent upon reading
the following detailed description and upon reference to the
accompanying drawings, in which like references may indicate
similar elements:
[0014] FIG. 1 is a block diagram of an auto-configuring merged
system having a number of nodes;
[0015] FIG. 2 is a table representing how nodes can be identified
and connected; and
[0016] FIG. 3 is a flow diagram of a method of auto-configuring a
system.
DETAILED DESCRIPTION
[0017] The following is a detailed description of embodiments of
the disclosure depicted in the accompanying drawings. The
embodiments are in such detail as to clearly communicate the
disclosure. However, the amount of detail offered is not intended
to limit the anticipated variations of embodiments; on the
contrary, the intention is to cover all modifications, equivalents,
and alternatives falling within the spirit and scope of the present
disclosure as defined by the appended claims.
[0018] Disclosed herein is an auto-configuring scalable processing
system having nodes that can be auto-merged to form a larger
system that operates as a single entity. The merging of multiple
nodes into a single-partition (unitary) system can be accomplished
via a baseboard management controller (BMC) at each node. In some
embodiments the resources or merged nodes of the processing system
or complex can be partitioned to run a single instance of an
operating system (OS), and in other embodiments the complex can be
partitioned to run multiple instances of an operating system.
[0019] Referring to FIG. 1 an auto-merging processing system 100 is
disclosed. In some embodiments the components of the system 100 can
be cabled together and connected to a power source, and the system
100 can seamlessly integrate itself into a single partition system
without user intervention. In some embodiments additional computing
resources can be connected when the system is operating, and the
system can integrate or merge a newly/hot connected node "on the
fly." In this auto-merging/auto-configuring process the system 100
can set up as a single-partition system that runs a single instance
of an operating system (OS), (hence referred to as a single
partition). The system 100 can include many nodes, such as nodes
104A, 104B, 104C and 104D (referred to herein collectively as nodes
104), where each node can be a processing resource such as a
scalable server. However, this disclosure is not limited to any
fixed number of nodes or any type of processing resource. Thus
nodes 104 can represent any resource that can be scaled.
[0020] As can be appreciated, the nodes 104 may include other
components in addition to and/or in lieu of those depicted.
Furthermore, nodes 104 are meant as representative of one type of
node in conjunction with which embodiments can be implemented.
Embodiments of the system are also amenable to implementation in
conjunction with other types of nodes, as can be appreciated.
[0021] The nodes 104 can be coupled to one another via an
interconnect system 116. The interconnect system 116 can be
implemented via cabling where cables are plugged into "scalable
ports" in the chassis or circuit boards of the nodes 104. A node
104 can send and receive set up information and commands from other
nodes 104 via interconnect 116. The nodes 104, possibly servers, may
physically be located within a single rack or the servers could be
distributed over different server racks at various locations within
the range of the interconnect system. Each node 104 can include a
basic input/output system (BIOS) 118A, 118B, 118C and 118D
(collectively BIOS 118), non-volatile random-access memory
(NVRAM) 108A, 108B, 108C, and 108D (collectively 108) and a
baseboard management controller (BMC) 126A, 126B, 126C and 126D
(collectively 126). The nodes 104 may also include components in
addition to and/or in lieu of those depicted. For example, the BMCs
126 can have a compare module to compare self discovered
connections to data in the table 124.
[0022] A system owner, user or system administrator can connect an
input/output device such as a remote service administrator (RSA)
102 to the nodes 104 to monitor configurations of the nodes 104 and
of the system and possibly configure the system 100 or nodes 104 if
needed. The RSA 102 can have a keyboard 103 and a display 104 and
can be connected to any node 104 or to a portion of the
interconnect 116. Via the RSA 102 the user can interact directly
with the BMCs 126 and interact with the BIOS 118 and BIOS
setup/start up instructions 118 of each node 104 if desired. The
BIOS 118 may be a set of instructions or routines that allows each
node 104 to load instructions that dictate basic operating
functions, and the BIOS 118 routine can provide an interface between
the operating system (OS) and the hardware, and can control at
least some of the functionality of each node 104 and the system
100.
[0023] For example, the BIOS 118 can control a power-on self test
(POST), a node interconnect detection system, and can read
partition configuration information from the BMC and reconfigure
itself in preparation for partition merging. The BIOS 118 can
control the above mentioned functionality and additional
functionalities and can operate at each mergeable node 104. The
NVRAM 108 can retain the BIOS contents even when power is not
supplied to the node 104. The NVRAM 108 can also store the firmware
for the processors 122A, 122B, 122C and 122D (collectively 122). Each
of the processors 122 can be a single processor, a series of
processors, or a multi-core processor.
[0024] As stated above, the BMCs 126 can act as facilitators of the
auto-merging/auto-configuring process and can interact with the
BIOS 118 of each node. In some embodiments the nodes 104 can be
connected by cables, including power cables, and then, without user
intervention, the BMCs 126 can automatically discover what
connections exist and each BMC 126 can configure the operation of
its respective node 104 such that the system boots and operates
as a unitary/single partition system. The system can auto-configure
in a "similar" way to what is referred to as a "plug and play"
system in personal computer (PC) technology. Plug and play systems
configure newly connected devices with code that is part of an
operating system, whereas the disclosed arrangements are not part of
an operating system but are performed on a software layer that
underlies the operating system.
[0025] Thus, the operation and actions of auto-merge system 100 are
virtually transparent to the operating system, as different
operating systems could operate on the system 100 without affecting
the system configuration settings such as the partitioning. The
BMCs 126 can provide an interface mechanism for automatic
recognition of connections for all nodes 104 and for addressing each
node 104. This interface feature can provide the capability of
transmitting or receiving partition configuration data and for
dispersing the processing workload presented to the system. The
BMCs 126 can also control the turn on, or turn off of various
portions of the merged system 100, and the BMCs 126 can reset the
processors 122 within the merged system 100.
[0026] As part of the auto-configuration process, each BMC 126 can
perform an automated self-discovery for the presence of connected
nodes 104 and each port of each node 104. During this connection
discovery process the BMCs 126 can create a connection matrix or
connection table 124 representing the interconnection of the system
100. This self discovery process can commence when power is
connected to the nodes 104 even though the power switch is not
turned on. The BMCs 126 can conduct a "system self discovery" by
automatically mapping the interconnect configuration when power is
applied to each node 104 even though the power switch is in an off
state. Such a mapping configuration can be stored in the form of
interconnect (I/C) tables 124A, 124B, 124C and 124D (collectively
tables 124). The tables 124 can also have a sorter module such that
the connection data entered into each table can be sorted in the
same way according to a hierarchy. Further, after the mapping is complete
the BMCs 126 can configure the system 100 as a unitary or single
partitioned system according to a default setting. In other
embodiments, a default setting or a user configured setting can
create multiple partitions.
[0027] In some embodiments, the self configuring/self merging
system 100 does not require the user to input any values or provide
any configuration data, and the coupled nodes 104 can communicate
and exchange connection information and can configure a shared
processing configuration and an addressing scheme such that the
system 100 can operate as a homogenous system. It can be
appreciated that this sub-OS plug and play arrangement can operate
with many different operating systems as the self configuring set
up can run "under" or transparently to whatever operating system is
installed on the system 100. Thus, this sub-OS plug and play system
provides a user friendly, hands off solution to scalable system
management of nodes.
[0028] As stated above, the system 100 can automatically configure
cabled nodes 104 where multiple nodes can create a system 100 that
appears to operate or work as a single system or single unit. The
system 100 can assign unique identifiers to each port of each node
such that each table 124 is identical. From the factory each
component (that is connected as a node) can have a universally
unique sixteen byte identifier. The BMC 126 can get the identifier
by querying the node or port and organize the table 124 in a
hierarchy according to the numeric, alpha or alpha numeric
magnitude of the identifier. Thus, all tables generated should have
the same components in the same order. The interconnect data can be
codified in a table format. Since a unique sixteen byte identifier
has more bits than are needed for addressing communications between
the small number of ports or nodes of the system 100, after the
table is generated each node can be assigned an index, possibly a
four bit unique identifier, where the index could start at "00" and
count up, such that the last node (the component with the highest or
lowest factory assigned identifier) could have an "03" index where
four nodes are present in the system 100. In some embodiments
eight or more nodes can be accommodated.
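As an illustration only, the ordering step described above could be sketched as follows. The sketch assumes that the sixteen byte identifiers are available as Python bytes values and that the sort is ascending; the function name assign_indices and the sample identifiers are hypothetical and are not taken from the disclosure.

```python
import uuid

def assign_indices(identifiers):
    """Sort 16-byte identifiers and map each one to a small consecutive index.

    Every node that runs this over the same set of identifiers obtains the
    same mapping, regardless of the order in which it discovered them.
    """
    ordered = sorted(set(identifiers))  # bytes values compare lexicographically
    return {ident: index for index, ident in enumerate(ordered)}

# Hypothetical factory identifiers for a four-node complex.
ids = [uuid.UUID(int=n).bytes for n in (0x30, 0x10, 0x40, 0x20)]

# Two nodes that discovered the identifiers in different orders.
table_seen_by_a = assign_indices(ids)
table_seen_by_b = assign_indices(list(reversed(ids)))
```

Because the mapping depends only on the set of identifiers, each node can compute it locally without a coordinating node.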
[0029] This unique index can be utilized as an address and allow
all of the scalable sub-components or nodes 104 to communicate with
each other by using the index/address in a header of transmissions.
Each node 104 of the scalable systems can determine and generate a
table where each sub-component should, when connected and operating
properly, build or create the same table during the connection
discovery process. Thus, because the nodes 104 can order the
hardware in the same manner and assign consecutive numbers to the
ordered hardware, identical tables 124 can be generated. When tables
124 are not substantially identical, the BMCs 126 can recognize
this and can regenerate the tables until all BMCs 126 generate an
"identical" table.
[0030] In some embodiments, when power is initially applied to a
node 104 (with the power switch off), the BMC 126 can perform a
software load from non-volatile memory and can conduct a self test
by executing the BMC code 126. After a successful self test, each
BMC 126 can initiate a self-configuration procedure or routine.
Each BMC 126 can query other BMCs and check for the presence of
descriptors of connections (populated tables) via scalable
interconnections with other nodes. The BMCs 126 can also monitor
communications and by reading addressing or header information can
detect connection descriptors. The descriptors can indicate that
other BMCs 126 in the system 100 are already cabled together and
communicating. If descriptors are detected during the
initialization communications, the BMCs 126 can begin a
self-configuration process to verify the integrity of the
descriptors. In some embodiments the BMC 126 can check to make sure
its nodes are connected to ports of other nodes and that the
descriptors match what is indicated by the detected
descriptors.
[0031] If a port is unreachable or no connection is found, the port
can be flagged as an un-operable or unconnected port. If there is a
mismatch between the complex descriptor and the system connected to
the local port, the system can proceed to conduct a
self-configuration as described below. When all BMCs 126 determine
that they have matching tables, system partition management may
begin. A complex self-configuration can be performed by each BMC
126 to create the complex descriptor when the system is initially
configured or when a connection is added or changed. Each node 104
can query all the nodes connected directly to its ports and can
update a local copy of the complex descriptor with the discovered
unique identifier.
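The port query and flagging behavior described above could look roughly like the following sketch. The callable query(port) is a hypothetical stand-in for whatever mechanism a BMC uses to ask a port for the identifier of its neighbor; it is assumed to return None when nothing answers. The names probe_local_ports and cabling are likewise illustrative.

```python
def probe_local_ports(ports, query):
    """Query each local port and record what is attached to it.

    `query(port)` is a hypothetical callable that returns the identifier of
    the neighbouring node, or None when nothing answers on that port.
    """
    descriptor = {}   # port -> identifier discovered on that port
    flagged = []      # ports that are unreachable or unconnected
    for port in ports:
        neighbour = query(port)
        if neighbour is None:
            flagged.append(port)
        else:
            descriptor[port] = neighbour
    return descriptor, flagged

# Simulated cabling: port P1 is cabled to node "B", port P2 is empty.
cabling = {"P1": "B"}
descriptor, flagged = probe_local_ports(["P1", "P2"], cabling.get)
```

Here descriptor plays the role of the local copy of the complex descriptor, and flagged lists the ports that would be marked as unconnected.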
[0032] In some embodiments each BMC 126 can perform a "one hop"
port query on ports that are "indirectly" connected to the BMC 126.
By querying beyond the directly coupled ports a BMC 126 can capture
remaining unique identifiers for any other nodes in the system 100
or complex. This hopping inquiry process can produce a complete
list of available systems for the complex where each node 104 is
identified by the unique identifier.
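A minimal sketch of the hopping inquiry follows, assuming a hypothetical callable neighbours(node) that returns the identifiers directly cabled to a node. The parameter extra_hops, the function name discover and the four-node ring used as sample data are assumptions made for illustration.

```python
def discover(start, neighbours, extra_hops=1):
    """Collect identifiers reachable from `start`.

    `neighbours(node)` is a hypothetical callable returning the identifiers
    directly cabled to `node`. Direct neighbours are queried first, then the
    query is forwarded `extra_hops` more times through them.
    """
    known = {start}
    frontier = {start}
    for _ in range(1 + extra_hops):
        next_frontier = set()
        for node in frontier:
            for peer in neighbours(node):
                if peer not in known:
                    known.add(peer)
                    next_frontier.add(peer)
        frontier = next_frontier
    return sorted(known)

# Simulated four-node ring: A-B, B-C, C-D, D-A.
ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
complex_members = discover("A", ring.__getitem__)
direct_only = discover("A", ring.__getitem__, extra_hops=0)
```

In this ring, node A reaches B and D directly, and the single extra hop is what reveals node C.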
[0033] Each node 104 can sort the list of unique identifiers in the
complex descriptor. The index number into the sorted unique
identifier list can become a unique node number that the nodes use
to communicate with specific nodes 104 of the system 100. Each node
104 can search tables 124 for its unique identifier (index) and can
assign its own node number to the index of the entry. The node
number is sufficient for any node to communicate with any other
node in the interconnected complex. Partition IDs can then be
assigned to groups of nodes in the complex so that power control is
coordinated within the partition group, thus permitting the
partition to boot as a single OS. In some embodiments the BMCs can
assign a special value of (0xFF in hexadecimal) as a
partition ID such that a communication can be sent to any node in
the system 100. (This concept could use a little more support).
Each BMC 126 can verify the complex descriptor by sending the local
port connections to all the other BMCs 126. Such sharing of
information can ensure consistency in table data.
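The node numbering and partition addressing described in this paragraph could be sketched as shown below. Treating the 0xFF partition ID as a value that every node accepts is an interpretation of the text, not something the disclosure spells out. The names node_number, accepts and the sample identifiers and partition assignments are hypothetical.

```python
BROADCAST_PARTITION = 0xFF  # special partition ID value mentioned in the text

def node_number(sorted_ids, own_id):
    """A node's number is its position in the sorted identifier list."""
    return sorted_ids.index(own_id)

def accepts(message_partition, own_partition):
    """Decide whether a node should act on a message addressed to a partition.

    Assumption: a message carrying 0xFF is meant for every node.
    """
    return message_partition in (own_partition, BROADCAST_PARTITION)

sorted_ids = ["id-a", "id-b", "id-c", "id-d"]   # hypothetical sorted identifiers
partitions = {0: 1, 1: 1, 2: 2, 3: 2}           # node number -> partition ID

me = node_number(sorted_ids, "id-c")
```

With these sample values, the node holding "id-c" is node number 2 in partition 2, so it would act on messages for partition 2 or 0xFF and ignore messages for partition 1.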
[0034] As stated above traditional scalable systems can require an
Ethernet connection at each node and can require a processing
subsystem to configure system operation and to perform basic
administrative functions. The disclosed embodiments with multiple
scaled servers can perform without a centralized administrative
system such as an RSA. Functions that formerly were performed by
trained technicians utilizing RSAs can be automated by the BMCs
126. For example, node IDs do not need to be generated, entered into
the system and monitored by a trained technician at an RSA terminal.
The disclosed system can automatically generate node IDs and
provide continuous communication regarding the configuration of
neighboring scalable nodes or systems.
[0035] A processing complex (a conceptual whole made up of related
subsystems) can be considered as the set of scalable sub-systems
cabled together where multiple servers can be auto-configured and
run one or more operating systems. Alternately described, the
server complex shown can have one or more partitions or run as one
or more systems. The system can utilize a "complex" descriptor to
administrate the auto-connection and auto-configuring of the
complex/system. The complex descriptor can be a data structure that
describes the capabilities of each node of the scalable systems and
how each node is connected.
[0036] In some embodiments the system can automatically partition
itself utilizing a single partition default setting. A partition
can be considered as a group of the scalable systems cabled
together and configured to run as a single entity. For example, a
scaled system having multiple nodes can run a single operating
system.
[0037] Referring to FIG. 2, a table 200 of possible system
interconnections that can be generated by a BMC is illustrated.
Each component that can be connected as a node
can have a factory assigned unique sixteen byte identifier. The
component can transmit this identifier in response to queries
transmitted from other BMCs or components, and all BMCs can store
these unique identifiers and sort them from high to low or low to
high. Accordingly, each BMC can organize the table 200 in the same
order (i.e., from highest identifier value to lowest identifier
value or lowest identifier at the top of the table and the highest
valued identifiers at the bottom of the table).
[0038] This can be referred to as a sorting, rating or ranking
process. Using this hierarchy, each table created by each BMC can be
substantially identical and should have substantially identical
data. In the table 200 provided, a four node interconnect is
illustrated where each node has two ports identified as P1 and
P2.
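The expectation that every BMC ends up with a substantially identical table could be checked along the following lines. The disclosure does not state how tables are compared; fingerprinting each table with a hash is an assumption made for this sketch, and the names table_digest, tables_agree and the sample tables are hypothetical.

```python
import hashlib

def table_digest(table):
    """Return a fingerprint of a connection table.

    `table` is assumed to be a list of (identifier, index) pairs that is
    already sorted, so equal tables always produce equal digests.
    """
    h = hashlib.sha256()
    for ident, index in table:
        h.update(ident)
        h.update(index.to_bytes(1, "big"))
    return h.hexdigest()

def tables_agree(tables):
    """True when every BMC produced the same table."""
    return len({table_digest(t) for t in tables}) <= 1

good = [(b"\x01" * 16, 0), (b"\x02" * 16, 1)]   # table built by most BMCs
stale = good[:1]                                # a BMC that missed one node

all_match = tables_agree([good, good, good])
one_differs = tables_agree([good, good, stale])
```

Exchanging short digests rather than whole tables is one possible design choice; comparing the tables entry by entry would serve equally well for a small complex.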
[0039] In the table 200, N's are placed to indicate that this
embodiment does not couple ports at the same node together, although
this could be done. In the table 200 node A has been assigned a
unique identifier 00, node B 01, node C 10 and node D 11. Such
identifiers could be utilized for addressing of communications. In
addition, the unique connections between ports have been assigned a
unique identifier. For example, the connection between node B port
2 and node A port 2 has been assigned the unique identifier "07."
As stated above, each BMC can access the tables stored by other BMCs
to check specific connections or to compare every piece of data in
one table to every piece of data in another table. Many other table
configurations could be utilized without departing from the scope of
this disclosure.
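A table of this general shape might be modeled as in the following Python sketch. The node identifiers 00, 01, 10, and 11 come from the text; everything else is an assumption. In particular, the connection identifiers are numbered sequentially for illustration only and are not intended to reproduce the values of FIG. 2 (such as "07"), and the helper names build_table and table_diff are hypothetical.

```python
from itertools import combinations

NODES = {"A": "00", "B": "01", "C": "10", "D": "11"}   # identifiers from the text
PORTS = ("P1", "P2")


def build_table():
    """Map each unordered pair of endpoints to a connection id or 'N'."""
    endpoints = [(node, port) for node in NODES for port in PORTS]
    table = {}
    next_id = 0
    for a, b in combinations(endpoints, 2):
        key = frozenset((a, b))
        if a[0] == b[0]:
            table[key] = "N"              # ports on the same node are not coupled
        else:
            table[key] = f"{next_id:02d}"  # illustrative numbering only
            next_id += 1
    return table


def table_diff(mine, theirs):
    """Return the endpoint pairs on which two tables disagree."""
    keys = mine.keys() | theirs.keys()
    return {k for k in keys if mine.get(k) != theirs.get(k)}
```

A BMC could use a comparison like table_diff either to spot-check one connection or to compare an entire table against the copy held by another BMC.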
[0040] Further, as additional nodes are added to the system, the
connections of the newly connected components or nodes can be
automatically detected by all BMCs and the BMCs can
configure/reconfigure the table 200 and the BIOS of all connected
nodes such that they can function as one system. Thus, each node
can automatically reconfigure when a new node is added, such that
the newly connected resource can automatically and seamlessly
integrate with existing operating resources. Scalability cables can
be hot plugged, so that servers can be introduced into the complex
and automatically detected while other systems are operational. The
disclosed arrangements can also allow scalable server components to
discover how to communicate with the other scalable systems without
the requirement for the functions traditionally provided by a
processing sub-system that configures according to user input.
[0041] As stated above, traditional systems utilize a relatively
complex and overhead intensive Ethernet connection to communicate
setup information. Further, traditional systems require significant
user input or user assistance to configure the system and make it
operable. Generally, the disclosed scalable system is not limited
by any particular operating system because the auto-merging can be
performed transparently to the OS, allowing the system to run many
different operating systems.
[0042] The disclosed system can avoid the need to "check" through
systems that cannot be scaled, such as those that appear through
SLP discovery. A monitor for the system is not required; however, if
someone wants to view system settings, a monitor can be connected to
the system. In addition, the systems and methods can free up
Ethernet connections for other purposes and uses because only one
node needs a terminal when someone wants to view how every node of
the system is set up. The system does not need a smart terminal such
as an RSA and does not need an Ethernet connection to communicate
system configuration data.
[0043] Each scalable node can communicate with all the other
scalable nodes through designated ports and cabling to build and
populate the table. In some embodiments, the BMCs can provide
management functions for the nodes and create control settings
based on the contents of the table. To facilitate communication
between the scaled nodes the BMCs can create, assign and coordinate
a unique identifier to each node and each port as shown at the left
margin of the table. Such unique identifiers allow specific
commands to be sent to specific systems and different commands to
be sent to the different nodes. When a message is sent across the
cabling, the nodes in the scalable system can determine the intended
recipient of the message by receiving the unique identifier and
index (i.e., the node ID) and comparing the received identifiers to
an identifier stored locally in the table 200.
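The recipient check described above can be sketched in Python under stated assumptions. The Message fields and the function is_for_me are hypothetical names, and the table is reduced here to a sorted list of identifiers in which the list index serves as the node ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    dest_uuid: bytes      # unique identifier of the intended recipient
    dest_node_id: int     # index (node ID) of the intended recipient
    payload: bytes


def is_for_me(msg, local_uuid, sorted_uuids):
    """True when both the identifier and the node ID match this node."""
    if msg.dest_uuid != local_uuid:
        return False
    in_range = 0 <= msg.dest_node_id < len(sorted_uuids)
    return in_range and sorted_uuids[msg.dest_node_id] == local_uuid
```

Checking both values lets a node reject a message whose node ID is stale, for example one addressed using a table that predates a newly added node.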
[0044] Referring to FIG. 3, a flow diagram 300 of a method of
configuring a scalable processing system or complex is disclosed.
The method disclosed, as well as other methods of embodiments of
the disclosure, may be wholly or partially implemented in
conjunction with a computer-readable medium on an article of
manufacture. The computer-readable medium may be a recordable data
storage medium, a modulated carrier signal, or another type of
computer-readable medium. As illustrated by block 302, power can be
applied to the system (with the power switch off) and during this
interval, a baseboard management controller (BMC) can perform a
software load from non-volatile memory and can conduct a self test
and check to see if system descriptors are present in
communications or in tables stored at nodes. As stated above,
checking for connection data can be part of a
discovery/self-configuration procedure or routine.
[0045] The descriptors of the processing complex can describe if
other BMCs in the complex are already cabled together, partitioned
and communicating. If connection data is available, then, as
illustrated by block 304, the BMC can request and receive a
universally unique identifier (UUID), a node number, and a complex
descriptor from a connected node.
[0046] If there is information available (i.e. a new piece of
hardware is being connected to a functioning system) the BMC can
begin the self-configuration process. In some embodiments, the BMC
can check the connection configuration information to make sure
that the available information is accurate by checking the
connections described by the descriptors. Such a process can be
done by a field programmable gate array (FPGA). All tables can be
reorganized/reconfigured based on a new node; however, if the
connection data is not accurate or complete, as illustrated by block
307, the BMC can query local ports using the UUID in a first level
search, as illustrated by block 308. Also, as illustrated by block
307, it can be determined whether all tables are matching.
[0047] Referring back to decision block 302, if connection
information is unavailable, a port cannot be verified, a port is
unreachable, or no connection is found, the port can be flagged as
an inoperable or unconnected port. Accordingly, if there is a
mismatch between the complex descriptor and the BMC-discovered
connections at the local port, the system can proceed to conduct a
self-configuration as illustrated by block 308. Thus, BMCs in the
complex can exchange table data, and when BMCs have non-matching
table data, the BMCs can re-conduct the discovery process.
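The exchange-and-retry behavior can be sketched as a simple loop. This is an illustrative Python sketch only: the function converge, the rediscover callable, and the max_rounds limit are assumptions introduced here, and the disclosure does not specify a retry bound.

```python
def converge(bmcs, rediscover, max_rounds=5):
    """Repeat discovery until every BMC holds the same table.

    bmcs maps a BMC name to its current table; rediscover(name) is a
    hypothetical callable returning a freshly discovered table.
    Returns the number of re-discovery rounds that were needed.
    """
    for round_no in range(max_rounds + 1):
        tables = list(bmcs.values())
        if all(t == tables[0] for t in tables):
            return round_no               # all tables match
        if round_no == max_rounds:
            break                         # give up after the last check
        for name in bmcs:
            bmcs[name] = rediscover(name)  # non-matching data: rediscover
    raise RuntimeError("tables did not converge")
```

Bounding the number of rounds is a design choice of this sketch; it keeps a persistent cabling fault from causing an endless rediscovery loop.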
[0048] When all BMCs in the complex determine that they have
matching tables, the BMCs can perform system partition
management. A complex partitioning self-configuration can be
performed by each BMC to create the complex descriptors when the
system is initially configured or when a connection is added or
changed. As illustrated by block 310, each BMC can query all the
BMCs connected directly to its ports and can update a local copy of
the complex descriptor with a unique identifier such as the
UUID.
[0049] Each BMC can then perform a "one hop deeper" remote unique
identifier query on each port and can capture the remaining unique
identifiers for any other systems in the system or complex, as
illustrated by block 310. This hopping inquiry process can produce
a complete list of components available to the complex, where each
BMC or node can be identified by the unique identifier. As
illustrated by block 312, each BMC can sort the list of unique
identifiers in the complex descriptor. The index number into the
sorted unique identifier list can become a unique node number that
the BMCs utilize to communicate with specific nodes of the
complex.
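The direct query, the "one hop deeper" query, and the index-based numbering can be sketched together in Python. The names discover, node_numbers, and query_neighbors are hypothetical, and the sketch assumes that two levels of querying reach every node, as in a four-node complex.

```python
def discover(local_uuid, query_neighbors):
    """Collect identifiers found by direct and one-hop-deeper queries.

    query_neighbors(uuid) is a hypothetical callable returning the
    identifiers cabled directly to the ports of the given node.
    """
    found = {local_uuid}
    direct = set(query_neighbors(local_uuid))   # first level: local ports
    found |= direct
    for neighbor in direct:                     # "one hop deeper" query
        found |= set(query_neighbors(neighbor))
    return found


def node_numbers(found):
    """Sort the identifiers; the list index becomes the node number."""
    return {uuid: index for index, uuid in enumerate(sorted(found))}
```

Since each BMC sorts the same set of identifiers, each one derives the same node numbers without any central coordinator handing them out.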
[0050] Each BMC can search the list stored by other BMCs for its
own unique identifier and can assign the index of that entry as its
own node number. Thus, each entry in the table can be associated
with the BMC that made the entry. Once node numbers have been
self-assigned, BMCs are then capable of sending messages to or
receiving messages from any other BMC in the scalable complex,
regardless of the setting of partition configuration information. In
some embodiments, each BMC can verify the complex descriptors by
sending its local port connection table to all the other BMCs. Such
sharing of information can ensure consistency in table data. As
illustrated by block 314, node numbers can be assigned to each
chassis, and as illustrated by block 316, all tables can be updated.
The process can end thereafter.
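One way the shared port connection tables could be checked for consistency is sketched below in Python. The layout of port_tables and the function name inconsistent_links are assumptions of this sketch; the disclosure does not prescribe a particular check.

```python
def inconsistent_links(port_tables):
    """Find links that one side reports and the other does not confirm.

    port_tables maps a node number to {local_port: (remote_node, remote_port)}.
    A link is consistent when the remote node reports the same cable
    from its own side.
    """
    bad = []
    for node, ports in port_tables.items():
        for port, (r_node, r_port) in ports.items():
            back = port_tables.get(r_node, {}).get(r_port)
            if back != (node, port):
                bad.append((node, port))
    return sorted(bad)
```

An empty result would indicate that every reported cable is confirmed from both ends; any entries returned identify the local ports whose connection data should be rediscovered.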
[0051] Although reference has been made to particular
configurations of hardware and/or software, those of skill in the
art will realize that embodiments of the present invention may
advantageously be implemented with other equivalent hardware and/or
software systems. Aspects of the
disclosure described herein may be stored or distributed on
computer-readable media, including magnetic and optically readable
and removable computer disks, as well as distributed electronically
over the Internet or over other networks, including wireless
networks. Data structures and transmission of data (including
wireless transmission) particular to aspects of the disclosure are
also encompassed within the scope of the disclosure.
[0052] Each process disclosed herein can be implemented with a
software program. The software programs described herein may be
operated on any type of computer, such as a personal computer,
server, etc. Any programs may be contained on a variety of
signal-bearing media. Illustrative signal-bearing media include,
but are not limited to: (i) information permanently stored on
non-writable storage media (e.g., read-only memory devices within a
computer such as CD-ROM disks readable by a CD-ROM drive); (ii)
alterable information stored on writable storage media (e.g.,
floppy disks within a diskette drive or hard-disk drive); and (iii)
information conveyed to a computer by a communications medium, such
as through a computer or telephone network, including wireless
communications. The latter embodiment specifically includes
information downloaded from the Internet, intranet or other
networks. Such signal-bearing media, when carrying
computer-readable instructions that direct the functions of the
present invention, represent embodiments of the present
disclosure.
[0053] The disclosed embodiments can take the form of an entirely
hardware embodiment, an entirely software embodiment or an
embodiment containing both hardware and software elements. In a
preferred embodiment, the invention is implemented in software,
which includes but is not limited to firmware, resident software,
microcode, etc. Furthermore, the invention can take the form of a
computer program product accessible from a computer-usable or
computer-readable medium providing program code for use by or in
connection with a computer or any instruction execution system. For
the purposes of this description, a computer-usable or
computer-readable medium can be any apparatus that can contain, store,
communicate, propagate, or transport the program for use by or in
connection with the instruction execution system, apparatus, or
device.
[0054] The medium can be an electronic, magnetic, optical,
electromagnetic, infrared, or semiconductor system (or apparatus or
device) or a propagation medium. Examples of a computer-readable
medium include a semiconductor or solid state memory, magnetic
tape, a removable computer diskette, a random access memory (RAM),
a read-only memory (ROM), a rigid magnetic disk and an optical
disk. Current examples of optical disks include compact disk-read
only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD. A
data processing system suitable for storing and/or executing
program code can include at least one processor, logic, or a state
machine coupled directly or indirectly to memory elements through a
system bus. The memory elements can include local memory employed
during actual execution of the program code, bulk storage, and
cache memories which provide temporary storage of at least some
program code in order to reduce the number of times code must be
retrieved from bulk storage during execution.
[0055] Input/output or I/O devices (including but not limited to
keyboards, displays, pointing devices, etc.) can be coupled to the
system either directly or through intervening I/O controllers.
Network adapters may also be coupled to the system to enable the
data processing system to become coupled to other data processing
systems or remote printers or storage devices through intervening
private or public networks. Modems, cable modems, and Ethernet cards
are just a few of the currently available types of network
adapters.
[0056] It will be apparent to those skilled in the art having the
benefit of this document that the present disclosure contemplates
methods, systems, and media for auto-merging processing
components. It is understood that the form of the
invention shown and described in the detailed description and the
drawings is to be taken merely as an example. It is intended that
the following claims be interpreted broadly to embrace all the
variations of the example embodiments disclosed.
* * * * *