U.S. patent application number 11/280332, filed November 17, 2005, was published by the patent office on 2006-04-13 as publication number 20060080469 for a high availability/high density system and method. This patent application is currently assigned to Trillium Digital Systems, Inc. Invention is credited to Robert Penn Cagle and Michael Hamilton Coward.
Publication Number: 20060080469
Application Number: 11/280332
Family ID: 33565349
Publication Date: 2006-04-13
Filed Date: 2005-11-17
United States Patent Application: 20060080469
Kind Code: A1
Coward; Michael Hamilton; et al.
April 13, 2006

High availability/high density system and method
Abstract
A system and method for providing high availability for telecommunications and data communications by implementing a network bus architecture at the card level. The network bus architecture, which may be a combination of hardware, software, and APIs, replaces the conventional midplane/backplane as the system bus for PCI purposes. The architecture provides physical redundancy by connecting the ports of the various system cards through dual/redundant ethernet switches. Further, because the system cards are connected through network connections, the failure of any component is interpreted and addressed as a network failure instead of as a hardware failure.
Inventors: Coward; Michael Hamilton; (La Jolla, CA); Cagle; Robert Penn; (Claremont, CA)

Correspondence Address:
MORGAN LEWIS & BOCKIUS LLP
1111 PENNSYLVANIA AVENUE NW
WASHINGTON, DC 20004
US

Assignee: Trillium Digital Systems, Inc.

Family ID: 33565349

Appl. No.: 11/280332

Filed: November 17, 2005
Related U.S. Patent Documents

Application Number    Filing Date     Patent Number
10916536              Aug 12, 2004
11280332              Nov 17, 2005
09688859              Oct 17, 2000
10916536              Aug 12, 2004
Current U.S. Class: 709/250
Current CPC Class: H04L 49/65 20130101; H04L 49/552 20130101; H04L 69/12 20130101; H04L 29/12235 20130101; H04L 61/2023 20130101; H04L 61/103 20130101; H04L 29/12028 20130101; H04L 69/32 20130101; H04L 49/40 20130101; H04L 49/351 20130101; H04L 67/1002 20130101; H04L 69/40 20130101
Class at Publication: 709/250
International Class: G06F 15/16 20060101 G06F015/16
Claims
1. A high availability telecom/datacom architecture comprising:
dual/redundant ethernet switches; a plurality of I/O cards
connected to the dual/redundant ethernet switches; and at least two
CPUs coupled to each other and to the plurality of I/O cards via
the ethernet switches.
2. The architecture of claim 1 further comprising: a system monitor
and control module coupled to the at least two CPUs and the
plurality of I/O cards via the ethernet switches.
3. A high availability telecom/datacom architecture comprising:
dual/redundant network links; a plurality of I/O cards connected to
the dual/redundant network links; and a system controller coupled
to the plurality of I/O cards via the dual/redundant network
links.
4. The architecture of claim 3 wherein the system controller
comprises multiple CPUs coupled to each other and to the plurality
of I/O cards via the dual/redundant network links.
5. The architecture of claim 3 wherein the dual/redundant network
links are implemented using ethernet switches.
6. The architecture of claim 4 wherein the dual/redundant network
links are implemented using ethernet switches.
7. The architecture of claim 3 further comprising: a system monitor
and control module coupled to the system controller and the
plurality of I/O cards via the network links.
8. The architecture of claim 5 further comprising: a system monitor
and control module coupled to the system controller and the
plurality of I/O cards via the network links.
9. A network bus architecture for providing high availability to
telecom/datacom systems comprising: dual/redundant network links; a
plurality of I/O cards connected to the dual/redundant network
links; and a system controller coupled to the plurality of I/O
cards via the dual/redundant network links.
10. The network bus architecture of claim 9 wherein the network
links are implemented using ethernet switches.
11. The architecture of claim 10 wherein the system controller
comprises multiple CPUs coupled to each other and to the plurality
of I/O cards via the ethernet switches.
12. The architecture of claim 9 wherein the system controller
comprises multiple CPUs coupled to each other and to the plurality
of I/O cards via the network links.
13. The architecture of claim 9 further comprising: a system
monitor and control node coupled to the system controller and the
plurality of I/O cards.
14. The architecture of claim 12 further comprising: a system
monitor and control node coupled to the system controller and the
plurality of I/O cards.
15. The architecture of claim 9 further comprising software and
APIs.
Description
[0001] This application is a continuation application of U.S.
patent application Ser. No. 09/688,859 filed Oct. 17, 2000, which
is hereby incorporated by reference.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention relates to telecom/datacom computer
systems. More particularly, the present invention relates to a
system and method for providing high availability to
telecom/datacom computer systems.
[0004] 2. Description of the Related Art
[0005] Providing fault tolerance to high availability systems
typically involves providing redundant configuration and
application data, redundant communication between control nodes and
processing nodes, redundant control nodes, and a dynamically
assignable pool of processing resources. In traditional high
availability systems, providing fault tolerance is often
accomplished by replicating the system components to remove single
points of failure within the system. FIG. 1 shows a 2N-Redundant
platform composed of a primary node 10 and an identical backup node
20 that will provide service if the primary 10 fails. Such a
solution is fairly easy to implement both in terms of design and
the software that must be used to manage it, but can be
prohibitively expensive as every component must be duplicated.
[0006] Furthermore, such a design carries a more significant
drawback in that when the primary node fails all calls in process
are dropped. This failure occurs even if the backup node comes up
quickly. In some applications, such as voicemail, this is
acceptable, as a user can simply call back to replay the messages.
However, for a user attempting to manage a Wall Street conference
call with 200 analyst participants, such a failure results in
enormous inconvenience and lost revenue.
[0007] Another approach to providing high availability systems has
been the duplication of subsets of hardware instead of the entire
node, as shown in FIG. 2. Most often, this translates to
duplication of the CPU 120. Should the primary CPU 120-1 fail, the
backup 120-2 awakens from standby and begins to provide service.
This design addresses the cost issues of the 2N-Redundant
architecture, since not all of the hardware in a given node need be
replicated. However, while this architecture is
less expensive to implement, the software challenges for such a
less expensive to implement, the software challenges for such a
platform are formidable. Should the primary CPU 120-1 fail and the
backup 120-2 take over its function, the backup 120-2 must
accurately determine the state and take control of all of the I/O
cards--all without disturbing the state of the cards or the calls
in process at the time of failure. Most operating systems cannot
survive a hardware failure, so any fault takes down the whole
system. To get around this, conventional systems must use hardened
operating systems, hardened device drivers, and even hardened
applications to protect against the failure of an I/O processor or
any peripheral. Another consideration is the immaturity of such
systems and the lack of standardization. Furthermore, the
conventional system bus (e.g., PCI or CompactPCI) remains a single
point of failure. A single misbehaving I/O card can take down the
entire system much like a single bulb in a string of Christmas
lights.
[0008] In addition to system failures, inefficiencies within the
system are also of concern. For example, each processor in the
system has an IP address. Manually setting the IP address of each
processor makes the site install process time consuming and error
prone. Existing mechanisms for automatically setting the IP address
do not take into account geographic location or board replacement.
Existing mechanisms for automatically assigning IP addresses, such
as the Dynamic Host Configuration Protocol (DHCP) or Reverse
Address Resolution Protocol (RARP), rely on a unique hardware
address permanently programmed into all computer hardware. This
address moves with the hardware, so maintaining a specific IP
address for a specific computer location is impossible, given that
all hardware will move and/or be replaced at some point in
time.
SUMMARY OF THE INVENTION
[0009] The present invention has been made in view of the above
circumstances and has an object to provide a cost effective system
and method for providing high availability and improved efficiency
to telecom/datacom systems. Further objects of the present
invention are to provide a system and method that can simplify the
programming necessary to implement high availability successfully,
avoid potential problems caused by fluid standards, allay concerns
brought up by single points of failure of busses, and leverage open
standards and the increasingly intelligent I/O cards and peripheral
processors that have become available. Additional objects and
advantages of the invention will be set forth in part in the
description that follows, and in part will be apparent from the
description, or may be learned by practice of the invention. The
objects and advantages of the invention will be realized by means
of the elements and combinations particularly pointed out in the
written description and claims hereof, as well as in the appended
drawings.
[0010] It should be understood that both the foregoing general
description and the following detailed description are exemplary
and explanatory only and are not restrictive of the invention as
claimed. It will be apparent to those skilled in the art that
various modifications and variations can be made without departing
from the scope or spirit of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The accompanying drawings, which are incorporated in and
constitute a part of this specification, illustrate embodiments of
the invention and together with the description, serve to explain
the principles of the invention.
[0012] FIG. 1 provides a block diagram of a prior art system
addressing high availability through full redundancy.
[0013] FIG. 2 provides a block diagram of a prior art system
addressing high availability through duplication of subsets of
hardware instead of the entire node.
[0014] FIG. 3 provides a block diagram of a high availability
telecom/datacom computer system in accordance with one embodiment
of the present invention.
[0015] FIG. 4 provides a block diagram of one embodiment of an
ethernet switch in accordance with the present invention.
[0016] FIG. 5 provides a block diagram of a high availability
telecom/datacom computer system in accordance with another
embodiment of the present invention.
[0017] FIG. 6 provides a block diagram of one embodiment of a
continuous control node in accordance with the present
invention.
[0018] FIG. 7 provides a block diagram showing software components
incorporated in accordance with one embodiment of the present
invention.
[0019] FIG. 8 illustrates the role of NETRACK during typical
component failures occurring in the system for one embodiment of
the present invention.
[0020] FIG. 9 illustrates PSRM starting and monitoring an
application in accordance with one embodiment of the present
invention.
[0021] FIG. 10 provides a block diagram illustrating disk mirroring
in accordance with one embodiment of the present invention.
[0022] FIG. 11 provides a block diagram illustrating IP addressing
in accordance with one embodiment of the present invention.
[0023] FIG. 12 provides a block diagram illustrating IP failover in
accordance with another embodiment of the present invention.
[0024] FIGS. 13-14 provide flowcharts of the enhanced DHCP server
operation in accordance with an embodiment of the present
invention.
[0025] FIG. 15 provides a flowchart of the enhanced RARP server for
an embodiment of the present invention.
[0026] FIG. 16 provides a block diagram of a client-server
embodiment of the present invention.
[0027] FIG. 17 provides a block diagram of an independent server
embodiment of the present invention.
[0028] FIG. 18 provides a block diagram of a diskless system
embodiment of the present invention.
DETAILED DESCRIPTION
[0029] The present invention finds applicability in a variety of
high availability telecom/datacom computer systems that may include
platforms for Media Gateway Controllers, SS7 Gateways, OAM&P,
Billing, and Call Processing. Other applications will be known to
those skilled in the art and are within the scope of the present
invention. FIG. 3, illustrating an embodiment of the present
invention, is a block diagram of a high availability system
implementing a network bus architecture. FIG. 3 has been simplified
for purposes of explanation and is not intended to show every
feature of the system. As used herein, structure will be referred
to collectively or generically using a three-digit number and will
be referred to specifically using an extension separated from the
three-digit number by a hyphen. For example, I/O cards 110-1 to
110-N will be referred to collectively as I/O cards 110 and an
individual I/O card will be referred to generically as I/O card 110
or specifically, for example, as I/O card 110-1.
[0030] As shown in FIG. 3, multiple system controllers, CPUs 120,
are connected through dual/redundant Ethernet switches 130 to each
other and to multiple I/O cards 110. The ports of the
CPUs 120 and I/O cards 110 are connected directly to the
dual/redundant ethernet switches 130 using the ethernet links 100,
leaving the conventional midplane/backplane (not shown) simply to
provide power to the various system components. The system
controllers, CPUs 120, provide a redundant platform for control and
also may provide transparent IP disk mirroring and boot and
configuration services to the I/O processors 110. The
dual/redundant ethernet connections allow the CPUs 120 to maintain
synchronized states and to have simultaneous access to the I/O
cards. These connections, therefore, permit a standby CPU, or an
active CPU also serving in a standby capacity, to take over quickly
for a failed CPU. In one embodiment the CPUs 120 may be realized
using an UltraSPARC.TM./Solaris.TM. implementation. Other
implementations will be known to those skilled in the art and are
within the scope of the present invention. Further, those skilled
in the art will recognize that the system controller functionality
is a role rather than a physical implementation and, as such, can
reside on any of the cards, including the I/O cards.
[0031] The pair of ethernet switches 130 ensures that the network
is not a single point of failure in the system. While Ethernet
is used in illustrating the present invention, the architecture
also can be extended to faster networks, such as Gigabit Ethernet
or Infiniband. In one embodiment, the switches 130 may be realized
using the 24+2 CompactPCI Ethernet switch 300 manufactured by
Continuous Computing Corporation, illustrated in FIG. 4. The 24+2
CompactPCI Ethernet switch 300 is a 26-port, non-blocking, fully
managed Ethernet switch with 24 10/100 Mbps autosensing, Fast
Ethernet ports, and two Gigabit 1000Base-SX ports. The switch 300
provides full wire-speed Layer 2 switching, supporting up to 16 k
MAC addresses, 256 IEEE 802.1Q virtual LANs, IP multicasting, full-
and half-duplex flow control, and IEEE 802.1Q Quality of Service.
The switch 300 enables high-speed communications between elements
without external hubs that often block airflow. Moreover, switch
300 can operate in a conventional slot or in an isolated backplane,
and can support TCP/IP and serial management interfaces. Other
switches will be known to those skilled in the art and are within
the scope of the present invention. In one embodiment of the
present invention, the network uses TCP/IP running on a 100 Mbit
Ethernet. TCP/IP is the industry name for a suite of network
protocols that includes, but is not limited to: IP, TCP, UDP, ICMP,
ARP, DHCP, SMTP, SNMP. Other network protocols and mediums will be
known to those skilled in the art and are within the scope of the
present invention.
[0032] The intelligent I/O processors 110 are application dependent
and may include DSP cards, CPU cards, and voice-processing cards
combining DSP, telephony, and RISC resources. These processors 110
may be configured with 2N redundancy, or N+M redundancy, or no
redundancy at all. However, to take full advantage of the network
bus architecture, the processors 110 should have multiple Ethernet
interfaces and be able to configure multiple IP addresses on an
interface, run a small heartbeat client, and boot from
onboard flash or over the network. Using the network bus
architecture of the present invention, the I/O processors 110 are
not limited to conventional PCI or CompactPCI I/O slot processors.
Because there is no conventional bus, standard system slot
processor boards can be used in the I/O slots of the system 140,
considerably increasing the range of cards available.
[0033] The high availability system also may include dual power
supplies (not shown) with either power supply capable of running
the whole system. In one embodiment the power supplies are realized
using Continuous Computing Telecom Power Supply (CCTPS)
manufactured by Continuous Computing Corporation, which offers 350
watts of hot-swappable, load sharing power to the system with dual
input feeds and 48V DC input. Other power supplies will be known to
those skilled in the art and are within the scope of the present
invention.
[0034] FIG. 5 illustrates another embodiment of the high
availability architecture of the present invention. In this
embodiment, the system includes a Continuous Control Node (CCN) 150
to monitor and control the CPUs 120, I/O cards 110, and the power
supplies. The CCN 150, which may be connected to the other system
components via the ethernet links, provides presence detect, board
health, and reset control for the high availability system. These
functions may be accomplished using a set of logic signals that are
provided between each system board and the CCN 150. The logic
signals may be, for example, the negative-logic BDSEL#, HEALTH#,
and RST# signals described in the Compact PCI Hot Swap
Specification and incorporated herein by reference.
[0035] In one embodiment of the present invention, the CCN 150 can
power up and power down individual slots and provide network access
for any boards that have serial consoles. The power control may be
accomplished using the set of logic signals mentioned above;
however, the only requirement is a single-ended logic signal of
either logic polarity which, when asserted, will override power
control on the I/O board and effect the power down of the board.
The software requirements are, at a minimum, a function executed on
the CCN 150 that causes the hardware to read and set the logic
level of the pertinent signals.
[0036] Serial console access to the I/O boards 110 is provided in
hardware, by cabling or making a direct midplane connection of the
I/O board serial port signals to the CCN serial port signals, or in
software, by configuring one or more of the onboard serial ports
and relaying the serial data stream to the network communications
port designated for this console access.
[0037] The CCN 150 also may function as a single point of contact
for management and provisioning of the entire system. The CCN 150,
which for one embodiment is always powered, may allow a remote
technician to check the system status, power cycle the system or
its subcomponents, and/or access the console. The CCN 150 may be
realized using the Continuous Control Node manufactured by
Continuous Computing Corporation, which is incorporated herein by
reference.
[0038] In a further embodiment of the present invention illustrated
in FIG. 6, all of the CCNs from various high availability systems
communicate over a redundant out-of-band network allowing the
monitoring and control of a large number of systems from a single
network management station. The CCN 150 may offer numerous
interfaces for system management, including a serial command line
interface, a JAVA GUI and a C/C++ API for interfacing user
applications to the control network. The combination of these
features provides a powerful tool for system maintenance.
[0039] As illustrated in FIG. 7, the network bus architecture of a
further embodiment of the present invention also may incorporate a
set of software components and APIs (collectively referred to as
SOFTSET) that detects failures in the system, routes around the
failure, and alerts a system manager that a repair is required. The
set of software components and APIs may be realized, for example,
using upSuite.TM. by Continuous Computing Corporation, which is
incorporated herein by reference.
[0040] The first role of SOFTSET is to guarantee communication
among all the components in the system. To accomplish this, a
heartbeat manager NETRACK 200 of SOFTSET runs on every component in
the system and is designed to keep track of which components and
services are up and which are down. Heartbeats provided by the
individual system components enable NETRACK 200 to detect component
failures and to build a map of working network connections. By
comparing the network map to the system configuration, NETRACK 200
can detect network failures.
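By way of illustration only, the heartbeat bookkeeping described above can be sketched as follows. The component names, the three-second timeout, and the data structures are assumptions made for this sketch; the actual NETRACK protocol is not disclosed at this level of detail:

    import time

    HEARTBEAT_TIMEOUT = 3.0  # seconds; an assumed value, not a disclosed one

    # Configured links: every component should be reachable on both networks.
    CONFIG = {
        ("cpu-a", "net-a"), ("cpu-a", "net-b"),
        ("io-1", "net-a"), ("io-1", "net-b"),
    }

    last_seen = {}  # (component, network) -> time of last heartbeat packet

    def record_heartbeat(component, network):
        # Called whenever a heartbeat packet arrives on an interface.
        last_seen[(component, network)] = time.time()

    def working_links():
        # The map of working network connections, built from heartbeats.
        now = time.time()
        return {link for link, ts in last_seen.items()
                if now - ts < HEARTBEAT_TIMEOUT}

    def detect_failures():
        # Compare the working-link map against the configuration: no live
        # links means the component is down; a partial set means a link
        # (or switch) failure rather than a component failure.
        up = working_links()
        failures = []
        for component in {c for c, _ in CONFIG}:
            expected = {n for c, n in CONFIG if c == component}
            alive = {n for c, n in up if c == component}
            if not alive:
                failures.append((component, "component down"))
            elif alive != expected:
                failures.append((component,
                                 "link down: " + ",".join(expected - alive)))
        return failures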
[0041] Local network applications provide a second source of
information for NETRACK 200. For example, an orderly shutdown
script could inform NETRACK 200 that a system was going down.
NETRACK 200 would then relay that information to the other NETRACKs
200 whose systems might take action based on that information. In
particular, another SOFTSET software component, a process
starter/restarter/monitor (PSRM) 210, is designed to talk to
NETRACK 200.
[0042] Finally, applications can connect to NETRACK 200 to get
status information. For example, a standby media gateway controller
application may ask NETRACK 200 for notification if the other
server, or even just the other application, goes down. Applications
can receive all status information, or they can register to receive
just the state changes in which they are interested.
[0043] FIG. 8 illustrates the role of NETRACK during typical
component failures occurring in the system. Each CPU or I/O
component has two network addresses. For example, the CPU may be
communicating over network A to the first I/O card using the
address of the I/O card on network A, which is 10.1.1.2. NETRACK
constantly sends small packets over both networks to all components
and listens for packets from the other components. In FIG. 8 there
are three points of failure: [0044] 1) Network A fails. In this
case, NETRACK on the CPU detects that the first I/O card is no
longer responding via 10.1.1.2, but is responding via 20.1.1.2.
NETRACK immediately instructs the application to stop using
10.1.1.2 on network A and to start using 20.1.1.2 on network B.
[0045] 2) The first I/O card's link to network A fails. Again,
NETRACK on the CPU detects that 10.1.1.2 has stopped working but
that 20.1.1.2 is working, and instructs the application as in (1).
[0046] 3) The first I/O card fails completely. NETRACK on the CPU
detects that the first I/O card is not responding at all, but that
the second I/O card is responding on both networks, and it
instructs the application to use 10.1.1.3 or 20.1.1.3.
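The address-selection logic implied by these three cases can be sketched as follows, using the addresses from FIG. 8; the function name and the preference for network A are assumptions of the sketch:

    # Addresses from the FIG. 8 discussion: each I/O card has one address
    # per network.  `up_links` is the set of (component, network) pairs
    # the heartbeat tracker currently considers alive.
    IO_CARDS = {
        "io-1": {"net-a": "10.1.1.2", "net-b": "20.1.1.2"},
        "io-2": {"net-a": "10.1.1.3", "net-b": "20.1.1.3"},
    }

    def usable_address(card, up_links):
        # Prefer network A; fall back to network B (cases 1 and 2).  A
        # return of None means the card itself is down and the caller
        # should fail over to a peer card (case 3).
        for network in ("net-a", "net-b"):
            if (card, network) in up_links:
                return IO_CARDS[card][network]
        return None

    # Cases 1/2: network A, or the card's link to it, has failed.
    assert usable_address("io-1", {("io-1", "net-b")}) == "20.1.1.2"
    # Case 3: the card has failed completely; use the second card instead.
    assert usable_address("io-1", set()) is None
    assert usable_address("io-2", {("io-2", "net-a")}) == "10.1.1.3"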
[0047] The second role of SOFTSET is to guarantee that a software
service or application is running and available on the network.
This is accomplished using PSRM 210. While an application can be
coded to connect directly to NETRACK 200, for many applications it
makes sense simply to `wrap` the application in PSRM 210. For "off
the shelf" software, this is the only choice. PSRM 210 is designed
to give developers maximum flexibility in configuring the behavior
of their applications.
[0048] FIG. 9 illustrates PSRM starting and monitoring an
application. PSRM starts the application and monitors its health.
PSRM reports the status of the application to NETRACK, which in
turn reports to all the other NETRACKs on the other components of
the system. There are several possible failures:
[0049] 1) The application dies. For example, a Unix or Linux
operating system runs applications as independent processes
that may `die`. Other operating systems will be known to those of
ordinary skill in the art and are within the scope of the present
invention.
[0050] 2) The application stops responding.
[0051] 3) The application is having trouble: it can still
communicate with PSRM, but it may be running slowly, or be unable
to allocate resources, etc.
[0052] In any of these failures, PSRM has the choice of:
[0053] 1) restarting the application
[0054] 2) stopping the application and asking another component
(CPU B in FIG. 9) to start an equivalent application
[0055] 3) doing nothing (e.g. if the application is merely running
slowly)
[0056] In all cases, PSRM and NETRACK make sure that all system
components are able to reach the application over at least one
network interface. PSRM 210 keeps NETRACK 200 informed of the state
of the process at all times. In a more complicated case, PSRM 210
might only restart a process that dies with certain exit codes, or
PSRM 210 might only try a certain number of restarts within a
certain period. This functionality and flexibility is available
through simple configuration--it does not require programming or
any modifications to an application. Finally, applications that
require the highest level of integration may communicate directly
with PSRM 210.
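A minimal sketch of the wrap-and-restart behavior described above; the exit codes, restart limit, and time window are illustrative assumptions, not documented defaults:

    import subprocess
    import time

    def supervise(argv, restart_codes=frozenset({1}), max_restarts=3,
                  window=60.0):
        # Start the application and restart it according to a simple
        # policy: restart only on the configured exit codes, and give up
        # (so another component can start an equivalent application)
        # after too many restarts within the time window.
        restarts = []
        while True:
            proc = subprocess.Popen(argv)
            code = proc.wait()            # block until the process exits
            if code == 0:
                return "stopped"          # orderly exit: nothing to do
            if code not in restart_codes:
                return "escalate"         # ask a peer component to take over
            now = time.time()
            restarts = [t for t in restarts if now - t < window] + [now]
            if len(restarts) > max_restarts:
                return "escalate"         # restart budget exhausted
            # otherwise loop and restart the application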
[0057] A third software component of SOFTSET, IPMIR 220, ensures
data reliability by providing application-transparent disk
mirroring over IP. Data written to disk are transparently
`mirrored` over the network via IPMIR 220; "transparently" meaning
the applications are not necessarily affected by the mirroring. As
illustrated in FIG. 10, the two system controllers 120 run IPMIR
220 between each other to mirror disk activity. I/O processors use
NFS to mount the disks from the active server. In the case of a
failure (CPU, disk, network, etc.), the standby server assumes the
IP address of the failed Network File System (NFS) server and
continues serving files from its mirror of the data. Although this
IP failover technique will not work for other IP disk mirroring
packages (Unix administrators may be familiar with the `Stale NFS
Handle` problem), IPMIR 220 is specifically designed to handle this
situation. IPMIR 220 maintains a lookup table for NFS file handles
that avoids this problem.
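A minimal sketch of such a handle table, with all names and types assumed for illustration: the table is populated as data are mirrored, so a handle issued by the failed server can still be resolved by its mirror after failover:

    # Handle table populated during mirroring: both servers learn the
    # opaque handle the active server issued for each file, so the mirror
    # can answer pre-failover handles instead of returning a stale-handle
    # error.
    handle_table = {}  # primary's NFS file handle -> path on the mirror

    def record_mirrored_handle(primary_handle, local_path):
        # Called as writes are mirrored over IP.
        handle_table[primary_handle] = local_path

    def resolve_handle(client_handle):
        # Serve a request that arrived with a handle issued before failover.
        try:
            return handle_table[client_handle]
        except KeyError:
            raise LookupError("stale NFS file handle")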
[0058] A fourth software component SNMP/ag is an SNMP aggregator
that allows an entire system to be managed as a single entity.
Typically, while system components are capable of providing system
status and control via SNMP, there is still the need to map out the
whole system, collect all the MIBs (management information bases),
and figure out how to manage all the components. Because SOFTSET
knows the configuration of the system and the state of the system,
SNMP/ag can provide a simple, comprehensive view to the system
manager. The system manager's SNMP browser talks to SNMP/ag, which
provides a single MIB for the entire system. SNMP/ag, in turn,
talks to all the SNMP agents within the system, and relays requests
and instructions.
[0059] A fifth software component REDIRECT is a rules-based engine
that manages the system based on the input from NETRACK 200 and
SNMP/ag. These rules dictate the actions that should be taken if
any component fails, and provide methods to migrate services, IP
addresses, and other network resources. REDIRECT provides the
high-level mechanisms to support N+1 and N+M redundancy without
complicated application coding.
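The rules-based behavior might be sketched as follows; the rule predicates, action names, and event fields are invented for illustration, since the REDIRECT rule syntax is not disclosed:

    # A rules table mapping failure events to migration actions; the first
    # matching rule wins.
    RULES = [
        (lambda e: e["kind"] == "link-down",
         lambda e: [("use-alternate-network", e["component"])]),
        (lambda e: e["kind"] == "component-down"
                   and e.get("role") == "controller",
         lambda e: [("move-ip", e["public_ip"], e["standby"]),
                    ("start-service", e["service"], e["standby"])]),
    ]

    def react(event):
        for matches, actions in RULES:
            if matches(event):
                return actions(event)
        return [("alert-operator", event)]  # no rule: a repair is required

    print(react({"kind": "component-down", "role": "controller",
                 "public_ip": "192.168.1.1", "standby": "cpu-b",
                 "service": "media-gateway"}))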
[0060] An integral part of SOFTSET is the ability to remotely
manage all aspects of the system, including both hardware and
software status and control. As part of this remote management
capability, SOFTSET supports a Java-based GUI that runs on any host
and uses a network connection to talk to the system. In one
embodiment, this management capability is provided by an interface
with the CCPUnet.TM. hardware monitoring platform manufactured by
Continuous Computing Corporation, internal SNMP agents, and other
SOFTSET components to provide a complete "one system" view.
CCPUnet.TM., incorporated herein by reference, provides a
distributed network approach for managing clusters of compute or
I/O nodes. By using CCPUnet.TM. and its associated software and
hardware interfaces, applications can control node power, monitor
critical voltages and temperatures, monitor hardware and software
alarms, access CPU or I/O controller consoles and have remote CPU
reboot and shutdown capabilities.
[0061] A further role of SOFTSET is to guarantee that network
services (applications) are available to external network clients
and to network client software that is inside the system, but which
does not talk directly to NETRACK. SOFTSET does this by using
`public` network addresses and by moving those public addresses
from component to component as appropriate. As illustrated in FIG.
11, every component in the system has two independent paths to
every other component in the system with each interface having a
unique private IP address (10.1.x.y), and each active service
having a unique public IP address (192.168.1.x). In the context of
this discussion, `private IP` refers to an address known only
within the configuration, `public IP` refers to an address that
might be accessed from outside the system, for example, through a
router. This designation is different from the common use of
`public IP`, which designates an address generally available from
the Internet at large. In FIG. 11, an application is providing a
network service at 192.168.1.1. Internal to this system, NETRACK
tells all the components that the application running at public IP
address 192.168.1.1 can be reached via private IP address 10.1.1.2.
This allows network clients running on those components to reach
the application even if those clients do not talk directly to
NETRACK. External to this system, an external router (or routers)
relays packets from external clients to 192.168.1.1 via
10.1.1.2.
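In sketch form, the advertisement amounts to a table mapping each public service address to the private address currently hosting it; the initial mapping follows FIG. 11, and the post-failover network-B address is assumed:

    # NETRACK's advertisement in sketch form: a public service address is
    # reachable through whichever private address currently hosts the
    # service.
    service_location = {"192.168.1.1": "10.1.1.2"}  # public IP -> private IP

    def route_to_service(public_ip):
        # How an internal client (which does not talk to NETRACK directly)
        # reaches a public service: NETRACK pushes this table to every
        # component, and external routers receive the same next hop.
        return service_location[public_ip]

    assert route_to_service("192.168.1.1") == "10.1.1.2"
    # After a failure on network A, the service is re-announced on network B.
    service_location["192.168.1.1"] = "20.1.1.2"
    assert route_to_service("192.168.1.1") == "20.1.1.2"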
[0062] Failover techniques involving the network bus architecture
of the present invention depend on the location of the failed
component. Within the system, network failures are handled via the
redundant links. For example, NETRACK 200 and PSRM 210 on one
server can continue communicating with NETRACK 200 and PSRM 210 on
another server if an Ethernet port, cable, or even an entire switch
fails. For public services (IPMIR 220 on the system controllers,
DSP applications on I/O processors, etc.), failover to a second
Ethernet port and failover to a standby server are both handled by
moving the IP address for that service. The failover actions are
dictated by the system configuration and are coordinated by NETRACK
200 and PSRM 210. Further, in some applications, it may be
desirable to move the MAC address (physical Ethernet address) to a
new port. This requires some care to preserve the private IP
addresses in a working state, but is also supported by SOFTSET.
Note that when the MAC address is moved, packets may be sent through
the switch-to-switch link, depending on the capabilities of the
router. Four points of failure within one embodiment of the present
invention are illustrated in FIG. 12: network A fails (1), CPU A's
link to network A fails (2), CPU A fails (3), and the application
on CPU A fails (4).
[0063] If network A fails (1) or CPU A's link to network A fails
(2), NETRACK tells the other components to reach the application
running at 192.168.1.1 through 20.1.1.1. PSRM (or a special process
started by PSRM) tells the external router to reach 192.168.1.1
through 20.1.1.1.
[0064] If CPU A fails (3), NETRACK on CPU B notices the failure and
notifies PSRM. Upon notification, PSRM (or a special process
started by PSRM) reconfigures CPU B to use the public address
192.168.1.1 and starts the application on CPU B. NETRACK and PSRM
tell the other components and any external routers that 192.168.1.1
is now reachable through 10.1.1.2.
[0065] Finally, if the application on CPU A fails (4), the two
NETRACKs and PSRMs notice the failure. PSRM on CPU A (or a special
process started by PSRM) reconfigures CPU A to stop using the
public address 192.168.1.1, reconfigures CPU B to use the public
address 192.168.1.1, and then starts the application on CPU B.
NETRACK and PSRM tell the other components and any external routers
that 192.168.1.1 is now reachable through 10.1.1.2.
[0066] In a further embodiment of the present invention, a unique
IP addressing scheme is implemented using the physical location of
a board instead of its Ethernet address to determine the IP address
of the board at boot time. In this scheme, the IP address remains
constant even if the board is replaced. The physical location is
determined by mapping the board's current Ethernet address to a
physical port on the switch. This physical port number, acting as a
geographic ID, is then mapped to the desired IP address by modified
DHCP/RARP servers when DHCP/RARP requests are made. To support the
IP addressing scheme, the modified DHCP/RARP servers perform a set
of SNMP queries to the switch to determine the port number of the
requesting client. The Ethernet switch 130 is responsible for
automatically learning the hardware ethernet addresses of any
computer connected to it, and for maintaining this information in a
form that can be queried by outside entities using the SNMP
protocol. Once the switch has been queried and the DHCP and RARP
servers have determined on which port of the Ethernet switch 130 a
given request was received, this information can be used to assign
an IP address based on the physical port number of the switch to
which the requester is connected. The information can be further
mapped to a real geographic location as long as the network wiring
topology is known. The modified servers run on the CCN 150, which
is the only element in the system with a fixed IP address. Using
DHCP/RARP allows the update and maintenance of the
IP-to-physical-port table in one place, the CCN.
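A sketch of the geographic mapping, with the slot plan and MAC-to-port table assumed for illustration; in practice the MAC-to-port table is read from the switch via the SNMP queries described above:

    # Geographic addressing sketch.  PORT_TO_IP is an assumed slot plan;
    # the fdb argument stands in for the switch's forwarding database.
    PORT_TO_IP = {1: "10.1.1.1", 2: "10.1.1.2", 3: "10.1.1.3"}

    def ip_for_board(mac, fdb):
        # The IP address follows the slot, not the board: a replacement
        # board in the same slot receives the same address, while the same
        # board moved to another slot receives a new one.
        port = fdb.get(mac)            # MAC -> physical switch port
        return PORT_TO_IP.get(port) if port is not None else None

    fdb = {"00:03:cc:01:02:03": 2}     # switch learned this MAC on port 2
    assert ip_for_board("00:03:cc:01:02:03", fdb) == "10.1.1.2"
    fdb["00:03:cc:01:02:03"] = 3       # same board re-cabled to port 3
    assert ip_for_board("00:03:cc:01:02:03", fdb) == "10.1.1.3"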
[0067] DHCP (Dynamic Host Configuration Protocol) is a protocol
that allows a network administrator to centrally manage and
automate the assignment of IP addresses to the computers in a
network. DHCP dynamically assigns an IP address to a computer when
a connection to the Internet is needed. With the enhanced DHCP
servers of the present invention, a set of SNMP queries are made to
the switch 130 to determine the port number of the requesting
client. Flowcharts of the enhanced DHCP server operation are shown
in FIGS. 13-14.
[0068] Referring to FIG. 13, the DHCP server receives a request for
internet connection from a device with a MAC address (Media Access
Control, i.e., hardware address) known by the DHCP server (step
400). At step 410, the DHCP server queries the Ethernet switch 130 to
determine the physical port to which the MAC address is connected.
If the port number is provided, the IP address is determined from
the port information (step 440); otherwise, the request for
connection is ignored (step 430). Once the IP address has been
determined, the DHCP server checks to ensure that the IP address is
not currently in use on the network (step 450). If the IP address
is already in use, the DHCP server marks the IP address as unusable
and ignores the request for connection (step 460). If the IP
address is not in use, the DHCP server assigns the IP address to
the requesting device (step 470).
[0069] In FIG. 14, the request for connection from the device
includes a request for a desired IP address (step 500). At step
510, the DHCP server queries the Ethernet switch 130 to determine the
physical port to which the MAC address is connected. If the port
number is provided, the IP address is determined from the port
information (step 540); otherwise, an error message is generated
(step 530). Once the IP address has been determined, the DHCP
server compares the provided address to the requested address (step
550). If the two addresses do not match, an error message is
generated (step 560). If the addresses are the same, a check is
made by the DHCP server to ensure that the IP address is not
currently in use on the network (step 570). If the IP address is
already in use, the DHCP server marks the IP address as unusable
and generates an error message (step 580). If the IP address is not
in use, the DHCP server assigns the IP address to the requesting
device (step 590).
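The two flowcharts can be condensed into a single decision function, sketched below; the tables, return conventions, and example addresses are assumptions of the sketch:

    # FIGS. 13-14 condensed into one decision function.  `fdb` stands in
    # for the SNMP query to the switch and `in_use` for addresses already
    # active on the network.
    PORT_TO_IP = {1: "10.1.1.1", 2: "10.1.1.2"}
    unusable = set()

    def handle_dhcp_request(mac, fdb, in_use, requested_ip=None):
        port = fdb.get(mac)                   # steps 410/510: query switch
        if port is None:                      # port unknown
            return ("ignore" if requested_ip is None else "error", None)
        ip = PORT_TO_IP.get(port)             # steps 440/540: IP from port
        if requested_ip is not None and ip != requested_ip:
            return ("error", None)            # step 560: addresses differ
        if ip in in_use or ip in unusable:
            unusable.add(ip)                  # steps 460/580: mark unusable
            return ("ignore" if requested_ip is None else "error", None)
        return ("assign", ip)                 # steps 470/590

    assert handle_dhcp_request("aa:bb", {"aa:bb": 2},
                               set()) == ("assign", "10.1.1.2")
    assert handle_dhcp_request("aa:bb", {"aa:bb": 2},
                               {"10.1.1.2"}) == ("ignore", None)
    assert handle_dhcp_request("aa:bb", {"aa:bb": 2},
                               set(), "10.1.1.9") == ("error", None)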
[0070] RARP (Reverse Address Resolution Protocol) is a protocol by
which a device can request its IP address maintained in an Address
Resolution Protocol (ARP) table typically located in a network
router; the table maps the MAC address of the device to a
corresponding IP address. When a device is connected to the
network, a client program running on the device requests its IP
address from the RARP server on the router. As with the enhanced
DHCP server, the enhanced RARP server of the present invention
queries the switch 130 to determine the port number of the
requesting client. FIG. 15 provides a flowchart of the enhanced
RARP server.
[0071] As illustrated in FIG. 15, a request is made from a device
with a MAC address known to the RARP server (step 600). The RARP
server queries the switch 130 to determine the physical port to
which the device with the associated MAC address is connected (step
610). If the port is found, the RARP server provides the IP address
associated with the port to the requesting device (step 640). If
the port information is not found, the RARP server ignores the
request (step 630).
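A corresponding sketch of the FIG. 15 flow, with the same assumed tables:

    # FIG. 15 flow: answer with the slot's IP address, or stay silent.
    def handle_rarp_request(mac, fdb, port_to_ip):
        port = fdb.get(mac)                 # step 610: query the switch
        if port is None:
            return None                     # step 630: ignore the request
        return port_to_ip.get(port)         # step 640: IP follows the port

    assert handle_rarp_request("00:03:cc:0a:0b:0c",
                               {"00:03:cc:0a:0b:0c": 4},
                               {4: "10.1.1.4"}) == "10.1.1.4"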
[0072] FIGS. 16-18 respectively illustrate client-server,
independent server, and diskless system embodiments of the present
invention.
[0073] Advantages associated with the network bus architecture of
the present invention are numerous. For example, redundant
communication is provided by the network bus architecture since
each component in the system has two separate paths to every other
component. Any single component--even an entire switch--can fail,
and the system continues to operate. Moreover, the boards in the
system are coupled only by network connections, and therefore
should a component fail, the failure is interpreted by the other
components as a network failure instead of a hardware failure. This
interpretation simplifies the programming task associated with
failover with more reliable results than in conventional systems.
The failure of a network link merely means failover to a second
link while the failure of a single component merely means dropped
network connections to other components. The system does not
experience a hardware failure in the conventional sense. The CPU
control nodes, in turn, do not have to run hardened operating
systems, but can run rich, popular systems such as Solaris or
Linux.
[0074] The present invention also leverages the increasing
intelligence of available I/O cards. The intelligent I/O processors
work largely independently of the system controller, relieving the
CPU of having to carry out functions that can be assumed by more
peripheral parts of the architecture. While the I/O processors
receive services from the controllers (boot image, network disk,
configuration management, etc.), I/O is directly to and from the
network, instead of over the midplane/backplane from the
controller. Much like an object-oriented programming model, this
allows I/O processors to present themselves as an encapsulated set
of resources available over TCP/IP rather than as a set of device
registers to be managed by the CPU.
[0075] Further, the actual failover process is simplified. The
intelligent I/O processors can continue handling calls even while
system controllers are failing over. Both the active and standby
system controllers know the system state while intelligent I/O
processors tend to know about their own state and can communicate
it upon request. Failover to the standby component can be handled
by the standby component assuming the network address of the
failed component (IP or MAC failover), by coordination among the
working components, or by a combination of both. The failure of
network components is handled via the redundant ethernet links 100
while communication to equipment outside the system can be handled
transparently via IP failover, or by the standby component
re-registering with a new IP address (e.g. re-registering with a
softswitch). The density of the system is also improved by the
inventive network bus architecture since the combination of system
controllers, I/O processors, or CPUs is not limited by the chassis.
Further, there are no geographical constraints associated with the
network bus architecture; embodiments of the present invention can
be composed of nodes located on opposite sides of a Central Office
space, a building, or even a city to increase redundancy and
availability in case of a disaster. The flexibility of a
fully-distributed architecture via a network dramatically increases
the possibilities for design.
[0076] Hot-swap advantages also are provided by a network bus
architecture. Replacing failed boards or upgrading boards is simply
a matter of swapping the old board out and the new board in. There
are no device drivers to shut down or bring back up. Depending on
the application, however, it may well be wise to provide support
for taking a board out of service and managing power smoothly.
[0077] Bandwidth benefits are also available due to the network bus
architecture of the present invention. While nominal
midplane/backplane bus speeds tend to be higher than nominal
network speeds, a fully non-blocking, full-duplex switch connected
in accordance with the present invention can provide higher
aggregate throughput. For example, a cPCI bus provides 1 Gbit/sec
to a maximum of 8 slots. A full-duplex 100 Mbit ethernet switch
provides 1.6 Gbit/sec to an 8 slot system and 4.2 Gbit/sec to a 21
slot system. In a redundant configuration, those numbers could be
as high as 3.2 and 8.4 Gbit/sec, respectively, although an application would
have to allow for reduced throughput in the case of a network
component failure.
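As a quick check of those figures, the arithmetic assumes each full-duplex 100 Mbit link is counted at twice the line rate and that the switch is fully non-blocking:

    # Reproducing the quoted aggregate-throughput figures: full duplex
    # counts each 100 Mbit link twice, a non-blocking switch sums over
    # slots, and dual/redundant switches double the total.
    def aggregate_gbps(slots, link_mbps=100, duplex=2, switches=1):
        return slots * link_mbps * duplex * switches / 1000.0

    assert aggregate_gbps(8) == 1.6                # 8-slot system
    assert aggregate_gbps(21) == 4.2               # 21-slot system
    assert aggregate_gbps(8, switches=2) == 3.2    # redundant configuration
    assert aggregate_gbps(21, switches=2) == 8.4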
[0078] Further, efficiency within the system is improved by using
the inventive IP addressing scheme which allows a specific IP
address to be assigned to a specific computer location. This scheme
is accomplished using enhanced DHCP and RARP protocol servers to
determine the geographic location of any computer requesting an IP
address. Once determined, the geographic location is then used to
assign a specific IP address to the requestor; the IP address is
fixed to that particular location, and independent of all
hardware-specific ethernet addresses. If computer hardware at one location
is replaced with different hardware, or the computer at that
location is rebooted, the IP address for that computer remains the
same. If computer hardware at one location is moved to another
location, a new IP address will be assigned to the hardware based
on its new location.
[0079] Other embodiments of the invention will be apparent to those
skilled in the art from consideration of the specification and
practice of the invention disclosed herein. It is intended that the
specification and examples herein be considered as exemplary only,
with a true scope and spirit of the invention being indicated by
the following claims.
* * * * *