U.S. patent application number 12/512148 was filed with the patent office on 2009-07-30 and published on 2010-02-04 for publish and subscribe method for real-time event monitoring in a system for managing a plurality of disparate networks.
Invention is credited to Edmond Baydian, Joseph Fuccillo, Matthew KIEFER.
Application Number | 12/512148 |
Publication Number | 20100030884 |
Family ID | 41609449 |
Filed Date | 2009-07-30 |
United States Patent Application | 20100030884 |
Kind Code | A1 |
KIEFER; Matthew; et al. | February 4, 2010 |
PUBLISH AND SUBSCRIBE METHOD FOR REAL-TIME EVENT MONITORING IN A
SYSTEM FOR MANAGING A PLURALITY OF DISPARATE NETWORKS
Abstract
A converged network management application and system is provided
that delivers a management platform as a service that can view
and/or manage all managed networks in the aggregate, or any one of
them individually (including individual devices within the managed
networks), in a secure and efficient manner, providing continuously
available intelligence in real time on the managed networks and
systems, and overcoming integration issues including conflicting
address schemas, the need to avoid unnecessary infrastructure, and
the need to acquire all necessary information in real time within
applicable memory and bandwidth constraints.
Inventors: | KIEFER; Matthew; (Staten Island, NY); Baydian; Edmond; (Massapequa, NY); Fuccillo; Joseph; (Pleasantville, NY) |
Correspondence Address: | HUGHES HUBBARD & REED LLP, ONE BATTERY PARK PLAZA, NEW YORK, NY 10004, US |
Family ID: | 41609449 |
Appl. No.: | 12/512148 |
Filed: | July 30, 2009 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
61085407 | Jul 31, 2008 | |
Current U.S. Class: | 709/223 |
Current CPC Class: | H04L 29/12301 20130101; H04L 43/10 20130101; H04L 41/22 20130101; H04L 63/02 20130101; H04L 47/785 20130101; H04L 61/302 20130101; H04L 41/0233 20130101; H04L 63/20 20130101; H04L 41/00 20130101; H04L 41/0213 20130101; H04L 41/5067 20130101; H04L 45/34 20130101; H04L 61/1511 20130101; H04L 29/06312 20130101; H04L 41/0604 20130101; H04L 63/08 20130101 |
Class at Publication: | 709/223 |
International Class: | G06F 15/173 20060101 G06F015/173 |
Claims
1. A method for a management system to access a plurality of
network management processes, for a plurality of network
facilities, comprising (a) making a request to a selected one of
said network facilities to subscribe to a network management
process on said facility; and (b) relaying to said management
system changed information concerning the network management
process subscribed to, said relaying being effectuated by said facility
at about the same time that it updates its own internal
representation of said information.
2. A method as described in claim 1, wherein said relaying is done
by the same software routine that updates the internal
representation of the updated information.
3. A method as described in claim 1, wherein, when information is
subscribed to, the middleware process that updates the local table
also copies the new data to a socket.
4. A method as described in claim 1, wherein, if the
network management processes being accessed reside on different
networks, proxying the query and response by steps comprising: (a)
combining a first unique identifier with the address of each
queried element, to form a combined unique identifier, this step
being done within said element's local domain; and (b) making said
combined unique identifier available to the querying system.
5. A method as described in claim 1, further comprising setting a
filter by time, to get historic and current metric information.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of the filing date of
U.S. Provisional Patent Application No. 61/085,407, filed Jul. 31,
2008, the entire disclosure of which is hereby incorporated by
reference. This application also incorporates by reference the
entire disclosures of the following copending U.S. patent
applications filed on the same date as this application: U.S.
patent application Ser. Nos. 12/______, 12/______, and
12/______.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention is in the field of network management
and support. More specifically, the invention provides a system for
remotely and securely monitoring and managing a plurality of
disparate networks and systems, which, among other capabilities,
can monitor events in real time, selectively or globally,
throughout all managed networks, and access and manage individual
network elements to any internal depth within each managed network,
without requiring special access to the network, and without regard
to the architectures, business purposes or addressing schemas of or
within the managed networks.
[0004] 2. Description of the Related Art
[0005] Modern data and communications networks are highly complex
and require substantial management in order to keep those networks
and the services they provide up and running smoothly. Among the
activities within the scope of "network management" is monitoring a
network and the health of its systems and components in order to
spot problems as soon as possible, preferably before users or
business processes are affected. Other activities within the scope
of such management include operation, administration, maintenance,
and provisioning.
[0006] Numerous systems exist for providing the types of management
and support referenced above, on a network-by-network basis.
[0007] Many organizations require complex networks, but lack the
resources to manage them, lack the budget to acquire a
fully-outfitted management system for their individual network, or
believe that they could better economize if it were possible to
outsource this activity. An organization tasked with managing
networks for a plurality of disparate customers will face
multiplied expenses, however, if it must provide a separate
management infrastructure for each customer. A need therefore
exists for systems capable of remotely but centrally and securely
managing a plurality of disparate networks, meaning networks under
different ownership or management, or otherwise characterized by
having different architectures, different management policies,
different business purposes, and/or different overall design.
[0008] A large number of access methods exist to support network
and network device management within, or directed to, any given
network. Access methods include Simple Network Management Protocol
(SNMP), Command Line Interfaces (CLIs), custom XML, CMIP, Windows
Management Instrumentation (WMI), Transaction Language 1, CORBA,
NETCONF, the Java Management Extensions (JMX), the Java Message
Service (JMS), SOAP, and XML-RPC. These are primarily low-level
protocols that help get management jobs done, but do not address
the issues involved in managing a plurality of disparate
networks.
[0009] As mentioned above, systems currently exist for managing
entire enterprise-level networks. Popular systems include
OpenView® from Hewlett-Packard Corporation, Unicenter® from
Computer Associates, and IBM Tivoli® Framework. However, these
systems were developed primarily to manage individual
enterprise-level networks. They have only limited capabilities for
managing completely disparate networks. Another example of such a
system is the Solarwinds® Orion® Network Performance
Monitor. However, the Solarwinds system uses stateless
communications methods and is directed to monitoring rather than
remote management of individual devices within the monitored
networks. A somewhat different approach is that of Jumpnode Systems
LLC, as reflected by U.S. Pat. Pub. No. 2006/0218267 A1, which
provides a hardware appliance that can be installed in a local
network to monitor local network events and communicates the
collected information to a remote management center. However, the
Jumpnode® appliances track network events locally and are
therefore vulnerable to loss of connectivity and consequent data
loss and security risks. Moreover, each of the hardware appliances
must have its own "Internet drop" (or other access point (such as a
modem port) directly accessible from outside the local network) to
make the requisite connections to the remote management facility,
and the appliances rely on stateless communications and polling,
which does not provide for real-time data acquisition.
[0010] Tools also exist for internetwork communications, such as
proxy servers, remote control software systems such as
GoToMyPC® (now owned by Citrix Systems), and Alarmnet™ (by
Honeywell Security Systems). However, these tools do not provide a
way to communicate beyond the first level of a managed network
without special arrangements, such as special credentials, VPN
access, a special opening in a firewall, etc., or manual
construction of sockets and tunnels, allowing deeper access. They
also do not provide a mechanism for reducing the enormous volume of
data that might result from indiscriminately monitoring all events
across a plurality of managed networks and systems, other than
opting to view only one data source at a time. In addition,
centralized polling is often performed from a management network
separate from end-user community networks, resulting in a lack of
fidelity of that end-user's local perspective of the availability
of polled resources. Furthermore, measuring from a distance can
introduce artificial statistics in the actual measurements taken,
such as latency.
[0011] Similarly, tools such as Network Address Translation (NAT)
exist to isolate the inner workings and resources of networks from
outside view and access, and NAT systems can be configured to
forward messages to specified internal network destinations and
resources. Examples of this approach are reflected in U.S. Pat. No.
6,581,108 (assigned to Lucent Technologies, Inc.) and U.S. Pat.
Pub. Nos. 2005/0271047 A1 and 2006/0029083 A1. However, such
facilities are of limited utility for remote management. NAT
connections initiated from inside the NAT domain are session based.
Special provision can be made to forward connections initiated from
the outside. However, externally managing networks through a NAT
firewall is impractical, because one would have to configure the
NAT to make each network element within the NAT accessible from
outside.
[0012] Systems that have attempted to manage multiple networks have
not satisfactorily dealt with a number of issues, including:
[0013] Overlapping private address spaces among the managed networks.
Disparate networks may well utilize the same private address
allocation, resulting in conflicts. Existing workarounds have
involved assigning different network schemas, which can be
prohibitively inconvenient and expensive, particularly in light of
the need to change the entire schema at once; attaching to one
network at a time, through VPN or static routing, thus creating
time gaps in monitoring or providing multiple management
infrastructures at great duplication and expense. Another approach,
as reflected in U.S. Pat. No. 7,302,469, assigned to Ricoh Company,
Ltd., is to use instead a schema presumed to be globally unique,
such as one based on MAC addresses. However, such a system, while
providing a monitoring capability, does not provide any means for a
remote facility, external to the devices' local network, to address
the devices individually in order to manage them.
[0014] Need for special arrangements to access and manage processes
and resources within each network. No general method has existed for
remotely managing network processes and resources without providing
some "special" means of access, such as a VPN, a hole in a firewall or
the like. All of the prior approaches involve expense,
inconvenience or security compromises that are unacceptable to many
potential customers for network management services.
[0015] Overwhelming amount of network event information. Each network
is capable of generating a very high volume of event information for
purposes of monitoring. The volume of this information multiplies
when a plurality of networks are aggregated for management.
Existing systems have not adequately dealt with the issue of how to
limit the event information to what is relevant, without
compromising the continuous ability to monitor relevant
information.
[0016] Accordingly, there is a need for a practical and effective
methodology for managing and servicing a plurality of disparate
networks from a single, common infrastructure, in a manner
supported by prevailing customer firewall and security practices
without extensive or inconsistent provisions for special access,
and for a converged network management application that takes
advantage of those techniques and delivers a management platform as
a service that can view and/or manage all managed networks in the
aggregate, or any one of them individually.
SUMMARY OF THE INVENTION
[0017] It is an object of the invention to provide methods for
managing and servicing a plurality of disparate networks from a
single, common infrastructure, without requiring any owner of the
networks or systems being managed to change any topological
features or elements.
[0018] It is a further object of the invention to facilitate
management and servicing methods for a plurality of disparate
networks by providing a method for overcoming address space
collisions that might exist between managed networks and
systems.
[0019] It is another object of the invention to provide a uniform
and comprehensive method and protocol for routing communications
among management elements, so as to be able to build management
infrastructures extensibly based on a manageable selection of
fundamental management elements.
[0020] It is also an object of the invention to provide a method,
in a system for managing and supporting disparate networks and
systems, for remotely viewing real-time information about multiple
network management processes, without either accepting an
overwhelming amount of irrelevant data or restricting the data view
so as to exclude relevant data.
[0021] An additional object of the invention is to take advantage
of the technologies to meet the individual objectives above in
order to provide a converged network management application that
delivers a management platform as a service that can view and/or
manage all managed networks in the aggregate, or any one of them
individually.
[0022] To achieve these objectives, the present invention, in one
embodiment, provides a system for monitoring and managing a
plurality of disparate networks and systems from a centralized
physical location that is separate from the location of any of the
managed networks or systems, in which the operations are effected
without requiring the owner of any network or systems being managed
to change any topological features or elements and without
requiring a dedicated connection to any of the managed networks.
This system can be provided as a service, by which a user can view
and/or manage all managed networks in the aggregate, or any one of
them individually.
[0023] To facilitate the ability to manage a plurality of disparate
networks and systems, the invention further provides, in said
embodiment, the ability to manage network topologies with
overlapping IP address schemas for their respective elements, by
combining, within each element's local domain, a unique identifier
with the address of the element, and making the combined unique
identifier available to the other elements in the management
system.
[0024] In order to facilitate said capability, the invention
further provides, in an embodiment in which such capabilities are
provided through modular software components, a method for routing
commands among such components by expressly or implicitly
specifying a route; specifying a command; invoking a socket with
said route and command as parameters; routing the command and
parameters in accordance with said route; executing the command
with its parameters at the route target; returning any results of
said execution back through said route; and closing down said route
when said execution is completed.
[0025] In said embodiment, the invention provides a method for the
management system to access a plurality of network management
processes, for a plurality of network facilities, by making a
request to a selected one of said network facilities to subscribe
to a network management process on said facility; and, at about the
same time that said facility updates its own internal
representation of said information, relaying to said management
system changed information concerning the network management
process subscribed to. This mechanism, referred to herein as
"publish and subscribe", is used to support a rich variety of
information outputs and displays, for both the aggregate and the
individual managed networks, for purposes of management.
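The "publish and subscribe" mechanism summarized above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory table and callback interface (the names ManagedProcessTable, subscribe and update are illustrative and do not come from the application); the point shown is only that the routine that updates the facility's internal representation is the same routine that relays the change to subscribers.

```python
from typing import Callable, Dict, List

# Hypothetical callback type: receives the process name and its new value.
Subscriber = Callable[[str, object], None]


class ManagedProcessTable:
    """Illustrative local table of management-process values on one facility."""

    def __init__(self) -> None:
        self._values: Dict[str, object] = {}
        self._subscribers: Dict[str, List[Subscriber]] = {}

    def subscribe(self, process: str, callback: Subscriber) -> None:
        # The management system asks to subscribe to one process.
        self._subscribers.setdefault(process, []).append(callback)

    def update(self, process: str, value: object) -> None:
        # One routine both updates the internal representation and relays
        # the changed information, so the two happen at about the same time.
        changed = process not in self._values or self._values[process] != value
        self._values[process] = value
        if changed:
            for callback in self._subscribers.get(process, []):
                callback(process, value)


received = []
table = ManagedProcessTable()
table.subscribe("port_usage", lambda name, value: received.append((name, value)))
table.update("port_usage", 41)  # relayed: subscribed and changed
table.update("port_usage", 41)  # not relayed: value did not change
table.update("cpu_load", 0.7)   # not relayed: no subscriber for this process
print(received)
```

Only subscribed, changed values leave the facility in this sketch, which is one way the volume of event information could be kept to what is relevant.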
[0026] Other aspects and advantages of the invention will be
apparent from the accompanying drawings, and the detailed
description that follows.
BRIEF DESCRIPTION OF THE DRAWINGS
[0027] For a more complete understanding of the present invention
and the advantages thereof, reference is now made to the following
description taken in conjunction with the accompanying drawings,
wherein like reference numerals represent like parts, in which:
[0028] FIG. 1 is a block diagram showing the various components in
an exemplary deployment of one embodiment of the invention, and the
interconnections of those components.
[0029] FIG. 2 is a block diagram showing socket and channel
connections for the routing method and protocol used in one
embodiment of the invention.
[0030] FIG. 3 is a block diagram showing an exemplary set of server
components and client applications that use one embodiment of a
publish and subscribe mechanism in accordance with the invention to
display data on the clients.
[0031] FIG. 4 is a depiction of a top-level screen display of an
exemplary network management application, showing a plurality of
disparate networks under management.
[0032] FIG. 5 is a depiction of a screen display of an exemplary
network management application, directed to the monitoring and
management of a selected one of the managed networks.
[0033] FIG. 6 is an exemplary depiction of a screen display showing
event lists for a selected managed network being monitored in
accordance with one embodiment of the invention.
[0034] FIG. 7 is an exemplary depiction of a screen display showing
the monitoring of port usage on a selected network over time, in
accordance with one embodiment of the invention.
[0035] FIG. 8 is an exemplary depiction of a screen display showing
a "dashboard" view of a managed network, comprising a network map
and a display of elements.
[0036] FIG. 9 is an exemplary depiction of a screen display showing
health metrics for a central Communication Manager (CM)
processor.
[0037] FIG. 10 is an exemplary depiction of a screen display
showing phone traceroutes with QOS display.
[0038] FIG. 11 is an exemplary depiction of a screen display
showing QOS detail for one phone traceroute.
[0039] FIG. 12 is an exemplary depiction of a screen display
showing a policy setup module.
[0040] FIG. 13 is an exemplary depiction of a screen display
showing current service levels over time, plus a rolling average
display.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0041] The following is a detailed description of certain
embodiments of the invention chosen to provide illustrative
examples of how it may preferably be implemented. The scope of the
invention is not limited to the specific embodiments described, nor
is it limited by any specific implementation, composition,
embodiment or characterization depicted in the accompanying
drawings or stated or described in the invention summary or the
abstract. In addition, it should be noted that this disclosure
describes a number of methods that each comprise a plurality of
steps. Nothing contained in this written description should be
understood to imply any necessary order of steps in such methods,
other than as specified by express claim language.
[0042] Certain terms should be understood in a specific defined
manner for purposes of understanding this specification and
interpreting the accompanying claims:
[0043] "Disparate networks" means networks under different
ownership or management, or otherwise characterized by having
different architectures, different management policies, and
possibly mutually conflicting addressing schemas.
[0044] "Socket" means an end point in a bidirectional communication
link. A TCP/IP socket is a socket, but other sockets exist (and are
used in the context of the invention) that are not TCP/IP sockets,
or, although instantiated from the same abstract base class as a
TCP/IP socket, do not have the full functionality of TCP/IP
sockets.
[0045] Exemplary System Architecture
[0046] FIG. 1 is a high-level block diagram showing an overview of
the various components in an exemplary deployment of one embodiment
of the invention, and the interconnections of those components.
This figure shows networks 101, 102, etc., through 10x, belonging
to Customer Business Units 1, 2, etc., up to Customer Business Unit
x. The Customer Business Units may be completely unrelated business
organizations, that have in common only the fact that they use the
same service provider to manage their respective networks. Network
101 of Customer Business Unit 1 is shown in greater detail than the
others, although it should be understood that the others may have
networks of comparable, greater or lesser complexity, which is not
shown in FIG. 1. Customer Business Unit 1 is shown as having three
locations, 111 (the main location), 112 and 113. Within the network
infrastructure at each location is a Remote Intelligence Gateway
(RIG). RIG CL1-RIG1 is at location 111, RIG BU1-RIG2 is
at location 112, and RIG BU1-RIG3 is at location 113. A
Central Intelligence Platform (CIP) is provided within Data Center
120. Data Center 120, in this embodiment, is a single facility
maintaining connections 121, 122 and 12x over the SRSTP (Secure
Remote Session Transport Protocol, as will be described in further
detail below) with each of Customer Business Units 1-x, and more
particularly (as shown by the continuation on the customer side of
the dashed lines for 121, 122 and 12x) with the RIG at what is
regarded for network management purposes as the principal facility
of the Customer Business Unit. Each of those RIGs is similarly
connected via SRSTP to the RIG at the immediate downstream Customer
Location, as shown by dashed lines 131, 132. CIP 120 operates based
on a software construct that extends the class on which RIGs are
based, and thus, in addition to considerable added functionality,
CIP 120 contains all of the functionality and attributes of a
RIG.
[0047] Overcoming Address Space Collisions Between Disparate
Systems
[0048] Enterprise networks may use global or private IP addressing.
Because of the shortage of globally unique IP addresses, many
enterprises opt for one of the private address spaces defined by
RFC 1918 or in accordance with other widely accepted conventions.
These provide ranges of addresses privately useable within an
organization, yet not routed through the public network, and
therefore need not necessarily be globally unique. Thus, it is
entirely possible that two or more of Customer Business Units
1-x (networks 101-10x) may have adopted private address schemas that overlap, and
if connected directly together, would conflict. For example,
Customer Business Unit 1 (network 101) and Customer Business Unit 2
(network 102) may each have independently adopted a 172.16.0.0/12
private addressing schema. A device may exist within each network
having the identical address, for example, 172.16.7.33. In order to
be able to centrally manage both systems, a means is necessary to
differentiate two nodes in disparate networks under management that
have been natively assigned the same address.
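The collision described above can be made concrete with a short sketch using Python's standard ipaddress module. The inventory dictionary is a hypothetical stand-in for a central registry keyed on bare addresses; it is not part of the described system.

```python
import ipaddress

# The private block and device address used in the example above.
block = ipaddress.ip_network("172.16.0.0/12")
device = ipaddress.ip_address("172.16.7.33")

print(device in block)    # the address falls inside the private block
print(device.is_private)  # and is not routable on the public network

# Hypothetical central inventory keyed only on the bare address: the entry
# for network 102 silently overwrites the entry for network 101.
inventory = {}
for network_name in ("network 101", "network 102"):
    inventory[str(device)] = network_name
print(inventory)
```

With the bare address as the only key, one of the two devices becomes unreachable from the central view, which is the problem the following paragraphs address.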
[0049] The most widely used method for communicating with a
privately-addressed node from outside its own addressing domain is
"Network Address Translation" (NAT). However, NAT is a
session-based protocol in which sessions are generally initiated
from the inside. This is not sufficient for management, where
contact must often be initiated from outside the managed network.
Another approach is for the NAT router, or a proxy server, to
forward communications in accordance with special data entries, but
this effectively leaves a "hole" in the enterprise's firewall, and
thus poses administrative burdens and security risks. Another
workaround would be to reassign all affected networks to a large
address space, such as 5.0.0.0/8. However, such a change requires
that everything on the network be migrated to the new address
schema all at once, which can be prohibitively resource-intensive
and expensive.
[0050] One embodiment of the present invention solves this problem
by the following technique:
[0051] Deploying a system (e.g., a RIG) local to the topology being managed
[0052] Abstracting and tagging, on the RIG, the names and attributes in the infrastructure local to the RIG
[0053] Naming the RIG with a unique ID (e.g., CL1-RIG1), plus a timestamp (e.g., 2008-0601-21:33:17.04)
[0054] Combining said name with the private address of each infrastructure element to form a new "address" for purposes of common management of the networks
[0055] Publishing the management addresses in an element list on the RIG in a manner accessible to the upstream registrar
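The combining step listed above can be sketched as follows. This is an illustration under stated assumptions: the ManagementAddress type, the "/" separator in its printed form, and the second RIG name "CL2-RIG1" are hypothetical, and the timestamp component of the RIG name is omitted for brevity.

```python
from typing import List, NamedTuple


class ManagementAddress(NamedTuple):
    """Hypothetical combined identifier: RIG name plus private address."""
    rig_name: str
    private_address: str

    def __str__(self) -> str:
        # The separator is an assumption made for display only.
        return f"{self.rig_name}/{self.private_address}"


def build_element_list(rig_name: str, private_addresses: List[str]) -> List[ManagementAddress]:
    # Performed within the element's local domain, i.e. on the RIG itself.
    return [ManagementAddress(rig_name, address) for address in private_addresses]


# Two disparate networks that natively use the same private address.
list_1 = build_element_list("CL1-RIG1", ["172.16.7.33"])
list_2 = build_element_list("CL2-RIG1", ["172.16.7.33"])

# An upstream registrar can hold both entries without conflict.
registrar = set(list_1) | set(list_2)
for entry in sorted(registrar):
    print(entry)
```

Because the RIG name is unique, the combined identifier stays unique even though the private addresses are identical.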
[0056] In this way, an upstream parent (either another RIG or the
CIP) can query any downstream RIG (based on authentication and
applicable policies), for directory information. The upstream
parent can then use those addresses to direct commands to elements
internal to the RIG's local network. All such commands will go
through the local RIG, which to that extent acts as a proxy. The
same addressing scheme will also enable the upstream parent to
communicate with additional RIGs downstream of the first RIG. For
example, CIP 120 can send a command destined for a device in the
local network infrastructure of RIG 130. CIP 120 "knows" the
address of the destination device because the directory of RIG 130
is published to RIG 110 and in turn published to CIP 120, and can
thus address a command to a device local to RIG 130 by sending that
command through RIG 110 (however, how that command is routed is a
function of the SRSTP protocol (discussed below), and not of the
addressing per se).
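The upstream publication of directories described above can be sketched as a recursive collection. The Rig class and the names "RIG110" and "RIG130" are hypothetical labels for the RIGs in the example; recording the chain of RIG names alongside each element is an assumption made for illustration, since the text leaves actual routing to the SRSTP protocol.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Rig:
    """Hypothetical RIG holding local elements and downstream RIGs."""
    name: str
    local_elements: List[str]
    downstream: List["Rig"] = field(default_factory=list)

    def directory(self) -> List[Tuple[List[str], str]]:
        # Local elements are listed under this RIG's own name.
        entries = [([self.name], element) for element in self.local_elements]
        # Directories published by downstream RIGs are republished upstream,
        # prefixed with this RIG's name.
        for child in self.downstream:
            for path, element in child.directory():
                entries.append(([self.name] + path, element))
        return entries


rig_130 = Rig("RIG130", ["172.16.7.33"])
rig_110 = Rig("RIG110", ["172.16.0.1"], downstream=[rig_130])

# What the upstream parent (e.g., the CIP) would see after querying RIG 110.
cip_view = rig_110.directory()
for path, element in cip_view:
    print(path, element)
```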
[0057] Routing Method and Protocol
[0058] Another issue presented by the architecture of FIG. 1 is
routing, as already suggested by the above discussion of
addressing. The issue is how to route commands, and the results of
executing commands, in a system in which there have been deployed a
plurality of software modules, e.g., modules for local network
management, for the purposes of obtaining the effective ability to
manage the entire collection of modules (and associated elements)
centrally. This requires a flexible, network-enabled mechanism for
routing commands in a modular software system. More generally, in
order to fully realize the functionality necessary to manage a
network as depicted in FIG. 1, a method is needed for intermodule
communication and management that is capable of navigating
arbitrarily complex topologies without comparably complex
pre-arrangements for communications and management.
[0059] For example, it can be seen with reference to FIG. 1 that in
order to manage networks 101, 102, etc. it is necessary to be able
to route various management commands to all areas of the network,
and the network may be "layered" through a depth of RIGs. This is
shown in its simplest form in FIG. 1 as the chain of RIGs 110 and
130, but of course this structure could be extended to arbitrary
depth, and the entire infrastructure would have to be subject to
management.
[0060] Most typically, commands are executed in a network
environment with protocols such as RPC, RMI, CORBA, JMS (Java
Message Service), SOAP, XML-RPC (and other similar protocols).
However, these are point-to-point protocols and have no routing
other than the routing otherwise provided in the environment in
which the command is invoked. In the present case, such routing
does not necessarily exist. For the reasons discussed in general
above, it is not desirable to have to establish such general
routing, where it is not otherwise required, simply to enable
management functions. In addition, when managing centrally, there
is a need to maintain a separation of different customer networks,
for security purposes.
[0061] Commands could be routed in a complex system by chaining a
series of interactive protocols, such as telnet or SSH, and
"hopping" to the destination device. Similarly, one could manually
construct the requisite sockets and tunnels. However, making
provisions for such communications has the administrative and
security drawbacks previously discussed.
[0062] A type of distribution similar in some ways to what is
contemplated here was historically done for mail routing, with the
Unix-to-Unix Copy (UUCP) mail delivery protocol. A mail message
destined for a user on machine box3 which was not local but
connected through machine box2 would be addressed to box2!box3!user
(referred to as "bang" protocol). However, the UUCP protocol was
unidirectional. If used to address a command, it could not return
the result of executing the command, and thus would be lacking for
network management.
[0063] FIG. 2 is a block diagram showing socket and channel
connections for the routing method and protocol used in one
embodiment of the invention. Channel Master Instances 201, 202 and
203 represent RIGs. Channel Master Instance 203 is a specialized
RIG that primarily functions to provide a control console and GUI
interface. Channel Master Instance 201 may be an ordinary RIG or a
CIP (with additional functional elements not shown). In addition,
Channel Master Instances may be chained to greater depths than
shown in FIG. 2, by adding Channel Master Instances and connecting
them to additional Channel Connections on an upstream Channel
Master Instance, e.g., an additional Channel Connection (not shown)
similar to Channel Connections 221, 222.
[0064] Modules 1, 2 and 3 shown on each of Channel Master Instances
201 and 202 represent devices local to their respective Channel
Master Instances. ComStruc Interfaces 231, 232 are the respective
interfaces between the Channel Master Instances 201, 202 and the
associated Modules.
[0065] Each Channel Master Instance has one or more Channel
Connections, e.g., Channel Connections 221, 222, 225 and 226 to
other Channel Master Instances. Preferably, the actual connections
between these elements are by way of an SSL tunnel, though
encryption is not strictly necessary. Each Channel Master Instance
other than one having full GUI facility will usually have an
associated Command Line Interface, e.g., 241, 242, referred to in
FIG. 2 for historical reasons only as a "Maritime Terminal".
[0066] Each Channel Master Instance also has communications
interfaces called CSockets (251, 252, etc.), through which it
communicates with external devices and interfaces. Some of the
CSockets, e.g., 252, 253, are connected in sets of a plurality of
CSockets to the corresponding channel connection, reflecting the
fact that a number of different management processes can be routed
through the same channel connection.
[0067] The routing system underlying FIG. 2 is command based.
Ultimately, each message routed delivers a command to be executed
on the receiving end of the routing chain. These commands are
forwarded through CSockets. The result is a hybridization of a
command with a bi-directional socket.
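The command-based, bidirectional routing described above can be sketched with plain method calls standing in for CSockets and channel connections. The ChannelMaster class, its route method and the instance names are hypothetical; the sketch shows only that a command is forwarded hop by hop, executed at the end of the chain, and that the result returns along the same path.

```python
from typing import Callable, Dict, List


class ChannelMaster:
    """Hypothetical stand-in for a Channel Master Instance (a RIG or CIP)."""

    def __init__(self, name: str, handler: Callable[[str], str]) -> None:
        self.name = name
        self.handler = handler                      # executes commands locally
        self.channels: Dict[str, "ChannelMaster"] = {}
        self.log: List[str] = []

    def connect(self, other: "ChannelMaster") -> None:
        # Stands in for a channel connection to a downstream instance.
        self.channels[other.name] = other

    def route(self, path: List[str], command: str) -> str:
        if not path:
            # End of the routing chain: execute the command here.
            return self.handler(command)
        next_hop, rest = path[0], path[1:]
        self.log.append(f"forward to {next_hop}")
        # The return value travels back through every hop to the caller.
        return self.channels[next_hop].route(rest, command)


def make_handler(name: str) -> Callable[[str], str]:
    return lambda command: f"{name} executed: {command}"


cip = ChannelMaster("CIP", make_handler("CIP"))
rig_a = ChannelMaster("RIG-A", make_handler("RIG-A"))
rig_b = ChannelMaster("RIG-B", make_handler("RIG-B"))
cip.connect(rig_a)
rig_a.connect(rig_b)

result = cip.route(["RIG-A", "RIG-B"], "tools restart 172.16.7.33")
print(result)
```

Unlike the unidirectional UUCP addressing discussed earlier, the caller here receives the result of executing the command.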
[0068] The commands employed in the exemplary system comprise a
large number of total commands, and are arranged in a tree
structure, similar in some respects to the Microsoft® NT™
NET command, but with more options. They are called "ComStruc"
commands. A list of a number of exemplary ComStruc commands, which
illustrate the functionality and syntax of this command hierarchy,
is set forth in the Appendix attached hereto.
[0069] As seen in Table 1 of the Appendix, in the preferred
embodiment, the ComStruc commands form a tree structure, with the
"leaves" of the tree being actual commands, and the "branches"
being containers (or categories) for commands. The command is fully
specified by concatenating the strings from the root to the desired
leaf, and adding any necessary parameters. An example of such a
command (absent the routing path element) is "tools restart". In
this example, "tools" is a container, and "restart" is a target
(and a ComStruc command). An address would be given as a parameter.
The effect of the command would be to restart the service at the
address specified. As can be seen, many other commands are
provided. Examples of parameters are: IP addresses, names of
devices, user names, port designations, etc.
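The container-and-target lookup described above can be sketched as follows. This is a minimal illustration only: the nested-dictionary layout, the handler bodies, and the server name "rig-local" are assumptions made for the example and are not taken from the described system.

```python
# Hypothetical sketch of a ComStruc-style command tree.
# Containers ("branches") are dicts; targets ("leaves") are callables.

def restart(address):
    # Stand-in for restarting the service at the given address.
    return f"restarting service at {address}"

def my_name():
    # Assumed server name, for illustration only.
    return "rig-local"

TREE = {
    "tools": {
        "restart": restart,
        "MyName": my_name,
    },
}

def execute(command_line, tree=TREE):
    """Walk from the root through containers to a target, then call the
    target with the remaining tokens as its parameters."""
    tokens = command_line.split()
    node = tree
    for i, token in enumerate(tokens):
        if not isinstance(node, dict) or token not in node:
            raise KeyError(f"unknown command element: {token}")
        node = node[token]
        if callable(node):
            return node(*tokens[i + 1:])
    raise ValueError("command names a container, not a target")
```

For example, `execute("tools restart 10.0.0.5")` resolves the container "tools", then the target "restart", and passes the address as the single parameter.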
[0070] The objective is to pass commands down recursively to the
desired target module. The routing is specified together with the
desired command, in the SRSTP protocol. The routing path is a
"bang" ("!")-delimited series of server (RIG) names.
[0071] The SRSTP protocol has the following general structure (the
format of the following description will be readily appreciated by
those familiar with BNF and/or "man pages"):
SRSTP Packet: [!SERVER1NAME] [!SERVER2NAME . . . ] ComStruc Command
[PARAMS]
[0072] ComStruc Command: container+ComStruc Command | target
PARAMS: string*
string: nonspacestring | nonspacestring+
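As an illustration of the packet structure above, the following sketch separates the routing path from the ComStruc command. The function name parse_srstp is hypothetical, and the tokenization (whitespace-separated tokens, with server names accepted either as separate "!NAME" tokens or run together as "!A!B") is an assumption, since the text does not fix these details.

```python
def parse_srstp(packet):
    """Split an SRSTP packet into (route, command_tokens).

    Leading tokens that start with "!" are treated as server (RIG) names.
    Everything after them is the ComStruc command and its parameters.
    """
    tokens = packet.split()
    route = []
    while tokens and tokens[0].startswith("!"):
        # A token such as "!RIG1!RIG2" carries several names at once.
        names = tokens.pop(0).split("!")
        route.extend(name for name in names if name)
    if not tokens:
        raise ValueError("SRSTP packet has no ComStruc command")
    return route, tokens
```

A packet with no leading "!" tokens yields an empty route, which corresponds to local execution.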
[0073] CSocket extends the Java Socket class, but this is done for
purposes of compatibility rather than communications functionality.
CSocket is based on the simplest, non-implementation invoking
variant of Socket. Communications functionality similar to that of
a Socket is provided, but independently and not by inheritance.
[0074] A CSocket's constructor accepts a ComStruc command as a
parameter. If the command has no routing explicitly specified, it
is passed to the local Channel Master Instance, which passes it to
the local ComStruc tree to find the target and execute it if
possible (locally). If routing is specified, the command is still
passed to the Channel Master Instance (e.g., 201), but is then
passed to the Channel Connection (e.g., 222) whose name matches the
first name in the routing path. That Channel Connection strips off
its own name (the first name in the routing string received) and
passes the remainder across an SSL connection to a peered Channel
Connection (e.g., 225). The peered Channel Connection then passes
the command to its local Channel Master Instance (in this example,
202). The same process is then repeated on this Channel Master
Instance, forwarding the packet again if necessary, otherwise
executing it locally. Since each Channel
Master Instance has the same core functionality, this process may
be continued indefinitely in a recursive manner, to traverse the
entirety of the network, to the extent that Channel Master
Instances have been deployed.
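The recursive forwarding just described can be sketched as below. The class, method, and server names are hypothetical; an ordinary method call stands in for the SSL link between peered Channel Connections, and a return value stands in for results passed back along the route.

```python
class ChannelMaster:
    """Toy stand-in for a Channel Master Instance (illustrative only)."""

    def __init__(self, name, commands):
        self.name = name
        self.commands = commands   # command string -> callable
        self.connections = {}      # channel connection name -> peer instance

    def connect(self, connection_name, peer):
        self.connections[connection_name] = peer

    def handle(self, route, command, params):
        if not route:
            # No routing left: find the target locally and execute it.
            return self.commands[command](self.name, *params)
        first, rest = route[0], route[1:]
        peer = self.connections[first]  # connection matching the first name
        # The matched name is stripped and the remainder is forwarded.
        return peer.handle(rest, command, params)


def restart(host, address):
    # Stand-in command body; reports which instance executed it.
    return f"{host}: restarting {address}"


# Assumed three-instance chain: CIP -> RIG1 -> RIG2.
cip = ChannelMaster("CIP", {"tools restart": restart})
rig1 = ChannelMaster("RIG1", {"tools restart": restart})
rig2 = ChannelMaster("RIG2", {"tools restart": restart})
cip.connect("RIG1", rig1)
rig1.connect("RIG2", rig2)

result = cip.handle(["RIG1", "RIG2"], "tools restart", ["10.0.0.5"])
```

Because every instance runs the same handle logic, the chain can be extended to any depth by adding further connections.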
[0075] Results of command execution are passed back in the same
manner as for an ordinary Socket (but not using the implementation
of Socket, using CSocket's own implementation instead). A
completion message is also sent from the target to close down the
CSocket connections.
[0076] In more general terms, the preferred embodiment described
above provides a method for routing commands in a modularized
software system, comprising: [0077] Expressly or implicitly
specifying a route [0078] Specifying a command [0079] Invoking a
socket with said route and command as parameters [0080] Routing the
command and parameters in accordance with said route [0081]
Executing the command with its parameters at the route target
[0082] Returning any results of said execution back through said
route [0083] Closing down said route when said execution is
completed
[0084] The commands in the foregoing method may also be provided in
a hierarchy of containers and commands. The links of the route are
tunneled, preferably over SSL.
[0085] It can also be seen, in light of the foregoing discussion,
that a system for implementing the SRSTP protocol as described
above generally provides [0086] An application that implicitly or
explicitly specifies route and command and invokes the socket with
the route and command as parameters [0087] One or more local
facilities, each comprising [0088] A channel master that sets up
routing by matching specified routing with open channel connections
[0089] A channel connection that communicates the rest of the route
and command to another channel connection, and [0090] A target
within the last one of said instances that executes the command
[0091] In addition, it should be noted before moving on to the next
topic of discussion that one of the ComStruc commands provided in
the preferred embodiment, as set out in Table 1 of the Appendix, is
the localConnect command. Using localConnect on each end of a
CSocket chain established over SRSTP allows virtually any service
or network operation (e.g., maintenance) to be tunneled through the
SSL connections set up between the sockets, without the need for a
VPN. For example, this mechanism can easily be used to establish a
telnet or SSH interactive session between a CIP console and a
resource deep within a managed network, or a Remote Desktop
Protocol (RDP) session to remotely control a computer in that
network (including without limitation conducting any local network
management operations through that computer), and so forth.
[0092] In addition, in a similar manner, the entire communications
structure reflected in FIG. 2 could be deployed in tandem with an
operational support system (OSS) to serve as a proxy server
providing a means for the OSS to access the serviced networks.
[0093] It should be apparent from the foregoing that SRSTP provides
a flexible foundation for a network management application,
particularly for remotely and centrally managing and supporting
disparate networks.
[0094] In addition, the distributed information gathering provided
by the present invention allows network managers to understand the
operational state of managed elements, which may be geographically
distributed across a given network, from the local perspective of
the observed element. Furthermore, such distributed information
gathering avoids introducing measurement artifacts, such as
artificial latency.
[0095] "Publish and Subscribe" Mechanism
[0096] We turn now to methods by which a management system for a
plurality of disparate networks can remotely view real time
information about multiple network management processes. This
capability is important for a range of applications, and most
fundamentally, in order to be able to effectively monitor events in
the networks being serviced.
[0097] Prior solutions to this problem, to the extent even
attempted, were to continuously refresh a global display or
database of all network events, or to limit the event data
acquisition to refresh one source at a time. Neither approach is
fully satisfactory. The former approach is not selective and is not
scalable. The latter approach inherently concedes any ability for
real time monitoring.
[0098] The present invention, in one embodiment, uses what might be
called a "publish and subscribe" (or alternatively, a "subscribe
and push") mechanism for remotely monitoring events in a plurality
of disparate networks.
[0099] FIG. 3 is a block diagram showing an exemplary set of server
components and client applications that implement a publish and
subscribe mechanism to acquire event data from remote networks in
real time, and to display the data in a management application.
GXListClient 301 is a client application, for example a management
console application on CIP 120 (as in FIG. 1), or an upstream RIG.
GXListServer System 310, GXDataSource 311, ComStrucTargets 312 and
ListSessions 313, etc. all reside on managed network 320.
GXListClient 301 communicates with the managed network 320 over
ComStruc Tunnel 303, in the manner discussed above in connection
with FIG. 2. The routing is the same as discussed in connection
with FIG. 2, but for simplicity FIG. 3 shows the ComStruc Tunnel
terminating in ComStruc Targets 312, which is the command tree
discussed in connection with FIG. 2 (and shown in FIG. 2 as
ComStruc Interface 232). A table is maintained in GXDataSource 311
to hold status information on each monitored process on managed
network 320. A GXListServer System, e.g., 310, exists for each such
table.
[0100] To initiate the publish and subscribe procedure, a
GXListClient, e.g., 301, sends a ComStruc DATA GXSTREAM CONNECT
message over ComStruc Tunnel 303 to ComStruc Targets 312. That
command goes to GXListServer System 310. GXListServer System 310
instantiates a List Session, e.g., ListSession 313.
[0101] (Phase 1) On instantiation, ListSession 313 goes into a
loop, listening for a request to change tracks (track change)--a
request for certain columns using a certain filter. The requester,
in this case, GXListClient 301, then sends a track change request
(GXQUERY). GXListClient uses a CSocket (as in FIG. 2) to make the
track change request.
[0102] ListSession 313 receives the GXQUERY command and goes
into "dumping mode"--whereby it collects all responsive information
for the element subscribed to and sends it back to the requester
(301) through ComStruc tunnel 303, and also reports its progress to
the requester. ListSession 313 also maintains a record of the
current query. At this point, a "subscription" for specified
updates on a specified network process has been established.
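Phase 1 can be illustrated with a short sketch. The row representation (dictionaries), the callable filter, and the method name track_change are assumptions for the example; the text specifies only that a track change names certain columns and a filter, and that the session dumps all responsive data and records the current query.

```python
def dump(table, columns, row_filter):
    """Return the requested columns of every row that passes the filter."""
    return [
        {col: row[col] for col in columns}
        for row in table
        if row_filter(row)
    ]


class ListSession:
    """Illustrative stand-in for a list session serving one subscriber."""

    def __init__(self, table):
        self.table = table
        self.query = None  # (columns, row_filter) of the current subscription

    def track_change(self, columns, row_filter):
        # Record the current query, then dump everything that matches it.
        self.query = (columns, row_filter)
        return dump(self.table, columns, row_filter)
```

A later track change simply replaces the recorded query and produces a fresh dump.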
[0103] (Phase 2) GXListServer 310 is responsible for maintaining
the relevant table. Database updates, destined for GXDataSource
311, go through GXListServer 310. Each database update request also
goes to each and every ListSession object, 313, etc. Within
ListSession object 313, etc., the update request is matched against
a filter and requested column names. If there is a match (i.e., if
the database server is updating data that has been subscribed to)
the update information (which can be add, remove or change) is sent
to the GXListClient (e.g., 301), at about the same time as the
actual database update is made. In other words, after information
has been subscribed to, the "middleware" process that updates the
local table (i.e., GXListServer 310) also copies the new data to a
socket (i.e., the CSocket established by the ComStruc message),
directed to the subscriber. To avoid any overflow, the update
transmission goes through a queue. In this manner, the requested
information is "published" (or "pushed") to the requester.
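A minimal sketch of Phase 2 follows, assuming an in-memory table keyed by row id and an unbounded deque as the outgoing queue. The class names and the simplified matching rule (filter on the row, then project the requested columns) are assumptions, not the described implementation.

```python
from collections import deque


class Subscription:
    """Holds one subscriber's filter, columns, and outgoing queue."""

    def __init__(self, columns, row_filter):
        self.columns = columns
        self.row_filter = row_filter
        self.outbox = deque()  # queue toward the subscriber's socket

    def offer(self, kind, row):
        # Forward only updates that touch subscribed data.
        if self.row_filter(row):
            self.outbox.append((kind, {c: row[c] for c in self.columns}))


class ListServer:
    """Applies each update to the table and copies it to every session."""

    def __init__(self):
        self.rows = {}
        self.sessions = []

    def update(self, kind, key, row=None):
        # kind is "add", "change" or "remove".
        if kind == "remove":
            row = self.rows.pop(key)
        else:
            self.rows[key] = row
        for session in self.sessions:
            session.offer(kind, row)
```

The table update and the push to subscribers happen in the same call, which mirrors the statement that the update information is sent at about the same time as the database update is made.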
[0104] At any time while the socket is open, the GXListClient 301
can request a new filter and new columns, in which case there will
be a new dump and then updates (phase 2).
[0105] FIGS. 4-13 show selected screen displays of an exemplary
"management console" application that may be run from CIP 120,
taking advantage of the "publish and subscribe" mechanism described
above, as well as the addressing and routing techniques discussed
herein. In the examples shown, the network in question handles
voice-over-IP (VOIP) telephony, as well as data communications.
[0106] FIG. 4 shows a typical GUI screen 400 of the management
console. The upper left-hand panel 401 of the screen shows a list,
411, 412, etc. of disparate networks under management, belonging to
different companies. Below that, in area 407, is a status summary
showing the number of servers at each of a plurality of status
levels, and the associated icon. As can be seen, in this example
all five servers being observed are in a "good" status, with the
corresponding "green light" icon 408 shown next to the
corresponding entries 411, 412, etc. in upper left-hand panel 401.
The right-hand panel 402 (divided into upper and lower sections,
403 and 404) shows a summary of "alarms" that require an operator
response for all customers. The displayed alarms can also be
filtered through filter box 405. For each alarm there is shown a
set of data in tabular form, including the server on which the
alarm occurred (421), the top node of the chain of resources
("dependency tree") that depend on the server (422), the alert
level (e.g., 0-5) (423), the status (e.g., New, Responded to,
Closed) (424), a response field (425) indicating who responded to
the alarm, a diaryEntry field (426), which is a link to a table
with a more detailed description, and other information. The
top-right panel (403) summarizes all current alarms that have not
been responded to; the bottom-right panel (404) shows alarms
responded to. When an alarm has been resolved, its display
disappears from this display. By clicking a mouse on one of the
network entries 411, 412, etc. in upper left-hand panel 401 a user
of the management console can select one of the managed
networks.
[0107] FIG. 5 shows a screen that is displayed after the user of
the management console has selected one of the networks as
discussed above in connection with FIG. 4. From this screen, the
user may view the status of the network using a variety of tools or
use the RIGs' ability to temporarily bridge the client computer
with the remote network to use desktop sharing applications or run
management applications. By default, this view, in the embodiment
shown, displays an event summary for the selected network, in this
case, HHR (511). The content of this display is provided through
the "publish and subscribe" mechanism discussed above. The content is
dynamic, and continually refreshes in real time. A plurality of
other displays can be swapped in and out of panel 502 by clicking
on the icons 531 etc. in main menu 530 on the upper right panel
503. The event summary display shown can also be reached by
clicking the Views button 532 and then clicking on "Summary" (541).
The listed event lines 561 etc. are each color-coded, corresponding
to the "Max Alert" level on the device in question. Max Alert means
the highest alert level in the device's dependency chain. For each
event, there is a time display 571, a "text_time" display 572,
which is specified local to the device being reported, the eventID
573, which specifies the event type, the local device name,
referred to in this view as subDeviceName 574, the network,
referred to in this view as deviceName 575 (because the network is
a "device" to the upstream RIG or CIP), and other information. In
this embodiment, events are "merged" if possible. This means that
events considered "mergeable", such as successive good pings, just
have their times updated and prior event time shown, rather than
cluttering the display with new events. In such cases, there is an
entry in last_text_time 577 for the time of the preceding merged
event. The row of items in panel 503 beginning with Summary 541 are
links to other displays, including a number of the displays
discussed below.
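The merging behavior for "mergeable" events can be sketched as follows. The field names follow the column names mentioned above (eventID, subDeviceName, text_time, last_text_time), while the function name, the event identifiers, and the rule that only the most recent event for a device is considered for merging are assumptions made for illustration.

```python
def record_event(events, device, event_id, text_time, mergeable):
    """Append a new event, or merge it into the most recent event for the
    same device when that event has the same type and is mergeable."""
    for ev in reversed(events):
        if ev["subDeviceName"] == device:
            if mergeable and ev["eventID"] == event_id:
                # Keep one line: remember the prior time, update the current.
                ev["last_text_time"] = ev["text_time"]
                ev["text_time"] = text_time
                return ev
            break  # latest event for this device differs, so do not merge
    ev = {
        "subDeviceName": device,
        "eventID": event_id,
        "text_time": text_time,
        "last_text_time": None,
    }
    events.append(ev)
    return ev
```

Two successive good pings from the same device therefore occupy a single line whose time is updated, rather than two lines.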
[0108] FIG. 6 shows a management console screen for monitoring
events on one of a plurality of disparate networks being
simultaneously monitored. When a particular customer network is
selected, right-hand panel 504 in FIG. 5 displays a top control bar
503 and a lower screen in space 504, which contains the component
view selected by the user from the control bar. The user selects
(for instance) "Views" 532 from the main menu 530, then "Events"
542 from the submenu 540, and the Event Viewer component would
replace the "summary view" component in the component view section,
panel 504. The management system has subscribed to events on a
number of the managed networks, but FIG. 6 as shown reflects a
display limited to one particular customer network ("HHR" 511). The
event lists shown in FIG. 6 are dynamic, and update automatically
in real time, in accordance with the methods illustrated in FIG. 2.
The "Filter" element 605 is an all-column filter to enable a quick
filter based on a key that can appear in any column. The upper
display panel 603 contains a list of events not acknowledged yet,
and for each, a time 661, eventID 673, local device name
(deviceName) 674, the affected service, if any 678, the relevant
agent IP address (agentIp), if any 679, and other information. The
bottom pane 604 shows a list of all events in a time range
adjustable by drop-down control 691, shown here as six hours.
Columns in panels 603 and 604 (and in similar other displays) can
be moved left and right by GUI controls. The left-most column acts
as a sort key for the list. By default, the sort key is the time
column.
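The all-column filter and left-most-column sort key can be illustrated with a short sketch. The function name and the case-insensitive substring match are assumptions; the text states only that the key may appear in any column and that the left-most column acts as the sort key.

```python
def filter_and_sort(rows, column_order, key=""):
    """Keep rows in which `key` appears in any displayed column, then sort
    the result by the left-most column in `column_order`."""
    needle = key.lower()
    kept = [
        row for row in rows
        if any(needle in str(row[col]).lower() for col in column_order)
    ]
    return sorted(kept, key=lambda row: row[column_order[0]])
```

Moving a different column to the left-most position changes the sort key simply by changing the first entry of column_order.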
[0109] FIG. 7 shows a "system monitor" type graphic display,
showing a display of the volume of port usage on the managed system
as a function of time. This screen is also reachable from the
Monitors link 543 shown in FIG. 5. The displays appear as moving
graphs that scroll from right to left, updating from the right in
real time, again in accordance with the methods illustrated in FIG.
2. This particular display shows the usage of port 8 in slot 1 of a
device over a selected time range (per drop-down control 791) of 12
hours. The Y-axis 751 is in bits per second. Lower panels 705 and
706 show where the current view (703, 705) fits into a longer
time-line 706. The view time frame may also be adjusted by
click-and-drag action on panels 705 and 706. The reported
bits-per-second numbers 709, etc., displayed in semi-transparent
windows in this illustration, may alternately be displayed to the
right of where the dynamic display traces begin, so as not to
overlap the traces.
[0110] FIG. 8 is an exemplary depiction of a screen display showing
a "dashboard" view of a managed network, comprising a network map
and a display of elements. Left panel 801 shows a network map, with
lines 821, 822, etc. reflecting lines of communication and control.
In this case, a CM 831 is shown, connected to Local Survivable
Processors (LSPs) 841, 842, etc. LSPs 841, 842, etc. are programmed
to assume control of themselves if CM 831 is disabled or loses
connectivity. In such an event, the upstream RIG (not shown), which
is normally connected to CM 831, will attach directly to the LSPs
841, 842, etc. and the former lines of control 822, etc. from CM
831 will disappear. The right-hand panel 802 of FIG. 8 shows the
top-level network elements (each of which is a dependency tree),
with icons for their status. Links 851, 852, etc. along the bottom
of right-hand display panel 802 are links to other "dashboard"
displays for panel 802, or which may be displayed in their own
windows, which each provide a panel of concentrated, high-level,
real-time information about the monitored network(s).
[0111] FIG. 9 is an exemplary depiction of a screen display showing
health metrics for a central Communication Manager (CM) processor.
It can be selected from Processor link 854 in FIG. 8. It shows
percent processor idle (961), percent processor service maintenance
(962), percent processor usage for phone calls (963), and other
information.
[0112] FIG. 10 is an exemplary depiction of a screen display
showing phone traceroutes with QOS display. This screen can be
reached by clicking Phone QOS 545 in FIG. 5 and then "Traces" on an
intermediate screen (not shown) listing the phones. Double-clicking
on an entry in that phone list will bring up the display shown in
FIG. 11 below. FIG. 10 shows graphical traceroute depictions for
all phones. The phones can be filtered through filter control 1005.
The lines of each traceroute 1041, 1042, etc. will change color in
accordance with the current quality of service (QOS), which is a
function (calculated in accordance with methods well known in the
art) of packet loss, round-trip delay and interarrival jitter.
[0113] FIG. 11 is an exemplary depiction of a screen display
showing QOS detail for one phone traceroute, including round-trip
delay 1151, packet loss 1152a and 1152b, and jitter 1153a and
1153b. The upper and lower displays, 1103 and 1104 of jitter and
packet loss reflect the corresponding metrics at each end of the
traced route (e.g., a media processor and a phone).
[0114] FIG. 12 is an exemplary depiction of a screen display
showing a policy setup module. "Policies" can be put in place to
trigger standardized actions based on events, such as reports,
event handling, and the like. A policy is programmed as a flow
chart and functions in the nature of a script. A policy is authored
through GUI controls accessible by mouse (in the example shown, by
right-click-accessible menus) in display panel 1203. Each created
policy is listed in panel 1202. This screen is reached from the
Setup tab 419 in FIG. 4 (Setup->Policy). The policy shown in the
displayed flowchart 1210 is for phone recording to "virtual"
extensions (because physical phones are not needed for message
recording). The policy generates a new event to cancel an event
representing a failure for a range of virtual extensions unless,
per "IF" conditions 1211, 1212, a status of in-service/on-hook or
in-service/off-hook is observed, in which case the event is
cancelled. The policy causes the active event list to be scanned
for softphone failures, and checks to see if the softphone is
failed. If not, it sends a new event to cancel the "failed" event.
Thus, once established, each policy continuously enforces its
specified conditions based on events monitored in real time in
accordance with the protocols described in connection with FIG.
2.
[0115] FIG. 13 is an exemplary depiction of a screen display
showing a service level monitor. This display is reachable by
clicking View->Service Level starting at the View link 532 in
FIG. 5. The display of FIG. 13 can appear in a separate window or
in panel 504 of FIG. 5. FIG. 13 shows current service levels (1311)
over a time frame selectable by controls 1391, plus a rolling
average display 1312 of the monitored service level over the time
range (per control 1392), and other information. Again, this
display dynamically shows service levels for the monitored
network(s) and resource(s) in real time.
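A rolling average of the kind shown in display 1312 can be computed as in the sketch below. The window is expressed as a number of samples, which is an assumption; the text specifies only that the average is taken over a selectable time range.

```python
from collections import deque


def rolling_average(samples, window):
    """Return, for each sample, the mean of the most recent `window`
    samples seen so far (fewer at the start of the series)."""
    recent = deque(maxlen=window)  # old samples fall out automatically
    averages = []
    for value in samples:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages
```

In a live display, each newly pushed service-level value would be appended and only the latest average redrawn.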
[0116] It should be apparent that the operational example
illustrated in FIGS. 4-13, incorporating the technologies disclosed
above in connection with FIGS. 1-3, fully realizes a converged
monitoring and management platform, in accordance with the objects
of the invention, provided in the form of a service, that can view
events in and/or manage a plurality of disparate networks in the
aggregate, or any one of them individually, overcoming obstacles
that prevented such a system from being offered in the past, such
as addressing conflicts, inability to route within the constituent
networks without substantial network changes or undesired
additional infrastructure, artifacts arising from remote network
measurements and observations, and gaps in knowledge resulting from
the lack of continuous connectivity to each managed network.
[0117] Although the present invention has been described in detail,
it should be understood that various changes, substitutions, and
alterations may be readily ascertainable by those skilled in the
art and may be made herein without departing from the spirit and
scope of the present invention as defined by the following
claims.
APPENDIX A
[0118] TABLE 1 ComStruc Command Hierarchy
Command | Further Description
Policy | Policy Module
policy data | The data repository for policy
policy data savenow | Save this data source (in some cases)
policy data size [list_name] | get the number of rows in a list
policy data list [list_name] | print a specific table to the output
policy data proxyStream | Advanced query socket
policy data describe | List the tables in the repository
policy data gxstream | Realtime feed stream interface
policy data gxstream connect table_name |
policy data importCsv [list] [filename] | Import a CSV file and add it to the list
policy data exportCsv [list] [filename] | Export a CSV file from the list
policy data clearAll | Clear the server's contents
policy data switchAndCopyDataSource [data source] | Switch the data source for this server to this data source
policy data proxy | Service connections from the GXProxyServer object
policy data proxy query | query a specific table for an index array
policy data proxy tableMeta | get the table names in this module
policy data proxy tableColumns [tableName] | Get the fields for this table
policy data proxy tableEntryMeta | get the table names in this module
policy data proxy update | updates an entry in the list
policy data proxy entry | gets an entry from the list
policy data proxy deleteRow | deletes a row in the server
policy data proxy addRow | add an entry to the list
policy data oneWayReplicate | The UDP Replication Branch
policy data oneWayReplicate list | List the current replication rules
policy data oneWayReplicate flushAll | Flush all replicated tables to remote servers
policy data oneWayReplicate prepareAll | Send table preparations to remote server
policy data oneWayReplicate add [destination] [port] [module] [tableName] |
policy data oneWayReplicate remove [index] |
policy data oneWayReplicate flushOnly | Flush one of the tables to remote servers
policy data implementation | more commands related to the implementation of the lists
policy addPolicy [policyName] [policyType] |
policy removePolicy [policyName] |
policy enablePolicy [policyName] |
policy disablePolicy [policyName] |
policy addNode [policyName] [type] [xposition] [yposition] |
policy addDoNode [policyName] [type] [xposition] [yposition] |
policy removeNode [index] |
policy moveNode [index] [x] [y] |
policy updateConnection [index ofConnection] [newEndNode connection index] |
policy addNodeOutput [policyName] [nodeIndex] |
policy removeNodeOutput [policyName] [nodeIndex] [nodeOutputIndex] |
policy moveNodeOutputParamUp [policyName] [nodeIndex] [nodeOutputIndex] |
policy moveNodeOutputParamDown [policyName] [nodeIndex] [nodeOutputIndex] |
policy updateNodeOutputParams [policyName] [nodeIndex] [nodeOutputIndex] [param] [description] |
policy getPolicyCycleTime [policyName] |
policy setPolicyCycleTime [policyName] [cycleFrequency] |
policy getParam1Value [nodeIndex] |
policy getParam2Value [nodeIndex] |
policy getParam3Value [nodeIndex] |
policy updateParam1Value [nodeIndex] [value] |
policy updateParam2Value [nodeIndex] [value] |
policy updateParam3Value [nodeIndex] [value] |
policy changeDoNodeType [policyName] [index] [newType] |
policy updatePolicyParameter [index] [value] [valIsVar] |
policy dispatchCyclePolicy [policyName] |
chat | Conet Chat Module
chat data | The data repository for chat
chat data savenow | Save this data source (in some cases)
chat data size [list_name] | get the number of rows in a list
chat data list [list_name] | print a specific table to the output
chat data proxyStream | Advanced query socket
chat data describe | List the tables in the repository
chat data gxstream | Realtime feed stream interface
chat data gxstream connect table_name |
chat data importCsv [list] [filename] | Import a CSV file and add it to the list
chat data exportCsv [list] [filename] | Export a CSV file from the list
chat data clearAll | Clear the server's contents
chat data switchAndCopyDataSource [data source] | Switch the data source for this server to this data source
chat data proxy | Service connections from the GXProxyServer object
chat data proxy query | query a specific table for an index array
chat data proxy tableMeta | get the table names in this module
chat data proxy tableColumns [tableName] | Get the fields for this table
chat data proxy tableEntryMeta | get the table names in this module
chat data proxy update | updates an entry in the list
chat data proxy entry | gets an entry from the list
chat data proxy deleteRow | deletes a row in the server
chat data proxy addRow | add an entry to the list
chat data oneWayReplicate | The UDP Replication Branch
chat data oneWayReplicate list | List the current replication rules
chat data oneWayReplicate flushAll | Flush all replicated tables to remote servers
chat data oneWayReplicate prepareAll | Send table preparations to remote server
chat data oneWayReplicate add [destination] [port] [module] [tableName] |
chat data oneWayReplicate remove [index] |
chat data oneWayReplicate flushOnly | Flush one of the tables to remote servers
chat data implementation | more commands related to the implementation of the lists
chat chatConnection [userId] |
InfoSpot | The server information hub
InfoSpot Site | Get the configured site for this server
InfoSpot Customer | Get the configured customer for this server
InfoSpot Description | Get the configured description for this server
InfoSpot Reload | Reload the properties from the file
InfoSpot SetProp [key] [value] | set an info spot property and save it to file
InfoSpot GetProperty [key] | get the value of the property
InfoSpot StartConnection | Start (or restart) the connection to mother
permission getOrg | get the organization this user is in
permission getAuthentication [username] [password] | get the authentication for this cluster system
permission add | Add a new user/group
permission add user [username] [password] |
permission add group [groupname] |
permission remove | Remove user/group
permission remove user [username] |
permission remove group [groupname] |
permission change | Remove user/group
permission change org [username] [organization] |
permission removeGroup [index] |
permission addGroupToUser [username] [group] |
permission removeGroupFromUser [index] |
permission permit | Permit user/group access to command
permission permit user [username] [command] |
permission permit group [groupname] [command] |
permission deny | Deny user/group access to command
permission deny user [username] [command] |
permission deny group [groupname] [command] |
permission groupadd [groupname] [username] |
tools | Tools used for server administration
tools restart | Restart the server
tools machineRestart | Restart the entire machine
tools MyName | Return the name of the server hosting this ComStruc
tools MyIP | Returns the ip address of this server
tools MyPhys | Returns the physical address of this server
tools tail [filename] | Tail a log file
tools memProfile | Shows the memory profile for this server
tools localConnect [destination] [port] |
* * * * *