U.S. patent application number 11/169097 was filed with the patent office on 2005-06-28 and published on 2005-12-22 for method and apparatus for publishing and monitoring entities providing services in a distributed data processing system.
Invention is credited to Chess, David M., Snible, Edward Charles, Whalley, Ian Nicholas.
Application Number | 11/169097 |
Publication Number | 20050283484 |
Family ID | 31993022 |
Filed Date | 2005-06-28 |
United States Patent Application | 20050283484 |
Kind Code | A1 |
Chess, David M. ; et al. | December 22, 2005 |
Method and apparatus for publishing and monitoring entities
providing services in a distributed data processing system
Abstract
A method, apparatus, and computer instructions for providing
identification and monitoring of entities. A distributed data
processing system includes one or more distributed publishing
entities, which publish computer readable announcements in a
standard language. These announcements may contain a description of
a monitoring method that may be used to monitor the behavior of one
or more distributed monitored entities. These announcements also
may include information used to identify a monitoring method that
may be used by the distributed monitored entity to monitor its own
behavior or by a distributed consumer entity to monitor the
behavior of the distributed monitored entity. The monitoring also
may be performed by a third-party distributed monitoring
entity.
Inventors: | Chess, David M.; (Mohegan Lake, NY) ; Snible, Edward Charles; (New York, NY) ; Whalley, Ian Nicholas; (Pawling, NY) |
Correspondence Address: |
DUKE W. YEE
YEE & ASSOCIATES, P.C.
P.O. BOX 802333
DALLAS, TX 75380
US |
Family ID: | 31993022 |
Appl. No.: | 11/169097 |
Filed: | June 28, 2005 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11169097 | Jun 28, 2005 | |
10252816 | Sep 20, 2002 | |
Current U.S. Class: | 1/1 ; 707/999.01 |
Current CPC Class: | H04L 29/06 20130101; H04L 67/26 20130101; H04L 69/329 20130101; H04L 67/10 20130101 |
Class at Publication: | 707/010 |
International Class: | G06F 017/30 |
Claims
1-20. (canceled)
21. A method for providing testing in a distributed data processing
system, the method comprising: responsive to a request from a
client for information for a selected service, identifying a
registered entity providing the selected service; and sending a
reply to the client, wherein the reply includes information
identifying the registered entity providing the selected service
and a monitoring method for the entity, wherein the information is
in a computer readable format, and wherein the information allows
the client to monitor the registered entity providing the selected
service.
22. The method of claim 21, wherein the identifying and sending
steps are performed in at least one of the registered entity and a
distributed publishing entity.
23. The method of claim 21 further comprising: receiving a request
to register the selected service from the registered entity,
wherein the request includes information about the selected service
and how the registered entity can be monitored for proper operation
of the selected service; and responsive to receiving the request,
registering the selected service, wherein the registered entity
providing the selected service may be identified in response to the
request from the client.
24. The method of claim 21, wherein the client is the registered
entity providing the selected service.
25. The method of claim 21, wherein identifications of registered
entities are stored in a directory.
26. The method of claim 21 further comprising: storing monitoring
information on registered entities.
27. The method of claim 21, wherein the monitoring information
includes at least one of registered entities currently being
monitored, registered entities previously monitored, and registered
entities expected to be monitored.
28. The method of claim 21 further comprising: sending the
monitoring information to the client in response to a request from
the client for the monitoring information.
29. The method of claim 21, wherein the monitoring method includes
at least one of an identification of a monitoring interface in the
registered entity, sending a request to the monitoring interface in
which a response indicates that the registered entity is
functioning correctly, sending a request to the monitoring
interface in which a response indicates that at least one service
in the registered entity is functioning correctly, sending an
invalid request to the registered entity in which a selected error
is expected, sending a pattern of data to a port in the registered
entity in which a particular pattern is expected in response to the
pattern of data, sending a request to the registered entity in
which a response is expected within a selected period of time to
indicate that the registered entity is functioning correctly, a
program, a PERL script, a RMI client, a RMI stub, and a binary
executable.
30-57. (canceled)
58. A data processing system for providing testing in a distributed
data processing system, the data processing system comprising:
identifying means, responsive to a request from a client for
information for a selected service, for identifying a registered
entity providing the selected service; and sending means for
sending a reply to the client, wherein the reply includes
information identifying the registered entity providing the
selected service and a monitoring method for the entity, wherein
the information is in a computer readable format, and wherein the
information allows the client to monitor the registered entity
providing the selected service.
59. The data processing system of claim 58, wherein the identifying
and sending means are performed in at least one of the registered
entity and a distributed publishing entity.
60. The data processing system of claim 58 further comprising:
receiving means for receiving a request to register the selected
service from the registered entity, wherein the request includes
information about the selected service and how the registered
entity can be monitored for proper operation of the selected
service; and registering means, responsive to receiving the
request, for registering the selected service, wherein the
registered entity providing the selected service may be identified
in response to the request from the client.
61. The data processing system of claim 58, wherein the client is
the registered entity providing the selected service.
62. The data processing system of claim 58, wherein identifications
of registered entities are stored in a directory.
63. The data processing system of claim 58 further comprising:
storing means for storing monitoring information on registered
entities.
64. The data processing system of claim 58, wherein the monitoring
information includes at least one of registered entities currently
being monitored, registered entities previously monitored, and
registered entities expected to be monitored.
65. The data processing system of claim 58, wherein the sending
means is a first sending means and further comprising: second
sending means for sending the monitoring information to the client
in response to a request from the client for the monitoring
information.
66. The data processing system of claim 58, wherein the monitoring
method includes at least one of an identification of a monitoring
interface in the registered entity, sending a request to the
monitoring interface in which a response indicates that the
registered entity is functioning correctly, sending a request to
the monitoring interface in which a response indicates that at
least one service in the registered entity is functioning
correctly, sending an invalid request to the registered entity in
which a selected error is expected, sending a pattern of data to a
port in the registered entity in which a particular pattern is
expected in response to the pattern of data, sending a request to
the registered entity in which a response is expected within a
selected period of time to indicate that the registered entity is
functioning correctly, a program, a PERL script, a RMI client, a
RMI stub, and a binary executable.
67-84. (canceled)
85. A computer program product in a computer readable medium for
providing testing in a distributed data processing system, the
computer program product comprising: first instructions, responsive
to a request from a client for information for a selected service,
for identifying a registered entity providing the selected service;
and second instructions for sending a reply to the client, wherein
the reply includes information identifying the registered entity
providing the selected service and a monitoring method for the
entity, wherein the information is in a computer readable format,
and wherein the information allows the client to monitor the
registered entity providing the selected service.
86-87. (canceled)
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present invention is related to the following
applications entitled: "Method and Apparatus for Automatic Updating
and Testing of Software", Ser. No. ______, attorney docket no.
YOR920020174US1; "Composition Service for Autonomic Computing",
Ser. No. ______, attorney docket no. YOR920020176US1;
"Self-Managing Computing System", Ser. No. ______, attorney docket
no. YOR920020181US1; and "Adaptive Problem Determination and
Recovery in a Computer System", Ser. No. ______, attorney docket
no. YOR920020194US1; all filed even date hereof, assigned to the
same assignee, and incorporated herein by reference.
BACKGROUND OF THE INVENTION
[0002] 1. Technical Field
[0003] The present invention relates generally to an improved
distributed data processing system, and in particular, to a method
and apparatus for monitoring entities in a distributed data
processing system. Still more particularly, the present invention
provides a method and apparatus for identifying and monitoring
entities providing services in a network data processing
system.
[0004] 2. Description of Related Art
[0005] Modern computing technology has resulted in immensely
complicated and ever-changing environments. One such environment is
the Internet, which is also referred to as an "internetwork". The
Internet is a set of computer networks, possibly dissimilar, joined
together by means of gateways that handle data transfer and the
conversion of messages from a protocol of the sending network to a
protocol used by the receiving network. When capitalized, the term
"Internet" refers to the collection of networks and gateways that
use the TCP/IP suite of protocols. Currently, the most commonly
employed method of transferring data over the Internet is to employ
the World Wide Web environment, also called simply "the Web". Other
Internet resources exist for transferring information, such as File
Transfer Protocol (FTP) and Gopher, but have not achieved the
popularity of the Web. In the Web environment, servers and clients
effect data transaction using the Hypertext Transfer Protocol
(HTTP), a known protocol for handling the transfer of various data
files (e.g., text, still graphic images, audio, motion video,
etc.). The information in various data files is formatted for
presentation to a user by a standard page description language. The
Internet also is widely used to transfer applications to users
using browsers. Oftentimes, users of software packages may search
for and obtain updates to those software packages through the
Internet.
[0006] Other types of complex network data processing systems
include those created for facilitating work in large corporations.
In many cases, these networks may span across regions in various
worldwide locations. These complex networks also may use the
Internet as part of a virtual private network for conducting
business. These networks are further complicated by the need to
manage and update software used within the network. Oftentimes,
interaction between different network data processing systems
occurs to facilitate different transactions. These transactions may
include, for example, purchasing and delivery of supplies, parts,
and services. The transactions may occur within a single business
or between different businesses.
[0007] Such environments are made up of many loosely-connected
software components. These software components are also referred to
as "entities". In a modern complex network data processing system,
innumerable situations exist in which a need arises to test or
monitor the operation of another entity, such as a particular
running process or a particular service. Currently, a human
operator must test and monitor the proper functioning of entities,
such as important system services, to detect and correct faults and
failures in these entities. In many cases, a service may depend on
other services for its correct functioning. In this case, it is
important to determine whether those other services are functioning
correctly, in order to take steps or produce alerts when the
services are not functioning correctly. For example, a purchasing
entity used for ordering supplies may infrequently require a
selected component from a particular provider. Although this
component is needed infrequently, it is essential to be able to
obtain the component quickly when the need arises. If the provider
changes its inventory and no longer offers the component or if the
order entity used at the provider to generate the order is
unavailable, it is crucial for the purchasing entity to be able to
locate another service. Currently, a human operator is required to
identify a process to test the order entity to determine whether
the order entity is functioning correctly. In this example, the
order entity is functioning correctly if the order entity offers
the selected component as being available in inventory. After
identifying this process, the human operator must monitor the order
entity.
[0008] Currently, the testing and monitoring of computing entities
is performed primarily on an ad hoc basis. A human operator needing
to monitor a particular service will write a monitoring program for
that service or manually search for such a program that someone
else has written to perform monitoring. The monitoring program will
be deployed and configured manually, and the human operator will
manually inspect its output. In some cases the human operator may
wrap the monitoring program in a shell that will automatically take
some action, such as restarting the service, when a problem is
detected.
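The ad hoc wrapper described above can be pictured with a minimal Python sketch. The names `watch_once`, `probe`, and `restart` are hypothetical and chosen only for illustration; they do not appear in the application, and the sketch is not the operator tooling the background refers to.

```python
from typing import Callable


def watch_once(probe: Callable[[], bool], restart: Callable[[], None]) -> bool:
    """Run one monitoring check and restart the service if the check fails.

    `probe` and `restart` are hypothetical callables supplied by the operator.
    Returns True when the service looked healthy, False when a restart ran.
    """
    try:
        healthy = probe()
    except Exception:
        # A probe that raises is treated the same as a failed check.
        healthy = False
    if not healthy:
        restart()
    return healthy
```

In such an arrangement the operator still has to write or find `probe` by hand for each service, which is the manual step the remainder of the description seeks to remove.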
[0009] Existing maintenance and administration tools such as the
IBM Tivoli Enterprise Console include features such as
administration consoles that display the monitoring status and test
results from a number of different entities, including detected
faults and generated alerts, and allow administrators to specify
actions that should be taken automatically when certain alerts
occur. IBM Tivoli Enterprise Console is available from
International Business Machines Corporation. Standards, such as the
Simple Network Management Protocol (SNMP), specify well-documented
ways of communicating alerts and other system events between
entities. Some modern computing systems, both in hardware and in
software, are designed with testability in mind, and in some cases
either the original manufacturer or one or more third parties
provide specific testing tools or algorithms for testing specific
products.
[0010] Even with these types of maintenance and administration
tools, a human operator is required to identify entities and
methods that are to be used to monitor those entities. Such a
system is time consuming and often may require extensive research
to identify how a service is to be monitored. Therefore, it would
be advantageous to have an improved method, apparatus, and computer
instructions for identifying and monitoring entities providing
services.
SUMMARY OF THE INVENTION
[0011] The present invention provides a method, apparatus, and
computer instructions for providing identification and monitoring
of entities. A distributed data processing system includes one or
more distributed publishing entities, which publish computer
readable announcements in a standard language. These announcements
may contain a description of a monitoring method that may be used
to monitor the behavior of one or more distributed monitored
entities. These announcements also may include information used to
identify a monitoring method that may be used by the distributed
monitored entity to monitor its own behavior or by a distributed
consumer entity to monitor the behavior of the distributed
monitored entity. The monitoring also may be performed by a
third-party distributed monitoring entity.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] The novel features believed characteristic of the invention
are set forth in the appended claims. The invention itself,
however, as well as a preferred mode of use, further objectives and
advantages thereof, will best be understood by reference to the
following detailed description of an illustrative embodiment when
read in conjunction with the accompanying drawings, wherein:
[0013] FIG. 1 depicts a pictorial representation of a network of
data processing systems in which the present invention may be
implemented;
[0014] FIG. 2 is a block diagram of a data processing system that
may be implemented as a server in accordance with a preferred
embodiment of the present invention;
[0015] FIG. 3 is a block diagram illustrating a data processing
system in which the present invention may be implemented;
[0016] FIG. 4 is a diagram illustrating message flows used in
monitoring entities in accordance with a preferred embodiment of
the present invention;
[0017] FIG. 5 is a diagram illustrating message flow used to
monitor entities in which a third-party distributed monitoring
entity is present in accordance with a preferred embodiment of the
present invention;
[0018] FIG. 6 is a flowchart of a process used for identifying and
monitoring an entity in accordance with a preferred embodiment of
the present invention;
[0019] FIG. 7 is a flowchart of a process used by a third-party
distributed monitoring entity to monitor an entity in accordance
with a preferred embodiment of the present invention; and
[0020] FIG. 8 is a diagram illustrating a data structure used in
publishing monitoring methods for an entity in accordance with a
preferred embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0021] With reference now to the figures, FIG. 1 depicts a
pictorial representation of a network of data processing systems in
which the present invention may be implemented. Network data
processing system 100 is a network of computers in which the
present invention may be implemented. Network data processing
system 100 contains a network 102, which is the medium used to
provide communications links between various devices and computers
connected together within network data processing system 100.
Network 102 may include connections, such as wire, wireless
communication links, or fiber optic cables.
[0022] In the depicted example, server 104 is connected to network
102 along with storage unit 106. In addition, clients 108, 110, and
112 are connected to network 102. These clients 108, 110, and 112
may be, for example, personal computers or network computers. In
the depicted example, server 104 provides data, such as boot files,
operating system images, and applications to clients 108-112.
Clients 108, 110, and 112 are clients to server 104. Network data
processing system 100 may include additional servers, clients, and
other devices not shown. Server 104 and clients 108-112 may contain
different distributed entities, which may communicate with each
other through network 102. A "distributed entity" is any entity in
a network data processing system that is able to perform functions,
including without restriction, autonomic elements, agents, brokers,
aggregators, monitors, consumers, suppliers, resellers, and
mediators.
[0023] In the depicted example, network data processing system 100
is the Internet with network 102 representing a worldwide
collection of networks and gateways that use the Transmission
Control Protocol/Internet Protocol (TCP/IP) suite of protocols to
communicate with one another. At the heart of the Internet is a
backbone of high-speed data communication lines between major nodes
or host computers, consisting of thousands of commercial,
government, educational and other computer systems that route data
and messages. Of course, network data processing system 100 also
may be implemented as a number of different types of networks, such
as for example, an intranet, a local area network (LAN), or a wide
area network (WAN). FIG. 1 is intended as an example, and not as an
architectural limitation for the present invention. The mechanism
of the present invention may be implemented in any network data
processing system containing different data processing systems,
which communicate with each other.
[0024] Referring to FIG. 2, a block diagram of a data processing
system that may be implemented as a server, such as server 104 in
FIG. 1, is depicted in accordance with a preferred embodiment of
the present invention. Data processing system 200 may be a
symmetric multiprocessor (SMP) system including a plurality of
processors 202 and 204 connected to system bus 206. Alternatively,
a single processor system may be employed. Also connected to system
bus 206 is memory controller/cache 208, which provides an interface
to local memory 209. I/O bus bridge 210 is connected to system bus
206 and provides an interface to I/O bus 212. Memory
controller/cache 208 and I/O bus bridge 210 may be integrated as
depicted.
[0025] Peripheral component interconnect (PCI) bus bridge 214
connected to I/O bus 212 provides an interface to PCI local bus
216. A number of modems may be connected to PCI local bus 216.
Typical PCI bus implementations will support four PCI expansion
slots or add-in connectors. Communications links to clients 108-112
in FIG. 1 may be provided through modem 218 and network adapter 220
connected to PCI local bus 216 through add-in boards.
[0026] Additional PCI bus bridges 222 and 224 provide interfaces
for additional PCI local buses 226 and 228, from which additional
modems or network adapters may be supported. In this manner, data
processing system 200 allows connections to multiple network
computers. A memory-mapped graphics adapter 230 and hard disk 232
may also be connected to I/O bus 212 as depicted, either directly
or indirectly.
[0027] Those of ordinary skill in the art will appreciate that the
hardware depicted in FIG. 2 may vary. For example, other peripheral
devices, such as optical disk drives and the like, also may be used
in addition to or in place of the hardware depicted. The depicted
example is not meant to imply architectural limitations with
respect to the present invention. The data processing system
depicted in FIG. 2 may be, for example, an IBM eServer pSeries
system, a product of International Business Machines Corporation in
Armonk, N.Y., running the Advanced Interactive Executive (AIX)
operating system or LINUX operating system.
[0028] With reference now to FIG. 3, a block diagram illustrating a
data processing system is depicted in which the present invention
may be implemented. Data processing system 300 is an example of a
client computer, such as client 108 in FIG. 1.
[0029] Data processing system 300 employs a peripheral component
interconnect (PCI) local bus architecture. Although the depicted
example employs a PCI bus, other bus architectures such as
Accelerated Graphics Port (AGP) and Industry Standard Architecture
(ISA) may be used. Processor 302 and main memory 304 are connected
to PCI local bus 306 through PCI bridge 308. PCI bridge 308 also
may include an integrated memory controller and cache memory for
processor 302. Additional connections to PCI local bus 306 may be
made through direct component interconnection or through add-in
boards. In the depicted example, local area network (LAN) adapter
310, SCSI host bus adapter 312, and expansion bus interface 314 are
connected to PCI local bus 306 by direct component connection. In
contrast, audio adapter 316, graphics adapter 318, and audio/video
adapter 319 are connected to PCI local bus 306 by add-in boards
inserted into expansion slots. Expansion bus interface 314 provides
a connection for a keyboard and mouse adapter 320, modem 322, and
additional memory 324. Small computer system interface (SCSI) host
bus adapter 312 provides a connection for hard disk drive 326, tape
drive 328, and CD-ROM drive 330. Typical PCI local bus
implementations will support three or four PCI expansion slots or
add-in connectors.
[0030] An operating system runs on processor 302 and is used to
coordinate and provide control of various components within data
processing system 300 in FIG. 3. The operating system may be a
commercially available operating system, such as Windows XP, which
is available from Microsoft Corporation. Instructions for the
operating system and applications or programs are located on
storage devices, such as hard disk drive 326, and may be loaded
into main memory 304 for execution by processor 302.
[0031] Those of ordinary skill in the art will appreciate that the
hardware in FIG. 3 may vary depending on the implementation. Other
internal hardware or peripheral devices, such as flash read-only
memory (ROM), equivalent nonvolatile memory, or optical disk drives
and the like, may be used in addition to or in place of the
hardware depicted in FIG. 3. Also, the processes of the present
invention may be applied to a multiprocessor data processing
system.
[0032] The depicted example in FIG. 3 and above-described examples
are not meant to imply architectural limitations. For example, data
processing system 300 also may be a notebook computer or hand held
computer in addition to taking the form of a PDA. Data processing
system 300 also may be a kiosk or a Web appliance.
[0033] The present invention provides an improved method,
apparatus, and computer instructions for identifying and monitoring
services provided by entities in a network data processing system.
In particular, the mechanism of the present invention takes
advantage of standards, such as Web Services Description Language
(WSDL) and systems such as Universal Description, Discovery, and
Integration (UDDI), which allow a program to locate entities that
offer particular services and to automatically determine how to
communicate and conduct transactions with those services. WSDL is a
proposed standard being considered by the World Wide Web Consortium,
authored by representatives of companies, such as International
Business Machines Corporation, Ariba, Inc., and Microsoft
Corporation. UDDI version 3 is the current specification being used
for Web service applications and services. Future development and
changes to UDDI will be handled by the Organization for the
Advancement of Structured Information Standards (OASIS). The
mechanism of the present invention uses these standards to publish
additional information not normally provided. This information
includes an identification of a method or process that may be used
to monitor an entity. The monitoring may include, for example,
testing the entity to determine whether the service is functioning.
The monitoring may be performed by the client that uses the entity
providing the service or by the entity itself to test its own
functionality and availability. This information also may be used
by a third-party monitoring entity to monitor the entity for the
client.
[0034] Turning now to FIG. 4, a diagram illustrating message flows
used in monitoring entities is depicted in accordance with a
preferred embodiment of the present invention. The message flow in
FIG. 4 flows between distributed monitored entity 400, distributed
publishing entity 402, and distributed consumer entity 404. A
distributed monitored entity is a distributed entity for which
there exists at least one method or algorithm that can be used to
establish, with some probability, that at least some portion of the
entity is working correctly or is able to carry out at least one
function. A distributed publishing entity is a distributed entity
that publishes or otherwise makes available certain information in
such a way that it can be accessed by at least one distributed
entity. A distributed consumer entity is a distributed entity,
which depends upon at least one other distributed entity in order
to properly, or optimally, perform its functions. In these
examples, the entities are software components or processes
executing on a data processing system, such as data processing
system 200 in FIG. 2 or data processing system 300 in FIG. 3.
Depending on the particular implementation, these entities may all
be located on different data processing systems or some or all of
the entities may be located on the same data processing system.
[0035] In this example, distributed monitored entity 400 sends
registration message 406 to distributed publishing entity 402,
which functions as a directory service. Information about
registered entities may be stored in directory 408. Directory 408
allows distributed publishing entity 402 to provide information
about distributed monitored entity 400 as well as other entities
registered with distributed publishing entity 402. In particular,
directory 408 provides a mechanism to allow searching for
registered entities matching selected criteria. In these examples,
the selected criterion is a selected service. Other criteria may
include, for example, a geographic location of the computer on
which a distributed monitored entity is located or a particular
protocol used to communicate with a distributed monitored entity.
Registration message 406 includes both information about the
services provided by distributed monitored entity 400 and
information about how distributed monitored entity 400 may be
automatically monitored for proper operation. This registration
information may include a description of a monitoring method used
to monitor distributed monitored entity 400. For example,
distributed monitored entity 400 may include a monitoring interface
specifically designed to enable monitoring of the entity. For
example, the monitoring method may describe the particular commands
and parameters to initiate a test to monitor distributed monitored
entity 400. The interface may simply accept a request and provide a
response indicating that it is able to respond to requests. The
interface may be more complex and generate a data stream
continuously or on some periodic basis based on some request sent
to its monitoring interface. Distributed monitored entity 400 may
be verified as functioning correctly if the data stream is received
or if specific data is returned in the data stream. The response
may be data generated from the particular service or set of
services requested by the client, distributed consumer entity 404.
Alternatively, the request sent may be an invalid request in which
an expected error message is to be received. In another type of
monitoring method, a particular universal resource endpoint on
distributed monitored entity 400 may be provided to which a Simple
Object Access Protocol (SOAP) request may be sent. In response to
this request, a particular reply may be specified as one that is to
be expected if distributed monitored entity 400 is functioning
correctly. Another monitoring method may involve sending a
particular pattern of data to a specified port in distributed
monitored entity 400. A response to this data pattern should have
some selected corresponding pattern if distributed monitored entity
400 is functioning correctly. In another monitoring method, a
particular request or class of requests may be sent to distributed
monitored entity 400 with a reply being received within a selected
period of time if distributed monitored entity 400 is functioning
correctly. This specific period of time may be specified in the
request that is sent. In other cases, the monitoring method may be
a particular program or program fragment that is to be used to test
distributed monitored entity 400. This type of program may take
various forms, such as, for example, a Practical Extraction and
Report Language (Perl) script, a Remote Method Invocation (RMI)
client, an RMI stub class, or a binary executable. Of course, other
types of monitoring methods may be used depending on the particular
implementation.
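By way of illustration only, several of the monitoring methods
described above (a simple request and reply, an invalid request with
an expected error, and a reply required within a selected period of
time) can be sketched as a small data record plus a check. The names
MonitoringMethod, run_test, and fake_entity, and the "ping"/"pong"
strings, are assumptions made for this sketch and are not taken from
the specification.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class MonitoringMethod:
    """Hypothetical description of one test published for an entity."""
    request: str                         # request sent to the monitoring interface
    expected_reply: str                  # reply that indicates correct operation
    max_seconds: Optional[float] = None  # optional deadline for the reply


def run_test(method: MonitoringMethod,
             send: Callable[[str], Tuple[str, float]]) -> bool:
    """Send the request and compare the reply with the description.

    `send` stands in for the transport and returns (reply, elapsed_seconds).
    """
    reply, elapsed = send(method.request)
    if reply != method.expected_reply:
        return False
    if method.max_seconds is not None and elapsed > method.max_seconds:
        return False
    return True


def fake_entity(request: str) -> Tuple[str, float]:
    """Stand-in entity: answers "ping" and rejects anything else."""
    if request == "ping":
        return ("pong", 0.05)
    return ("ERROR: bad request", 0.05)


# A liveness test with a deadline, and an invalid request whose
# expected reply is the error message itself.
liveness = MonitoringMethod("ping", "pong", max_seconds=1.0)
invalid = MonitoringMethod("bogus", "ERROR: bad request")
results = (run_test(liveness, fake_entity), run_test(invalid, fake_entity))
```

Keeping the method as data rather than code is one possible design
choice; it allows the same description to be stored in a directory
and executed by whichever entity performs the monitoring.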
[0036] Further, information about how distributed monitored entity
400 may be automatically monitored for proper operation may be sent
to distributed publishing entity 402 for entry into directory 408
by an entity other than distributed monitored entity 400. For
example, this registration information may be sent through a
testing expert agent or even a human operator inputting data. In
some cases, the information about how distributed monitored entity
400 may be monitored is retained at distributed monitored entity
400 and not entered into directory 408. In this case, the client,
such as distributed consumer entity 404, would obtain the method
for monitoring distributed monitored entity 400 directly from
distributed monitored entity 400. Directory 408 includes
identifications of entities, and services provided by entities, as
well as monitoring methods for monitoring an entity. This directory
also may include other information, such as, for example,
distributed monitored entities currently being monitored,
previously monitored, or expected to be monitored in the
future.
[0037] Later, when distributed consumer entity 404 needs to locate
an entity to provide a particular service, this entity sends query
message 410 to distributed publishing entity 402. Alternatively,
this query may be a broadcast message that is received by a number
of distributed publishing entities with one or more of these
entities providing a reply. Distributed publishing entity 402
locates the appropriate service in directory 408 and returns
information about entities providing the particular service in
reply message 412 to distributed consumer entity 404. This reply
may contain information about a number of different entities that
provide the particular service. If more than one entity is included
in reply message 412, distributed consumer entity 404 may select
one or more of these entities with which to operate or communicate.
In this example, distributed consumer entity 404 selects
distributed monitored entity 400. The information in reply message
412 includes information on how to contact distributed monitored
entity 400 as well as how this entity may be monitored for proper
operation. This information may include at least one description of
a monitoring method that may be applied to distributed monitored
entity 400. The monitoring method may be a process that is
initiated on distributed monitored entity 400. The process in the
monitoring method may be
located within distributed monitored entity 400 or at another
location, such as at distributed consumer entity 404. Depending on
the particular implementation, the monitoring information may be
excluded from reply message 412 with this monitoring information
being obtained directly from distributed monitored entity 400 by
distributed consumer entity 404.
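The registration and query exchange described above can be sketched,
under stated assumptions, as a small in-memory directory. The
Directory class, its register and query methods, and the record
fields are hypothetical names chosen for illustration; the
specification does not prescribe a storage layout for directory 408.

```python
from collections import defaultdict


class Directory:
    """Toy stand-in for directory 408: maps a service to its providers."""

    def __init__(self):
        self._providers = defaultdict(dict)  # service -> {entity_id: record}

    def register(self, entity_id, service, contact, monitoring=None):
        # `monitoring` may be omitted when the entity retains that
        # information itself instead of entering it into the directory.
        self._providers[service][entity_id] = {
            "entity": entity_id,
            "contact": contact,
            "monitoring": monitoring,
        }

    def query(self, service):
        # The reply lists every registered provider of the service.
        return list(self._providers.get(service, {}).values())


directory = Directory()
directory.register("entity-400", "translation", "host-a:8080",
                   monitoring={"request": "ping", "expected_reply": "pong"})
directory.register("entity-401", "translation", "host-b:8080")

reply = directory.query("translation")
chosen = reply[0]  # the consumer may select any one or more providers
```

A record whose monitoring field is None corresponds to the case in
which the consumer would have to obtain the monitoring method
directly from the monitored entity.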
[0038] The entities identified and the monitoring methods provided
may be based on particular service level agreements or other
agreements between the different entities. For example, the
monitoring method used to monitor distributed monitored entity 400
may be provided by distributed publishing entity 402 to distributed
consumer entity 404 based on some service level agreement or other
agreement established between distributed monitored entity 400 and
distributed publishing entity 402.
[0039] Distributed consumer entity 404 contacts distributed
monitored entity 400 after receiving reply message 412 to initiate
functional operations 414 using methods and protocols known in the
art, such as, for example, WSDL and UDDI. This contact is initiated
to allow distributed consumer
entity 404 to use a service or services offered by distributed
monitored entity 400. Distributed consumer entity 404 also performs
monitoring operations 416 with distributed monitored entity 400 to
verify that distributed monitored entity 400 continues to operate
properly. These monitoring operations are described in reply
message 412 in these examples. The monitoring operations may be
initiated through different events, such as a periodic event or a
non-periodic event. The periodic event may be an expiration of a
timer that triggers the monitoring operation. The non-periodic
event may be, for example, initiation of a selected operation, such
as a purchase order by distributed consumer entity 404. In these
examples, monitoring operations 416 apply a method, such as one or
more tests, that may be performed on distributed monitored entity
400. If one or more tests fail during monitoring, distributed
consumer entity 404 may take corrective actions. These corrective
actions may include, for example, performing further diagnostic
tests to determine a cause of the failure, notifying a human
administrator of the test failure, notifying another distributed
entity that a problem exists, contacting distributed publishing
entity 402 to identify a replacement for distributed monitored
entity 400, attempting to restart distributed monitored entity 400,
or executing a selected sequence of actions specified within the
testing method identified in reply message 412.
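As a minimal sketch of the triggering and corrective behavior just
described, the following assumes that each test is a callable
returning True on success and that corrective actions are callables
applied in order. The names monitor_once and PeriodicTrigger, and the
simulated clock readings, are illustrative assumptions only.

```python
def monitor_once(tests, corrective_actions):
    """Run every test; if any fails, apply the corrective actions in order.

    `tests` maps a test name to a zero-argument callable that returns
    True on success. Returns the list of failed test names.
    """
    failed = [name for name, test in tests.items() if not test()]
    if failed:
        for action in corrective_actions:
            action(failed)
    return failed


class PeriodicTrigger:
    """Fires when `interval` seconds have passed since the last firing."""

    def __init__(self, interval, start=0.0):
        self.interval = interval
        self.last = start

    def due(self, now):
        if now - self.last >= self.interval:
            self.last = now
            return True
        return False


log = []
tests = {"liveness": lambda: True, "translate": lambda: False}
actions = [lambda failed: log.append(("notify-admin", tuple(failed)))]

# Periodic event: a timer expiring every 30 seconds (simulated clock).
trigger = PeriodicTrigger(interval=30.0)
for now in (10.0, 30.0, 45.0):
    if trigger.due(now):
        monitor_once(tests, actions)

# Non-periodic event: test just before a selected operation,
# such as placing a purchase order.
monitor_once(tests, actions)
```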
[0040] In another embodiment of the present invention, distributed
consumer entity 404 carries out the testing operations described in
reply message 412 to verify the proper operation of distributed
monitored entity 400 before distributed consumer entity 404 begins
functional operations 414 with distributed monitored entity 400. In
another embodiment of this invention, distributed consumer entity
404 carries out the testing operations described in reply message
412 only after a service level agreement or other agreement is in
place between distributed monitored entity 400 and distributed
consumer entity 404, and the testing operations are responsive to
that agreement. In one possible embodiment, the testing operations
are used to verify that the service provided by distributed
monitored entity 400 is within the response-time range specified in
the relevant agreement.
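A response-time check of the kind mentioned above might look like the
following sketch. The function name, the injectable clock, and the
0.5 second limit are assumptions for illustration; an actual
agreement would supply its own limit.

```python
import time


def within_agreed_response_time(call, max_seconds, clock=time.monotonic):
    """Time one call to the service and compare it with the agreed limit.

    `call` is a zero-argument callable that performs the request.
    Returns (ok, elapsed_seconds).
    """
    start = clock()
    call()
    elapsed = clock() - start
    return elapsed <= max_seconds, elapsed


# Deterministic demonstration: a fake clock that reads 0.0 and then 0.2.
ticks = iter([0.0, 0.2])
ok, elapsed = within_agreed_response_time(
    lambda: None, max_seconds=0.5, clock=lambda: next(ticks))
```

Passing the clock as a parameter is only a convenience for testing;
by default the sketch uses time.monotonic, which is not affected by
changes to the system clock.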
[0041] In the preferred embodiment of the present invention,
distributed monitored entity 400 implements at least one monitoring
interface specifically designed to enable monitoring operations
416, initiated by distributed consumer entity 404 to monitor
distributed monitored entity 400. In other embodiments of the
present invention, monitoring operations 416 initiated by
distributed consumer entity 404 to monitor distributed monitored
entity 400 include sending an invalid request to distributed
monitored entity 400 and verifying that the expected error
indication is received. In still other embodiments, monitoring
operations 416 may include requesting that distributed monitored
entity 400 generate a continuous or periodic stream of messages
directed to distributed publishing entity 402 and verifying that
the stream of messages continues to arrive as expected.
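Verifying that a periodic stream of messages continues to arrive can
be sketched as a check on the gaps between arrival times. The
function name and the tolerance parameter are assumptions introduced
here; the specification does not state how much lateness is
acceptable.

```python
def stream_is_healthy(arrival_times, period, now, tolerance=0.5):
    """Check that messages keep arriving roughly every `period` seconds.

    `arrival_times` is a sorted list of receipt timestamps and `now`
    is the current time. Any gap longer than period * (1 + tolerance),
    including the gap since the last message, counts as a failure.
    """
    if not arrival_times:
        return False
    limit = period * (1 + tolerance)
    points = list(arrival_times) + [now]
    return all(later - earlier <= limit
               for earlier, later in zip(points, points[1:]))


on_time = stream_is_healthy([0, 10, 20], period=10, now=25)
stalled = stream_is_healthy([0, 10, 20], period=10, now=40)
```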
[0042] In another embodiment of the present invention, distributed
monitored entity 400 queries directory 408 in distributed
publishing entity 402 to obtain information about how distributed
monitored entity 400 may be monitored for proper operation. This
information is used by distributed monitored entity 400 to monitor
its own operation, for self-diagnostic purposes. In yet another
embodiment of this invention, distributed consumer entity 404
receives the information about how distributed monitored entity 400
may be monitored for proper operation from distributed monitored
entity 400 itself, rather than from distributed publishing entity
402.
[0043] With reference now to FIG. 5, a diagram illustrating message
flow used to monitor entities in which a third-party distributed
monitoring entity is present is depicted in accordance with a
preferred embodiment of the present invention. In this example, the
monitoring of an entity involves distributed monitored entity 500,
which is the entity being monitored, third-party distributed
monitoring entity 502, distributed publishing entity 504, and
distributed consumer entity 506. In some cases, distributed
consumer entity 506 may rely on services provided by another
entity, such as distributed monitored entity 500, for correct or
optimal functioning. In some instances, distributed consumer entity
506 may be unable to perform monitoring functions. Third-party
monitoring also may be used for efficiency reasons. As a result,
another entity, such as third-party distributed monitoring entity
502, may be employed to provide the monitoring function.
[0044] A distributed monitoring entity is a distributed entity that
makes use of at least one technique or algorithm to establish, with
some probability, that at least some portion of a distributed
monitored entity is working correctly, or is able to carry out at
least one function. A third-party distributed monitoring entity is
a distributed monitoring entity that potentially monitors at least
one distributed monitored entity upon which the monitoring entity
does not itself depend for proper, or optimal, performance of its
own functions. Additionally, third-party distributed monitoring
entity 502 also may accept requests from entities other than
distributed consumer entity 506 to monitor distributed monitored
entity 500 or other entities. In
other words, third-party distributed monitoring entity 502 may
provide monitoring for multiple clients and multiple distributed
monitored entities. A fee may be charged for
monitoring services provided by third-party distributed monitoring
entity 502. Further, the type of monitoring, the monitoring
method, or the parameters used in monitoring may be changed or
modified based on input received from another entity, such as, for
example, distributed publishing entity 504. A modification may
include, for example, changing the entities that are to be notified
of the results of monitoring distributed monitored entity 500.
In this example, distributed monitored entity 500
sends registration message 508 to distributed publishing entity
504, which functions as a directory service. Information about
distributed monitored entity 500 as well as information about
third-party distributed monitoring entity 502 may be stored in
directory 510 within distributed publishing entity 504. In this
example, registration message 508 contains information about the
monitoring method or methods that may be performed on distributed
monitored entity 500. In this example, third-party distributed
monitoring entity 502 sends registration message 512 to register
itself with distributed publishing entity 504 as an entity capable
of performing monitoring operations on entities, such as
distributed monitored entity 500. Registration message 512
identifies the type of monitoring that may be performed by
third-party distributed monitoring entity 502. This information
also may identify entities with which monitoring may be
performed.
[0045] Depending on the particular implementation, third-party
distributed monitoring entity 502 also may include information in
registration message 512 to register monitoring information about
distributed monitored entity 500 with distributed publishing entity
504. Additionally, directory 510 also may contain information about
third-party distributed monitoring entities currently providing
monitoring services, third-party distributed monitoring entities
that have previously provided monitoring services, and third-party
distributed monitoring entities expected to provide monitoring
services.
[0046] In some embodiments of this invention, distributed
publishing entity 504 or another distributed publishing entity may
provide information about which third-party distributed monitoring
entities have in the past monitored distributed monitored entity
500 and other distributed monitored entities, and/or about which
third-party distributed monitoring entities are likely to do so in
the future. Distributed consumer entities, such as distributed
consumer entity 506, may wish to use this information in
determining which of several possible third-party distributed
monitoring entities to use. The theory may be, for instance, that a
third-party distributed monitoring entity that has been used for
this purpose in the past may be expected to be able to do so at
present, or that an entity that has indicated that it is likely to
do so in the future may be more prepared to do so now. Later, when
distributed consumer entity 506 desires to locate
an entity to provide a particular service, this entity sends query
message 514 to distributed publishing entity 504. In response to
receiving query message 514, distributed publishing entity 504
identifies entities that can provide services specified in query
message 514. Information about these entities is returned to
distributed consumer entity 506 in reply message 516. This reply
contains information about entities, such as distributed monitored
entity 500. Further, the information returned in reply message 516
to distributed consumer entity 506 also includes, in this example,
information about automatically monitoring distributed monitored
entity 500 for proper operation. In response to receiving reply
message 516, distributed consumer entity 506 may select one or more
entities with which to operate and communicate. In this example,
the entity is distributed monitored entity 500.
[0047] In addition, distributed consumer entity 506 sends query
message 518 to distributed publishing entity 504 in which query
message 518 requests information about third-party distributed
monitoring entities capable of performing monitoring operations
described in reply message 516. In response to receiving query
message 518, distributed publishing entity 504 identifies one or
more third-party distributed monitoring entities that can perform
monitoring operations on distributed monitored entity 500. As
described above, these monitoring operations may take various
forms, such as tests or methods that may be executed on an entity
to determine whether the entity is properly operating. This
information is returned to distributed consumer entity 506 in reply
message 520. Based on this information, distributed consumer entity
506 selects one or more third-party distributed monitoring entities
for use in monitoring distributed monitored entity 500.
[0048] Thereafter, distributed consumer entity 506 contacts
distributed monitored entity 500 and initiates functional
operations 522 to avail itself of services offered by distributed
monitored entity 500. Distributed consumer entity 506 also contacts
third-party distributed monitoring entity 502 using request 524 to
request monitoring of distributed monitored entity 500. The
monitoring requested is for operations as described in reply
message 516 received from distributed publishing entity 504. For
example, the operations may specify a monitoring method that is to
be applied to distributed monitored entity 500. Alternatively, if
the monitoring method is not specified in request 524, the
information in this request may include information as to how a
monitoring method may be identified. In this instance, third-party
distributed monitoring entity 502 may identify a monitoring method
for use in monitoring distributed monitored entity 500 by examining
published information, such as that provided in directory 510
within distributed publishing entity 504. This request also may
include any certificates, verification information, or delegation
instruments required for third-party distributed monitoring entity
502 to carry out monitoring operations on distributed monitored
entity 500 on behalf of distributed consumer entity 506. As a
result, third-party distributed monitoring entity 502 carries out
monitoring operations 526 on distributed monitored entity 500.
Depending on the results, third-party distributed monitoring entity
502 takes actions, which may include sending notification 528 to
distributed consumer entity 506 if one or more tests performed in
the monitoring operations suggest the existence of a problem or
failure in distributed monitored entity 500. The failure of
particular interest is a failure of the service desired by
distributed consumer entity 506. Other services provided by
distributed monitored entity 500 may not be tested, or failures in
those services may not trigger notification 528. Depending on the
particular implementation, third-party distributed monitoring
entity 502 may charge distributed consumer entity 506 a fee for the
monitoring service.
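The delegation described above, in which a consumer asks a third
party to watch one service and is notified only about failures of
that service, can be sketched as follows. The ThirdPartyMonitor
class, its accept and run methods, and the simulated status table are
hypothetical and serve only to illustrate the message flow.

```python
class ThirdPartyMonitor:
    """Toy third-party monitor that tests entities on behalf of clients."""

    def __init__(self):
        self.requests = []  # (consumer, entity, service, test, notify)

    def accept(self, consumer, entity, service, test, notify):
        # Loosely corresponds to request 524: who is asking, what to
        # watch, and how to test it.
        self.requests.append((consumer, entity, service, test, notify))

    def run(self):
        # Only services named in a request are tested, so a failure in
        # any other service never produces a notification.
        for consumer, entity, service, test, notify in self.requests:
            if not test(entity, service):
                notify(consumer, entity, service)  # loosely, notification 528


# Simulated state of the monitored entity: the "quotes" service is down.
status = {("entity-500", "translation"): True,
          ("entity-500", "quotes"): False}


def probe(entity, service):
    return status[(entity, service)]


sent = []


def notify(consumer, entity, service):
    sent.append((consumer, entity, service))


monitor = ThirdPartyMonitor()
monitor.accept("consumer-506", "entity-500", "translation", probe, notify)
monitor.run()  # nothing is sent: nobody asked to watch "quotes"
```

Because the same monitor object can hold requests from many
consumers, the sketch also reflects the possibility of one
third-party entity serving multiple clients.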
[0049] In another embodiment of the present invention, third-party
distributed monitoring entity 502 contacts distributed publishing
entity 504 to obtain information about testing operations to be
performed on distributed monitored entity 500. In still other
embodiments, third-party distributed monitoring entity 502 contacts
distributed monitored entity 500 itself for that information. In
still other embodiments, third-party distributed monitoring entity
502 may infer an appropriate method for monitoring distributed
monitored entity 500 by examining other information about that
entity, derived from distributed publishing entity 504 or from
other sources.
[0050] In yet other embodiments of the present invention,
third-party distributed monitoring entity 502 publishes or
otherwise makes available information concerning which distributed
monitored entities third-party distributed monitoring entity 502 is
already monitoring. With publication of this type of information,
distributed consumer entities, such as distributed consumer entity
506, may elect to request monitoring services of a third-party
distributed monitoring entity that is already engaged in monitoring
a given distributed monitored entity, such as distributed monitored
entity 500, to receive a discount on the price charged or for the
sake of efficiency.
[0051] Turning now to FIG. 6, a flowchart of a process used for
identifying and monitoring an entity is depicted in accordance with
a preferred embodiment of the present invention. The process
illustrated in FIG. 6 may be implemented in a client, such as
distributed consumer entity 404 in FIG. 4.
[0052] The process begins by identifying a need for a service
(block 600). A list of providers and monitoring methods is
requested from a distributed publishing entity, such as distributed
publishing entity 402 in FIG. 4 (block 602). A list of providers
and monitoring methods is received from the distributed publishing
entity (block 604). One distributed monitored entity is picked from
the list as the provider, and its monitoring method is stored (block
606). Depending on the particular implementation, more than one
entity may be selected. An agreement is formed with the selected
distributed monitored entity to provide services to the
distributed consumer entity (block 608). This agreement may be
formed using various automated negotiation protocols or methods
currently employed. A determination is made as to whether the
agreement terminates (block 610). This agreement may terminate
under various conditions specified in the agreement. For example,
the agreement may terminate after a set amount of time, after a set
amount of time without the agreement being renewed, after some
number of transactions, at the initiation of either party, or based
on some market condition being present, such as the price of a good
or service being above or below some selected value. Another
trigger for termination of the agreement may be the failure of a
test announced using the present invention. With this type of
failure, the process would return to the beginning of FIG. 6 at
block 600. If the agreement does not terminate, the client operates
with the distributed monitored entity (block 612). These operations
may vary depending on the services being provided by the
distributed monitored entity to the distributed consumer entity.
The operations may include, for example, language translation,
stock market quotes, news updates, mathematical calculations,
storage and retrieval of binary data, database searches, provision
of content such as streaming audio or video, and weather
prediction.
[0053] Next, a determination is made as to whether the monitoring
method requires a test of the distributed monitored entity (block
614). If the monitoring method does require a test, the test
request is sent to the distributed monitored entity (block 616),
and a reply is received from the distributed monitored entity
(block 618). A determination is made as to whether the reply is as
specified in the monitoring method (block 620). Basically, block
620 is used to determine whether the distributed monitored entity
is performing as expected or an error or failure has occurred. If
the reply is not as specified in the monitoring method, corrective
action is taken (block 622) and the process returns to block 610 as
described above. In some cases, depending on the failure and the
success of the corrective action, the process may be unable to
return to block 610. Such a case may occur if the distributed
monitored entity has suffered a serious failure and corrective
action has failed to fix the failure. In this case, the distributed
monitored entity may be unable to operate normally.
[0054] The corrective action performed in block 622 may take
various forms, including, for example, restarting the distributed
monitored entity, sending a notification to a human operator,
selecting another distributed monitored entity, terminating
execution of the distributed monitored entity, or generating an
entry in a log file. A restorative or corrective action taken may
be a particular message sent when the results of testing match
selected criteria. For example, if no response is received
from application of the monitoring method, the message may indicate
that the entity is unavailable. If an error is returned in response
to the testing, the message may indicate that the entity is
functioning improperly. These messages may be sent to various
entities, including, for example, the distributed consumer entity
requesting the monitoring and the distributed publishing entity at
which the distributed monitored entity is registered. Further, the
corrective action also may include executing a program or process
in response to testing matching selected criteria. For example, the
corrective action might consist of starting a standard
problem-determination program or process, giving it parameters
containing sufficient information to identify the entity that
failed the test and the nature of the test that was failed. The
distributed consumer entity may determine which corrective measures
to try by consulting its own internal policies for such a case.
Alternatively, the distributed consumer entity may find that
information bundled with the test-method information that it
received from the distributed publishing entity. Another corrective
action may include taking an action with
respect to the distributed monitored entity that is likely to break
an internal deadlock within that entity when the results of testing
match selected criteria. With respect to breaking an internal
deadlock, a request may be sent to the platform on which the
distributed monitored entity is running. This request may be one
asking the platform to terminate any thread of the distributed
monitored entity that has been waiting for a lock for some selected
period of time. Criteria initiating this corrective action may be
based on any policy desired for the particular implementation.
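The message-based corrective action described above, in which the
message depends on how the test failed, can be sketched as a simple
classification. The function names and status strings are
assumptions; in particular, this sketch treats any reply that differs
from the expected one as an error, which is a simplification.

```python
def classify_outcome(reply, expected):
    """Map a test result to a status; `reply` is None if nothing came back."""
    if reply is None:
        return "unavailable"
    if reply != expected:
        return "functioning improperly"
    return "ok"


def corrective_messages(entity, reply, expected, recipients):
    """Build (recipient, text) pairs for a failed test.

    Returns an empty list when the test passed.
    """
    status = classify_outcome(reply, expected)
    if status == "ok":
        return []
    return [(recipient, f"{entity} is {status}") for recipient in recipients]


messages = corrective_messages(
    "entity-400", None, "pong", ["consumer-404", "publisher-402"])
```

The recipients here stand for the entities named in the text, such as
the distributed consumer entity that requested the monitoring and the
distributed publishing entity at which the monitored entity is
registered.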
[0055] With reference again to block 620, if the reply is as
specified in the monitoring method, the process returns to block
610 as described above. Turning again to block 614, if the
monitoring method does not require a test, the process returns to
block 610 as described above. With reference again to block 610, if
the agreement terminates, the process terminates.
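The loop formed by blocks 606 through 622 of FIG. 6 can be sketched
as follows, assuming that the directory reply has already been
received (blocks 600 through 604) and that the agreement of block 608
is represented by a callable reporting whether it is still in force.
All parameter names and the record layout are hypothetical.

```python
def consumer_process(directory_reply, agreement_active, needs_test,
                     send_test, operate, corrective_action):
    """Follow blocks 606-622 of FIG. 6 for one selected provider.

    Every argument except `directory_reply` is a caller-supplied
    callable standing in for a step left to the implementation.
    """
    provider = directory_reply[0]               # block 606: pick a provider
    method = provider["monitoring"]             # and store its method
    while agreement_active():                   # block 610
        operate(provider)                       # block 612
        if not needs_test():                    # block 614
            continue
        reply = send_test(provider, method["request"])  # blocks 616-618
        if reply != method["expected_reply"]:   # block 620
            corrective_action(provider, reply)  # block 622


# Simulated run: the agreement lasts three rounds and the second
# test returns an unexpected reply.
rounds = iter([True, True, True, False])
replies = iter(["pong", "garbage", "pong"])
events = []
consumer_process(
    directory_reply=[{"entity": "entity-400",
                      "monitoring": {"request": "ping",
                                     "expected_reply": "pong"}}],
    agreement_active=lambda: next(rounds),
    needs_test=lambda: True,
    send_test=lambda provider, request: next(replies),
    operate=lambda provider: events.append("operate"),
    corrective_action=lambda provider, reply: events.append("correct"),
)
```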
[0056] With reference now to FIG. 7, a flowchart of a process used
by a third-party distributed monitoring entity to monitor an entity
is depicted in accordance with a preferred embodiment of the
present invention. The process illustrated in FIG. 7 may be
implemented in a third-party distributed monitoring entity, such as
third-party distributed monitoring entity 502 in FIG. 5.
[0057] The process begins by registering with a distributed
publishing entity (block 700). In block 700, the third-party
distributed monitoring entity sends information about monitoring
operations that this entity may perform. The information also may
identify particular entities that may be monitored. Thereafter, the
process waits for requests (block 702). In block 702, the requests
waited for are those from an entity, such as those from a
distributed consumer entity desiring monitoring of an entity
providing a service. A request is received from the distributed
consumer entity to monitor the distributed monitored entity by a
particular monitoring method (block 704). An agreement is formed
with the distributed consumer entity (block 706). This agreement
may be reached through any presently known or used negotiation
protocol. For instance, the distributed consumer entity may propose
one of a set of standard monitoring agreements to the third-party
distributed monitoring entity, and the latter may accept the
proposal. Alternatively, the monitoring agreement may be formed by
any of various automated negotiation protocols or other methods
known to the art. In any case, as part of the agreement, the
distributed consumer entity will provide the third-party
distributed monitoring entity with information sufficient to allow
it to perform the requested monitoring of the distributed monitored
entity.
[0058] Thereafter, a determination is made as to whether the
agreement terminates (block 708). Various factors, as discussed
above, may cause the agreement to terminate. The most common factor
is typically time. If the agreement does not terminate, a
determination is made as to whether the monitoring method
identified for the distributed monitored entity requires a test
(block 710). If the monitoring method does require a test, the test
request is sent to the distributed monitored entity and a reply is
received (block 712).
[0059] Next, a determination is made as to whether the results in
the reply are as specified in the monitoring method (block 716). If
the results in the reply are not as specified in the monitoring
method, a notification is sent to the distributed consumer entity
(block 718) and the process returns to block 708 as described
above. With reference again to block 716, if the results in the
reply are as specified in the monitoring method, the process returns to
block 708 as described above. Turning again to block 710, if the
monitoring method does not require a test yet, the process returns
to block 708 as described above. With reference again to block 708,
if the agreement terminates, the process terminates.
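A corresponding sketch of blocks 704 through 718 of FIG. 7 follows.
It assumes that registration and agreement formation (blocks 700,
702, and 706) have already taken place, that the agreement ends at a
fixed time, and that a test is due whenever the time is a multiple of
a hypothetical "every" field. None of these details are specified in
the text.

```python
def third_party_process(request, agreement_expires_at, clock,
                        send_test, notify):
    """Follow blocks 704-718 of FIG. 7 for a single monitoring request.

    `clock` yields successive integer timestamps.
    """
    method = request["method"]                          # block 704
    for now in clock:
        if now >= agreement_expires_at:                 # block 708
            break
        if now % method["every"] != 0:                  # block 710: no test yet
            continue
        reply = send_test(request["entity"], method["request"])  # block 712
        if reply != method["expected_reply"]:           # block 716
            notify(request["consumer"], request["entity"], now)  # block 718


# Simulated run: tests fall due at times 0, 10, 20, and 30, and the
# third reply is not the expected one.
replies = iter(["pong", "pong", "timeout", "pong"])
notices = []
third_party_process(
    request={"consumer": "consumer-506", "entity": "entity-500",
             "method": {"request": "ping", "expected_reply": "pong",
                        "every": 10}},
    agreement_expires_at=35,
    clock=range(0, 100),
    send_test=lambda entity, request: next(replies),
    notify=lambda consumer, entity, now: notices.append(
        (consumer, entity, now)),
)
```

Unlike the consumer process of FIG. 6, a failed test here results in
a notification to the consumer rather than a corrective action taken
by the monitoring entity itself.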
[0060] With reference now to FIG. 8, a diagram illustrating a data
structure used in publishing monitoring methods for an entity is
depicted in accordance with a preferred embodiment of the present
invention. Data structure 800 is an example of a data structure
that may be used to provide information to an entity, such as, for
example, a distributed consumer entity or a third-party distributed
monitoring entity. Section 802 contains lines describing an
operation that may be performed on a language-translation service
to determine whether or not it is correctly performing its basic
functions, and the reply that will be received from that operation
if the service is correctly performing its basic functions. Section
804 contains lines stating that a Web service port of a particular
service port type may be tested using the operation and expected
reply described in section 802.
[0061] Prior art methods send data structures such as data
structure 800 without sections 802 and 804 as part of normal WSDL
fragments. The present invention adds information such as
illustrative sections 802 and 804 to provide for the monitoring
mechanisms described above. Section 804 includes an assertion that
the port type may be tested using a particular operation and
expecting a particular message to be returned in response to the
operation. Section 802 defines the operation and the response. The
lines in sections 802 and 804 are provided as extensions to WSDL
with the other portions being standard WSDL coding. The example
illustrated in FIG. 8 uses Extensible Markup Language (XML). This
example is provided as an illustration, but is not intended to
limit the invention to using this particular format. Any other
format may be used depending on the particular implementation.
[0062] Thus, the present invention provides an improved method,
apparatus, and computer instructions for publishing and providing
information to identify and monitor entities in an autonomic
computing system. The mechanism uses standardized languages, such
as WSDL or UDDI, to provide or publish information about monitoring
methods that may be used for particular entities that have been
registered with the mechanism of the present invention. In this
manner, a client, such as a distributed consumer entity, may
request and receive an identification of entities that are able to
provide a desired service. In addition to an identification of the
service, the mechanism of the present invention provides
information indicating how the entity providing the service may be
monitored to verify that the entity is able to provide the service
as required by the client. With this information, the client is
able to monitor the service and take corrective action if
monitoring indicates that the entity is unable to function in the
manner required.
[0063] It is important to note that while the present invention has
been described in the context of a fully functioning data
processing system, those of ordinary skill in the art will
appreciate that the processes of the present invention are capable
of being distributed in the form of a computer readable medium of
instructions and a variety of forms and that the present invention
applies equally regardless of the particular type of signal bearing
media actually used to carry out the distribution. Examples of
computer readable media include recordable-type media, such as a
floppy disk, a hard disk drive, a RAM, CD-ROMs, and DVD-ROMs, and
transmission-type media, such as digital and analog communications
links, wired or wireless communications links using transmission
forms, such as, for example, radio frequency and light wave
transmissions. The computer readable media may take the form of
coded formats that are decoded for actual use in a particular data
processing system.
[0064] The description of the present invention has been presented
for purposes of illustration and description, and is not intended
to be exhaustive or limited to the invention in the form disclosed.
Many modifications and variations will be apparent to those of
ordinary skill in the art. The embodiment was chosen and described
in order to best explain the principles of the invention, the
practical application, and to enable others of ordinary skill in
the art to understand the invention for various embodiments with
various modifications as are suited to the particular use
contemplated.
* * * * *