U.S. patent application number 13/307574 was filed with the patent office on 2012-06-14 for method for operating a node cluster system in a network and node cluster system.
This patent application is currently assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Invention is credited to Utz Bacher, Einar Lueck, Viktor Mihajlovski.
Application Number: 20120151018 / 13/307574
Family ID: 46200513
Filed Date: 2012-06-14

United States Patent Application 20120151018
Kind Code: A1
Bacher; Utz; et al.
June 14, 2012
METHOD FOR OPERATING A NODE CLUSTER SYSTEM IN A NETWORK AND NODE
CLUSTER SYSTEM
Abstract

Operating a node cluster system with a plurality of nodes in a
network, wherein the cluster system appears to its network
environment as a single node with only one specific network address.
The method comprises: providing a shared socket database for linking
network connection port identifications of a common set of network
connection port identifications to the individual nodes; assigning a
master function to one of the nodes; sending incoming traffic to all
nodes of the cluster system, wherein each node individually verifies
its responsibility for this traffic; exclusively assigning a network
connection port to the responsible node for the duration of a
connection of the corresponding application process, by means of the
corresponding network connection port identification and the link
established by the shared socket database; and processing the
traffic by the responsible node, or otherwise by the node having the
master function.
Inventors: Bacher; Utz; (Schoenbuch, GE); Lueck; Einar; (Stuttgart, GE); Mihajlovski; Viktor; (Wildberg, GE)
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY
Family ID: 46200513
Appl. No.: 13/307574
Filed: November 30, 2011
Current U.S. Class: 709/220
Current CPC Class: H04L 61/6063 20130101; H04L 67/1002 20130101
Class at Publication: 709/220
International Class: G06F 15/177 20060101 G06F015/177

Foreign Application Data

Date: Dec 14, 2010; Code: EP; Application Number: EP10194836.2
Claims
1. In a communications network including a node cluster having a
plurality of nodes, but said cluster having a single network
address, a method of transmission comprising: providing a shared
socket database for linking network connection port identifiers for
each of a common shared set of network port identifiers for
respective connection to each of said plurality of nodes; assigning
a master function to one of said plurality of nodes; sending
incoming data transmissions to all of said plurality of nodes
wherein only one of the nodes is enabled to accept responsibility
for an incoming data transmission; assigning exclusive connection
to the network to the node accepting responsibility for a required
duration of a data processing function from the identified network
port corresponding to the responsible node to the shared socket
database; and processing data transmissions during said duration
either through said responsible node or said node having said
master function.
2. The method of claim 1, further including, responsive to a
request for said shared socket database, said responsible node
requesting a reservation of the corresponding identified network
port to said shared socket database, and providing a confirmation
of the reservation.
3. The method of claim 2 wherein incoming data is sent through said
exclusive connection, and to said master node.
4. The method of claim 1 further including establishing an outgoing
connection from one of said plurality of nodes and another network
destination through said socket, wherein outgoing data may be
transmitted.
5. The method of claim 4 wherein said outgoing connection is
established, responsive to an application being processed, by
automatically sequentially seeking the next available network
connection port.
6. The method of claim 5 further including assigning different
network connection port ranges to each of said plurality of nodes
and automatically looking sequentially for the next available
network connection port within said ranges.
7. The method of claim 1 further including reassigning the master
function to another one of said plurality of nodes in the event of
failure of the node to which the master function is originally
assigned.
8. The method of claim 1, further including, in the event of
failure of the node to which exclusive connection is assigned:
removing all entries of said failure node from the shared socket
database; reserving the network port identifications of said
entries for a predetermined time period; and releasing said
reserved port identifications for use by any of said plurality of
nodes after said time period.
9. In a communications network including a node cluster having a
plurality of nodes, but said cluster having a single network
address, a system for transmission comprising: a processor; a
computer memory holding computer program instructions that, when
executed by the processor, perform the method comprising: providing
a shared socket database for linking network connection port
identifiers for each of a common shared set of network port
identifiers for respective connection to each of said plurality of
nodes; assigning a master function to one of said plurality of
nodes; sending incoming data transmissions to all of said plurality
of nodes wherein only one of the nodes is enabled to accept
responsibility for an incoming data transmission; assigning
exclusive connection to the network to the node accepting
responsibility for a required duration of a data processing
function from the identified network port corresponding to the
responsible node to the shared socket database; and processing data
transmissions during said duration either through said responsible
node or said node having said master function.
10. The system of claim 9 wherein the performed method further
includes, responsive to a request for said shared socket database,
said responsible node requesting a reservation of the corresponding
identified network port to said shared socket database, and
providing confirmation of the reservation.
11. The system of claim 10 wherein incoming data is sent through
said exclusive connection and to said master node.
12. The system of claim 9 further including establishing an
outgoing connection from one of said plurality of nodes and another
network destination through said socket wherein outgoing data may
be transmitted.
13. The system of claim 12 wherein said performed method
establishes said outgoing connection, responsive to an application
being processed, by automatically sequentially seeking the next
available network connection port.
14. The system of claim 9 wherein said performed method further
includes reassigning the master function to another one of said
plurality of nodes in the event of failure of the node to which the
master function is originally assigned.
15. The system of claim 9 wherein the performed method further
includes, in the event of failure of the node to which exclusive
connection is assigned: removing all entries of said failure node
from the shared socket database; reserving the network port
identifications of said entries for a predetermined time period;
and releasing said reserved port identifications for use by any of
said plurality of nodes after said time period.
16. A computer usable storage medium having stored thereon a
computer readable program for document transmission in a
communications network including a node cluster having a plurality
of nodes, but said cluster having a single network address, wherein
the computer readable program, when executed on a computer,
causes the computer to: provide a shared socket database for
linking network connection port identifiers for each of a common
shared set of network port identifiers for respective connection to
each of said plurality of nodes; assign a master function to one of
said plurality of nodes; send incoming data transmissions to all of
said plurality of nodes wherein only one of the nodes is enabled to
accept responsibility for an incoming data transmission; assign
exclusive connection to the network to the node accepting
responsibility for a required duration of a data processing
function from the identified network port corresponding to the
responsible node to the shared socket database; and process data
transmissions during said duration either through said responsible
node or said node having said master function.
17. The computer usable storage medium of claim 16 wherein said
computer readable program, when executed responsive to a request
for said shared socket database, further causes: said responsible
node to request a reservation of the corresponding identified
network port to said shared socket database, and the executed
program to provide a confirmation of the reservation.
18. The computer usable storage medium of claim 16 wherein the
computer program, when executed, further causes the computer to
establish an outgoing connection from one of said plurality of
nodes and another network destination through said socket, wherein
outgoing data may be transmitted.
19. The computer usable storage medium of claim 18 wherein said
outgoing connection is
established, responsive to an application being processed, by
automatically sequentially seeking the next available network
connection port.
20. The computer usable storage medium of claim 19 wherein the
computer program further causes the computer to assign different
network connection port ranges to each of said plurality of nodes
by automatically looking sequentially for the next available
network connection port within said ranges.
21. The computer readable storage medium of claim 16 wherein the
computer program, when executed, further causes the computer to
reassign the master function to another one of said plurality of
nodes in the event of failure of the node to which the master
function is originally assigned.
22. The computer readable storage medium of claim 16 wherein the
computer program, when executed, further causes the computer, in
the event of failure of the node to which exclusive connection is
assigned to: remove all entries of said failure node from the
shared socket database; reserve the network port identifications of
said entries for a predetermined time period; and release said
reserved port identifications for use by any of said plurality of
nodes after said time period.
Description
FIELD OF THE INVENTION
[0001] The present invention relates to a method for operating a
node cluster system with a plurality of nodes in a network, wherein
the cluster system appears to be a single node with only one
specific network address to its network environment. The invention
also relates to a computer-readable medium, such as a storage
device, a floppy disk, CD, DVD, Blu-ray disc or a random access
memory (RAM), containing a set of instructions that causes a
computer to perform the above-mentioned method. Further, the
invention relates to a computer program product comprising a
computer usable medium including computer usable program code,
wherein the computer usable program code is adapted to execute the
above method. The invention further relates to a corresponding node
cluster system comprising a plurality of nodes, wherein, to a
network environment of the node cluster system, these nodes appear
to be a single node with only one specific network address.
BACKGROUND
[0002] An ensemble of nodes that appears to its network environment
as a single node with only one specific IP address is known. Today's
approaches are based on masking the IP address through a central
sprayer or designated node, so that an ensemble of nodes appears as
a single entity within a network.
[0003] U.S. Pat. No. 7,051,115 B2 discloses a method of providing a
single system image in a clustered environment. An internet
protocol (IP) address is assigned as a cluster IP address. The
cluster IP address is bound to a node in a cluster. A client
request directed to the cluster IP address is received in the node.
The node multicasts the request to all nodes in the cluster. A
dynamically adjustable workload distribution function filters the
request, wherein the function is configured to allow a single node
to process the client request.
SUMMARY
[0004] It is an object of the invention to provide an operation
method of a node cluster system and a corresponding node cluster
system comprising a plurality of nodes with improved
manageability.
[0005] This object is achieved by the independent claims.
Advantageous embodiments are detailed in the dependent claims.
[0006] The method according to the invention comprises the
following steps: (a) providing a shared socket database for linking
(binding) network connection port identifications of a common
shared set of network connection port identifications to the
individual nodes, (b) assigning a master function to one of the
nodes, (c) sending incoming traffic to all nodes of the cluster
system, wherein each node verifies its responsibility for this
traffic individually, (d) exclusive assignment of a network
connection port to the responsible node for the duration of a
connection of the corresponding application process (application)
by means of the corresponding network connection port
identification and the link established by the shared socket
database and (e) processing of the traffic by the responsible node
or otherwise by the node having the master function. Opening a
socket on an individual node involves binding of network connection
port identifications to said node. The common set of network
connection port identifications is a shared port identification
space; specifically, a shared port number space. Due to the method
of the invention, the node cluster system is advantageously
manageable by its system administrator like a single node with only
one specific network address.
[0007] The basic idea of the invention is to operate the node
cluster system comprising the plurality of nodes by setting up
incoming traffic distribution to all nodes in a "mirrored" fashion:
incoming traffic to the cluster system is sent to all nodes of the
cluster system. Each one of the nodes contains the same network
address (more specifically, carries the same MAC address, VLAN and
IP address) and verifies its responsibility for this traffic
separately by use of the shared socket database. The shared socket
database also ensures that a network connection port is never used
by more than one node. The responsibility of the node having the
master function (master node) further includes the processing of
incoming traffic for which no other node is responsible. Processing
of incoming traffic pertaining to diagnostics and/or administration
is performed by the node having the master function. This
traffic includes handling related to non-existent sockets
(rejection of sockets), diagnostic and administrative traffic, e.g.
ICMP or ARP handling (ICMP: Internet Control Message Protocol; ARP:
Address Resolution Protocol).
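As an illustration of the mirrored-traffic scheme above, the per-node responsibility check might be sketched as follows. The dictionary layout, node names, and function signature are assumptions for illustration only; the patent requires only that each node consult the shared socket database and that the master node absorb traffic no other node claims.

```python
# Illustrative (protocol, destination port) -> owning node mapping,
# standing in for the shared socket database (108). Entries are
# hypothetical examples.
shared_socket_db = {("TCP", 80): "node-102", ("TCP", 5432): "node-104"}

def is_responsible(node_id, is_master, protocol, dst_port):
    """Return True if this node should process the mirrored packet."""
    owner = shared_socket_db.get((protocol, dst_port))
    if owner is not None:
        return owner == node_id   # exactly one node claims the traffic
    return is_master              # master absorbs unclaimed traffic
```

Because every node evaluates the same database, exactly one node claims each packet, and the master additionally handles traffic for which no socket entry exists.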
[0008] In data communication, a node (physical network node) is an
addressable electronic unit, e.g. an electronic device, attached to
a network, which unit is capable of sending and/or receiving and/or
forwarding information (traffic) over a communications channel. The
node may either be data circuit-terminating equipment (DCE), such
as a modem, hub, bridge or switch; or data terminal equipment
(DTE), such as a digital telephone handset, a printer or a host
computer, for example a router, a workstation or a server.
[0009] The nodes of the cluster system, according to the invention,
have identical network addresses, especially identical MAC
addresses (MAC: Media-Access-Control). Also, the nodes further have
identical IP (Internet Protocol) and VLAN (Virtual Local Area
Network) settings. A port space for TCP ports and UDP ports
relevant to the IP address of said nodes is shared across all
nodes.
[0010] In a preferred embodiment of the present invention, the
traffic includes a connection request for a listening socket on the
plurality of nodes, wherein the traffic causes the responsible node
to send a request for reservation of a corresponding network
connection port identification to the shared socket database
followed by a check of whether the reservation request is
successful.
[0011] Incoming traffic addressed to established connections causes
processing of said traffic on the corresponding node, or otherwise
processing on the node having the master function, which sends a
response.
[0012] In a further preferred embodiment of the present invention,
the individual verification of the responsibility for incoming
traffic is performed by a respective operating system of each
individual node.
[0013] Outgoing traffic may be caused by one of said nodes
establishing a socket to another network destination.
[0014] According to another preferred embodiment of the present
invention, the assigning of the network connection port
identification to the node on which an application requests an
outgoing connection is performed by applying an auto-binding
procedure by looking sequentially for the next available network
connection port.
[0015] According to another preferred embodiment of the present
invention, the assigning of the network connection port
identification to the node on which an application process requests
an outgoing connection is optimized by assigning different
network connection port ranges to individual nodes and by applying
an auto-binding procedure, by looking sequentially for the next
available network connection port in said network connection port
ranges. This assigning of the network connection port
identification to said node is optimized in a way that each node is
granted a pool of available ports. This reduces the likelihood of
collisions, if two (2) nodes want to open an outgoing connection at
the same time. Also, this allows optimizations that assume no
collisions will occur, reducing locking requirements.
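The per-node port pools described above can be sketched as follows. The range boundaries, node names, and data structures are illustrative assumptions; the embodiment specifies only that each node scans its own range sequentially for the next available port.

```python
# Hypothetical per-node ephemeral port ranges; each node auto-binds
# only within its own range, so two nodes opening outgoing
# connections simultaneously cannot collide on a port.
NODE_PORT_RANGES = {"node-104": range(40000, 45000),
                    "node-106": range(45000, 50000)}

def auto_bind(node_id, used_ports):
    """Sequentially find and claim the next free port in the node's
    range; return None if the range is exhausted."""
    for port in NODE_PORT_RANGES[node_id]:
        if port not in used_ports:
            used_ports.add(port)
            return port
    return None
```

Since the ranges are disjoint, each node can scan without locking against the others, which is the reduced-locking optimization the paragraph describes.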
[0016] According to yet another preferred embodiment of the present
invention, the specific port is a TCP/IP-port, UDP/IP-port,
TCPv6/IPv6-port or UDPv6/IPv6-port with the corresponding network
connection port identification being a TCP/IP-port number,
UDP/IP-port number, TCPv6/IPv6-port number or UDPv6/IPv6-port
number. The Transmission Control Protocol (TCP) and the User
Datagram Protocol (UDP) are members of the Internet Protocol Suite,
the set of network protocols used for the Internet. TCP/IP is named
from the two protocols: the Transmission Control Protocol (TCP) and
the Internet Protocol (IP). UDP/IP is named from the two protocols:
the User Datagram Protocol (UDP) and the Internet Protocol
(IP).
[0017] According to another preferred embodiment, failure of nodes
not holding the master function (worker nodes) is tolerated.
Failure is detected through state-of-the-art mechanisms, e.g.
heartbeating. Upon failure of a worker node, all socket entries of
the failed node are removed from the shared database. The node's
network connection ports will be kept reserved for some duration of
time, so that existing connections to the failed node will time
out. After some time, the network connection ports will be freed
for use by other nodes.
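The worker-failure handling above can be sketched as follows. The data structures, node names, and quarantine duration are assumptions; the embodiment specifies only that entries are removed, ports stay reserved long enough for existing connections to time out, and ports are then freed.

```python
import time

# Assumed quarantine period during which a failed node's ports stay
# reserved so that stale connections can time out (value illustrative).
QUARANTINE_SECONDS = 120.0

def handle_worker_failure(db, quarantine, failed_node, now=None):
    """Remove the failed node's socket entries and quarantine its ports."""
    now = time.monotonic() if now is None else now
    for key in [k for k, owner in db.items() if owner == failed_node]:
        del db[key]                                  # remove entry
        quarantine[key] = now + QUARANTINE_SECONDS   # release deadline

def released_ports(quarantine, now):
    """Ports whose quarantine has expired and may be reused by any node."""
    return [k for k, deadline in quarantine.items() if now >= deadline]
```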
[0018] According to yet another preferred embodiment, failure of
the node holding the master function (master node) is tolerated.
Failure is detected through state-of-the-art mechanisms, e.g.
heartbeating. Upon failure of the master node, the master function
is taken over by another node. This can be done by following a
well-defined order, according to utilization information, or
following state-of-the-art quorum services. Existing sockets of the
failed node will be removed, blocked and made available for use
again after some time, as described in the previous preferred
embodiment.
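The "well-defined order" variant of master takeover might be sketched as follows. The priority list and node names are illustrative assumptions; the embodiment equally permits utilization-based or quorum-based selection.

```python
# Hypothetical fixed takeover order: the surviving node that appears
# earliest in this list assumes the master function.
PRIORITY = ["node-102", "node-104", "node-106"]

def elect_master(alive_nodes):
    """Return the highest-priority surviving node."""
    for node in PRIORITY:
        if node in alive_nodes:
            return node
    raise RuntimeError("no surviving node in the cluster")
```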
[0019] According to another preferred embodiment, the master
function is assumable by several nodes simultaneously by
distinguishing between various kinds of responses of the master
function and splitting responsibility according to said kinds of
responses on said nodes. In other words, the master function can be
distributed over several nodes to reduce the master node's workload.
This can be done by traffic type, e.g. that one master node handles
ARP traffic, another master node handles ICMP traffic and yet
another master node handles to-be-rejected socket requests.
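Splitting the master function by traffic type, as in the example above, could be sketched like this. The role assignments and node names are illustrative assumptions.

```python
# Hypothetical assignment of master responsibilities by traffic type:
# one node answers ARP, another ICMP, a third rejects requests for
# non-existent sockets.
MASTER_ROLES = {"ARP": "node-102", "ICMP": "node-104",
                "REJECT": "node-106"}

def master_for(traffic_type):
    """Return the node carrying the master role for this traffic type,
    or None if the traffic is not master-handled."""
    return MASTER_ROLES.get(traffic_type)
```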
[0020] According to another preferred embodiment, a listening TCP
or TCPv6 socket can be implemented by several nodes. All nodes
serving a listening socket have to follow the same pattern (e.g.
through a hashing function) to find out which one node will serve
the socket. This specific node will handle this connection from
this point on, while all other nodes ignore packets for this
connection.
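The shared selection pattern for a multi-node listening socket might look like the following hash-based sketch. The 4-tuple key and the choice of SHA-256 are assumptions; the patent requires only that all serving nodes apply the same deterministic function so exactly one node serves each connection.

```python
import hashlib

def serving_node(nodes, src_ip, src_port, dst_ip, dst_port):
    """Deterministically pick the one node that serves this connection;
    every node computes the same answer and the others ignore the
    connection's packets."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return sorted(nodes)[digest % len(nodes)]
```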
[0021] The invention further relates to a computer-readable medium,
such as a storage device, a floppy disk, CD, DVD, Blu-ray disc or
a random access memory (RAM), containing a set of instructions that
causes a computer to perform an aforementioned method and a
computer program product comprising a computer usable medium
including computer usable program code, wherein the computer usable
program code is adapted to execute the aforementioned method.
[0022] With respect to the node cluster system, the aforementioned
object is achieved by a shared socket database for linking network
connection port identifications of a common shared set of network
connection port identifications to the individual nodes, a role
manager for assigning a master function to one of the nodes and an
interface connecting each of the nodes to the network environment
for passing incoming traffic to each node of the cluster system,
wherein each node is configured to verify its responsibility for
incoming traffic to the node cluster system individually. The node
cluster system is a node cluster system for carrying out the
aforementioned operation method. The exclusive assignment of a port
to the responsible node for the duration of a connection of the
corresponding application process is performed by means of the
corresponding network connection port identification and the link
established by the shared socket database and the processing of the
traffic is performed by the responsible node or otherwise by the
node having the master function. The interface preferably contains
a switch connecting the outside network to the plurality of nodes.
The node cluster system according to the invention is
advantageously manageable by its system administrator like a single
node with only one specific network address.
[0023] As will be appreciated by one skilled in the art, aspects of
the present invention may be embodied as a system, method or
computer program product. Accordingly, aspects of the present
invention may take the form of an entirely hardware embodiment, an
entirely software embodiment (including firmware, resident
software, micro-code, etc.) or an embodiment combining software and
hardware aspects that may all generally be referred to herein as a
"circuit," "module" or "system." Furthermore, aspects of the
present invention may take the form of a computer program product
embodied in one or more computer readable medium(s) having computer
readable program code embodied thereon.
[0024] Any combination of one or more computer readable medium(s)
may be utilized. The computer readable medium may be a computer
readable signal medium or a computer readable storage medium. A
computer readable storage medium may be, for example, but not
limited to, an electronic, magnetic, optical, electromagnetic,
infrared or semiconductor system, apparatus, device or any suitable
combination of the foregoing. More specific examples (a
non-exhaustive list) of the computer readable storage mediums would
include the following: an electrical connection having one or more
wires, a portable computer diskette, a hard disk, a random access
memory (RAM), a read-only memory (ROM), an erasable programmable
read-only memory (EPROM or Flash memory), an optical fiber, a
portable compact disc read-only memory (CD-ROM), an optical storage
device, a magnetic storage device or any suitable combination of
the foregoing. In the context of this document, a computer readable
storage medium may be any tangible medium that can contain or store
a program for use by or in connection with an instruction execution
system, apparatus or device.
[0025] A computer readable signal medium may include a propagated
data signal with computer readable program code embodied therein,
for example, in baseband or as part of a carrier wave. Such a
propagated signal may take any of a variety of forms, including,
but not limited to, electro-magnetic, optical or any suitable
combination thereof. A computer readable signal medium may be any
computer readable medium that is not a computer readable storage
medium and that can communicate, propagate or transport a program
for use by or in connection with an instruction execution system,
apparatus or device.
[0026] Program code embodied on a computer readable medium may be
transmitted using any appropriate medium, including but not limited
to wireless, wireline, optical fiber cable, RF, etc. or any
suitable combination of the foregoing.
[0027] Computer program code for carrying out operations for
aspects of the present invention may be written in any combination
of one or more programming languages, including an object oriented
programming language, such as Java, Smalltalk, C++ or the like, and
conventional procedural programming languages, such as the "C"
programming language or similar programming languages. The program
code may execute entirely on the user's computer, partly on the
user's computer, as a stand-alone software package, partly on the
user's computer and partly on a remote computer or entirely on the
remote computer or server. In the latter scenario, the remote
computer may be connected to the user's computer through any type
of network, including a local area network (LAN) or a wide area
network (WAN), or the connection may be made to an external
computer (for example, through the Internet using an Internet
Service Provider).
[0028] Aspects of the present invention are described below with
reference to flowchart illustrations and/or block diagrams of
methods, apparatus (systems) and computer program products
according to embodiments of the invention. It will be understood
that each block of the flowchart illustrations and/or block
diagrams and combinations of blocks in the flowchart illustrations
and/or block diagrams, can be implemented by computer program
instructions. These computer program instructions may be provided
to a processor of a general purpose computer, special purpose
computer or other programmable data processing apparatus to produce
a machine, such that the instructions that execute via the
processor of the computer or other programmable data processing
apparatus, create means for implementing the functions/acts
specified in the flowchart and/or block diagram block or
blocks.
[0029] These computer program instructions may also be stored in a
computer readable medium that can direct a computer, other
programmable data processing apparatus or other devices to function
in a particular manner, such that the instructions stored in the
computer readable medium produce an article of manufacture
including instructions that implement the function/act specified in
the flowchart and/or block diagram block or blocks.
[0030] The computer program instructions may also be loaded onto a
computer, other programmable data processing apparatus or other
devices to cause a series of operational steps to be performed on
the computer, other programmable apparatus or other devices to
produce a computer implemented process such that the instructions
that execute on the computer or other programmable apparatus
provide processes for implementing the functions/acts specified in
the flowchart and/or block diagram block or blocks.
BRIEF DESCRIPTION OF THE DRAWINGS
[0031] Preferred embodiments of the invention are illustrated in
the accompanying figures. These embodiments are merely exemplary,
i.e. they are not intended to limit the content and scope of the
appended claims.
[0032] FIG. 1 shows a schematic representation of a node cluster
system according to a preferred embodiment of the present
invention,
[0033] FIG. 2 shows a flowchart of a process for opening (creating)
a listening socket (preparing to receive and process incoming
connection requests),
[0034] FIG. 3 shows a flowchart of a process for opening (creating)
a connection socket (opening a connection for outgoing
traffic),
[0035] FIG. 4 shows a flowchart of processing connections (incoming
traffic not related to connection requests), and
[0036] FIG. 5 shows a flowchart of a process closing a socket.
DETAILED DESCRIPTION
[0037] FIG. 1 shows a node cluster system 100 comprising a
plurality of nodes 102, 104, 106, wherein this node cluster system
100 has a single specific network address, namely an IP address. The
cluster system 100 further comprises a shared socket database 108
for binding of network connection port identifications to the nodes
102, 104, 106, a role manager 110 (sometimes called master/worker
role manager) for assigning a master function to one of the nodes
102, 104, 106, and the operating system (OS) of the individual
nodes for performing an individual verification of the
responsibility for incoming traffic. Inside the cluster system 100,
the nodes 102, 104, 106, the shared socket database 108, and the
role manager 110 are connected via a sideband connectivity 114 of
the cluster system 100 (separate physics or via different MAC/IP
addresses). The role manager 110 assigns the master function to one
of the nodes 102, and worker function to the other nodes 104 and
106. The shared socket database keeps track of all sockets
regardless of their states (LISTEN, SYN-SENT, SYN-RECEIVED,
ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK,
TIME-WAIT) of the nodes 102, 104, 106 and comprises information on
the TCP, UDP, TCPv6 or UDPv6 port being used and the individual
node (102, 104 or 106) the socket is handled on.
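A record in the shared socket database 108 might be modeled as follows. The field names and the keying by (protocol, port) are illustrative assumptions; the paragraph specifies only that the database tracks the port, the protocol, the socket state, and the node the socket is handled on, and that a port is never used by more than one node.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SocketEntry:
    protocol: str   # "TCP", "UDP", "TCPv6" or "UDPv6"
    port: int       # port number from the shared port space
    node_id: str    # node (102, 104 or 106) handling the socket
    state: str      # e.g. "LISTEN", "ESTABLISHED", "TIME-WAIT"

# Keying by (protocol, port) enforces single ownership of each port.
shared_socket_db = {}

def register(entry):
    """Insert the entry unless the (protocol, port) pair is taken."""
    key = (entry.protocol, entry.port)
    if key in shared_socket_db:
        return False
    shared_socket_db[key] = entry
    return True
```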
[0038] An interface 112, especially a switch of said interface,
connects each of the nodes 102, 104, 106 to other nodes of a
network outside the cluster system 100.
[0039] The nodes 102, 104, 106, the shared socket database 108 and
the role manager 110 are interconnected by sideband connectivity
114 organized as separate physics or via different MAC/IP
addresses.
[0040] In each of the nodes 102, 104, 106 a local agent la with
access to the socket database 108 and corresponding traffic rules
trm, trw (trm: traffic rules for the node being the master; trw:
traffic rules for the node(s) being ordinary workers) is
implemented. The role manager indicates the master function (master
role) by an indicator I. In the example shown in FIG. 1, the first
node 102 has the master function, while nodes 104 and 106 have to
perform the worker function of worker nodes.
[0041] The interface 112 connects each of the nodes 102, 104, 106
to the network environment for passing incoming traffic to each
node 102, 104, 106 of the cluster system 100.
[0042] Each node 102, 104, 106 is configured to verify its
responsibility for incoming traffic to the node cluster system 100
individually. The individual verification of the responsibility for
incoming traffic is performed by a respective OS of each individual
node 102, 104, 106 separately.
[0043] The following examples will show processes for opening
different sockets (FIGS. 2 and 3), a process for processing
connections (incoming traffic not related to connection
requests--FIG. 4) and a process of closing a socket (FIG. 5).
[0044] FIG. 2 shows a flowchart of an application opening a
listening socket, e.g. a Web server starting up and listening for
incoming connections. This can be performed on each node 102, 104,
106 individually; however, only the first application and node to
request the listening socket on the designated port will succeed.
[0045] Block 200 represents a step (step 200) wherein the
application opens and binds a listening socket to a specified TCP
or UDP source port number. This is triggered through the socket
call "bind".
[0046] In step 210, the kernel of the operating system (OS kernel)
prepares to bind the listening socket to the specified TCP or UDP
source port number. The kernel sends a reservation request to the
shared socket database 108, reserving the specified TCP or UDP
source port number for this listening socket on this node.
[0047] Step 220 is a decision step, wherein the kernel checks
whether the reservation request is successful. This depends on
whether the TCP or UDP port number is already in use for this IP
address on any node 102, 104, 106.
[0048] If the TCP or UDP source port reservation is successful
(path y: yes), the OS kernel continues to bind the socket to the
specified TCP or UDP source port number (step 230). After this, the
socket is bound and can be used for sending and receiving traffic.
The application is returned a value indicating success of the
operation. Otherwise, if the TCP or UDP source port reservation is
not successful (path n: no), the kernel returns an error message to
the application, indicating that the bind operation failed (step
240).
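The bind flow of steps 200 to 240 can be sketched as follows; this is a minimal illustration, assuming an in-memory stand-in for the shared socket database 108 (class and function names are invented for the example, not part of the description):

```python
import threading

class SharedSocketDB:
    """Illustrative stand-in for the shared socket database 108."""

    def __init__(self):
        self._lock = threading.Lock()
        self._bound = {}  # (protocol, port) -> node_id

    def reserve(self, protocol, port, node_id):
        # Step 220: the reservation succeeds only if the port number
        # is not already in use on any node of the cluster.
        with self._lock:
            if (protocol, port) in self._bound:
                return False
            self._bound[(protocol, port)] = node_id
            return True

def bind_listening_socket(db, protocol, port, node_id):
    # Steps 200/210: the kernel sends a reservation request to the
    # shared socket database before binding the listening socket.
    if db.reserve(protocol, port, node_id):
        return 0    # step 230: socket bound, success returned
    return -1       # step 240: bind operation failed

db = SharedSocketDB()
assert bind_listening_socket(db, "TCP", 80, 102) == 0   # first node wins
assert bind_listening_socket(db, "TCP", 80, 104) == -1  # port already reserved
```

Because the reservation is checked cluster-wide, only the first node to request the listening socket on the designated port succeeds, matching paragraph [0044].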
[0049] FIG. 3 shows a flowchart of an application connecting to a
peer (e.g. an application server connects to a database server).
This application can be performed on each node 102, 104, 106
individually.
[0050] Step 300 is a step wherein the application connects to a
peer (specified by a destination IP address and a TCP or UDP
destination port number). This is triggered through the socket
call "connect" and implies that the socket is locally bound to a
local TCP or UDP source port number.
[0051] In step 310, the OS kernel requests reservation of an
available TCP or UDP source port number for this socket from the
shared socket database. This checking and reserving is done
automatically on the shared socket database 108.
[0052] Step 320 is a decision step, wherein the kernel checks
whether the reservation request is successful. This depends on
whether an available TCP or UDP source port number has been found
or all ports are in use by other sockets.
[0053] If the TCP or UDP source port reservation is successful
(path y: yes), the kernel continues to auto-bind and to connect the
socket. This is done by associating the destination IP address and
TCP or UDP destination port number with this socket at the shared
socket database 108. After this, the socket can be used for sending
and receiving traffic. The application is returned a value
indicating success of the operation (step 330). If the TCP or UDP
source port reservation is not successful (path n: no), the kernel
returns an error message to the application, indicating that the
operation failed (step 340).
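The connect flow of steps 300 to 340 can be sketched as follows; this is an illustrative sketch, assuming an in-memory stand-in for the shared socket database 108 and a deliberately tiny ephemeral port range to demonstrate the exhaustion case of step 340 (all names are invented for the example):

```python
# Tiny illustrative range of source port numbers; a real system
# would use the full ephemeral range.
EPHEMERAL_PORTS = range(49152, 49155)

class ConnectDB:
    """Illustrative stand-in for the shared socket database 108."""

    def __init__(self):
        self.sockets = {}  # src_port -> (node_id, dst_ip, dst_port)

    def reserve_source_port(self, node_id, dst_ip, dst_port):
        # Steps 310/320: find an available source port number; the
        # check-and-reserve is done atomically at the database.
        for port in EPHEMERAL_PORTS:
            if port not in self.sockets:
                # Step 330: auto-bind and associate the destination
                # IP address and port number with this socket.
                self.sockets[port] = (node_id, dst_ip, dst_port)
                return port
        return None  # step 340: all source ports already in use

db = ConnectDB()
ports = [db.reserve_source_port(102, "10.0.0.5", 5432) for _ in range(3)]
assert ports == [49152, 49153, 49154]
assert db.reserve_source_port(104, "10.0.0.5", 5432) is None  # exhausted
```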
[0054] FIG. 4 shows a flowchart of the process that is executed if
a network packet is received by the nodes 102, 104, 106. This
process will be executed on all nodes 102, 104, 106 individually.
All of the nodes 102, 104, 106 receive the same packet as they
exhibit the same MAC address.
[0055] Step 400 is the starting step of this process, wherein a
network packet is received on the specific node this flow is
running on. In the following operation, step 410, the protocols of
the network packet are decoded and the IP source and destination
addresses and the TCP or UDP source and destination port numbers
are identified.
[0056] Step 420 is a decision step, wherein the shared socket
database 108 is queried, whether the socket (designated by source
and destination IP addresses and source and destination TCP or UDP
port numbers) is bound to the node 102, 104 or 106 this
flow/procedure is running on.
[0057] If the socket is bound to said node (path y: yes), the
packet is processed in step 430. If the socket is not bound to said
node 102, 104, 106 (path n: no), within the further decision step
440 it is determined whether the individual node 102, 104 or 106
this flow is running on has the master function (master role). Only
the node with the master function (here node 102) takes care of
administrative traffic. In this case, it is responsible for
appropriate handling of the packet (e.g. rejecting the request,
ICMP traffic, ARP packets), which is done later in step 430. If the
node 102, 104, 106 this flow is running on has the master function
(path y: yes), the flow will continue with step 450. If the node
102, 104, 106 this flow is running on has no master function (path
n: no), the packet is discarded. Another node 102, 104, 106 will
process it (this flow will run into step 430 on that node).
[0058] Step 450 is a decision step, wherein the shared socket
database 108 is queried, whether the socket (designated by source
and destination IP addresses and source and destination TCP or UDP
port numbers) is bound on any node 102, 104, 106. If the socket is
bound to another node 102, 104, 106 (path y: yes), the packet is
discarded and this other node 102, 104, 106 will process the packet
(this flow will run into step 430 on that other node).
[0059] Processing the packet in step 430 can mean that the packet
is passed to the matching socket (coming from step 420) and, thus,
to the appropriate application, or, in case of a packet not
belonging to any socket on any node and the node this flow runs on
being the master node (coming from step 440 via step 450), that an
appropriate negative response is generated.
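The per-node decision of steps 420 to 450 can be sketched as a single function; this is an illustrative sketch, assuming the shared socket database 108 is represented by a plain mapping from the four-tuple (source/destination IP addresses and port numbers) to the owning node (function and variable names are invented for the example):

```python
def handle_packet(db, node_id, is_master, four_tuple):
    """Decide, on one node, what to do with a received packet.

    db         -- mapping four_tuple -> node_id owning the socket
                  (stand-in for the shared socket database 108)
    node_id    -- the node this flow is running on
    is_master  -- whether this node currently has the master role
    four_tuple -- (src_ip, src_port, dst_ip, dst_port) from step 410
    """
    owner = db.get(four_tuple)      # step 420: query shared database
    if owner == node_id:
        return "process"            # step 430: bound to this node
    if not is_master:
        return "discard"            # step 440, path n: worker drops it
    if owner is not None:
        return "discard"            # step 450: bound to another node
    return "process"                # master handles unbound traffic

# Socket bound on node 104; nodes 102 (master), 104 and 106 all
# receive the same packet (same MAC address) and decide individually.
db = {("10.0.0.1", 80, "10.0.0.9", 49152): 104}
pkt = ("10.0.0.1", 80, "10.0.0.9", 49152)
unbound = ("10.0.0.1", 81, "10.0.0.9", 50000)
assert handle_packet(db, 104, False, pkt) == "process"    # owning node
assert handle_packet(db, 106, False, pkt) == "discard"    # other worker
assert handle_packet(db, 102, True, pkt) == "discard"     # master defers
assert handle_packet(db, 102, True, unbound) == "process" # admin traffic
```

Exactly one node processes each packet: the node owning the socket, or, for traffic matching no socket on any node, the node holding the master role.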
[0060] If the application sends data, this is done without further
synchronization. The packet is simply sent by the node 102, 104,
106 that the application sending data runs on.
[0061] FIG. 5 shows a flowchart of the process that is executed if
the application closes a socket. The closing of a socket can, e.g.,
be the shutdown of a Web server, or an application terminating a
connection to another network host.
[0062] Step 500 is the starting step of this process, wherein the
application closes the socket. This is triggered through the socket
call "close", which is passed to the kernel of the OS.
[0063] In the following step 510, the OS kernel closes the socket
and removes all data associated with it. The kernel also sends a
message to the shared socket database 108 to remove this
socket.
[0064] In the following step 520, the shared socket database 108
removes all data for that socket. The pair of TCP or UDP source and
destination ports can be used by new connections from this point
on.
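The close flow of steps 500 to 520 reduces to removing the socket's record from the shared socket database, which frees the port pair for reuse; a minimal sketch, again assuming a plain mapping as a stand-in for the database 108 (names are invented for the example):

```python
def close_socket(db, key):
    # Steps 510/520: the kernel notifies the shared socket database,
    # which removes all data for that socket; removal is idempotent.
    db.pop(key, None)

# Node 102 owns a connected socket identified here by its
# (protocol, source port, destination port) triple.
db = {("TCP", 49152, 5432): 102}
close_socket(db, ("TCP", 49152, 5432))
assert ("TCP", 49152, 5432) not in db  # port pair reusable from now on
```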
[0065] The flowchart and block diagrams in the Figs. illustrate the
architecture, functionality and operation of possible
implementations of systems, methods and computer program products
according to various embodiments of the present invention. In this
regard, each block in the flowchart or block diagram may represent
a module, segment or portion of code that comprises one or more
executable instructions for implementing the specified logical
function(s). It should also be noted that, in some alternative
implementations, the functions noted in the block may occur out of
the order noted in the Figs. For example, two (2) blocks shown in
succession may, in fact, be executed substantially concurrently or
the blocks may sometimes be executed in the reverse order,
depending upon the functionality involved. It will also be noted
that each block of the block diagrams and/or flowchart
illustrations and combinations of blocks in the block diagrams
and/or flowchart illustrations can be implemented by special
purpose hardware-based systems that perform the specified functions
or acts or combinations of special purpose hardware and computer
instructions.
* * * * *