U.S. patent application number 13/671369 was filed with the patent office on
2012-11-07 and published on 2014-04-17 as publication number 20140108532, for a
system and method for supporting guaranteed multi-point delivery in a
distributed data grid.
This patent application is currently assigned to ORACLE INTERNATIONAL
CORPORATION. The applicant listed for this patent is ORACLE INTERNATIONAL
CORPORATION. Invention is credited to Gene Gleyzer and Robert H. Lee.
Application Number: 13/671369
Publication Number: 20140108532
Family ID: 50476346
Filed Date: 2012-11-07
Publication Date: 2014-04-17
United States Patent Application 20140108532
Kind Code: A1
Lee; Robert H.; et al.
April 17, 2014
SYSTEM AND METHOD FOR SUPPORTING GUARANTEED MULTI-POINT DELIVERY IN
A DISTRIBUTED DATA GRID
Abstract
A system and method can support guaranteed multi-point message
delivery in a distributed data grid. A messaging facility in the
distributed data grid can receive an incoming message that is
adaptive to be delivered to a plurality of nodes in the distributed
data grid. The messaging facility can deliver the incoming message
to the plurality of nodes according to an order in a list.
Furthermore, a node in the plurality of nodes operates to skip a
next node in the list to deliver the incoming message, when the
next node is dead or unavailable.
Inventors: Lee; Robert H. (San Carlos, CA); Gleyzer; Gene (Lexington, MA)
Applicant: ORACLE INTERNATIONAL CORPORATION, Redwood Shores, CA, US
Assignee: ORACLE INTERNATIONAL CORPORATION, Redwood Shores, CA
Family ID: 50476346
Appl. No.: 13/671369
Filed: November 7, 2012
Related U.S. Patent Documents
Application Number: 61/714,100 (provisional), Filing Date: Oct 15, 2012
Current U.S. Class: 709/204
Current CPC Class: H04L 41/0668 (20130101); G06F 11/2048 (20130101);
H04L 67/1095 (20130101); G06F 16/2365 (20190101); G06F 11/16 (20130101);
G06F 11/2097 (20130101); G06F 11/2041 (20130101); G06F 16/10 (20190101);
G06F 2201/82 (20130101); H04L 43/0811 (20130101)
Class at Publication: 709/204
International Class: G06F 15/16 (20060101); G06F 015/16
Claims
1. A method for supporting consistent message delivery in a
distributed data grid operating on one or more microprocessors,
comprising: receiving an incoming message that is adaptive to be
delivered to a plurality of nodes in the distributed data grid;
delivering the incoming message to the plurality of nodes according
to an order in a list; and allowing a node in the plurality of
nodes to skip a next node in the list for delivering the incoming
message, when the next node is dead or unavailable.
2. The method according to claim 1, further comprising: allowing
the incoming message to include a request from a client; and
providing a response to the client after the incoming message is
delivered to the plurality of nodes in the distributed data
grid.
3. The method according to claim 1, further comprising: allowing
the node to deliver the incoming message to a node that is next to
the node that is dead or unavailable in the list.
4. The method according to claim 1, further comprising: allowing
each node in the plurality of nodes to keep track of the incoming
message as the incoming message is delivered according to the
list.
5. The method according to claim 1, further comprising: creating a
chain request message that stores information about the list in an
internal data structure based on the incoming message.
6. The method according to claim 5, further comprising: allowing a
node in the distributed data grid to obtain the information about
the list from the internal data structure in the chain request
message.
7. The method according to claim 1, further comprising: allowing a
first node in the list to be a primary owner of a partition and
other nodes in the list of nodes to be backup nodes for the first
node.
8. The method according to claim 7, further comprising: propagating
an update of the partition from the primary owner of the partition
to the backup nodes.
9. The method according to claim 1, further comprising: performing
a full synchronization on the plurality of nodes in the distributed
data grid.
10. The method according to claim 1, further comprising:
configuring a new node in the distributed data grid to receive the
incoming message.
11. A system for supporting guaranteed multi-point delivery in a
distributed data grid, comprising: one or more microprocessors; a
messaging facility in the distributed data grid running on the one
or more microprocessors, wherein the messaging facility operates to
perform the steps of receiving an incoming message that is adaptive
to be delivered to a plurality of nodes in the distributed data
grid; delivering the incoming message to the plurality of nodes
according to an order in a list; and allowing a node in the
plurality of nodes to skip a next node in the list for delivering
the incoming message, when the next node is dead or
unavailable.
12. The system according to claim 11, wherein: the incoming message
includes a request from a client, and wherein a response is
provided to the client after the incoming message is delivered to
the plurality of nodes in the distributed data grid.
13. The system according to claim 11, wherein: the node operates to
deliver the incoming message to a node that is next to the node
that is dead or unavailable in the list.
14. The system according to claim 11, wherein: each node in the
plurality of nodes operates to keep track of the incoming message
as it is delivered according to the list.
15. The system according to claim 11, wherein: a first node in the
plurality of nodes operates to create a chain request message that
stores information about the list in an internal data structure
based on the incoming message.
16. The system according to claim 15, wherein: a second node in the
distributed data grid operates to obtain the information about the
list from the internal data structure in the chain request
message.
17. The system according to claim 11, wherein: a first node in the
list operates to be a primary owner of a partition and other nodes
in the list of nodes operate to be backup nodes for the first
node.
18. The system according to claim 17, wherein: an update of the
partition is propagated from the primary owner of the partition to
the backup nodes.
19. The system according to claim 11, wherein: a new node in the
distributed data grid is configured to receive the incoming
message.
20. A non-transitory machine readable storage medium having
instructions stored thereon that when executed cause a system to
perform the steps of: receiving an incoming message that is
adaptive to be delivered to a plurality of nodes in the distributed
data grid; delivering the incoming message to the plurality of
nodes according to an order in a list; and allowing a node in the
plurality of nodes to skip a next node in the list for delivering
the incoming message, when the next node is dead or unavailable.
Description
CLAIM OF PRIORITY
[0001] This application claims priority on U.S. Provisional Patent
Application No. 61/714,100, entitled "SYSTEM AND METHOD FOR
SUPPORTING A DISTRIBUTED DATA GRID IN A MIDDLEWARE ENVIRONMENT," by
inventors Robert H. Lee, Gene Gleyzer, Charlie Helin, Mark Falco,
Ballav Bihani and Jason Howes, filed Oct. 15, 2012, which
application is herein incorporated by reference.
COPYRIGHT NOTICE
[0002] A portion of the disclosure of this patent document contains
material which is subject to copyright protection. The copyright
owner has no objection to the facsimile reproduction by anyone of
the patent document or the patent disclosure, as it appears in the
Patent and Trademark Office patent file or records, but otherwise
reserves all copyright rights whatsoever.
CROSS-REFERENCED APPLICATIONS
[0003] The current application hereby incorporates by reference the
material in the following patent applications:
[0004] U.S. patent application Ser. No. ______, entitled "SYSTEM
AND METHOD FOR PROVIDING PARTITION PERSISTENT STATE CONSISTENCY IN
A DISTRIBUTED DATA GRID," by inventors Robert H. Lee and Gene
Gleyzer, filed ______ (Attorney Docket No.: ORACL-05359US0).
[0005] U.S. patent application Ser. No. ______, entitled "SYSTEM
AND METHOD FOR PROVIDING TRANSIENT PARTITION CONSISTENCY IN A
DISTRIBUTED DATA GRID," by inventors Robert H. Lee and Gene
Gleyzer, filed ______(Attorney Docket No.: ORACL-05359US1).
[0006] U.S. patent application Ser. No. ______, entitled "SYSTEM
AND METHOD FOR SUPPORTING ASYNCHRONOUS MESSAGE PROCESSING IN A
DISTRIBUTED DATA GRID," by inventor Gene Gleyzer, filed ______
(Attorney Docket No.: ORACL-05360US0).
[0007] U.S. patent application Ser. No. ______, entitled "SYSTEM
AND METHOD FOR SUPPORTING OUT-OF-ORDER MESSAGE PROCESSING IN A
DISTRIBUTED DATA GRID," by inventors Mark Falco and Gene Gleyzer,
filed ______ (Attorney Docket No.: ORACL-05364US0).
[0008] 1. Field of Invention
[0009] The present invention is generally related to computer
systems, and is particularly related to a distributed data
grid.
[0010] 2. Background
[0011] Modern computing systems, particularly those employed by
larger organizations and enterprises, continue to increase in size
and complexity. Particularly, in areas such as Internet
applications, there is an expectation that millions of users should
be able to simultaneously access a given application, which
effectively leads to an exponential increase in the amount of
content generated and consumed by users, and transactions involving
that content. Such activity also results in a corresponding
increase in the number of transaction calls to databases and
metadata stores, which have a limited capacity to accommodate that
demand.
[0012] This is the general area that embodiments of the invention
are intended to address.
SUMMARY
[0013] Described herein are systems and methods that can support
guaranteed multi-point message delivery in a distributed data grid.
A messaging facility in the distributed data grid can receive an
incoming message that is adaptive to be delivered to a plurality of
nodes in the distributed data grid. The messaging facility can
deliver the incoming message to the plurality of nodes according to
an order in a list. Furthermore, a node in the plurality of nodes
operates to skip a next node in the list to deliver the incoming
message, when the next node is dead or unavailable.
BRIEF DESCRIPTION OF THE FIGURES
[0014] FIG. 1 is an illustration of a data grid cluster in
accordance with various embodiments of the invention.
[0015] FIG. 2 is an illustration of supporting guaranteed
multi-point message delivery in a distributed data grid in
accordance with various embodiments of the invention.
[0016] FIG. 3 is an illustration of creating a chain request
message in a distributed data grid in accordance with various
embodiments of the invention.
[0017] FIG. 4 is an illustration of handling a chain request
message in a distributed data grid in accordance with various
embodiments of the invention.
[0018] FIG. 5 is an illustration of supporting consistent message
delivery when a primary owner node of a partition becomes
unavailable in a distributed data grid in accordance with various
embodiments of the invention.
[0019] FIG. 6 illustrates an exemplary flow chart for supporting
guaranteed multi-point message delivery in a distributed data grid
in accordance with an embodiment of the invention.
DETAILED DESCRIPTION
[0020] Described herein are systems and methods that can support
guaranteed multi-point message delivery in a distributed data
grid.
[0021] In accordance with an embodiment, as referred to herein, a
"distributed data grid", "data grid cluster", or "data grid" is a
system comprising a plurality of computer servers which work
together to manage information and related operations, such as
computations, within a distributed or clustered environment. The
data grid cluster can be used to manage application objects and
data that are shared across the servers. Preferably, a data grid
cluster should have low response time, high throughput, predictable
scalability, continuous availability and information reliability.
As a result of these capabilities, data grid clusters are well
suited for use in computationally intensive, stateful middle-tier
applications. Some data grid clusters, e.g., the Oracle
Coherence data grid cluster, can store the information in-memory to
achieve higher performance, and can employ redundancy in keeping
copies of that information synchronized across multiple servers,
thus ensuring resiliency of the system and the availability of the
data in the event of server failure. For example, Coherence
provides replicated and distributed (partitioned) data management
and caching services on top of a reliable, highly scalable
peer-to-peer clustering protocol.
[0022] An in-memory data grid can provide the data storage and
management capabilities by distributing data over a number of
servers working together. The data grid can be middleware that runs
in the same tier as an application server or within an application
server. It can provide management and processing of data and can
also push the processing to where the data is located in the grid.
In addition, the in-memory data grid can eliminate single points of
failure by automatically and transparently failing over and
redistributing its clustered data management services when a server
becomes inoperative or is disconnected from the network. When a new
server is added, or when a failed server is restarted, it can
automatically join the cluster and services can be failed back over
to it, transparently redistributing the cluster load. The data grid
can also include network-level fault tolerance features and
transparent soft re-start capability.
[0023] In accordance with an embodiment, the functionality of a
data grid cluster is based on using different cluster services. The
cluster services can include root cluster services, partitioned
cache services, and proxy services. Within the data grid cluster,
each cluster node can participate in a number of cluster services,
both in terms of providing and consuming the cluster services. Each
cluster service has a service name that uniquely identifies the
service within the data grid cluster, and a service type, which
defines what the cluster service can do. Other than the root
cluster service running on each cluster node in the data grid
cluster, there may be multiple named instances of each service
type. The services can be either configured by the user, or
provided by the data grid cluster as a default set of services.
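As a purely illustrative sketch of this naming scheme (this is not the Oracle Coherence API; the class ServiceRegistry, the method ensureService, and the enum of service types are hypothetical names introduced here), a registry keyed by unique service name might look as follows in Java:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ServiceRegistry {

    // Service types named in the text; the set here is illustrative only.
    public enum ServiceType { ROOT_CLUSTER, PARTITIONED_CACHE, PROXY }

    // A named instance of a cluster service.
    public record ClusterService(String name, ServiceType type) { }

    // The service name uniquely identifies the service within the cluster.
    private final Map<String, ClusterService> services = new ConcurrentHashMap<>();

    // Return the existing service with this name, or register a new instance of the type.
    public ClusterService ensureService(String name, ServiceType type) {
        return services.computeIfAbsent(name, n -> new ClusterService(n, type));
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        registry.ensureService("Cluster", ServiceType.ROOT_CLUSTER);
        registry.ensureService("DistributedCache", ServiceType.PARTITIONED_CACHE);
        registry.ensureService("ExtendProxy", ServiceType.PROXY);
        System.out.println("Registered services: " + registry.services.keySet());
    }
}
```

Other than the single root cluster service per node, several named instances of the same type can coexist, which is why the lookup key in the sketch is the service name rather than the type.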
[0024] FIG. 1 is an illustration of a data grid cluster in
accordance with various embodiments of the invention. As shown in
FIG. 1, a data grid cluster 100, e.g. an Oracle Coherence data
grid, includes a plurality of cluster nodes 101-106 having various
cluster services 111-116 running thereon. Additionally, a cache
configuration file 110 can be used to configure the data grid
cluster 100.
Guaranteed Multi-Point Message Delivery
[0025] In accordance with various embodiments of the invention, a
distributed data grid can support guaranteed multi-point message
delivery, which can provide consistency in delivering messages
among different cluster nodes in the distributed data grid.
[0026] FIG. 2 is an illustration of supporting consistent message
delivery in a distributed data grid in accordance with various
embodiments of the invention. As shown in FIG. 2, a distributed
data grid 200 can include a plurality of cluster nodes, such as
node A-D 201-204.
[0027] A cluster node, e.g. node A 201, can be either the
originator of an internal message, or a recipient of an internal
message. Additionally, the cluster node A 201 can also be a
recipient of an incoming message from a client 210. The cluster
node A 201 can use a messaging facility 211 to configure and deliver
a message to different cluster nodes in the distributed data grid
200, such as nodes B-D 202-204.
[0028] The cluster node A 201 can deliver the message to the
recipient nodes B-D 202-204 in a particular order, e.g. based on a
list from node B 202 to node C 203 then to node D 204. Here, the
messaging facility 211 on the cluster node A 201 can be used to
assign the order to the recipient nodes B-D 202-204 in the
distributed data grid 200.
[0029] Furthermore, each recipient node B-D 202-204 in the list can
keep track of how the message is to be delivered down the list. For
example, a recipient node, e.g. node B 202, can detect that node C
203, which is the next node in the list, is dead or temporarily
unavailable. Then, the node B 202 can skip the node C 203, and
deliver the message directly to node D 204, which is the node next
to the node C 203 on the list.
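A minimal, self-contained Java sketch of this chained delivery is shown below. It is not the actual Coherence implementation, and the names ChainDeliveryDemo, Node, and forwardFrom are hypothetical: the originator hands the message to the first node in the ordered list, and each recipient forwards it onward, skipping any node it detects as dead or unavailable.

```java
import java.util.ArrayList;
import java.util.List;

public class ChainDeliveryDemo {

    // A minimal cluster node: it is either available or not, and records what it received.
    static final class Node {
        final String name;
        boolean available = true;
        final List<String> received = new ArrayList<>();

        Node(String name) { this.name = name; }

        // Accept the message on this hop, then continue the chain from the next position.
        void receive(String message, List<Node> chain, int myIndex) {
            received.add(message);
            forwardFrom(message, chain, myIndex + 1);
        }
    }

    // Hand the message to the first available node at or after 'from' in the ordered
    // list, skipping any node that is dead or unavailable.
    static void forwardFrom(String message, List<Node> chain, int from) {
        for (int i = from; i < chain.size(); i++) {
            Node next = chain.get(i);
            if (next.available) {
                next.receive(message, chain, i);   // this recipient continues the chain
                return;
            }
            // otherwise skip the dead/unavailable node and try the one after it
        }
    }

    public static void main(String[] args) {
        Node b = new Node("B"), c = new Node("C"), d = new Node("D");
        c.available = false;                       // node C is dead or unavailable
        List<Node> chain = List.of(b, c, d);       // order assigned by the originator, node A
        forwardFrom("incoming message", chain, 0); // node A starts the chained delivery
        System.out.printf("B=%s C=%s D=%s%n", b.received, c.received, d.received);
    }
}
```

Running the sketch delivers the message to nodes B and D while the unavailable node C is skipped; the reverse-direction response described next can be modeled by walking the same list backwards.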
[0030] As shown in FIG. 2, the cluster node A 201 can receive an
incoming message from a client 210. The incoming message can
include a request from the client 210, and the client 210 may
expect a response from the distributed data grid 200. Then, a
response message can be sent back to the client in the reverse
direction, e.g. from node D 204 to node C 203 then to node B 202
before reaching node A 201. Finally, the distributed data grid 200
can provide the response message to the client 210, after the
incoming message is delivered to the recipient nodes B-D 202-204
and processed in the distributed data grid 200.
[0031] In accordance with various embodiments of the invention, the
guaranteed multi-point message delivery feature can be used for
managing partition backups. For example, a cluster node, e.g. the
node A 201, can be the owner of a partition, while nodes B-D are
backup nodes for the node A 201. Here, the partition can define
that the value of a property x is equal to 1 ("x=1"), which can be
maintained on both the primary owner node A 201 and each backup
node B-D 202-204.
[0032] Then, the node A 201 can receive a message from the client
210, which changes the value of the property x to 2 ("x=2"). Thus,
the cluster node A 201 may propagate the message, "x=2", to each
backup node B-D 202-204.
[0033] As shown in FIG. 2, the cluster node A 201 can deliver the
message, "x=2", to the recipient nodes B-D 202-204 in order.
Furthermore, when the node B 202 detects that the node C 203 is
dead or unavailable, the node B 202 can deliver the message, "x=2",
to the node D 204 directly, while skipping the node C 203. Thus,
the distributed data grid 200 can maintain a consistent view that
the value of x equals 2.
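The partition update example can be restated as a small worked sketch. Again, this is hypothetical illustration rather than the patent's implementation, and the names BackupPropagationDemo and propagateUpdate are invented: every node starts with x=1, the primary applies x=2 and pushes it along the ordered backup list, and the dead node C is skipped so that all reachable members agree that x equals 2.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BackupPropagationDemo {

    // A node holding its copy of the partition data, e.g. the property x.
    static final class Node {
        final String name;
        boolean available = true;
        final Map<String, Integer> partition = new HashMap<>(Map.of("x", 1));
        Node(String name) { this.name = name; }
    }

    // Apply the update on the primary owner, then push it along the ordered backup
    // list; a backup that is dead or unavailable is skipped by the previous live hop.
    static void propagateUpdate(String key, int value, Node primary, List<Node> backups) {
        primary.partition.put(key, value);
        for (Node backup : backups) {
            if (backup.available) {
                backup.partition.put(key, value);  // this backup now agrees with the primary
            }
            // a dead backup is skipped; delivery continues with the node after it
        }
    }

    public static void main(String[] args) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), d = new Node("D");
        c.available = false;                       // node C is dead or unavailable
        propagateUpdate("x", 2, a, List.of(b, c, d));
        System.out.printf("A=%s B=%s C=%s D=%s%n",
                a.partition, b.partition, c.partition, d.partition);
        // All reachable members (A, B, D) report x=2; the dead node C is skipped.
    }
}
```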
[0034] On the other hand, an alternative approach is that the
cluster node A 201 can deliver the message to each recipient node
B-D 202-204 separately, or in parallel (as shown in dotted line in
FIG. 2).
[0035] Unlike the guaranteed multi-point message delivery feature
as described in the above, this alternative approach can be
problematic in the scenario when a cluster node, e.g. node C 203,
is dead or becomes unavailable. Since the delivery of the message,
"x=2", to the cluster node C 203 may not go through, the value of x
on node C may remain 1 without notice.
[0036] Thus, using the alternative approach, the distributed data
grid 200 may not be able to maintain a consistent view that the
value of x equals 2. This alternative approach can cause
inconsistency, in terms of determining the value of x at a later
time point, among the different cluster nodes A-D 201-204 in the
distributed data grid 200.
[0037] Furthermore, such inconsistency may not be resolved until a
full synchronization is performed in the distributed data grid 200.
The full synchronization in the distributed data grid 200 can be
costly, since it may require the distributed data grid 200 to stop
providing services.
[0038] Additionally, the guaranteed multi-point message delivery
feature can be used complementarily with a partition versioning
feature supported in the distributed data grid 200. For example,
when a new node E 205 is added into the distributed data grid 200,
the distributed data grid 200 can bring the state of the newly
added node E 205 current, based on the partition versioning
feature, so that the node E 205 can start receiving new messages
based on the guaranteed multi-point message delivery feature, as
described above.
[0039] Additional descriptions of various embodiments of using
partition versioning feature in a distributed data grid 200 are
provided in U.S. patent application Ser. No. ______, entitled
"SYSTEM AND METHOD FOR PROVIDING PARTITION PERSISTENT STATE
CONSISTENCY IN A DISTRIBUTED DATA GRID", filed ______, which
application is herein incorporated by reference.
[0040] Furthermore, the guaranteed multi-point message delivery
feature can be used complementarily with a poll model that is
supported in the distributed data grid 200 for processing incoming
messages asynchronously.
[0041] Additional descriptions of various embodiments of supporting
asynchronous message processing in a distributed data grid 200 are
provided in U.S. patent application Ser. No. ______, entitled
"SYSTEM AND METHOD FOR SUPPORTING ASYNCHRONOUS MESSAGE PROCESSING
IN A DISTRIBUTED DATA GRID", filed ______, which application is
herein incorporated by reference.
[0042] FIG. 3 is an illustration of creating a chain request
message in a distributed data grid in accordance with various
embodiments of the invention. As shown in FIG. 3, a cluster node A
301 in the distributed data grid 300 can create a chain request
message 320, e.g. using a messaging facility 311 on the cluster
node A 301.
[0043] The chain request message 320 can be either initiated by the
cluster node A 301, or be created based on an incoming message 310
received by the cluster node A 301. The chain request message 320
can include an internal data structure 321 that stores the
information about a list of recipient nodes that the chain request
message 320 will be delivered to.
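A hedged sketch of what such a chain request message might look like in Java is given below; the record name ChainRequestMessage, the field recipientChain, and the helper nextAfter are hypothetical names chosen for illustration, not the actual internal data structure used by the distributed data grid.

```java
import java.util.List;

public class ChainRequestMessages {

    // Hypothetical shape of a chain request message: the payload plus an internal
    // data structure (here an ordered, immutable list of member ids) recording
    // which recipients the message must be delivered to, and in what order.
    public record ChainRequestMessage(String payload, List<Integer> recipientChain) {

        public ChainRequestMessage {
            recipientChain = List.copyOf(recipientChain);   // defensive, immutable copy
        }

        // The next recipient after the given member id, or -1 if the chain ends there.
        public int nextAfter(int memberId) {
            int idx = recipientChain.indexOf(memberId);
            return (idx >= 0 && idx + 1 < recipientChain.size())
                    ? recipientChain.get(idx + 1) : -1;
        }
    }

    public static void main(String[] args) {
        // Node A wraps an incoming message into a chain request for members B(2), C(3), D(4).
        ChainRequestMessage msg = new ChainRequestMessage("x=2", List.of(2, 3, 4));
        System.out.println("After member 3 comes member " + msg.nextAfter(3));  // prints 4
    }
}
```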
[0044] As shown in FIG. 3, the cluster node A 301 can deliver the
chain request message 320 to the different cluster nodes in the
distributed data grid 300, e.g. nodes B-C 302-303, for further
processing.
[0045] FIG. 4 is an illustration of handling a chain request
message in a distributed data grid in accordance with various
embodiments of the invention. As shown in FIG. 4, a cluster node,
e.g. node B 402, in the distributed data grid 400 can receive a
chain request message 420 that contains a list of recipient nodes
in an internal data structure 421.
[0046] A messaging facility 412 in the cluster node B 402 can track
the delivery of the chain request message 420 to the rest of the
recipient nodes in the list 421. For example, when the cluster node
B 402 detects that node C 403 is dead or is unavailable, the
cluster node B 402 can access the internal data structure 421 for
the list of recipient nodes and find out that node D 404 is the
next node following node C 403. Therefore, the cluster node B 402
can deliver the chain request message 420 to node D 404
accordingly.
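The recipient-side handling can be sketched as follows, reusing the shape of the chain request message from the previous sketch. ChainRequestHandler, nextHop, and the liveness predicate are hypothetical names introduced here; the logic simply finds the first reachable recipient after the current node in the list.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class ChainRequestHandler {

    // A chain request message carrying the ordered recipient list (see the sketch above).
    record ChainRequest(String payload, List<String> recipients) { }

    // After processing the payload locally, a recipient forwards the request to the
    // next reachable member that follows it in the list, skipping dead/unavailable ones.
    static String nextHop(ChainRequest request, String self, Predicate<String> isAlive) {
        List<String> recipients = request.recipients();
        for (int i = recipients.indexOf(self) + 1; i < recipients.size(); i++) {
            String candidate = recipients.get(i);
            if (isAlive.test(candidate)) {
                return candidate;      // e.g. node B skips the dead node C and picks D
            }
        }
        return null;                   // end of the chain; nothing left to forward to
    }

    public static void main(String[] args) {
        ChainRequest request = new ChainRequest("x=2", List.of("B", "C", "D"));
        Map<String, Boolean> liveness = Map.of("B", true, "C", false, "D", true);
        String next = nextHop(request, "B", member -> liveness.getOrDefault(member, false));
        System.out.println("Node B forwards the chain request to node " + next);  // D
    }
}
```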
[0047] FIG. 5 is an illustration of supporting consistent message
delivery when a primary owner node of a partition becomes
unavailable in a distributed data grid in accordance with various
embodiments of the invention. As shown in FIG. 5, a distributed
data grid 500 can include a plurality of cluster nodes, such as
node A-D 501-504.
[0048] Initially, a partition can define that the value of a
property x is equal to 1 ("x=1"). The partition can be maintained
on the primary owner node A 501 and each backup node B-D 502-504.
Then, the node A 501 can receive a message from the client 510,
which changes the value of the property x to 2 ("x=2"). Thus, the
messaging facility 511 in the cluster node A 501 can propagate the
message, "x=2", to each backup node B-D 202-204.
[0049] As shown in FIG. 5, when the client 510 detects that the
primary owner node A 501 is dead or unavailable, the message,
"x=2", may either have already been delivered to the cluster
node B 502, or remain undelivered.
[0050] When the message, "x=2", has already been delivered to the
cluster node B 502, the distributed data grid 500 can guarantee
that the message is delivered to the rest of nodes C-D 503-504, and
can ensure a consistent view that the value of x is equal to 2. On
the other hand, when the message, "x=2", is undelivered before the
primary owner node A 501 becomes unavailable, the distributed data
grid 500 maintains the consistent view that the value of x is equal
to 1. Thus, the distributed data grid 500 can ensure a consistent
view of the value of x for the client 510 in either case.
[0051] Then, the distributed data grid 500 can provide a new
primary owner node, e.g. cluster node E 505, which can continue
maintaining the partition and handle subsequent incoming messages
from the client 510.
[0052] Alternatively, the cluster node A 501 can deliver the
message, "x=2", to each recipient node B-D 502-504 separately, or
in parallel (as shown in dotted line in FIG. 5). Unlike the
guaranteed multi-point message delivery feature as described in the
above, this alternative approach can be problematic in the scenario
when the primary owner node A 501 is dead or becomes unavailable.
Since the delivery of the message, "x=2", to the different cluster
nodes B-D 502-504 may not go through, the value of x on the cluster
nodes B-D 502-504 is not guaranteed to be consistent.
[0053] FIG. 6 illustrates an exemplary flow chart for supporting
guaranteed multi-point message delivery in a distributed data grid
in accordance with an embodiment of the invention. As shown in FIG.
6, at step 601, a messaging facility on a cluster node in the
distributed data grid can receive an incoming message that is
adaptive to be delivered to a plurality of nodes in the distributed
data grid. Then, at step 602, the messaging facility can be
configured to deliver the incoming message to the plurality of
nodes according to an order in a list. Furthermore, at step 603, a
node in the plurality of nodes can skip a next node in the list for
delivering the incoming message, when the next node is dead or
unavailable.
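Putting the three steps together, one hedged skeleton of the flow in FIG. 6 might look as follows; the names GuaranteedDeliveryFlow, Member, and onIncomingMessage are hypothetical and this is not the actual messaging facility.

```java
import java.util.List;

public class GuaranteedDeliveryFlow {

    // A member of the distributed data grid, as seen by the messaging facility.
    interface Member {
        boolean isAvailable();
        void deliver(String message);
    }

    // Step 601: receive an incoming message to be delivered to a plurality of nodes.
    // Step 602: deliver it to the recipients according to the order in the list.
    // Step 603: skip a next node that is dead or unavailable and continue with the
    //           node after it.
    static void onIncomingMessage(String message, List<Member> orderedRecipients) {
        for (Member member : orderedRecipients) {
            if (member.isAvailable()) {
                member.deliver(message);
            }
        }
    }

    public static void main(String[] args) {
        Member live = new Member() {
            public boolean isAvailable() { return true; }
            public void deliver(String message) { System.out.println("delivered: " + message); }
        };
        Member dead = new Member() {
            public boolean isAvailable() { return false; }
            public void deliver(String message) { throw new IllegalStateException("skipped"); }
        };
        onIncomingMessage("incoming message", List.of(live, dead, live));  // delivers twice
    }
}
```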
[0054] The present invention may be conveniently implemented using
one or more conventional general purpose or specialized digital
computer, computing device, machine, or microprocessor, including
one or more processors, memory and/or computer readable storage
media programmed according to the teachings of the present
disclosure. Appropriate software coding can readily be prepared by
skilled programmers based on the teachings of the present
disclosure, as will be apparent to those skilled in the software
art.
[0055] In some embodiments, the present invention includes a
computer program product which is a storage medium or computer
readable medium (media) having instructions stored thereon/in which
can be used to program a computer to perform any of the processes
of the present invention. The storage medium can include, but is
not limited to, any type of disk including floppy disks, optical
discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs,
RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic
or optical cards, nanosystems (including molecular memory ICs), or
any type of media or device suitable for storing instructions
and/or data.
[0056] The foregoing description of the present invention has been
provided for the purposes of illustration and description. It is
not intended to be exhaustive or to limit the invention to the
precise forms disclosed. Many modifications and variations will be
apparent to the practitioner skilled in the art. The embodiments
were chosen and described in order to best explain the principles
of the invention and its practical application, thereby enabling
others skilled in the art to understand the invention for various
embodiments and with various modifications that are suited to the
particular use contemplated. It is intended that the scope of the
invention be defined by the following claims and their
equivalents.
* * * * *